diff options
274 files changed, 10067 insertions, 3215 deletions
diff --git a/Documentation/bpf/bpf_devel_QA.txt b/Documentation/bpf/bpf_devel_QA.txt new file mode 100644 index 000000000000..cefef855dea4 --- /dev/null +++ b/Documentation/bpf/bpf_devel_QA.txt @@ -0,0 +1,519 @@ +This document provides information for the BPF subsystem about various +workflows related to reporting bugs, submitting patches, and queueing +patches for stable kernels. + +For general information about submitting patches, please refer to +Documentation/process/. This document only describes additional specifics +related to BPF. + +Reporting bugs: +--------------- + +Q: How do I report bugs for BPF kernel code? + +A: Since all BPF kernel development as well as bpftool and iproute2 BPF + loader development happens through the netdev kernel mailing list, + please report any found issues around BPF to the following mailing + list: + + netdev@vger.kernel.org + + This may also include issues related to XDP, BPF tracing, etc. + + Given netdev has a high volume of traffic, please also add the BPF + maintainers to Cc (from kernel MAINTAINERS file): + + Alexei Starovoitov <ast@kernel.org> + Daniel Borkmann <daniel@iogearbox.net> + + In case a buggy commit has already been identified, make sure to keep + the actual commit authors in Cc as well for the report. They can + typically be identified through the kernel's git tree. + + Please do *not* report BPF issues to bugzilla.kernel.org since it + is a guarantee that the reported issue will be overlooked. + +Submitting patches: +------------------- + +Q: To which mailing list do I need to submit my BPF patches? + +A: Please submit your BPF patches to the netdev kernel mailing list: + + netdev@vger.kernel.org + + Historically, BPF came out of networking and has always been maintained + by the kernel networking community. Although these days BPF touches + many other subsystems as well, the patches are still routed mainly + through the networking community. + + In case your patch has changes in various different subsystems (e.g. + tracing, security, etc), make sure to Cc the related kernel mailing + lists and maintainers from there as well, so they are able to review + the changes and provide their Acked-by's to the patches. + +Q: Where can I find patches currently under discussion for BPF subsystem? + +A: All patches that are Cc'ed to netdev are queued for review under netdev + patchwork project: + + http://patchwork.ozlabs.org/project/netdev/list/ + + Those patches which target BPF, are assigned to a 'bpf' delegate for + further processing from BPF maintainers. The current queue with + patches under review can be found at: + + https://patchwork.ozlabs.org/project/netdev/list/?delegate=77147 + + Once the patches have been reviewed by the BPF community as a whole + and approved by the BPF maintainers, their status in patchwork will be + changed to 'Accepted' and the submitter will be notified by mail. This + means that the patches look good from a BPF perspective and have been + applied to one of the two BPF kernel trees. + + In case feedback from the community requires a respin of the patches, + their status in patchwork will be set to 'Changes Requested', and purged + from the current review queue. Likewise for cases where patches would + get rejected or are not applicable to the BPF trees (but assigned to + the 'bpf' delegate). + +Q: How do the changes make their way into Linux? + +A: There are two BPF kernel trees (git repositories). Once patches have + been accepted by the BPF maintainers, they will be applied to one + of the two BPF trees: + + https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/ + https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/ + + The bpf tree itself is for fixes only, whereas bpf-next for features, + cleanups or other kind of improvements ("next-like" content). This is + analogous to net and net-next trees for networking. Both bpf and + bpf-next will only have a master branch in order to simplify against + which branch patches should get rebased to. + + Accumulated BPF patches in the bpf tree will regularly get pulled + into the net kernel tree. Likewise, accumulated BPF patches accepted + into the bpf-next tree will make their way into net-next tree. net and + net-next are both run by David S. Miller. From there, they will go + into the kernel mainline tree run by Linus Torvalds. To read up on the + process of net and net-next being merged into the mainline tree, see + the netdev FAQ under: + + Documentation/networking/netdev-FAQ.txt + + Occasionally, to prevent merge conflicts, we might send pull requests + to other trees (e.g. tracing) with a small subset of the patches, but + net and net-next are always the main trees targeted for integration. + + The pull requests will contain a high-level summary of the accumulated + patches and can be searched on netdev kernel mailing list through the + following subject lines (yyyy-mm-dd is the date of the pull request): + + pull-request: bpf yyyy-mm-dd + pull-request: bpf-next yyyy-mm-dd + +Q: How do I indicate which tree (bpf vs. bpf-next) my patch should be + applied to? + +A: The process is the very same as described in the netdev FAQ, so + please read up on it. The subject line must indicate whether the + patch is a fix or rather "next-like" content in order to let the + maintainers know whether it is targeted at bpf or bpf-next. + + For fixes eventually landing in bpf -> net tree, the subject must + look like: + + git format-patch --subject-prefix='PATCH bpf' start..finish + + For features/improvements/etc that should eventually land in + bpf-next -> net-next, the subject must look like: + + git format-patch --subject-prefix='PATCH bpf-next' start..finish + + If unsure whether the patch or patch series should go into bpf + or net directly, or bpf-next or net-next directly, it is not a + problem either if the subject line says net or net-next as target. + It is eventually up to the maintainers to do the delegation of + the patches. + + If it is clear that patches should go into bpf or bpf-next tree, + please make sure to rebase the patches against those trees in + order to reduce potential conflicts. + + In case the patch or patch series has to be reworked and sent out + again in a second or later revision, it is also required to add a + version number (v2, v3, ...) into the subject prefix: + + git format-patch --subject-prefix='PATCH net-next v2' start..finish + + When changes have been requested to the patch series, always send the + whole patch series again with the feedback incorporated (never send + individual diffs on top of the old series). + +Q: What does it mean when a patch gets applied to bpf or bpf-next tree? + +A: It means that the patch looks good for mainline inclusion from + a BPF point of view. + + Be aware that this is not a final verdict that the patch will + automatically get accepted into net or net-next trees eventually: + + On the netdev kernel mailing list reviews can come in at any point + in time. If discussions around a patch conclude that they cannot + get included as-is, we will either apply a follow-up fix or drop + them from the trees entirely. Therefore, we also reserve to rebase + the trees when deemed necessary. After all, the purpose of the tree + is to i) accumulate and stage BPF patches for integration into trees + like net and net-next, and ii) run extensive BPF test suite and + workloads on the patches before they make their way any further. + + Once the BPF pull request was accepted by David S. Miller, then + the patches end up in net or net-next tree, respectively, and + make their way from there further into mainline. Again, see the + netdev FAQ for additional information e.g. on how often they are + merged to mainline. + +Q: How long do I need to wait for feedback on my BPF patches? + +A: We try to keep the latency low. The usual time to feedback will + be around 2 or 3 business days. It may vary depending on the + complexity of changes and current patch load. + +Q: How often do you send pull requests to major kernel trees like + net or net-next? + +A: Pull requests will be sent out rather often in order to not + accumulate too many patches in bpf or bpf-next. + + As a rule of thumb, expect pull requests for each tree regularly + at the end of the week. In some cases pull requests could additionally + come also in the middle of the week depending on the current patch + load or urgency. + +Q: Are patches applied to bpf-next when the merge window is open? + +A: For the time when the merge window is open, bpf-next will not be + processed. This is roughly analogous to net-next patch processing, + so feel free to read up on the netdev FAQ about further details. + + During those two weeks of merge window, we might ask you to resend + your patch series once bpf-next is open again. Once Linus released + a v*-rc1 after the merge window, we continue processing of bpf-next. + + For non-subscribers to kernel mailing lists, there is also a status + page run by David S. Miller on net-next that provides guidance: + + http://vger.kernel.org/~davem/net-next.html + +Q: I made a BPF verifier change, do I need to add test cases for + BPF kernel selftests? + +A: If the patch has changes to the behavior of the verifier, then yes, + it is absolutely necessary to add test cases to the BPF kernel + selftests suite. If they are not present and we think they are + needed, then we might ask for them before accepting any changes. + + In particular, test_verifier.c is tracking a high number of BPF test + cases, including a lot of corner cases that LLVM BPF back end may + generate out of the restricted C code. Thus, adding test cases is + absolutely crucial to make sure future changes do not accidentally + affect prior use-cases. Thus, treat those test cases as: verifier + behavior that is not tracked in test_verifier.c could potentially + be subject to change. + +Q: When should I add code to samples/bpf/ and when to BPF kernel + selftests? + +A: In general, we prefer additions to BPF kernel selftests rather than + samples/bpf/. The rationale is very simple: kernel selftests are + regularly run by various bots to test for kernel regressions. + + The more test cases we add to BPF selftests, the better the coverage + and the less likely it is that those could accidentally break. It is + not that BPF kernel selftests cannot demo how a specific feature can + be used. + + That said, samples/bpf/ may be a good place for people to get started, + so it might be advisable that simple demos of features could go into + samples/bpf/, but advanced functional and corner-case testing rather + into kernel selftests. + + If your sample looks like a test case, then go for BPF kernel selftests + instead! + +Q: When should I add code to the bpftool? + +A: The main purpose of bpftool (under tools/bpf/bpftool/) is to provide + a central user space tool for debugging and introspection of BPF programs + and maps that are active in the kernel. If UAPI changes related to BPF + enable for dumping additional information of programs or maps, then + bpftool should be extended as well to support dumping them. + +Q: When should I add code to iproute2's BPF loader? + +A: For UAPI changes related to the XDP or tc layer (e.g. cls_bpf), the + convention is that those control-path related changes are added to + iproute2's BPF loader as well from user space side. This is not only + useful to have UAPI changes properly designed to be usable, but also + to make those changes available to a wider user base of major + downstream distributions. + +Q: Do you accept patches as well for iproute2's BPF loader? + +A: Patches for the iproute2's BPF loader have to be sent to: + + netdev@vger.kernel.org + + While those patches are not processed by the BPF kernel maintainers, + please keep them in Cc as well, so they can be reviewed. + + The official git repository for iproute2 is run by Stephen Hemminger + and can be found at: + + https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git/ + + The patches need to have a subject prefix of '[PATCH iproute2 master]' + or '[PATCH iproute2 net-next]'. 'master' or 'net-next' describes the + target branch where the patch should be applied to. Meaning, if kernel + changes went into the net-next kernel tree, then the related iproute2 + changes need to go into the iproute2 net-next branch, otherwise they + can be targeted at master branch. The iproute2 net-next branch will get + merged into the master branch after the current iproute2 version from + master has been released. + + Like BPF, the patches end up in patchwork under the netdev project and + are delegated to 'shemminger' for further processing: + + http://patchwork.ozlabs.org/project/netdev/list/?delegate=389 + +Q: What is the minimum requirement before I submit my BPF patches? + +A: When submitting patches, always take the time and properly test your + patches *prior* to submission. Never rush them! If maintainers find + that your patches have not been properly tested, it is a good way to + get them grumpy. Testing patch submissions is a hard requirement! + + Note, fixes that go to bpf tree *must* have a Fixes: tag included. The + same applies to fixes that target bpf-next, where the affected commit + is in net-next (or in some cases bpf-next). The Fixes: tag is crucial + in order to identify follow-up commits and tremendously helps for people + having to do backporting, so it is a must have! + + We also don't accept patches with an empty commit message. Take your + time and properly write up a high quality commit message, it is + essential! + + Think about it this way: other developers looking at your code a month + from now need to understand *why* a certain change has been done that + way, and whether there have been flaws in the analysis or assumptions + that the original author did. Thus providing a proper rationale and + describing the use-case for the changes is a must. + + Patch submissions with >1 patch must have a cover letter which includes + a high level description of the series. This high level summary will + then be placed into the merge commit by the BPF maintainers such that + it is also accessible from the git log for future reference. + +Q: What do I need to consider when adding a new instruction or feature + that would require BPF JIT and/or LLVM integration as well? + +A: We try hard to keep all BPF JITs up to date such that the same user + experience can be guaranteed when running BPF programs on different + architectures without having the program punt to the less efficient + interpreter in case the in-kernel BPF JIT is enabled. + + If you are unable to implement or test the required JIT changes for + certain architectures, please work together with the related BPF JIT + developers in order to get the feature implemented in a timely manner. + Please refer to the git log (arch/*/net/) to locate the necessary + people for helping out. + + Also always make sure to add BPF test cases (e.g. test_bpf.c and + test_verifier.c) for new instructions, so that they can receive + broad test coverage and help run-time testing the various BPF JITs. + + In case of new BPF instructions, once the changes have been accepted + into the Linux kernel, please implement support into LLVM's BPF back + end. See LLVM section below for further information. + +Stable submission: +------------------ + +Q: I need a specific BPF commit in stable kernels. What should I do? + +A: In case you need a specific fix in stable kernels, first check whether + the commit has already been applied in the related linux-*.y branches: + + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/ + + If not the case, then drop an email to the BPF maintainers with the + netdev kernel mailing list in Cc and ask for the fix to be queued up: + + netdev@vger.kernel.org + + The process in general is the same as on netdev itself, see also the + netdev FAQ document. + +Q: Do you also backport to kernels not currently maintained as stable? + +A: No. If you need a specific BPF commit in kernels that are currently not + maintained by the stable maintainers, then you are on your own. + + The current stable and longterm stable kernels are all listed here: + + https://www.kernel.org/ + +Q: The BPF patch I am about to submit needs to go to stable as well. What + should I do? + +A: The same rules apply as with netdev patch submissions in general, see + netdev FAQ under: + + Documentation/networking/netdev-FAQ.txt + + Never add "Cc: stable@vger.kernel.org" to the patch description, but + ask the BPF maintainers to queue the patches instead. This can be done + with a note, for example, under the "---" part of the patch which does + not go into the git log. Alternatively, this can be done as a simple + request by mail instead. + +Q: Where do I find currently queued BPF patches that will be submitted + to stable? + +A: Once patches that fix critical bugs got applied into the bpf tree, they + are queued up for stable submission under: + + http://patchwork.ozlabs.org/bundle/bpf/stable/?state=* + + They will be on hold there at minimum until the related commit made its + way into the mainline kernel tree. + + After having been under broader exposure, the queued patches will be + submitted by the BPF maintainers to the stable maintainers. + +Testing patches: +---------------- + +Q: Which BPF kernel selftests version should I run my kernel against? + +A: If you run a kernel xyz, then always run the BPF kernel selftests from + that kernel xyz as well. Do not expect that the BPF selftest from the + latest mainline tree will pass all the time. + + In particular, test_bpf.c and test_verifier.c have a large number of + test cases and are constantly updated with new BPF test sequences, or + existing ones are adapted to verifier changes e.g. due to verifier + becoming smarter and being able to better track certain things. + +LLVM: +----- + +Q: Where do I find LLVM with BPF support? + +A: The BPF back end for LLVM is upstream in LLVM since version 3.7.1. + + All major distributions these days ship LLVM with BPF back end enabled, + so for the majority of use-cases it is not required to compile LLVM by + hand anymore, just install the distribution provided package. + + LLVM's static compiler lists the supported targets through 'llc --version', + make sure BPF targets are listed. Example: + + $ llc --version + LLVM (http://llvm.org/): + LLVM version 6.0.0svn + Optimized build. + Default target: x86_64-unknown-linux-gnu + Host CPU: skylake + + Registered Targets: + bpf - BPF (host endian) + bpfeb - BPF (big endian) + bpfel - BPF (little endian) + x86 - 32-bit X86: Pentium-Pro and above + x86-64 - 64-bit X86: EM64T and AMD64 + + For developers in order to utilize the latest features added to LLVM's + BPF back end, it is advisable to run the latest LLVM releases. Support + for new BPF kernel features such as additions to the BPF instruction + set are often developed together. + + All LLVM releases can be found at: http://releases.llvm.org/ + +Q: Got it, so how do I build LLVM manually anyway? + +A: You need cmake and gcc-c++ as build requisites for LLVM. Once you have + that set up, proceed with building the latest LLVM and clang version + from the git repositories: + + $ git clone http://llvm.org/git/llvm.git + $ cd llvm/tools + $ git clone --depth 1 http://llvm.org/git/clang.git + $ cd ..; mkdir build; cd build + $ cmake .. -DLLVM_TARGETS_TO_BUILD="BPF;X86" \ + -DBUILD_SHARED_LIBS=OFF \ + -DCMAKE_BUILD_TYPE=Release \ + -DLLVM_BUILD_RUNTIME=OFF + $ make -j $(getconf _NPROCESSORS_ONLN) + + The built binaries can then be found in the build/bin/ directory, where + you can point the PATH variable to. + +Q: Should I notify BPF kernel maintainers about issues in LLVM's BPF code + generation back end or about LLVM generated code that the verifier + refuses to accept? + +A: Yes, please do! LLVM's BPF back end is a key piece of the whole BPF + infrastructure and it ties deeply into verification of programs from the + kernel side. Therefore, any issues on either side need to be investigated + and fixed whenever necessary. + + Therefore, please make sure to bring them up at netdev kernel mailing + list and Cc BPF maintainers for LLVM and kernel bits: + + Yonghong Song <yhs@fb.com> + Alexei Starovoitov <ast@kernel.org> + Daniel Borkmann <daniel@iogearbox.net> + + LLVM also has an issue tracker where BPF related bugs can be found: + + https://bugs.llvm.org/buglist.cgi?quicksearch=bpf + + However, it is better to reach out through mailing lists with having + maintainers in Cc. + +Q: I have added a new BPF instruction to the kernel, how can I integrate + it into LLVM? + +A: LLVM has a -mcpu selector for the BPF back end in order to allow the + selection of BPF instruction set extensions. By default the 'generic' + processor target is used, which is the base instruction set (v1) of BPF. + + LLVM has an option to select -mcpu=probe where it will probe the host + kernel for supported BPF instruction set extensions and selects the + optimal set automatically. + + For cross-compilation, a specific version can be select manually as well. + + $ llc -march bpf -mcpu=help + Available CPUs for this target: + + generic - Select the generic processor. + probe - Select the probe processor. + v1 - Select the v1 processor. + v2 - Select the v2 processor. + [...] + + Newly added BPF instructions to the Linux kernel need to follow the same + scheme, bump the instruction set version and implement probing for the + extensions such that -mcpu=probe users can benefit from the optimization + transparently when upgrading their kernels. + + If you are unable to implement support for the newly added BPF instruction + please reach out to BPF developers for help. + + By the way, the BPF kernel selftests run with -mcpu=probe for better + test coverage. + +Happy BPF hacking! diff --git a/Documentation/devicetree/bindings/net/can/fsl-flexcan.txt b/Documentation/devicetree/bindings/net/can/fsl-flexcan.txt index 56d6cc336e1c..bfc0c433654f 100644 --- a/Documentation/devicetree/bindings/net/can/fsl-flexcan.txt +++ b/Documentation/devicetree/bindings/net/can/fsl-flexcan.txt @@ -18,6 +18,12 @@ Optional properties: - xceiver-supply: Regulator that powers the CAN transceiver +- big-endian: This means the registers of FlexCAN controller are big endian. + This is optional property.i.e. if this property is not present in + device tree node then controller is assumed to be little endian. + if this property is present then controller is assumed to be big + endian. + Example: can@1c000 { diff --git a/Documentation/devicetree/bindings/net/ieee802154/adf7242.txt b/Documentation/devicetree/bindings/net/ieee802154/adf7242.txt index dea5124cdc52..d24172cc6d32 100644 --- a/Documentation/devicetree/bindings/net/ieee802154/adf7242.txt +++ b/Documentation/devicetree/bindings/net/ieee802154/adf7242.txt @@ -1,7 +1,7 @@ * ADF7242 IEEE 802.15.4 * Required properties: - - compatible: should be "adi,adf7242" + - compatible: should be "adi,adf7242", "adi,adf7241" - spi-max-frequency: maximal bus speed (12.5 MHz) - reg: the chipselect index - interrupts: the interrupt generated by the device via pin IRQ1. diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt index 77d0b2a61ffa..c05479f5ac7c 100644 --- a/Documentation/devicetree/bindings/net/phy.txt +++ b/Documentation/devicetree/bindings/net/phy.txt @@ -53,6 +53,8 @@ Optional Properties: to ensure the integrated PHY is used. The absence of this property indicates the muxers should be configured so that the external PHY is used. +- reset-gpios: The GPIO phandle and specifier for the PHY reset signal. + Example: ethernet-phy@0 { diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt index b8b40753133e..25170ad7d25b 100644 --- a/Documentation/networking/dsa/dsa.txt +++ b/Documentation/networking/dsa/dsa.txt @@ -385,11 +385,6 @@ Switch configuration avoid relying on what a previous software agent such as a bootloader/firmware may have previously configured. -- set_addr: Some switches require the programming of the management interface's - Ethernet MAC address, switch drivers can also disable ageing of MAC addresses - on the management interface and "hardcode"/"force" this MAC address for the - CPU/management interface as an optimization - PHY devices and link management ------------------------------- diff --git a/Documentation/networking/ieee802154.txt b/Documentation/networking/ieee802154.txt index 057e9fdbfac9..e74d8e1da0e2 100644 --- a/Documentation/networking/ieee802154.txt +++ b/Documentation/networking/ieee802154.txt @@ -97,6 +97,46 @@ The include/net/mac802154.h defines following functions: - void ieee802154_unregister_hw(struct ieee802154_hw *hw): freeing registered PHY + - void ieee802154_rx_irqsafe(struct ieee802154_hw *hw, struct sk_buff *skb, + u8 lqi): + telling 802.15.4 module there is a new received frame in the skb with + the RF Link Quality Indicator (LQI) from the hardware device + + - void ieee802154_xmit_complete(struct ieee802154_hw *hw, struct sk_buff *skb, + bool ifs_handling): + telling 802.15.4 module the frame in the skb is or going to be + transmitted through the hardware device + +The device driver must implement the following callbacks in the IEEE 802.15.4 +operations structure at least: +struct ieee802154_ops { + ... + int (*start)(struct ieee802154_hw *hw); + void (*stop)(struct ieee802154_hw *hw); + ... + int (*xmit_async)(struct ieee802154_hw *hw, struct sk_buff *skb); + int (*ed)(struct ieee802154_hw *hw, u8 *level); + int (*set_channel)(struct ieee802154_hw *hw, u8 page, u8 channel); + ... +}; + + - int start(struct ieee802154_hw *hw): + handler that 802.15.4 module calls for the hardware device initialization. + + - void stop(struct ieee802154_hw *hw): + handler that 802.15.4 module calls for the hardware device cleanup. + + - int xmit_async(struct ieee802154_hw *hw, struct sk_buff *skb): + handler that 802.15.4 module calls for each frame in the skb going to be + transmitted through the hardware device. + + - int ed(struct ieee802154_hw *hw, u8 *level): + handler that 802.15.4 module calls for Energy Detection from the hardware + device. + + - int set_channel(struct ieee802154_hw *hw, u8 page, u8 channel): + set radio for listening on specific channel of the hardware device. + Moreover IEEE 802.15.4 device operations structure should be filled. Fake drivers diff --git a/Documentation/networking/kapi.rst b/Documentation/networking/kapi.rst index 580289f345da..f03ae64be8bc 100644 --- a/Documentation/networking/kapi.rst +++ b/Documentation/networking/kapi.rst @@ -145,3 +145,27 @@ PHY Support .. kernel-doc:: drivers/net/phy/mdio_bus.c :internal: + +PHYLINK +------- + + PHYLINK interfaces traditional network drivers with PHYLIB, fixed-links, + and SFF modules (eg, hot-pluggable SFP) that may contain PHYs. PHYLINK + provides management of the link state and link modes. + +.. kernel-doc:: include/linux/phylink.h + :internal: + +.. kernel-doc:: drivers/net/phy/phylink.c + +SFP support +----------- + +.. kernel-doc:: drivers/net/phy/sfp-bus.c + :internal: + +.. kernel-doc:: include/linux/sfp.h + :internal: + +.. kernel-doc:: drivers/net/phy/sfp-bus.c + :export: diff --git a/MAINTAINERS b/MAINTAINERS index 9e0045e3ee0c..a5c0fd3bed32 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2725,12 +2725,16 @@ M: Alexei Starovoitov <ast@kernel.org> M: Daniel Borkmann <daniel@iogearbox.net> L: netdev@vger.kernel.org L: linux-kernel@vger.kernel.org +T: git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git +T: git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git S: Supported F: arch/x86/net/bpf_jit* F: Documentation/networking/filter.txt F: Documentation/bpf/ F: include/linux/bpf* F: include/linux/filter.h +F: include/trace/events/bpf.h +F: include/trace/events/xdp.h F: include/uapi/linux/bpf* F: include/uapi/linux/filter.h F: kernel/bpf/ @@ -9600,6 +9604,11 @@ NETWORKING [WIRELESS] L: linux-wireless@vger.kernel.org Q: http://patchwork.kernel.org/project/linux-wireless/list/ +NETDEVSIM +M: Jakub Kicinski <jakub.kicinski@netronome.com> +S: Maintained +F: drivers/net/netdevsim/* + NETXEN (1/10) GbE SUPPORT M: Manish Chopra <manish.chopra@cavium.com> M: Rahul Verma <rahul.verma@cavium.com> diff --git a/arch/arm/boot/dts/imx25.dtsi b/arch/arm/boot/dts/imx25.dtsi index 09ce8b81fafa..fcaff1c66bcb 100644 --- a/arch/arm/boot/dts/imx25.dtsi +++ b/arch/arm/boot/dts/imx25.dtsi @@ -122,7 +122,7 @@ }; can1: can@43f88000 { - compatible = "fsl,imx25-flexcan", "fsl,p1010-flexcan"; + compatible = "fsl,imx25-flexcan"; reg = <0x43f88000 0x4000>; interrupts = <43>; clocks = <&clks 75>, <&clks 75>; @@ -131,7 +131,7 @@ }; can2: can@43f8c000 { - compatible = "fsl,imx25-flexcan", "fsl,p1010-flexcan"; + compatible = "fsl,imx25-flexcan"; reg = <0x43f8c000 0x4000>; interrupts = <44>; clocks = <&clks 76>, <&clks 76>; diff --git a/arch/arm/boot/dts/imx28.dtsi b/arch/arm/boot/dts/imx28.dtsi index 2f4ebe0318d3..e52e05c0fe56 100644 --- a/arch/arm/boot/dts/imx28.dtsi +++ b/arch/arm/boot/dts/imx28.dtsi @@ -1038,7 +1038,7 @@ }; can0: can@80032000 { - compatible = "fsl,imx28-flexcan", "fsl,p1010-flexcan"; + compatible = "fsl,imx28-flexcan"; reg = <0x80032000 0x2000>; interrupts = <8>; clocks = <&clks 58>, <&clks 58>; @@ -1047,7 +1047,7 @@ }; can1: can@80034000 { - compatible = "fsl,imx28-flexcan", "fsl,p1010-flexcan"; + compatible = "fsl,imx28-flexcan"; reg = <0x80034000 0x2000>; interrupts = <9>; clocks = <&clks 59>, <&clks 59>; diff --git a/arch/arm/boot/dts/imx35.dtsi b/arch/arm/boot/dts/imx35.dtsi index 6d5e6a60bee7..1f0e2203b576 100644 --- a/arch/arm/boot/dts/imx35.dtsi +++ b/arch/arm/boot/dts/imx35.dtsi @@ -303,7 +303,7 @@ }; can1: can@53fe4000 { - compatible = "fsl,imx35-flexcan", "fsl,p1010-flexcan"; + compatible = "fsl,imx35-flexcan"; reg = <0x53fe4000 0x1000>; clocks = <&clks 33>, <&clks 33>; clock-names = "ipg", "per"; @@ -312,7 +312,7 @@ }; can2: can@53fe8000 { - compatible = "fsl,imx35-flexcan", "fsl,p1010-flexcan"; + compatible = "fsl,imx35-flexcan"; reg = <0x53fe8000 0x1000>; clocks = <&clks 34>, <&clks 34>; clock-names = "ipg", "per"; diff --git a/arch/arm/boot/dts/imx53.dtsi b/arch/arm/boot/dts/imx53.dtsi index 589a67c5f796..a5a050703320 100644 --- a/arch/arm/boot/dts/imx53.dtsi +++ b/arch/arm/boot/dts/imx53.dtsi @@ -545,7 +545,7 @@ }; can1: can@53fc8000 { - compatible = "fsl,imx53-flexcan", "fsl,p1010-flexcan"; + compatible = "fsl,imx53-flexcan"; reg = <0x53fc8000 0x4000>; interrupts = <82>; clocks = <&clks IMX5_CLK_CAN1_IPG_GATE>, @@ -555,7 +555,7 @@ }; can2: can@53fcc000 { - compatible = "fsl,imx53-flexcan", "fsl,p1010-flexcan"; + compatible = "fsl,imx53-flexcan"; reg = <0x53fcc000 0x4000>; interrupts = <83>; clocks = <&clks IMX5_CLK_CAN2_IPG_GATE>, diff --git a/arch/arm/boot/dts/ls1021a-qds.dts b/arch/arm/boot/dts/ls1021a-qds.dts index 940875316d0f..4f211e3c903a 100644 --- a/arch/arm/boot/dts/ls1021a-qds.dts +++ b/arch/arm/boot/dts/ls1021a-qds.dts @@ -331,3 +331,19 @@ &uart1 { status = "okay"; }; + +&can0 { + status = "okay"; +}; + +&can1 { + status = "okay"; +}; + +&can2 { + status = "disabled"; +}; + +&can3 { + status = "disabled"; +}; diff --git a/arch/arm/boot/dts/ls1021a-twr.dts b/arch/arm/boot/dts/ls1021a-twr.dts index a8b148ad1dd2..7202d9c504be 100644 --- a/arch/arm/boot/dts/ls1021a-twr.dts +++ b/arch/arm/boot/dts/ls1021a-twr.dts @@ -243,3 +243,19 @@ &uart1 { status = "okay"; }; + +&can0 { + status = "okay"; +}; + +&can1 { + status = "okay"; +}; + +&can2 { + status = "disabled"; +}; + +&can3 { + status = "disabled"; +}; diff --git a/arch/arm/boot/dts/ls1021a.dtsi b/arch/arm/boot/dts/ls1021a.dtsi index 9319e1f0f1d8..7789031898b0 100644 --- a/arch/arm/boot/dts/ls1021a.dtsi +++ b/arch/arm/boot/dts/ls1021a.dtsi @@ -730,5 +730,41 @@ <0000 0 0 3 &gic GIC_SPI 191 IRQ_TYPE_LEVEL_HIGH>, <0000 0 0 4 &gic GIC_SPI 193 IRQ_TYPE_LEVEL_HIGH>; }; + + can0: can@2a70000 { + compatible = "fsl,ls1021ar2-flexcan"; + reg = <0x0 0x2a70000 0x0 0x1000>; + interrupts = <GIC_SPI 126 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&clockgen 4 1>, <&clockgen 4 1>; + clock-names = "ipg", "per"; + big-endian; + }; + + can1: can@2a80000 { + compatible = "fsl,ls1021ar2-flexcan"; + reg = <0x0 0x2a80000 0x0 0x1000>; + interrupts = <GIC_SPI 127 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&clockgen 4 1>, <&clockgen 4 1>; + clock-names = "ipg", "per"; + big-endian; + }; + + can2: can@2a90000 { + compatible = "fsl,ls1021ar2-flexcan"; + reg = <0x0 0x2a90000 0x0 0x1000>; + interrupts = <GIC_SPI 128 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&clockgen 4 1>, <&clockgen 4 1>; + clock-names = "ipg", "per"; + big-endian; + }; + + can3: can@2aa0000 { + compatible = "fsl,ls1021ar2-flexcan"; + reg = <0x0 0x2aa0000 0x0 0x1000>; + interrupts = <GIC_SPI 129 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&clockgen 4 1>, <&clockgen 4 1>; + clock-names = "ipg", "per"; + big-endian; + }; }; }; diff --git a/arch/powerpc/boot/dts/fsl/p1010si-post.dtsi b/arch/powerpc/boot/dts/fsl/p1010si-post.dtsi index af12ead88c5f..1b4aafc1f6a2 100644 --- a/arch/powerpc/boot/dts/fsl/p1010si-post.dtsi +++ b/arch/powerpc/boot/dts/fsl/p1010si-post.dtsi @@ -137,12 +137,14 @@ compatible = "fsl,p1010-flexcan"; reg = <0x1c000 0x1000>; interrupts = <48 0x2 0 0>; + big-endian; }; can1: can@1d000 { compatible = "fsl,p1010-flexcan"; reg = <0x1d000 0x1000>; interrupts = <61 0x2 0 0>; + big-endian; }; L2: l2-cache-controller@20000 { diff --git a/drivers/atm/eni.c b/drivers/atm/eni.c index ce47eb17901d..6470e3c4c990 100644 --- a/drivers/atm/eni.c +++ b/drivers/atm/eni.c @@ -473,7 +473,7 @@ static int do_rx_dma(struct atm_vcc *vcc,struct sk_buff *skb, ENI_PRV_POS(skb) = eni_vcc->descr+size+1; skb_queue_tail(&eni_dev->rx_queue,skb); eni_vcc->last = skb; -rx_enqueued++; + rx_enqueued++; } eni_vcc->descr = here; eni_out(dma_wr,MID_DMA_WR_RX); @@ -715,7 +715,7 @@ static void get_service(struct atm_dev *dev) else eni_dev->slow = vcc; eni_dev->last_slow = vcc; } -putting++; + putting++; ENI_VCC(vcc)->servicing++; } } @@ -744,7 +744,7 @@ static void dequeue_rx(struct atm_dev *dev) } EVENT("dequeued (size=%ld,pos=0x%lx)\n",ENI_PRV_SIZE(skb), ENI_PRV_POS(skb)); -rx_dequeued++; + rx_dequeued++; vcc = ATM_SKB(skb)->vcc; eni_vcc = ENI_VCC(vcc); first = 0; @@ -1174,7 +1174,7 @@ DPRINTK("doing direct send\n"); /* @@@ well, this doesn't work anyway */ DPRINTK("dma_wr set to %d, tx_pos is now %ld\n",dma_wr,tx->tx_pos); eni_out(dma_wr,MID_DMA_WR_TX); skb_queue_tail(&eni_dev->tx_queue,skb); -queued++; + queued++; return enq_ok; } @@ -1195,7 +1195,7 @@ static void poll_tx(struct atm_dev *dev) if (res == enq_ok) continue; DPRINTK("re-queuing TX PDU\n"); skb_queue_head(&tx->backlog,skb); -requeued++; + requeued++; if (res == enq_jam) return; break; } @@ -1232,7 +1232,7 @@ static void dequeue_tx(struct atm_dev *dev) else dev_kfree_skb_irq(skb); atomic_inc(&vcc->stats->tx); wake_up(&eni_dev->tx_wait); -dma_complete++; + dma_complete++; } } @@ -1555,7 +1555,7 @@ static void eni_tasklet(unsigned long data) } if (events & MID_TX_COMPLETE) { EVENT("INT: TX COMPLETE\n",0,0); -tx_complete++; + tx_complete++; wake_up(&eni_dev->tx_wait); /* poll_rx ? */ } @@ -2069,14 +2069,14 @@ static int eni_send(struct atm_vcc *vcc,struct sk_buff *skb) } *(u32 *) skb->data = htonl(*(u32 *) skb->data); } -submitted++; + submitted++; ATM_SKB(skb)->vcc = vcc; tasklet_disable(&ENI_DEV(vcc->dev)->task); res = do_tx(skb); tasklet_enable(&ENI_DEV(vcc->dev)->task); if (res == enq_ok) return 0; skb_queue_tail(&ENI_VCC(vcc)->tx->backlog,skb); -backlogged++; + backlogged++; tasklet_schedule(&ENI_DEV(vcc->dev)->task); return 0; } diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c index 12eb8caa4263..50e071444a5c 100644 --- a/drivers/hv/ring_buffer.c +++ b/drivers/hv/ring_buffer.c @@ -140,6 +140,29 @@ static u32 hv_copyto_ringbuffer( return start_write_offset; } +/* + * + * hv_get_ringbuffer_availbytes() + * + * Get number of bytes available to read and to write to + * for the specified ring buffer + */ +static void +hv_get_ringbuffer_availbytes(const struct hv_ring_buffer_info *rbi, + u32 *read, u32 *write) +{ + u32 read_loc, write_loc, dsize; + + /* Capture the read/write indices before they changed */ + read_loc = READ_ONCE(rbi->ring_buffer->read_index); + write_loc = READ_ONCE(rbi->ring_buffer->write_index); + dsize = rbi->ring_datasize; + + *write = write_loc >= read_loc ? dsize - (write_loc - read_loc) : + read_loc - write_loc; + *read = dsize - *write; +} + /* Get various debug metrics for the specified ring buffer. */ void hv_ringbuffer_get_debuginfo(const struct hv_ring_buffer_info *ring_info, struct hv_ring_buffer_debug_info *debug_info) diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 0936da592e12..944ec3c9282c 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -497,4 +497,15 @@ config THUNDERBOLT_NET source "drivers/net/hyperv/Kconfig" +config NETDEVSIM + tristate "Simulated networking device" + depends on DEBUG_FS + help + This driver is a developer testing tool and software model that can + be used to test various control path networking APIs, especially + HW-offload related. + + To compile this driver as a module, choose M here: the module + will be called netdevsim. + endif # NETDEVICES diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 766f62d02a0b..04c3b747812c 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -78,3 +78,4 @@ obj-$(CONFIG_FUJITSU_ES) += fjes/ thunderbolt-net-y += thunderbolt.o obj-$(CONFIG_THUNDERBOLT_NET) += thunderbolt-net.o +obj-$(CONFIG_NETDEVSIM) += netdevsim/ diff --git a/drivers/net/can/c_can/c_can_pci.c b/drivers/net/can/c_can/c_can_pci.c index d065c0e2d18e..406b4847e5dc 100644 --- a/drivers/net/can/c_can/c_can_pci.c +++ b/drivers/net/can/c_can/c_can_pci.c @@ -251,14 +251,14 @@ static void c_can_pci_remove(struct pci_dev *pdev) pci_disable_device(pdev); } -static struct c_can_pci_data c_can_sta2x11= { +static const struct c_can_pci_data c_can_sta2x11= { .type = BOSCH_C_CAN, .reg_align = C_CAN_REG_ALIGN_32, .freq = 52000000, /* 52 Mhz */ .bar = 0, }; -static struct c_can_pci_data c_can_pch = { +static const struct c_can_pci_data c_can_pch = { .type = BOSCH_C_CAN, .reg_align = C_CAN_REG_32, .freq = 50000000, /* 50 MHz */ diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c index 0626dcfd1f3d..3cd371c94e83 100644 --- a/drivers/net/can/flexcan.c +++ b/drivers/net/can/flexcan.c @@ -190,6 +190,7 @@ * MX53 FlexCAN2 03.00.00.00 yes no no no no * MX6s FlexCAN3 10.00.12.00 yes yes no no yes * VF610 FlexCAN3 ? no yes no yes yes? + * LS1021A FlexCAN2 03.00.04.00 no yes no no yes * * Some SOCs do not have the RX_WARN & TX_WARN interrupt line connected. */ @@ -279,6 +280,10 @@ struct flexcan_priv { struct clk *clk_per; const struct flexcan_devtype_data *devtype_data; struct regulator *reg_xceiver; + + /* Read and Write APIs */ + u32 (*read)(void __iomem *addr); + void (*write)(u32 val, void __iomem *addr); }; static const struct flexcan_devtype_data fsl_p1010_devtype_data = { @@ -301,6 +306,12 @@ static const struct flexcan_devtype_data fsl_vf610_devtype_data = { FLEXCAN_QUIRK_BROKEN_PERR_STATE, }; +static const struct flexcan_devtype_data fsl_ls1021a_r2_devtype_data = { + .quirks = FLEXCAN_QUIRK_DISABLE_RXFG | FLEXCAN_QUIRK_ENABLE_EACEN_RRS | + FLEXCAN_QUIRK_DISABLE_MECR | FLEXCAN_QUIRK_BROKEN_PERR_STATE | + FLEXCAN_QUIRK_USE_OFF_TIMESTAMP, +}; + static const struct can_bittiming_const flexcan_bittiming_const = { .name = DRV_NAME, .tseg1_min = 4, @@ -313,39 +324,45 @@ static const struct can_bittiming_const flexcan_bittiming_const = { .brp_inc = 1, }; -/* Abstract off the read/write for arm versus ppc. This - * assumes that PPC uses big-endian registers and everything - * else uses little-endian registers, independent of CPU - * endianness. +/* FlexCAN module is essentially modelled as a little-endian IP in most + * SoCs, i.e the registers as well as the message buffer areas are + * implemented in a little-endian fashion. + * + * However there are some SoCs (e.g. LS1021A) which implement the FlexCAN + * module in a big-endian fashion (i.e the registers as well as the + * message buffer areas are implemented in a big-endian way). + * + * In addition, the FlexCAN module can be found on SoCs having ARM or + * PPC cores. So, we need to abstract off the register read/write + * functions, ensuring that these cater to all the combinations of module + * endianness and underlying CPU endianness. */ -#if defined(CONFIG_PPC) -static inline u32 flexcan_read(void __iomem *addr) +static inline u32 flexcan_read_be(void __iomem *addr) { - return in_be32(addr); + return ioread32be(addr); } -static inline void flexcan_write(u32 val, void __iomem *addr) +static inline void flexcan_write_be(u32 val, void __iomem *addr) { - out_be32(addr, val); + iowrite32be(val, addr); } -#else -static inline u32 flexcan_read(void __iomem *addr) + +static inline u32 flexcan_read_le(void __iomem *addr) { - return readl(addr); + return ioread32(addr); } -static inline void flexcan_write(u32 val, void __iomem *addr) +static inline void flexcan_write_le(u32 val, void __iomem *addr) { - writel(val, addr); + iowrite32(val, addr); } -#endif static inline void flexcan_error_irq_enable(const struct flexcan_priv *priv) { struct flexcan_regs __iomem *regs = priv->regs; u32 reg_ctrl = (priv->reg_ctrl_default | FLEXCAN_CTRL_ERR_MSK); - flexcan_write(reg_ctrl, ®s->ctrl); + priv->write(reg_ctrl, ®s->ctrl); } static inline void flexcan_error_irq_disable(const struct flexcan_priv *priv) @@ -353,7 +370,7 @@ static inline void flexcan_error_irq_disable(const struct flexcan_priv *priv) struct flexcan_regs __iomem *regs = priv->regs; u32 reg_ctrl = (priv->reg_ctrl_default & ~FLEXCAN_CTRL_ERR_MSK); - flexcan_write(reg_ctrl, ®s->ctrl); + priv->write(reg_ctrl, ®s->ctrl); } static inline int flexcan_transceiver_enable(const struct flexcan_priv *priv) @@ -378,14 +395,14 @@ static int flexcan_chip_enable(struct flexcan_priv *priv) unsigned int timeout = FLEXCAN_TIMEOUT_US / 10; u32 reg; - reg = flexcan_read(®s->mcr); + reg = priv->read(®s->mcr); reg &= ~FLEXCAN_MCR_MDIS; - flexcan_write(reg, ®s->mcr); + priv->write(reg, ®s->mcr); - while (timeout-- && (flexcan_read(®s->mcr) & FLEXCAN_MCR_LPM_ACK)) + while (timeout-- && (priv->read(®s->mcr) & FLEXCAN_MCR_LPM_ACK)) udelay(10); - if (flexcan_read(®s->mcr) & FLEXCAN_MCR_LPM_ACK) + if (priv->read(®s->mcr) & FLEXCAN_MCR_LPM_ACK) return -ETIMEDOUT; return 0; @@ -397,14 +414,14 @@ static int flexcan_chip_disable(struct flexcan_priv *priv) unsigned int timeout = FLEXCAN_TIMEOUT_US / 10; u32 reg; - reg = flexcan_read(®s->mcr); + reg = priv->read(®s->mcr); reg |= FLEXCAN_MCR_MDIS; - flexcan_write(reg, ®s->mcr); + priv->write(reg, ®s->mcr); - while (timeout-- && !(flexcan_read(®s->mcr) & FLEXCAN_MCR_LPM_ACK)) + while (timeout-- && !(priv->read(®s->mcr) & FLEXCAN_MCR_LPM_ACK)) udelay(10); - if (!(flexcan_read(®s->mcr) & FLEXCAN_MCR_LPM_ACK)) + if (!(priv->read(®s->mcr) & FLEXCAN_MCR_LPM_ACK)) return -ETIMEDOUT; return 0; @@ -416,14 +433,14 @@ static int flexcan_chip_freeze(struct flexcan_priv *priv) unsigned int timeout = 1000 * 1000 * 10 / priv->can.bittiming.bitrate; u32 reg; - reg = flexcan_read(®s->mcr); + reg = priv->read(®s->mcr); reg |= FLEXCAN_MCR_HALT; - flexcan_write(reg, ®s->mcr); + priv->write(reg, ®s->mcr); - while (timeout-- && !(flexcan_read(®s->mcr) & FLEXCAN_MCR_FRZ_ACK)) + while (timeout-- && !(priv->read(®s->mcr) & FLEXCAN_MCR_FRZ_ACK)) udelay(100); - if (!(flexcan_read(®s->mcr) & FLEXCAN_MCR_FRZ_ACK)) + if (!(priv->read(®s->mcr) & FLEXCAN_MCR_FRZ_ACK)) return -ETIMEDOUT; return 0; @@ -435,14 +452,14 @@ static int flexcan_chip_unfreeze(struct flexcan_priv *priv) unsigned int timeout = FLEXCAN_TIMEOUT_US / 10; u32 reg; - reg = flexcan_read(®s->mcr); + reg = priv->read(®s->mcr); reg &= ~FLEXCAN_MCR_HALT; - flexcan_write(reg, ®s->mcr); + priv->write(reg, ®s->mcr); - while (timeout-- && (flexcan_read(®s->mcr) & FLEXCAN_MCR_FRZ_ACK)) + while (timeout-- && (priv->read(®s->mcr) & FLEXCAN_MCR_FRZ_ACK)) udelay(10); - if (flexcan_read(®s->mcr) & FLEXCAN_MCR_FRZ_ACK) + if (priv->read(®s->mcr) & FLEXCAN_MCR_FRZ_ACK) return -ETIMEDOUT; return 0; @@ -453,11 +470,11 @@ static int flexcan_chip_softreset(struct flexcan_priv *priv) struct flexcan_regs __iomem *regs = priv->regs; unsigned int timeout = FLEXCAN_TIMEOUT_US / 10; - flexcan_write(FLEXCAN_MCR_SOFTRST, ®s->mcr); - while (timeout-- && (flexcan_read(®s->mcr) & FLEXCAN_MCR_SOFTRST)) + priv->write(FLEXCAN_MCR_SOFTRST, ®s->mcr); + while (timeout-- && (priv->read(®s->mcr) & FLEXCAN_MCR_SOFTRST)) udelay(10); - if (flexcan_read(®s->mcr) & FLEXCAN_MCR_SOFTRST) + if (priv->read(®s->mcr) & FLEXCAN_MCR_SOFTRST) return -ETIMEDOUT; return 0; @@ -468,7 +485,7 @@ static int __flexcan_get_berr_counter(const struct net_device *dev, { const struct flexcan_priv *priv = netdev_priv(dev); struct flexcan_regs __iomem *regs = priv->regs; - u32 reg = flexcan_read(®s->ecr); + u32 reg = priv->read(®s->ecr); bec->txerr = (reg >> 0) & 0xff; bec->rxerr = (reg >> 8) & 0xff; @@ -524,24 +541,24 @@ static int flexcan_start_xmit(struct sk_buff *skb, struct net_device *dev) if (cf->can_dlc > 0) { data = be32_to_cpup((__be32 *)&cf->data[0]); - flexcan_write(data, &priv->tx_mb->data[0]); + priv->write(data, &priv->tx_mb->data[0]); } if (cf->can_dlc > 3) { data = be32_to_cpup((__be32 *)&cf->data[4]); - flexcan_write(data, &priv->tx_mb->data[1]); + priv->write(data, &priv->tx_mb->data[1]); } can_put_echo_skb(skb, dev, 0); - flexcan_write(can_id, &priv->tx_mb->can_id); - flexcan_write(ctrl, &priv->tx_mb->can_ctrl); + priv->write(can_id, &priv->tx_mb->can_id); + priv->write(ctrl, &priv->tx_mb->can_ctrl); /* Errata ERR005829 step8: * Write twice INACTIVE(0x8) code to first MB. */ - flexcan_write(FLEXCAN_MB_CODE_TX_INACTIVE, + priv->write(FLEXCAN_MB_CODE_TX_INACTIVE, &priv->tx_mb_reserved->can_ctrl); - flexcan_write(FLEXCAN_MB_CODE_TX_INACTIVE, + priv->write(FLEXCAN_MB_CODE_TX_INACTIVE, &priv->tx_mb_reserved->can_ctrl); return NETDEV_TX_OK; @@ -660,7 +677,7 @@ static unsigned int flexcan_mailbox_read(struct can_rx_offload *offload, u32 code; do { - reg_ctrl = flexcan_read(&mb->can_ctrl); + reg_ctrl = priv->read(&mb->can_ctrl); } while (reg_ctrl & FLEXCAN_MB_CODE_RX_BUSY_BIT); /* is this MB empty? */ @@ -675,17 +692,17 @@ static unsigned int flexcan_mailbox_read(struct can_rx_offload *offload, offload->dev->stats.rx_errors++; } } else { - reg_iflag1 = flexcan_read(®s->iflag1); + reg_iflag1 = priv->read(®s->iflag1); if (!(reg_iflag1 & FLEXCAN_IFLAG_RX_FIFO_AVAILABLE)) return 0; - reg_ctrl = flexcan_read(&mb->can_ctrl); + reg_ctrl = priv->read(&mb->can_ctrl); } /* increase timstamp to full 32 bit */ *timestamp = reg_ctrl << 16; - reg_id = flexcan_read(&mb->can_id); + reg_id = priv->read(&mb->can_id); if (reg_ctrl & FLEXCAN_MB_CNT_IDE) cf->can_id = ((reg_id >> 0) & CAN_EFF_MASK) | CAN_EFF_FLAG; else @@ -695,19 +712,19 @@ static unsigned int flexcan_mailbox_read(struct can_rx_offload *offload, cf->can_id |= CAN_RTR_FLAG; cf->can_dlc = get_can_dlc((reg_ctrl >> 16) & 0xf); - *(__be32 *)(cf->data + 0) = cpu_to_be32(flexcan_read(&mb->data[0])); - *(__be32 *)(cf->data + 4) = cpu_to_be32(flexcan_read(&mb->data[1])); + *(__be32 *)(cf->data + 0) = cpu_to_be32(priv->read(&mb->data[0])); + *(__be32 *)(cf->data + 4) = cpu_to_be32(priv->read(&mb->data[1])); /* mark as read */ if (priv->devtype_data->quirks & FLEXCAN_QUIRK_USE_OFF_TIMESTAMP) { /* Clear IRQ */ if (n < 32) - flexcan_write(BIT(n), ®s->iflag1); + priv->write(BIT(n), ®s->iflag1); else - flexcan_write(BIT(n - 32), ®s->iflag2); + priv->write(BIT(n - 32), ®s->iflag2); } else { - flexcan_write(FLEXCAN_IFLAG_RX_FIFO_AVAILABLE, ®s->iflag1); - flexcan_read(®s->timer); + priv->write(FLEXCAN_IFLAG_RX_FIFO_AVAILABLE, ®s->iflag1); + priv->read(®s->timer); } return 1; @@ -719,8 +736,8 @@ static inline u64 flexcan_read_reg_iflag_rx(struct flexcan_priv *priv) struct flexcan_regs __iomem *regs = priv->regs; u32 iflag1, iflag2; - iflag2 = flexcan_read(®s->iflag2) & priv->reg_imask2_default; - iflag1 = flexcan_read(®s->iflag1) & priv->reg_imask1_default & + iflag2 = priv->read(®s->iflag2) & priv->reg_imask2_default; + iflag1 = priv->read(®s->iflag1) & priv->reg_imask1_default & ~FLEXCAN_IFLAG_MB(priv->tx_mb_idx); return (u64)iflag2 << 32 | iflag1; @@ -736,7 +753,7 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id) u32 reg_iflag1, reg_esr; enum can_state last_state = priv->can.state; - reg_iflag1 = flexcan_read(®s->iflag1); + reg_iflag1 = priv->read(®s->iflag1); /* reception interrupt */ if (priv->devtype_data->quirks & FLEXCAN_QUIRK_USE_OFF_TIMESTAMP) { @@ -759,7 +776,8 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id) /* FIFO overflow interrupt */ if (reg_iflag1 & FLEXCAN_IFLAG_RX_FIFO_OVERFLOW) { handled = IRQ_HANDLED; - flexcan_write(FLEXCAN_IFLAG_RX_FIFO_OVERFLOW, ®s->iflag1); + priv->write(FLEXCAN_IFLAG_RX_FIFO_OVERFLOW, + ®s->iflag1); dev->stats.rx_over_errors++; dev->stats.rx_errors++; } @@ -773,18 +791,18 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id) can_led_event(dev, CAN_LED_EVENT_TX); /* after sending a RTR frame MB is in RX mode */ - flexcan_write(FLEXCAN_MB_CODE_TX_INACTIVE, - &priv->tx_mb->can_ctrl); - flexcan_write(FLEXCAN_IFLAG_MB(priv->tx_mb_idx), ®s->iflag1); + priv->write(FLEXCAN_MB_CODE_TX_INACTIVE, + &priv->tx_mb->can_ctrl); + priv->write(FLEXCAN_IFLAG_MB(priv->tx_mb_idx), ®s->iflag1); netif_wake_queue(dev); } - reg_esr = flexcan_read(®s->esr); + reg_esr = priv->read(®s->esr); /* ACK all bus error and state change IRQ sources */ if (reg_esr & FLEXCAN_ESR_ALL_INT) { handled = IRQ_HANDLED; - flexcan_write(reg_esr & FLEXCAN_ESR_ALL_INT, ®s->esr); + priv->write(reg_esr & FLEXCAN_ESR_ALL_INT, ®s->esr); } /* state change interrupt or broken error state quirk fix is enabled */ @@ -846,7 +864,7 @@ static void flexcan_set_bittiming(struct net_device *dev) struct flexcan_regs __iomem *regs = priv->regs; u32 reg; - reg = flexcan_read(®s->ctrl); + reg = priv->read(®s->ctrl); reg &= ~(FLEXCAN_CTRL_PRESDIV(0xff) | FLEXCAN_CTRL_RJW(0x3) | FLEXCAN_CTRL_PSEG1(0x7) | @@ -870,11 +888,11 @@ static void flexcan_set_bittiming(struct net_device *dev) reg |= FLEXCAN_CTRL_SMP; netdev_dbg(dev, "writing ctrl=0x%08x\n", reg); - flexcan_write(reg, ®s->ctrl); + priv->write(reg, ®s->ctrl); /* print chip status */ netdev_dbg(dev, "%s: mcr=0x%08x ctrl=0x%08x\n", __func__, - flexcan_read(®s->mcr), flexcan_read(®s->ctrl)); + priv->read(®s->mcr), priv->read(®s->ctrl)); } /* flexcan_chip_start @@ -913,7 +931,7 @@ static int flexcan_chip_start(struct net_device *dev) * choose format C * set max mailbox number */ - reg_mcr = flexcan_read(®s->mcr); + reg_mcr = priv->read(®s->mcr); reg_mcr &= ~FLEXCAN_MCR_MAXMB(0xff); reg_mcr |= FLEXCAN_MCR_FRZ | FLEXCAN_MCR_HALT | FLEXCAN_MCR_SUPV | FLEXCAN_MCR_WRN_EN | FLEXCAN_MCR_SRX_DIS | FLEXCAN_MCR_IRMQ | @@ -927,7 +945,7 @@ static int flexcan_chip_start(struct net_device *dev) FLEXCAN_MCR_MAXMB(priv->tx_mb_idx); } netdev_dbg(dev, "%s: writing mcr=0x%08x", __func__, reg_mcr); - flexcan_write(reg_mcr, ®s->mcr); + priv->write(reg_mcr, ®s->mcr); /* CTRL * @@ -940,7 +958,7 @@ static int flexcan_chip_start(struct net_device *dev) * enable bus off interrupt * (== FLEXCAN_CTRL_ERR_STATE) */ - reg_ctrl = flexcan_read(®s->ctrl); + reg_ctrl = priv->read(®s->ctrl); reg_ctrl &= ~FLEXCAN_CTRL_TSYN; reg_ctrl |= FLEXCAN_CTRL_BOFF_REC | FLEXCAN_CTRL_LBUF | FLEXCAN_CTRL_ERR_STATE; @@ -960,45 +978,45 @@ static int flexcan_chip_start(struct net_device *dev) /* leave interrupts disabled for now */ reg_ctrl &= ~FLEXCAN_CTRL_ERR_ALL; netdev_dbg(dev, "%s: writing ctrl=0x%08x", __func__, reg_ctrl); - flexcan_write(reg_ctrl, ®s->ctrl); + priv->write(reg_ctrl, ®s->ctrl); if ((priv->devtype_data->quirks & FLEXCAN_QUIRK_ENABLE_EACEN_RRS)) { - reg_ctrl2 = flexcan_read(®s->ctrl2); + reg_ctrl2 = priv->read(®s->ctrl2); reg_ctrl2 |= FLEXCAN_CTRL2_EACEN | FLEXCAN_CTRL2_RRS; - flexcan_write(reg_ctrl2, ®s->ctrl2); + priv->write(reg_ctrl2, ®s->ctrl2); } /* clear and invalidate all mailboxes first */ for (i = priv->tx_mb_idx; i < ARRAY_SIZE(regs->mb); i++) { - flexcan_write(FLEXCAN_MB_CODE_RX_INACTIVE, - ®s->mb[i].can_ctrl); + priv->write(FLEXCAN_MB_CODE_RX_INACTIVE, + ®s->mb[i].can_ctrl); } if (priv->devtype_data->quirks & FLEXCAN_QUIRK_USE_OFF_TIMESTAMP) { for (i = priv->offload.mb_first; i <= priv->offload.mb_last; i++) - flexcan_write(FLEXCAN_MB_CODE_RX_EMPTY, - ®s->mb[i].can_ctrl); + priv->write(FLEXCAN_MB_CODE_RX_EMPTY, + ®s->mb[i].can_ctrl); } /* Errata ERR005829: mark first TX mailbox as INACTIVE */ - flexcan_write(FLEXCAN_MB_CODE_TX_INACTIVE, - &priv->tx_mb_reserved->can_ctrl); + priv->write(FLEXCAN_MB_CODE_TX_INACTIVE, + &priv->tx_mb_reserved->can_ctrl); /* mark TX mailbox as INACTIVE */ - flexcan_write(FLEXCAN_MB_CODE_TX_INACTIVE, - &priv->tx_mb->can_ctrl); + priv->write(FLEXCAN_MB_CODE_TX_INACTIVE, + &priv->tx_mb->can_ctrl); /* acceptance mask/acceptance code (accept everything) */ - flexcan_write(0x0, ®s->rxgmask); - flexcan_write(0x0, ®s->rx14mask); - flexcan_write(0x0, ®s->rx15mask); + priv->write(0x0, ®s->rxgmask); + priv->write(0x0, ®s->rx14mask); + priv->write(0x0, ®s->rx15mask); if (priv->devtype_data->quirks & FLEXCAN_QUIRK_DISABLE_RXFG) - flexcan_write(0x0, ®s->rxfgmask); + priv->write(0x0, ®s->rxfgmask); /* clear acceptance filters */ for (i = 0; i < ARRAY_SIZE(regs->mb); i++) - flexcan_write(0, ®s->rximr[i]); + priv->write(0, ®s->rximr[i]); /* On Vybrid, disable memory error detection interrupts * and freeze mode. @@ -1011,16 +1029,16 @@ static int flexcan_chip_start(struct net_device *dev) * and Correction of Memory Errors" to write to * MECR register */ - reg_ctrl2 = flexcan_read(®s->ctrl2); + reg_ctrl2 = priv->read(®s->ctrl2); reg_ctrl2 |= FLEXCAN_CTRL2_ECRWRE; - flexcan_write(reg_ctrl2, ®s->ctrl2); + priv->write(reg_ctrl2, ®s->ctrl2); - reg_mecr = flexcan_read(®s->mecr); + reg_mecr = priv->read(®s->mecr); reg_mecr &= ~FLEXCAN_MECR_ECRWRDIS; - flexcan_write(reg_mecr, ®s->mecr); + priv->write(reg_mecr, ®s->mecr); reg_mecr &= ~(FLEXCAN_MECR_NCEFAFRZ | FLEXCAN_MECR_HANCEI_MSK | FLEXCAN_MECR_FANCEI_MSK); - flexcan_write(reg_mecr, ®s->mecr); + priv->write(reg_mecr, ®s->mecr); } err = flexcan_transceiver_enable(priv); @@ -1036,14 +1054,14 @@ static int flexcan_chip_start(struct net_device *dev) /* enable interrupts atomically */ disable_irq(dev->irq); - flexcan_write(priv->reg_ctrl_default, ®s->ctrl); - flexcan_write(priv->reg_imask1_default, ®s->imask1); - flexcan_write(priv->reg_imask2_default, ®s->imask2); + priv->write(priv->reg_ctrl_default, ®s->ctrl); + priv->write(priv->reg_imask1_default, ®s->imask1); + priv->write(priv->reg_imask2_default, ®s->imask2); enable_irq(dev->irq); /* print chip status */ netdev_dbg(dev, "%s: reading mcr=0x%08x ctrl=0x%08x\n", __func__, - flexcan_read(®s->mcr), flexcan_read(®s->ctrl)); + priv->read(®s->mcr), priv->read(®s->ctrl)); return 0; @@ -1068,10 +1086,10 @@ static void flexcan_chip_stop(struct net_device *dev) flexcan_chip_disable(priv); /* Disable all interrupts */ - flexcan_write(0, ®s->imask2); - flexcan_write(0, ®s->imask1); - flexcan_write(priv->reg_ctrl_default & ~FLEXCAN_CTRL_ERR_ALL, - ®s->ctrl); + priv->write(0, ®s->imask2); + priv->write(0, ®s->imask1); + priv->write(priv->reg_ctrl_default & ~FLEXCAN_CTRL_ERR_ALL, + ®s->ctrl); flexcan_transceiver_disable(priv); priv->can.state = CAN_STATE_STOPPED; @@ -1186,26 +1204,26 @@ static int register_flexcandev(struct net_device *dev) err = flexcan_chip_disable(priv); if (err) goto out_disable_per; - reg = flexcan_read(®s->ctrl); + reg = priv->read(®s->ctrl); reg |= FLEXCAN_CTRL_CLK_SRC; - flexcan_write(reg, ®s->ctrl); + priv->write(reg, ®s->ctrl); err = flexcan_chip_enable(priv); if (err) goto out_chip_disable; /* set freeze, halt and activate FIFO, restrict register access */ - reg = flexcan_read(®s->mcr); + reg = priv->read(®s->mcr); reg |= FLEXCAN_MCR_FRZ | FLEXCAN_MCR_HALT | FLEXCAN_MCR_FEN | FLEXCAN_MCR_SUPV; - flexcan_write(reg, ®s->mcr); + priv->write(reg, ®s->mcr); /* Currently we only support newer versions of this core * featuring a RX hardware FIFO (although this driver doesn't * make use of it on some cores). Older cores, found on some * Coldfire derivates are not tested. */ - reg = flexcan_read(®s->mcr); + reg = priv->read(®s->mcr); if (!(reg & FLEXCAN_MCR_FEN)) { netdev_err(dev, "Could not enable RX FIFO, unsupported core\n"); err = -ENODEV; @@ -1233,8 +1251,12 @@ static void unregister_flexcandev(struct net_device *dev) static const struct of_device_id flexcan_of_match[] = { { .compatible = "fsl,imx6q-flexcan", .data = &fsl_imx6q_devtype_data, }, { .compatible = "fsl,imx28-flexcan", .data = &fsl_imx28_devtype_data, }, + { .compatible = "fsl,imx53-flexcan", .data = &fsl_p1010_devtype_data, }, + { .compatible = "fsl,imx35-flexcan", .data = &fsl_p1010_devtype_data, }, + { .compatible = "fsl,imx25-flexcan", .data = &fsl_p1010_devtype_data, }, { .compatible = "fsl,p1010-flexcan", .data = &fsl_p1010_devtype_data, }, { .compatible = "fsl,vf610-flexcan", .data = &fsl_vf610_devtype_data, }, + { .compatible = "fsl,ls1021ar2-flexcan", .data = &fsl_ls1021a_r2_devtype_data, }, { /* sentinel */ }, }; MODULE_DEVICE_TABLE(of, flexcan_of_match); @@ -1314,6 +1336,21 @@ static int flexcan_probe(struct platform_device *pdev) dev->flags |= IFF_ECHO; priv = netdev_priv(dev); + + if (of_property_read_bool(pdev->dev.of_node, "big-endian")) { + priv->read = flexcan_read_be; + priv->write = flexcan_write_be; + } else { + if (of_device_is_compatible(pdev->dev.of_node, + "fsl,p1010-flexcan")) { + priv->read = flexcan_read_be; + priv->write = flexcan_write_be; + } else { + priv->read = flexcan_read_le; + priv->write = flexcan_write_le; + } + } + priv->can.clock.freq = clock_freq; priv->can.bittiming_const = &flexcan_bittiming_const; priv->can.do_set_mode = flexcan_set_mode; diff --git a/drivers/net/can/usb/peak_usb/pcan_usb.c b/drivers/net/can/usb/peak_usb/pcan_usb.c index 25a9b79cc42d..f530a80f5051 100644 --- a/drivers/net/can/usb/peak_usb/pcan_usb.c +++ b/drivers/net/can/usb/peak_usb/pcan_usb.c @@ -408,7 +408,6 @@ static int pcan_usb_decode_error(struct pcan_usb_msg_context *mc, u8 n, { struct sk_buff *skb; struct can_frame *cf; - struct timeval tv; enum can_state new_state; /* ignore this error until 1st ts received */ @@ -525,8 +524,8 @@ static int pcan_usb_decode_error(struct pcan_usb_msg_context *mc, u8 n, if (status_len & PCAN_USB_STATUSLEN_TIMESTAMP) { struct skb_shared_hwtstamps *hwts = skb_hwtstamps(skb); - peak_usb_get_ts_tv(&mc->pdev->time_ref, mc->ts16, &tv); - hwts->hwtstamp = timeval_to_ktime(tv); + peak_usb_get_ts_time(&mc->pdev->time_ref, mc->ts16, + &hwts->hwtstamp); } mc->netdev->stats.rx_packets++; @@ -610,7 +609,6 @@ static int pcan_usb_decode_data(struct pcan_usb_msg_context *mc, u8 status_len) u8 rec_len = status_len & PCAN_USB_STATUSLEN_DLC; struct sk_buff *skb; struct can_frame *cf; - struct timeval tv; struct skb_shared_hwtstamps *hwts; skb = alloc_can_skb(mc->netdev, &cf); @@ -658,9 +656,8 @@ static int pcan_usb_decode_data(struct pcan_usb_msg_context *mc, u8 status_len) } /* convert timestamp into kernel time */ - peak_usb_get_ts_tv(&mc->pdev->time_ref, mc->ts16, &tv); hwts = skb_hwtstamps(skb); - hwts->hwtstamp = timeval_to_ktime(tv); + peak_usb_get_ts_time(&mc->pdev->time_ref, mc->ts16, &hwts->hwtstamp); /* update statistics */ mc->netdev->stats.rx_packets++; diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.c b/drivers/net/can/usb/peak_usb/pcan_usb_core.c index 1ca76e03e965..8f699ee6a528 100644 --- a/drivers/net/can/usb/peak_usb/pcan_usb_core.c +++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.c @@ -80,21 +80,6 @@ void peak_usb_init_time_ref(struct peak_time_ref *time_ref, } } -static void peak_usb_add_us(struct timeval *tv, u32 delta_us) -{ - /* number of s. to add to final time */ - u32 delta_s = delta_us / 1000000; - - delta_us -= delta_s * 1000000; - - tv->tv_usec += delta_us; - if (tv->tv_usec >= 1000000) { - tv->tv_usec -= 1000000; - delta_s++; - } - tv->tv_sec += delta_s; -} - /* * sometimes, another now may be more recent than current one... */ @@ -103,7 +88,7 @@ void peak_usb_update_ts_now(struct peak_time_ref *time_ref, u32 ts_now) time_ref->ts_dev_2 = ts_now; /* should wait at least two passes before computing */ - if (time_ref->tv_host.tv_sec > 0) { + if (ktime_to_ns(time_ref->tv_host) > 0) { u32 delta_ts = time_ref->ts_dev_2 - time_ref->ts_dev_1; if (time_ref->ts_dev_2 < time_ref->ts_dev_1) @@ -118,26 +103,26 @@ void peak_usb_update_ts_now(struct peak_time_ref *time_ref, u32 ts_now) */ void peak_usb_set_ts_now(struct peak_time_ref *time_ref, u32 ts_now) { - if (time_ref->tv_host_0.tv_sec == 0) { + if (ktime_to_ns(time_ref->tv_host_0) == 0) { /* use monotonic clock to correctly compute further deltas */ - time_ref->tv_host_0 = ktime_to_timeval(ktime_get()); - time_ref->tv_host.tv_sec = 0; + time_ref->tv_host_0 = ktime_get(); + time_ref->tv_host = ktime_set(0, 0); } else { /* - * delta_us should not be >= 2^32 => delta_s should be < 4294 + * delta_us should not be >= 2^32 => delta should be < 4294s * handle 32-bits wrapping here: if count of s. reaches 4200, * reset counters and change time base */ - if (time_ref->tv_host.tv_sec != 0) { - u32 delta_s = time_ref->tv_host.tv_sec - - time_ref->tv_host_0.tv_sec; - if (delta_s > 4200) { + if (ktime_to_ns(time_ref->tv_host)) { + ktime_t delta = ktime_sub(time_ref->tv_host, + time_ref->tv_host_0); + if (ktime_to_ns(delta) > (4200ull * NSEC_PER_SEC)) { time_ref->tv_host_0 = time_ref->tv_host; time_ref->ts_total = 0; } } - time_ref->tv_host = ktime_to_timeval(ktime_get()); + time_ref->tv_host = ktime_get(); time_ref->tick_count++; } @@ -146,13 +131,12 @@ void peak_usb_set_ts_now(struct peak_time_ref *time_ref, u32 ts_now) } /* - * compute timeval according to current ts and time_ref data + * compute time according to current ts and time_ref data */ -void peak_usb_get_ts_tv(struct peak_time_ref *time_ref, u32 ts, - struct timeval *tv) +void peak_usb_get_ts_time(struct peak_time_ref *time_ref, u32 ts, ktime_t *time) { - /* protect from getting timeval before setting now */ - if (time_ref->tv_host.tv_sec > 0) { + /* protect from getting time before setting now */ + if (ktime_to_ns(time_ref->tv_host)) { u64 delta_us; delta_us = ts - time_ref->ts_dev_2; @@ -164,10 +148,9 @@ void peak_usb_get_ts_tv(struct peak_time_ref *time_ref, u32 ts, delta_us *= time_ref->adapter->us_per_ts_scale; delta_us >>= time_ref->adapter->us_per_ts_shift; - *tv = time_ref->tv_host_0; - peak_usb_add_us(tv, (u32)delta_us); + *time = ktime_add_us(time_ref->tv_host_0, delta_us); } else { - *tv = ktime_to_timeval(ktime_get()); + *time = ktime_get(); } } @@ -178,10 +161,8 @@ int peak_usb_netif_rx(struct sk_buff *skb, struct peak_time_ref *time_ref, u32 ts_low, u32 ts_high) { struct skb_shared_hwtstamps *hwts = skb_hwtstamps(skb); - struct timeval tv; - peak_usb_get_ts_tv(time_ref, ts_low, &tv); - hwts->hwtstamp = timeval_to_ktime(tv); + peak_usb_get_ts_time(time_ref, ts_low, &hwts->hwtstamp); return netif_rx(skb); } diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.h b/drivers/net/can/usb/peak_usb/pcan_usb_core.h index c01316cac354..29f03dccca10 100644 --- a/drivers/net/can/usb/peak_usb/pcan_usb_core.h +++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.h @@ -96,7 +96,7 @@ extern const struct peak_usb_adapter pcan_usb_pro_fd; extern const struct peak_usb_adapter pcan_usb_x6; struct peak_time_ref { - struct timeval tv_host_0, tv_host; + ktime_t tv_host_0, tv_host; u32 ts_dev_1, ts_dev_2; u64 ts_total; u32 tick_count; @@ -151,8 +151,7 @@ void peak_usb_init_time_ref(struct peak_time_ref *time_ref, const struct peak_usb_adapter *adapter); void peak_usb_update_ts_now(struct peak_time_ref *time_ref, u32 ts_now); void peak_usb_set_ts_now(struct peak_time_ref *time_ref, u32 ts_now); -void peak_usb_get_ts_tv(struct peak_time_ref *time_ref, u32 ts, - struct timeval *tv); +void peak_usb_get_ts_time(struct peak_time_ref *time_ref, u32 ts, ktime_t *tv); int peak_usb_netif_rx(struct sk_buff *skb, struct peak_time_ref *time_ref, u32 ts_low, u32 ts_high); void peak_usb_async_complete(struct urb *urb); diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_pro.c b/drivers/net/can/usb/peak_usb/pcan_usb_pro.c index bbdd6058cd2f..0105fbfea273 100644 --- a/drivers/net/can/usb/peak_usb/pcan_usb_pro.c +++ b/drivers/net/can/usb/peak_usb/pcan_usb_pro.c @@ -531,7 +531,6 @@ static int pcan_usb_pro_handle_canmsg(struct pcan_usb_pro_interface *usb_if, struct net_device *netdev = dev->netdev; struct can_frame *can_frame; struct sk_buff *skb; - struct timeval tv; struct skb_shared_hwtstamps *hwts; skb = alloc_can_skb(netdev, &can_frame); @@ -549,9 +548,9 @@ static int pcan_usb_pro_handle_canmsg(struct pcan_usb_pro_interface *usb_if, else memcpy(can_frame->data, rx->data, can_frame->can_dlc); - peak_usb_get_ts_tv(&usb_if->time_ref, le32_to_cpu(rx->ts32), &tv); hwts = skb_hwtstamps(skb); - hwts->hwtstamp = timeval_to_ktime(tv); + peak_usb_get_ts_time(&usb_if->time_ref, le32_to_cpu(rx->ts32), + &hwts->hwtstamp); netdev->stats.rx_packets++; netdev->stats.rx_bytes += can_frame->can_dlc; @@ -571,7 +570,6 @@ static int pcan_usb_pro_handle_error(struct pcan_usb_pro_interface *usb_if, enum can_state new_state = CAN_STATE_ERROR_ACTIVE; u8 err_mask = 0; struct sk_buff *skb; - struct timeval tv; struct skb_shared_hwtstamps *hwts; /* nothing should be sent while in BUS_OFF state */ @@ -667,9 +665,8 @@ static int pcan_usb_pro_handle_error(struct pcan_usb_pro_interface *usb_if, dev->can.state = new_state; - peak_usb_get_ts_tv(&usb_if->time_ref, le32_to_cpu(er->ts32), &tv); hwts = skb_hwtstamps(skb); - hwts->hwtstamp = timeval_to_ktime(tv); + peak_usb_get_ts_time(&usb_if->time_ref, le32_to_cpu(er->ts32), &hwts->hwtstamp); netdev->stats.rx_packets++; netdev->stats.rx_bytes += can_frame->can_dlc; netif_rx(skb); diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c index 8404e8852a0f..5d1753cfacea 100644 --- a/drivers/net/can/vxcan.c +++ b/drivers/net/can/vxcan.c @@ -227,10 +227,8 @@ static int vxcan_newlink(struct net *net, struct net_device *dev, netif_carrier_off(peer); err = rtnl_configure_link(peer, ifmp); - if (err < 0) { - unregister_netdevice(peer); - return err; - } + if (err < 0) + goto unregister_network_device; /* register first device */ if (tb[IFLA_IFNAME]) @@ -239,10 +237,8 @@ static int vxcan_newlink(struct net *net, struct net_device *dev, snprintf(dev->name, IFNAMSIZ, DRV_NAME "%%d"); err = register_netdevice(dev); - if (err < 0) { - unregister_netdevice(peer); - return err; - } + if (err < 0) + goto unregister_network_device; netif_carrier_off(dev); @@ -254,6 +250,10 @@ static int vxcan_newlink(struct net *net, struct net_device *dev, rcu_assign_pointer(priv->peer, dev); return 0; + +unregister_network_device: + unregister_netdevice(peer); + return err; } static void vxcan_dellink(struct net_device *dev, struct list_head *head) diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig index 83a9bc892a3b..2b81b97e994f 100644 --- a/drivers/net/dsa/Kconfig +++ b/drivers/net/dsa/Kconfig @@ -33,7 +33,7 @@ config NET_DSA_MT7530 config NET_DSA_MV88E6060 tristate "Marvell 88E6060 ethernet switch chip support" - depends on NET_DSA + depends on NET_DSA && NET_DSA_LEGACY select NET_DSA_TAG_TRAILER ---help--- This enables support for the Marvell 88E6060 ethernet switch diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c index f5a8dd96fd75..561b05089cb6 100644 --- a/drivers/net/dsa/b53/b53_common.c +++ b/drivers/net/dsa/b53/b53_common.c @@ -1029,8 +1029,7 @@ int b53_vlan_filtering(struct dsa_switch *ds, int port, bool vlan_filtering) EXPORT_SYMBOL(b53_vlan_filtering); int b53_vlan_prepare(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_vlan *vlan, - struct switchdev_trans *trans) + const struct switchdev_obj_port_vlan *vlan) { struct b53_device *dev = ds->priv; @@ -1047,8 +1046,7 @@ int b53_vlan_prepare(struct dsa_switch *ds, int port, EXPORT_SYMBOL(b53_vlan_prepare); void b53_vlan_add(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_vlan *vlan, - struct switchdev_trans *trans) + const struct switchdev_obj_port_vlan *vlan) { struct b53_device *dev = ds->priv; bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED; @@ -1495,8 +1493,7 @@ static bool b53_can_enable_brcm_tags(struct dsa_switch *ds, int port) return false; } -static enum dsa_tag_protocol b53_get_tag_protocol(struct dsa_switch *ds, - int port) +enum dsa_tag_protocol b53_get_tag_protocol(struct dsa_switch *ds, int port) { struct b53_device *dev = ds->priv; @@ -1514,6 +1511,7 @@ static enum dsa_tag_protocol b53_get_tag_protocol(struct dsa_switch *ds, return DSA_TAG_PROTO_BRCM; } +EXPORT_SYMBOL(b53_get_tag_protocol); int b53_mirror_add(struct dsa_switch *ds, int port, struct dsa_mall_mirror_tc_entry *mirror, bool ingress) diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h index daaaa1ecb996..d954cf36ecd8 100644 --- a/drivers/net/dsa/b53/b53_priv.h +++ b/drivers/net/dsa/b53/b53_priv.h @@ -295,11 +295,9 @@ void b53_br_set_stp_state(struct dsa_switch *ds, int port, u8 state); void b53_br_fast_age(struct dsa_switch *ds, int port); int b53_vlan_filtering(struct dsa_switch *ds, int port, bool vlan_filtering); int b53_vlan_prepare(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_vlan *vlan, - struct switchdev_trans *trans); + const struct switchdev_obj_port_vlan *vlan); void b53_vlan_add(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_vlan *vlan, - struct switchdev_trans *trans); + const struct switchdev_obj_port_vlan *vlan); int b53_vlan_del(struct dsa_switch *ds, int port, const struct switchdev_obj_port_vlan *vlan); int b53_fdb_add(struct dsa_switch *ds, int port, @@ -310,6 +308,7 @@ int b53_fdb_dump(struct dsa_switch *ds, int port, dsa_fdb_dump_cb_t *cb, void *data); int b53_mirror_add(struct dsa_switch *ds, int port, struct dsa_mall_mirror_tc_entry *mirror, bool ingress); +enum dsa_tag_protocol b53_get_tag_protocol(struct dsa_switch *ds, int port); void b53_mirror_del(struct dsa_switch *ds, int port, struct dsa_mall_mirror_tc_entry *mirror); int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy); diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c index b62d47210db8..8ea0cd494ea0 100644 --- a/drivers/net/dsa/bcm_sf2.c +++ b/drivers/net/dsa/bcm_sf2.c @@ -34,12 +34,6 @@ #include "b53/b53_priv.h" #include "b53/b53_regs.h" -static enum dsa_tag_protocol bcm_sf2_sw_get_tag_protocol(struct dsa_switch *ds, - int port) -{ - return DSA_TAG_PROTO_BRCM; -} - static void bcm_sf2_imp_setup(struct dsa_switch *ds, int port) { struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds); @@ -860,7 +854,7 @@ static const struct b53_io_ops bcm_sf2_io_ops = { }; static const struct dsa_switch_ops bcm_sf2_ops = { - .get_tag_protocol = bcm_sf2_sw_get_tag_protocol, + .get_tag_protocol = b53_get_tag_protocol, .setup = bcm_sf2_sw_setup, .get_strings = b53_get_strings, .get_ethtool_stats = b53_get_ethtool_stats, diff --git a/drivers/net/dsa/dsa_loop.c b/drivers/net/dsa/dsa_loop.c index bb71d3d6f65b..7aa84ee4e771 100644 --- a/drivers/net/dsa/dsa_loop.c +++ b/drivers/net/dsa/dsa_loop.c @@ -174,9 +174,9 @@ static int dsa_loop_port_vlan_filtering(struct dsa_switch *ds, int port, return 0; } -static int dsa_loop_port_vlan_prepare(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_vlan *vlan, - struct switchdev_trans *trans) +static int +dsa_loop_port_vlan_prepare(struct dsa_switch *ds, int port, + const struct switchdev_obj_port_vlan *vlan) { struct dsa_loop_priv *ps = ds->priv; struct mii_bus *bus = ps->bus; @@ -193,8 +193,7 @@ static int dsa_loop_port_vlan_prepare(struct dsa_switch *ds, int port, } static void dsa_loop_port_vlan_add(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_vlan *vlan, - struct switchdev_trans *trans) + const struct switchdev_obj_port_vlan *vlan) { bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED; bool pvid = vlan->flags & BRIDGE_VLAN_INFO_PVID; diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c index b24566bb74d2..c1b004fa64d9 100644 --- a/drivers/net/dsa/lan9303-core.c +++ b/drivers/net/dsa/lan9303-core.c @@ -583,6 +583,7 @@ static void lan9303_alr_loop(struct lan9303 *chip, alr_loop_cb_t *cb, void *ctx) { int i; + mutex_lock(&chip->alr_mutex); lan9303_write_switch_reg(chip, LAN9303_SWE_ALR_CMD, LAN9303_ALR_CMD_GET_FIRST); lan9303_write_switch_reg(chip, LAN9303_SWE_ALR_CMD, 0); @@ -606,6 +607,7 @@ static void lan9303_alr_loop(struct lan9303 *chip, alr_loop_cb_t *cb, void *ctx) LAN9303_ALR_CMD_GET_NEXT); lan9303_write_switch_reg(chip, LAN9303_SWE_ALR_CMD, 0); } + mutex_unlock(&chip->alr_mutex); } static void alr_reg_to_mac(u32 dat0, u32 dat1, u8 mac[6]) @@ -694,16 +696,20 @@ static int lan9303_alr_add_port(struct lan9303 *chip, const u8 *mac, int port, { struct lan9303_alr_cache_entry *entr; + mutex_lock(&chip->alr_mutex); entr = lan9303_alr_cache_find_mac(chip, mac); if (!entr) { /*New entry */ entr = lan9303_alr_cache_find_free(chip); - if (!entr) + if (!entr) { + mutex_unlock(&chip->alr_mutex); return -ENOSPC; + } ether_addr_copy(entr->mac_addr, mac); } entr->port_map |= BIT(port); entr->stp_override = stp_override; lan9303_alr_set_entry(chip, mac, entr->port_map, stp_override); + mutex_unlock(&chip->alr_mutex); return 0; } @@ -713,15 +719,18 @@ static int lan9303_alr_del_port(struct lan9303 *chip, const u8 *mac, int port) { struct lan9303_alr_cache_entry *entr; + mutex_lock(&chip->alr_mutex); entr = lan9303_alr_cache_find_mac(chip, mac); if (!entr) - return 0; /* no static entry found */ + goto out; /* no static entry found */ entr->port_map &= ~BIT(port); if (entr->port_map == 0) /* zero means its free again */ eth_zero_addr(entr->mac_addr); lan9303_alr_set_entry(chip, mac, entr->port_map, entr->stp_override); +out: + mutex_unlock(&chip->alr_mutex); return 0; } @@ -1217,8 +1226,7 @@ static int lan9303_port_fdb_dump(struct dsa_switch *ds, int port, } static int lan9303_port_mdb_prepare(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_mdb *mdb, - struct switchdev_trans *trans) + const struct switchdev_obj_port_mdb *mdb) { struct lan9303 *chip = ds->priv; @@ -1235,8 +1243,7 @@ static int lan9303_port_mdb_prepare(struct dsa_switch *ds, int port, } static void lan9303_port_mdb_add(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_mdb *mdb, - struct switchdev_trans *trans) + const struct switchdev_obj_port_mdb *mdb) { struct lan9303 *chip = ds->priv; @@ -1325,6 +1332,7 @@ int lan9303_probe(struct lan9303 *chip, struct device_node *np) int ret; mutex_init(&chip->indirect_mutex); + mutex_init(&chip->alr_mutex); lan9303_probe_reset_gpio(chip, np); diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c index b5be93a1e0df..663b0d5b982b 100644 --- a/drivers/net/dsa/microchip/ksz_common.c +++ b/drivers/net/dsa/microchip/ksz_common.c @@ -559,8 +559,7 @@ static int ksz_port_vlan_filtering(struct dsa_switch *ds, int port, bool flag) } static int ksz_port_vlan_prepare(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_vlan *vlan, - struct switchdev_trans *trans) + const struct switchdev_obj_port_vlan *vlan) { /* nothing needed */ @@ -568,8 +567,7 @@ static int ksz_port_vlan_prepare(struct dsa_switch *ds, int port, } static void ksz_port_vlan_add(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_vlan *vlan, - struct switchdev_trans *trans) + const struct switchdev_obj_port_vlan *vlan) { struct ksz_device *dev = ds->priv; u32 vlan_table[3]; @@ -858,16 +856,14 @@ exit: } static int ksz_port_mdb_prepare(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_mdb *mdb, - struct switchdev_trans *trans) + const struct switchdev_obj_port_mdb *mdb) { /* nothing to do */ return 0; } static void ksz_port_mdb_add(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_mdb *mdb, - struct switchdev_trans *trans) + const struct switchdev_obj_port_mdb *mdb) { struct ksz_device *dev = ds->priv; u32 static_table[4]; diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 66d33e97cbc5..fc512c98f2f8 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -1185,8 +1185,7 @@ static int mv88e6xxx_port_vlan_filtering(struct dsa_switch *ds, int port, static int mv88e6xxx_port_vlan_prepare(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_vlan *vlan, - struct switchdev_trans *trans) + const struct switchdev_obj_port_vlan *vlan) { struct mv88e6xxx_chip *chip = ds->priv; int err; @@ -1295,8 +1294,7 @@ static int _mv88e6xxx_port_vlan_add(struct mv88e6xxx_chip *chip, int port, } static void mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_vlan *vlan, - struct switchdev_trans *trans) + const struct switchdev_obj_port_vlan *vlan) { struct mv88e6xxx_chip *chip = ds->priv; bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED; @@ -1725,9 +1723,11 @@ static int mv88e6xxx_setup_message_port(struct mv88e6xxx_chip *chip, int port) static int mv88e6xxx_setup_egress_floods(struct mv88e6xxx_chip *chip, int port) { - bool flood = port == dsa_upstream_port(chip->ds); + struct dsa_switch *ds = chip->ds; + bool flood; /* Upstream ports flood frames with unknown unicast or multicast DA */ + flood = dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port); if (chip->info->ops->port_set_egress_floods) return chip->info->ops->port_set_egress_floods(chip, port, flood, flood); @@ -1744,6 +1744,39 @@ static int mv88e6xxx_serdes_power(struct mv88e6xxx_chip *chip, int port, return 0; } +static int mv88e6xxx_setup_upstream_port(struct mv88e6xxx_chip *chip, int port) +{ + struct dsa_switch *ds = chip->ds; + int upstream_port; + int err; + + upstream_port = dsa_upstream_port(ds, port); + if (chip->info->ops->port_set_upstream_port) { + err = chip->info->ops->port_set_upstream_port(chip, port, + upstream_port); + if (err) + return err; + } + + if (port == upstream_port) { + if (chip->info->ops->set_cpu_port) { + err = chip->info->ops->set_cpu_port(chip, + upstream_port); + if (err) + return err; + } + + if (chip->info->ops->set_egress_port) { + err = chip->info->ops->set_egress_port(chip, + upstream_port); + if (err) + return err; + } + } + + return 0; +} + static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port) { struct dsa_switch *ds = chip->ds; @@ -1814,13 +1847,9 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port) if (err) return err; - reg = 0; - if (chip->info->ops->port_set_upstream_port) { - err = chip->info->ops->port_set_upstream_port( - chip, port, dsa_upstream_port(ds)); - if (err) - return err; - } + err = mv88e6xxx_setup_upstream_port(chip, port); + if (err) + return err; err = mv88e6xxx_port_set_8021q_mode(chip, port, MV88E6XXX_PORT_CTL2_8021Q_MODE_DISABLED); @@ -1946,21 +1975,8 @@ static int mv88e6xxx_set_ageing_time(struct dsa_switch *ds, static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip) { struct dsa_switch *ds = chip->ds; - u32 upstream_port = dsa_upstream_port(ds); int err; - if (chip->info->ops->set_cpu_port) { - err = chip->info->ops->set_cpu_port(chip, upstream_port); - if (err) - return err; - } - - if (chip->info->ops->set_egress_port) { - err = chip->info->ops->set_egress_port(chip, upstream_port); - if (err) - return err; - } - /* Disable remote management, and set the switch's DSA device number. */ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL2, MV88E6XXX_G1_CTL2_MULTIPLE_CASCADE | @@ -3741,6 +3757,7 @@ static enum dsa_tag_protocol mv88e6xxx_get_tag_protocol(struct dsa_switch *ds, return chip->info->tag_protocol; } +#if IS_ENABLED(CONFIG_NET_DSA_LEGACY) static const char *mv88e6xxx_drv_probe(struct device *dsa_dev, struct device *host_dev, int sw_addr, void **priv) @@ -3788,10 +3805,10 @@ free: return NULL; } +#endif static int mv88e6xxx_port_mdb_prepare(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_mdb *mdb, - struct switchdev_trans *trans) + const struct switchdev_obj_port_mdb *mdb) { /* We don't need any dynamic resource from the kernel (yet), * so skip the prepare phase. @@ -3801,8 +3818,7 @@ static int mv88e6xxx_port_mdb_prepare(struct dsa_switch *ds, int port, } static void mv88e6xxx_port_mdb_add(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_mdb *mdb, - struct switchdev_trans *trans) + const struct switchdev_obj_port_mdb *mdb) { struct mv88e6xxx_chip *chip = ds->priv; @@ -3829,7 +3845,9 @@ static int mv88e6xxx_port_mdb_del(struct dsa_switch *ds, int port, } static const struct dsa_switch_ops mv88e6xxx_switch_ops = { +#if IS_ENABLED(CONFIG_NET_DSA_LEGACY) .probe = mv88e6xxx_drv_probe, +#endif .get_tag_protocol = mv88e6xxx_get_tag_protocol, .setup = mv88e6xxx_setup, .adjust_link = mv88e6xxx_adjust_link, diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c index 58483af80bdb..30b1c8512049 100644 --- a/drivers/net/dummy.c +++ b/drivers/net/dummy.c @@ -42,48 +42,7 @@ #define DRV_NAME "dummy" #define DRV_VERSION "1.0" -#undef pr_fmt -#define pr_fmt(fmt) DRV_NAME ": " fmt - static int numdummies = 1; -static int num_vfs; - -struct vf_data_storage { - u8 vf_mac[ETH_ALEN]; - u16 pf_vlan; /* When set, guest VLAN config not allowed. */ - u16 pf_qos; - __be16 vlan_proto; - u16 min_tx_rate; - u16 max_tx_rate; - u8 spoofchk_enabled; - bool rss_query_enabled; - u8 trusted; - int link_state; -}; - -struct dummy_priv { - struct vf_data_storage *vfinfo; -}; - -static int dummy_num_vf(struct device *dev) -{ - return num_vfs; -} - -static struct bus_type dummy_bus = { - .name = "dummy", - .num_vf = dummy_num_vf, -}; - -static void release_dummy_parent(struct device *dev) -{ -} - -static struct device dummy_parent = { - .init_name = "dummy", - .bus = &dummy_bus, - .release = release_dummy_parent, -}; /* fake multicast ability */ static void set_multicast_list(struct net_device *dev) @@ -133,25 +92,10 @@ static netdev_tx_t dummy_xmit(struct sk_buff *skb, struct net_device *dev) static int dummy_dev_init(struct net_device *dev) { - struct dummy_priv *priv = netdev_priv(dev); - dev->dstats = netdev_alloc_pcpu_stats(struct pcpu_dstats); if (!dev->dstats) return -ENOMEM; - priv->vfinfo = NULL; - - if (!num_vfs) - return 0; - - dev->dev.parent = &dummy_parent; - priv->vfinfo = kcalloc(num_vfs, sizeof(struct vf_data_storage), - GFP_KERNEL); - if (!priv->vfinfo) { - free_percpu(dev->dstats); - return -ENOMEM; - } - return 0; } @@ -169,117 +113,6 @@ static int dummy_change_carrier(struct net_device *dev, bool new_carrier) return 0; } -static int dummy_set_vf_mac(struct net_device *dev, int vf, u8 *mac) -{ - struct dummy_priv *priv = netdev_priv(dev); - - if (!is_valid_ether_addr(mac) || (vf >= num_vfs)) - return -EINVAL; - - memcpy(priv->vfinfo[vf].vf_mac, mac, ETH_ALEN); - - return 0; -} - -static int dummy_set_vf_vlan(struct net_device *dev, int vf, - u16 vlan, u8 qos, __be16 vlan_proto) -{ - struct dummy_priv *priv = netdev_priv(dev); - - if ((vf >= num_vfs) || (vlan > 4095) || (qos > 7)) - return -EINVAL; - - priv->vfinfo[vf].pf_vlan = vlan; - priv->vfinfo[vf].pf_qos = qos; - priv->vfinfo[vf].vlan_proto = vlan_proto; - - return 0; -} - -static int dummy_set_vf_rate(struct net_device *dev, int vf, int min, int max) -{ - struct dummy_priv *priv = netdev_priv(dev); - - if (vf >= num_vfs) - return -EINVAL; - - priv->vfinfo[vf].min_tx_rate = min; - priv->vfinfo[vf].max_tx_rate = max; - - return 0; -} - -static int dummy_set_vf_spoofchk(struct net_device *dev, int vf, bool val) -{ - struct dummy_priv *priv = netdev_priv(dev); - - if (vf >= num_vfs) - return -EINVAL; - - priv->vfinfo[vf].spoofchk_enabled = val; - - return 0; -} - -static int dummy_set_vf_rss_query_en(struct net_device *dev, int vf, bool val) -{ - struct dummy_priv *priv = netdev_priv(dev); - - if (vf >= num_vfs) - return -EINVAL; - - priv->vfinfo[vf].rss_query_enabled = val; - - return 0; -} - -static int dummy_set_vf_trust(struct net_device *dev, int vf, bool val) -{ - struct dummy_priv *priv = netdev_priv(dev); - - if (vf >= num_vfs) - return -EINVAL; - - priv->vfinfo[vf].trusted = val; - - return 0; -} - -static int dummy_get_vf_config(struct net_device *dev, - int vf, struct ifla_vf_info *ivi) -{ - struct dummy_priv *priv = netdev_priv(dev); - - if (vf >= num_vfs) - return -EINVAL; - - ivi->vf = vf; - memcpy(&ivi->mac, priv->vfinfo[vf].vf_mac, ETH_ALEN); - ivi->vlan = priv->vfinfo[vf].pf_vlan; - ivi->qos = priv->vfinfo[vf].pf_qos; - ivi->spoofchk = priv->vfinfo[vf].spoofchk_enabled; - ivi->linkstate = priv->vfinfo[vf].link_state; - ivi->min_tx_rate = priv->vfinfo[vf].min_tx_rate; - ivi->max_tx_rate = priv->vfinfo[vf].max_tx_rate; - ivi->rss_query_en = priv->vfinfo[vf].rss_query_enabled; - ivi->trusted = priv->vfinfo[vf].trusted; - ivi->vlan_proto = priv->vfinfo[vf].vlan_proto; - - return 0; -} - -static int dummy_set_vf_link_state(struct net_device *dev, int vf, int state) -{ - struct dummy_priv *priv = netdev_priv(dev); - - if (vf >= num_vfs) - return -EINVAL; - - priv->vfinfo[vf].link_state = state; - - return 0; -} - static const struct net_device_ops dummy_netdev_ops = { .ndo_init = dummy_dev_init, .ndo_uninit = dummy_dev_uninit, @@ -289,14 +122,6 @@ static const struct net_device_ops dummy_netdev_ops = { .ndo_set_mac_address = eth_mac_addr, .ndo_get_stats64 = dummy_get_stats64, .ndo_change_carrier = dummy_change_carrier, - .ndo_set_vf_mac = dummy_set_vf_mac, - .ndo_set_vf_vlan = dummy_set_vf_vlan, - .ndo_set_vf_rate = dummy_set_vf_rate, - .ndo_set_vf_spoofchk = dummy_set_vf_spoofchk, - .ndo_set_vf_trust = dummy_set_vf_trust, - .ndo_get_vf_config = dummy_get_vf_config, - .ndo_set_vf_link_state = dummy_set_vf_link_state, - .ndo_set_vf_rss_query_en = dummy_set_vf_rss_query_en, }; static void dummy_get_drvinfo(struct net_device *dev, @@ -323,13 +148,6 @@ static const struct ethtool_ops dummy_ethtool_ops = { .get_ts_info = dummy_get_ts_info, }; -static void dummy_free_netdev(struct net_device *dev) -{ - struct dummy_priv *priv = netdev_priv(dev); - - kfree(priv->vfinfo); -} - static void dummy_setup(struct net_device *dev) { ether_setup(dev); @@ -338,7 +156,6 @@ static void dummy_setup(struct net_device *dev) dev->netdev_ops = &dummy_netdev_ops; dev->ethtool_ops = &dummy_ethtool_ops; dev->needs_free_netdev = true; - dev->priv_destructor = dummy_free_netdev; /* Fill in device structure with ethernet-generic values. */ dev->flags |= IFF_NOARP; @@ -370,7 +187,6 @@ static int dummy_validate(struct nlattr *tb[], struct nlattr *data[], static struct rtnl_link_ops dummy_link_ops __read_mostly = { .kind = DRV_NAME, - .priv_size = sizeof(struct dummy_priv), .setup = dummy_setup, .validate = dummy_validate, }; @@ -379,16 +195,12 @@ static struct rtnl_link_ops dummy_link_ops __read_mostly = { module_param(numdummies, int, 0); MODULE_PARM_DESC(numdummies, "Number of dummy pseudo devices"); -module_param(num_vfs, int, 0); -MODULE_PARM_DESC(num_vfs, "Number of dummy VFs per dummy device"); - static int __init dummy_init_one(void) { struct net_device *dev_dummy; int err; - dev_dummy = alloc_netdev(sizeof(struct dummy_priv), - "dummy%d", NET_NAME_ENUM, dummy_setup); + dev_dummy = alloc_netdev(0, "dummy%d", NET_NAME_ENUM, dummy_setup); if (!dev_dummy) return -ENOMEM; @@ -407,21 +219,6 @@ static int __init dummy_init_module(void) { int i, err = 0; - if (num_vfs) { - err = bus_register(&dummy_bus); - if (err < 0) { - pr_err("registering dummy bus failed\n"); - return err; - } - - err = device_register(&dummy_parent); - if (err < 0) { - pr_err("registering dummy parent device failed\n"); - bus_unregister(&dummy_bus); - return err; - } - } - rtnl_lock(); err = __rtnl_link_register(&dummy_link_ops); if (err < 0) @@ -437,22 +234,12 @@ static int __init dummy_init_module(void) out: rtnl_unlock(); - if (err && num_vfs) { - device_unregister(&dummy_parent); - bus_unregister(&dummy_bus); - } - return err; } static void __exit dummy_cleanup_module(void) { rtnl_link_unregister(&dummy_link_ops); - - if (num_vfs) { - device_unregister(&dummy_parent); - bus_unregister(&dummy_bus); - } } module_init(dummy_init_module); diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 61ca4eb7c6fa..1d865ae201db 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -1706,12 +1706,16 @@ static int bnxt_async_event_process(struct bnxt *bp, if (BNXT_VF(bp)) goto async_event_process_exit; - if (data1 & 0x20000) { + + /* print unsupported speed warning in forced speed mode only */ + if (!(link_info->autoneg & BNXT_AUTONEG_SPEED) && + (data1 & 0x20000)) { u16 fw_speed = link_info->force_link_speed; u32 speed = bnxt_fw_to_ethtool_speed(fw_speed); - netdev_warn(bp->dev, "Link speed %d no longer supported\n", - speed); + if (speed != SPEED_UNKNOWN) + netdev_warn(bp->dev, "Link speed %d no longer supported\n", + speed); } set_bit(BNXT_LINK_SPEED_CHNG_SP_EVENT, &bp->sp_event); /* fall thru */ @@ -7800,8 +7804,6 @@ static void bnxt_remove_one(struct pci_dev *pdev) bnxt_dcb_free(bp); kfree(bp->edev); bp->edev = NULL; - if (bp->xdp_prog) - bpf_prog_put(bp->xdp_prog); bnxt_cleanup_pci(bp); free_netdev(dev); } diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c index b13ce5ebde8d..fe7599f404bf 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c @@ -1376,6 +1376,9 @@ static int bnxt_firmware_reset(struct net_device *dev, req.embedded_proc_type = FW_RESET_REQ_EMBEDDED_PROC_TYPE_CHIP; req.selfrst_status = FW_RESET_REQ_SELFRST_STATUS_SELFRSTASAP; break; + case BNXT_FW_RESET_AP: + req.embedded_proc_type = FW_RESET_REQ_EMBEDDED_PROC_TYPE_AP; + break; default: return -EINVAL; } @@ -2522,6 +2525,14 @@ static int bnxt_reset(struct net_device *dev, u32 *flags) rc = bnxt_firmware_reset(dev, BNXT_FW_RESET_CHIP); if (!rc) netdev_info(dev, "Reset request successful. Reload driver to complete reset\n"); + } else if (*flags == ETH_RESET_AP) { + /* This feature is not supported in older firmware versions */ + if (bp->hwrm_spec_code < 0x10803) + return -EOPNOTSUPP; + + rc = bnxt_firmware_reset(dev, BNXT_FW_RESET_AP); + if (!rc) + netdev_info(dev, "Reset Application Processor request successful.\n"); } else { rc = -EINVAL; } diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h index ff601b42fcc8..836ef682f24c 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h @@ -34,6 +34,7 @@ struct bnxt_led_cfg { #define BNXT_LED_DFLT_ENABLES(x) \ cpu_to_le32(BNXT_LED_DFLT_ENA << (BNXT_LED_DFLT_ENA_SHIFT * (x))) +#define BNXT_FW_RESET_AP 0xfffe #define BNXT_FW_RESET_CHIP 0xffff extern const struct ethtool_ops bnxt_ethtool_ops; diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c index 3d201d7324bd..9807214da206 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c @@ -54,12 +54,10 @@ static int bnxt_tc_parse_redir(struct bnxt *bp, struct bnxt_tc_actions *actions, const struct tc_action *tc_act) { - int ifindex = tcf_mirred_ifindex(tc_act); - struct net_device *dev; + struct net_device *dev = tcf_mirred_dev(tc_act); - dev = __dev_get_by_index(dev_net(bp->dev), ifindex); if (!dev) { - netdev_info(bp->dev, "no dev for ifindex=%d", ifindex); + netdev_info(bp->dev, "no dev in mirred action"); return -EINVAL; } @@ -148,9 +146,6 @@ static int bnxt_tc_parse_actions(struct bnxt *bp, } } - if (rc) - return rc; - if (actions->flags & BNXT_TC_ACTION_FLAG_FWD) { if (actions->flags & BNXT_TC_ACTION_FLAG_TUNNEL_ENCAP) { /* dst_fid is PF's fid */ @@ -164,7 +159,7 @@ static int bnxt_tc_parse_actions(struct bnxt *bp, } } - return rc; + return 0; } #define GET_KEY(flow_cmd, key_type) \ diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h index c93f3a2dc6c1..c50c5ec49b1d 100644 --- a/drivers/net/ethernet/cadence/macb.h +++ b/drivers/net/ethernet/cadence/macb.h @@ -164,14 +164,38 @@ #define GEM_DCFG5 0x0290 /* Design Config 5 */ #define GEM_DCFG6 0x0294 /* Design Config 6 */ #define GEM_DCFG7 0x0298 /* Design Config 7 */ +#define GEM_DCFG8 0x029C /* Design Config 8 */ #define GEM_TXBDCTRL 0x04cc /* TX Buffer Descriptor control register */ #define GEM_RXBDCTRL 0x04d0 /* RX Buffer Descriptor control register */ +/* Screener Type 2 match registers */ +#define GEM_SCRT2 0x540 + +/* EtherType registers */ +#define GEM_ETHT 0x06E0 + +/* Type 2 compare registers */ +#define GEM_T2CMPW0 0x0700 +#define GEM_T2CMPW1 0x0704 +#define T2CMP_OFST(t2idx) (t2idx * 2) + +/* type 2 compare registers + * each location requires 3 compare regs + */ +#define GEM_IP4SRC_CMP(idx) (idx * 3) +#define GEM_IP4DST_CMP(idx) (idx * 3 + 1) +#define GEM_PORT_CMP(idx) (idx * 3 + 2) + +/* Which screening type 2 EtherType register will be used (0 - 7) */ +#define SCRT2_ETHT 0 + #define GEM_ISR(hw_q) (0x0400 + ((hw_q) << 2)) #define GEM_TBQP(hw_q) (0x0440 + ((hw_q) << 2)) #define GEM_TBQPH(hw_q) (0x04C8) #define GEM_RBQP(hw_q) (0x0480 + ((hw_q) << 2)) +#define GEM_RBQS(hw_q) (0x04A0 + ((hw_q) << 2)) +#define GEM_RBQPH(hw_q) (0x04D4) #define GEM_IER(hw_q) (0x0600 + ((hw_q) << 2)) #define GEM_IDR(hw_q) (0x0620 + ((hw_q) << 2)) #define GEM_IMR(hw_q) (0x0640 + ((hw_q) << 2)) @@ -455,6 +479,16 @@ #define GEM_DAW64_OFFSET 23 #define GEM_DAW64_SIZE 1 +/* Bitfields in DCFG8. */ +#define GEM_T1SCR_OFFSET 24 +#define GEM_T1SCR_SIZE 8 +#define GEM_T2SCR_OFFSET 16 +#define GEM_T2SCR_SIZE 8 +#define GEM_SCR2ETH_OFFSET 8 +#define GEM_SCR2ETH_SIZE 8 +#define GEM_SCR2CMP_OFFSET 0 +#define GEM_SCR2CMP_SIZE 8 + /* Bitfields in TISUBN */ #define GEM_SUBNSINCR_OFFSET 0 #define GEM_SUBNSINCR_SIZE 16 @@ -483,6 +517,66 @@ #define GEM_RXTSMODE_OFFSET 4 /* RX Descriptor Timestamp Insertion mode */ #define GEM_RXTSMODE_SIZE 2 +/* Bitfields in SCRT2 */ +#define GEM_QUEUE_OFFSET 0 /* Queue Number */ +#define GEM_QUEUE_SIZE 4 +#define GEM_VLANPR_OFFSET 4 /* VLAN Priority */ +#define GEM_VLANPR_SIZE 3 +#define GEM_VLANEN_OFFSET 8 /* VLAN Enable */ +#define GEM_VLANEN_SIZE 1 +#define GEM_ETHT2IDX_OFFSET 9 /* Index to screener type 2 EtherType register */ +#define GEM_ETHT2IDX_SIZE 3 +#define GEM_ETHTEN_OFFSET 12 /* EtherType Enable */ +#define GEM_ETHTEN_SIZE 1 +#define GEM_CMPA_OFFSET 13 /* Compare A - Index to screener type 2 Compare register */ +#define GEM_CMPA_SIZE 5 +#define GEM_CMPAEN_OFFSET 18 /* Compare A Enable */ +#define GEM_CMPAEN_SIZE 1 +#define GEM_CMPB_OFFSET 19 /* Compare B - Index to screener type 2 Compare register */ +#define GEM_CMPB_SIZE 5 +#define GEM_CMPBEN_OFFSET 24 /* Compare B Enable */ +#define GEM_CMPBEN_SIZE 1 +#define GEM_CMPC_OFFSET 25 /* Compare C - Index to screener type 2 Compare register */ +#define GEM_CMPC_SIZE 5 +#define GEM_CMPCEN_OFFSET 30 /* Compare C Enable */ +#define GEM_CMPCEN_SIZE 1 + +/* Bitfields in ETHT */ +#define GEM_ETHTCMP_OFFSET 0 /* EtherType compare value */ +#define GEM_ETHTCMP_SIZE 16 + +/* Bitfields in T2CMPW0 */ +#define GEM_T2CMP_OFFSET 16 /* 0xFFFF0000 compare value */ +#define GEM_T2CMP_SIZE 16 +#define GEM_T2MASK_OFFSET 0 /* 0x0000FFFF compare value or mask */ +#define GEM_T2MASK_SIZE 16 + +/* Bitfields in T2CMPW1 */ +#define GEM_T2DISMSK_OFFSET 9 /* disable mask */ +#define GEM_T2DISMSK_SIZE 1 +#define GEM_T2CMPOFST_OFFSET 7 /* compare offset */ +#define GEM_T2CMPOFST_SIZE 2 +#define GEM_T2OFST_OFFSET 0 /* offset value */ +#define GEM_T2OFST_SIZE 7 + +/* Offset for screener type 2 compare values (T2CMPOFST). + * Note the offset is applied after the specified point, + * e.g. GEM_T2COMPOFST_ETYPE denotes the EtherType field, so an offset + * of 12 bytes from this would be the source IP address in an IP header + */ +#define GEM_T2COMPOFST_SOF 0 +#define GEM_T2COMPOFST_ETYPE 1 +#define GEM_T2COMPOFST_IPHDR 2 +#define GEM_T2COMPOFST_TCPUDP 3 + +/* offset from EtherType to IP address */ +#define ETYPE_SRCIP_OFFSET 12 +#define ETYPE_DSTIP_OFFSET 16 + +/* offset from IP header to port */ +#define IPHDR_SRCPORT_OFFSET 0 +#define IPHDR_DSTPORT_OFFSET 2 + /* Transmit DMA buffer descriptor Word 1 */ #define GEM_DMA_TXVALID_OFFSET 23 /* timestamp has been captured in the Buffer Descriptor */ #define GEM_DMA_TXVALID_SIZE 1 @@ -583,6 +677,8 @@ #define gem_writel(port, reg, value) (port)->macb_reg_writel((port), GEM_##reg, (value)) #define queue_readl(queue, reg) (queue)->bp->macb_reg_readl((queue)->bp, (queue)->reg) #define queue_writel(queue, reg, value) (queue)->bp->macb_reg_writel((queue)->bp, (queue)->reg, (value)) +#define gem_readl_n(port, reg, idx) (port)->macb_reg_readl((port), GEM_##reg + idx * 4) +#define gem_writel_n(port, reg, idx, value) (port)->macb_reg_writel((port), GEM_##reg + idx * 4, (value)) #define PTP_TS_BUFFER_SIZE 128 /* must be power of 2 */ @@ -920,13 +1016,42 @@ static const struct gem_statistic gem_statistics[] = { #define GEM_STATS_LEN ARRAY_SIZE(gem_statistics) +#define QUEUE_STAT_TITLE(title) { \ + .stat_string = title, \ +} + +/* per queue statistics, each should be unsigned long type */ +struct queue_stats { + union { + unsigned long first; + unsigned long rx_packets; + }; + unsigned long rx_bytes; + unsigned long rx_dropped; + unsigned long tx_packets; + unsigned long tx_bytes; + unsigned long tx_dropped; +}; + +static const struct gem_statistic queue_statistics[] = { + QUEUE_STAT_TITLE("rx_packets"), + QUEUE_STAT_TITLE("rx_bytes"), + QUEUE_STAT_TITLE("rx_dropped"), + QUEUE_STAT_TITLE("tx_packets"), + QUEUE_STAT_TITLE("tx_bytes"), + QUEUE_STAT_TITLE("tx_dropped"), +}; + +#define QUEUE_STATS_LEN ARRAY_SIZE(queue_statistics) + struct macb; +struct macb_queue; struct macb_or_gem_ops { int (*mog_alloc_rx_buffers)(struct macb *bp); void (*mog_free_rx_buffers)(struct macb *bp); void (*mog_init_rings)(struct macb *bp); - int (*mog_rx)(struct macb *bp, int budget); + int (*mog_rx)(struct macb_queue *queue, int budget); }; /* MACB-PTP interface: adapt to platform needs. */ @@ -968,6 +1093,9 @@ struct macb_queue { unsigned int IMR; unsigned int TBQP; unsigned int TBQPH; + unsigned int RBQS; + unsigned int RBQP; + unsigned int RBQPH; unsigned int tx_head, tx_tail; struct macb_dma_desc *tx_ring; @@ -975,6 +1103,16 @@ struct macb_queue { dma_addr_t tx_ring_dma; struct work_struct tx_error_task; + dma_addr_t rx_ring_dma; + dma_addr_t rx_buffers_dma; + unsigned int rx_tail; + unsigned int rx_prepared_head; + struct macb_dma_desc *rx_ring; + struct sk_buff **rx_skbuff; + void *rx_buffers; + struct napi_struct napi; + struct queue_stats stats; + #ifdef CONFIG_MACB_USE_HWSTAMP struct work_struct tx_ts_task; unsigned int tx_ts_head, tx_ts_tail; @@ -982,6 +1120,16 @@ struct macb_queue { #endif }; +struct ethtool_rx_fs_item { + struct ethtool_rx_flow_spec fs; + struct list_head list; +}; + +struct ethtool_rx_fs_list { + struct list_head list; + unsigned int count; +}; + struct macb { void __iomem *regs; bool native_io; @@ -990,11 +1138,6 @@ struct macb { u32 (*macb_reg_readl)(struct macb *bp, int offset); void (*macb_reg_writel)(struct macb *bp, int offset, u32 value); - unsigned int rx_tail; - unsigned int rx_prepared_head; - struct macb_dma_desc *rx_ring; - struct sk_buff **rx_skbuff; - void *rx_buffers; size_t rx_buffer_size; unsigned int rx_ring_size; @@ -1011,15 +1154,11 @@ struct macb { struct clk *tx_clk; struct clk *rx_clk; struct net_device *dev; - struct napi_struct napi; union { struct macb_stats macb; struct gem_stats gem; } hw_stats; - dma_addr_t rx_ring_dma; - dma_addr_t rx_buffers_dma; - struct macb_or_gem_ops macbgem_ops; struct mii_bus *mii_bus; @@ -1032,7 +1171,6 @@ struct macb { unsigned int dma_burst_length; phy_interface_t phy_interface; - struct gpio_desc *reset_gpio; /* AT91RM9200 transmit */ struct sk_buff *skb; /* holds skb until xmit interrupt completes */ @@ -1040,7 +1178,7 @@ struct macb { int skb_length; /* saved skb length for pci_unmap_single */ unsigned int max_tx_length; - u64 ethtool_stats[GEM_STATS_LEN]; + u64 ethtool_stats[GEM_STATS_LEN + QUEUE_STATS_LEN * MACB_MAX_QUEUES]; unsigned int rx_frm_len_mask; unsigned int jumbo_max_len; @@ -1057,6 +1195,11 @@ struct macb { struct ptp_clock_info ptp_clock_info; struct tsu_incr tsu_incr; struct hwtstamp_config tstamp_config; + + /* RX queue filer rule set*/ + struct ethtool_rx_fs_list rx_fs_list; + spinlock_t rx_fs_lock; + unsigned int max_tuples; }; #ifdef CONFIG_MACB_USE_HWSTAMP diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index 72a67f74b97b..234667eaaa92 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -194,17 +194,17 @@ static unsigned int macb_rx_ring_wrap(struct macb *bp, unsigned int index) return index & (bp->rx_ring_size - 1); } -static struct macb_dma_desc *macb_rx_desc(struct macb *bp, unsigned int index) +static struct macb_dma_desc *macb_rx_desc(struct macb_queue *queue, unsigned int index) { - index = macb_rx_ring_wrap(bp, index); - index = macb_adj_dma_desc_idx(bp, index); - return &bp->rx_ring[index]; + index = macb_rx_ring_wrap(queue->bp, index); + index = macb_adj_dma_desc_idx(queue->bp, index); + return &queue->rx_ring[index]; } -static void *macb_rx_buffer(struct macb *bp, unsigned int index) +static void *macb_rx_buffer(struct macb_queue *queue, unsigned int index) { - return bp->rx_buffers + bp->rx_buffer_size * - macb_rx_ring_wrap(bp, index); + return queue->rx_buffers + queue->bp->rx_buffer_size * + macb_rx_ring_wrap(queue->bp, index); } /* I/O accessors */ @@ -759,7 +759,9 @@ static void macb_tx_error_task(struct work_struct *work) macb_tx_ring_wrap(bp, tail), skb->data); bp->dev->stats.tx_packets++; + queue->stats.tx_packets++; bp->dev->stats.tx_bytes += skb->len; + queue->stats.tx_bytes += skb->len; } } else { /* "Buffers exhausted mid-frame" errors may only happen @@ -859,7 +861,9 @@ static void macb_tx_interrupt(struct macb_queue *queue) macb_tx_ring_wrap(bp, tail), skb->data); bp->dev->stats.tx_packets++; + queue->stats.tx_packets++; bp->dev->stats.tx_bytes += skb->len; + queue->stats.tx_bytes += skb->len; } /* Now we can safely release resources */ @@ -881,24 +885,25 @@ static void macb_tx_interrupt(struct macb_queue *queue) netif_wake_subqueue(bp->dev, queue_index); } -static void gem_rx_refill(struct macb *bp) +static void gem_rx_refill(struct macb_queue *queue) { unsigned int entry; struct sk_buff *skb; dma_addr_t paddr; + struct macb *bp = queue->bp; struct macb_dma_desc *desc; - while (CIRC_SPACE(bp->rx_prepared_head, bp->rx_tail, - bp->rx_ring_size) > 0) { - entry = macb_rx_ring_wrap(bp, bp->rx_prepared_head); + while (CIRC_SPACE(queue->rx_prepared_head, queue->rx_tail, + bp->rx_ring_size) > 0) { + entry = macb_rx_ring_wrap(bp, queue->rx_prepared_head); /* Make hw descriptor updates visible to CPU */ rmb(); - bp->rx_prepared_head++; - desc = macb_rx_desc(bp, entry); + queue->rx_prepared_head++; + desc = macb_rx_desc(queue, entry); - if (!bp->rx_skbuff[entry]) { + if (!queue->rx_skbuff[entry]) { /* allocate sk_buff for this free entry in ring */ skb = netdev_alloc_skb(bp->dev, bp->rx_buffer_size); if (unlikely(!skb)) { @@ -916,7 +921,7 @@ static void gem_rx_refill(struct macb *bp) break; } - bp->rx_skbuff[entry] = skb; + queue->rx_skbuff[entry] = skb; if (entry == bp->rx_ring_size - 1) paddr |= MACB_BIT(RX_WRAP); @@ -934,18 +939,18 @@ static void gem_rx_refill(struct macb *bp) /* Make descriptor updates visible to hardware */ wmb(); - netdev_vdbg(bp->dev, "rx ring: prepared head %d, tail %d\n", - bp->rx_prepared_head, bp->rx_tail); + netdev_vdbg(bp->dev, "rx ring: queue: %p, prepared head %d, tail %d\n", + queue, queue->rx_prepared_head, queue->rx_tail); } /* Mark DMA descriptors from begin up to and not including end as unused */ -static void discard_partial_frame(struct macb *bp, unsigned int begin, +static void discard_partial_frame(struct macb_queue *queue, unsigned int begin, unsigned int end) { unsigned int frag; for (frag = begin; frag != end; frag++) { - struct macb_dma_desc *desc = macb_rx_desc(bp, frag); + struct macb_dma_desc *desc = macb_rx_desc(queue, frag); desc->addr &= ~MACB_BIT(RX_USED); } @@ -959,8 +964,9 @@ static void discard_partial_frame(struct macb *bp, unsigned int begin, */ } -static int gem_rx(struct macb *bp, int budget) +static int gem_rx(struct macb_queue *queue, int budget) { + struct macb *bp = queue->bp; unsigned int len; unsigned int entry; struct sk_buff *skb; @@ -972,8 +978,8 @@ static int gem_rx(struct macb *bp, int budget) dma_addr_t addr; bool rxused; - entry = macb_rx_ring_wrap(bp, bp->rx_tail); - desc = macb_rx_desc(bp, entry); + entry = macb_rx_ring_wrap(bp, queue->rx_tail); + desc = macb_rx_desc(queue, entry); /* Make hw descriptor updates visible to CPU */ rmb(); @@ -985,24 +991,26 @@ static int gem_rx(struct macb *bp, int budget) if (!rxused) break; - bp->rx_tail++; + queue->rx_tail++; count++; if (!(ctrl & MACB_BIT(RX_SOF) && ctrl & MACB_BIT(RX_EOF))) { netdev_err(bp->dev, "not whole frame pointed by descriptor\n"); bp->dev->stats.rx_dropped++; + queue->stats.rx_dropped++; break; } - skb = bp->rx_skbuff[entry]; + skb = queue->rx_skbuff[entry]; if (unlikely(!skb)) { netdev_err(bp->dev, "inconsistent Rx descriptor chain\n"); bp->dev->stats.rx_dropped++; + queue->stats.rx_dropped++; break; } /* now everything is ready for receiving packet */ - bp->rx_skbuff[entry] = NULL; + queue->rx_skbuff[entry] = NULL; len = ctrl & bp->rx_frm_len_mask; netdev_vdbg(bp->dev, "gem_rx %u (len %u)\n", entry, len); @@ -1019,7 +1027,9 @@ static int gem_rx(struct macb *bp, int budget) skb->ip_summed = CHECKSUM_UNNECESSARY; bp->dev->stats.rx_packets++; + queue->stats.rx_packets++; bp->dev->stats.rx_bytes += skb->len; + queue->stats.rx_bytes += skb->len; gem_ptp_do_rxstamp(bp, skb, desc); @@ -1035,12 +1045,12 @@ static int gem_rx(struct macb *bp, int budget) netif_receive_skb(skb); } - gem_rx_refill(bp); + gem_rx_refill(queue); return count; } -static int macb_rx_frame(struct macb *bp, unsigned int first_frag, +static int macb_rx_frame(struct macb_queue *queue, unsigned int first_frag, unsigned int last_frag) { unsigned int len; @@ -1048,8 +1058,9 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag, unsigned int offset; struct sk_buff *skb; struct macb_dma_desc *desc; + struct macb *bp = queue->bp; - desc = macb_rx_desc(bp, last_frag); + desc = macb_rx_desc(queue, last_frag); len = desc->ctrl & bp->rx_frm_len_mask; netdev_vdbg(bp->dev, "macb_rx_frame frags %u - %u (len %u)\n", @@ -1068,7 +1079,7 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag, if (!skb) { bp->dev->stats.rx_dropped++; for (frag = first_frag; ; frag++) { - desc = macb_rx_desc(bp, frag); + desc = macb_rx_desc(queue, frag); desc->addr &= ~MACB_BIT(RX_USED); if (frag == last_frag) break; @@ -1096,10 +1107,10 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag, frag_len = len - offset; } skb_copy_to_linear_data_offset(skb, offset, - macb_rx_buffer(bp, frag), + macb_rx_buffer(queue, frag), frag_len); offset += bp->rx_buffer_size; - desc = macb_rx_desc(bp, frag); + desc = macb_rx_desc(queue, frag); desc->addr &= ~MACB_BIT(RX_USED); if (frag == last_frag) @@ -1121,32 +1132,34 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag, return 0; } -static inline void macb_init_rx_ring(struct macb *bp) +static inline void macb_init_rx_ring(struct macb_queue *queue) { + struct macb *bp = queue->bp; dma_addr_t addr; struct macb_dma_desc *desc = NULL; int i; - addr = bp->rx_buffers_dma; + addr = queue->rx_buffers_dma; for (i = 0; i < bp->rx_ring_size; i++) { - desc = macb_rx_desc(bp, i); + desc = macb_rx_desc(queue, i); macb_set_addr(bp, desc, addr); desc->ctrl = 0; addr += bp->rx_buffer_size; } desc->addr |= MACB_BIT(RX_WRAP); - bp->rx_tail = 0; + queue->rx_tail = 0; } -static int macb_rx(struct macb *bp, int budget) +static int macb_rx(struct macb_queue *queue, int budget) { + struct macb *bp = queue->bp; bool reset_rx_queue = false; int received = 0; unsigned int tail; int first_frag = -1; - for (tail = bp->rx_tail; budget > 0; tail++) { - struct macb_dma_desc *desc = macb_rx_desc(bp, tail); + for (tail = queue->rx_tail; budget > 0; tail++) { + struct macb_dma_desc *desc = macb_rx_desc(queue, tail); u32 ctrl; /* Make hw descriptor updates visible to CPU */ @@ -1159,7 +1172,7 @@ static int macb_rx(struct macb *bp, int budget) if (ctrl & MACB_BIT(RX_SOF)) { if (first_frag != -1) - discard_partial_frame(bp, first_frag, tail); + discard_partial_frame(queue, first_frag, tail); first_frag = tail; } @@ -1171,7 +1184,7 @@ static int macb_rx(struct macb *bp, int budget) continue; } - dropped = macb_rx_frame(bp, first_frag, tail); + dropped = macb_rx_frame(queue, first_frag, tail); first_frag = -1; if (unlikely(dropped < 0)) { reset_rx_queue = true; @@ -1195,8 +1208,8 @@ static int macb_rx(struct macb *bp, int budget) ctrl = macb_readl(bp, NCR); macb_writel(bp, NCR, ctrl & ~MACB_BIT(RE)); - macb_init_rx_ring(bp); - macb_writel(bp, RBQP, bp->rx_ring_dma); + macb_init_rx_ring(queue); + queue_writel(queue, RBQP, queue->rx_ring_dma); macb_writel(bp, NCR, ctrl | MACB_BIT(RE)); @@ -1205,16 +1218,17 @@ static int macb_rx(struct macb *bp, int budget) } if (first_frag != -1) - bp->rx_tail = first_frag; + queue->rx_tail = first_frag; else - bp->rx_tail = tail; + queue->rx_tail = tail; return received; } static int macb_poll(struct napi_struct *napi, int budget) { - struct macb *bp = container_of(napi, struct macb, napi); + struct macb_queue *queue = container_of(napi, struct macb_queue, napi); + struct macb *bp = queue->bp; int work_done; u32 status; @@ -1224,7 +1238,7 @@ static int macb_poll(struct napi_struct *napi, int budget) netdev_vdbg(bp->dev, "poll: status = %08lx, budget = %d\n", (unsigned long)status, budget); - work_done = bp->macbgem_ops.mog_rx(bp, budget); + work_done = bp->macbgem_ops.mog_rx(queue, budget); if (work_done < budget) { napi_complete_done(napi, work_done); @@ -1232,10 +1246,10 @@ static int macb_poll(struct napi_struct *napi, int budget) status = macb_readl(bp, RSR); if (status) { if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE) - macb_writel(bp, ISR, MACB_BIT(RCOMP)); + queue_writel(queue, ISR, MACB_BIT(RCOMP)); napi_reschedule(napi); } else { - macb_writel(bp, IER, MACB_RX_INT_FLAGS); + queue_writel(queue, IER, MACB_RX_INT_FLAGS); } } @@ -1282,9 +1296,9 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id) if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE) queue_writel(queue, ISR, MACB_BIT(RCOMP)); - if (napi_schedule_prep(&bp->napi)) { + if (napi_schedule_prep(&queue->napi)) { netdev_vdbg(bp->dev, "scheduling RX softirq\n"); - __napi_schedule(&bp->napi); + __napi_schedule(&queue->napi); } } @@ -1708,38 +1722,44 @@ static void gem_free_rx_buffers(struct macb *bp) { struct sk_buff *skb; struct macb_dma_desc *desc; + struct macb_queue *queue; dma_addr_t addr; + unsigned int q; int i; - if (!bp->rx_skbuff) - return; + for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) { + if (!queue->rx_skbuff) + continue; - for (i = 0; i < bp->rx_ring_size; i++) { - skb = bp->rx_skbuff[i]; + for (i = 0; i < bp->rx_ring_size; i++) { + skb = queue->rx_skbuff[i]; - if (!skb) - continue; + if (!skb) + continue; - desc = macb_rx_desc(bp, i); - addr = macb_get_addr(bp, desc); + desc = macb_rx_desc(queue, i); + addr = macb_get_addr(bp, desc); - dma_unmap_single(&bp->pdev->dev, addr, bp->rx_buffer_size, - DMA_FROM_DEVICE); - dev_kfree_skb_any(skb); - skb = NULL; - } + dma_unmap_single(&bp->pdev->dev, addr, bp->rx_buffer_size, + DMA_FROM_DEVICE); + dev_kfree_skb_any(skb); + skb = NULL; + } - kfree(bp->rx_skbuff); - bp->rx_skbuff = NULL; + kfree(queue->rx_skbuff); + queue->rx_skbuff = NULL; + } } static void macb_free_rx_buffers(struct macb *bp) { - if (bp->rx_buffers) { + struct macb_queue *queue = &bp->queues[0]; + + if (queue->rx_buffers) { dma_free_coherent(&bp->pdev->dev, bp->rx_ring_size * bp->rx_buffer_size, - bp->rx_buffers, bp->rx_buffers_dma); - bp->rx_buffers = NULL; + queue->rx_buffers, queue->rx_buffers_dma); + queue->rx_buffers = NULL; } } @@ -1748,11 +1768,12 @@ static void macb_free_consistent(struct macb *bp) struct macb_queue *queue; unsigned int q; + queue = &bp->queues[0]; bp->macbgem_ops.mog_free_rx_buffers(bp); - if (bp->rx_ring) { + if (queue->rx_ring) { dma_free_coherent(&bp->pdev->dev, RX_RING_BYTES(bp), - bp->rx_ring, bp->rx_ring_dma); - bp->rx_ring = NULL; + queue->rx_ring, queue->rx_ring_dma); + queue->rx_ring = NULL; } for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) { @@ -1768,32 +1789,37 @@ static void macb_free_consistent(struct macb *bp) static int gem_alloc_rx_buffers(struct macb *bp) { + struct macb_queue *queue; + unsigned int q; int size; - size = bp->rx_ring_size * sizeof(struct sk_buff *); - bp->rx_skbuff = kzalloc(size, GFP_KERNEL); - if (!bp->rx_skbuff) - return -ENOMEM; - else - netdev_dbg(bp->dev, - "Allocated %d RX struct sk_buff entries at %p\n", - bp->rx_ring_size, bp->rx_skbuff); + for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) { + size = bp->rx_ring_size * sizeof(struct sk_buff *); + queue->rx_skbuff = kzalloc(size, GFP_KERNEL); + if (!queue->rx_skbuff) + return -ENOMEM; + else + netdev_dbg(bp->dev, + "Allocated %d RX struct sk_buff entries at %p\n", + bp->rx_ring_size, queue->rx_skbuff); + } return 0; } static int macb_alloc_rx_buffers(struct macb *bp) { + struct macb_queue *queue = &bp->queues[0]; int size; size = bp->rx_ring_size * bp->rx_buffer_size; - bp->rx_buffers = dma_alloc_coherent(&bp->pdev->dev, size, - &bp->rx_buffers_dma, GFP_KERNEL); - if (!bp->rx_buffers) + queue->rx_buffers = dma_alloc_coherent(&bp->pdev->dev, size, + &queue->rx_buffers_dma, GFP_KERNEL); + if (!queue->rx_buffers) return -ENOMEM; netdev_dbg(bp->dev, "Allocated RX buffers of %d bytes at %08lx (mapped %p)\n", - size, (unsigned long)bp->rx_buffers_dma, bp->rx_buffers); + size, (unsigned long)queue->rx_buffers_dma, queue->rx_buffers); return 0; } @@ -1819,17 +1845,16 @@ static int macb_alloc_consistent(struct macb *bp) queue->tx_skb = kmalloc(size, GFP_KERNEL); if (!queue->tx_skb) goto out_err; - } - - size = RX_RING_BYTES(bp); - bp->rx_ring = dma_alloc_coherent(&bp->pdev->dev, size, - &bp->rx_ring_dma, GFP_KERNEL); - if (!bp->rx_ring) - goto out_err; - netdev_dbg(bp->dev, - "Allocated RX ring of %d bytes at %08lx (mapped %p)\n", - size, (unsigned long)bp->rx_ring_dma, bp->rx_ring); + size = RX_RING_BYTES(bp); + queue->rx_ring = dma_alloc_coherent(&bp->pdev->dev, size, + &queue->rx_ring_dma, GFP_KERNEL); + if (!queue->rx_ring) + goto out_err; + netdev_dbg(bp->dev, + "Allocated RX ring of %d bytes at %08lx (mapped %p)\n", + size, (unsigned long)queue->rx_ring_dma, queue->rx_ring); + } if (bp->macbgem_ops.mog_alloc_rx_buffers(bp)) goto out_err; @@ -1856,12 +1881,13 @@ static void gem_init_rings(struct macb *bp) desc->ctrl |= MACB_BIT(TX_WRAP); queue->tx_head = 0; queue->tx_tail = 0; - } - bp->rx_tail = 0; - bp->rx_prepared_head = 0; + queue->rx_tail = 0; + queue->rx_prepared_head = 0; + + gem_rx_refill(queue); + } - gem_rx_refill(bp); } static void macb_init_rings(struct macb *bp) @@ -1869,7 +1895,7 @@ static void macb_init_rings(struct macb *bp) int i; struct macb_dma_desc *desc = NULL; - macb_init_rx_ring(bp); + macb_init_rx_ring(&bp->queues[0]); for (i = 0; i < bp->tx_ring_size; i++) { desc = macb_tx_desc(&bp->queues[0], i); @@ -1978,11 +2004,20 @@ static u32 macb_dbw(struct macb *bp) */ static void macb_configure_dma(struct macb *bp) { + struct macb_queue *queue; + u32 buffer_size; + unsigned int q; u32 dmacfg; + buffer_size = bp->rx_buffer_size / RX_BUFFER_MULTIPLE; if (macb_is_gem(bp)) { dmacfg = gem_readl(bp, DMACFG) & ~GEM_BF(RXBS, -1L); - dmacfg |= GEM_BF(RXBS, bp->rx_buffer_size / RX_BUFFER_MULTIPLE); + for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) { + if (q) + queue_writel(queue, RBQS, buffer_size); + else + dmacfg |= GEM_BF(RXBS, buffer_size); + } if (bp->dma_burst_length) dmacfg = GEM_BFINS(FBLDO, bp->dma_burst_length, dmacfg); dmacfg |= GEM_BIT(TXPBMS) | GEM_BF(RXBMS, -1L); @@ -2051,12 +2086,12 @@ static void macb_init_hw(struct macb *bp) macb_configure_dma(bp); /* Initialize TX and RX buffers */ - macb_writel(bp, RBQP, lower_32_bits(bp->rx_ring_dma)); + for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) { + queue_writel(queue, RBQP, lower_32_bits(queue->rx_ring_dma)); #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT - if (bp->hw_dma_cap & HW_DMA_CAP_64B) - macb_writel(bp, RBQPH, upper_32_bits(bp->rx_ring_dma)); + if (bp->hw_dma_cap & HW_DMA_CAP_64B) + queue_writel(queue, RBQPH, upper_32_bits(queue->rx_ring_dma)); #endif - for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) { queue_writel(queue, TBQP, lower_32_bits(queue->tx_ring_dma)); #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT if (bp->hw_dma_cap & HW_DMA_CAP_64B) @@ -2197,6 +2232,8 @@ static int macb_open(struct net_device *dev) { struct macb *bp = netdev_priv(dev); size_t bufsz = dev->mtu + ETH_HLEN + ETH_FCS_LEN + NET_IP_ALIGN; + struct macb_queue *queue; + unsigned int q; int err; netdev_dbg(bp->dev, "open\n"); @@ -2218,11 +2255,12 @@ static int macb_open(struct net_device *dev) return err; } - napi_enable(&bp->napi); - bp->macbgem_ops.mog_init_rings(bp); macb_init_hw(bp); + for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) + napi_enable(&queue->napi); + /* schedule a link state check */ phy_start(dev->phydev); @@ -2237,10 +2275,14 @@ static int macb_open(struct net_device *dev) static int macb_close(struct net_device *dev) { struct macb *bp = netdev_priv(dev); + struct macb_queue *queue; unsigned long flags; + unsigned int q; netif_tx_stop_all_queues(dev); - napi_disable(&bp->napi); + + for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) + napi_disable(&queue->napi); if (dev->phydev) phy_stop(dev->phydev); @@ -2270,7 +2312,10 @@ static int macb_change_mtu(struct net_device *dev, int new_mtu) static void gem_update_stats(struct macb *bp) { - unsigned int i; + struct macb_queue *queue; + unsigned int i, q, idx; + unsigned long *stat; + u32 *p = &bp->hw_stats.gem.tx_octets_31_0; for (i = 0; i < GEM_STATS_LEN; ++i, ++p) { @@ -2287,6 +2332,11 @@ static void gem_update_stats(struct macb *bp) *(++p) += val; } } + + idx = GEM_STATS_LEN; + for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) + for (i = 0, stat = &queue->stats.first; i < QUEUE_STATS_LEN; ++i, ++stat) + bp->ethtool_stats[idx++] = *stat; } static struct net_device_stats *gem_get_stats(struct macb *bp) @@ -2334,14 +2384,17 @@ static void gem_get_ethtool_stats(struct net_device *dev, bp = netdev_priv(dev); gem_update_stats(bp); - memcpy(data, &bp->ethtool_stats, sizeof(u64) * GEM_STATS_LEN); + memcpy(data, &bp->ethtool_stats, sizeof(u64) + * (GEM_STATS_LEN + QUEUE_STATS_LEN * MACB_MAX_QUEUES)); } static int gem_get_sset_count(struct net_device *dev, int sset) { + struct macb *bp = netdev_priv(dev); + switch (sset) { case ETH_SS_STATS: - return GEM_STATS_LEN; + return GEM_STATS_LEN + bp->num_queues * QUEUE_STATS_LEN; default: return -EOPNOTSUPP; } @@ -2349,13 +2402,25 @@ static int gem_get_sset_count(struct net_device *dev, int sset) static void gem_get_ethtool_strings(struct net_device *dev, u32 sset, u8 *p) { + char stat_string[ETH_GSTRING_LEN]; + struct macb *bp = netdev_priv(dev); + struct macb_queue *queue; unsigned int i; + unsigned int q; switch (sset) { case ETH_SS_STATS: for (i = 0; i < GEM_STATS_LEN; i++, p += ETH_GSTRING_LEN) memcpy(p, gem_statistics[i].stat_string, ETH_GSTRING_LEN); + + for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) { + for (i = 0; i < QUEUE_STATS_LEN; i++, p += ETH_GSTRING_LEN) { + snprintf(stat_string, ETH_GSTRING_LEN, "q%d_%s", + q, queue_statistics[i].stat_string); + memcpy(p, stat_string, ETH_GSTRING_LEN); + } + } break; } } @@ -2603,6 +2668,307 @@ static int macb_get_ts_info(struct net_device *netdev, return ethtool_op_get_ts_info(netdev, info); } +static void gem_enable_flow_filters(struct macb *bp, bool enable) +{ + struct ethtool_rx_fs_item *item; + u32 t2_scr; + int num_t2_scr; + + num_t2_scr = GEM_BFEXT(T2SCR, gem_readl(bp, DCFG8)); + + list_for_each_entry(item, &bp->rx_fs_list.list, list) { + struct ethtool_rx_flow_spec *fs = &item->fs; + struct ethtool_tcpip4_spec *tp4sp_m; + + if (fs->location >= num_t2_scr) + continue; + + t2_scr = gem_readl_n(bp, SCRT2, fs->location); + + /* enable/disable screener regs for the flow entry */ + t2_scr = GEM_BFINS(ETHTEN, enable, t2_scr); + + /* only enable fields with no masking */ + tp4sp_m = &(fs->m_u.tcp_ip4_spec); + + if (enable && (tp4sp_m->ip4src == 0xFFFFFFFF)) + t2_scr = GEM_BFINS(CMPAEN, 1, t2_scr); + else + t2_scr = GEM_BFINS(CMPAEN, 0, t2_scr); + + if (enable && (tp4sp_m->ip4dst == 0xFFFFFFFF)) + t2_scr = GEM_BFINS(CMPBEN, 1, t2_scr); + else + t2_scr = GEM_BFINS(CMPBEN, 0, t2_scr); + + if (enable && ((tp4sp_m->psrc == 0xFFFF) || (tp4sp_m->pdst == 0xFFFF))) + t2_scr = GEM_BFINS(CMPCEN, 1, t2_scr); + else + t2_scr = GEM_BFINS(CMPCEN, 0, t2_scr); + + gem_writel_n(bp, SCRT2, fs->location, t2_scr); + } +} + +static void gem_prog_cmp_regs(struct macb *bp, struct ethtool_rx_flow_spec *fs) +{ + struct ethtool_tcpip4_spec *tp4sp_v, *tp4sp_m; + uint16_t index = fs->location; + u32 w0, w1, t2_scr; + bool cmp_a = false; + bool cmp_b = false; + bool cmp_c = false; + + tp4sp_v = &(fs->h_u.tcp_ip4_spec); + tp4sp_m = &(fs->m_u.tcp_ip4_spec); + + /* ignore field if any masking set */ + if (tp4sp_m->ip4src == 0xFFFFFFFF) { + /* 1st compare reg - IP source address */ + w0 = 0; + w1 = 0; + w0 = tp4sp_v->ip4src; + w1 = GEM_BFINS(T2DISMSK, 1, w1); /* 32-bit compare */ + w1 = GEM_BFINS(T2CMPOFST, GEM_T2COMPOFST_ETYPE, w1); + w1 = GEM_BFINS(T2OFST, ETYPE_SRCIP_OFFSET, w1); + gem_writel_n(bp, T2CMPW0, T2CMP_OFST(GEM_IP4SRC_CMP(index)), w0); + gem_writel_n(bp, T2CMPW1, T2CMP_OFST(GEM_IP4SRC_CMP(index)), w1); + cmp_a = true; + } + + /* ignore field if any masking set */ + if (tp4sp_m->ip4dst == 0xFFFFFFFF) { + /* 2nd compare reg - IP destination address */ + w0 = 0; + w1 = 0; + w0 = tp4sp_v->ip4dst; + w1 = GEM_BFINS(T2DISMSK, 1, w1); /* 32-bit compare */ + w1 = GEM_BFINS(T2CMPOFST, GEM_T2COMPOFST_ETYPE, w1); + w1 = GEM_BFINS(T2OFST, ETYPE_DSTIP_OFFSET, w1); + gem_writel_n(bp, T2CMPW0, T2CMP_OFST(GEM_IP4DST_CMP(index)), w0); + gem_writel_n(bp, T2CMPW1, T2CMP_OFST(GEM_IP4DST_CMP(index)), w1); + cmp_b = true; + } + + /* ignore both port fields if masking set in both */ + if ((tp4sp_m->psrc == 0xFFFF) || (tp4sp_m->pdst == 0xFFFF)) { + /* 3rd compare reg - source port, destination port */ + w0 = 0; + w1 = 0; + w1 = GEM_BFINS(T2CMPOFST, GEM_T2COMPOFST_IPHDR, w1); + if (tp4sp_m->psrc == tp4sp_m->pdst) { + w0 = GEM_BFINS(T2MASK, tp4sp_v->psrc, w0); + w0 = GEM_BFINS(T2CMP, tp4sp_v->pdst, w0); + w1 = GEM_BFINS(T2DISMSK, 1, w1); /* 32-bit compare */ + w1 = GEM_BFINS(T2OFST, IPHDR_SRCPORT_OFFSET, w1); + } else { + /* only one port definition */ + w1 = GEM_BFINS(T2DISMSK, 0, w1); /* 16-bit compare */ + w0 = GEM_BFINS(T2MASK, 0xFFFF, w0); + if (tp4sp_m->psrc == 0xFFFF) { /* src port */ + w0 = GEM_BFINS(T2CMP, tp4sp_v->psrc, w0); + w1 = GEM_BFINS(T2OFST, IPHDR_SRCPORT_OFFSET, w1); + } else { /* dst port */ + w0 = GEM_BFINS(T2CMP, tp4sp_v->pdst, w0); + w1 = GEM_BFINS(T2OFST, IPHDR_DSTPORT_OFFSET, w1); + } + } + gem_writel_n(bp, T2CMPW0, T2CMP_OFST(GEM_PORT_CMP(index)), w0); + gem_writel_n(bp, T2CMPW1, T2CMP_OFST(GEM_PORT_CMP(index)), w1); + cmp_c = true; + } + + t2_scr = 0; + t2_scr = GEM_BFINS(QUEUE, (fs->ring_cookie) & 0xFF, t2_scr); + t2_scr = GEM_BFINS(ETHT2IDX, SCRT2_ETHT, t2_scr); + if (cmp_a) + t2_scr = GEM_BFINS(CMPA, GEM_IP4SRC_CMP(index), t2_scr); + if (cmp_b) + t2_scr = GEM_BFINS(CMPB, GEM_IP4DST_CMP(index), t2_scr); + if (cmp_c) + t2_scr = GEM_BFINS(CMPC, GEM_PORT_CMP(index), t2_scr); + gem_writel_n(bp, SCRT2, index, t2_scr); +} + +static int gem_add_flow_filter(struct net_device *netdev, + struct ethtool_rxnfc *cmd) +{ + struct macb *bp = netdev_priv(netdev); + struct ethtool_rx_flow_spec *fs = &cmd->fs; + struct ethtool_rx_fs_item *item, *newfs; + unsigned long flags; + int ret = -EINVAL; + bool added = false; + + newfs = kmalloc(sizeof(*newfs), GFP_KERNEL); + if (newfs == NULL) + return -ENOMEM; + memcpy(&newfs->fs, fs, sizeof(newfs->fs)); + + netdev_dbg(netdev, + "Adding flow filter entry,type=%u,queue=%u,loc=%u,src=%08X,dst=%08X,ps=%u,pd=%u\n", + fs->flow_type, (int)fs->ring_cookie, fs->location, + htonl(fs->h_u.tcp_ip4_spec.ip4src), + htonl(fs->h_u.tcp_ip4_spec.ip4dst), + htons(fs->h_u.tcp_ip4_spec.psrc), htons(fs->h_u.tcp_ip4_spec.pdst)); + + spin_lock_irqsave(&bp->rx_fs_lock, flags); + + /* find correct place to add in list */ + list_for_each_entry(item, &bp->rx_fs_list.list, list) { + if (item->fs.location > newfs->fs.location) { + list_add_tail(&newfs->list, &item->list); + added = true; + break; + } else if (item->fs.location == fs->location) { + netdev_err(netdev, "Rule not added: location %d not free!\n", + fs->location); + ret = -EBUSY; + goto err; + } + } + if (!added) + list_add_tail(&newfs->list, &bp->rx_fs_list.list); + + gem_prog_cmp_regs(bp, fs); + bp->rx_fs_list.count++; + /* enable filtering if NTUPLE on */ + if (netdev->features & NETIF_F_NTUPLE) + gem_enable_flow_filters(bp, 1); + + spin_unlock_irqrestore(&bp->rx_fs_lock, flags); + return 0; + +err: + spin_unlock_irqrestore(&bp->rx_fs_lock, flags); + kfree(newfs); + return ret; +} + +static int gem_del_flow_filter(struct net_device *netdev, + struct ethtool_rxnfc *cmd) +{ + struct macb *bp = netdev_priv(netdev); + struct ethtool_rx_fs_item *item; + struct ethtool_rx_flow_spec *fs; + unsigned long flags; + + spin_lock_irqsave(&bp->rx_fs_lock, flags); + + list_for_each_entry(item, &bp->rx_fs_list.list, list) { + if (item->fs.location == cmd->fs.location) { + /* disable screener regs for the flow entry */ + fs = &(item->fs); + netdev_dbg(netdev, + "Deleting flow filter entry,type=%u,queue=%u,loc=%u,src=%08X,dst=%08X,ps=%u,pd=%u\n", + fs->flow_type, (int)fs->ring_cookie, fs->location, + htonl(fs->h_u.tcp_ip4_spec.ip4src), + htonl(fs->h_u.tcp_ip4_spec.ip4dst), + htons(fs->h_u.tcp_ip4_spec.psrc), + htons(fs->h_u.tcp_ip4_spec.pdst)); + + gem_writel_n(bp, SCRT2, fs->location, 0); + + list_del(&item->list); + bp->rx_fs_list.count--; + spin_unlock_irqrestore(&bp->rx_fs_lock, flags); + kfree(item); + return 0; + } + } + + spin_unlock_irqrestore(&bp->rx_fs_lock, flags); + return -EINVAL; +} + +static int gem_get_flow_entry(struct net_device *netdev, + struct ethtool_rxnfc *cmd) +{ + struct macb *bp = netdev_priv(netdev); + struct ethtool_rx_fs_item *item; + + list_for_each_entry(item, &bp->rx_fs_list.list, list) { + if (item->fs.location == cmd->fs.location) { + memcpy(&cmd->fs, &item->fs, sizeof(cmd->fs)); + return 0; + } + } + return -EINVAL; +} + +static int gem_get_all_flow_entries(struct net_device *netdev, + struct ethtool_rxnfc *cmd, u32 *rule_locs) +{ + struct macb *bp = netdev_priv(netdev); + struct ethtool_rx_fs_item *item; + uint32_t cnt = 0; + + list_for_each_entry(item, &bp->rx_fs_list.list, list) { + if (cnt == cmd->rule_cnt) + return -EMSGSIZE; + rule_locs[cnt] = item->fs.location; + cnt++; + } + cmd->data = bp->max_tuples; + cmd->rule_cnt = cnt; + + return 0; +} + +static int gem_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd, + u32 *rule_locs) +{ + struct macb *bp = netdev_priv(netdev); + int ret = 0; + + switch (cmd->cmd) { + case ETHTOOL_GRXRINGS: + cmd->data = bp->num_queues; + break; + case ETHTOOL_GRXCLSRLCNT: + cmd->rule_cnt = bp->rx_fs_list.count; + break; + case ETHTOOL_GRXCLSRULE: + ret = gem_get_flow_entry(netdev, cmd); + break; + case ETHTOOL_GRXCLSRLALL: + ret = gem_get_all_flow_entries(netdev, cmd, rule_locs); + break; + default: + netdev_err(netdev, + "Command parameter %d is not supported\n", cmd->cmd); + ret = -EOPNOTSUPP; + } + + return ret; +} + +static int gem_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd) +{ + struct macb *bp = netdev_priv(netdev); + int ret; + + switch (cmd->cmd) { + case ETHTOOL_SRXCLSRLINS: + if ((cmd->fs.location >= bp->max_tuples) + || (cmd->fs.ring_cookie >= bp->num_queues)) { + ret = -EINVAL; + break; + } + ret = gem_add_flow_filter(netdev, cmd); + break; + case ETHTOOL_SRXCLSRLDEL: + ret = gem_del_flow_filter(netdev, cmd); + break; + default: + netdev_err(netdev, + "Command parameter %d is not supported\n", cmd->cmd); + ret = -EOPNOTSUPP; + } + + return ret; +} + static const struct ethtool_ops macb_ethtool_ops = { .get_regs_len = macb_get_regs_len, .get_regs = macb_get_regs, @@ -2628,6 +2994,8 @@ static const struct ethtool_ops gem_ethtool_ops = { .set_link_ksettings = phy_ethtool_set_link_ksettings, .get_ringparam = macb_get_ringparam, .set_ringparam = macb_set_ringparam, + .get_rxnfc = gem_get_rxnfc, + .set_rxnfc = gem_set_rxnfc, }; static int macb_ioctl(struct net_device *dev, struct ifreq *rq, int cmd) @@ -2685,6 +3053,12 @@ static int macb_set_features(struct net_device *netdev, gem_writel(bp, NCFGR, netcfg); } + /* RX Flow Filters */ + if ((changed & NETIF_F_NTUPLE) && macb_is_gem(bp)) { + bool turn_on = features & NETIF_F_NTUPLE; + + gem_enable_flow_filters(bp, turn_on); + } return 0; } @@ -2850,7 +3224,7 @@ static int macb_init(struct platform_device *pdev) struct macb *bp = netdev_priv(dev); struct macb_queue *queue; int err; - u32 val; + u32 val, reg; bp->tx_ring_size = DEFAULT_TX_RING_SIZE; bp->rx_ring_size = DEFAULT_RX_RING_SIZE; @@ -2865,15 +3239,20 @@ static int macb_init(struct platform_device *pdev) queue = &bp->queues[q]; queue->bp = bp; + netif_napi_add(dev, &queue->napi, macb_poll, 64); if (hw_q) { queue->ISR = GEM_ISR(hw_q - 1); queue->IER = GEM_IER(hw_q - 1); queue->IDR = GEM_IDR(hw_q - 1); queue->IMR = GEM_IMR(hw_q - 1); queue->TBQP = GEM_TBQP(hw_q - 1); + queue->RBQP = GEM_RBQP(hw_q - 1); + queue->RBQS = GEM_RBQS(hw_q - 1); #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT - if (bp->hw_dma_cap & HW_DMA_CAP_64B) + if (bp->hw_dma_cap & HW_DMA_CAP_64B) { queue->TBQPH = GEM_TBQPH(hw_q - 1); + queue->RBQPH = GEM_RBQPH(hw_q - 1); + } #endif } else { /* queue0 uses legacy registers */ @@ -2882,9 +3261,12 @@ static int macb_init(struct platform_device *pdev) queue->IDR = MACB_IDR; queue->IMR = MACB_IMR; queue->TBQP = MACB_TBQP; + queue->RBQP = MACB_RBQP; #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT - if (bp->hw_dma_cap & HW_DMA_CAP_64B) + if (bp->hw_dma_cap & HW_DMA_CAP_64B) { queue->TBQPH = MACB_TBQPH; + queue->RBQPH = MACB_RBQPH; + } #endif } @@ -2908,7 +3290,6 @@ static int macb_init(struct platform_device *pdev) } dev->netdev_ops = &macb_netdev_ops; - netif_napi_add(dev, &bp->napi, macb_poll, 64); /* setup appropriated routines according to adapter type */ if (macb_is_gem(bp)) { @@ -2941,6 +3322,30 @@ static int macb_init(struct platform_device *pdev) dev->hw_features &= ~NETIF_F_SG; dev->features = dev->hw_features; + /* Check RX Flow Filters support. + * Max Rx flows set by availability of screeners & compare regs: + * each 4-tuple define requires 1 T2 screener reg + 3 compare regs + */ + reg = gem_readl(bp, DCFG8); + bp->max_tuples = min((GEM_BFEXT(SCR2CMP, reg) / 3), + GEM_BFEXT(T2SCR, reg)); + if (bp->max_tuples > 0) { + /* also needs one ethtype match to check IPv4 */ + if (GEM_BFEXT(SCR2ETH, reg) > 0) { + /* program this reg now */ + reg = 0; + reg = GEM_BFINS(ETHTCMP, (uint16_t)ETH_P_IP, reg); + gem_writel_n(bp, ETHT, SCRT2_ETHT, reg); + /* Filtering is supported in hw but don't enable it in kernel now */ + dev->hw_features |= NETIF_F_NTUPLE; + /* init Rx flow definitions */ + INIT_LIST_HEAD(&bp->rx_fs_list.list); + bp->rx_fs_list.count = 0; + spin_lock_init(&bp->rx_fs_lock); + } else + bp->max_tuples = 0; + } + if (!(bp->caps & MACB_CAPS_USRIO_DISABLED)) { val = 0; if (bp->phy_interface == PHY_INTERFACE_MODE_RGMII) @@ -2977,34 +3382,35 @@ static int macb_init(struct platform_device *pdev) static int at91ether_start(struct net_device *dev) { struct macb *lp = netdev_priv(dev); + struct macb_queue *q = &lp->queues[0]; struct macb_dma_desc *desc; dma_addr_t addr; u32 ctl; int i; - lp->rx_ring = dma_alloc_coherent(&lp->pdev->dev, + q->rx_ring = dma_alloc_coherent(&lp->pdev->dev, (AT91ETHER_MAX_RX_DESCR * macb_dma_desc_get_size(lp)), - &lp->rx_ring_dma, GFP_KERNEL); - if (!lp->rx_ring) + &q->rx_ring_dma, GFP_KERNEL); + if (!q->rx_ring) return -ENOMEM; - lp->rx_buffers = dma_alloc_coherent(&lp->pdev->dev, + q->rx_buffers = dma_alloc_coherent(&lp->pdev->dev, AT91ETHER_MAX_RX_DESCR * AT91ETHER_MAX_RBUFF_SZ, - &lp->rx_buffers_dma, GFP_KERNEL); - if (!lp->rx_buffers) { + &q->rx_buffers_dma, GFP_KERNEL); + if (!q->rx_buffers) { dma_free_coherent(&lp->pdev->dev, AT91ETHER_MAX_RX_DESCR * macb_dma_desc_get_size(lp), - lp->rx_ring, lp->rx_ring_dma); - lp->rx_ring = NULL; + q->rx_ring, q->rx_ring_dma); + q->rx_ring = NULL; return -ENOMEM; } - addr = lp->rx_buffers_dma; + addr = q->rx_buffers_dma; for (i = 0; i < AT91ETHER_MAX_RX_DESCR; i++) { - desc = macb_rx_desc(lp, i); + desc = macb_rx_desc(q, i); macb_set_addr(lp, desc, addr); desc->ctrl = 0; addr += AT91ETHER_MAX_RBUFF_SZ; @@ -3014,10 +3420,10 @@ static int at91ether_start(struct net_device *dev) desc->addr |= MACB_BIT(RX_WRAP); /* Reset buffer index */ - lp->rx_tail = 0; + q->rx_tail = 0; /* Program address of descriptor list in Rx Buffer Queue register */ - macb_writel(lp, RBQP, lp->rx_ring_dma); + macb_writel(lp, RBQP, q->rx_ring_dma); /* Enable Receive and Transmit */ ctl = macb_readl(lp, NCR); @@ -3064,6 +3470,7 @@ static int at91ether_open(struct net_device *dev) static int at91ether_close(struct net_device *dev) { struct macb *lp = netdev_priv(dev); + struct macb_queue *q = &lp->queues[0]; u32 ctl; /* Disable Receiver and Transmitter */ @@ -3084,13 +3491,13 @@ static int at91ether_close(struct net_device *dev) dma_free_coherent(&lp->pdev->dev, AT91ETHER_MAX_RX_DESCR * macb_dma_desc_get_size(lp), - lp->rx_ring, lp->rx_ring_dma); - lp->rx_ring = NULL; + q->rx_ring, q->rx_ring_dma); + q->rx_ring = NULL; dma_free_coherent(&lp->pdev->dev, AT91ETHER_MAX_RX_DESCR * AT91ETHER_MAX_RBUFF_SZ, - lp->rx_buffers, lp->rx_buffers_dma); - lp->rx_buffers = NULL; + q->rx_buffers, q->rx_buffers_dma); + q->rx_buffers = NULL; return 0; } @@ -3134,14 +3541,15 @@ static int at91ether_start_xmit(struct sk_buff *skb, struct net_device *dev) static void at91ether_rx(struct net_device *dev) { struct macb *lp = netdev_priv(dev); + struct macb_queue *q = &lp->queues[0]; struct macb_dma_desc *desc; unsigned char *p_recv; struct sk_buff *skb; unsigned int pktlen; - desc = macb_rx_desc(lp, lp->rx_tail); + desc = macb_rx_desc(q, q->rx_tail); while (desc->addr & MACB_BIT(RX_USED)) { - p_recv = lp->rx_buffers + lp->rx_tail * AT91ETHER_MAX_RBUFF_SZ; + p_recv = q->rx_buffers + q->rx_tail * AT91ETHER_MAX_RBUFF_SZ; pktlen = MACB_BF(RX_FRMLEN, desc->ctrl); skb = netdev_alloc_skb(dev, pktlen + 2); if (skb) { @@ -3163,12 +3571,12 @@ static void at91ether_rx(struct net_device *dev) desc->addr &= ~MACB_BIT(RX_USED); /* wrap after last buffer */ - if (lp->rx_tail == AT91ETHER_MAX_RX_DESCR - 1) - lp->rx_tail = 0; + if (q->rx_tail == AT91ETHER_MAX_RX_DESCR - 1) + q->rx_tail = 0; else - lp->rx_tail++; + q->rx_tail++; - desc = macb_rx_desc(lp, lp->rx_tail); + desc = macb_rx_desc(q, q->rx_tail); } } @@ -3394,7 +3802,6 @@ static int macb_probe(struct platform_device *pdev) = macb_config->clk_init; int (*init)(struct platform_device *) = macb_config->init; struct device_node *np = pdev->dev.of_node; - struct device_node *phy_node; struct clk *pclk, *hclk = NULL, *tx_clk = NULL, *rx_clk = NULL; unsigned int queue_mask, num_queues; struct macb_platform_data *pdata; @@ -3500,18 +3907,6 @@ static int macb_probe(struct platform_device *pdev) else macb_get_hwaddr(bp); - /* Power up the PHY if there is a GPIO reset */ - phy_node = of_get_next_available_child(np, NULL); - if (phy_node) { - int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0); - - if (gpio_is_valid(gpio)) { - bp->reset_gpio = gpio_to_desc(gpio); - gpiod_direction_output(bp->reset_gpio, 1); - } - } - of_node_put(phy_node); - err = of_get_phy_mode(np); if (err < 0) { pdata = dev_get_platdata(&pdev->dev); @@ -3558,10 +3953,6 @@ err_out_unregister_mdio: of_phy_deregister_fixed_link(np); mdiobus_free(bp->mii_bus); - /* Shutdown the PHY if there is a GPIO reset */ - if (bp->reset_gpio) - gpiod_set_value(bp->reset_gpio, 0); - err_out_free_netdev: free_netdev(dev); @@ -3592,10 +3983,6 @@ static int macb_remove(struct platform_device *pdev) dev->phydev = NULL; mdiobus_free(bp->mii_bus); - /* Shutdown the PHY if there is a GPIO reset */ - if (bp->reset_gpio) - gpiod_set_value(bp->reset_gpio, 0); - unregister_netdev(dev); clk_disable_unprepare(bp->tx_clk); clk_disable_unprepare(bp->hclk); diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index a063c36c4c58..52b3a6044f85 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -65,6 +65,11 @@ module_param(cpi_alg, int, S_IRUGO); MODULE_PARM_DESC(cpi_alg, "PFC algorithm (0=none, 1=VLAN, 2=VLAN16, 3=IP Diffserv)"); +struct nicvf_xdp_tx { + u64 dma_addr; + u8 qidx; +}; + static inline u8 nicvf_netdev_qidx(struct nicvf *nic, u8 qidx) { if (nic->sqs_mode) @@ -500,14 +505,29 @@ static int nicvf_init_resources(struct nicvf *nic) return 0; } +static void nicvf_unmap_page(struct nicvf *nic, struct page *page, u64 dma_addr) +{ + /* Check if it's a recycled page, if not unmap the DMA mapping. + * Recycled page holds an extra reference. + */ + if (page_ref_count(page) == 1) { + dma_addr &= PAGE_MASK; + dma_unmap_page_attrs(&nic->pdev->dev, dma_addr, + RCV_FRAG_LEN + XDP_HEADROOM, + DMA_FROM_DEVICE, + DMA_ATTR_SKIP_CPU_SYNC); + } +} + static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog, struct cqe_rx_t *cqe_rx, struct snd_queue *sq, struct sk_buff **skb) { struct xdp_buff xdp; struct page *page; + struct nicvf_xdp_tx *xdp_tx = NULL; u32 action; - u16 len, offset = 0; + u16 len, err, offset = 0; u64 dma_addr, cpu_addr; void *orig_data; @@ -521,7 +541,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog, cpu_addr = (u64)phys_to_virt(cpu_addr); page = virt_to_page((void *)cpu_addr); - xdp.data_hard_start = page_address(page); + xdp.data_hard_start = page_address(page) + RCV_BUF_HEADROOM; xdp.data = (void *)cpu_addr; xdp_set_data_meta_invalid(&xdp); xdp.data_end = xdp.data + len; @@ -540,18 +560,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog, switch (action) { case XDP_PASS: - /* Check if it's a recycled page, if not - * unmap the DMA mapping. - * - * Recycled page holds an extra reference. - */ - if (page_ref_count(page) == 1) { - dma_addr &= PAGE_MASK; - dma_unmap_page_attrs(&nic->pdev->dev, dma_addr, - RCV_FRAG_LEN + XDP_PACKET_HEADROOM, - DMA_FROM_DEVICE, - DMA_ATTR_SKIP_CPU_SYNC); - } + nicvf_unmap_page(nic, page, dma_addr); /* Build SKB and pass on packet to network stack */ *skb = build_skb(xdp.data, @@ -564,6 +573,20 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog, case XDP_TX: nicvf_xdp_sq_append_pkt(nic, sq, (u64)xdp.data, dma_addr, len); return true; + case XDP_REDIRECT: + /* Save DMA address for use while transmitting */ + xdp_tx = (struct nicvf_xdp_tx *)page_address(page); + xdp_tx->dma_addr = dma_addr; + xdp_tx->qidx = nicvf_netdev_qidx(nic, cqe_rx->rq_idx); + + err = xdp_do_redirect(nic->pnicvf->netdev, &xdp, prog); + if (!err) + return true; + + /* Free the page on error */ + nicvf_unmap_page(nic, page, dma_addr); + put_page(page); + break; default: bpf_warn_invalid_xdp_action(action); /* fall through */ @@ -571,18 +594,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog, trace_xdp_exception(nic->netdev, prog, action); /* fall through */ case XDP_DROP: - /* Check if it's a recycled page, if not - * unmap the DMA mapping. - * - * Recycled page holds an extra reference. - */ - if (page_ref_count(page) == 1) { - dma_addr &= PAGE_MASK; - dma_unmap_page_attrs(&nic->pdev->dev, dma_addr, - RCV_FRAG_LEN + XDP_PACKET_HEADROOM, - DMA_FROM_DEVICE, - DMA_ATTR_SKIP_CPU_SYNC); - } + nicvf_unmap_page(nic, page, dma_addr); put_page(page); return true; } @@ -1764,6 +1776,50 @@ static int nicvf_xdp(struct net_device *netdev, struct netdev_bpf *xdp) } } +static int nicvf_xdp_xmit(struct net_device *netdev, struct xdp_buff *xdp) +{ + struct nicvf *nic = netdev_priv(netdev); + struct nicvf *snic = nic; + struct nicvf_xdp_tx *xdp_tx; + struct snd_queue *sq; + struct page *page; + int err, qidx; + + if (!netif_running(netdev) || !nic->xdp_prog) + return -EINVAL; + + page = virt_to_page(xdp->data); + xdp_tx = (struct nicvf_xdp_tx *)page_address(page); + qidx = xdp_tx->qidx; + + if (xdp_tx->qidx >= nic->xdp_tx_queues) + return -EINVAL; + + /* Get secondary Qset's info */ + if (xdp_tx->qidx >= MAX_SND_QUEUES_PER_QS) { + qidx = xdp_tx->qidx / MAX_SND_QUEUES_PER_QS; + snic = (struct nicvf *)nic->snicvf[qidx - 1]; + if (!snic) + return -EINVAL; + qidx = xdp_tx->qidx % MAX_SND_QUEUES_PER_QS; + } + + sq = &snic->qs->sq[qidx]; + err = nicvf_xdp_sq_append_pkt(snic, sq, (u64)xdp->data, + xdp_tx->dma_addr, + xdp->data_end - xdp->data); + if (err) + return -ENOMEM; + + nicvf_xdp_sq_doorbell(snic, sq, qidx); + return 0; +} + +static void nicvf_xdp_flush(struct net_device *dev) +{ + return; +} + static const struct net_device_ops nicvf_netdev_ops = { .ndo_open = nicvf_open, .ndo_stop = nicvf_stop, @@ -1775,6 +1831,8 @@ static const struct net_device_ops nicvf_netdev_ops = { .ndo_fix_features = nicvf_fix_features, .ndo_set_features = nicvf_set_features, .ndo_bpf = nicvf_xdp, + .ndo_xdp_xmit = nicvf_xdp_xmit, + .ndo_xdp_flush = nicvf_xdp_flush, }; static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) @@ -1833,6 +1891,11 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) nic->pdev = pdev; nic->pnicvf = nic; nic->max_queues = qcount; + /* If no of CPUs are too low, there won't be any queues left + * for XDP_TX, hence double it. + */ + if (!nic->t88) + nic->max_queues *= 2; /* MAP VF's configuration registers */ nic->reg_base = pcim_iomap(pdev, PCI_CFG_REG_BAR_NUM, 0); diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c index a3d12dbde95b..f38ea349aa00 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c @@ -204,7 +204,7 @@ static inline int nicvf_alloc_rcv_buffer(struct nicvf *nic, struct rbdr *rbdr, /* Reserve space for header modifications by BPF program */ if (rbdr->is_xdp) - buf_len += XDP_PACKET_HEADROOM; + buf_len += XDP_HEADROOM; /* Check if it's recycled */ if (pgcache) @@ -224,8 +224,9 @@ ret: nic->rb_page = NULL; return -ENOMEM; } + if (pgcache) - pgcache->dma_addr = *rbuf + XDP_PACKET_HEADROOM; + pgcache->dma_addr = *rbuf + XDP_HEADROOM; nic->rb_page_offset += buf_len; } @@ -1236,7 +1237,7 @@ int nicvf_xdp_sq_append_pkt(struct nicvf *nic, struct snd_queue *sq, int qentry; if (subdesc_cnt > sq->xdp_free_cnt) - return 0; + return -1; qentry = nicvf_get_sq_desc(sq, subdesc_cnt); @@ -1247,7 +1248,7 @@ int nicvf_xdp_sq_append_pkt(struct nicvf *nic, struct snd_queue *sq, sq->xdp_desc_cnt += subdesc_cnt; - return 1; + return 0; } /* Calculate no of SQ subdescriptors needed to transmit all @@ -1625,7 +1626,7 @@ static void nicvf_unmap_rcv_buffer(struct nicvf *nic, u64 dma_addr, if (page_ref_count(page) != 1) return; - len += XDP_PACKET_HEADROOM; + len += XDP_HEADROOM; /* Receive buffers in XDP mode are mapped from page start */ dma_addr &= PAGE_MASK; } diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h index 67d1a3230773..178ab6e8e3c5 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h +++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h @@ -11,6 +11,7 @@ #include <linux/netdevice.h> #include <linux/iommu.h> +#include <linux/bpf.h> #include "q_struct.h" #define MAX_QUEUE_SET 128 @@ -92,6 +93,9 @@ #define RCV_FRAG_LEN (SKB_DATA_ALIGN(DMA_BUFFER_LEN + NET_SKB_PAD) + \ SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) +#define RCV_BUF_HEADROOM 128 /* To store dma address for XDP redirect */ +#define XDP_HEADROOM (XDP_PACKET_HEADROOM + RCV_BUF_HEADROOM) + #define MAX_CQES_FOR_TX ((SND_QUEUE_LEN / MIN_SQ_DESC_PER_PKT_XMIT) * \ MAX_CQE_PER_PKT_XMIT) diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h index 605689957496..2e71e334d819 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h @@ -18,17 +18,15 @@ #ifndef __CUDBG_ENTITY_H__ #define __CUDBG_ENTITY_H__ -#define EDC0_FLAG 3 -#define EDC1_FLAG 4 +#define EDC0_FLAG 0 +#define EDC1_FLAG 1 +#define MC_FLAG 2 +#define MC0_FLAG 3 +#define MC1_FLAG 4 +#define HMA_FLAG 5 #define CUDBG_ENTITY_SIGNATURE 0xCCEDB001 -struct card_mem { - u16 size_edc0; - u16 size_edc1; - u16 mem_flag; -}; - struct cudbg_mbox_log { struct mbox_cmd entry; u32 hi[MBOX_LEN / 8]; @@ -87,6 +85,48 @@ struct cudbg_tp_la { u8 data[0]; }; +static const char * const cudbg_region[] = { + "DBQ contexts:", "IMSG contexts:", "FLM cache:", "TCBs:", + "Pstructs:", "Timers:", "Rx FL:", "Tx FL:", "Pstruct FL:", + "Tx payload:", "Rx payload:", "LE hash:", "iSCSI region:", + "TDDP region:", "TPT region:", "STAG region:", "RQ region:", + "RQUDP region:", "PBL region:", "TXPBL region:", + "DBVFIFO region:", "ULPRX state:", "ULPTX state:", + "On-chip queues:" +}; + +/* Memory region info relative to current memory (i.e. wrt 0). */ +struct cudbg_region_info { + bool exist; /* Does region exists in current memory? */ + u32 start; /* Start wrt 0 */ + u32 end; /* End wrt 0 */ +}; + +struct cudbg_mem_desc { + u32 base; + u32 limit; + u32 idx; +}; + +struct cudbg_meminfo { + struct cudbg_mem_desc avail[4]; + struct cudbg_mem_desc mem[ARRAY_SIZE(cudbg_region) + 3]; + u32 avail_c; + u32 mem_c; + u32 up_ram_lo; + u32 up_ram_hi; + u32 up_extmem2_lo; + u32 up_extmem2_hi; + u32 rx_pages_data[3]; + u32 tx_pages_data[4]; + u32 p_structs; + u32 reserved[12]; + u32 port_used[4]; + u32 port_alloc[4]; + u32 loopback_used[NCHAN]; + u32 loopback_alloc[NCHAN]; +}; + struct cudbg_cim_pif_la { int size; u8 data[0]; @@ -145,6 +185,7 @@ struct cudbg_tid_info_region_rev1 { u32 reserved[16]; }; +#define CUDBG_LOWMEM_MAX_CTXT_QIDS 256 #define CUDBG_MAX_FL_QIDS 1024 struct cudbg_ch_cntxt { @@ -334,6 +375,25 @@ static const u32 t5_pm_tx_array[][IREG_NUM_ELEM] = { {0x8FF0, 0x8FF4, 0x10021, 0x1D}, /* t5_pm_tx_regs_10021_to_1003c */ }; +#define CUDBG_NUM_PCIE_CONFIG_REGS 0x61 + +static const u32 t5_pcie_config_array[][2] = { + {0x0, 0x34}, + {0x3c, 0x40}, + {0x50, 0x64}, + {0x70, 0x80}, + {0x94, 0xa0}, + {0xb0, 0xb8}, + {0xd0, 0xd4}, + {0x100, 0x128}, + {0x140, 0x148}, + {0x150, 0x164}, + {0x170, 0x178}, + {0x180, 0x194}, + {0x1a0, 0x1b8}, + {0x1c0, 0x208}, +}; + static const u32 t6_ma_ireg_array[][IREG_NUM_ELEM] = { {0x78f8, 0x78fc, 0xa000, 23}, /* t6_ma_regs_a000_to_a016 */ {0x78f8, 0x78fc, 0xa400, 30}, /* t6_ma_regs_a400_to_a41e */ diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h index e10ff1ee62c5..e8173ae32158 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h @@ -47,6 +47,8 @@ enum cudbg_dbg_entity_type { CUDBG_CIM_OBQ_NCSI = 17, CUDBG_EDC0 = 18, CUDBG_EDC1 = 19, + CUDBG_MC0 = 20, + CUDBG_MC1 = 21, CUDBG_RSS = 22, CUDBG_RSS_VF_CONF = 25, CUDBG_PATH_MTU = 27, @@ -56,6 +58,7 @@ enum cudbg_dbg_entity_type { CUDBG_SGE_INDIRECT = 37, CUDBG_ULPRX_LA = 41, CUDBG_TP_LA = 43, + CUDBG_MEMINFO = 44, CUDBG_CIM_PIF_LA = 45, CUDBG_CLK = 46, CUDBG_CIM_OBQ_RXQ0 = 47, @@ -63,6 +66,7 @@ enum cudbg_dbg_entity_type { CUDBG_PCIE_INDIRECT = 50, CUDBG_PM_INDIRECT = 51, CUDBG_TID_INFO = 54, + CUDBG_PCIE_CONFIG = 55, CUDBG_DUMP_CONTEXT = 56, CUDBG_MPS_TCAM = 57, CUDBG_VPD_DATA = 58, @@ -74,6 +78,7 @@ enum cudbg_dbg_entity_type { CUDBG_PBT_TABLE = 65, CUDBG_MBOX_LOG = 66, CUDBG_HMA_INDIRECT = 67, + CUDBG_HMA = 68, CUDBG_MAX_ENTITY = 70, }; diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c index d699bf88d18f..d73fb6a85f8e 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c @@ -15,12 +15,14 @@ * */ +#include <linux/sort.h> + #include "t4_regs.h" #include "cxgb4.h" #include "cudbg_if.h" #include "cudbg_lib_common.h" -#include "cudbg_lib.h" #include "cudbg_entity.h" +#include "cudbg_lib.h" static void cudbg_write_and_release_buff(struct cudbg_buffer *pin_buff, struct cudbg_buffer *dbg_buff) @@ -84,6 +86,277 @@ static int cudbg_read_vpd_reg(struct adapter *padap, u32 addr, u32 len, return 0; } +static int cudbg_mem_desc_cmp(const void *a, const void *b) +{ + return ((const struct cudbg_mem_desc *)a)->base - + ((const struct cudbg_mem_desc *)b)->base; +} + +int cudbg_fill_meminfo(struct adapter *padap, + struct cudbg_meminfo *meminfo_buff) +{ + struct cudbg_mem_desc *md; + u32 lo, hi, used, alloc; + int n, i; + + memset(meminfo_buff->avail, 0, + ARRAY_SIZE(meminfo_buff->avail) * + sizeof(struct cudbg_mem_desc)); + memset(meminfo_buff->mem, 0, + (ARRAY_SIZE(cudbg_region) + 3) * sizeof(struct cudbg_mem_desc)); + md = meminfo_buff->mem; + + for (i = 0; i < ARRAY_SIZE(meminfo_buff->mem); i++) { + meminfo_buff->mem[i].limit = 0; + meminfo_buff->mem[i].idx = i; + } + + /* Find and sort the populated memory ranges */ + i = 0; + lo = t4_read_reg(padap, MA_TARGET_MEM_ENABLE_A); + if (lo & EDRAM0_ENABLE_F) { + hi = t4_read_reg(padap, MA_EDRAM0_BAR_A); + meminfo_buff->avail[i].base = + cudbg_mbytes_to_bytes(EDRAM0_BASE_G(hi)); + meminfo_buff->avail[i].limit = + meminfo_buff->avail[i].base + + cudbg_mbytes_to_bytes(EDRAM0_SIZE_G(hi)); + meminfo_buff->avail[i].idx = 0; + i++; + } + + if (lo & EDRAM1_ENABLE_F) { + hi = t4_read_reg(padap, MA_EDRAM1_BAR_A); + meminfo_buff->avail[i].base = + cudbg_mbytes_to_bytes(EDRAM1_BASE_G(hi)); + meminfo_buff->avail[i].limit = + meminfo_buff->avail[i].base + + cudbg_mbytes_to_bytes(EDRAM1_SIZE_G(hi)); + meminfo_buff->avail[i].idx = 1; + i++; + } + + if (is_t5(padap->params.chip)) { + if (lo & EXT_MEM0_ENABLE_F) { + hi = t4_read_reg(padap, MA_EXT_MEMORY0_BAR_A); + meminfo_buff->avail[i].base = + cudbg_mbytes_to_bytes(EXT_MEM_BASE_G(hi)); + meminfo_buff->avail[i].limit = + meminfo_buff->avail[i].base + + cudbg_mbytes_to_bytes(EXT_MEM_SIZE_G(hi)); + meminfo_buff->avail[i].idx = 3; + i++; + } + + if (lo & EXT_MEM1_ENABLE_F) { + hi = t4_read_reg(padap, MA_EXT_MEMORY1_BAR_A); + meminfo_buff->avail[i].base = + cudbg_mbytes_to_bytes(EXT_MEM1_BASE_G(hi)); + meminfo_buff->avail[i].limit = + meminfo_buff->avail[i].base + + cudbg_mbytes_to_bytes(EXT_MEM1_SIZE_G(hi)); + meminfo_buff->avail[i].idx = 4; + i++; + } + } else { + if (lo & EXT_MEM_ENABLE_F) { + hi = t4_read_reg(padap, MA_EXT_MEMORY_BAR_A); + meminfo_buff->avail[i].base = + cudbg_mbytes_to_bytes(EXT_MEM_BASE_G(hi)); + meminfo_buff->avail[i].limit = + meminfo_buff->avail[i].base + + cudbg_mbytes_to_bytes(EXT_MEM_SIZE_G(hi)); + meminfo_buff->avail[i].idx = 2; + i++; + } + + if (lo & HMA_MUX_F) { + hi = t4_read_reg(padap, MA_EXT_MEMORY1_BAR_A); + meminfo_buff->avail[i].base = + cudbg_mbytes_to_bytes(EXT_MEM1_BASE_G(hi)); + meminfo_buff->avail[i].limit = + meminfo_buff->avail[i].base + + cudbg_mbytes_to_bytes(EXT_MEM1_SIZE_G(hi)); + meminfo_buff->avail[i].idx = 5; + i++; + } + } + + if (!i) /* no memory available */ + return CUDBG_STATUS_ENTITY_NOT_FOUND; + + meminfo_buff->avail_c = i; + sort(meminfo_buff->avail, i, sizeof(struct cudbg_mem_desc), + cudbg_mem_desc_cmp, NULL); + (md++)->base = t4_read_reg(padap, SGE_DBQ_CTXT_BADDR_A); + (md++)->base = t4_read_reg(padap, SGE_IMSG_CTXT_BADDR_A); + (md++)->base = t4_read_reg(padap, SGE_FLM_CACHE_BADDR_A); + (md++)->base = t4_read_reg(padap, TP_CMM_TCB_BASE_A); + (md++)->base = t4_read_reg(padap, TP_CMM_MM_BASE_A); + (md++)->base = t4_read_reg(padap, TP_CMM_TIMER_BASE_A); + (md++)->base = t4_read_reg(padap, TP_CMM_MM_RX_FLST_BASE_A); + (md++)->base = t4_read_reg(padap, TP_CMM_MM_TX_FLST_BASE_A); + (md++)->base = t4_read_reg(padap, TP_CMM_MM_PS_FLST_BASE_A); + + /* the next few have explicit upper bounds */ + md->base = t4_read_reg(padap, TP_PMM_TX_BASE_A); + md->limit = md->base - 1 + + t4_read_reg(padap, TP_PMM_TX_PAGE_SIZE_A) * + PMTXMAXPAGE_G(t4_read_reg(padap, TP_PMM_TX_MAX_PAGE_A)); + md++; + + md->base = t4_read_reg(padap, TP_PMM_RX_BASE_A); + md->limit = md->base - 1 + + t4_read_reg(padap, TP_PMM_RX_PAGE_SIZE_A) * + PMRXMAXPAGE_G(t4_read_reg(padap, TP_PMM_RX_MAX_PAGE_A)); + md++; + + if (t4_read_reg(padap, LE_DB_CONFIG_A) & HASHEN_F) { + if (CHELSIO_CHIP_VERSION(padap->params.chip) <= CHELSIO_T5) { + hi = t4_read_reg(padap, LE_DB_TID_HASHBASE_A) / 4; + md->base = t4_read_reg(padap, LE_DB_HASH_TID_BASE_A); + } else { + hi = t4_read_reg(padap, LE_DB_HASH_TID_BASE_A); + md->base = t4_read_reg(padap, + LE_DB_HASH_TBL_BASE_ADDR_A); + } + md->limit = 0; + } else { + md->base = 0; + md->idx = ARRAY_SIZE(cudbg_region); /* hide it */ + } + md++; + +#define ulp_region(reg) do { \ + md->base = t4_read_reg(padap, ULP_ ## reg ## _LLIMIT_A);\ + (md++)->limit = t4_read_reg(padap, ULP_ ## reg ## _ULIMIT_A);\ +} while (0) + + ulp_region(RX_ISCSI); + ulp_region(RX_TDDP); + ulp_region(TX_TPT); + ulp_region(RX_STAG); + ulp_region(RX_RQ); + ulp_region(RX_RQUDP); + ulp_region(RX_PBL); + ulp_region(TX_PBL); +#undef ulp_region + md->base = 0; + md->idx = ARRAY_SIZE(cudbg_region); + if (!is_t4(padap->params.chip)) { + u32 fifo_size = t4_read_reg(padap, SGE_DBVFIFO_SIZE_A); + u32 sge_ctrl = t4_read_reg(padap, SGE_CONTROL2_A); + u32 size = 0; + + if (is_t5(padap->params.chip)) { + if (sge_ctrl & VFIFO_ENABLE_F) + size = DBVFIFO_SIZE_G(fifo_size); + } else { + size = T6_DBVFIFO_SIZE_G(fifo_size); + } + + if (size) { + md->base = BASEADDR_G(t4_read_reg(padap, + SGE_DBVFIFO_BADDR_A)); + md->limit = md->base + (size << 2) - 1; + } + } + + md++; + + md->base = t4_read_reg(padap, ULP_RX_CTX_BASE_A); + md->limit = 0; + md++; + md->base = t4_read_reg(padap, ULP_TX_ERR_TABLE_BASE_A); + md->limit = 0; + md++; + + md->base = padap->vres.ocq.start; + if (padap->vres.ocq.size) + md->limit = md->base + padap->vres.ocq.size - 1; + else + md->idx = ARRAY_SIZE(cudbg_region); /* hide it */ + md++; + + /* add any address-space holes, there can be up to 3 */ + for (n = 0; n < i - 1; n++) + if (meminfo_buff->avail[n].limit < + meminfo_buff->avail[n + 1].base) + (md++)->base = meminfo_buff->avail[n].limit; + + if (meminfo_buff->avail[n].limit) + (md++)->base = meminfo_buff->avail[n].limit; + + n = md - meminfo_buff->mem; + meminfo_buff->mem_c = n; + + sort(meminfo_buff->mem, n, sizeof(struct cudbg_mem_desc), + cudbg_mem_desc_cmp, NULL); + + lo = t4_read_reg(padap, CIM_SDRAM_BASE_ADDR_A); + hi = t4_read_reg(padap, CIM_SDRAM_ADDR_SIZE_A) + lo - 1; + meminfo_buff->up_ram_lo = lo; + meminfo_buff->up_ram_hi = hi; + + lo = t4_read_reg(padap, CIM_EXTMEM2_BASE_ADDR_A); + hi = t4_read_reg(padap, CIM_EXTMEM2_ADDR_SIZE_A) + lo - 1; + meminfo_buff->up_extmem2_lo = lo; + meminfo_buff->up_extmem2_hi = hi; + + lo = t4_read_reg(padap, TP_PMM_RX_MAX_PAGE_A); + meminfo_buff->rx_pages_data[0] = PMRXMAXPAGE_G(lo); + meminfo_buff->rx_pages_data[1] = + t4_read_reg(padap, TP_PMM_RX_PAGE_SIZE_A) >> 10; + meminfo_buff->rx_pages_data[2] = (lo & PMRXNUMCHN_F) ? 2 : 1; + + lo = t4_read_reg(padap, TP_PMM_TX_MAX_PAGE_A); + hi = t4_read_reg(padap, TP_PMM_TX_PAGE_SIZE_A); + meminfo_buff->tx_pages_data[0] = PMTXMAXPAGE_G(lo); + meminfo_buff->tx_pages_data[1] = + hi >= (1 << 20) ? (hi >> 20) : (hi >> 10); + meminfo_buff->tx_pages_data[2] = + hi >= (1 << 20) ? 'M' : 'K'; + meminfo_buff->tx_pages_data[3] = 1 << PMTXNUMCHN_G(lo); + + meminfo_buff->p_structs = t4_read_reg(padap, TP_CMM_MM_MAX_PSTRUCT_A); + + for (i = 0; i < 4; i++) { + if (CHELSIO_CHIP_VERSION(padap->params.chip) > CHELSIO_T5) + lo = t4_read_reg(padap, + MPS_RX_MAC_BG_PG_CNT0_A + i * 4); + else + lo = t4_read_reg(padap, MPS_RX_PG_RSV0_A + i * 4); + if (is_t5(padap->params.chip)) { + used = T5_USED_G(lo); + alloc = T5_ALLOC_G(lo); + } else { + used = USED_G(lo); + alloc = ALLOC_G(lo); + } + meminfo_buff->port_used[i] = used; + meminfo_buff->port_alloc[i] = alloc; + } + + for (i = 0; i < padap->params.arch.nchan; i++) { + if (CHELSIO_CHIP_VERSION(padap->params.chip) > CHELSIO_T5) + lo = t4_read_reg(padap, + MPS_RX_LPBK_BG_PG_CNT0_A + i * 4); + else + lo = t4_read_reg(padap, MPS_RX_PG_RSV4_A + i * 4); + if (is_t5(padap->params.chip)) { + used = T5_USED_G(lo); + alloc = T5_ALLOC_G(lo); + } else { + used = USED_G(lo); + alloc = ALLOC_G(lo); + } + meminfo_buff->loopback_used[i] = used; + meminfo_buff->loopback_alloc[i] = alloc; + } + + return 0; +} + int cudbg_collect_reg_dump(struct cudbg_init *pdbg_init, struct cudbg_buffer *dbg_buff, struct cudbg_error *cudbg_err) @@ -420,23 +693,211 @@ int cudbg_collect_obq_sge_rx_q1(struct cudbg_init *pdbg_init, return cudbg_read_cim_obq(pdbg_init, dbg_buff, cudbg_err, 7); } +static int cudbg_meminfo_get_mem_index(struct adapter *padap, + struct cudbg_meminfo *mem_info, + u8 mem_type, u8 *idx) +{ + u8 i, flag; + + switch (mem_type) { + case MEM_EDC0: + flag = EDC0_FLAG; + break; + case MEM_EDC1: + flag = EDC1_FLAG; + break; + case MEM_MC0: + /* Some T5 cards have both MC0 and MC1. */ + flag = is_t5(padap->params.chip) ? MC0_FLAG : MC_FLAG; + break; + case MEM_MC1: + flag = MC1_FLAG; + break; + case MEM_HMA: + flag = HMA_FLAG; + break; + default: + return CUDBG_STATUS_ENTITY_NOT_FOUND; + } + + for (i = 0; i < mem_info->avail_c; i++) { + if (mem_info->avail[i].idx == flag) { + *idx = i; + return 0; + } + } + + return CUDBG_STATUS_ENTITY_NOT_FOUND; +} + +/* Fetch the @region_name's start and end from @meminfo. */ +static int cudbg_get_mem_region(struct adapter *padap, + struct cudbg_meminfo *meminfo, + u8 mem_type, const char *region_name, + struct cudbg_mem_desc *mem_desc) +{ + u8 mc, found = 0; + u32 i, idx = 0; + int rc; + + rc = cudbg_meminfo_get_mem_index(padap, meminfo, mem_type, &mc); + if (rc) + return rc; + + for (i = 0; i < ARRAY_SIZE(cudbg_region); i++) { + if (!strcmp(cudbg_region[i], region_name)) { + found = 1; + idx = i; + break; + } + } + if (!found) + return -EINVAL; + + found = 0; + for (i = 0; i < meminfo->mem_c; i++) { + if (meminfo->mem[i].idx >= ARRAY_SIZE(cudbg_region)) + continue; /* Skip holes */ + + if (!(meminfo->mem[i].limit)) + meminfo->mem[i].limit = + i < meminfo->mem_c - 1 ? + meminfo->mem[i + 1].base - 1 : ~0; + + if (meminfo->mem[i].idx == idx) { + /* Check if the region exists in @mem_type memory */ + if (meminfo->mem[i].base < meminfo->avail[mc].base && + meminfo->mem[i].limit < meminfo->avail[mc].base) + return -EINVAL; + + if (meminfo->mem[i].base > meminfo->avail[mc].limit) + return -EINVAL; + + memcpy(mem_desc, &meminfo->mem[i], + sizeof(struct cudbg_mem_desc)); + found = 1; + break; + } + } + if (!found) + return -EINVAL; + + return 0; +} + +/* Fetch and update the start and end of the requested memory region w.r.t 0 + * in the corresponding EDC/MC/HMA. + */ +static int cudbg_get_mem_relative(struct adapter *padap, + struct cudbg_meminfo *meminfo, + u8 mem_type, u32 *out_base, u32 *out_end) +{ + u8 mc_idx; + int rc; + + rc = cudbg_meminfo_get_mem_index(padap, meminfo, mem_type, &mc_idx); + if (rc) + return rc; + + if (*out_base < meminfo->avail[mc_idx].base) + *out_base = 0; + else + *out_base -= meminfo->avail[mc_idx].base; + + if (*out_end > meminfo->avail[mc_idx].limit) + *out_end = meminfo->avail[mc_idx].limit; + else + *out_end -= meminfo->avail[mc_idx].base; + + return 0; +} + +/* Get TX and RX Payload region */ +static int cudbg_get_payload_range(struct adapter *padap, u8 mem_type, + const char *region_name, + struct cudbg_region_info *payload) +{ + struct cudbg_mem_desc mem_desc = { 0 }; + struct cudbg_meminfo meminfo; + int rc; + + rc = cudbg_fill_meminfo(padap, &meminfo); + if (rc) + return rc; + + rc = cudbg_get_mem_region(padap, &meminfo, mem_type, region_name, + &mem_desc); + if (rc) { + payload->exist = false; + return 0; + } + + payload->exist = true; + payload->start = mem_desc.base; + payload->end = mem_desc.limit; + + return cudbg_get_mem_relative(padap, &meminfo, mem_type, + &payload->start, &payload->end); +} + +#define CUDBG_YIELD_ITERATION 256 + static int cudbg_read_fw_mem(struct cudbg_init *pdbg_init, struct cudbg_buffer *dbg_buff, u8 mem_type, unsigned long tot_len, struct cudbg_error *cudbg_err) { + static const char * const region_name[] = { "Tx payload:", + "Rx payload:" }; unsigned long bytes, bytes_left, bytes_read = 0; struct adapter *padap = pdbg_init->adap; struct cudbg_buffer temp_buff = { 0 }; + struct cudbg_region_info payload[2]; + u32 yield_count = 0; int rc = 0; + u8 i; + + /* Get TX/RX Payload region range if they exist */ + memset(payload, 0, sizeof(payload)); + for (i = 0; i < ARRAY_SIZE(region_name); i++) { + rc = cudbg_get_payload_range(padap, mem_type, region_name[i], + &payload[i]); + if (rc) + return rc; + + if (payload[i].exist) { + /* Align start and end to avoid wrap around */ + payload[i].start = roundup(payload[i].start, + CUDBG_CHUNK_SIZE); + payload[i].end = rounddown(payload[i].end, + CUDBG_CHUNK_SIZE); + } + } bytes_left = tot_len; while (bytes_left > 0) { + /* As MC size is huge and read through PIO access, this + * loop will hold cpu for a longer time. OS may think that + * the process is hanged and will generate CPU stall traces. + * So yield the cpu regularly. + */ + yield_count++; + if (!(yield_count % CUDBG_YIELD_ITERATION)) + schedule(); + bytes = min_t(unsigned long, bytes_left, (unsigned long)CUDBG_CHUNK_SIZE); rc = cudbg_get_buff(dbg_buff, bytes, &temp_buff); if (rc) return rc; + + for (i = 0; i < ARRAY_SIZE(payload); i++) + if (payload[i].exist && + bytes_read >= payload[i].start && + bytes_read + bytes <= payload[i].end) + /* TX and RX Payload regions can't overlap */ + goto skip_read; + spin_lock(&padap->win0_lock); rc = t4_memory_rw(padap, MEMWIN_NIC, mem_type, bytes_read, bytes, @@ -448,6 +909,8 @@ static int cudbg_read_fw_mem(struct cudbg_init *pdbg_init, cudbg_put_buff(&temp_buff, dbg_buff); return rc; } + +skip_read: bytes_left -= bytes; bytes_read += bytes; cudbg_write_and_release_buff(&temp_buff, dbg_buff); @@ -455,27 +918,6 @@ static int cudbg_read_fw_mem(struct cudbg_init *pdbg_init, return rc; } -static void cudbg_collect_mem_info(struct cudbg_init *pdbg_init, - struct card_mem *mem_info) -{ - struct adapter *padap = pdbg_init->adap; - u32 value; - - value = t4_read_reg(padap, MA_EDRAM0_BAR_A); - value = EDRAM0_SIZE_G(value); - mem_info->size_edc0 = (u16)value; - - value = t4_read_reg(padap, MA_EDRAM1_BAR_A); - value = EDRAM1_SIZE_G(value); - mem_info->size_edc1 = (u16)value; - - value = t4_read_reg(padap, MA_TARGET_MEM_ENABLE_A); - if (value & EDRAM0_ENABLE_F) - mem_info->mem_flag |= (1 << EDC0_FLAG); - if (value & EDRAM1_ENABLE_F) - mem_info->mem_flag |= (1 << EDC1_FLAG); -} - static void cudbg_t4_fwcache(struct cudbg_init *pdbg_init, struct cudbg_error *cudbg_err) { @@ -495,37 +937,25 @@ static int cudbg_collect_mem_region(struct cudbg_init *pdbg_init, struct cudbg_error *cudbg_err, u8 mem_type) { - struct card_mem mem_info = {0}; - unsigned long flag, size; + struct adapter *padap = pdbg_init->adap; + struct cudbg_meminfo mem_info; + unsigned long size; + u8 mc_idx; int rc; + memset(&mem_info, 0, sizeof(struct cudbg_meminfo)); + rc = cudbg_fill_meminfo(padap, &mem_info); + if (rc) + return rc; + cudbg_t4_fwcache(pdbg_init, cudbg_err); - cudbg_collect_mem_info(pdbg_init, &mem_info); - switch (mem_type) { - case MEM_EDC0: - flag = (1 << EDC0_FLAG); - size = cudbg_mbytes_to_bytes(mem_info.size_edc0); - break; - case MEM_EDC1: - flag = (1 << EDC1_FLAG); - size = cudbg_mbytes_to_bytes(mem_info.size_edc1); - break; - default: - rc = CUDBG_STATUS_ENTITY_NOT_FOUND; - goto err; - } + rc = cudbg_meminfo_get_mem_index(padap, &mem_info, mem_type, &mc_idx); + if (rc) + return rc; - if (mem_info.mem_flag & flag) { - rc = cudbg_read_fw_mem(pdbg_init, dbg_buff, mem_type, - size, cudbg_err); - if (rc) - goto err; - } else { - rc = CUDBG_STATUS_ENTITY_NOT_FOUND; - goto err; - } -err: - return rc; + size = mem_info.avail[mc_idx].limit - mem_info.avail[mc_idx].base; + return cudbg_read_fw_mem(pdbg_init, dbg_buff, mem_type, size, + cudbg_err); } int cudbg_collect_edc0_meminfo(struct cudbg_init *pdbg_init, @@ -544,6 +974,30 @@ int cudbg_collect_edc1_meminfo(struct cudbg_init *pdbg_init, MEM_EDC1); } +int cudbg_collect_mc0_meminfo(struct cudbg_init *pdbg_init, + struct cudbg_buffer *dbg_buff, + struct cudbg_error *cudbg_err) +{ + return cudbg_collect_mem_region(pdbg_init, dbg_buff, cudbg_err, + MEM_MC0); +} + +int cudbg_collect_mc1_meminfo(struct cudbg_init *pdbg_init, + struct cudbg_buffer *dbg_buff, + struct cudbg_error *cudbg_err) +{ + return cudbg_collect_mem_region(pdbg_init, dbg_buff, cudbg_err, + MEM_MC1); +} + +int cudbg_collect_hma_meminfo(struct cudbg_init *pdbg_init, + struct cudbg_buffer *dbg_buff, + struct cudbg_error *cudbg_err) +{ + return cudbg_collect_mem_region(pdbg_init, dbg_buff, cudbg_err, + MEM_HMA); +} + int cudbg_collect_rss(struct cudbg_init *pdbg_init, struct cudbg_buffer *dbg_buff, struct cudbg_error *cudbg_err) @@ -843,6 +1297,31 @@ int cudbg_collect_tp_la(struct cudbg_init *pdbg_init, return rc; } +int cudbg_collect_meminfo(struct cudbg_init *pdbg_init, + struct cudbg_buffer *dbg_buff, + struct cudbg_error *cudbg_err) +{ + struct adapter *padap = pdbg_init->adap; + struct cudbg_buffer temp_buff = { 0 }; + struct cudbg_meminfo *meminfo_buff; + int rc; + + rc = cudbg_get_buff(dbg_buff, sizeof(struct cudbg_meminfo), &temp_buff); + if (rc) + return rc; + + meminfo_buff = (struct cudbg_meminfo *)temp_buff.data; + rc = cudbg_fill_meminfo(padap, meminfo_buff); + if (rc) { + cudbg_err->sys_err = rc; + cudbg_put_buff(&temp_buff, dbg_buff); + return rc; + } + + cudbg_write_and_release_buff(&temp_buff, dbg_buff); + return rc; +} + int cudbg_collect_cim_pif_la(struct cudbg_init *pdbg_init, struct cudbg_buffer *dbg_buff, struct cudbg_error *cudbg_err) @@ -1115,22 +1594,135 @@ int cudbg_collect_tid(struct cudbg_init *pdbg_init, return rc; } -int cudbg_dump_context_size(struct adapter *padap) +int cudbg_collect_pcie_config(struct cudbg_init *pdbg_init, + struct cudbg_buffer *dbg_buff, + struct cudbg_error *cudbg_err) { - u32 value, size; + struct adapter *padap = pdbg_init->adap; + struct cudbg_buffer temp_buff = { 0 }; + u32 size, *value, j; + int i, rc, n; + + size = sizeof(u32) * CUDBG_NUM_PCIE_CONFIG_REGS; + n = sizeof(t5_pcie_config_array) / (2 * sizeof(u32)); + rc = cudbg_get_buff(dbg_buff, size, &temp_buff); + if (rc) + return rc; + + value = (u32 *)temp_buff.data; + for (i = 0; i < n; i++) { + for (j = t5_pcie_config_array[i][0]; + j <= t5_pcie_config_array[i][1]; j += 4) { + t4_hw_pci_read_cfg4(padap, j, value); + value++; + } + } + cudbg_write_and_release_buff(&temp_buff, dbg_buff); + return rc; +} + +static int cudbg_sge_ctxt_check_valid(u32 *buf, int type) +{ + int index, bit, bit_pos = 0; + + switch (type) { + case CTXT_EGRESS: + bit_pos = 176; + break; + case CTXT_INGRESS: + bit_pos = 141; + break; + case CTXT_FLM: + bit_pos = 89; + break; + } + index = bit_pos / 32; + bit = bit_pos % 32; + return buf[index] & (1U << bit); +} + +static int cudbg_get_ctxt_region_info(struct adapter *padap, + struct cudbg_region_info *ctx_info, + u8 *mem_type) +{ + struct cudbg_mem_desc mem_desc; + struct cudbg_meminfo meminfo; + u32 i, j, value, found; u8 flq; + int rc; + rc = cudbg_fill_meminfo(padap, &meminfo); + if (rc) + return rc; + + /* Get EGRESS and INGRESS context region size */ + for (i = CTXT_EGRESS; i <= CTXT_INGRESS; i++) { + found = 0; + memset(&mem_desc, 0, sizeof(struct cudbg_mem_desc)); + for (j = 0; j < ARRAY_SIZE(meminfo.avail); j++) { + rc = cudbg_get_mem_region(padap, &meminfo, j, + cudbg_region[i], + &mem_desc); + if (!rc) { + found = 1; + rc = cudbg_get_mem_relative(padap, &meminfo, j, + &mem_desc.base, + &mem_desc.limit); + if (rc) { + ctx_info[i].exist = false; + break; + } + ctx_info[i].exist = true; + ctx_info[i].start = mem_desc.base; + ctx_info[i].end = mem_desc.limit; + mem_type[i] = j; + break; + } + } + if (!found) + ctx_info[i].exist = false; + } + + /* Get FLM and CNM max qid. */ value = t4_read_reg(padap, SGE_FLM_CFG_A); /* Get number of data freelist queues */ flq = HDRSTARTFLQ_G(value); - size = CUDBG_MAX_FL_QIDS >> flq; + ctx_info[CTXT_FLM].exist = true; + ctx_info[CTXT_FLM].end = (CUDBG_MAX_FL_QIDS >> flq) * SGE_CTXT_SIZE; - /* Add extra space for congestion manager contexts. - * The number of CONM contexts are same as number of freelist + /* The number of CONM contexts are same as number of freelist * queues. */ - size += size; + ctx_info[CTXT_CNM].exist = true; + ctx_info[CTXT_CNM].end = ctx_info[CTXT_FLM].end; + + return 0; +} + +int cudbg_dump_context_size(struct adapter *padap) +{ + struct cudbg_region_info region_info[CTXT_CNM + 1] = { {0} }; + u8 mem_type[CTXT_INGRESS + 1] = { 0 }; + u32 i, size = 0; + int rc; + + /* Get max valid qid for each type of queue */ + rc = cudbg_get_ctxt_region_info(padap, region_info, mem_type); + if (rc) + return rc; + + for (i = 0; i < CTXT_CNM; i++) { + if (!region_info[i].exist) { + if (i == CTXT_EGRESS || i == CTXT_INGRESS) + size += CUDBG_LOWMEM_MAX_CTXT_QIDS * + SGE_CTXT_SIZE; + continue; + } + + size += (region_info[i].end - region_info[i].start + 1) / + SGE_CTXT_SIZE; + } return size * sizeof(struct cudbg_ch_cntxt); } @@ -1153,16 +1745,54 @@ static void cudbg_read_sge_ctxt(struct cudbg_init *pdbg_init, u32 cid, t4_sge_ctxt_rd_bd(padap, cid, ctype, data); } +static void cudbg_get_sge_ctxt_fw(struct cudbg_init *pdbg_init, u32 max_qid, + u8 ctxt_type, + struct cudbg_ch_cntxt **out_buff) +{ + struct cudbg_ch_cntxt *buff = *out_buff; + int rc; + u32 j; + + for (j = 0; j < max_qid; j++) { + cudbg_read_sge_ctxt(pdbg_init, j, ctxt_type, buff->data); + rc = cudbg_sge_ctxt_check_valid(buff->data, ctxt_type); + if (!rc) + continue; + + buff->cntxt_type = ctxt_type; + buff->cntxt_id = j; + buff++; + if (ctxt_type == CTXT_FLM) { + cudbg_read_sge_ctxt(pdbg_init, j, CTXT_CNM, buff->data); + buff->cntxt_type = CTXT_CNM; + buff->cntxt_id = j; + buff++; + } + } + + *out_buff = buff; +} + int cudbg_collect_dump_context(struct cudbg_init *pdbg_init, struct cudbg_buffer *dbg_buff, struct cudbg_error *cudbg_err) { + struct cudbg_region_info region_info[CTXT_CNM + 1] = { {0} }; struct adapter *padap = pdbg_init->adap; + u32 j, size, max_ctx_size, max_ctx_qid; + u8 mem_type[CTXT_INGRESS + 1] = { 0 }; struct cudbg_buffer temp_buff = { 0 }; struct cudbg_ch_cntxt *buff; - u32 size, i = 0; + u64 *dst_off, *src_off; + u8 *ctx_buf; + u8 i, k; int rc; + /* Get max valid qid for each type of queue */ + rc = cudbg_get_ctxt_region_info(padap, region_info, mem_type); + if (rc) + return rc; + rc = cudbg_dump_context_size(padap); if (rc <= 0) return CUDBG_STATUS_ENTITY_NOT_FOUND; @@ -1172,23 +1802,79 @@ int cudbg_collect_dump_context(struct cudbg_init *pdbg_init, if (rc) return rc; + /* Get buffer with enough space to read the biggest context + * region in memory. + */ + max_ctx_size = max(region_info[CTXT_EGRESS].end - + region_info[CTXT_EGRESS].start + 1, + region_info[CTXT_INGRESS].end - + region_info[CTXT_INGRESS].start + 1); + + ctx_buf = kvzalloc(max_ctx_size, GFP_KERNEL); + if (!ctx_buf) { + cudbg_put_buff(&temp_buff, dbg_buff); + return -ENOMEM; + } + buff = (struct cudbg_ch_cntxt *)temp_buff.data; - while (size > 0) { - buff->cntxt_type = CTXT_FLM; - buff->cntxt_id = i; - cudbg_read_sge_ctxt(pdbg_init, i, CTXT_FLM, buff->data); - buff++; - size -= sizeof(struct cudbg_ch_cntxt); - buff->cntxt_type = CTXT_CNM; - buff->cntxt_id = i; - cudbg_read_sge_ctxt(pdbg_init, i, CTXT_CNM, buff->data); - buff++; - size -= sizeof(struct cudbg_ch_cntxt); + /* Collect EGRESS and INGRESS context data. + * In case of failures, fallback to collecting via FW or + * backdoor access. + */ + for (i = CTXT_EGRESS; i <= CTXT_INGRESS; i++) { + if (!region_info[i].exist) { + max_ctx_qid = CUDBG_LOWMEM_MAX_CTXT_QIDS; + cudbg_get_sge_ctxt_fw(pdbg_init, max_ctx_qid, i, + &buff); + continue; + } - i++; + max_ctx_size = region_info[i].end - region_info[i].start + 1; + max_ctx_qid = max_ctx_size / SGE_CTXT_SIZE; + + t4_sge_ctxt_flush(padap, padap->mbox, i); + rc = t4_memory_rw(padap, MEMWIN_NIC, mem_type[i], + region_info[i].start, max_ctx_size, + (__be32 *)ctx_buf, 1); + if (rc) { + max_ctx_qid = CUDBG_LOWMEM_MAX_CTXT_QIDS; + cudbg_get_sge_ctxt_fw(pdbg_init, max_ctx_qid, i, + &buff); + continue; + } + + for (j = 0; j < max_ctx_qid; j++) { + src_off = (u64 *)(ctx_buf + j * SGE_CTXT_SIZE); + dst_off = (u64 *)buff->data; + + /* The data is stored in 64-bit cpu order. Convert it + * to big endian before parsing. + */ + for (k = 0; k < SGE_CTXT_SIZE / sizeof(u64); k++) + dst_off[k] = cpu_to_be64(src_off[k]); + + rc = cudbg_sge_ctxt_check_valid(buff->data, i); + if (!rc) + continue; + + buff->cntxt_type = i; + buff->cntxt_id = j; + buff++; + } } + kvfree(ctx_buf); + + /* Collect FREELIST and CONGESTION MANAGER contexts */ + max_ctx_size = region_info[CTXT_FLM].end - + region_info[CTXT_FLM].start + 1; + max_ctx_qid = max_ctx_size / SGE_CTXT_SIZE; + /* Since FLM and CONM are 1-to-1 mapped, the below function + * will fetch both FLM and CONM contexts. + */ + cudbg_get_sge_ctxt_fw(pdbg_init, max_ctx_qid, CTXT_FLM, &buff); + cudbg_write_and_release_buff(&temp_buff, dbg_buff); return rc; } diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h index caeee8e33e86..eebefe7cd18e 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h @@ -75,6 +75,12 @@ int cudbg_collect_edc0_meminfo(struct cudbg_init *pdbg_init, int cudbg_collect_edc1_meminfo(struct cudbg_init *pdbg_init, struct cudbg_buffer *dbg_buff, struct cudbg_error *cudbg_err); +int cudbg_collect_mc0_meminfo(struct cudbg_init *pdbg_init, + struct cudbg_buffer *dbg_buff, + struct cudbg_error *cudbg_err); +int cudbg_collect_mc1_meminfo(struct cudbg_init *pdbg_init, + struct cudbg_buffer *dbg_buff, + struct cudbg_error *cudbg_err); int cudbg_collect_rss(struct cudbg_init *pdbg_init, struct cudbg_buffer *dbg_buff, struct cudbg_error *cudbg_err); @@ -102,6 +108,9 @@ int cudbg_collect_ulprx_la(struct cudbg_init *pdbg_init, int cudbg_collect_tp_la(struct cudbg_init *pdbg_init, struct cudbg_buffer *dbg_buff, struct cudbg_error *cudbg_err); +int cudbg_collect_meminfo(struct cudbg_init *pdbg_init, + struct cudbg_buffer *dbg_buff, + struct cudbg_error *cudbg_err); int cudbg_collect_cim_pif_la(struct cudbg_init *pdbg_init, struct cudbg_buffer *dbg_buff, struct cudbg_error *cudbg_err); @@ -123,6 +132,9 @@ int cudbg_collect_pm_indirect(struct cudbg_init *pdbg_init, int cudbg_collect_tid(struct cudbg_init *pdbg_init, struct cudbg_buffer *dbg_buff, struct cudbg_error *cudbg_err); +int cudbg_collect_pcie_config(struct cudbg_init *pdbg_init, + struct cudbg_buffer *dbg_buff, + struct cudbg_error *cudbg_err); int cudbg_collect_dump_context(struct cudbg_init *pdbg_init, struct cudbg_buffer *dbg_buff, struct cudbg_error *cudbg_err); @@ -156,6 +168,9 @@ int cudbg_collect_mbox_log(struct cudbg_init *pdbg_init, int cudbg_collect_hma_indirect(struct cudbg_init *pdbg_init, struct cudbg_buffer *dbg_buff, struct cudbg_error *cudbg_err); +int cudbg_collect_hma_meminfo(struct cudbg_init *pdbg_init, + struct cudbg_buffer *dbg_buff, + struct cudbg_error *cudbg_err); struct cudbg_entity_hdr *cudbg_get_entity_hdr(void *outbuf, int i); void cudbg_align_debug_buffer(struct cudbg_buffer *dbg_buff, @@ -163,7 +178,8 @@ void cudbg_align_debug_buffer(struct cudbg_buffer *dbg_buff, u32 cudbg_cim_obq_size(struct adapter *padap, int qid); int cudbg_dump_context_size(struct adapter *padap); -struct cudbg_tcam; +int cudbg_fill_meminfo(struct adapter *padap, + struct cudbg_meminfo *meminfo_buff); void cudbg_fill_le_tcam_info(struct adapter *padap, struct cudbg_tcam *tcam_region); #endif /* __CUDBG_LIB_H__ */ diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h index 6f9fa6e3c42a..97dc3efeb234 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h @@ -77,7 +77,8 @@ enum { MEM_EDC1, MEM_MC, MEM_MC0 = MEM_MC, - MEM_MC1 + MEM_MC1, + MEM_HMA, }; enum { @@ -1653,7 +1654,7 @@ int t4_ctrl_eq_free(struct adapter *adap, unsigned int mbox, unsigned int pf, unsigned int vf, unsigned int eqid); int t4_ofld_eq_free(struct adapter *adap, unsigned int mbox, unsigned int pf, unsigned int vf, unsigned int eqid); -int t4_sge_ctxt_flush(struct adapter *adap, unsigned int mbox); +int t4_sge_ctxt_flush(struct adapter *adap, unsigned int mbox, int ctxt_type); void t4_handle_get_port_info(struct port_info *pi, const __be64 *rpl); int t4_update_port_info(struct port_info *pi); int t4_get_link_params(struct port_info *pi, unsigned int *link_okp, diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c index 29cc625e9833..41c8736314f8 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c @@ -18,11 +18,13 @@ #include "t4_regs.h" #include "cxgb4.h" #include "cxgb4_cudbg.h" -#include "cudbg_entity.h" static const struct cxgb4_collect_entity cxgb4_collect_mem_dump[] = { { CUDBG_EDC0, cudbg_collect_edc0_meminfo }, { CUDBG_EDC1, cudbg_collect_edc1_meminfo }, + { CUDBG_MC0, cudbg_collect_mc0_meminfo }, + { CUDBG_MC1, cudbg_collect_mc1_meminfo }, + { CUDBG_HMA, cudbg_collect_hma_meminfo }, }; static const struct cxgb4_collect_entity cxgb4_collect_hw_dump[] = { @@ -53,6 +55,7 @@ static const struct cxgb4_collect_entity cxgb4_collect_hw_dump[] = { { CUDBG_SGE_INDIRECT, cudbg_collect_sge_indirect }, { CUDBG_ULPRX_LA, cudbg_collect_ulprx_la }, { CUDBG_TP_LA, cudbg_collect_tp_la }, + { CUDBG_MEMINFO, cudbg_collect_meminfo }, { CUDBG_CIM_PIF_LA, cudbg_collect_cim_pif_la }, { CUDBG_CLK, cudbg_collect_clk_info }, { CUDBG_CIM_OBQ_RXQ0, cudbg_collect_obq_sge_rx_q0 }, @@ -60,6 +63,7 @@ static const struct cxgb4_collect_entity cxgb4_collect_hw_dump[] = { { CUDBG_PCIE_INDIRECT, cudbg_collect_pcie_indirect }, { CUDBG_PM_INDIRECT, cudbg_collect_pm_indirect }, { CUDBG_TID_INFO, cudbg_collect_tid }, + { CUDBG_PCIE_CONFIG, cudbg_collect_pcie_config }, { CUDBG_DUMP_CONTEXT, cudbg_collect_dump_context }, { CUDBG_MPS_TCAM, cudbg_collect_mps_tcam }, { CUDBG_VPD_DATA, cudbg_collect_vpd_data }, @@ -158,6 +162,22 @@ static u32 cxgb4_get_entity_length(struct adapter *adap, u32 entity) } len = cudbg_mbytes_to_bytes(len); break; + case CUDBG_MC0: + value = t4_read_reg(adap, MA_TARGET_MEM_ENABLE_A); + if (value & EXT_MEM0_ENABLE_F) { + value = t4_read_reg(adap, MA_EXT_MEMORY0_BAR_A); + len = EXT_MEM0_SIZE_G(value); + } + len = cudbg_mbytes_to_bytes(len); + break; + case CUDBG_MC1: + value = t4_read_reg(adap, MA_TARGET_MEM_ENABLE_A); + if (value & EXT_MEM1_ENABLE_F) { + value = t4_read_reg(adap, MA_EXT_MEMORY1_BAR_A); + len = EXT_MEM1_SIZE_G(value); + } + len = cudbg_mbytes_to_bytes(len); + break; case CUDBG_RSS: len = RSS_NENTRIES * sizeof(u16); break; @@ -201,6 +221,9 @@ static u32 cxgb4_get_entity_length(struct adapter *adap, u32 entity) case CUDBG_TP_LA: len = sizeof(struct cudbg_tp_la) + TPLA_SIZE * sizeof(u64); break; + case CUDBG_MEMINFO: + len = sizeof(struct cudbg_meminfo); + break; case CUDBG_CIM_PIF_LA: len = sizeof(struct cudbg_cim_pif_la); len += 2 * CIM_PIFLA_SIZE * 6 * sizeof(u32); @@ -219,6 +242,9 @@ static u32 cxgb4_get_entity_length(struct adapter *adap, u32 entity) case CUDBG_TID_INFO: len = sizeof(struct cudbg_tid_info_region_rev1); break; + case CUDBG_PCIE_CONFIG: + len = sizeof(u32) * CUDBG_NUM_PCIE_CONFIG_REGS; + break; case CUDBG_DUMP_CONTEXT: len = cudbg_dump_context_size(adap); break; @@ -264,6 +290,17 @@ static u32 cxgb4_get_entity_length(struct adapter *adap, u32 entity) len = sizeof(struct ireg_buf) * n; } break; + case CUDBG_HMA: + value = t4_read_reg(adap, MA_TARGET_MEM_ENABLE_A); + if (value & HMA_MUX_F) { + /* In T6, there's no MC1. So, HMA shares MC1 + * address space. + */ + value = t4_read_reg(adap, MA_EXT_MEMORY1_BAR_A); + len = EXT_MEM1_SIZE_G(value); + } + len = cudbg_mbytes_to_bytes(len); + break; default: break; } diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h index c099b5aa2214..7ceeb0bc9fa8 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h @@ -20,6 +20,7 @@ #include "cudbg_if.h" #include "cudbg_lib_common.h" +#include "cudbg_entity.h" #include "cudbg_lib.h" typedef int (*cudbg_collect_callback_t)(struct cudbg_init *pdbg_init, diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c index 917663b35603..4956e429ae1d 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c @@ -45,6 +45,10 @@ #include "cxgb4_debugfs.h" #include "clip_tbl.h" #include "l2t.h" +#include "cudbg_if.h" +#include "cudbg_lib_common.h" +#include "cudbg_entity.h" +#include "cudbg_lib.h" /* generic seq_file support for showing a table of size rows x width. */ static void *seq_tab_get_idx(struct seq_tab *tb, loff_t pos) @@ -2794,18 +2798,6 @@ static const struct file_operations blocked_fl_fops = { .llseek = generic_file_llseek, }; -struct mem_desc { - unsigned int base; - unsigned int limit; - unsigned int idx; -}; - -static int mem_desc_cmp(const void *a, const void *b) -{ - return ((const struct mem_desc *)a)->base - - ((const struct mem_desc *)b)->base; -} - static void mem_region_show(struct seq_file *seq, const char *name, unsigned int from, unsigned int to) { @@ -2819,250 +2811,60 @@ static void mem_region_show(struct seq_file *seq, const char *name, static int meminfo_show(struct seq_file *seq, void *v) { static const char * const memory[] = { "EDC0:", "EDC1:", "MC:", - "MC0:", "MC1:"}; - static const char * const region[] = { - "DBQ contexts:", "IMSG contexts:", "FLM cache:", "TCBs:", - "Pstructs:", "Timers:", "Rx FL:", "Tx FL:", "Pstruct FL:", - "Tx payload:", "Rx payload:", "LE hash:", "iSCSI region:", - "TDDP region:", "TPT region:", "STAG region:", "RQ region:", - "RQUDP region:", "PBL region:", "TXPBL region:", - "DBVFIFO region:", "ULPRX state:", "ULPTX state:", - "On-chip queues:" - }; - - int i, n; - u32 lo, hi, used, alloc; - struct mem_desc avail[4]; - struct mem_desc mem[ARRAY_SIZE(region) + 3]; /* up to 3 holes */ - struct mem_desc *md = mem; + "MC0:", "MC1:", "HMA:"}; struct adapter *adap = seq->private; + struct cudbg_meminfo meminfo; + int i, rc; - for (i = 0; i < ARRAY_SIZE(mem); i++) { - mem[i].limit = 0; - mem[i].idx = i; - } - - /* Find and sort the populated memory ranges */ - i = 0; - lo = t4_read_reg(adap, MA_TARGET_MEM_ENABLE_A); - if (lo & EDRAM0_ENABLE_F) { - hi = t4_read_reg(adap, MA_EDRAM0_BAR_A); - avail[i].base = EDRAM0_BASE_G(hi) << 20; - avail[i].limit = avail[i].base + (EDRAM0_SIZE_G(hi) << 20); - avail[i].idx = 0; - i++; - } - if (lo & EDRAM1_ENABLE_F) { - hi = t4_read_reg(adap, MA_EDRAM1_BAR_A); - avail[i].base = EDRAM1_BASE_G(hi) << 20; - avail[i].limit = avail[i].base + (EDRAM1_SIZE_G(hi) << 20); - avail[i].idx = 1; - i++; - } - - if (is_t5(adap->params.chip)) { - if (lo & EXT_MEM0_ENABLE_F) { - hi = t4_read_reg(adap, MA_EXT_MEMORY0_BAR_A); - avail[i].base = EXT_MEM0_BASE_G(hi) << 20; - avail[i].limit = - avail[i].base + (EXT_MEM0_SIZE_G(hi) << 20); - avail[i].idx = 3; - i++; - } - if (lo & EXT_MEM1_ENABLE_F) { - hi = t4_read_reg(adap, MA_EXT_MEMORY1_BAR_A); - avail[i].base = EXT_MEM1_BASE_G(hi) << 20; - avail[i].limit = - avail[i].base + (EXT_MEM1_SIZE_G(hi) << 20); - avail[i].idx = 4; - i++; - } - } else { - if (lo & EXT_MEM_ENABLE_F) { - hi = t4_read_reg(adap, MA_EXT_MEMORY_BAR_A); - avail[i].base = EXT_MEM_BASE_G(hi) << 20; - avail[i].limit = - avail[i].base + (EXT_MEM_SIZE_G(hi) << 20); - avail[i].idx = 2; - i++; - } - } - if (!i) /* no memory available */ - return 0; - sort(avail, i, sizeof(struct mem_desc), mem_desc_cmp, NULL); - - (md++)->base = t4_read_reg(adap, SGE_DBQ_CTXT_BADDR_A); - (md++)->base = t4_read_reg(adap, SGE_IMSG_CTXT_BADDR_A); - (md++)->base = t4_read_reg(adap, SGE_FLM_CACHE_BADDR_A); - (md++)->base = t4_read_reg(adap, TP_CMM_TCB_BASE_A); - (md++)->base = t4_read_reg(adap, TP_CMM_MM_BASE_A); - (md++)->base = t4_read_reg(adap, TP_CMM_TIMER_BASE_A); - (md++)->base = t4_read_reg(adap, TP_CMM_MM_RX_FLST_BASE_A); - (md++)->base = t4_read_reg(adap, TP_CMM_MM_TX_FLST_BASE_A); - (md++)->base = t4_read_reg(adap, TP_CMM_MM_PS_FLST_BASE_A); - - /* the next few have explicit upper bounds */ - md->base = t4_read_reg(adap, TP_PMM_TX_BASE_A); - md->limit = md->base - 1 + - t4_read_reg(adap, TP_PMM_TX_PAGE_SIZE_A) * - PMTXMAXPAGE_G(t4_read_reg(adap, TP_PMM_TX_MAX_PAGE_A)); - md++; - - md->base = t4_read_reg(adap, TP_PMM_RX_BASE_A); - md->limit = md->base - 1 + - t4_read_reg(adap, TP_PMM_RX_PAGE_SIZE_A) * - PMRXMAXPAGE_G(t4_read_reg(adap, TP_PMM_RX_MAX_PAGE_A)); - md++; - - if (t4_read_reg(adap, LE_DB_CONFIG_A) & HASHEN_F) { - if (CHELSIO_CHIP_VERSION(adap->params.chip) <= CHELSIO_T5) { - hi = t4_read_reg(adap, LE_DB_TID_HASHBASE_A) / 4; - md->base = t4_read_reg(adap, LE_DB_HASH_TID_BASE_A); - } else { - hi = t4_read_reg(adap, LE_DB_HASH_TID_BASE_A); - md->base = t4_read_reg(adap, - LE_DB_HASH_TBL_BASE_ADDR_A); - } - md->limit = 0; - } else { - md->base = 0; - md->idx = ARRAY_SIZE(region); /* hide it */ - } - md++; - -#define ulp_region(reg) do { \ - md->base = t4_read_reg(adap, ULP_ ## reg ## _LLIMIT_A);\ - (md++)->limit = t4_read_reg(adap, ULP_ ## reg ## _ULIMIT_A); \ -} while (0) - - ulp_region(RX_ISCSI); - ulp_region(RX_TDDP); - ulp_region(TX_TPT); - ulp_region(RX_STAG); - ulp_region(RX_RQ); - ulp_region(RX_RQUDP); - ulp_region(RX_PBL); - ulp_region(TX_PBL); -#undef ulp_region - md->base = 0; - md->idx = ARRAY_SIZE(region); - if (!is_t4(adap->params.chip)) { - u32 size = 0; - u32 sge_ctrl = t4_read_reg(adap, SGE_CONTROL2_A); - u32 fifo_size = t4_read_reg(adap, SGE_DBVFIFO_SIZE_A); - - if (is_t5(adap->params.chip)) { - if (sge_ctrl & VFIFO_ENABLE_F) - size = DBVFIFO_SIZE_G(fifo_size); - } else { - size = T6_DBVFIFO_SIZE_G(fifo_size); - } - - if (size) { - md->base = BASEADDR_G(t4_read_reg(adap, - SGE_DBVFIFO_BADDR_A)); - md->limit = md->base + (size << 2) - 1; - } - } - - md++; - - md->base = t4_read_reg(adap, ULP_RX_CTX_BASE_A); - md->limit = 0; - md++; - md->base = t4_read_reg(adap, ULP_TX_ERR_TABLE_BASE_A); - md->limit = 0; - md++; - - md->base = adap->vres.ocq.start; - if (adap->vres.ocq.size) - md->limit = md->base + adap->vres.ocq.size - 1; - else - md->idx = ARRAY_SIZE(region); /* hide it */ - md++; - - /* add any address-space holes, there can be up to 3 */ - for (n = 0; n < i - 1; n++) - if (avail[n].limit < avail[n + 1].base) - (md++)->base = avail[n].limit; - if (avail[n].limit) - (md++)->base = avail[n].limit; - - n = md - mem; - sort(mem, n, sizeof(struct mem_desc), mem_desc_cmp, NULL); + memset(&meminfo, 0, sizeof(struct cudbg_meminfo)); + rc = cudbg_fill_meminfo(adap, &meminfo); + if (rc) + return -ENXIO; - for (lo = 0; lo < i; lo++) - mem_region_show(seq, memory[avail[lo].idx], avail[lo].base, - avail[lo].limit - 1); + for (i = 0; i < meminfo.avail_c; i++) + mem_region_show(seq, memory[meminfo.avail[i].idx], + meminfo.avail[i].base, + meminfo.avail[i].limit - 1); seq_putc(seq, '\n'); - for (i = 0; i < n; i++) { - if (mem[i].idx >= ARRAY_SIZE(region)) + for (i = 0; i < meminfo.mem_c; i++) { + if (meminfo.mem[i].idx >= ARRAY_SIZE(cudbg_region)) continue; /* skip holes */ - if (!mem[i].limit) - mem[i].limit = i < n - 1 ? mem[i + 1].base - 1 : ~0; - mem_region_show(seq, region[mem[i].idx], mem[i].base, - mem[i].limit); + if (!meminfo.mem[i].limit) + meminfo.mem[i].limit = + i < meminfo.mem_c - 1 ? + meminfo.mem[i + 1].base - 1 : ~0; + mem_region_show(seq, cudbg_region[meminfo.mem[i].idx], + meminfo.mem[i].base, meminfo.mem[i].limit); } seq_putc(seq, '\n'); - lo = t4_read_reg(adap, CIM_SDRAM_BASE_ADDR_A); - hi = t4_read_reg(adap, CIM_SDRAM_ADDR_SIZE_A) + lo - 1; - mem_region_show(seq, "uP RAM:", lo, hi); + mem_region_show(seq, "uP RAM:", meminfo.up_ram_lo, meminfo.up_ram_hi); + mem_region_show(seq, "uP Extmem2:", meminfo.up_extmem2_lo, + meminfo.up_extmem2_hi); - lo = t4_read_reg(adap, CIM_EXTMEM2_BASE_ADDR_A); - hi = t4_read_reg(adap, CIM_EXTMEM2_ADDR_SIZE_A) + lo - 1; - mem_region_show(seq, "uP Extmem2:", lo, hi); - - lo = t4_read_reg(adap, TP_PMM_RX_MAX_PAGE_A); seq_printf(seq, "\n%u Rx pages of size %uKiB for %u channels\n", - PMRXMAXPAGE_G(lo), - t4_read_reg(adap, TP_PMM_RX_PAGE_SIZE_A) >> 10, - (lo & PMRXNUMCHN_F) ? 2 : 1); + meminfo.rx_pages_data[0], meminfo.rx_pages_data[1], + meminfo.rx_pages_data[2]); - lo = t4_read_reg(adap, TP_PMM_TX_MAX_PAGE_A); - hi = t4_read_reg(adap, TP_PMM_TX_PAGE_SIZE_A); seq_printf(seq, "%u Tx pages of size %u%ciB for %u channels\n", - PMTXMAXPAGE_G(lo), - hi >= (1 << 20) ? (hi >> 20) : (hi >> 10), - hi >= (1 << 20) ? 'M' : 'K', 1 << PMTXNUMCHN_G(lo)); - seq_printf(seq, "%u p-structs\n\n", - t4_read_reg(adap, TP_CMM_MM_MAX_PSTRUCT_A)); - - for (i = 0; i < 4; i++) { - if (CHELSIO_CHIP_VERSION(adap->params.chip) > CHELSIO_T5) - lo = t4_read_reg(adap, MPS_RX_MAC_BG_PG_CNT0_A + i * 4); - else - lo = t4_read_reg(adap, MPS_RX_PG_RSV0_A + i * 4); - if (is_t5(adap->params.chip)) { - used = T5_USED_G(lo); - alloc = T5_ALLOC_G(lo); - } else { - used = USED_G(lo); - alloc = ALLOC_G(lo); - } + meminfo.tx_pages_data[0], meminfo.tx_pages_data[1], + meminfo.tx_pages_data[2], meminfo.tx_pages_data[3]); + + seq_printf(seq, "%u p-structs\n\n", meminfo.p_structs); + + for (i = 0; i < 4; i++) /* For T6 these are MAC buffer groups */ seq_printf(seq, "Port %d using %u pages out of %u allocated\n", - i, used, alloc); - } - for (i = 0; i < adap->params.arch.nchan; i++) { - if (CHELSIO_CHIP_VERSION(adap->params.chip) > CHELSIO_T5) - lo = t4_read_reg(adap, - MPS_RX_LPBK_BG_PG_CNT0_A + i * 4); - else - lo = t4_read_reg(adap, MPS_RX_PG_RSV4_A + i * 4); - if (is_t5(adap->params.chip)) { - used = T5_USED_G(lo); - alloc = T5_ALLOC_G(lo); - } else { - used = USED_G(lo); - alloc = ALLOC_G(lo); - } + i, meminfo.port_used[i], meminfo.port_alloc[i]); + + for (i = 0; i < adap->params.arch.nchan; i++) /* For T6 these are MAC buffer groups */ seq_printf(seq, "Loopback %d using %u pages out of %u allocated\n", - i, used, alloc); - } + i, meminfo.loopback_used[i], + meminfo.loopback_alloc[i]); + return 0; } diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index 6f900ffe25cc..87ac1e4dafc1 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -1673,7 +1673,7 @@ int cxgb4_flush_eq_cache(struct net_device *dev) { struct adapter *adap = netdev2adap(dev); - return t4_sge_ctxt_flush(adap, adap->mbox); + return t4_sge_ctxt_flush(adap, adap->mbox, CTXT_EGRESS); } EXPORT_SYMBOL(cxgb4_flush_eq_cache); diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c index d4a548a6a55c..a12b894f135d 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c @@ -405,9 +405,7 @@ static void cxgb4_process_flow_actions(struct net_device *in, } else if (is_tcf_gact_shot(a)) { fs->action = FILTER_DROP; } else if (is_tcf_mirred_egress_redirect(a)) { - int ifindex = tcf_mirred_ifindex(a); - struct net_device *out = __dev_get_by_index(dev_net(in), - ifindex); + struct net_device *out = tcf_mirred_dev(a); struct port_info *pi = netdev_priv(out); fs->action = FILTER_SWITCH; @@ -582,14 +580,14 @@ static int cxgb4_validate_flow_actions(struct net_device *dev, /* Do nothing */ } else if (is_tcf_mirred_egress_redirect(a)) { struct adapter *adap = netdev2adap(dev); - struct net_device *n_dev; - unsigned int i, ifindex; + struct net_device *n_dev, *target_dev; + unsigned int i; bool found = false; - ifindex = tcf_mirred_ifindex(a); + target_dev = tcf_mirred_dev(a); for_each_port(adap, i) { n_dev = adap->port[i]; - if (ifindex == n_dev->ifindex) { + if (target_dev == n_dev) { found = true; break; } diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c index cd0cd13a964d..ab174bcfbfb0 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c @@ -114,14 +114,14 @@ static int fill_action_fields(struct adapter *adap, /* Re-direct to specified port in hardware. */ if (is_tcf_mirred_egress_redirect(a)) { - struct net_device *n_dev; - unsigned int i, index; + struct net_device *n_dev, *target_dev; bool found = false; + unsigned int i; - index = tcf_mirred_ifindex(a); + target_dev = tcf_mirred_dev(a); for_each_port(adap, i) { n_dev = adap->port[i]; - if (index == n_dev->ifindex) { + if (target_dev == n_dev) { fs->action = FILTER_SWITCH; fs->eport = i; found = true; diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c index f63210f15579..112963defd0b 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c @@ -524,11 +524,14 @@ int t4_memory_rw(struct adapter *adap, int win, int mtype, u32 addr, * MEM_EDC1 = 1 * MEM_MC = 2 -- MEM_MC for chips with only 1 memory controller * MEM_MC1 = 3 -- for chips with 2 memory controllers (e.g. T5) + * MEM_HMA = 4 */ edc_size = EDRAM0_SIZE_G(t4_read_reg(adap, MA_EDRAM0_BAR_A)); - if (mtype != MEM_MC1) + if (mtype == MEM_HMA) { + memoffset = 2 * (edc_size * 1024 * 1024); + } else if (mtype != MEM_MC1) { memoffset = (mtype * (edc_size * 1024 * 1024)); - else { + } else { mc_size = EXT_MEM0_SIZE_G(t4_read_reg(adap, MA_EXT_MEMORY0_BAR_A)); memoffset = (MEM_MC0 * edc_size + mc_size) * 1024 * 1024; @@ -6527,18 +6530,21 @@ void t4_sge_decode_idma_state(struct adapter *adapter, int state) * t4_sge_ctxt_flush - flush the SGE context cache * @adap: the adapter * @mbox: mailbox to use for the FW command + * @ctx_type: Egress or Ingress * * Issues a FW command through the given mailbox to flush the * SGE context cache. */ -int t4_sge_ctxt_flush(struct adapter *adap, unsigned int mbox) +int t4_sge_ctxt_flush(struct adapter *adap, unsigned int mbox, int ctxt_type) { int ret; u32 ldst_addrspace; struct fw_ldst_cmd c; memset(&c, 0, sizeof(c)); - ldst_addrspace = FW_LDST_CMD_ADDRSPACE_V(FW_LDST_ADDRSPC_SGE_EGRC); + ldst_addrspace = FW_LDST_CMD_ADDRSPACE_V(ctxt_type == CTXT_EGRESS ? + FW_LDST_ADDRSPC_SGE_EGRC : + FW_LDST_ADDRSPC_SGE_INGC); c.op_to_addrspace = cpu_to_be32(FW_CMD_OP_V(FW_LDST_CMD) | FW_CMD_REQUEST_F | FW_CMD_READ_F | ldst_addrspace); diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h index a964ed184356..83afb32c8491 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h @@ -70,7 +70,9 @@ enum { /* SGE context types */ enum ctxt_type { - CTXT_FLM = 2, + CTXT_EGRESS, + CTXT_INGRESS, + CTXT_FLM, CTXT_CNM, }; diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h index a7cfece72828..f6701e0a6701 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h @@ -961,6 +961,10 @@ #define MA_EXT_MEMORY1_BAR_A 0x7808 +#define HMA_MUX_S 5 +#define HMA_MUX_V(x) ((x) << HMA_MUX_S) +#define HMA_MUX_F HMA_MUX_V(1U) + #define EXT_MEM1_BASE_S 16 #define EXT_MEM1_BASE_M 0xfffU #define EXT_MEM1_BASE_G(x) (((x) >> EXT_MEM1_BASE_S) & EXT_MEM1_BASE_M) diff --git a/drivers/net/ethernet/cisco/enic/enic_ethtool.c b/drivers/net/ethernet/cisco/enic/enic_ethtool.c index 462d0ce51240..efb9333c7cf8 100644 --- a/drivers/net/ethernet/cisco/enic/enic_ethtool.c +++ b/drivers/net/ethernet/cisco/enic/enic_ethtool.c @@ -18,6 +18,7 @@ #include <linux/netdevice.h> #include <linux/ethtool.h> +#include <linux/net_tstamp.h> #include "enic_res.h" #include "enic.h" @@ -578,6 +579,16 @@ static int enic_set_rxfh(struct net_device *netdev, const u32 *indir, return __enic_set_rsskey(enic); } +static int enic_get_ts_info(struct net_device *netdev, + struct ethtool_ts_info *info) +{ + info->so_timestamping = SOF_TIMESTAMPING_TX_SOFTWARE | + SOF_TIMESTAMPING_RX_SOFTWARE | + SOF_TIMESTAMPING_SOFTWARE; + + return 0; +} + static const struct ethtool_ops enic_ethtool_ops = { .get_drvinfo = enic_get_drvinfo, .get_msglevel = enic_get_msglevel, @@ -597,6 +608,7 @@ static const struct ethtool_ops enic_ethtool_ops = { .get_rxfh = enic_get_rxfh, .set_rxfh = enic_set_rxfh, .get_link_ksettings = enic_get_ksettings, + .get_ts_info = enic_get_ts_info, }; void enic_set_ethtool_ops(struct net_device *netdev) diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c index e130fb757e7b..d98676e43e03 100644 --- a/drivers/net/ethernet/cisco/enic/enic_main.c +++ b/drivers/net/ethernet/cisco/enic/enic_main.c @@ -856,6 +856,7 @@ static netdev_tx_t enic_hard_start_xmit(struct sk_buff *skb, if (vnic_wq_desc_avail(wq) < MAX_SKB_FRAGS + ENIC_DESC_MAX_SPLITS) netif_tx_stop_queue(txq); + skb_tx_timestamp(skb); if (!skb->xmit_more || netif_xmit_stopped(txq)) vnic_wq_doorbell(wq); diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 59ed806a52c3..d07c700c7ff8 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -17,7 +17,7 @@ #include <linux/netdevice.h> #include <linux/pci.h> #include <linux/platform_device.h> - +#include <net/rtnetlink.h> #include "hclge_cmd.h" #include "hclge_dcb.h" #include "hclge_main.h" @@ -2226,6 +2226,12 @@ static int hclge_mac_init(struct hclge_dev *hdev) return hclge_cfg_func_mta_filter(hdev, 0, hdev->accept_mta_mc); } +static void hclge_reset_task_schedule(struct hclge_dev *hdev) +{ + if (!test_and_set_bit(HCLGE_STATE_RST_SERVICE_SCHED, &hdev->state)) + schedule_work(&hdev->rst_service_task); +} + static void hclge_task_schedule(struct hclge_dev *hdev) { if (!test_bit(HCLGE_STATE_DOWN, &hdev->state) && @@ -2362,6 +2368,46 @@ static void hclge_service_complete(struct hclge_dev *hdev) clear_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state); } +static u32 hclge_check_event_cause(struct hclge_dev *hdev, u32 *clearval) +{ + u32 rst_src_reg; + + /* fetch the events from their corresponding regs */ + rst_src_reg = hclge_read_dev(&hdev->hw, HCLGE_MISC_RESET_STS_REG); + + /* check for vector0 reset event sources */ + if (BIT(HCLGE_VECTOR0_GLOBALRESET_INT_B) & rst_src_reg) { + set_bit(HNAE3_GLOBAL_RESET, &hdev->reset_pending); + *clearval = BIT(HCLGE_VECTOR0_GLOBALRESET_INT_B); + return HCLGE_VECTOR0_EVENT_RST; + } + + if (BIT(HCLGE_VECTOR0_CORERESET_INT_B) & rst_src_reg) { + set_bit(HNAE3_CORE_RESET, &hdev->reset_pending); + *clearval = BIT(HCLGE_VECTOR0_CORERESET_INT_B); + return HCLGE_VECTOR0_EVENT_RST; + } + + if (BIT(HCLGE_VECTOR0_IMPRESET_INT_B) & rst_src_reg) { + set_bit(HNAE3_IMP_RESET, &hdev->reset_pending); + *clearval = BIT(HCLGE_VECTOR0_IMPRESET_INT_B); + return HCLGE_VECTOR0_EVENT_RST; + } + + /* mailbox event sharing vector 0 interrupt would be placed here */ + + return HCLGE_VECTOR0_EVENT_OTHER; +} + +static void hclge_clear_event_cause(struct hclge_dev *hdev, u32 event_type, + u32 regclr) +{ + if (event_type == HCLGE_VECTOR0_EVENT_RST) + hclge_write_dev(&hdev->hw, HCLGE_MISC_RESET_STS_REG, regclr); + + /* mailbox event sharing vector 0 interrupt would be placed here */ +} + static void hclge_enable_vector(struct hclge_misc_vector *vector, bool enable) { writel(enable ? 1 : 0, vector->addr); @@ -2370,10 +2416,28 @@ static void hclge_enable_vector(struct hclge_misc_vector *vector, bool enable) static irqreturn_t hclge_misc_irq_handle(int irq, void *data) { struct hclge_dev *hdev = data; + u32 event_cause; + u32 clearval; hclge_enable_vector(&hdev->misc_vector, false); - if (!test_and_set_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state)) - schedule_work(&hdev->service_task); + event_cause = hclge_check_event_cause(hdev, &clearval); + + /* vector 0 interrupt is shared with reset and mailbox source events. + * For now, we are not handling mailbox events. + */ + switch (event_cause) { + case HCLGE_VECTOR0_EVENT_RST: + hclge_reset_task_schedule(hdev); + break; + default: + dev_dbg(&hdev->pdev->dev, + "received unknown or unhandled event of vector0\n"); + break; + } + + /* we should clear the source of interrupt */ + hclge_clear_event_cause(hdev, event_cause, clearval); + hclge_enable_vector(&hdev->misc_vector, true); return IRQ_HANDLED; } @@ -2404,9 +2468,9 @@ static int hclge_misc_irq_init(struct hclge_dev *hdev) hclge_get_misc_vector(hdev); - ret = devm_request_irq(&hdev->pdev->dev, - hdev->misc_vector.vector_irq, - hclge_misc_irq_handle, 0, "hclge_misc", hdev); + /* this would be explicitly freed in the end */ + ret = request_irq(hdev->misc_vector.vector_irq, hclge_misc_irq_handle, + 0, "hclge_misc", hdev); if (ret) { hclge_free_vector(hdev, 0); dev_err(&hdev->pdev->dev, "request misc irq(%d) fail\n", @@ -2416,6 +2480,12 @@ static int hclge_misc_irq_init(struct hclge_dev *hdev) return ret; } +static void hclge_misc_irq_uninit(struct hclge_dev *hdev) +{ + free_irq(hdev->misc_vector.vector_irq, hdev); + hclge_free_vector(hdev, 0); +} + static int hclge_notify_client(struct hclge_dev *hdev, enum hnae3_reset_notify_type type) { @@ -2471,12 +2541,6 @@ static int hclge_reset_wait(struct hclge_dev *hdev) cnt++; } - /* must clear reset status register to - * prevent driver detect reset interrupt again - */ - reg = hclge_read_dev(&hdev->hw, HCLGE_MISC_RESET_STS_REG); - hclge_write_dev(&hdev->hw, HCLGE_MISC_RESET_STS_REG, reg); - if (cnt >= HCLGE_RESET_WAIT_CNT) { dev_warn(&hdev->pdev->dev, "Wait for reset timeout: %d\n", hdev->reset_type); @@ -2505,12 +2569,12 @@ static int hclge_func_reset_cmd(struct hclge_dev *hdev, int func_id) return ret; } -static void hclge_do_reset(struct hclge_dev *hdev, enum hnae3_reset_type type) +static void hclge_do_reset(struct hclge_dev *hdev) { struct pci_dev *pdev = hdev->pdev; u32 val; - switch (type) { + switch (hdev->reset_type) { case HNAE3_GLOBAL_RESET: val = hclge_read_dev(&hdev->hw, HCLGE_GLOBAL_RESET_REG); hnae_set_bit(val, HCLGE_GLOBAL_RESET_BIT, 1); @@ -2526,30 +2590,62 @@ static void hclge_do_reset(struct hclge_dev *hdev, enum hnae3_reset_type type) case HNAE3_FUNC_RESET: dev_info(&pdev->dev, "PF Reset requested\n"); hclge_func_reset_cmd(hdev, 0); + /* schedule again to check later */ + set_bit(HNAE3_FUNC_RESET, &hdev->reset_pending); + hclge_reset_task_schedule(hdev); break; default: dev_warn(&pdev->dev, - "Unsupported reset type: %d\n", type); + "Unsupported reset type: %d\n", hdev->reset_type); break; } } -static enum hnae3_reset_type hclge_detected_reset_event(struct hclge_dev *hdev) +static enum hnae3_reset_type hclge_get_reset_level(struct hclge_dev *hdev, + unsigned long *addr) { enum hnae3_reset_type rst_level = HNAE3_NONE_RESET; - u32 rst_reg_val; - rst_reg_val = hclge_read_dev(&hdev->hw, HCLGE_MISC_RESET_STS_REG); - if (BIT(HCLGE_VECTOR0_GLOBALRESET_INT_B) & rst_reg_val) + /* return the highest priority reset level amongst all */ + if (test_bit(HNAE3_GLOBAL_RESET, addr)) rst_level = HNAE3_GLOBAL_RESET; - else if (BIT(HCLGE_VECTOR0_CORERESET_INT_B) & rst_reg_val) + else if (test_bit(HNAE3_CORE_RESET, addr)) rst_level = HNAE3_CORE_RESET; - else if (BIT(HCLGE_VECTOR0_IMPRESET_INT_B) & rst_reg_val) + else if (test_bit(HNAE3_IMP_RESET, addr)) rst_level = HNAE3_IMP_RESET; + else if (test_bit(HNAE3_FUNC_RESET, addr)) + rst_level = HNAE3_FUNC_RESET; + + /* now, clear all other resets */ + clear_bit(HNAE3_GLOBAL_RESET, addr); + clear_bit(HNAE3_CORE_RESET, addr); + clear_bit(HNAE3_IMP_RESET, addr); + clear_bit(HNAE3_FUNC_RESET, addr); return rst_level; } +static void hclge_reset(struct hclge_dev *hdev) +{ + /* perform reset of the stack & ae device for a client */ + + hclge_notify_client(hdev, HNAE3_DOWN_CLIENT); + + if (!hclge_reset_wait(hdev)) { + rtnl_lock(); + hclge_notify_client(hdev, HNAE3_UNINIT_CLIENT); + hclge_reset_ae_dev(hdev->ae_dev); + hclge_notify_client(hdev, HNAE3_INIT_CLIENT); + rtnl_unlock(); + } else { + /* schedule again to check pending resets later */ + set_bit(hdev->reset_type, &hdev->reset_pending); + hclge_reset_task_schedule(hdev); + } + + hclge_notify_client(hdev, HNAE3_UP_CLIENT); +} + static void hclge_reset_event(struct hnae3_handle *handle, enum hnae3_reset_type reset) { @@ -2563,14 +2659,9 @@ static void hclge_reset_event(struct hnae3_handle *handle, case HNAE3_FUNC_RESET: case HNAE3_CORE_RESET: case HNAE3_GLOBAL_RESET: - if (test_bit(HCLGE_STATE_RESET_INT, &hdev->state)) { - dev_err(&hdev->pdev->dev, "Already in reset state"); - return; - } - hdev->reset_type = reset; - set_bit(HCLGE_STATE_RESET_INT, &hdev->state); - set_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state); - schedule_work(&hdev->service_task); + /* request reset & schedule reset task */ + set_bit(reset, &hdev->reset_request); + hclge_reset_task_schedule(hdev); break; default: dev_warn(&hdev->pdev->dev, "Unsupported reset event:%d", reset); @@ -2580,49 +2671,40 @@ static void hclge_reset_event(struct hnae3_handle *handle, static void hclge_reset_subtask(struct hclge_dev *hdev) { - bool do_reset; - - do_reset = hdev->reset_type != HNAE3_NONE_RESET; - - /* Reset is detected by interrupt */ - if (hdev->reset_type == HNAE3_NONE_RESET) - hdev->reset_type = hclge_detected_reset_event(hdev); - - if (hdev->reset_type == HNAE3_NONE_RESET) - return; - - switch (hdev->reset_type) { - case HNAE3_FUNC_RESET: - case HNAE3_CORE_RESET: - case HNAE3_GLOBAL_RESET: - case HNAE3_IMP_RESET: - hclge_notify_client(hdev, HNAE3_DOWN_CLIENT); + /* check if there is any ongoing reset in the hardware. This status can + * be checked from reset_pending. If there is then, we need to wait for + * hardware to complete reset. + * a. If we are able to figure out in reasonable time that hardware + * has fully resetted then, we can proceed with driver, client + * reset. + * b. else, we can come back later to check this status so re-sched + * now. + */ + hdev->reset_type = hclge_get_reset_level(hdev, &hdev->reset_pending); + if (hdev->reset_type != HNAE3_NONE_RESET) + hclge_reset(hdev); - if (do_reset) - hclge_do_reset(hdev, hdev->reset_type); - else - set_bit(HCLGE_STATE_RESET_INT, &hdev->state); + /* check if we got any *new* reset requests to be honored */ + hdev->reset_type = hclge_get_reset_level(hdev, &hdev->reset_request); + if (hdev->reset_type != HNAE3_NONE_RESET) + hclge_do_reset(hdev); - if (!hclge_reset_wait(hdev)) { - hclge_notify_client(hdev, HNAE3_UNINIT_CLIENT); - hclge_reset_ae_dev(hdev->ae_dev); - hclge_notify_client(hdev, HNAE3_INIT_CLIENT); - clear_bit(HCLGE_STATE_RESET_INT, &hdev->state); - } - hclge_notify_client(hdev, HNAE3_UP_CLIENT); - break; - default: - dev_err(&hdev->pdev->dev, "Unsupported reset type:%d\n", - hdev->reset_type); - break; - } hdev->reset_type = HNAE3_NONE_RESET; } -static void hclge_misc_irq_service_task(struct hclge_dev *hdev) +static void hclge_reset_service_task(struct work_struct *work) { + struct hclge_dev *hdev = + container_of(work, struct hclge_dev, rst_service_task); + + if (test_and_set_bit(HCLGE_STATE_RST_HANDLING, &hdev->state)) + return; + + clear_bit(HCLGE_STATE_RST_SERVICE_SCHED, &hdev->state); + hclge_reset_subtask(hdev); - hclge_enable_vector(&hdev->misc_vector, true); + + clear_bit(HCLGE_STATE_RST_HANDLING, &hdev->state); } static void hclge_service_task(struct work_struct *work) @@ -2630,7 +2712,6 @@ static void hclge_service_task(struct work_struct *work) struct hclge_dev *hdev = container_of(work, struct hclge_dev, service_task); - hclge_misc_irq_service_task(hdev); hclge_update_speed_duplex(hdev); hclge_update_link_status(hdev); hclge_update_stats_for_all(hdev); @@ -4661,6 +4742,8 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev) hdev->pdev = pdev; hdev->ae_dev = ae_dev; hdev->reset_type = HNAE3_NONE_RESET; + hdev->reset_request = 0; + hdev->reset_pending = 0; ae_dev->priv = hdev; ret = hclge_pci_init(hdev); @@ -4772,12 +4855,15 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev) timer_setup(&hdev->service_timer, hclge_service_timer, 0); INIT_WORK(&hdev->service_task, hclge_service_task); + INIT_WORK(&hdev->rst_service_task, hclge_reset_service_task); /* Enable MISC vector(vector0) */ hclge_enable_vector(&hdev->misc_vector, true); set_bit(HCLGE_STATE_SERVICE_INITED, &hdev->state); set_bit(HCLGE_STATE_DOWN, &hdev->state); + clear_bit(HCLGE_STATE_RST_SERVICE_SCHED, &hdev->state); + clear_bit(HCLGE_STATE_RST_HANDLING, &hdev->state); pr_info("%s driver initialization finished.\n", HCLGE_DRIVER_NAME); return 0; @@ -4889,14 +4975,16 @@ static void hclge_uninit_ae_dev(struct hnae3_ae_dev *ae_dev) del_timer_sync(&hdev->service_timer); if (hdev->service_task.func) cancel_work_sync(&hdev->service_task); + if (hdev->rst_service_task.func) + cancel_work_sync(&hdev->rst_service_task); if (mac->phydev) mdiobus_unregister(mac->mdio_bus); /* Disable MISC vector(vector0) */ hclge_enable_vector(&hdev->misc_vector, false); - hclge_free_vector(hdev, 0); hclge_destroy_cmd_queue(&hdev->hw); + hclge_misc_irq_uninit(hdev); hclge_pci_uninit(hdev); ae_dev->priv = NULL; } diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h index 7027814ea5d7..aacec438b933 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h @@ -99,12 +99,19 @@ enum HCLGE_DEV_STATE { HCLGE_STATE_REMOVING, HCLGE_STATE_SERVICE_INITED, HCLGE_STATE_SERVICE_SCHED, + HCLGE_STATE_RST_SERVICE_SCHED, + HCLGE_STATE_RST_HANDLING, HCLGE_STATE_MBX_HANDLING, HCLGE_STATE_MBX_IRQ, - HCLGE_STATE_RESET_INT, HCLGE_STATE_MAX }; +enum hclge_evt_cause { + HCLGE_VECTOR0_EVENT_RST, + HCLGE_VECTOR0_EVENT_MBX, + HCLGE_VECTOR0_EVENT_OTHER, +}; + #define HCLGE_MPF_ENBALE 1 struct hclge_caps { u16 num_tqp; @@ -420,6 +427,8 @@ struct hclge_dev { unsigned long state; enum hnae3_reset_type reset_type; + unsigned long reset_request; /* reset has been requested */ + unsigned long reset_pending; /* client rst is pending to be served */ u32 fw_version; u16 num_vmdq_vport; /* Num vmdq vport this PF has set up */ u16 num_tqps; /* Num task queue pairs of this PF */ @@ -469,6 +478,7 @@ struct hclge_dev { unsigned long service_timer_previous; struct timer_list service_timer; struct work_struct service_task; + struct work_struct rst_service_task; bool cur_promisc; int num_alloc_vfs; /* Actual number of VFs allocated */ diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 62a18914f00f..7737a05c717c 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -9101,9 +9101,11 @@ static int parse_tc_actions(struct ixgbe_adapter *adapter, /* Redirect to a VF or a offloaded macvlan */ if (is_tcf_mirred_egress_redirect(a)) { - int ifindex = tcf_mirred_ifindex(a); + struct net_device *dev = tcf_mirred_dev(a); - err = handle_redirect_action(adapter, ifindex, queue, + if (!dev) + return -EINVAL; + err = handle_redirect_action(adapter, dev->ifindex, queue, action); if (err == 0) return err; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index d2b057a3e512..0f5c012de52e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4308,9 +4308,6 @@ static void mlx5e_nic_cleanup(struct mlx5e_priv *priv) { mlx5e_ipsec_cleanup(priv); mlx5e_vxlan_cleanup(priv); - - if (priv->channels.params.xdp_prog) - bpf_prog_put(priv->channels.params.xdp_prog); } static int mlx5e_init_nic_rx(struct mlx5e_priv *priv) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 55979ec2e88a..3e03d2e8f96a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -1982,11 +1982,10 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts, } if (is_tcf_mirred_egress_redirect(a)) { - int ifindex = tcf_mirred_ifindex(a); struct net_device *out_dev; struct mlx5e_priv *out_priv; - out_dev = __dev_get_by_index(dev_net(priv->netdev), ifindex); + out_dev = tcf_mirred_dev(a); if (switchdev_port_same_parent_id(priv->netdev, out_dev)) { @@ -1996,7 +1995,7 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts, rpriv = out_priv->ppriv; attr->out_rep = rpriv->rep; } else if (encap) { - parse_attr->mirred_ifindex = ifindex; + parse_attr->mirred_ifindex = out_dev->ifindex; parse_attr->tun_info = *info; attr->parse_attr = parse_attr; attr->action |= MLX5_FLOW_CONTEXT_ACTION_ENCAP | diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c index 2d0897b7d860..3b9c8a0437bf 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c @@ -1571,14 +1571,11 @@ mlxsw_sp_port_add_cls_matchall_mirror(struct mlxsw_sp_port *mlxsw_sp_port, const struct tc_action *a, bool ingress) { - struct net *net = dev_net(mlxsw_sp_port->dev); enum mlxsw_sp_span_type span_type; struct mlxsw_sp_port *to_port; struct net_device *to_dev; - int ifindex; - ifindex = tcf_mirred_ifindex(a); - to_dev = __dev_get_by_index(net, ifindex); + to_dev = tcf_mirred_dev(a); if (!to_dev) { netdev_err(mlxsw_sp_port->dev, "Could not find requested device\n"); return -EINVAL; @@ -1838,6 +1835,54 @@ static int mlxsw_sp_setup_tc(struct net_device *dev, enum tc_setup_type type, } } + +static int mlxsw_sp_feature_hw_tc(struct net_device *dev, bool enable) +{ + struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev); + + if (!enable && (mlxsw_sp_port->acl_rule_count || + !list_empty(&mlxsw_sp_port->mall_tc_list))) { + netdev_err(dev, "Active offloaded tc filters, can't turn hw_tc_offload off\n"); + return -EINVAL; + } + return 0; +} + +typedef int (*mlxsw_sp_feature_handler)(struct net_device *dev, bool enable); + +static int mlxsw_sp_handle_feature(struct net_device *dev, + netdev_features_t wanted_features, + netdev_features_t feature, + mlxsw_sp_feature_handler feature_handler) +{ + netdev_features_t changes = wanted_features ^ dev->features; + bool enable = !!(wanted_features & feature); + int err; + + if (!(changes & feature)) + return 0; + + err = feature_handler(dev, enable); + if (err) { + netdev_err(dev, "%s feature %pNF failed, err %d\n", + enable ? "Enable" : "Disable", &feature, err); + return err; + } + + if (enable) + dev->features |= feature; + else + dev->features &= ~feature; + + return 0; +} +static int mlxsw_sp_set_features(struct net_device *dev, + netdev_features_t features) +{ + return mlxsw_sp_handle_feature(dev, features, NETIF_F_HW_TC, + mlxsw_sp_feature_hw_tc); +} + static const struct net_device_ops mlxsw_sp_port_netdev_ops = { .ndo_open = mlxsw_sp_port_open, .ndo_stop = mlxsw_sp_port_stop, @@ -1852,6 +1897,7 @@ static const struct net_device_ops mlxsw_sp_port_netdev_ops = { .ndo_vlan_rx_add_vid = mlxsw_sp_port_add_vid, .ndo_vlan_rx_kill_vid = mlxsw_sp_port_kill_vid, .ndo_get_phys_port_name = mlxsw_sp_port_get_phys_port_name, + .ndo_set_features = mlxsw_sp_set_features, }; static void mlxsw_sp_port_get_drvinfo(struct net_device *dev, diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h index 432ab9b12b7f..a0adcd886589 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h @@ -270,6 +270,7 @@ struct mlxsw_sp_port { struct mlxsw_sp_port_sample *sample; struct list_head vlans_list; struct mlxsw_sp_qdisc root_qdisc; + unsigned acl_rule_count; }; static inline bool diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c index 2f0e57857ea4..42e8a36b9b95 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c @@ -92,7 +92,6 @@ static int mlxsw_sp_flower_parse_actions(struct mlxsw_sp *mlxsw_sp, if (err) return err; } else if (is_tcf_mirred_egress_redirect(a)) { - int ifindex = tcf_mirred_ifindex(a); struct net_device *out_dev; struct mlxsw_sp_fid *fid; u16 fid_index; @@ -104,7 +103,7 @@ static int mlxsw_sp_flower_parse_actions(struct mlxsw_sp *mlxsw_sp, if (err) return err; - out_dev = __dev_get_by_index(dev_net(dev), ifindex); + out_dev = tcf_mirred_dev(a); if (out_dev == dev) out_dev = NULL; @@ -424,6 +423,7 @@ int mlxsw_sp_flower_replace(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress, goto err_rule_add; mlxsw_sp_acl_ruleset_put(mlxsw_sp, ruleset); + mlxsw_sp_port->acl_rule_count++; return 0; err_rule_add: @@ -455,6 +455,7 @@ void mlxsw_sp_flower_destroy(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress, } mlxsw_sp_acl_ruleset_put(mlxsw_sp, ruleset); + mlxsw_sp_port->acl_rule_count--; } int mlxsw_sp_flower_stats(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress, diff --git a/drivers/net/ethernet/netronome/nfp/Makefile b/drivers/net/ethernet/netronome/nfp/Makefile index 24c4408b5734..6e5ef984398b 100644 --- a/drivers/net/ethernet/netronome/nfp/Makefile +++ b/drivers/net/ethernet/netronome/nfp/Makefile @@ -22,6 +22,7 @@ nfp-objs := \ nfp_hwmon.o \ nfp_main.o \ nfp_net_common.o \ + nfp_net_debugdump.o \ nfp_net_ethtool.o \ nfp_net_main.o \ nfp_net_repr.o \ diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c index 995e95410b11..3419ad495962 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c @@ -1,5 +1,5 @@ /* - * Copyright (C) 2016 Netronome Systems, Inc. + * Copyright (C) 2016-2017 Netronome Systems, Inc. * * This software is dual licensed under the GNU General License Version 2, * June 1991 as shown in the file COPYING in the top-level directory of this @@ -66,12 +66,6 @@ next2 = nfp_meta_next(next)) static bool -nfp_meta_has_next(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) -{ - return meta->l.next != &nfp_prog->insns; -} - -static bool nfp_meta_has_prev(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) { return meta->l.prev != &nfp_prog->insns; @@ -102,7 +96,7 @@ nfp_prog_offset_to_index(struct nfp_prog *nfp_prog, unsigned int offset) /* --- Emitters --- */ static void __emit_cmd(struct nfp_prog *nfp_prog, enum cmd_tgt_map op, - u8 mode, u8 xfer, u8 areg, u8 breg, u8 size, bool sync) + u8 mode, u8 xfer, u8 areg, u8 breg, u8 size, bool sync, bool indir) { enum cmd_ctx_swap ctx; u64 insn; @@ -120,14 +114,15 @@ __emit_cmd(struct nfp_prog *nfp_prog, enum cmd_tgt_map op, FIELD_PREP(OP_CMD_CNT, size) | FIELD_PREP(OP_CMD_SIG, sync) | FIELD_PREP(OP_CMD_TGT_CMD, cmd_tgt_act[op].tgt_cmd) | + FIELD_PREP(OP_CMD_INDIR, indir) | FIELD_PREP(OP_CMD_MODE, mode); nfp_prog_push(nfp_prog, insn); } static void -emit_cmd(struct nfp_prog *nfp_prog, enum cmd_tgt_map op, - u8 mode, u8 xfer, swreg lreg, swreg rreg, u8 size, bool sync) +emit_cmd_any(struct nfp_prog *nfp_prog, enum cmd_tgt_map op, u8 mode, u8 xfer, + swreg lreg, swreg rreg, u8 size, bool sync, bool indir) { struct nfp_insn_re_regs reg; int err; @@ -148,7 +143,22 @@ emit_cmd(struct nfp_prog *nfp_prog, enum cmd_tgt_map op, return; } - __emit_cmd(nfp_prog, op, mode, xfer, reg.areg, reg.breg, size, sync); + __emit_cmd(nfp_prog, op, mode, xfer, reg.areg, reg.breg, size, sync, + indir); +} + +static void +emit_cmd(struct nfp_prog *nfp_prog, enum cmd_tgt_map op, u8 mode, u8 xfer, + swreg lreg, swreg rreg, u8 size, bool sync) +{ + emit_cmd_any(nfp_prog, op, mode, xfer, lreg, rreg, size, sync, false); +} + +static void +emit_cmd_indir(struct nfp_prog *nfp_prog, enum cmd_tgt_map op, u8 mode, u8 xfer, + swreg lreg, swreg rreg, u8 size, bool sync) +{ + emit_cmd_any(nfp_prog, op, mode, xfer, lreg, rreg, size, sync, true); } static void @@ -230,9 +240,11 @@ emit_immed(struct nfp_prog *nfp_prog, swreg dst, u16 imm, return; } - __emit_immed(nfp_prog, reg.areg, reg.breg, imm >> 8, width, - invert, shift, reg.wr_both, - reg.dst_lmextn, reg.src_lmextn); + /* Use reg.dst when destination is No-Dest. */ + __emit_immed(nfp_prog, + swreg_type(dst) == NN_REG_NONE ? reg.dst : reg.areg, + reg.breg, imm >> 8, width, invert, shift, + reg.wr_both, reg.dst_lmextn, reg.src_lmextn); } static void @@ -510,6 +522,147 @@ static void wrp_reg_mov(struct nfp_prog *nfp_prog, u16 dst, u16 src) wrp_mov(nfp_prog, reg_both(dst), reg_b(src)); } +/* wrp_reg_subpart() - load @field_len bytes from @offset of @src, write the + * result to @dst from low end. + */ +static void +wrp_reg_subpart(struct nfp_prog *nfp_prog, swreg dst, swreg src, u8 field_len, + u8 offset) +{ + enum shf_sc sc = offset ? SHF_SC_R_SHF : SHF_SC_NONE; + u8 mask = (1 << field_len) - 1; + + emit_ld_field_any(nfp_prog, dst, mask, src, sc, offset * 8, true); +} + +/* NFP has Command Push Pull bus which supports bluk memory operations. */ +static int nfp_cpp_memcpy(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) +{ + bool descending_seq = meta->ldst_gather_len < 0; + s16 len = abs(meta->ldst_gather_len); + swreg src_base, off; + unsigned int i; + u8 xfer_num; + + off = re_load_imm_any(nfp_prog, meta->insn.off, imm_b(nfp_prog)); + src_base = reg_a(meta->insn.src_reg * 2); + xfer_num = round_up(len, 4) / 4; + + /* Setup PREV_ALU fields to override memory read length. */ + if (len > 32) + wrp_immed(nfp_prog, reg_none(), + CMD_OVE_LEN | FIELD_PREP(CMD_OV_LEN, xfer_num - 1)); + + /* Memory read from source addr into transfer-in registers. */ + emit_cmd_any(nfp_prog, CMD_TGT_READ32_SWAP, CMD_MODE_32b, 0, src_base, + off, xfer_num - 1, true, len > 32); + + /* Move from transfer-in to transfer-out. */ + for (i = 0; i < xfer_num; i++) + wrp_mov(nfp_prog, reg_xfer(i), reg_xfer(i)); + + off = re_load_imm_any(nfp_prog, meta->paired_st->off, imm_b(nfp_prog)); + + if (len <= 8) { + /* Use single direct_ref write8. */ + emit_cmd(nfp_prog, CMD_TGT_WRITE8_SWAP, CMD_MODE_32b, 0, + reg_a(meta->paired_st->dst_reg * 2), off, len - 1, + true); + } else if (len <= 32 && IS_ALIGNED(len, 4)) { + /* Use single direct_ref write32. */ + emit_cmd(nfp_prog, CMD_TGT_WRITE32_SWAP, CMD_MODE_32b, 0, + reg_a(meta->paired_st->dst_reg * 2), off, xfer_num - 1, + true); + } else if (len <= 32) { + /* Use single indirect_ref write8. */ + wrp_immed(nfp_prog, reg_none(), + CMD_OVE_LEN | FIELD_PREP(CMD_OV_LEN, len - 1)); + emit_cmd_indir(nfp_prog, CMD_TGT_WRITE8_SWAP, CMD_MODE_32b, 0, + reg_a(meta->paired_st->dst_reg * 2), off, + len - 1, true); + } else if (IS_ALIGNED(len, 4)) { + /* Use single indirect_ref write32. */ + wrp_immed(nfp_prog, reg_none(), + CMD_OVE_LEN | FIELD_PREP(CMD_OV_LEN, xfer_num - 1)); + emit_cmd_indir(nfp_prog, CMD_TGT_WRITE32_SWAP, CMD_MODE_32b, 0, + reg_a(meta->paired_st->dst_reg * 2), off, + xfer_num - 1, true); + } else if (len <= 40) { + /* Use one direct_ref write32 to write the first 32-bytes, then + * another direct_ref write8 to write the remaining bytes. + */ + emit_cmd(nfp_prog, CMD_TGT_WRITE32_SWAP, CMD_MODE_32b, 0, + reg_a(meta->paired_st->dst_reg * 2), off, 7, + true); + + off = re_load_imm_any(nfp_prog, meta->paired_st->off + 32, + imm_b(nfp_prog)); + emit_cmd(nfp_prog, CMD_TGT_WRITE8_SWAP, CMD_MODE_32b, 8, + reg_a(meta->paired_st->dst_reg * 2), off, len - 33, + true); + } else { + /* Use one indirect_ref write32 to write 4-bytes aligned length, + * then another direct_ref write8 to write the remaining bytes. + */ + u8 new_off; + + wrp_immed(nfp_prog, reg_none(), + CMD_OVE_LEN | FIELD_PREP(CMD_OV_LEN, xfer_num - 2)); + emit_cmd_indir(nfp_prog, CMD_TGT_WRITE32_SWAP, CMD_MODE_32b, 0, + reg_a(meta->paired_st->dst_reg * 2), off, + xfer_num - 2, true); + new_off = meta->paired_st->off + (xfer_num - 1) * 4; + off = re_load_imm_any(nfp_prog, new_off, imm_b(nfp_prog)); + emit_cmd(nfp_prog, CMD_TGT_WRITE8_SWAP, CMD_MODE_32b, + xfer_num - 1, reg_a(meta->paired_st->dst_reg * 2), off, + (len & 0x3) - 1, true); + } + + /* TODO: The following extra load is to make sure data flow be identical + * before and after we do memory copy optimization. + * + * The load destination register is not guaranteed to be dead, so we + * need to make sure it is loaded with the value the same as before + * this transformation. + * + * These extra loads could be removed once we have accurate register + * usage information. + */ + if (descending_seq) + xfer_num = 0; + else if (BPF_SIZE(meta->insn.code) != BPF_DW) + xfer_num = xfer_num - 1; + else + xfer_num = xfer_num - 2; + + switch (BPF_SIZE(meta->insn.code)) { + case BPF_B: + wrp_reg_subpart(nfp_prog, reg_both(meta->insn.dst_reg * 2), + reg_xfer(xfer_num), 1, + IS_ALIGNED(len, 4) ? 3 : (len & 3) - 1); + break; + case BPF_H: + wrp_reg_subpart(nfp_prog, reg_both(meta->insn.dst_reg * 2), + reg_xfer(xfer_num), 2, (len & 3) ^ 2); + break; + case BPF_W: + wrp_mov(nfp_prog, reg_both(meta->insn.dst_reg * 2), + reg_xfer(0)); + break; + case BPF_DW: + wrp_mov(nfp_prog, reg_both(meta->insn.dst_reg * 2), + reg_xfer(xfer_num)); + wrp_mov(nfp_prog, reg_both(meta->insn.dst_reg * 2 + 1), + reg_xfer(xfer_num + 1)); + break; + } + + if (BPF_SIZE(meta->insn.code) != BPF_DW) + wrp_immed(nfp_prog, reg_both(meta->insn.dst_reg * 2 + 1), 0); + + return 0; +} + static int data_ld(struct nfp_prog *nfp_prog, swreg offset, u8 dst_gpr, int size) { @@ -975,9 +1128,6 @@ wrp_test_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, { const struct bpf_insn *insn = &meta->insn; - if (insn->off < 0) /* TODO */ - return -EOPNOTSUPP; - wrp_test_reg_one(nfp_prog, insn->dst_reg * 2, alu_op, insn->src_reg * 2, br_mask, insn->off); wrp_test_reg_one(nfp_prog, insn->dst_reg * 2 + 1, alu_op, @@ -995,9 +1145,6 @@ wrp_cmp_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, u8 reg = insn->dst_reg * 2; swreg tmp_reg; - if (insn->off < 0) /* TODO */ - return -EOPNOTSUPP; - tmp_reg = ur_load_imm_any(nfp_prog, imm & ~0U, imm_b(nfp_prog)); if (!swap) emit_alu(nfp_prog, reg_none(), reg_a(reg), ALU_OP_SUB, tmp_reg); @@ -1027,9 +1174,6 @@ wrp_cmp_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, areg = insn->dst_reg * 2; breg = insn->src_reg * 2; - if (insn->off < 0) /* TODO */ - return -EOPNOTSUPP; - if (swap) { areg ^= breg; breg ^= areg; @@ -1494,6 +1638,9 @@ static int mem_ldx(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, unsigned int size) { + if (meta->ldst_gather_len) + return nfp_cpp_memcpy(nfp_prog, meta); + if (meta->ptr.type == PTR_TO_CTX) { if (nfp_prog->type == BPF_PROG_TYPE_XDP) return mem_ldx_xdp(nfp_prog, meta, size); @@ -1630,8 +1777,6 @@ static int mem_stx8(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) static int jump(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) { - if (meta->insn.off < 0) /* TODO */ - return -EOPNOTSUPP; emit_br(nfp_prog, BR_UNC, meta->insn.off, 0); return 0; @@ -1646,9 +1791,6 @@ static int jeq_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) or1 = reg_a(insn->dst_reg * 2); or2 = reg_b(insn->dst_reg * 2 + 1); - if (insn->off < 0) /* TODO */ - return -EOPNOTSUPP; - if (imm & ~0U) { tmp_reg = ur_load_imm_any(nfp_prog, imm & ~0U, imm_b(nfp_prog)); emit_alu(nfp_prog, imm_a(nfp_prog), @@ -1695,9 +1837,6 @@ static int jset_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) u64 imm = insn->imm; /* sign extend */ swreg tmp_reg; - if (insn->off < 0) /* TODO */ - return -EOPNOTSUPP; - if (!imm) { meta->skip = true; return 0; @@ -1726,9 +1865,6 @@ static int jne_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) u64 imm = insn->imm; /* sign extend */ swreg tmp_reg; - if (insn->off < 0) /* TODO */ - return -EOPNOTSUPP; - if (!imm) { emit_alu(nfp_prog, reg_none(), reg_a(insn->dst_reg * 2), ALU_OP_OR, reg_b(insn->dst_reg * 2 + 1)); @@ -1753,9 +1889,6 @@ static int jeq_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) { const struct bpf_insn *insn = &meta->insn; - if (insn->off < 0) /* TODO */ - return -EOPNOTSUPP; - emit_alu(nfp_prog, imm_a(nfp_prog), reg_a(insn->dst_reg * 2), ALU_OP_XOR, reg_b(insn->src_reg * 2)); emit_alu(nfp_prog, imm_b(nfp_prog), reg_a(insn->dst_reg * 2 + 1), @@ -1887,17 +2020,22 @@ static void br_set_offset(u64 *instr, u16 offset) /* --- Assembler logic --- */ static int nfp_fixup_branches(struct nfp_prog *nfp_prog) { - struct nfp_insn_meta *meta, *next; - u32 off, br_idx; - u32 idx; + struct nfp_insn_meta *meta, *jmp_dst; + u32 idx, br_idx; - nfp_for_each_insn_walk2(nfp_prog, meta, next) { + list_for_each_entry(meta, &nfp_prog->insns, l) { if (meta->skip) continue; if (BPF_CLASS(meta->insn.code) != BPF_JMP) continue; - br_idx = nfp_prog_offset_to_index(nfp_prog, next->off) - 1; + if (list_is_last(&meta->l, &nfp_prog->insns)) + idx = nfp_prog->last_bpf_off; + else + idx = list_next_entry(meta, l)->off - 1; + + br_idx = nfp_prog_offset_to_index(nfp_prog, idx); + if (!nfp_is_br(nfp_prog->prog[br_idx])) { pr_err("Fixup found block not ending in branch %d %02x %016llx!!\n", br_idx, meta->insn.code, nfp_prog->prog[br_idx]); @@ -1907,23 +2045,14 @@ static int nfp_fixup_branches(struct nfp_prog *nfp_prog) if (FIELD_GET(OP_BR_SPECIAL, nfp_prog->prog[br_idx])) continue; - /* Find the target offset in assembler realm */ - off = meta->insn.off; - if (!off) { - pr_err("Fixup found zero offset!!\n"); + if (!meta->jmp_dst) { + pr_err("Non-exit jump doesn't have destination info recorded!!\n"); return -ELOOP; } - while (off && nfp_meta_has_next(nfp_prog, next)) { - next = nfp_meta_next(next); - off--; - } - if (off) { - pr_err("Fixup found too large jump!! %d\n", off); - return -ELOOP; - } + jmp_dst = meta->jmp_dst; - if (next->skip) { + if (jmp_dst->skip) { pr_err("Branch landing on removed instruction!!\n"); return -ELOOP; } @@ -1932,7 +2061,7 @@ static int nfp_fixup_branches(struct nfp_prog *nfp_prog) idx <= br_idx; idx++) { if (!nfp_is_br(nfp_prog->prog[idx])) continue; - br_set_offset(&nfp_prog->prog[idx], next->off); + br_set_offset(&nfp_prog->prog[idx], jmp_dst->off); } } @@ -2105,6 +2234,8 @@ static int nfp_translate(struct nfp_prog *nfp_prog) nfp_prog->n_translated++; } + nfp_prog->last_bpf_off = nfp_prog_current_offset(nfp_prog) - 1; + nfp_outro(nfp_prog); if (nfp_prog->error) return nfp_prog->error; @@ -2173,6 +2304,9 @@ static void nfp_bpf_opt_ld_mask(struct nfp_prog *nfp_prog) if (next.src_reg || next.dst_reg) continue; + if (meta2->flags & FLAG_INSN_IS_JUMP_DST) + continue; + meta2->skip = true; } } @@ -2209,17 +2343,258 @@ static void nfp_bpf_opt_ld_shift(struct nfp_prog *nfp_prog) if (next1.imm != 0x20 || next2.imm != 0x20) continue; + if (meta2->flags & FLAG_INSN_IS_JUMP_DST || + meta3->flags & FLAG_INSN_IS_JUMP_DST) + continue; + meta2->skip = true; meta3->skip = true; } } +/* load/store pair that forms memory copy sould look like the following: + * + * ld_width R, [addr_src + offset_src] + * st_width [addr_dest + offset_dest], R + * + * The destination register of load and source register of store should + * be the same, load and store should also perform at the same width. + * If either of addr_src or addr_dest is stack pointer, we don't do the + * CPP optimization as stack is modelled by registers on NFP. + */ +static bool +curr_pair_is_memcpy(struct nfp_insn_meta *ld_meta, + struct nfp_insn_meta *st_meta) +{ + struct bpf_insn *ld = &ld_meta->insn; + struct bpf_insn *st = &st_meta->insn; + + if (!is_mbpf_load(ld_meta) || !is_mbpf_store(st_meta)) + return false; + + if (ld_meta->ptr.type != PTR_TO_PACKET) + return false; + + if (st_meta->ptr.type != PTR_TO_PACKET) + return false; + + if (BPF_SIZE(ld->code) != BPF_SIZE(st->code)) + return false; + + if (ld->dst_reg != st->src_reg) + return false; + + /* There is jump to the store insn in this pair. */ + if (st_meta->flags & FLAG_INSN_IS_JUMP_DST) + return false; + + return true; +} + +/* Currently, we only support chaining load/store pairs if: + * + * - Their address base registers are the same. + * - Their address offsets are in the same order. + * - They operate at the same memory width. + * - There is no jump into the middle of them. + */ +static bool +curr_pair_chain_with_previous(struct nfp_insn_meta *ld_meta, + struct nfp_insn_meta *st_meta, + struct bpf_insn *prev_ld, + struct bpf_insn *prev_st) +{ + u8 prev_size, curr_size, prev_ld_base, prev_st_base, prev_ld_dst; + struct bpf_insn *ld = &ld_meta->insn; + struct bpf_insn *st = &st_meta->insn; + s16 prev_ld_off, prev_st_off; + + /* This pair is the start pair. */ + if (!prev_ld) + return true; + + prev_size = BPF_LDST_BYTES(prev_ld); + curr_size = BPF_LDST_BYTES(ld); + prev_ld_base = prev_ld->src_reg; + prev_st_base = prev_st->dst_reg; + prev_ld_dst = prev_ld->dst_reg; + prev_ld_off = prev_ld->off; + prev_st_off = prev_st->off; + + if (ld->dst_reg != prev_ld_dst) + return false; + + if (ld->src_reg != prev_ld_base || st->dst_reg != prev_st_base) + return false; + + if (curr_size != prev_size) + return false; + + /* There is jump to the head of this pair. */ + if (ld_meta->flags & FLAG_INSN_IS_JUMP_DST) + return false; + + /* Both in ascending order. */ + if (prev_ld_off + prev_size == ld->off && + prev_st_off + prev_size == st->off) + return true; + + /* Both in descending order. */ + if (ld->off + curr_size == prev_ld_off && + st->off + curr_size == prev_st_off) + return true; + + return false; +} + +/* Return TRUE if cross memory access happens. Cross memory access means + * store area is overlapping with load area that a later load might load + * the value from previous store, for this case we can't treat the sequence + * as an memory copy. + */ +static bool +cross_mem_access(struct bpf_insn *ld, struct nfp_insn_meta *head_ld_meta, + struct nfp_insn_meta *head_st_meta) +{ + s16 head_ld_off, head_st_off, ld_off; + + /* Different pointer types does not overlap. */ + if (head_ld_meta->ptr.type != head_st_meta->ptr.type) + return false; + + /* load and store are both PTR_TO_PACKET, check ID info. */ + if (head_ld_meta->ptr.id != head_st_meta->ptr.id) + return true; + + /* Canonicalize the offsets. Turn all of them against the original + * base register. + */ + head_ld_off = head_ld_meta->insn.off + head_ld_meta->ptr.off; + head_st_off = head_st_meta->insn.off + head_st_meta->ptr.off; + ld_off = ld->off + head_ld_meta->ptr.off; + + /* Ascending order cross. */ + if (ld_off > head_ld_off && + head_ld_off < head_st_off && ld_off >= head_st_off) + return true; + + /* Descending order cross. */ + if (ld_off < head_ld_off && + head_ld_off > head_st_off && ld_off <= head_st_off) + return true; + + return false; +} + +/* This pass try to identify the following instructoin sequences. + * + * load R, [regA + offA] + * store [regB + offB], R + * load R, [regA + offA + const_imm_A] + * store [regB + offB + const_imm_A], R + * load R, [regA + offA + 2 * const_imm_A] + * store [regB + offB + 2 * const_imm_A], R + * ... + * + * Above sequence is typically generated by compiler when lowering + * memcpy. NFP prefer using CPP instructions to accelerate it. + */ +static void nfp_bpf_opt_ldst_gather(struct nfp_prog *nfp_prog) +{ + struct nfp_insn_meta *head_ld_meta = NULL; + struct nfp_insn_meta *head_st_meta = NULL; + struct nfp_insn_meta *meta1, *meta2; + struct bpf_insn *prev_ld = NULL; + struct bpf_insn *prev_st = NULL; + u8 count = 0; + + nfp_for_each_insn_walk2(nfp_prog, meta1, meta2) { + struct bpf_insn *ld = &meta1->insn; + struct bpf_insn *st = &meta2->insn; + + /* Reset record status if any of the following if true: + * - The current insn pair is not load/store. + * - The load/store pair doesn't chain with previous one. + * - The chained load/store pair crossed with previous pair. + * - The chained load/store pair has a total size of memory + * copy beyond 128 bytes which is the maximum length a + * single NFP CPP command can transfer. + */ + if (!curr_pair_is_memcpy(meta1, meta2) || + !curr_pair_chain_with_previous(meta1, meta2, prev_ld, + prev_st) || + (head_ld_meta && (cross_mem_access(ld, head_ld_meta, + head_st_meta) || + head_ld_meta->ldst_gather_len >= 128))) { + if (!count) + continue; + + if (count > 1) { + s16 prev_ld_off = prev_ld->off; + s16 prev_st_off = prev_st->off; + s16 head_ld_off = head_ld_meta->insn.off; + + if (prev_ld_off < head_ld_off) { + head_ld_meta->insn.off = prev_ld_off; + head_st_meta->insn.off = prev_st_off; + head_ld_meta->ldst_gather_len = + -head_ld_meta->ldst_gather_len; + } + + head_ld_meta->paired_st = &head_st_meta->insn; + head_st_meta->skip = true; + } else { + head_ld_meta->ldst_gather_len = 0; + } + + /* If the chain is ended by an load/store pair then this + * could serve as the new head of the the next chain. + */ + if (curr_pair_is_memcpy(meta1, meta2)) { + head_ld_meta = meta1; + head_st_meta = meta2; + head_ld_meta->ldst_gather_len = + BPF_LDST_BYTES(ld); + meta1 = nfp_meta_next(meta1); + meta2 = nfp_meta_next(meta2); + prev_ld = ld; + prev_st = st; + count = 1; + } else { + head_ld_meta = NULL; + head_st_meta = NULL; + prev_ld = NULL; + prev_st = NULL; + count = 0; + } + + continue; + } + + if (!head_ld_meta) { + head_ld_meta = meta1; + head_st_meta = meta2; + } else { + meta1->skip = true; + meta2->skip = true; + } + + head_ld_meta->ldst_gather_len += BPF_LDST_BYTES(ld); + meta1 = nfp_meta_next(meta1); + meta2 = nfp_meta_next(meta2); + prev_ld = ld; + prev_st = st; + count++; + } +} + static int nfp_bpf_optimize(struct nfp_prog *nfp_prog) { nfp_bpf_opt_reg_init(nfp_prog); nfp_bpf_opt_ld_mask(nfp_prog); nfp_bpf_opt_ld_shift(nfp_prog); + nfp_bpf_opt_ldst_gather(nfp_prog); return 0; } diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c index e379b78e86ef..54bfd7846f6d 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/main.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c @@ -82,12 +82,6 @@ static const char *nfp_bpf_extra_cap(struct nfp_app *app, struct nfp_net *nn) return nfp_net_ebpf_capable(nn) ? "BPF" : ""; } -static void nfp_bpf_vnic_free(struct nfp_app *app, struct nfp_net *nn) -{ - if (nn->dp.bpf_offload_xdp) - nfp_bpf_xdp_offload(app, nn, NULL); -} - static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type, void *type_data, void *cb_priv) { @@ -168,7 +162,6 @@ const struct nfp_app_type app_bpf = { .extra_cap = nfp_bpf_extra_cap, .vnic_alloc = nfp_app_nic_vnic_alloc, - .vnic_free = nfp_bpf_vnic_free, .setup_tc = nfp_bpf_setup_tc, .tc_busy = nfp_bpf_tc_busy, diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h index 082a15f6dfb5..5884291ddba5 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/main.h +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h @@ -1,5 +1,5 @@ /* - * Copyright (C) 2016 Netronome Systems, Inc. + * Copyright (C) 2016-2017 Netronome Systems, Inc. * * This software is dual licensed under the GNU General License Version 2, * June 1991 as shown in the file COPYING in the top-level directory of this @@ -89,23 +89,37 @@ typedef int (*instr_cb_t)(struct nfp_prog *, struct nfp_insn_meta *); #define nfp_meta_next(meta) list_next_entry(meta, l) #define nfp_meta_prev(meta) list_prev_entry(meta, l) +#define FLAG_INSN_IS_JUMP_DST BIT(0) + /** * struct nfp_insn_meta - BPF instruction wrapper * @insn: BPF instruction * @ptr: pointer type for memory operations + * @ldst_gather_len: memcpy length gathered from load/store sequence + * @paired_st: the paired store insn at the head of the sequence * @ptr_not_const: pointer is not always constant + * @jmp_dst: destination info for jump instructions * @off: index of first generated machine instruction (in nfp_prog.prog) * @n: eBPF instruction number + * @flags: eBPF instruction extra optimization flags * @skip: skip this instruction (optimized out) * @double_cb: callback for second part of the instruction * @l: link on nfp_prog->insns list */ struct nfp_insn_meta { struct bpf_insn insn; - struct bpf_reg_state ptr; - bool ptr_not_const; + union { + struct { + struct bpf_reg_state ptr; + struct bpf_insn *paired_st; + s16 ldst_gather_len; + bool ptr_not_const; + }; + struct nfp_insn_meta *jmp_dst; + }; unsigned int off; unsigned short n; + unsigned short flags; bool skip; instr_cb_t double_cb; @@ -134,6 +148,16 @@ static inline u8 mbpf_mode(const struct nfp_insn_meta *meta) return BPF_MODE(meta->insn.code); } +static inline bool is_mbpf_load(const struct nfp_insn_meta *meta) +{ + return (meta->insn.code & ~BPF_SIZE_MASK) == (BPF_LDX | BPF_MEM); +} + +static inline bool is_mbpf_store(const struct nfp_insn_meta *meta) +{ + return (meta->insn.code & ~BPF_SIZE_MASK) == (BPF_STX | BPF_MEM); +} + /** * struct nfp_prog - nfp BPF program * @prog: machine code @@ -142,6 +166,7 @@ static inline u8 mbpf_mode(const struct nfp_insn_meta *meta) * @verifier_meta: temporary storage for verifier's insn meta * @type: BPF program type * @start_off: address of the first instruction in the memory + * @last_bpf_off: address of the last instruction translated from BPF * @tgt_out: jump target for normal exit * @tgt_abort: jump target for abort (e.g. access outside of packet buffer) * @tgt_done: jump target to get the next packet @@ -160,6 +185,7 @@ struct nfp_prog { enum bpf_prog_type type; unsigned int start_off; + unsigned int last_bpf_off; unsigned int tgt_out; unsigned int tgt_abort; unsigned int tgt_done; @@ -189,4 +215,7 @@ int nfp_bpf_translate(struct nfp_app *app, struct nfp_net *nn, struct bpf_prog *prog); int nfp_bpf_destroy(struct nfp_app *app, struct nfp_net *nn, struct bpf_prog *prog); +struct nfp_insn_meta * +nfp_bpf_goto_meta(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, + unsigned int insn_idx, unsigned int n_insns); #endif diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c index bc879aeb62d4..377976ce92dd 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c @@ -1,5 +1,5 @@ /* - * Copyright (C) 2016 Netronome Systems, Inc. + * Copyright (C) 2016-2017 Netronome Systems, Inc. * * This software is dual licensed under the GNU General License Version 2, * June 1991 as shown in the file COPYING in the top-level directory of this @@ -55,11 +55,10 @@ static int nfp_prog_prepare(struct nfp_prog *nfp_prog, const struct bpf_insn *prog, unsigned int cnt) { + struct nfp_insn_meta *meta; unsigned int i; for (i = 0; i < cnt; i++) { - struct nfp_insn_meta *meta; - meta = kzalloc(sizeof(*meta), GFP_KERNEL); if (!meta) return -ENOMEM; @@ -70,6 +69,24 @@ nfp_prog_prepare(struct nfp_prog *nfp_prog, const struct bpf_insn *prog, list_add_tail(&meta->l, &nfp_prog->insns); } + /* Another pass to record jump information. */ + list_for_each_entry(meta, &nfp_prog->insns, l) { + u64 code = meta->insn.code; + + if (BPF_CLASS(code) == BPF_JMP && BPF_OP(code) != BPF_EXIT && + BPF_OP(code) != BPF_CALL) { + struct nfp_insn_meta *dst_meta; + unsigned short dst_indx; + + dst_indx = meta->n + 1 + meta->insn.off; + dst_meta = nfp_bpf_goto_meta(nfp_prog, meta, dst_indx, + cnt); + + meta->jmp_dst = dst_meta; + dst_meta->flags |= FLAG_INSN_IS_JUMP_DST; + } + } + return 0; } diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c index 8d43491ddd6b..d2bf29c90226 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c @@ -1,5 +1,5 @@ /* - * Copyright (C) 2016 Netronome Systems, Inc. + * Copyright (C) 2016-2017 Netronome Systems, Inc. * * This software is dual licensed under the GNU General License Version 2, * June 1991 as shown in the file COPYING in the top-level directory of this @@ -40,7 +40,7 @@ #include "main.h" -static struct nfp_insn_meta * +struct nfp_insn_meta * nfp_bpf_goto_meta(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, unsigned int insn_idx, unsigned int n_insns) { @@ -180,10 +180,10 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx) if (meta->insn.code == (BPF_JMP | BPF_EXIT)) return nfp_bpf_check_exit(nfp_prog, env); - if ((meta->insn.code & ~BPF_SIZE_MASK) == (BPF_LDX | BPF_MEM)) + if (is_mbpf_load(meta)) return nfp_bpf_check_ptr(nfp_prog, meta, env, meta->insn.src_reg); - if ((meta->insn.code & ~BPF_SIZE_MASK) == (BPF_STX | BPF_MEM)) + if (is_mbpf_store(meta)) return nfp_bpf_check_ptr(nfp_prog, meta, env, meta->insn.dst_reg); diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c index c1c595f8bb87..ca74c517f626 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/action.c +++ b/drivers/net/ethernet/netronome/nfp/flower/action.c @@ -93,13 +93,11 @@ nfp_fl_output(struct nfp_fl_output *output, const struct tc_action *action, size_t act_size = sizeof(struct nfp_fl_output); struct net_device *out_dev; u16 tmp_flags; - int ifindex; output->head.jump_id = NFP_FL_ACTION_OPCODE_OUTPUT; output->head.len_lw = act_size >> NFP_FL_LW_SIZ; - ifindex = tcf_mirred_ifindex(action); - out_dev = __dev_get_by_index(dev_net(in_dev), ifindex); + out_dev = tcf_mirred_dev(action); if (!out_dev) return -EOPNOTSUPP; diff --git a/drivers/net/ethernet/netronome/nfp/nfp_asm.c b/drivers/net/ethernet/netronome/nfp/nfp_asm.c index 830f6de25f47..d3610987fb07 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_asm.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_asm.c @@ -41,6 +41,7 @@ const struct cmd_tgt_act cmd_tgt_act[__CMD_TGT_MAP_SIZE] = { [CMD_TGT_WRITE8_SWAP] = { 0x02, 0x42 }, + [CMD_TGT_WRITE32_SWAP] = { 0x02, 0x5f }, [CMD_TGT_READ8] = { 0x01, 0x43 }, [CMD_TGT_READ32] = { 0x00, 0x5c }, [CMD_TGT_READ32_LE] = { 0x01, 0x5c }, @@ -120,7 +121,8 @@ int swreg_to_unrestricted(swreg dst, swreg lreg, swreg rreg, reg->dst = nfp_swreg_to_unreg(dst, true); /* Decode source operands */ - if (swreg_type(lreg) == swreg_type(rreg)) + if (swreg_type(lreg) == swreg_type(rreg) && + swreg_type(lreg) != NN_REG_NONE) return -EFAULT; if (swreg_type(lreg) == NN_REG_GPR_B || @@ -200,7 +202,8 @@ int swreg_to_restricted(swreg dst, swreg lreg, swreg rreg, reg->dst = nfp_swreg_to_rereg(dst, true, false, NULL); /* Decode source operands */ - if (swreg_type(lreg) == swreg_type(rreg)) + if (swreg_type(lreg) == swreg_type(rreg) && + swreg_type(lreg) != NN_REG_NONE) return -EFAULT; if (swreg_type(lreg) == NN_REG_GPR_B || diff --git a/drivers/net/ethernet/netronome/nfp/nfp_asm.h b/drivers/net/ethernet/netronome/nfp/nfp_asm.h index 74d0c11ab2f9..3387e6926eb0 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_asm.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_asm.h @@ -1,5 +1,5 @@ /* - * Copyright (C) 2016 Netronome Systems, Inc. + * Copyright (C) 2016-2017 Netronome Systems, Inc. * * This software is dual licensed under the GNU General License Version 2, * June 1991 as shown in the file COPYING in the top-level directory of this @@ -209,6 +209,7 @@ enum alu_dst_ab { #define OP_CMD_CNT 0x0000e000000ULL #define OP_CMD_SIG 0x000f0000000ULL #define OP_CMD_TGT_CMD 0x07f00000000ULL +#define OP_CMD_INDIR 0x20000000000ULL #define OP_CMD_MODE 0x1c0000000000ULL struct cmd_tgt_act { @@ -219,6 +220,7 @@ struct cmd_tgt_act { enum cmd_tgt_map { CMD_TGT_READ8, CMD_TGT_WRITE8_SWAP, + CMD_TGT_WRITE32_SWAP, CMD_TGT_READ32, CMD_TGT_READ32_LE, CMD_TGT_READ32_SWAP, @@ -240,6 +242,9 @@ enum cmd_ctx_swap { CMD_CTX_NO_SWAP = 3, }; +#define CMD_OVE_LEN BIT(7) +#define CMD_OV_LEN GENMASK(12, 8) + #define OP_LCSR_BASE 0x0fc00000000ULL #define OP_LCSR_A_SRC 0x000000003ffULL #define OP_LCSR_B_SRC 0x000000ffc00ULL @@ -257,6 +262,7 @@ enum lcsr_wr_src { #define OP_CARB_BASE 0x0e000000000ULL #define OP_CARB_OR 0x00000010000ULL +#define NFP_CSR_CTX_PTR 0x20 #define NFP_CSR_ACT_LM_ADDR0 0x64 #define NFP_CSR_ACT_LM_ADDR1 0x6c #define NFP_CSR_ACT_LM_ADDR2 0x94 @@ -377,4 +383,13 @@ int swreg_to_restricted(swreg dst, swreg lreg, swreg rreg, int nfp_ustore_check_valid_no_ecc(u64 insn); u64 nfp_ustore_calc_ecc_insn(u64 insn); +#define NFP_IND_ME_REFL_WR_SIG_INIT 3 +#define NFP_IND_ME_CTX_PTR_BASE_MASK GENMASK(9, 0) +#define NFP_IND_NUM_CONTEXTS 8 + +static inline u32 nfp_get_ind_csr_ctx_ptr_offs(u32 read_offset) +{ + return (read_offset & ~NFP_IND_ME_CTX_PTR_BASE_MASK) | NFP_CSR_CTX_PTR; +} + #endif diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c index 35eaccbece36..0953fa8f3109 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_main.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c @@ -45,6 +45,7 @@ #include <linux/pci.h> #include <linux/firmware.h> #include <linux/vermagic.h> +#include <linux/vmalloc.h> #include <net/devlink.h> #include "nfpcore/nfp.h" @@ -509,6 +510,9 @@ static int nfp_pci_probe(struct pci_dev *pdev, pf->mip = nfp_mip_open(pf->cpp); pf->rtbl = __nfp_rtsym_table_read(pf->cpp, pf->mip); + pf->dump_flag = NFP_DUMP_NSP_DIAG; + pf->dumpspec = nfp_net_dump_load_dumpspec(pf->cpp, pf->rtbl); + err = nfp_pcie_sriov_read_nfd_limit(pf); if (err) goto err_fw_unload; @@ -544,6 +548,7 @@ err_fw_unload: nfp_fw_unload(pf); kfree(pf->eth_tbl); kfree(pf->nspi); + vfree(pf->dumpspec); err_devlink_unreg: devlink_unregister(devlink); err_hwinfo_free: @@ -579,6 +584,7 @@ static void nfp_pci_remove(struct pci_dev *pdev) devlink_unregister(devlink); + vfree(pf->dumpspec); kfree(pf->rtbl); nfp_mip_close(pf->mip); if (pf->fw_loaded) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.h b/drivers/net/ethernet/netronome/nfp/nfp_main.h index be0ee59f2eb9..add46e28212b 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_main.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_main.h @@ -39,6 +39,7 @@ #ifndef NFP_MAIN_H #define NFP_MAIN_H +#include <linux/ethtool.h> #include <linux/list.h> #include <linux/types.h> #include <linux/msi.h> @@ -62,6 +63,17 @@ struct nfp_port; struct nfp_rtsym_table; /** + * struct nfp_dumpspec - NFP FW dump specification structure + * @size: Size of the data + * @data: Sequence of TLVs, each being an instruction to dump some data + * from FW + */ +struct nfp_dumpspec { + u32 size; + u8 data[0]; +}; + +/** * struct nfp_pf - NFP PF-specific device structure * @pdev: Backpointer to PCI device * @cpp: Pointer to the CPP handle @@ -83,6 +95,9 @@ struct nfp_rtsym_table; * @mip: MIP handle * @rtbl: RTsym table * @hwinfo: HWInfo table + * @dumpspec: Debug dump specification + * @dump_flag: Store dump flag between set_dump and get_dump_flag + * @dump_len: Store dump length between set_dump and get_dump_flag * @eth_tbl: NSP ETH table * @nspi: NSP identification info * @hwmon_dev: pointer to hwmon device @@ -124,6 +139,9 @@ struct nfp_pf { const struct nfp_mip *mip; struct nfp_rtsym_table *rtbl; struct nfp_hwinfo *hwinfo; + struct nfp_dumpspec *dumpspec; + u32 dump_flag; + u32 dump_len; struct nfp_eth_table *eth_tbl; struct nfp_nsp_identify *nspi; @@ -157,4 +175,15 @@ void nfp_net_get_mac_addr(struct nfp_pf *pf, struct nfp_port *port); bool nfp_ctrl_tx(struct nfp_net *nn, struct sk_buff *skb); +enum nfp_dump_diag { + NFP_DUMP_NSP_DIAG = 0, +}; + +struct nfp_dumpspec * +nfp_net_dump_load_dumpspec(struct nfp_cpp *cpp, struct nfp_rtsym_table *rtbl); +s64 nfp_net_dump_calculate_size(struct nfp_pf *pf, struct nfp_dumpspec *spec, + u32 flag); +int nfp_net_dump_populate_buffer(struct nfp_pf *pf, struct nfp_dumpspec *spec, + struct ethtool_dump *dump_param, void *dest); + #endif /* NFP_MAIN_H */ diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h index 7f9857c276b1..3801c52098d5 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h @@ -548,6 +548,8 @@ struct nfp_net_dp { * @max_r_vecs: Number of allocated interrupt vectors for RX/TX * @max_tx_rings: Maximum number of TX rings supported by the Firmware * @max_rx_rings: Maximum number of RX rings supported by the Firmware + * @stride_rx: Queue controller RX queue spacing + * @stride_tx: Queue controller TX queue spacing * @r_vecs: Pre-allocated array of ring vectors * @irq_entries: Pre-allocated array of MSI-X entries * @lsc_handler: Handler for Link State Change interrupt diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c index 1a603fdd9e80..ad3e9f6a61e5 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -3392,6 +3392,7 @@ static int nfp_net_xdp(struct net_device *netdev, struct netdev_bpf *xdp) if (nn->dp.bpf_offload_xdp) xdp->prog_attached = XDP_ATTACHED_HW; xdp->prog_id = nn->xdp_prog ? nn->xdp_prog->aux->id : 0; + xdp->flags = nn->xdp_prog ? nn->xdp_flags : 0; return 0; case BPF_OFFLOAD_VERIFIER_PREP: return nfp_app_bpf_verifier_prep(nn->app, nn, xdp); @@ -3561,9 +3562,6 @@ struct nfp_net *nfp_net_alloc(struct pci_dev *pdev, bool needs_netdev, */ void nfp_net_free(struct nfp_net *nn) { - if (nn->xdp_prog) - bpf_prog_put(nn->xdp_prog); - if (nn->dp.netdev) free_netdev(nn->dp.netdev); else diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c b/drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c new file mode 100644 index 000000000000..cb74602f0907 --- /dev/null +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c @@ -0,0 +1,787 @@ +/* + * Copyright (C) 2017 Netronome Systems, Inc. + * + * This software is dual licensed under the GNU General License Version 2, + * June 1991 as shown in the file COPYING in the top-level directory of this + * source tree or the BSD 2-Clause License provided below. You have the + * option to license this software under the complete terms of either license. + * + * The BSD 2-Clause License: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * 1. Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * 2. Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include <linux/ethtool.h> +#include <linux/vmalloc.h> + +#include "nfp_asm.h" +#include "nfp_main.h" +#include "nfpcore/nfp.h" +#include "nfpcore/nfp_nffw.h" + +#define NFP_DUMP_SPEC_RTSYM "_abi_dump_spec" + +#define ALIGN8(x) ALIGN(x, 8) + +enum nfp_dumpspec_type { + NFP_DUMPSPEC_TYPE_CPP_CSR = 0, + NFP_DUMPSPEC_TYPE_XPB_CSR = 1, + NFP_DUMPSPEC_TYPE_ME_CSR = 2, + NFP_DUMPSPEC_TYPE_INDIRECT_ME_CSR = 3, + NFP_DUMPSPEC_TYPE_RTSYM = 4, + NFP_DUMPSPEC_TYPE_HWINFO = 5, + NFP_DUMPSPEC_TYPE_FWNAME = 6, + NFP_DUMPSPEC_TYPE_HWINFO_FIELD = 7, + NFP_DUMPSPEC_TYPE_PROLOG = 10000, + NFP_DUMPSPEC_TYPE_ERROR = 10001, +}; + +/* The following structs must be carefully aligned so that they can be used to + * interpret the binary dumpspec and populate the dump data in a deterministic + * way. + */ + +/* generic type plus length */ +struct nfp_dump_tl { + __be32 type; + __be32 length; /* chunk length to follow, aligned to 8 bytes */ + char data[0]; +}; + +/* NFP CPP parameters */ +struct nfp_dumpspec_cpp_isl_id { + u8 target; + u8 action; + u8 token; + u8 island; +}; + +struct nfp_dump_common_cpp { + struct nfp_dumpspec_cpp_isl_id cpp_id; + __be32 offset; /* address to start dump */ + __be32 dump_length; /* total bytes to dump, aligned to reg size */ +}; + +/* CSR dumpables */ +struct nfp_dumpspec_csr { + struct nfp_dump_tl tl; + struct nfp_dump_common_cpp cpp; + __be32 register_width; /* in bits */ +}; + +struct nfp_dumpspec_rtsym { + struct nfp_dump_tl tl; + char rtsym[0]; +}; + +/* header for register dumpable */ +struct nfp_dump_csr { + struct nfp_dump_tl tl; + struct nfp_dump_common_cpp cpp; + __be32 register_width; /* in bits */ + __be32 error; /* error code encountered while reading */ + __be32 error_offset; /* offset being read when error occurred */ +}; + +struct nfp_dump_rtsym { + struct nfp_dump_tl tl; + struct nfp_dump_common_cpp cpp; + __be32 error; /* error code encountered while reading */ + u8 padded_name_length; /* pad so data starts at 8 byte boundary */ + char rtsym[0]; + /* after padded_name_length, there is dump_length data */ +}; + +struct nfp_dump_prolog { + struct nfp_dump_tl tl; + __be32 dump_level; +}; + +struct nfp_dump_error { + struct nfp_dump_tl tl; + __be32 error; + char padding[4]; + char spec[0]; +}; + +/* to track state through debug size calculation TLV traversal */ +struct nfp_level_size { + u32 requested_level; /* input */ + u32 total_size; /* output */ +}; + +/* to track state during debug dump creation TLV traversal */ +struct nfp_dump_state { + u32 requested_level; /* input param */ + u32 dumped_size; /* adds up to size of dumped data */ + u32 buf_size; /* size of buffer pointer to by p */ + void *p; /* current point in dump buffer */ +}; + +typedef int (*nfp_tlv_visit)(struct nfp_pf *pf, struct nfp_dump_tl *tl, + void *param); + +static int +nfp_traverse_tlvs(struct nfp_pf *pf, void *data, u32 data_length, void *param, + nfp_tlv_visit tlv_visit) +{ + long long remaining = data_length; + struct nfp_dump_tl *tl; + u32 total_tlv_size; + void *p = data; + int err; + + while (remaining >= sizeof(*tl)) { + tl = p; + if (!tl->type && !tl->length) + break; + + if (be32_to_cpu(tl->length) > remaining - sizeof(*tl)) + return -EINVAL; + + total_tlv_size = sizeof(*tl) + be32_to_cpu(tl->length); + + /* Spec TLVs should be aligned to 4 bytes. */ + if (total_tlv_size % 4 != 0) + return -EINVAL; + + p += total_tlv_size; + remaining -= total_tlv_size; + err = tlv_visit(pf, tl, param); + if (err) + return err; + } + + return 0; +} + +static u32 nfp_get_numeric_cpp_id(struct nfp_dumpspec_cpp_isl_id *cpp_id) +{ + return NFP_CPP_ISLAND_ID(cpp_id->target, cpp_id->action, cpp_id->token, + cpp_id->island); +} + +struct nfp_dumpspec * +nfp_net_dump_load_dumpspec(struct nfp_cpp *cpp, struct nfp_rtsym_table *rtbl) +{ + const struct nfp_rtsym *specsym; + struct nfp_dumpspec *dumpspec; + int bytes_read; + u32 cpp_id; + + specsym = nfp_rtsym_lookup(rtbl, NFP_DUMP_SPEC_RTSYM); + if (!specsym) + return NULL; + + /* expected size of this buffer is in the order of tens of kilobytes */ + dumpspec = vmalloc(sizeof(*dumpspec) + specsym->size); + if (!dumpspec) + return NULL; + + dumpspec->size = specsym->size; + + cpp_id = NFP_CPP_ISLAND_ID(specsym->target, NFP_CPP_ACTION_RW, 0, + specsym->domain); + + bytes_read = nfp_cpp_read(cpp, cpp_id, specsym->addr, dumpspec->data, + specsym->size); + if (bytes_read != specsym->size) { + vfree(dumpspec); + nfp_warn(cpp, "Debug dump specification read failed.\n"); + return NULL; + } + + return dumpspec; +} + +static int nfp_dump_error_tlv_size(struct nfp_dump_tl *spec) +{ + return ALIGN8(sizeof(struct nfp_dump_error) + sizeof(*spec) + + be32_to_cpu(spec->length)); +} + +static int nfp_calc_fwname_tlv_size(struct nfp_pf *pf) +{ + u32 fwname_len = strlen(nfp_mip_name(pf->mip)); + + return sizeof(struct nfp_dump_tl) + ALIGN8(fwname_len + 1); +} + +static int nfp_calc_hwinfo_field_sz(struct nfp_pf *pf, struct nfp_dump_tl *spec) +{ + u32 tl_len, key_len; + const char *value; + + tl_len = be32_to_cpu(spec->length); + key_len = strnlen(spec->data, tl_len); + if (key_len == tl_len) + return nfp_dump_error_tlv_size(spec); + + value = nfp_hwinfo_lookup(pf->hwinfo, spec->data); + if (!value) + return nfp_dump_error_tlv_size(spec); + + return sizeof(struct nfp_dump_tl) + ALIGN8(key_len + strlen(value) + 2); +} + +static bool nfp_csr_spec_valid(struct nfp_dumpspec_csr *spec_csr) +{ + u32 required_read_sz = sizeof(*spec_csr) - sizeof(spec_csr->tl); + u32 available_sz = be32_to_cpu(spec_csr->tl.length); + u32 reg_width; + + if (available_sz < required_read_sz) + return false; + + reg_width = be32_to_cpu(spec_csr->register_width); + + return reg_width == 32 || reg_width == 64; +} + +static int +nfp_calc_rtsym_dump_sz(struct nfp_pf *pf, struct nfp_dump_tl *spec) +{ + struct nfp_rtsym_table *rtbl = pf->rtbl; + struct nfp_dumpspec_rtsym *spec_rtsym; + const struct nfp_rtsym *sym; + u32 tl_len, key_len; + + spec_rtsym = (struct nfp_dumpspec_rtsym *)spec; + tl_len = be32_to_cpu(spec->length); + key_len = strnlen(spec_rtsym->rtsym, tl_len); + if (key_len == tl_len) + return nfp_dump_error_tlv_size(spec); + + sym = nfp_rtsym_lookup(rtbl, spec_rtsym->rtsym); + if (!sym) + return nfp_dump_error_tlv_size(spec); + + return ALIGN8(offsetof(struct nfp_dump_rtsym, rtsym) + key_len + 1) + + ALIGN8(sym->size); +} + +static int +nfp_add_tlv_size(struct nfp_pf *pf, struct nfp_dump_tl *tl, void *param) +{ + struct nfp_dumpspec_csr *spec_csr; + u32 *size = param; + u32 hwinfo_size; + + switch (be32_to_cpu(tl->type)) { + case NFP_DUMPSPEC_TYPE_FWNAME: + *size += nfp_calc_fwname_tlv_size(pf); + break; + case NFP_DUMPSPEC_TYPE_CPP_CSR: + case NFP_DUMPSPEC_TYPE_XPB_CSR: + case NFP_DUMPSPEC_TYPE_ME_CSR: + spec_csr = (struct nfp_dumpspec_csr *)tl; + if (!nfp_csr_spec_valid(spec_csr)) + *size += nfp_dump_error_tlv_size(tl); + else + *size += ALIGN8(sizeof(struct nfp_dump_csr)) + + ALIGN8(be32_to_cpu(spec_csr->cpp.dump_length)); + break; + case NFP_DUMPSPEC_TYPE_INDIRECT_ME_CSR: + spec_csr = (struct nfp_dumpspec_csr *)tl; + if (!nfp_csr_spec_valid(spec_csr)) + *size += nfp_dump_error_tlv_size(tl); + else + *size += ALIGN8(sizeof(struct nfp_dump_csr)) + + ALIGN8(be32_to_cpu(spec_csr->cpp.dump_length) * + NFP_IND_NUM_CONTEXTS); + break; + case NFP_DUMPSPEC_TYPE_RTSYM: + *size += nfp_calc_rtsym_dump_sz(pf, tl); + break; + case NFP_DUMPSPEC_TYPE_HWINFO: + hwinfo_size = nfp_hwinfo_get_packed_str_size(pf->hwinfo); + *size += sizeof(struct nfp_dump_tl) + ALIGN8(hwinfo_size); + break; + case NFP_DUMPSPEC_TYPE_HWINFO_FIELD: + *size += nfp_calc_hwinfo_field_sz(pf, tl); + break; + default: + *size += nfp_dump_error_tlv_size(tl); + break; + } + + return 0; +} + +static int +nfp_calc_specific_level_size(struct nfp_pf *pf, struct nfp_dump_tl *dump_level, + void *param) +{ + struct nfp_level_size *lev_sz = param; + + if (be32_to_cpu(dump_level->type) != lev_sz->requested_level) + return 0; + + return nfp_traverse_tlvs(pf, dump_level->data, + be32_to_cpu(dump_level->length), + &lev_sz->total_size, nfp_add_tlv_size); +} + +s64 nfp_net_dump_calculate_size(struct nfp_pf *pf, struct nfp_dumpspec *spec, + u32 flag) +{ + struct nfp_level_size lev_sz; + int err; + + lev_sz.requested_level = flag; + lev_sz.total_size = ALIGN8(sizeof(struct nfp_dump_prolog)); + + err = nfp_traverse_tlvs(pf, spec->data, spec->size, &lev_sz, + nfp_calc_specific_level_size); + if (err) + return err; + + return lev_sz.total_size; +} + +static int nfp_add_tlv(u32 type, u32 total_tlv_sz, struct nfp_dump_state *dump) +{ + struct nfp_dump_tl *tl = dump->p; + + if (total_tlv_sz > dump->buf_size) + return -ENOSPC; + + if (dump->buf_size - total_tlv_sz < dump->dumped_size) + return -ENOSPC; + + tl->type = cpu_to_be32(type); + tl->length = cpu_to_be32(total_tlv_sz - sizeof(*tl)); + + dump->dumped_size += total_tlv_sz; + dump->p += total_tlv_sz; + + return 0; +} + +static int +nfp_dump_error_tlv(struct nfp_dump_tl *spec, int error, + struct nfp_dump_state *dump) +{ + struct nfp_dump_error *dump_header = dump->p; + u32 total_spec_size, total_size; + int err; + + total_spec_size = sizeof(*spec) + be32_to_cpu(spec->length); + total_size = ALIGN8(sizeof(*dump_header) + total_spec_size); + + err = nfp_add_tlv(NFP_DUMPSPEC_TYPE_ERROR, total_size, dump); + if (err) + return err; + + dump_header->error = cpu_to_be32(error); + memcpy(dump_header->spec, spec, total_spec_size); + + return 0; +} + +static int nfp_dump_fwname(struct nfp_pf *pf, struct nfp_dump_state *dump) +{ + struct nfp_dump_tl *dump_header = dump->p; + u32 fwname_len, total_size; + const char *fwname; + int err; + + fwname = nfp_mip_name(pf->mip); + fwname_len = strlen(fwname); + total_size = sizeof(*dump_header) + ALIGN8(fwname_len + 1); + + err = nfp_add_tlv(NFP_DUMPSPEC_TYPE_FWNAME, total_size, dump); + if (err) + return err; + + memcpy(dump_header->data, fwname, fwname_len); + + return 0; +} + +static int +nfp_dump_hwinfo(struct nfp_pf *pf, struct nfp_dump_tl *spec, + struct nfp_dump_state *dump) +{ + struct nfp_dump_tl *dump_header = dump->p; + u32 hwinfo_size, total_size; + char *hwinfo; + int err; + + hwinfo = nfp_hwinfo_get_packed_strings(pf->hwinfo); + hwinfo_size = nfp_hwinfo_get_packed_str_size(pf->hwinfo); + total_size = sizeof(*dump_header) + ALIGN8(hwinfo_size); + + err = nfp_add_tlv(NFP_DUMPSPEC_TYPE_HWINFO, total_size, dump); + if (err) + return err; + + memcpy(dump_header->data, hwinfo, hwinfo_size); + + return 0; +} + +static int nfp_dump_hwinfo_field(struct nfp_pf *pf, struct nfp_dump_tl *spec, + struct nfp_dump_state *dump) +{ + struct nfp_dump_tl *dump_header = dump->p; + u32 tl_len, key_len, val_len; + const char *key, *value; + u32 total_size; + int err; + + tl_len = be32_to_cpu(spec->length); + key_len = strnlen(spec->data, tl_len); + if (key_len == tl_len) + return nfp_dump_error_tlv(spec, -EINVAL, dump); + + key = spec->data; + value = nfp_hwinfo_lookup(pf->hwinfo, key); + if (!value) + return nfp_dump_error_tlv(spec, -ENOENT, dump); + + val_len = strlen(value); + total_size = sizeof(*dump_header) + ALIGN8(key_len + val_len + 2); + err = nfp_add_tlv(NFP_DUMPSPEC_TYPE_HWINFO_FIELD, total_size, dump); + if (err) + return err; + + memcpy(dump_header->data, key, key_len + 1); + memcpy(dump_header->data + key_len + 1, value, val_len + 1); + + return 0; +} + +static int +nfp_dump_csr_range(struct nfp_pf *pf, struct nfp_dumpspec_csr *spec_csr, + struct nfp_dump_state *dump) +{ + struct nfp_dump_csr *dump_header = dump->p; + u32 reg_sz, header_size, total_size; + u32 cpp_rd_addr, max_rd_addr; + int bytes_read; + void *dest; + u32 cpp_id; + int err; + + if (!nfp_csr_spec_valid(spec_csr)) + return nfp_dump_error_tlv(&spec_csr->tl, -EINVAL, dump); + + reg_sz = be32_to_cpu(spec_csr->register_width) / BITS_PER_BYTE; + header_size = ALIGN8(sizeof(*dump_header)); + total_size = header_size + + ALIGN8(be32_to_cpu(spec_csr->cpp.dump_length)); + dest = dump->p + header_size; + + err = nfp_add_tlv(be32_to_cpu(spec_csr->tl.type), total_size, dump); + if (err) + return err; + + dump_header->cpp = spec_csr->cpp; + dump_header->register_width = spec_csr->register_width; + + cpp_id = nfp_get_numeric_cpp_id(&spec_csr->cpp.cpp_id); + cpp_rd_addr = be32_to_cpu(spec_csr->cpp.offset); + max_rd_addr = cpp_rd_addr + be32_to_cpu(spec_csr->cpp.dump_length); + + while (cpp_rd_addr < max_rd_addr) { + bytes_read = nfp_cpp_read(pf->cpp, cpp_id, cpp_rd_addr, dest, + reg_sz); + if (bytes_read != reg_sz) { + if (bytes_read >= 0) + bytes_read = -EIO; + dump_header->error = cpu_to_be32(bytes_read); + dump_header->error_offset = cpu_to_be32(cpp_rd_addr); + break; + } + cpp_rd_addr += reg_sz; + dest += reg_sz; + } + + return 0; +} + +/* Write context to CSRCtxPtr, then read from it. Then the value can be read + * from IndCtxStatus. + */ +static int +nfp_read_indirect_csr(struct nfp_cpp *cpp, + struct nfp_dumpspec_cpp_isl_id cpp_params, u32 offset, + u32 reg_sz, u32 context, void *dest) +{ + u32 csr_ctx_ptr_offs; + u32 cpp_id; + int result; + + csr_ctx_ptr_offs = nfp_get_ind_csr_ctx_ptr_offs(offset); + cpp_id = NFP_CPP_ISLAND_ID(cpp_params.target, + NFP_IND_ME_REFL_WR_SIG_INIT, + cpp_params.token, cpp_params.island); + result = nfp_cpp_writel(cpp, cpp_id, csr_ctx_ptr_offs, context); + if (result != sizeof(context)) + return result < 0 ? result : -EIO; + + cpp_id = nfp_get_numeric_cpp_id(&cpp_params); + result = nfp_cpp_read(cpp, cpp_id, csr_ctx_ptr_offs, dest, reg_sz); + if (result != reg_sz) + return result < 0 ? result : -EIO; + + result = nfp_cpp_read(cpp, cpp_id, offset, dest, reg_sz); + if (result != reg_sz) + return result < 0 ? result : -EIO; + + return 0; +} + +static int +nfp_read_all_indirect_csr_ctx(struct nfp_cpp *cpp, + struct nfp_dumpspec_csr *spec_csr, u32 address, + u32 reg_sz, void *dest) +{ + u32 ctx; + int err; + + for (ctx = 0; ctx < NFP_IND_NUM_CONTEXTS; ctx++) { + err = nfp_read_indirect_csr(cpp, spec_csr->cpp.cpp_id, address, + reg_sz, ctx, dest + ctx * reg_sz); + if (err) + return err; + } + + return 0; +} + +static int +nfp_dump_indirect_csr_range(struct nfp_pf *pf, + struct nfp_dumpspec_csr *spec_csr, + struct nfp_dump_state *dump) +{ + struct nfp_dump_csr *dump_header = dump->p; + u32 reg_sz, header_size, total_size; + u32 cpp_rd_addr, max_rd_addr; + u32 reg_data_length; + void *dest; + int err; + + if (!nfp_csr_spec_valid(spec_csr)) + return nfp_dump_error_tlv(&spec_csr->tl, -EINVAL, dump); + + reg_sz = be32_to_cpu(spec_csr->register_width) / BITS_PER_BYTE; + header_size = ALIGN8(sizeof(*dump_header)); + reg_data_length = be32_to_cpu(spec_csr->cpp.dump_length) * + NFP_IND_NUM_CONTEXTS; + total_size = header_size + ALIGN8(reg_data_length); + dest = dump->p + header_size; + + err = nfp_add_tlv(be32_to_cpu(spec_csr->tl.type), total_size, dump); + if (err) + return err; + + dump_header->cpp = spec_csr->cpp; + dump_header->register_width = spec_csr->register_width; + + cpp_rd_addr = be32_to_cpu(spec_csr->cpp.offset); + max_rd_addr = cpp_rd_addr + be32_to_cpu(spec_csr->cpp.dump_length); + while (cpp_rd_addr < max_rd_addr) { + err = nfp_read_all_indirect_csr_ctx(pf->cpp, spec_csr, + cpp_rd_addr, reg_sz, dest); + if (err) { + dump_header->error = cpu_to_be32(err); + dump_header->error_offset = cpu_to_be32(cpp_rd_addr); + break; + } + cpp_rd_addr += reg_sz; + dest += reg_sz * NFP_IND_NUM_CONTEXTS; + } + + return 0; +} + +static int +nfp_dump_single_rtsym(struct nfp_pf *pf, struct nfp_dumpspec_rtsym *spec, + struct nfp_dump_state *dump) +{ + struct nfp_dump_rtsym *dump_header = dump->p; + struct nfp_dumpspec_cpp_isl_id cpp_params; + struct nfp_rtsym_table *rtbl = pf->rtbl; + const struct nfp_rtsym *sym; + u32 header_size, total_size; + u32 tl_len, key_len; + int bytes_read; + u32 cpp_id; + void *dest; + int err; + + tl_len = be32_to_cpu(spec->tl.length); + key_len = strnlen(spec->rtsym, tl_len); + if (key_len == tl_len) + return nfp_dump_error_tlv(&spec->tl, -EINVAL, dump); + + sym = nfp_rtsym_lookup(rtbl, spec->rtsym); + if (!sym) + return nfp_dump_error_tlv(&spec->tl, -ENOENT, dump); + + header_size = + ALIGN8(offsetof(struct nfp_dump_rtsym, rtsym) + key_len + 1); + total_size = header_size + ALIGN8(sym->size); + dest = dump->p + header_size; + + err = nfp_add_tlv(be32_to_cpu(spec->tl.type), total_size, dump); + if (err) + return err; + + dump_header->padded_name_length = + header_size - offsetof(struct nfp_dump_rtsym, rtsym); + memcpy(dump_header->rtsym, spec->rtsym, key_len + 1); + + cpp_params.target = sym->target; + cpp_params.action = NFP_CPP_ACTION_RW; + cpp_params.token = 0; + cpp_params.island = sym->domain; + cpp_id = nfp_get_numeric_cpp_id(&cpp_params); + + dump_header->cpp.cpp_id = cpp_params; + dump_header->cpp.offset = cpu_to_be32(sym->addr); + dump_header->cpp.dump_length = cpu_to_be32(sym->size); + + bytes_read = nfp_cpp_read(pf->cpp, cpp_id, sym->addr, dest, sym->size); + if (bytes_read != sym->size) { + if (bytes_read >= 0) + bytes_read = -EIO; + dump_header->error = cpu_to_be32(bytes_read); + } + + return 0; +} + +static int +nfp_dump_for_tlv(struct nfp_pf *pf, struct nfp_dump_tl *tl, void *param) +{ + struct nfp_dumpspec_rtsym *spec_rtsym; + struct nfp_dump_state *dump = param; + struct nfp_dumpspec_csr *spec_csr; + int err; + + switch (be32_to_cpu(tl->type)) { + case NFP_DUMPSPEC_TYPE_FWNAME: + err = nfp_dump_fwname(pf, dump); + if (err) + return err; + break; + case NFP_DUMPSPEC_TYPE_CPP_CSR: + case NFP_DUMPSPEC_TYPE_XPB_CSR: + case NFP_DUMPSPEC_TYPE_ME_CSR: + spec_csr = (struct nfp_dumpspec_csr *)tl; + err = nfp_dump_csr_range(pf, spec_csr, dump); + if (err) + return err; + break; + case NFP_DUMPSPEC_TYPE_INDIRECT_ME_CSR: + spec_csr = (struct nfp_dumpspec_csr *)tl; + err = nfp_dump_indirect_csr_range(pf, spec_csr, dump); + if (err) + return err; + break; + case NFP_DUMPSPEC_TYPE_RTSYM: + spec_rtsym = (struct nfp_dumpspec_rtsym *)tl; + err = nfp_dump_single_rtsym(pf, spec_rtsym, dump); + if (err) + return err; + break; + case NFP_DUMPSPEC_TYPE_HWINFO: + err = nfp_dump_hwinfo(pf, tl, dump); + if (err) + return err; + break; + case NFP_DUMPSPEC_TYPE_HWINFO_FIELD: + err = nfp_dump_hwinfo_field(pf, tl, dump); + if (err) + return err; + break; + default: + err = nfp_dump_error_tlv(tl, -EOPNOTSUPP, dump); + if (err) + return err; + } + + return 0; +} + +static int +nfp_dump_specific_level(struct nfp_pf *pf, struct nfp_dump_tl *dump_level, + void *param) +{ + struct nfp_dump_state *dump = param; + + if (be32_to_cpu(dump_level->type) != dump->requested_level) + return 0; + + return nfp_traverse_tlvs(pf, dump_level->data, + be32_to_cpu(dump_level->length), dump, + nfp_dump_for_tlv); +} + +static int nfp_dump_populate_prolog(struct nfp_dump_state *dump) +{ + struct nfp_dump_prolog *prolog = dump->p; + u32 total_size; + int err; + + total_size = ALIGN8(sizeof(*prolog)); + + err = nfp_add_tlv(NFP_DUMPSPEC_TYPE_PROLOG, total_size, dump); + if (err) + return err; + + prolog->dump_level = cpu_to_be32(dump->requested_level); + + return 0; +} + +int nfp_net_dump_populate_buffer(struct nfp_pf *pf, struct nfp_dumpspec *spec, + struct ethtool_dump *dump_param, void *dest) +{ + struct nfp_dump_state dump; + int err; + + dump.requested_level = dump_param->flag; + dump.dumped_size = 0; + dump.p = dest; + dump.buf_size = dump_param->len; + + err = nfp_dump_populate_prolog(&dump); + if (err) + return err; + + err = nfp_traverse_tlvs(pf, spec->data, spec->size, &dump, + nfp_dump_specific_level); + if (err) + return err; + + /* Set size of actual dump, to trigger warning if different from + * calculated size. + */ + dump_param->len = dump.dumped_size; + + return 0; +} diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c index 2801ecd09eab..2cde0eb00ee3 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c @@ -51,14 +51,11 @@ #include "nfpcore/nfp.h" #include "nfpcore/nfp_nsp.h" #include "nfp_app.h" +#include "nfp_main.h" #include "nfp_net_ctrl.h" #include "nfp_net.h" #include "nfp_port.h" -enum nfp_dump_diag { - NFP_DUMP_NSP_DIAG = 0, -}; - struct nfp_et_stat { char name[ETH_GSTRING_LEN]; int off; @@ -1066,15 +1063,34 @@ exit_release: return ret; } +/* Set the dump flag/level. Calculate the dump length for flag > 0 only (new TLV + * based dumps), since flag 0 (default) calculates the length in + * nfp_app_get_dump_flag(), and we need to support triggering a level 0 dump + * without setting the flag first, for backward compatibility. + */ static int nfp_app_set_dump(struct net_device *netdev, struct ethtool_dump *val) { struct nfp_app *app = nfp_app_from_netdev(netdev); + s64 len; if (!app) return -EOPNOTSUPP; - if (val->flag != NFP_DUMP_NSP_DIAG) - return -EINVAL; + if (val->flag == NFP_DUMP_NSP_DIAG) { + app->pf->dump_flag = val->flag; + return 0; + } + + if (!app->pf->dumpspec) + return -EOPNOTSUPP; + + len = nfp_net_dump_calculate_size(app->pf, app->pf->dumpspec, + val->flag); + if (len < 0) + return len; + + app->pf->dump_flag = val->flag; + app->pf->dump_len = len; return 0; } @@ -1082,14 +1098,37 @@ static int nfp_app_set_dump(struct net_device *netdev, struct ethtool_dump *val) static int nfp_app_get_dump_flag(struct net_device *netdev, struct ethtool_dump *dump) { - return nfp_dump_nsp_diag(nfp_app_from_netdev(netdev), dump, NULL); + struct nfp_app *app = nfp_app_from_netdev(netdev); + + if (!app) + return -EOPNOTSUPP; + + if (app->pf->dump_flag == NFP_DUMP_NSP_DIAG) + return nfp_dump_nsp_diag(app, dump, NULL); + + dump->flag = app->pf->dump_flag; + dump->len = app->pf->dump_len; + + return 0; } static int nfp_app_get_dump_data(struct net_device *netdev, struct ethtool_dump *dump, void *buffer) { - return nfp_dump_nsp_diag(nfp_app_from_netdev(netdev), dump, buffer); + struct nfp_app *app = nfp_app_from_netdev(netdev); + + if (!app) + return -EOPNOTSUPP; + + if (app->pf->dump_flag == NFP_DUMP_NSP_DIAG) + return nfp_dump_nsp_diag(app, dump, buffer); + + dump->flag = app->pf->dump_flag; + dump->len = app->pf->dump_len; + + return nfp_net_dump_populate_buffer(app->pf, app->pf->dumpspec, dump, + buffer); } static int nfp_net_set_coalesce(struct net_device *netdev, diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h index 3ce51f03126f..ced62d112aa2 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h @@ -49,6 +49,8 @@ struct nfp_hwinfo; struct nfp_hwinfo *nfp_hwinfo_read(struct nfp_cpp *cpp); const char *nfp_hwinfo_lookup(struct nfp_hwinfo *hwinfo, const char *lookup); +char *nfp_hwinfo_get_packed_strings(struct nfp_hwinfo *hwinfo); +u32 nfp_hwinfo_get_packed_str_size(struct nfp_hwinfo *hwinfo); /* Implemented in nfp_nsp.c, low level functions */ diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c index 04dd5758ecf5..3fcb522d2e85 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c @@ -372,8 +372,7 @@ nfp_cpp_area_alloc(struct nfp_cpp *cpp, u32 dest, * that it can be accessed directly. * * NOTE: @address and @size must be 32-bit aligned values. - * - * NOTE: The area must also be 'released' when the structure is freed. + * The area must also be 'released' when the structure is freed. * * Return: NFP CPP Area handle, or NULL */ @@ -536,8 +535,7 @@ void nfp_cpp_area_release_free(struct nfp_cpp_area *area) * Read data from indicated CPP region. * * NOTE: @offset and @length must be 32-bit aligned values. - * - * NOTE: Area must have been locked down with an 'acquire'. + * Area must have been locked down with an 'acquire'. * * Return: length of io, or -ERRNO */ @@ -558,8 +556,7 @@ int nfp_cpp_area_read(struct nfp_cpp_area *area, * Write data to indicated CPP region. * * NOTE: @offset and @length must be 32-bit aligned values. - * - * NOTE: Area must have been locked down with an 'acquire'. + * Area must have been locked down with an 'acquire'. * * Return: length of io, or -ERRNO */ diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_hwinfo.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_hwinfo.c index 4f24aff1e772..063a9a6243d6 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_hwinfo.c +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_hwinfo.c @@ -302,3 +302,13 @@ const char *nfp_hwinfo_lookup(struct nfp_hwinfo *hwinfo, const char *lookup) return NULL; } + +char *nfp_hwinfo_get_packed_strings(struct nfp_hwinfo *hwinfo) +{ + return hwinfo->data; +} + +u32 nfp_hwinfo_get_packed_str_size(struct nfp_hwinfo *hwinfo) +{ + return le32_to_cpu(hwinfo->size) - sizeof(u32); +} diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c index 481876b5424c..53614ed694fc 100644 --- a/drivers/net/ethernet/nvidia/forcedeth.c +++ b/drivers/net/ethernet/nvidia/forcedeth.c @@ -2563,7 +2563,7 @@ static int nv_tx_done(struct net_device *dev, int limit) if (np->desc_ver == DESC_VER_1) { if (flags & NV_TX_LASTPACKET) { - if (flags & NV_TX_ERROR) { + if (unlikely(flags & NV_TX_ERROR)) { if ((flags & NV_TX_RETRYERROR) && !(flags & NV_TX_RETRYCOUNT_MASK)) nv_legacybackoff_reseed(dev); @@ -2580,7 +2580,7 @@ static int nv_tx_done(struct net_device *dev, int limit) } } else { if (flags & NV_TX2_LASTPACKET) { - if (flags & NV_TX2_ERROR) { + if (unlikely(flags & NV_TX2_ERROR)) { if ((flags & NV_TX2_RETRYERROR) && !(flags & NV_TX2_RETRYCOUNT_MASK)) nv_legacybackoff_reseed(dev); @@ -2626,7 +2626,7 @@ static int nv_tx_done_optimized(struct net_device *dev, int limit) nv_unmap_txskb(np, np->get_tx_ctx); if (flags & NV_TX2_LASTPACKET) { - if (flags & NV_TX2_ERROR) { + if (unlikely(flags & NV_TX2_ERROR)) { if ((flags & NV_TX2_RETRYERROR) && !(flags & NV_TX2_RETRYCOUNT_MASK)) { if (np->driver_data & DEV_HAS_GEAR_MODE) diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c index 0a66389c06c2..1cd39c9a0345 100644 --- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c +++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c @@ -2502,12 +2502,10 @@ netxen_collect_minidump(struct netxen_adapter *adapter) { int ret = 0; struct netxen_minidump_template_hdr *hdr; - struct timespec val; hdr = (struct netxen_minidump_template_hdr *) adapter->mdump.md_template; hdr->driver_capture_mask = adapter->mdump.md_capture_mask; - jiffies_to_timespec(jiffies, &val); - hdr->driver_timestamp = (u32) val.tv_sec; + hdr->driver_timestamp = ktime_get_seconds(); hdr->driver_info_word2 = adapter->fw_version; hdr->driver_info_word3 = NXRD32(adapter, CRB_DRIVER_VERSION); ret = netxen_parse_md_template(adapter); diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 8f9b3eb82137..57332b3e5e64 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -1068,10 +1068,6 @@ static void __qede_remove(struct pci_dev *pdev, enum qede_remove_mode mode) pci_set_drvdata(pdev, NULL); - /* Release edev's reference to XDP's bpf if such exist */ - if (edev->xdp_prog) - bpf_prog_put(edev->xdp_prog); - /* Use global ops since we've freed edev */ qed_ops->common->slowpath_stop(cdev); if (system_state == SYSTEM_POWER_OFF) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index d7250539d0bd..c52a9963c19d 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -1997,22 +1997,60 @@ static void stmmac_set_dma_operation_mode(struct stmmac_priv *priv, u32 txmode, static void stmmac_dma_interrupt(struct stmmac_priv *priv) { u32 tx_channel_count = priv->plat->tx_queues_to_use; - int status; + u32 rx_channel_count = priv->plat->rx_queues_to_use; + u32 channels_to_check = tx_channel_count > rx_channel_count ? + tx_channel_count : rx_channel_count; u32 chan; + bool poll_scheduled = false; + int status[channels_to_check]; + + /* Each DMA channel can be used for rx and tx simultaneously, yet + * napi_struct is embedded in struct stmmac_rx_queue rather than in a + * stmmac_channel struct. + * Because of this, stmmac_poll currently checks (and possibly wakes) + * all tx queues rather than just a single tx queue. + */ + for (chan = 0; chan < channels_to_check; chan++) + status[chan] = priv->hw->dma->dma_interrupt(priv->ioaddr, + &priv->xstats, + chan); - for (chan = 0; chan < tx_channel_count; chan++) { - struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan]; + for (chan = 0; chan < rx_channel_count; chan++) { + if (likely(status[chan] & handle_rx)) { + struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan]; - status = priv->hw->dma->dma_interrupt(priv->ioaddr, - &priv->xstats, chan); - if (likely((status & handle_rx)) || (status & handle_tx)) { if (likely(napi_schedule_prep(&rx_q->napi))) { stmmac_disable_dma_irq(priv, chan); __napi_schedule(&rx_q->napi); + poll_scheduled = true; + } + } + } + + /* If we scheduled poll, we already know that tx queues will be checked. + * If we didn't schedule poll, see if any DMA channel (used by tx) has a + * completed transmission, if so, call stmmac_poll (once). + */ + if (!poll_scheduled) { + for (chan = 0; chan < tx_channel_count; chan++) { + if (status[chan] & handle_tx) { + /* It doesn't matter what rx queue we choose + * here. We use 0 since it always exists. + */ + struct stmmac_rx_queue *rx_q = + &priv->rx_queue[0]; + + if (likely(napi_schedule_prep(&rx_q->napi))) { + stmmac_disable_dma_irq(priv, chan); + __napi_schedule(&rx_q->napi); + } + break; } } + } - if (unlikely(status & tx_hard_error_bump_tc)) { + for (chan = 0; chan < tx_channel_count; chan++) { + if (unlikely(status[chan] & tx_hard_error_bump_tc)) { /* Try to bump up the dma threshold on this failure */ if (unlikely(priv->xstats.threshold != SF_DMA_MODE) && (tc <= 256)) { @@ -2029,7 +2067,7 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv) chan); priv->xstats.threshold = tc; } - } else if (unlikely(status == tx_hard_error)) { + } else if (unlikely(status[chan] == tx_hard_error)) { stmmac_tx_err(priv, chan); } } diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index a73600dceb8b..a60a378b8b29 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -88,6 +88,7 @@ do { \ #define CPSW_VERSION_4 0x190112 #define HOST_PORT_NUM 0 +#define CPSW_ALE_PORTS_NUM 3 #define SLIVER_SIZE 0x40 #define CPSW1_HOST_PORT_OFFSET 0x028 @@ -352,6 +353,27 @@ struct cpsw_hw_stats { u32 rxdmaoverruns; }; +struct cpsw_slave_data { + struct device_node *phy_node; + char phy_id[MII_BUS_ID_SIZE]; + int phy_if; + u8 mac_addr[ETH_ALEN]; + u16 dual_emac_res_vlan; /* Reserved VLAN for DualEMAC */ +}; + +struct cpsw_platform_data { + struct cpsw_slave_data *slave_data; + u32 ss_reg_ofs; /* Subsystem control register offset */ + u32 channels; /* number of cpdma channels (symmetric) */ + u32 slaves; /* number of slave cpgmac ports */ + u32 active_slave; /* time stamping, ethtool and SIOCGMIIPHY slave */ + u32 ale_entries; /* ale table size */ + u32 bd_ram_size; /*buffer descriptor ram size */ + u32 mac_control; /* Mac control register */ + u16 default_vlan; /* Def VLAN for ALE lookup in VLAN aware mode*/ + bool dual_emac; /* Enable Dual EMAC mode */ +}; + struct cpsw_slave { void __iomem *regs; struct cpsw_sliver_regs __iomem *sliver; @@ -365,12 +387,12 @@ struct cpsw_slave { static inline u32 slave_read(struct cpsw_slave *slave, u32 offset) { - return __raw_readl(slave->regs + offset); + return readl_relaxed(slave->regs + offset); } static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset) { - __raw_writel(val, slave->regs + offset); + writel_relaxed(val, slave->regs + offset); } struct cpsw_vector { @@ -660,8 +682,8 @@ static void cpsw_ndo_set_rx_mode(struct net_device *ndev) static void cpsw_intr_enable(struct cpsw_common *cpsw) { - __raw_writel(0xFF, &cpsw->wr_regs->tx_en); - __raw_writel(0xFF, &cpsw->wr_regs->rx_en); + writel_relaxed(0xFF, &cpsw->wr_regs->tx_en); + writel_relaxed(0xFF, &cpsw->wr_regs->rx_en); cpdma_ctlr_int_ctrl(cpsw->dma, true); return; @@ -669,8 +691,8 @@ static void cpsw_intr_enable(struct cpsw_common *cpsw) static void cpsw_intr_disable(struct cpsw_common *cpsw) { - __raw_writel(0, &cpsw->wr_regs->tx_en); - __raw_writel(0, &cpsw->wr_regs->rx_en); + writel_relaxed(0, &cpsw->wr_regs->tx_en); + writel_relaxed(0, &cpsw->wr_regs->rx_en); cpdma_ctlr_int_ctrl(cpsw->dma, false); return; @@ -949,18 +971,14 @@ static inline void soft_reset(const char *module, void __iomem *reg) { unsigned long timeout = jiffies + HZ; - __raw_writel(1, reg); + writel_relaxed(1, reg); do { cpu_relax(); - } while ((__raw_readl(reg) & 1) && time_after(timeout, jiffies)); + } while ((readl_relaxed(reg) & 1) && time_after(timeout, jiffies)); - WARN(__raw_readl(reg) & 1, "failed to soft-reset %s\n", module); + WARN(readl_relaxed(reg) & 1, "failed to soft-reset %s\n", module); } -#define mac_hi(mac) (((mac)[0] << 0) | ((mac)[1] << 8) | \ - ((mac)[2] << 16) | ((mac)[3] << 24)) -#define mac_lo(mac) (((mac)[4] << 0) | ((mac)[5] << 8)) - static void cpsw_set_slave_mac(struct cpsw_slave *slave, struct cpsw_priv *priv) { @@ -1015,7 +1033,7 @@ static void _cpsw_adjust_link(struct cpsw_slave *slave, if (mac_control != slave->mac_control) { phy_print_status(phy); - __raw_writel(mac_control, &slave->sliver->mac_control); + writel_relaxed(mac_control, &slave->sliver->mac_control); } slave->mac_control = mac_control; @@ -1278,7 +1296,7 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv) soft_reset_slave(slave); /* setup priority mapping */ - __raw_writel(RX_PRIORITY_MAPPING, &slave->sliver->rx_pri_map); + writel_relaxed(RX_PRIORITY_MAPPING, &slave->sliver->rx_pri_map); switch (cpsw->version) { case CPSW_VERSION_1: @@ -1304,7 +1322,7 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv) } /* setup max packet size, and mac address */ - __raw_writel(cpsw->rx_packet_max, &slave->sliver->rx_maxlen); + writel_relaxed(cpsw->rx_packet_max, &slave->sliver->rx_maxlen); cpsw_set_slave_mac(slave, priv); slave->mac_control = 0; /* no link yet */ @@ -1395,9 +1413,9 @@ static void cpsw_init_host_port(struct cpsw_priv *priv) writel(fifo_mode, &cpsw->host_port_regs->tx_in_ctl); /* setup host port priority mapping */ - __raw_writel(CPDMA_TX_PRIORITY_MAP, - &cpsw->host_port_regs->cpdma_tx_pri_map); - __raw_writel(0, &cpsw->host_port_regs->cpdma_rx_chan_map); + writel_relaxed(CPDMA_TX_PRIORITY_MAP, + &cpsw->host_port_regs->cpdma_tx_pri_map); + writel_relaxed(0, &cpsw->host_port_regs->cpdma_rx_chan_map); cpsw_ale_control_set(cpsw->ale, HOST_PORT_NUM, ALE_PORT_STATE, ALE_PORT_STATE_FORWARD); @@ -1514,10 +1532,10 @@ static int cpsw_ndo_open(struct net_device *ndev) /* initialize shared resources for every ndev */ if (!cpsw->usage_count) { /* disable priority elevation */ - __raw_writel(0, &cpsw->regs->ptype); + writel_relaxed(0, &cpsw->regs->ptype); /* enable statistics collection only on all ports */ - __raw_writel(0x7, &cpsw->regs->stat_port_en); + writel_relaxed(0x7, &cpsw->regs->stat_port_en); /* Enable internal fifo flow control */ writel(0x7, &cpsw->regs->flow_control); @@ -1701,7 +1719,7 @@ static void cpsw_hwtstamp_v2(struct cpsw_priv *priv) slave_write(slave, mtype, CPSW2_TS_SEQ_MTYPE); slave_write(slave, ctrl, CPSW2_CONTROL); - __raw_writel(ETH_P_1588, &cpsw->regs->ts_ltype); + writel_relaxed(ETH_P_1588, &cpsw->regs->ts_ltype); } static int cpsw_hwtstamp_set(struct net_device *dev, struct ifreq *ifr) @@ -2298,7 +2316,6 @@ static int cpsw_check_ch_settings(struct cpsw_common *cpsw, static int cpsw_update_channels_res(struct cpsw_priv *priv, int ch_num, int rx) { - int (*poll)(struct napi_struct *, int); struct cpsw_common *cpsw = priv->cpsw; void (*handler)(void *, int, int); struct netdev_queue *queue; @@ -2309,12 +2326,10 @@ static int cpsw_update_channels_res(struct cpsw_priv *priv, int ch_num, int rx) ch = &cpsw->rx_ch_num; vec = cpsw->rxv; handler = cpsw_rx_handler; - poll = cpsw_rx_poll; } else { ch = &cpsw->tx_ch_num; vec = cpsw->txv; handler = cpsw_tx_handler; - poll = cpsw_tx_poll; } while (*ch < ch_num) { @@ -3060,7 +3075,7 @@ static int cpsw_probe(struct platform_device *pdev) ale_params.dev = &pdev->dev; ale_params.ale_ageout = ale_ageout; ale_params.ale_entries = data->ale_entries; - ale_params.ale_ports = data->slaves; + ale_params.ale_ports = CPSW_ALE_PORTS_NUM; cpsw->ale = cpsw_ale_create(&ale_params); if (!cpsw->ale) { @@ -3072,14 +3087,14 @@ static int cpsw_probe(struct platform_device *pdev) cpsw->cpts = cpts_create(cpsw->dev, cpts_regs, cpsw->dev->of_node); if (IS_ERR(cpsw->cpts)) { ret = PTR_ERR(cpsw->cpts); - goto clean_ale_ret; + goto clean_dma_ret; } ndev->irq = platform_get_irq(pdev, 1); if (ndev->irq < 0) { dev_err(priv->dev, "error getting irq resource\n"); ret = ndev->irq; - goto clean_ale_ret; + goto clean_dma_ret; } of_id = of_match_device(cpsw_of_mtable, &pdev->dev); @@ -3103,7 +3118,7 @@ static int cpsw_probe(struct platform_device *pdev) if (ret) { dev_err(priv->dev, "error registering net device\n"); ret = -ENODEV; - goto clean_ale_ret; + goto clean_dma_ret; } if (cpsw->data.dual_emac) { @@ -3126,7 +3141,7 @@ static int cpsw_probe(struct platform_device *pdev) irq = platform_get_irq(pdev, 1); if (irq < 0) { ret = irq; - goto clean_ale_ret; + goto clean_dma_ret; } cpsw->irqs_table[0] = irq; @@ -3134,14 +3149,14 @@ static int cpsw_probe(struct platform_device *pdev) 0, dev_name(&pdev->dev), cpsw); if (ret < 0) { dev_err(priv->dev, "error attaching irq (%d)\n", ret); - goto clean_ale_ret; + goto clean_dma_ret; } /* TX IRQ */ irq = platform_get_irq(pdev, 2); if (irq < 0) { ret = irq; - goto clean_ale_ret; + goto clean_dma_ret; } cpsw->irqs_table[1] = irq; @@ -3149,7 +3164,7 @@ static int cpsw_probe(struct platform_device *pdev) 0, dev_name(&pdev->dev), cpsw); if (ret < 0) { dev_err(priv->dev, "error attaching irq (%d)\n", ret); - goto clean_ale_ret; + goto clean_dma_ret; } cpsw_notice(priv, probe, @@ -3162,8 +3177,6 @@ static int cpsw_probe(struct platform_device *pdev) clean_unregister_netdev_ret: unregister_netdev(ndev); -clean_ale_ret: - cpsw_ale_destroy(cpsw->ale); clean_dma_ret: cpdma_ctlr_destroy(cpsw->dma); clean_dt_ret: @@ -3193,7 +3206,6 @@ static int cpsw_remove(struct platform_device *pdev) unregister_netdev(ndev); cpts_release(cpsw->cpts); - cpsw_ale_destroy(cpsw->ale); cpdma_ctlr_destroy(cpsw->dma); cpsw_remove_dt(pdev); pm_runtime_put_sync(&pdev->dev); diff --git a/drivers/net/ethernet/ti/cpsw.h b/drivers/net/ethernet/ti/cpsw.h index 6c3037aa2cd3..cf111db3dc27 100644 --- a/drivers/net/ethernet/ti/cpsw.h +++ b/drivers/net/ethernet/ti/cpsw.h @@ -17,26 +17,9 @@ #include <linux/if_ether.h> #include <linux/phy.h> -struct cpsw_slave_data { - struct device_node *phy_node; - char phy_id[MII_BUS_ID_SIZE]; - int phy_if; - u8 mac_addr[ETH_ALEN]; - u16 dual_emac_res_vlan; /* Reserved VLAN for DualEMAC */ -}; - -struct cpsw_platform_data { - struct cpsw_slave_data *slave_data; - u32 ss_reg_ofs; /* Subsystem control register offset */ - u32 channels; /* number of cpdma channels (symmetric) */ - u32 slaves; /* number of slave cpgmac ports */ - u32 active_slave; /* time stamping, ethtool and SIOCGMIIPHY slave */ - u32 ale_entries; /* ale table size */ - u32 bd_ram_size; /*buffer descriptor ram size */ - u32 mac_control; /* Mac control register */ - u16 default_vlan; /* Def VLAN for ALE lookup in VLAN aware mode*/ - bool dual_emac; /* Enable Dual EMAC mode */ -}; +#define mac_hi(mac) (((mac)[0] << 0) | ((mac)[1] << 8) | \ + ((mac)[2] << 16) | ((mac)[3] << 24)) +#define mac_lo(mac) (((mac)[4] << 0) | ((mac)[5] << 8)) void cpsw_phy_sel(struct device *dev, phy_interface_t phy_mode, int slave); int ti_cm_get_macid(struct device *dev, int slave, u8 *mac_addr); diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c index b432a75fb874..93dc05c194d3 100644 --- a/drivers/net/ethernet/ti/cpsw_ale.c +++ b/drivers/net/ethernet/ti/cpsw_ale.c @@ -150,11 +150,11 @@ static int cpsw_ale_read(struct cpsw_ale *ale, int idx, u32 *ale_entry) WARN_ON(idx > ale->params.ale_entries); - __raw_writel(idx, ale->params.ale_regs + ALE_TABLE_CONTROL); + writel_relaxed(idx, ale->params.ale_regs + ALE_TABLE_CONTROL); for (i = 0; i < ALE_ENTRY_WORDS; i++) - ale_entry[i] = __raw_readl(ale->params.ale_regs + - ALE_TABLE + 4 * i); + ale_entry[i] = readl_relaxed(ale->params.ale_regs + + ALE_TABLE + 4 * i); return idx; } @@ -166,11 +166,11 @@ static int cpsw_ale_write(struct cpsw_ale *ale, int idx, u32 *ale_entry) WARN_ON(idx > ale->params.ale_entries); for (i = 0; i < ALE_ENTRY_WORDS; i++) - __raw_writel(ale_entry[i], ale->params.ale_regs + - ALE_TABLE + 4 * i); + writel_relaxed(ale_entry[i], ale->params.ale_regs + + ALE_TABLE + 4 * i); - __raw_writel(idx | ALE_TABLE_WRITE, ale->params.ale_regs + - ALE_TABLE_CONTROL); + writel_relaxed(idx | ALE_TABLE_WRITE, ale->params.ale_regs + + ALE_TABLE_CONTROL); return idx; } @@ -723,7 +723,7 @@ int cpsw_ale_control_set(struct cpsw_ale *ale, int port, int control, if (info->port_offset == 0 && info->port_shift == 0) port = 0; /* global, port is a dont care */ - if (port < 0 || port > ale->params.ale_ports) + if (port < 0 || port >= ale->params.ale_ports) return -EINVAL; mask = BITMASK(info->bits); @@ -733,9 +733,9 @@ int cpsw_ale_control_set(struct cpsw_ale *ale, int port, int control, offset = info->offset + (port * info->port_offset); shift = info->shift + (port * info->port_shift); - tmp = __raw_readl(ale->params.ale_regs + offset); + tmp = readl_relaxed(ale->params.ale_regs + offset); tmp = (tmp & ~(mask << shift)) | (value << shift); - __raw_writel(tmp, ale->params.ale_regs + offset); + writel_relaxed(tmp, ale->params.ale_regs + offset); return 0; } @@ -754,13 +754,13 @@ int cpsw_ale_control_get(struct cpsw_ale *ale, int port, int control) if (info->port_offset == 0 && info->port_shift == 0) port = 0; /* global, port is a dont care */ - if (port < 0 || port > ale->params.ale_ports) + if (port < 0 || port >= ale->params.ale_ports) return -EINVAL; offset = info->offset + (port * info->port_offset); shift = info->shift + (port * info->port_shift); - tmp = __raw_readl(ale->params.ale_regs + offset) >> shift; + tmp = readl_relaxed(ale->params.ale_regs + offset) >> shift; return tmp & BITMASK(info->bits); } EXPORT_SYMBOL_GPL(cpsw_ale_control_get); @@ -779,9 +779,37 @@ static void cpsw_ale_timer(struct timer_list *t) void cpsw_ale_start(struct cpsw_ale *ale) { + cpsw_ale_control_set(ale, 0, ALE_ENABLE, 1); + cpsw_ale_control_set(ale, 0, ALE_CLEAR, 1); + + timer_setup(&ale->timer, cpsw_ale_timer, 0); + if (ale->ageout) { + ale->timer.expires = jiffies + ale->ageout; + add_timer(&ale->timer); + } +} +EXPORT_SYMBOL_GPL(cpsw_ale_start); + +void cpsw_ale_stop(struct cpsw_ale *ale) +{ + del_timer_sync(&ale->timer); + cpsw_ale_control_set(ale, 0, ALE_ENABLE, 0); +} +EXPORT_SYMBOL_GPL(cpsw_ale_stop); + +struct cpsw_ale *cpsw_ale_create(struct cpsw_ale_params *params) +{ + struct cpsw_ale *ale; u32 rev, ale_entries; - rev = __raw_readl(ale->params.ale_regs + ALE_IDVER); + ale = devm_kzalloc(params->dev, sizeof(*ale), GFP_KERNEL); + if (!ale) + return NULL; + + ale->params = *params; + ale->ageout = ale->params.ale_ageout * HZ; + + rev = readl_relaxed(ale->params.ale_regs + ALE_IDVER); if (!ale->params.major_ver_mask) ale->params.major_ver_mask = 0xff; ale->version = @@ -793,8 +821,8 @@ void cpsw_ale_start(struct cpsw_ale *ale) if (!ale->params.ale_entries) { ale_entries = - __raw_readl(ale->params.ale_regs + ALE_STATUS) & - ALE_STATUS_SIZE_MASK; + readl_relaxed(ale->params.ale_regs + ALE_STATUS) & + ALE_STATUS_SIZE_MASK; /* ALE available on newer NetCP switches has introduced * a register, ALE_STATUS, to indicate the size of ALE * table which shows the size as a multiple of 1024 entries. @@ -816,9 +844,9 @@ void cpsw_ale_start(struct cpsw_ale *ale) "ALE Table size %ld\n", ale->params.ale_entries); /* set default bits for existing h/w */ - ale->port_mask_bits = 3; - ale->port_num_bits = 2; - ale->vlan_field_bits = 3; + ale->port_mask_bits = ale->params.ale_ports; + ale->port_num_bits = order_base_2(ale->params.ale_ports); + ale->vlan_field_bits = ale->params.ale_ports; /* Set defaults override for ALE on NetCP NU switch and for version * 1R3 @@ -847,57 +875,12 @@ void cpsw_ale_start(struct cpsw_ale *ale) ale_controls[ALE_PORT_UNTAGGED_EGRESS].shift = 0; ale_controls[ALE_PORT_UNTAGGED_EGRESS].offset = ALE_UNKNOWNVLAN_FORCE_UNTAG_EGRESS; - ale->port_mask_bits = ale->params.ale_ports; - ale->port_num_bits = ale->params.ale_ports - 1; - ale->vlan_field_bits = ale->params.ale_ports; - } else if (ale->version == ALE_VERSION_1R3) { - ale->port_mask_bits = ale->params.ale_ports; - ale->port_num_bits = 3; - ale->vlan_field_bits = ale->params.ale_ports; } - cpsw_ale_control_set(ale, 0, ALE_ENABLE, 1); - cpsw_ale_control_set(ale, 0, ALE_CLEAR, 1); - - timer_setup(&ale->timer, cpsw_ale_timer, 0); - if (ale->ageout) { - ale->timer.expires = jiffies + ale->ageout; - add_timer(&ale->timer); - } -} -EXPORT_SYMBOL_GPL(cpsw_ale_start); - -void cpsw_ale_stop(struct cpsw_ale *ale) -{ - del_timer_sync(&ale->timer); -} -EXPORT_SYMBOL_GPL(cpsw_ale_stop); - -struct cpsw_ale *cpsw_ale_create(struct cpsw_ale_params *params) -{ - struct cpsw_ale *ale; - - ale = kzalloc(sizeof(*ale), GFP_KERNEL); - if (!ale) - return NULL; - - ale->params = *params; - ale->ageout = ale->params.ale_ageout * HZ; - return ale; } EXPORT_SYMBOL_GPL(cpsw_ale_create); -int cpsw_ale_destroy(struct cpsw_ale *ale) -{ - if (!ale) - return -EINVAL; - cpsw_ale_control_set(ale, 0, ALE_ENABLE, 0); - kfree(ale); - return 0; -} -EXPORT_SYMBOL_GPL(cpsw_ale_destroy); - void cpsw_ale_dump(struct cpsw_ale *ale, u32 *data) { int i; diff --git a/drivers/net/ethernet/ti/cpsw_ale.h b/drivers/net/ethernet/ti/cpsw_ale.h index 25d24e8d0904..d4fe9016429b 100644 --- a/drivers/net/ethernet/ti/cpsw_ale.h +++ b/drivers/net/ethernet/ti/cpsw_ale.h @@ -100,7 +100,6 @@ enum cpsw_ale_port_state { #define ALE_ENTRY_WORDS DIV_ROUND_UP(ALE_ENTRY_BITS, 32) struct cpsw_ale *cpsw_ale_create(struct cpsw_ale_params *params); -int cpsw_ale_destroy(struct cpsw_ale *ale); void cpsw_ale_start(struct cpsw_ale *ale); void cpsw_ale_stop(struct cpsw_ale *ale); diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c index 4bb561856af5..f58c0c620356 100644 --- a/drivers/net/ethernet/ti/davinci_emac.c +++ b/drivers/net/ethernet/ti/davinci_emac.c @@ -1385,11 +1385,6 @@ static int emac_devioctl(struct net_device *ndev, struct ifreq *ifrq, int cmd) return -EOPNOTSUPP; } -static int match_first_device(struct device *dev, void *data) -{ - return !strncmp(dev_name(dev), "davinci_mdio", 12); -} - /** * emac_dev_open - EMAC device open * @ndev: The DaVinci EMAC network adapter @@ -1489,8 +1484,8 @@ static int emac_dev_open(struct net_device *ndev) /* use the first phy on the bus if pdata did not give us a phy id */ if (!phydev && !priv->phy_id) { - phy = bus_find_device(&mdio_bus_type, NULL, NULL, - match_first_device); + phy = bus_find_device_by_name(&mdio_bus_type, NULL, + "davinci_mdio"); if (phy) { priv->phy_id = dev_name(phy); if (!priv->phy_id || !*priv->phy_id) diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c index e831c49713ee..56dbc0b9fedc 100644 --- a/drivers/net/ethernet/ti/netcp_ethss.c +++ b/drivers/net/ethernet/ti/netcp_ethss.c @@ -27,6 +27,7 @@ #include <linux/net_tstamp.h> #include <linux/ethtool.h> +#include "cpsw.h" #include "cpsw_ale.h" #include "netcp.h" #include "cpts.h" @@ -2047,10 +2048,6 @@ static const struct ethtool_ops keystone_ethtool_ops = { .get_ts_info = keystone_get_ts_info, }; -#define mac_hi(mac) (((mac)[0] << 0) | ((mac)[1] << 8) | \ - ((mac)[2] << 16) | ((mac)[3] << 24)) -#define mac_lo(mac) (((mac)[4] << 0) | ((mac)[5] << 8)) - static void gbe_set_slave_mac(struct gbe_slave *slave, struct gbe_intf *gbe_intf) { @@ -3692,7 +3689,6 @@ static int gbe_remove(struct netcp_device *netcp_device, void *inst_priv) del_timer_sync(&gbe_dev->timer); cpts_release(gbe_dev->cpts); cpsw_ale_stop(gbe_dev->ale); - cpsw_ale_destroy(gbe_dev->ale); netcp_txpipe_close(&gbe_dev->tx_pipe); free_secondary_ports(gbe_dev); diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index 88ddfb92122b..3d940c67ea94 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -146,7 +146,6 @@ struct hv_netvsc_packet { struct netvsc_device_info { unsigned char mac_adr[ETH_ALEN]; - int ring_size; u32 num_chn; u32 send_sections; u32 recv_sections; @@ -188,6 +187,9 @@ struct rndis_message; struct netvsc_device; struct net_device_context; +extern u32 netvsc_ring_bytes; +extern struct reciprocal_value netvsc_ring_reciprocal; + struct netvsc_device *netvsc_device_add(struct hv_device *device, const struct netvsc_device_info *info); int netvsc_alloc_recv_comp_ring(struct netvsc_device *net_device, u32 q_idx); @@ -804,8 +806,6 @@ struct netvsc_device { struct rndis_device *extension; - int ring_size; - u32 max_pkt; /* max number of pkt in one send, e.g. 8 */ u32 pkt_align; /* alignment bytes, e.g. 8 */ @@ -1425,32 +1425,6 @@ struct rndis_message { (sizeof(msg) + (sizeof(struct rndis_message) - \ sizeof(union rndis_message_container))) -/* get pointer to info buffer with message pointer */ -#define MESSAGE_TO_INFO_BUFFER(msg) \ - (((unsigned char *)(msg)) + msg->info_buf_offset) - -/* get pointer to status buffer with message pointer */ -#define MESSAGE_TO_STATUS_BUFFER(msg) \ - (((unsigned char *)(msg)) + msg->status_buf_offset) - -/* get pointer to OOBD buffer with message pointer */ -#define MESSAGE_TO_OOBD_BUFFER(msg) \ - (((unsigned char *)(msg)) + msg->oob_data_offset) - -/* get pointer to data buffer with message pointer */ -#define MESSAGE_TO_DATA_BUFFER(msg) \ - (((unsigned char *)(msg)) + msg->per_pkt_info_offset) - -/* get pointer to contained message from NDIS_MESSAGE pointer */ -#define RNDIS_MESSAGE_PTR_TO_MESSAGE_PTR(rndis_msg) \ - ((void *) &rndis_msg->msg) - -/* get pointer to contained message from NDIS_MESSAGE pointer */ -#define RNDIS_MESSAGE_RAW_PTR_TO_MESSAGE_PTR(rndis_msg) \ - ((void *) rndis_msg) - - - #define RNDIS_HEADER_SIZE (sizeof(struct rndis_message) - \ sizeof(union rndis_message_container)) diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index bfc79698b8f4..e4bcd202a56a 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -31,6 +31,7 @@ #include <linux/vmalloc.h> #include <linux/rtnetlink.h> #include <linux/prefetch.h> +#include <linux/reciprocal_div.h> #include <asm/sync_bitops.h> @@ -588,14 +589,11 @@ void netvsc_device_remove(struct hv_device *device) * Get the percentage of available bytes to write in the ring. * The return value is in range from 0 to 100. */ -static inline u32 hv_ringbuf_avail_percent( - struct hv_ring_buffer_info *ring_info) +static u32 hv_ringbuf_avail_percent(const struct hv_ring_buffer_info *ring_info) { - u32 avail_read, avail_write; + u32 avail_write = hv_get_bytes_to_write(ring_info); - hv_get_ringbuffer_availbytes(ring_info, &avail_read, &avail_write); - - return avail_write * 100 / ring_info->ring_datasize; + return reciprocal_divide(avail_write * 100, netvsc_ring_reciprocal); } static inline void netvsc_free_send_slot(struct netvsc_device *net_device, @@ -712,11 +710,12 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device *net_device, int i; u32 msg_size = 0; u32 padding = 0; - u32 remain = packet->total_data_buflen % net_device->pkt_align; u32 page_count = packet->cp_partial ? packet->rmsg_pgcnt : packet->page_buf_cnt; + u32 remain; /* Add padding */ + remain = packet->total_data_buflen & (net_device->pkt_align - 1); if (skb->xmit_more && remain && !packet->cp_partial) { padding = net_device->pkt_align - remain; rndis_msg->msg_len += padding; @@ -848,7 +847,6 @@ int netvsc_send(struct net_device_context *ndev_ctx, struct hv_netvsc_packet *msd_send = NULL, *cur_send = NULL; struct sk_buff *msd_skb = NULL; bool try_batch; - bool xmit_more = (skb != NULL) ? skb->xmit_more : false; /* If device is rescinded, return error and packet will get dropped. */ if (unlikely(!net_device || net_device->destroy)) @@ -922,7 +920,7 @@ int netvsc_send(struct net_device_context *ndev_ctx, if (msdp->skb) dev_consume_skb_any(msdp->skb); - if (xmit_more && !packet->cp_partial) { + if (skb->xmit_more && !packet->cp_partial) { msdp->skb = skb; msdp->pkt = packet; msdp->count++; @@ -1249,7 +1247,6 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device, const struct netvsc_device_info *device_info) { int i, ret = 0; - int ring_size = device_info->ring_size; struct netvsc_device *net_device; struct net_device *ndev = hv_get_drvdata(device); struct net_device_context *net_device_ctx = netdev_priv(ndev); @@ -1261,8 +1258,6 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device, for (i = 0; i < VRSS_SEND_TAB_SIZE; i++) net_device_ctx->tx_table[i] = 0; - net_device->ring_size = ring_size; - /* Because the device uses NAPI, all the interrupt batching and * control is done via Net softirq, not the channel handling */ @@ -1289,10 +1284,9 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device, netvsc_poll, NAPI_POLL_WEIGHT); /* Open the channel */ - ret = vmbus_open(device->channel, ring_size * PAGE_SIZE, - ring_size * PAGE_SIZE, NULL, 0, - netvsc_channel_cb, - net_device->chan_table); + ret = vmbus_open(device->channel, netvsc_ring_bytes, + netvsc_ring_bytes, NULL, 0, + netvsc_channel_cb, net_device->chan_table); if (ret != 0) { netif_napi_del(&net_device->chan_table[0].napi); diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index 5129647d420c..dc70de674ca9 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -35,6 +35,7 @@ #include <linux/slab.h> #include <linux/rtnetlink.h> #include <linux/netpoll.h> +#include <linux/reciprocal_div.h> #include <net/arp.h> #include <net/route.h> @@ -54,9 +55,11 @@ #define LINKCHANGE_INT (2 * HZ) #define VF_TAKEOVER_INT (HZ / 10) -static int ring_size = 128; -module_param(ring_size, int, S_IRUGO); +static unsigned int ring_size __ro_after_init = 128; +module_param(ring_size, uint, S_IRUGO); MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)"); +unsigned int netvsc_ring_bytes __ro_after_init; +struct reciprocal_value netvsc_ring_reciprocal __ro_after_init; static const u32 default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK | NETIF_MSG_IFUP | @@ -174,17 +177,15 @@ out: return ret; } -static void *init_ppi_data(struct rndis_message *msg, u32 ppi_size, - int pkt_type) +static inline void *init_ppi_data(struct rndis_message *msg, + u32 ppi_size, u32 pkt_type) { - struct rndis_packet *rndis_pkt; + struct rndis_packet *rndis_pkt = &msg->msg.pkt; struct rndis_per_packet_info *ppi; - rndis_pkt = &msg->msg.pkt; rndis_pkt->data_offset += ppi_size; - - ppi = (struct rndis_per_packet_info *)((void *)rndis_pkt + - rndis_pkt->per_pkt_info_offset + rndis_pkt->per_pkt_info_len); + ppi = (void *)rndis_pkt + rndis_pkt->per_pkt_info_offset + + rndis_pkt->per_pkt_info_len; ppi->size = ppi_size; ppi->type = pkt_type; @@ -192,7 +193,7 @@ static void *init_ppi_data(struct rndis_message *msg, u32 ppi_size, rndis_pkt->per_pkt_info_len += ppi_size; - return ppi; + return ppi + 1; } /* Azure hosts don't support non-TCP port numbers in hashing for fragmented @@ -469,10 +470,8 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net) int ret; unsigned int num_data_pgs; struct rndis_message *rndis_msg; - struct rndis_packet *rndis_pkt; struct net_device *vf_netdev; u32 rndis_msg_size; - struct rndis_per_packet_info *ppi; u32 hash; struct hv_page_buffer pb[MAX_PAGE_BUFFER_COUNT]; @@ -527,34 +526,36 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net) rndis_msg = (struct rndis_message *)skb->head; - memset(rndis_msg, 0, RNDIS_AND_PPI_SIZE); - /* Add the rndis header */ rndis_msg->ndis_msg_type = RNDIS_MSG_PACKET; rndis_msg->msg_len = packet->total_data_buflen; - rndis_pkt = &rndis_msg->msg.pkt; - rndis_pkt->data_offset = sizeof(struct rndis_packet); - rndis_pkt->data_len = packet->total_data_buflen; - rndis_pkt->per_pkt_info_offset = sizeof(struct rndis_packet); + + rndis_msg->msg.pkt = (struct rndis_packet) { + .data_offset = sizeof(struct rndis_packet), + .data_len = packet->total_data_buflen, + .per_pkt_info_offset = sizeof(struct rndis_packet), + }; rndis_msg_size = RNDIS_MESSAGE_SIZE(struct rndis_packet); hash = skb_get_hash_raw(skb); if (hash != 0 && net->real_num_tx_queues > 1) { + u32 *hash_info; + rndis_msg_size += NDIS_HASH_PPI_SIZE; - ppi = init_ppi_data(rndis_msg, NDIS_HASH_PPI_SIZE, - NBL_HASH_VALUE); - *(u32 *)((void *)ppi + ppi->ppi_offset) = hash; + hash_info = init_ppi_data(rndis_msg, NDIS_HASH_PPI_SIZE, + NBL_HASH_VALUE); + *hash_info = hash; } if (skb_vlan_tag_present(skb)) { struct ndis_pkt_8021q_info *vlan; rndis_msg_size += NDIS_VLAN_PPI_SIZE; - ppi = init_ppi_data(rndis_msg, NDIS_VLAN_PPI_SIZE, - IEEE_8021Q_INFO); + vlan = init_ppi_data(rndis_msg, NDIS_VLAN_PPI_SIZE, + IEEE_8021Q_INFO); - vlan = (void *)ppi + ppi->ppi_offset; + vlan->value = 0; vlan->vlanid = skb->vlan_tci & VLAN_VID_MASK; vlan->pri = (skb->vlan_tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT; @@ -564,11 +565,10 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net) struct ndis_tcp_lso_info *lso_info; rndis_msg_size += NDIS_LSO_PPI_SIZE; - ppi = init_ppi_data(rndis_msg, NDIS_LSO_PPI_SIZE, - TCP_LARGESEND_PKTINFO); - - lso_info = (void *)ppi + ppi->ppi_offset; + lso_info = init_ppi_data(rndis_msg, NDIS_LSO_PPI_SIZE, + TCP_LARGESEND_PKTINFO); + lso_info->value = 0; lso_info->lso_v2_transmit.type = NDIS_TCP_LARGE_SEND_OFFLOAD_V2_TYPE; if (skb->protocol == htons(ETH_P_IP)) { lso_info->lso_v2_transmit.ip_version = @@ -593,12 +593,10 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net) struct ndis_tcp_ip_checksum_info *csum_info; rndis_msg_size += NDIS_CSUM_PPI_SIZE; - ppi = init_ppi_data(rndis_msg, NDIS_CSUM_PPI_SIZE, - TCPIP_CHKSUM_PKTINFO); - - csum_info = (struct ndis_tcp_ip_checksum_info *)((void *)ppi + - ppi->ppi_offset); + csum_info = init_ppi_data(rndis_msg, NDIS_CSUM_PPI_SIZE, + TCPIP_CHKSUM_PKTINFO); + csum_info->value = 0; csum_info->transmit.tcp_header_offset = skb_transport_offset(skb); if (skb->protocol == htons(ETH_P_IP)) { @@ -860,7 +858,6 @@ static int netvsc_set_channels(struct net_device *net, memset(&device_info, 0, sizeof(device_info)); device_info.num_chn = count; - device_info.ring_size = ring_size; device_info.send_sections = nvdev->send_section_cnt; device_info.send_section_size = nvdev->send_section_size; device_info.recv_sections = nvdev->recv_section_cnt; @@ -975,7 +972,6 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu) rndis_filter_close(nvdev); memset(&device_info, 0, sizeof(device_info)); - device_info.ring_size = ring_size; device_info.num_chn = nvdev->num_chn; device_info.send_sections = nvdev->send_section_cnt; device_info.send_section_size = nvdev->send_section_size; @@ -1539,7 +1535,6 @@ static int netvsc_set_ringparam(struct net_device *ndev, memset(&device_info, 0, sizeof(device_info)); device_info.num_chn = nvdev->num_chn; - device_info.ring_size = ring_size; device_info.send_sections = new_tx; device_info.send_section_size = nvdev->send_section_size; device_info.recv_sections = new_rx; @@ -1995,7 +1990,6 @@ static int netvsc_probe(struct hv_device *dev, /* Notify the netvsc driver of the new device */ memset(&device_info, 0, sizeof(device_info)); - device_info.ring_size = ring_size; device_info.num_chn = VRSS_CHANNEL_DEFAULT; device_info.send_sections = NETVSC_DEFAULT_TX; device_info.send_section_size = NETVSC_SEND_SECTION_SIZE; @@ -2158,11 +2152,13 @@ static int __init netvsc_drv_init(void) if (ring_size < RING_SIZE_MIN) { ring_size = RING_SIZE_MIN; - pr_info("Increased ring_size to %d (min allowed)\n", + pr_info("Increased ring_size to %u (min allowed)\n", ring_size); } - ret = vmbus_driver_register(&netvsc_drv); + netvsc_ring_bytes = ring_size * PAGE_SIZE; + netvsc_ring_reciprocal = reciprocal_value(netvsc_ring_bytes); + ret = vmbus_driver_register(&netvsc_drv); if (ret) return ret; diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c index 7b637c7dd1e5..673492063307 100644 --- a/drivers/net/hyperv/rndis_filter.c +++ b/drivers/net/hyperv/rndis_filter.c @@ -1040,8 +1040,8 @@ static void netvsc_sc_open(struct vmbus_channel *new_sc) /* Set the channel before opening.*/ nvchan->channel = new_sc; - ret = vmbus_open(new_sc, nvscdev->ring_size * PAGE_SIZE, - nvscdev->ring_size * PAGE_SIZE, NULL, 0, + ret = vmbus_open(new_sc, netvsc_ring_bytes, + netvsc_ring_bytes, NULL, 0, netvsc_channel_cb, nvchan); if (ret == 0) napi_enable(&nvchan->napi); diff --git a/drivers/net/ieee802154/adf7242.c b/drivers/net/ieee802154/adf7242.c index 400fdbd3a120..64f1b1e77bc0 100644 --- a/drivers/net/ieee802154/adf7242.c +++ b/drivers/net/ieee802154/adf7242.c @@ -1,7 +1,7 @@ /* * Analog Devices ADF7242 Low-Power IEEE 802.15.4 Transceiver * - * Copyright 2009-2015 Analog Devices Inc. + * Copyright 2009-2017 Analog Devices Inc. * * Licensed under the GPL-2 or later. * @@ -344,12 +344,18 @@ static int adf7242_wait_status(struct adf7242_local *lp, unsigned int status, return ret; } -static int adf7242_wait_ready(struct adf7242_local *lp, int line) +static int adf7242_wait_rc_ready(struct adf7242_local *lp, int line) { return adf7242_wait_status(lp, STAT_RC_READY | STAT_SPI_READY, STAT_RC_READY | STAT_SPI_READY, line); } +static int adf7242_wait_spi_ready(struct adf7242_local *lp, int line) +{ + return adf7242_wait_status(lp, STAT_SPI_READY, + STAT_SPI_READY, line); +} + static int adf7242_write_fbuf(struct adf7242_local *lp, u8 *data, u8 len) { u8 *buf = lp->buf; @@ -369,7 +375,7 @@ static int adf7242_write_fbuf(struct adf7242_local *lp, u8 *data, u8 len) spi_message_add_tail(&xfer_head, &msg); spi_message_add_tail(&xfer_buf, &msg); - adf7242_wait_ready(lp, __LINE__); + adf7242_wait_spi_ready(lp, __LINE__); mutex_lock(&lp->bmux); buf[0] = CMD_SPI_PKT_WR; @@ -401,7 +407,7 @@ static int adf7242_read_fbuf(struct adf7242_local *lp, spi_message_add_tail(&xfer_head, &msg); spi_message_add_tail(&xfer_buf, &msg); - adf7242_wait_ready(lp, __LINE__); + adf7242_wait_spi_ready(lp, __LINE__); mutex_lock(&lp->bmux); if (packet_read) { @@ -432,7 +438,7 @@ static int adf7242_read_reg(struct adf7242_local *lp, u16 addr, u8 *data) .rx_buf = lp->buf_read_rx, }; - adf7242_wait_ready(lp, __LINE__); + adf7242_wait_spi_ready(lp, __LINE__); mutex_lock(&lp->bmux); lp->buf_read_tx[0] = CMD_SPI_MEM_RD(addr); @@ -462,7 +468,7 @@ static int adf7242_write_reg(struct adf7242_local *lp, u16 addr, u8 data) { int status; - adf7242_wait_ready(lp, __LINE__); + adf7242_wait_spi_ready(lp, __LINE__); mutex_lock(&lp->bmux); lp->buf_reg_tx[0] = CMD_SPI_MEM_WR(addr); @@ -484,7 +490,7 @@ static int adf7242_cmd(struct adf7242_local *lp, unsigned int cmd) dev_vdbg(&lp->spi->dev, "%s : CMD=0x%X\n", __func__, cmd); if (cmd != CMD_RC_PC_RESET_NO_WAIT) - adf7242_wait_ready(lp, __LINE__); + adf7242_wait_rc_ready(lp, __LINE__); mutex_lock(&lp->bmux); lp->buf_cmd = cmd; @@ -557,6 +563,22 @@ static int adf7242_verify_firmware(struct adf7242_local *lp, return 0; } +static void adf7242_clear_irqstat(struct adf7242_local *lp) +{ + adf7242_write_reg(lp, REG_IRQ1_SRC1, IRQ_CCA_COMPLETE | IRQ_SFD_RX | + IRQ_SFD_TX | IRQ_RX_PKT_RCVD | IRQ_TX_PKT_SENT | + IRQ_FRAME_VALID | IRQ_ADDRESS_VALID | IRQ_CSMA_CA); +} + +static int adf7242_cmd_rx(struct adf7242_local *lp) +{ + /* Wait until the ACK is sent */ + adf7242_wait_status(lp, RC_STATUS_PHY_RDY, RC_STATUS_MASK, __LINE__); + adf7242_clear_irqstat(lp); + + return adf7242_cmd(lp, CMD_RC_RX); +} + static int adf7242_set_txpower(struct ieee802154_hw *hw, int mbm) { struct adf7242_local *lp = hw->priv; @@ -660,7 +682,7 @@ static int adf7242_start(struct ieee802154_hw *hw) struct adf7242_local *lp = hw->priv; adf7242_cmd(lp, CMD_RC_PHY_RDY); - adf7242_write_reg(lp, REG_IRQ1_SRC1, 0xFF); + adf7242_clear_irqstat(lp); enable_irq(lp->spi->irq); set_bit(FLAG_START, &lp->flags); @@ -671,10 +693,10 @@ static void adf7242_stop(struct ieee802154_hw *hw) { struct adf7242_local *lp = hw->priv; + disable_irq(lp->spi->irq); adf7242_cmd(lp, CMD_RC_IDLE); clear_bit(FLAG_START, &lp->flags); - disable_irq(lp->spi->irq); - adf7242_write_reg(lp, REG_IRQ1_SRC1, 0xFF); + adf7242_clear_irqstat(lp); } static int adf7242_channel(struct ieee802154_hw *hw, u8 page, u8 channel) @@ -789,9 +811,12 @@ static int adf7242_xmit(struct ieee802154_hw *hw, struct sk_buff *skb) struct adf7242_local *lp = hw->priv; int ret; + /* ensure existing instances of the IRQ handler have completed */ + disable_irq(lp->spi->irq); set_bit(FLAG_XMIT, &lp->flags); reinit_completion(&lp->tx_complete); adf7242_cmd(lp, CMD_RC_PHY_RDY); + adf7242_clear_irqstat(lp); ret = adf7242_write_fbuf(lp, skb->data, skb->len); if (ret) @@ -800,6 +825,7 @@ static int adf7242_xmit(struct ieee802154_hw *hw, struct sk_buff *skb) ret = adf7242_cmd(lp, CMD_RC_CSMACA); if (ret) goto err; + enable_irq(lp->spi->irq); ret = wait_for_completion_interruptible_timeout(&lp->tx_complete, HZ / 10); @@ -822,7 +848,7 @@ static int adf7242_xmit(struct ieee802154_hw *hw, struct sk_buff *skb) err: clear_bit(FLAG_XMIT, &lp->flags); - adf7242_cmd(lp, CMD_RC_RX); + adf7242_cmd_rx(lp); return ret; } @@ -846,7 +872,7 @@ static int adf7242_rx(struct adf7242_local *lp) skb = dev_alloc_skb(len); if (!skb) { - adf7242_cmd(lp, CMD_RC_RX); + adf7242_cmd_rx(lp); return -ENOMEM; } @@ -854,14 +880,14 @@ static int adf7242_rx(struct adf7242_local *lp) ret = adf7242_read_fbuf(lp, data, len, true); if (ret < 0) { kfree_skb(skb); - adf7242_cmd(lp, CMD_RC_RX); + adf7242_cmd_rx(lp); return ret; } lqi = data[len - 2]; lp->rssi = data[len - 1]; - adf7242_cmd(lp, CMD_RC_RX); + ret = adf7242_cmd_rx(lp); skb_trim(skb, len - 2); /* Don't put RSSI/LQI or CRC into the frame */ @@ -870,7 +896,7 @@ static int adf7242_rx(struct adf7242_local *lp) dev_dbg(&lp->spi->dev, "%s: ret=%d len=%d lqi=%d rssi=%d\n", __func__, ret, (int)len, (int)lqi, lp->rssi); - return 0; + return ret; } static const struct ieee802154_ops adf7242_ops = { @@ -888,7 +914,7 @@ static const struct ieee802154_ops adf7242_ops = { .set_cca_ed_level = adf7242_set_cca_ed_level, }; -static void adf7242_debug(u8 irq1) +static void adf7242_debug(struct adf7242_local *lp, u8 irq1) { #ifdef DEBUG u8 stat; @@ -906,9 +932,12 @@ static void adf7242_debug(u8 irq1) irq1 & IRQ_FRAME_VALID ? "IRQ_FRAME_VALID\n" : "", irq1 & IRQ_ADDRESS_VALID ? "IRQ_ADDRESS_VALID\n" : ""); - dev_dbg(&lp->spi->dev, "%s STATUS = %X:\n%s\n%s%s%s%s%s\n", + dev_dbg(&lp->spi->dev, "%s STATUS = %X:\n%s\n%s\n%s\n%s\n%s%s%s%s%s\n", __func__, stat, + stat & STAT_SPI_READY ? "SPI_READY" : "SPI_BUSY", + stat & STAT_IRQ_STATUS ? "IRQ_PENDING" : "IRQ_CLEAR", stat & STAT_RC_READY ? "RC_READY" : "RC_BUSY", + stat & STAT_CCA_RESULT ? "CHAN_IDLE" : "CHAN_BUSY", (stat & 0xf) == RC_STATUS_IDLE ? "RC_STATUS_IDLE" : "", (stat & 0xf) == RC_STATUS_MEAS ? "RC_STATUS_MEAS" : "", (stat & 0xf) == RC_STATUS_PHY_RDY ? "RC_STATUS_PHY_RDY" : "", @@ -923,20 +952,20 @@ static irqreturn_t adf7242_isr(int irq, void *data) unsigned int xmit; u8 irq1; - adf7242_wait_status(lp, RC_STATUS_PHY_RDY, RC_STATUS_MASK, __LINE__); - adf7242_read_reg(lp, REG_IRQ1_SRC1, &irq1); - adf7242_write_reg(lp, REG_IRQ1_SRC1, irq1); if (!(irq1 & (IRQ_RX_PKT_RCVD | IRQ_CSMA_CA))) dev_err(&lp->spi->dev, "%s :ERROR IRQ1 = 0x%X\n", __func__, irq1); - adf7242_debug(irq1); + adf7242_debug(lp, irq1); xmit = test_bit(FLAG_XMIT, &lp->flags); if (xmit && (irq1 & IRQ_CSMA_CA)) { + adf7242_wait_status(lp, RC_STATUS_PHY_RDY, + RC_STATUS_MASK, __LINE__); + if (ADF7242_REPORT_CSMA_CA_STAT) { u8 astat; @@ -957,6 +986,7 @@ static irqreturn_t adf7242_isr(int irq, void *data) lp->tx_stat = SUCCESS; } complete(&lp->tx_complete); + adf7242_clear_irqstat(lp); } else if (!xmit && (irq1 & IRQ_RX_PKT_RCVD) && (irq1 & IRQ_FRAME_VALID)) { adf7242_rx(lp); @@ -965,16 +995,19 @@ static irqreturn_t adf7242_isr(int irq, void *data) dev_dbg(&lp->spi->dev, "%s:%d : ERROR IRQ1 = 0x%X\n", __func__, __LINE__, irq1); adf7242_cmd(lp, CMD_RC_PHY_RDY); - adf7242_write_reg(lp, REG_IRQ1_SRC1, 0xFF); - adf7242_cmd(lp, CMD_RC_RX); + adf7242_cmd_rx(lp); } else { /* This can only be xmit without IRQ, likely a RX packet. * we get an TX IRQ shortly - do nothing or let the xmit * timeout handle this */ + dev_dbg(&lp->spi->dev, "%s:%d : ERROR IRQ1 = 0x%X, xmit %d\n", __func__, __LINE__, irq1, xmit); + adf7242_wait_status(lp, RC_STATUS_PHY_RDY, + RC_STATUS_MASK, __LINE__); complete(&lp->tx_complete); + adf7242_clear_irqstat(lp); } return IRQ_HANDLED; @@ -994,7 +1027,7 @@ static int adf7242_soft_reset(struct adf7242_local *lp, int line) adf7242_set_promiscuous_mode(lp->hw, lp->promiscuous); adf7242_set_csma_params(lp->hw, lp->min_be, lp->max_be, lp->max_cca_retries); - adf7242_write_reg(lp, REG_IRQ1_SRC1, 0xFF); + adf7242_clear_irqstat(lp); if (test_bit(FLAG_START, &lp->flags)) { enable_irq(lp->spi->irq); @@ -1060,7 +1093,7 @@ static int adf7242_hw_init(struct adf7242_local *lp) adf7242_write_reg(lp, REG_IRQ1_EN0, 0); adf7242_write_reg(lp, REG_IRQ1_EN1, IRQ_RX_PKT_RCVD | IRQ_CSMA_CA); - adf7242_write_reg(lp, REG_IRQ1_SRC1, 0xFF); + adf7242_clear_irqstat(lp); adf7242_write_reg(lp, REG_IRQ1_SRC0, 0xFF); adf7242_cmd(lp, CMD_RC_IDLE); @@ -1086,8 +1119,11 @@ static int adf7242_stats_show(struct seq_file *file, void *offset) irq1 & IRQ_FRAME_VALID ? "IRQ_FRAME_VALID\n" : "", irq1 & IRQ_ADDRESS_VALID ? "IRQ_ADDRESS_VALID\n" : ""); - seq_printf(file, "STATUS = %X:\n%s\n%s%s%s%s%s\n", stat, + seq_printf(file, "STATUS = %X:\n%s\n%s\n%s\n%s\n%s%s%s%s%s\n", stat, + stat & STAT_SPI_READY ? "SPI_READY" : "SPI_BUSY", + stat & STAT_IRQ_STATUS ? "IRQ_PENDING" : "IRQ_CLEAR", stat & STAT_RC_READY ? "RC_READY" : "RC_BUSY", + stat & STAT_CCA_RESULT ? "CHAN_IDLE" : "CHAN_BUSY", (stat & 0xf) == RC_STATUS_IDLE ? "RC_STATUS_IDLE" : "", (stat & 0xf) == RC_STATUS_MEAS ? "RC_STATUS_MEAS" : "", (stat & 0xf) == RC_STATUS_PHY_RDY ? "RC_STATUS_PHY_RDY" : "", @@ -1257,12 +1293,14 @@ static int adf7242_remove(struct spi_device *spi) static const struct of_device_id adf7242_of_match[] = { { .compatible = "adi,adf7242", }, + { .compatible = "adi,adf7241", }, { }, }; MODULE_DEVICE_TABLE(of, adf7242_of_match); static const struct spi_device_id adf7242_device_id[] = { { .name = "adf7242", }, + { .name = "adf7241", }, { }, }; MODULE_DEVICE_TABLE(spi, adf7242_device_id); diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c index 77cc4fbaeace..9774c96ac7bb 100644 --- a/drivers/net/ipvlan/ipvlan_core.c +++ b/drivers/net/ipvlan/ipvlan_core.c @@ -664,8 +664,6 @@ static rx_handler_result_t ipvlan_handle_mode_l2(struct sk_buff **pskb, struct sk_buff *skb = *pskb; struct ethhdr *eth = eth_hdr(skb); rx_handler_result_t ret = RX_HANDLER_PASS; - void *lyr3h; - int addr_type; if (is_multicast_ether_addr(eth->h_dest)) { if (ipvlan_external_frame(skb, port)) { @@ -683,15 +681,8 @@ static rx_handler_result_t ipvlan_handle_mode_l2(struct sk_buff **pskb, } } } else { - struct ipvl_addr *addr; - - lyr3h = ipvlan_get_L3_hdr(port, skb, &addr_type); - if (!lyr3h) - return ret; - - addr = ipvlan_addr_lookup(port, lyr3h, addr_type, true); - if (addr) - ret = ipvlan_rcv_frame(addr, pskb, false); + /* Perform like l3 mode for non-multicast packet */ + ret = ipvlan_handle_mode_l3(pskb, port); } return ret; diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c index 30cb803e2fe5..2469df118fbf 100644 --- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -850,6 +850,19 @@ static void ipvlan_del_addr6(struct ipvl_dev *ipvlan, struct in6_addr *ip6_addr) return ipvlan_del_addr(ipvlan, ip6_addr, true); } +static bool ipvlan_is_valid_dev(const struct net_device *dev) +{ + struct ipvl_dev *ipvlan = netdev_priv(dev); + + if (!netif_is_ipvlan(dev)) + return false; + + if (!ipvlan || !ipvlan->port) + return false; + + return true; +} + static int ipvlan_addr6_event(struct notifier_block *unused, unsigned long event, void *ptr) { @@ -857,10 +870,7 @@ static int ipvlan_addr6_event(struct notifier_block *unused, struct net_device *dev = (struct net_device *)if6->idev->dev; struct ipvl_dev *ipvlan = netdev_priv(dev); - if (!netif_is_ipvlan(dev)) - return NOTIFY_DONE; - - if (!ipvlan || !ipvlan->port) + if (!ipvlan_is_valid_dev(dev)) return NOTIFY_DONE; switch (event) { @@ -888,10 +898,7 @@ static int ipvlan_addr6_validator_event(struct notifier_block *unused, if (in_softirq()) return NOTIFY_DONE; - if (!netif_is_ipvlan(dev)) - return NOTIFY_DONE; - - if (!ipvlan || !ipvlan->port) + if (!ipvlan_is_valid_dev(dev)) return NOTIFY_DONE; switch (event) { @@ -932,10 +939,7 @@ static int ipvlan_addr4_event(struct notifier_block *unused, struct ipvl_dev *ipvlan = netdev_priv(dev); struct in_addr ip4_addr; - if (!netif_is_ipvlan(dev)) - return NOTIFY_DONE; - - if (!ipvlan || !ipvlan->port) + if (!ipvlan_is_valid_dev(dev)) return NOTIFY_DONE; switch (event) { @@ -961,10 +965,7 @@ static int ipvlan_addr4_validator_event(struct notifier_block *unused, struct net_device *dev = (struct net_device *)ivi->ivi_dev->dev; struct ipvl_dev *ipvlan = netdev_priv(dev); - if (!netif_is_ipvlan(dev)) - return NOTIFY_DONE; - - if (!ipvlan || !ipvlan->port) + if (!ipvlan_is_valid_dev(dev)) return NOTIFY_DONE; switch (event) { diff --git a/drivers/net/netdevsim/Makefile b/drivers/net/netdevsim/Makefile new file mode 100644 index 000000000000..074ddebbc41d --- /dev/null +++ b/drivers/net/netdevsim/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 + +obj-$(CONFIG_NETDEVSIM) += netdevsim.o + +netdevsim-objs := \ + netdev.o \ + bpf.o \ diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c new file mode 100644 index 000000000000..078d2c37a6c1 --- /dev/null +++ b/drivers/net/netdevsim/bpf.c @@ -0,0 +1,373 @@ +/* + * Copyright (C) 2017 Netronome Systems, Inc. + * + * This software is licensed under the GNU General License Version 2, + * June 1991 as shown in the file COPYING in the top-level directory of this + * source tree. + * + * THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" + * WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, + * BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS + * FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE + * OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME + * THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + */ + +#include <linux/bpf.h> +#include <linux/bpf_verifier.h> +#include <linux/debugfs.h> +#include <linux/kernel.h> +#include <linux/rtnetlink.h> +#include <net/pkt_cls.h> + +#include "netdevsim.h" + +struct nsim_bpf_bound_prog { + struct netdevsim *ns; + struct bpf_prog *prog; + struct dentry *ddir; + const char *state; + bool is_loaded; + struct list_head l; +}; + +static int nsim_debugfs_bpf_string_read(struct seq_file *file, void *data) +{ + const char **str = file->private; + + if (*str) + seq_printf(file, "%s\n", *str); + + return 0; +} + +static int nsim_debugfs_bpf_string_open(struct inode *inode, struct file *f) +{ + return single_open(f, nsim_debugfs_bpf_string_read, inode->i_private); +} + +static const struct file_operations nsim_bpf_string_fops = { + .owner = THIS_MODULE, + .open = nsim_debugfs_bpf_string_open, + .release = single_release, + .read = seq_read, + .llseek = seq_lseek +}; + +static int +nsim_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn) +{ + struct nsim_bpf_bound_prog *state; + + state = env->prog->aux->offload->dev_priv; + if (state->ns->bpf_bind_verifier_delay && !insn_idx) + msleep(state->ns->bpf_bind_verifier_delay); + + return 0; +} + +static const struct bpf_ext_analyzer_ops nsim_bpf_analyzer_ops = { + .insn_hook = nsim_bpf_verify_insn, +}; + +static bool nsim_xdp_offload_active(struct netdevsim *ns) +{ + return ns->xdp_prog_mode == XDP_ATTACHED_HW; +} + +static void nsim_prog_set_loaded(struct bpf_prog *prog, bool loaded) +{ + struct nsim_bpf_bound_prog *state; + + if (!prog || !prog->aux->offload) + return; + + state = prog->aux->offload->dev_priv; + state->is_loaded = loaded; +} + +static int +nsim_bpf_offload(struct netdevsim *ns, struct bpf_prog *prog, bool oldprog) +{ + nsim_prog_set_loaded(ns->bpf_offloaded, false); + + WARN(!!ns->bpf_offloaded != oldprog, + "bad offload state, expected offload %sto be active", + oldprog ? "" : "not "); + ns->bpf_offloaded = prog; + ns->bpf_offloaded_id = prog ? prog->aux->id : 0; + nsim_prog_set_loaded(prog, true); + + return 0; +} + +int nsim_bpf_setup_tc_block_cb(enum tc_setup_type type, + void *type_data, void *cb_priv) +{ + struct tc_cls_bpf_offload *cls_bpf = type_data; + struct bpf_prog *prog = cls_bpf->prog; + struct netdevsim *ns = cb_priv; + bool skip_sw; + + if (type != TC_SETUP_CLSBPF || + !tc_can_offload(ns->netdev) || + cls_bpf->common.protocol != htons(ETH_P_ALL) || + cls_bpf->common.chain_index) + return -EOPNOTSUPP; + + skip_sw = cls_bpf->gen_flags & TCA_CLS_FLAGS_SKIP_SW; + + if (nsim_xdp_offload_active(ns)) + return -EBUSY; + + if (!ns->bpf_tc_accept) + return -EOPNOTSUPP; + /* Note: progs without skip_sw will probably not be dev bound */ + if (prog && !prog->aux->offload && !ns->bpf_tc_non_bound_accept) + return -EOPNOTSUPP; + + switch (cls_bpf->command) { + case TC_CLSBPF_REPLACE: + return nsim_bpf_offload(ns, prog, true); + case TC_CLSBPF_ADD: + return nsim_bpf_offload(ns, prog, false); + case TC_CLSBPF_DESTROY: + return nsim_bpf_offload(ns, NULL, true); + default: + return -EOPNOTSUPP; + } +} + +int nsim_bpf_disable_tc(struct netdevsim *ns) +{ + if (ns->bpf_offloaded && !nsim_xdp_offload_active(ns)) + return -EBUSY; + return 0; +} + +static int nsim_xdp_offload_prog(struct netdevsim *ns, struct netdev_bpf *bpf) +{ + if (!nsim_xdp_offload_active(ns) && !bpf->prog) + return 0; + if (!nsim_xdp_offload_active(ns) && bpf->prog && ns->bpf_offloaded) { + NSIM_EA(bpf->extack, "TC program is already loaded"); + return -EBUSY; + } + + return nsim_bpf_offload(ns, bpf->prog, nsim_xdp_offload_active(ns)); +} + +static int nsim_xdp_set_prog(struct netdevsim *ns, struct netdev_bpf *bpf) +{ + int err; + + if (ns->xdp_prog && (bpf->flags ^ ns->xdp_flags) & XDP_FLAGS_MODES) { + NSIM_EA(bpf->extack, "program loaded with different flags"); + return -EBUSY; + } + + if (bpf->command == XDP_SETUP_PROG && !ns->bpf_xdpdrv_accept) { + NSIM_EA(bpf->extack, "driver XDP disabled in DebugFS"); + return -EOPNOTSUPP; + } + if (bpf->command == XDP_SETUP_PROG_HW && !ns->bpf_xdpoffload_accept) { + NSIM_EA(bpf->extack, "XDP offload disabled in DebugFS"); + return -EOPNOTSUPP; + } + + if (bpf->command == XDP_SETUP_PROG_HW) { + err = nsim_xdp_offload_prog(ns, bpf); + if (err) + return err; + } + + if (ns->xdp_prog) + bpf_prog_put(ns->xdp_prog); + + ns->xdp_prog = bpf->prog; + ns->xdp_flags = bpf->flags; + + if (!bpf->prog) + ns->xdp_prog_mode = XDP_ATTACHED_NONE; + else if (bpf->command == XDP_SETUP_PROG) + ns->xdp_prog_mode = XDP_ATTACHED_DRV; + else + ns->xdp_prog_mode = XDP_ATTACHED_HW; + + return 0; +} + +static int nsim_bpf_create_prog(struct netdevsim *ns, struct bpf_prog *prog) +{ + struct nsim_bpf_bound_prog *state; + char name[16]; + int err; + + state = kzalloc(sizeof(*state), GFP_KERNEL); + if (!state) + return -ENOMEM; + + state->ns = ns; + state->prog = prog; + state->state = "verify"; + + /* Program id is not populated yet when we create the state. */ + sprintf(name, "%u", ns->prog_id_gen++); + state->ddir = debugfs_create_dir(name, ns->ddir_bpf_bound_progs); + if (IS_ERR(state->ddir)) { + err = PTR_ERR(state->ddir); + kfree(state); + return err; + } + + debugfs_create_u32("id", 0400, state->ddir, &prog->aux->id); + debugfs_create_file("state", 0400, state->ddir, + &state->state, &nsim_bpf_string_fops); + debugfs_create_bool("loaded", 0400, state->ddir, &state->is_loaded); + + list_add_tail(&state->l, &ns->bpf_bound_progs); + + prog->aux->offload->dev_priv = state; + + return 0; +} + +static void nsim_bpf_destroy_prog(struct bpf_prog *prog) +{ + struct nsim_bpf_bound_prog *state; + + state = prog->aux->offload->dev_priv; + WARN(state->is_loaded, + "offload state destroyed while program still bound"); + debugfs_remove_recursive(state->ddir); + list_del(&state->l); + kfree(state); +} + +static int nsim_setup_prog_checks(struct netdevsim *ns, struct netdev_bpf *bpf) +{ + if (bpf->prog && bpf->prog->aux->offload) { + NSIM_EA(bpf->extack, "attempt to load offloaded prog to drv"); + return -EINVAL; + } + if (ns->netdev->mtu > NSIM_XDP_MAX_MTU) { + NSIM_EA(bpf->extack, "MTU too large w/ XDP enabled"); + return -EINVAL; + } + if (nsim_xdp_offload_active(ns)) { + NSIM_EA(bpf->extack, "xdp offload active, can't load drv prog"); + return -EBUSY; + } + return 0; +} + +static int +nsim_setup_prog_hw_checks(struct netdevsim *ns, struct netdev_bpf *bpf) +{ + struct nsim_bpf_bound_prog *state; + + if (!bpf->prog) + return 0; + + if (!bpf->prog->aux->offload) { + NSIM_EA(bpf->extack, "xdpoffload of non-bound program"); + return -EINVAL; + } + if (bpf->prog->aux->offload->netdev != ns->netdev) { + NSIM_EA(bpf->extack, "program bound to different dev"); + return -EINVAL; + } + + state = bpf->prog->aux->offload->dev_priv; + if (WARN_ON(strcmp(state->state, "xlated"))) { + NSIM_EA(bpf->extack, "offloading program in bad state"); + return -EINVAL; + } + return 0; +} + +int nsim_bpf(struct net_device *dev, struct netdev_bpf *bpf) +{ + struct netdevsim *ns = netdev_priv(dev); + struct nsim_bpf_bound_prog *state; + int err; + + ASSERT_RTNL(); + + switch (bpf->command) { + case BPF_OFFLOAD_VERIFIER_PREP: + if (!ns->bpf_bind_accept) + return -EOPNOTSUPP; + + err = nsim_bpf_create_prog(ns, bpf->verifier.prog); + if (err) + return err; + + bpf->verifier.ops = &nsim_bpf_analyzer_ops; + return 0; + case BPF_OFFLOAD_TRANSLATE: + state = bpf->offload.prog->aux->offload->dev_priv; + + state->state = "xlated"; + return 0; + case BPF_OFFLOAD_DESTROY: + nsim_bpf_destroy_prog(bpf->offload.prog); + return 0; + case XDP_QUERY_PROG: + bpf->prog_attached = ns->xdp_prog_mode; + bpf->prog_id = ns->xdp_prog ? ns->xdp_prog->aux->id : 0; + bpf->prog_flags = ns->xdp_prog ? ns->xdp_flags : 0; + return 0; + case XDP_SETUP_PROG: + err = nsim_setup_prog_checks(ns, bpf); + if (err) + return err; + + return nsim_xdp_set_prog(ns, bpf); + case XDP_SETUP_PROG_HW: + err = nsim_setup_prog_hw_checks(ns, bpf); + if (err) + return err; + + return nsim_xdp_set_prog(ns, bpf); + default: + return -EINVAL; + } +} + +int nsim_bpf_init(struct netdevsim *ns) +{ + INIT_LIST_HEAD(&ns->bpf_bound_progs); + + debugfs_create_u32("bpf_offloaded_id", 0400, ns->ddir, + &ns->bpf_offloaded_id); + + ns->bpf_bind_accept = true; + debugfs_create_bool("bpf_bind_accept", 0600, ns->ddir, + &ns->bpf_bind_accept); + debugfs_create_u32("bpf_bind_verifier_delay", 0600, ns->ddir, + &ns->bpf_bind_verifier_delay); + ns->ddir_bpf_bound_progs = + debugfs_create_dir("bpf_bound_progs", ns->ddir); + + ns->bpf_tc_accept = true; + debugfs_create_bool("bpf_tc_accept", 0600, ns->ddir, + &ns->bpf_tc_accept); + debugfs_create_bool("bpf_tc_non_bound_accept", 0600, ns->ddir, + &ns->bpf_tc_non_bound_accept); + ns->bpf_xdpdrv_accept = true; + debugfs_create_bool("bpf_xdpdrv_accept", 0600, ns->ddir, + &ns->bpf_xdpdrv_accept); + ns->bpf_xdpoffload_accept = true; + debugfs_create_bool("bpf_xdpoffload_accept", 0600, ns->ddir, + &ns->bpf_xdpoffload_accept); + + return 0; +} + +void nsim_bpf_uninit(struct netdevsim *ns) +{ + WARN_ON(!list_empty(&ns->bpf_bound_progs)); + WARN_ON(ns->xdp_prog); + WARN_ON(ns->bpf_offloaded); +} diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c new file mode 100644 index 000000000000..eb8c679fca9f --- /dev/null +++ b/drivers/net/netdevsim/netdev.c @@ -0,0 +1,502 @@ +/* + * Copyright (C) 2017 Netronome Systems, Inc. + * + * This software is licensed under the GNU General License Version 2, + * June 1991 as shown in the file COPYING in the top-level directory of this + * source tree. + * + * THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" + * WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, + * BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS + * FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE + * OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME + * THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + */ + +#include <linux/debugfs.h> +#include <linux/etherdevice.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/netdevice.h> +#include <linux/slab.h> +#include <net/netlink.h> +#include <net/pkt_cls.h> +#include <net/rtnetlink.h> + +#include "netdevsim.h" + +struct nsim_vf_config { + int link_state; + u16 min_tx_rate; + u16 max_tx_rate; + u16 vlan; + __be16 vlan_proto; + u16 qos; + u8 vf_mac[ETH_ALEN]; + bool spoofchk_enabled; + bool trusted; + bool rss_query_enabled; +}; + +static u32 nsim_dev_id; + +static int nsim_num_vf(struct device *dev) +{ + struct netdevsim *ns = to_nsim(dev); + + return ns->num_vfs; +} + +static struct bus_type nsim_bus = { + .name = DRV_NAME, + .dev_name = DRV_NAME, + .num_vf = nsim_num_vf, +}; + +static int nsim_vfs_enable(struct netdevsim *ns, unsigned int num_vfs) +{ + ns->vfconfigs = kcalloc(num_vfs, sizeof(struct nsim_vf_config), + GFP_KERNEL); + if (!ns->vfconfigs) + return -ENOMEM; + ns->num_vfs = num_vfs; + + return 0; +} + +static void nsim_vfs_disable(struct netdevsim *ns) +{ + kfree(ns->vfconfigs); + ns->vfconfigs = NULL; + ns->num_vfs = 0; +} + +static ssize_t +nsim_numvfs_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct netdevsim *ns = to_nsim(dev); + unsigned int num_vfs; + int ret; + + ret = kstrtouint(buf, 0, &num_vfs); + if (ret) + return ret; + + rtnl_lock(); + if (ns->num_vfs == num_vfs) + goto exit_good; + if (ns->num_vfs && num_vfs) { + ret = -EBUSY; + goto exit_unlock; + } + + if (num_vfs) { + ret = nsim_vfs_enable(ns, num_vfs); + if (ret) + goto exit_unlock; + } else { + nsim_vfs_disable(ns); + } +exit_good: + ret = count; +exit_unlock: + rtnl_unlock(); + + return ret; +} + +static ssize_t +nsim_numvfs_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct netdevsim *ns = to_nsim(dev); + + return sprintf(buf, "%u\n", ns->num_vfs); +} + +static struct device_attribute nsim_numvfs_attr = + __ATTR(sriov_numvfs, 0664, nsim_numvfs_show, nsim_numvfs_store); + +static struct attribute *nsim_dev_attrs[] = { + &nsim_numvfs_attr.attr, + NULL, +}; + +static const struct attribute_group nsim_dev_attr_group = { + .attrs = nsim_dev_attrs, +}; + +static const struct attribute_group *nsim_dev_attr_groups[] = { + &nsim_dev_attr_group, + NULL, +}; + +static void nsim_dev_release(struct device *dev) +{ + struct netdevsim *ns = to_nsim(dev); + + nsim_vfs_disable(ns); + free_netdev(ns->netdev); +} + +struct device_type nsim_dev_type = { + .groups = nsim_dev_attr_groups, + .release = nsim_dev_release, +}; + +static int nsim_init(struct net_device *dev) +{ + struct netdevsim *ns = netdev_priv(dev); + int err; + + ns->netdev = dev; + ns->ddir = debugfs_create_dir(netdev_name(dev), nsim_ddir); + + err = nsim_bpf_init(ns); + if (err) + goto err_debugfs_destroy; + + ns->dev.id = nsim_dev_id++; + ns->dev.bus = &nsim_bus; + ns->dev.type = &nsim_dev_type; + err = device_register(&ns->dev); + if (err) + goto err_bpf_uninit; + + SET_NETDEV_DEV(dev, &ns->dev); + + return 0; + +err_bpf_uninit: + nsim_bpf_uninit(ns); +err_debugfs_destroy: + debugfs_remove_recursive(ns->ddir); + return err; +} + +static void nsim_uninit(struct net_device *dev) +{ + struct netdevsim *ns = netdev_priv(dev); + + debugfs_remove_recursive(ns->ddir); + nsim_bpf_uninit(ns); +} + +static void nsim_free(struct net_device *dev) +{ + struct netdevsim *ns = netdev_priv(dev); + + device_unregister(&ns->dev); + /* netdev and vf state will be freed out of device_release() */ +} + +static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev) +{ + struct netdevsim *ns = netdev_priv(dev); + + u64_stats_update_begin(&ns->syncp); + ns->tx_packets++; + ns->tx_bytes += skb->len; + u64_stats_update_end(&ns->syncp); + + dev_kfree_skb(skb); + + return NETDEV_TX_OK; +} + +static void nsim_set_rx_mode(struct net_device *dev) +{ +} + +static int nsim_change_mtu(struct net_device *dev, int new_mtu) +{ + struct netdevsim *ns = netdev_priv(dev); + + if (ns->xdp_prog_mode == XDP_ATTACHED_DRV && + new_mtu > NSIM_XDP_MAX_MTU) + return -EBUSY; + + dev->mtu = new_mtu; + + return 0; +} + +static void +nsim_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) +{ + struct netdevsim *ns = netdev_priv(dev); + unsigned int start; + + do { + start = u64_stats_fetch_begin(&ns->syncp); + stats->tx_bytes = ns->tx_bytes; + stats->tx_packets = ns->tx_packets; + } while (u64_stats_fetch_retry(&ns->syncp, start)); +} + +static int +nsim_setup_tc_block_cb(enum tc_setup_type type, void *type_data, void *cb_priv) +{ + return nsim_bpf_setup_tc_block_cb(type, type_data, cb_priv); +} + +static int +nsim_setup_tc_block(struct net_device *dev, struct tc_block_offload *f) +{ + struct netdevsim *ns = netdev_priv(dev); + + if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS) + return -EOPNOTSUPP; + + switch (f->command) { + case TC_BLOCK_BIND: + return tcf_block_cb_register(f->block, nsim_setup_tc_block_cb, + ns, ns); + case TC_BLOCK_UNBIND: + tcf_block_cb_unregister(f->block, nsim_setup_tc_block_cb, ns); + return 0; + default: + return -EOPNOTSUPP; + } +} + +static int nsim_set_vf_mac(struct net_device *dev, int vf, u8 *mac) +{ + struct netdevsim *ns = netdev_priv(dev); + + /* Only refuse multicast addresses, zero address can mean unset/any. */ + if (vf >= ns->num_vfs || is_multicast_ether_addr(mac)) + return -EINVAL; + memcpy(ns->vfconfigs[vf].vf_mac, mac, ETH_ALEN); + + return 0; +} + +static int nsim_set_vf_vlan(struct net_device *dev, int vf, + u16 vlan, u8 qos, __be16 vlan_proto) +{ + struct netdevsim *ns = netdev_priv(dev); + + if (vf >= ns->num_vfs || vlan > 4095 || qos > 7) + return -EINVAL; + + ns->vfconfigs[vf].vlan = vlan; + ns->vfconfigs[vf].qos = qos; + ns->vfconfigs[vf].vlan_proto = vlan_proto; + + return 0; +} + +static int nsim_set_vf_rate(struct net_device *dev, int vf, int min, int max) +{ + struct netdevsim *ns = netdev_priv(dev); + + if (vf >= ns->num_vfs) + return -EINVAL; + + ns->vfconfigs[vf].min_tx_rate = min; + ns->vfconfigs[vf].max_tx_rate = max; + + return 0; +} + +static int nsim_set_vf_spoofchk(struct net_device *dev, int vf, bool val) +{ + struct netdevsim *ns = netdev_priv(dev); + + if (vf >= ns->num_vfs) + return -EINVAL; + ns->vfconfigs[vf].spoofchk_enabled = val; + + return 0; +} + +static int nsim_set_vf_rss_query_en(struct net_device *dev, int vf, bool val) +{ + struct netdevsim *ns = netdev_priv(dev); + + if (vf >= ns->num_vfs) + return -EINVAL; + ns->vfconfigs[vf].rss_query_enabled = val; + + return 0; +} + +static int nsim_set_vf_trust(struct net_device *dev, int vf, bool val) +{ + struct netdevsim *ns = netdev_priv(dev); + + if (vf >= ns->num_vfs) + return -EINVAL; + ns->vfconfigs[vf].trusted = val; + + return 0; +} + +static int +nsim_get_vf_config(struct net_device *dev, int vf, struct ifla_vf_info *ivi) +{ + struct netdevsim *ns = netdev_priv(dev); + + if (vf >= ns->num_vfs) + return -EINVAL; + + ivi->vf = vf; + ivi->linkstate = ns->vfconfigs[vf].link_state; + ivi->min_tx_rate = ns->vfconfigs[vf].min_tx_rate; + ivi->max_tx_rate = ns->vfconfigs[vf].max_tx_rate; + ivi->vlan = ns->vfconfigs[vf].vlan; + ivi->vlan_proto = ns->vfconfigs[vf].vlan_proto; + ivi->qos = ns->vfconfigs[vf].qos; + memcpy(&ivi->mac, ns->vfconfigs[vf].vf_mac, ETH_ALEN); + ivi->spoofchk = ns->vfconfigs[vf].spoofchk_enabled; + ivi->trusted = ns->vfconfigs[vf].trusted; + ivi->rss_query_en = ns->vfconfigs[vf].rss_query_enabled; + + return 0; +} + +static int nsim_set_vf_link_state(struct net_device *dev, int vf, int state) +{ + struct netdevsim *ns = netdev_priv(dev); + + if (vf >= ns->num_vfs) + return -EINVAL; + + switch (state) { + case IFLA_VF_LINK_STATE_AUTO: + case IFLA_VF_LINK_STATE_ENABLE: + case IFLA_VF_LINK_STATE_DISABLE: + break; + default: + return -EINVAL; + } + + ns->vfconfigs[vf].link_state = state; + + return 0; +} + +static int +nsim_setup_tc(struct net_device *dev, enum tc_setup_type type, void *type_data) +{ + switch (type) { + case TC_SETUP_BLOCK: + return nsim_setup_tc_block(dev, type_data); + default: + return -EOPNOTSUPP; + } +} + +static int +nsim_set_features(struct net_device *dev, netdev_features_t features) +{ + struct netdevsim *ns = netdev_priv(dev); + + if ((dev->features & NETIF_F_HW_TC) > (features & NETIF_F_HW_TC)) + return nsim_bpf_disable_tc(ns); + + return 0; +} + +static const struct net_device_ops nsim_netdev_ops = { + .ndo_init = nsim_init, + .ndo_uninit = nsim_uninit, + .ndo_start_xmit = nsim_start_xmit, + .ndo_set_rx_mode = nsim_set_rx_mode, + .ndo_set_mac_address = eth_mac_addr, + .ndo_validate_addr = eth_validate_addr, + .ndo_change_mtu = nsim_change_mtu, + .ndo_get_stats64 = nsim_get_stats64, + .ndo_set_vf_mac = nsim_set_vf_mac, + .ndo_set_vf_vlan = nsim_set_vf_vlan, + .ndo_set_vf_rate = nsim_set_vf_rate, + .ndo_set_vf_spoofchk = nsim_set_vf_spoofchk, + .ndo_set_vf_trust = nsim_set_vf_trust, + .ndo_get_vf_config = nsim_get_vf_config, + .ndo_set_vf_link_state = nsim_set_vf_link_state, + .ndo_set_vf_rss_query_en = nsim_set_vf_rss_query_en, + .ndo_setup_tc = nsim_setup_tc, + .ndo_set_features = nsim_set_features, + .ndo_bpf = nsim_bpf, +}; + +static void nsim_setup(struct net_device *dev) +{ + ether_setup(dev); + eth_hw_addr_random(dev); + + dev->netdev_ops = &nsim_netdev_ops; + dev->priv_destructor = nsim_free; + + dev->tx_queue_len = 0; + dev->flags |= IFF_NOARP; + dev->flags &= ~IFF_MULTICAST; + dev->priv_flags |= IFF_LIVE_ADDR_CHANGE | + IFF_NO_QUEUE; + dev->features |= NETIF_F_HIGHDMA | + NETIF_F_SG | + NETIF_F_FRAGLIST | + NETIF_F_HW_CSUM | + NETIF_F_TSO; + dev->hw_features |= NETIF_F_HW_TC; + dev->max_mtu = ETH_MAX_MTU; +} + +static int nsim_validate(struct nlattr *tb[], struct nlattr *data[], + struct netlink_ext_ack *extack) +{ + if (tb[IFLA_ADDRESS]) { + if (nla_len(tb[IFLA_ADDRESS]) != ETH_ALEN) + return -EINVAL; + if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS]))) + return -EADDRNOTAVAIL; + } + return 0; +} + +static struct rtnl_link_ops nsim_link_ops __read_mostly = { + .kind = DRV_NAME, + .priv_size = sizeof(struct netdevsim), + .setup = nsim_setup, + .validate = nsim_validate, +}; + +struct dentry *nsim_ddir; + +static int __init nsim_module_init(void) +{ + int err; + + nsim_ddir = debugfs_create_dir(DRV_NAME, NULL); + if (IS_ERR(nsim_ddir)) + return PTR_ERR(nsim_ddir); + + err = bus_register(&nsim_bus); + if (err) + goto err_debugfs_destroy; + + err = rtnl_link_register(&nsim_link_ops); + if (err) + goto err_unreg_bus; + + return 0; + +err_unreg_bus: + bus_unregister(&nsim_bus); +err_debugfs_destroy: + debugfs_remove_recursive(nsim_ddir); + return err; +} + +static void __exit nsim_module_exit(void) +{ + rtnl_link_unregister(&nsim_link_ops); + bus_unregister(&nsim_bus); + debugfs_remove_recursive(nsim_ddir); +} + +module_init(nsim_module_init); +module_exit(nsim_module_exit); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_RTNL_LINK(DRV_NAME); diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h new file mode 100644 index 000000000000..32270de9395a --- /dev/null +++ b/drivers/net/netdevsim/netdevsim.h @@ -0,0 +1,78 @@ +/* + * Copyright (C) 2017 Netronome Systems, Inc. + * + * This software is licensed under the GNU General License Version 2, + * June 1991 as shown in the file COPYING in the top-level directory of this + * source tree. + * + * THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" + * WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, + * BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS + * FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE + * OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME + * THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + */ + +#include <linux/device.h> +#include <linux/kernel.h> +#include <linux/list.h> +#include <linux/netdevice.h> +#include <linux/u64_stats_sync.h> + +#define DRV_NAME "netdevsim" + +#define NSIM_XDP_MAX_MTU 4000 + +#define NSIM_EA(extack, msg) NL_SET_ERR_MSG_MOD((extack), msg) + +struct bpf_prog; +struct dentry; +struct nsim_vf_config; + +struct netdevsim { + struct net_device *netdev; + + u64 tx_packets; + u64 tx_bytes; + struct u64_stats_sync syncp; + + struct device dev; + + struct dentry *ddir; + + unsigned int num_vfs; + struct nsim_vf_config *vfconfigs; + + struct bpf_prog *bpf_offloaded; + u32 bpf_offloaded_id; + + u32 xdp_flags; + int xdp_prog_mode; + struct bpf_prog *xdp_prog; + + u32 prog_id_gen; + + bool bpf_bind_accept; + u32 bpf_bind_verifier_delay; + struct dentry *ddir_bpf_bound_progs; + struct list_head bpf_bound_progs; + + bool bpf_tc_accept; + bool bpf_tc_non_bound_accept; + bool bpf_xdpdrv_accept; + bool bpf_xdpoffload_accept; +}; + +extern struct dentry *nsim_ddir; + +int nsim_bpf_init(struct netdevsim *ns); +void nsim_bpf_uninit(struct netdevsim *ns); +int nsim_bpf(struct net_device *dev, struct netdev_bpf *bpf); +int nsim_bpf_disable_tc(struct netdevsim *ns); +int nsim_bpf_setup_tc_block_cb(enum tc_setup_type type, + void *type_data, void *cb_priv); + +static inline struct netdevsim *to_nsim(struct device *ptr) +{ + return container_of(ptr, struct netdevsim, dev); +} diff --git a/drivers/net/phy/amd.c b/drivers/net/phy/amd.c index 18141c022b13..6fe5dc9201d0 100644 --- a/drivers/net/phy/amd.c +++ b/drivers/net/phy/amd.c @@ -68,8 +68,6 @@ static struct phy_driver am79c_driver[] = { { .features = PHY_BASIC_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = am79c_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = am79c_ack_interrupt, .config_intr = am79c_config_intr, } }; diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c index 5f93e6add563..29da7a3c7a37 100644 --- a/drivers/net/phy/at803x.c +++ b/drivers/net/phy/at803x.c @@ -71,7 +71,6 @@ MODULE_LICENSE("GPL"); struct at803x_priv { bool phy_reset:1; - struct gpio_desc *gpiod_reset; }; struct at803x_context { @@ -254,22 +253,11 @@ static int at803x_probe(struct phy_device *phydev) { struct device *dev = &phydev->mdio.dev; struct at803x_priv *priv; - struct gpio_desc *gpiod_reset; priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL); if (!priv) return -ENOMEM; - if (phydev->drv->phy_id != ATH8030_PHY_ID) - goto does_not_require_reset_workaround; - - gpiod_reset = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_LOW); - if (IS_ERR(gpiod_reset)) - return PTR_ERR(gpiod_reset); - - priv->gpiod_reset = gpiod_reset; - -does_not_require_reset_workaround: phydev->priv = priv; return 0; @@ -343,14 +331,14 @@ static void at803x_link_change_notify(struct phy_device *phydev) * cannot recover from by software. */ if (phydev->state == PHY_NOLINK) { - if (priv->gpiod_reset && !priv->phy_reset) { + if (phydev->mdio.reset && !priv->phy_reset) { struct at803x_context context; at803x_context_save(phydev, &context); - gpiod_set_value(priv->gpiod_reset, 1); + phy_device_reset(phydev, 1); msleep(1); - gpiod_set_value(priv->gpiod_reset, 0); + phy_device_reset(phydev, 0); msleep(1); at803x_context_restore(phydev, &context); @@ -408,8 +396,6 @@ static struct phy_driver at803x_driver[] = { .resume = at803x_resume, .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = at803x_ack_interrupt, .config_intr = at803x_config_intr, }, { @@ -426,8 +412,6 @@ static struct phy_driver at803x_driver[] = { .resume = at803x_resume, .features = PHY_BASIC_FEATURES, .flags = PHY_HAS_INTERRUPT, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = at803x_ack_interrupt, .config_intr = at803x_config_intr, }, { @@ -443,8 +427,6 @@ static struct phy_driver at803x_driver[] = { .resume = at803x_resume, .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .aneg_done = at803x_aneg_done, .ack_interrupt = &at803x_ack_interrupt, .config_intr = &at803x_config_intr, diff --git a/drivers/net/phy/bcm-cygnus.c b/drivers/net/phy/bcm-cygnus.c index 3fe8cc5c177e..6838129839ca 100644 --- a/drivers/net/phy/bcm-cygnus.c +++ b/drivers/net/phy/bcm-cygnus.c @@ -136,8 +136,6 @@ static struct phy_driver bcm_cygnus_phy_driver[] = { .name = "Broadcom Cygnus PHY", .features = PHY_GBIT_FEATURES, .config_init = bcm_cygnus_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = bcm_phy_ack_intr, .config_intr = bcm_phy_config_intr, .suspend = genphy_suspend, diff --git a/drivers/net/phy/bcm63xx.c b/drivers/net/phy/bcm63xx.c index b0492ef2cdaa..cf14613745c9 100644 --- a/drivers/net/phy/bcm63xx.c +++ b/drivers/net/phy/bcm63xx.c @@ -69,8 +69,6 @@ static struct phy_driver bcm63xx_driver[] = { .features = (PHY_BASIC_FEATURES | SUPPORTED_Pause), .flags = PHY_HAS_INTERRUPT | PHY_IS_INTERNAL, .config_init = bcm63xx_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = bcm_phy_ack_intr, .config_intr = bcm63xx_config_intr, }, { @@ -81,8 +79,6 @@ static struct phy_driver bcm63xx_driver[] = { .features = (PHY_BASIC_FEATURES | SUPPORTED_Pause), .flags = PHY_HAS_INTERRUPT | PHY_IS_INTERNAL, .config_init = bcm63xx_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = bcm_phy_ack_intr, .config_intr = bcm63xx_config_intr, } }; diff --git a/drivers/net/phy/bcm7xxx.c b/drivers/net/phy/bcm7xxx.c index 8b33f688ac8a..421feb8f92fe 100644 --- a/drivers/net/phy/bcm7xxx.c +++ b/drivers/net/phy/bcm7xxx.c @@ -611,8 +611,6 @@ static int bcm7xxx_28nm_probe(struct phy_device *phydev) .features = PHY_GBIT_FEATURES, \ .flags = PHY_IS_INTERNAL, \ .config_init = bcm7xxx_28nm_config_init, \ - .config_aneg = genphy_config_aneg, \ - .read_status = genphy_read_status, \ .resume = bcm7xxx_28nm_resume, \ .get_tunable = bcm7xxx_28nm_get_tunable, \ .set_tunable = bcm7xxx_28nm_set_tunable, \ @@ -630,8 +628,6 @@ static int bcm7xxx_28nm_probe(struct phy_device *phydev) .features = PHY_BASIC_FEATURES, \ .flags = PHY_IS_INTERNAL, \ .config_init = bcm7xxx_28nm_ephy_config_init, \ - .config_aneg = genphy_config_aneg, \ - .read_status = genphy_read_status, \ .resume = bcm7xxx_28nm_ephy_resume, \ .get_sset_count = bcm_phy_get_sset_count, \ .get_strings = bcm_phy_get_strings, \ @@ -647,8 +643,6 @@ static int bcm7xxx_28nm_probe(struct phy_device *phydev) .features = PHY_BASIC_FEATURES, \ .flags = PHY_IS_INTERNAL, \ .config_init = bcm7xxx_config_init, \ - .config_aneg = genphy_config_aneg, \ - .read_status = genphy_read_status, \ .suspend = bcm7xxx_suspend, \ .resume = bcm7xxx_config_init, \ } diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c index d7ed69deabfb..a8f69c5777bc 100644 --- a/drivers/net/phy/broadcom.c +++ b/drivers/net/phy/broadcom.c @@ -548,8 +548,6 @@ static struct phy_driver broadcom_drivers[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = bcm54xx_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = bcm_phy_ack_intr, .config_intr = bcm_phy_config_intr, }, { @@ -559,8 +557,6 @@ static struct phy_driver broadcom_drivers[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = bcm54xx_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = bcm_phy_ack_intr, .config_intr = bcm_phy_config_intr, }, { @@ -570,8 +566,6 @@ static struct phy_driver broadcom_drivers[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = bcm54xx_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = bcm_phy_ack_intr, .config_intr = bcm_phy_config_intr, }, { @@ -581,8 +575,6 @@ static struct phy_driver broadcom_drivers[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = bcm54xx_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = bcm_phy_ack_intr, .config_intr = bcm_phy_config_intr, }, { @@ -592,8 +584,6 @@ static struct phy_driver broadcom_drivers[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = bcm54xx_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = bcm_phy_ack_intr, .config_intr = bcm_phy_config_intr, }, { @@ -603,8 +593,6 @@ static struct phy_driver broadcom_drivers[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = bcm54xx_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = bcm_phy_ack_intr, .config_intr = bcm_phy_config_intr, }, { @@ -614,8 +602,6 @@ static struct phy_driver broadcom_drivers[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = bcm54xx_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = bcm_phy_ack_intr, .config_intr = bcm_phy_config_intr, }, { @@ -626,7 +612,6 @@ static struct phy_driver broadcom_drivers[] = { .flags = PHY_HAS_INTERRUPT, .config_init = bcm54xx_config_init, .config_aneg = bcm5481_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = bcm_phy_ack_intr, .config_intr = bcm_phy_config_intr, }, { @@ -637,7 +622,6 @@ static struct phy_driver broadcom_drivers[] = { .flags = PHY_HAS_INTERRUPT, .config_init = bcm54xx_config_init, .config_aneg = bcm5481_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = bcm_phy_ack_intr, .config_intr = bcm_phy_config_intr, }, { @@ -647,7 +631,6 @@ static struct phy_driver broadcom_drivers[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = bcm5482_config_init, - .config_aneg = genphy_config_aneg, .read_status = bcm5482_read_status, .ack_interrupt = bcm_phy_ack_intr, .config_intr = bcm_phy_config_intr, @@ -658,8 +641,6 @@ static struct phy_driver broadcom_drivers[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = bcm54xx_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = bcm_phy_ack_intr, .config_intr = bcm_phy_config_intr, }, { @@ -669,8 +650,6 @@ static struct phy_driver broadcom_drivers[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = bcm54xx_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = bcm_phy_ack_intr, .config_intr = bcm_phy_config_intr, }, { @@ -680,8 +659,6 @@ static struct phy_driver broadcom_drivers[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = bcm54xx_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = bcm_phy_ack_intr, .config_intr = bcm_phy_config_intr, }, { @@ -691,8 +668,6 @@ static struct phy_driver broadcom_drivers[] = { .features = PHY_BASIC_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = brcm_fet_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = brcm_fet_ack_interrupt, .config_intr = brcm_fet_config_intr, }, { @@ -702,8 +677,6 @@ static struct phy_driver broadcom_drivers[] = { .features = PHY_BASIC_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = brcm_fet_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = brcm_fet_ack_interrupt, .config_intr = brcm_fet_config_intr, } }; diff --git a/drivers/net/phy/cicada.c b/drivers/net/phy/cicada.c index d339c1afea77..c05af00bf4b6 100644 --- a/drivers/net/phy/cicada.c +++ b/drivers/net/phy/cicada.c @@ -110,8 +110,6 @@ static struct phy_driver cis820x_driver[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = &cis820x_config_init, - .config_aneg = &genphy_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &cis820x_ack_interrupt, .config_intr = &cis820x_config_intr, }, { @@ -121,8 +119,6 @@ static struct phy_driver cis820x_driver[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = &cis820x_config_init, - .config_aneg = &genphy_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &cis820x_ack_interrupt, .config_intr = &cis820x_config_intr, } }; diff --git a/drivers/net/phy/davicom.c b/drivers/net/phy/davicom.c index e28913d9ea7e..5ee99b3b428c 100644 --- a/drivers/net/phy/davicom.c +++ b/drivers/net/phy/davicom.c @@ -153,7 +153,6 @@ static struct phy_driver dm91xx_driver[] = { .flags = PHY_HAS_INTERRUPT, .config_init = dm9161_config_init, .config_aneg = dm9161_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = dm9161_ack_interrupt, .config_intr = dm9161_config_intr, }, { @@ -164,7 +163,6 @@ static struct phy_driver dm91xx_driver[] = { .flags = PHY_HAS_INTERRUPT, .config_init = dm9161_config_init, .config_aneg = dm9161_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = dm9161_ack_interrupt, .config_intr = dm9161_config_intr, }, { @@ -175,7 +173,6 @@ static struct phy_driver dm91xx_driver[] = { .flags = PHY_HAS_INTERRUPT, .config_init = dm9161_config_init, .config_aneg = dm9161_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = dm9161_ack_interrupt, .config_intr = dm9161_config_intr, }, { @@ -184,8 +181,6 @@ static struct phy_driver dm91xx_driver[] = { .phy_id_mask = 0x0ffffff0, .features = PHY_BASIC_FEATURES, .flags = PHY_HAS_INTERRUPT, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = dm9161_ack_interrupt, .config_intr = dm9161_config_intr, } }; diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c index cbd629822f04..654f42d00092 100644 --- a/drivers/net/phy/dp83640.c +++ b/drivers/net/phy/dp83640.c @@ -1502,8 +1502,6 @@ static struct phy_driver dp83640_driver = { .probe = dp83640_probe, .remove = dp83640_remove, .config_init = dp83640_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = dp83640_ack_interrupt, .config_intr = dp83640_config_intr, .ts_info = dp83640_ts_info, diff --git a/drivers/net/phy/dp83822.c b/drivers/net/phy/dp83822.c index 14335d14e9e4..6e8a2a4f3a6e 100644 --- a/drivers/net/phy/dp83822.c +++ b/drivers/net/phy/dp83822.c @@ -325,8 +325,6 @@ static struct phy_driver dp83822_driver[] = { .set_wol = dp83822_set_wol, .ack_interrupt = dp83822_ack_interrupt, .config_intr = dp83822_config_intr, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .suspend = dp83822_suspend, .resume = dp83822_resume, }, diff --git a/drivers/net/phy/dp83848.c b/drivers/net/phy/dp83848.c index 3966d43c5146..cd09c3af2117 100644 --- a/drivers/net/phy/dp83848.c +++ b/drivers/net/phy/dp83848.c @@ -95,8 +95,6 @@ MODULE_DEVICE_TABLE(mdio, dp83848_tbl); .config_init = genphy_config_init, \ .suspend = genphy_suspend, \ .resume = genphy_resume, \ - .config_aneg = genphy_config_aneg, \ - .read_status = genphy_read_status, \ \ /* IRQ related */ \ .ack_interrupt = dp83848_ack_interrupt, \ diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c index c1ab976cc800..ab58224f897f 100644 --- a/drivers/net/phy/dp83867.c +++ b/drivers/net/phy/dp83867.c @@ -324,8 +324,6 @@ static struct phy_driver dp83867_driver[] = { .ack_interrupt = dp83867_ack_interrupt, .config_intr = dp83867_config_intr, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .suspend = genphy_suspend, .resume = genphy_resume, }, diff --git a/drivers/net/phy/icplus.c b/drivers/net/phy/icplus.c index 567280a72241..791587a49215 100644 --- a/drivers/net/phy/icplus.c +++ b/drivers/net/phy/icplus.c @@ -227,8 +227,6 @@ static struct phy_driver icplus_driver[] = { .phy_id_mask = 0x0ffffff0, .features = PHY_GBIT_FEATURES, .config_init = &ip1001_config_init, - .config_aneg = &genphy_config_aneg, - .read_status = &genphy_read_status, .suspend = genphy_suspend, .resume = genphy_resume, }, { @@ -239,8 +237,6 @@ static struct phy_driver icplus_driver[] = { .flags = PHY_HAS_INTERRUPT, .ack_interrupt = ip101a_g_ack_interrupt, .config_init = &ip101a_g_config_init, - .config_aneg = &genphy_config_aneg, - .read_status = &genphy_read_status, .suspend = genphy_suspend, .resume = genphy_resume, } }; diff --git a/drivers/net/phy/intel-xway.c b/drivers/net/phy/intel-xway.c index 55f8c52dd2f1..a11f80cb5388 100644 --- a/drivers/net/phy/intel-xway.c +++ b/drivers/net/phy/intel-xway.c @@ -243,7 +243,6 @@ static struct phy_driver xway_gphy[] = { .flags = PHY_HAS_INTERRUPT, .config_init = xway_gphy_config_init, .config_aneg = xway_gphy14_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = xway_gphy_ack_interrupt, .did_interrupt = xway_gphy_did_interrupt, .config_intr = xway_gphy_config_intr, @@ -257,7 +256,6 @@ static struct phy_driver xway_gphy[] = { .flags = PHY_HAS_INTERRUPT, .config_init = xway_gphy_config_init, .config_aneg = xway_gphy14_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = xway_gphy_ack_interrupt, .did_interrupt = xway_gphy_did_interrupt, .config_intr = xway_gphy_config_intr, @@ -271,7 +269,6 @@ static struct phy_driver xway_gphy[] = { .flags = PHY_HAS_INTERRUPT, .config_init = xway_gphy_config_init, .config_aneg = xway_gphy14_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = xway_gphy_ack_interrupt, .did_interrupt = xway_gphy_did_interrupt, .config_intr = xway_gphy_config_intr, @@ -285,7 +282,6 @@ static struct phy_driver xway_gphy[] = { .flags = PHY_HAS_INTERRUPT, .config_init = xway_gphy_config_init, .config_aneg = xway_gphy14_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = xway_gphy_ack_interrupt, .did_interrupt = xway_gphy_did_interrupt, .config_intr = xway_gphy_config_intr, @@ -298,8 +294,6 @@ static struct phy_driver xway_gphy[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = xway_gphy_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = xway_gphy_ack_interrupt, .did_interrupt = xway_gphy_did_interrupt, .config_intr = xway_gphy_config_intr, @@ -312,8 +306,6 @@ static struct phy_driver xway_gphy[] = { .features = PHY_BASIC_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = xway_gphy_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = xway_gphy_ack_interrupt, .did_interrupt = xway_gphy_did_interrupt, .config_intr = xway_gphy_config_intr, @@ -326,8 +318,6 @@ static struct phy_driver xway_gphy[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = xway_gphy_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = xway_gphy_ack_interrupt, .did_interrupt = xway_gphy_did_interrupt, .config_intr = xway_gphy_config_intr, @@ -340,8 +330,6 @@ static struct phy_driver xway_gphy[] = { .features = PHY_BASIC_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = xway_gphy_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = xway_gphy_ack_interrupt, .did_interrupt = xway_gphy_did_interrupt, .config_intr = xway_gphy_config_intr, diff --git a/drivers/net/phy/lxt.c b/drivers/net/phy/lxt.c index 09d215177fff..c14b254b2879 100644 --- a/drivers/net/phy/lxt.c +++ b/drivers/net/phy/lxt.c @@ -259,8 +259,6 @@ static struct phy_driver lxt97x_driver[] = { .features = PHY_BASIC_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = lxt970_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = lxt970_ack_interrupt, .config_intr = lxt970_config_intr, }, { @@ -269,8 +267,6 @@ static struct phy_driver lxt97x_driver[] = { .phy_id_mask = 0xfffffff0, .features = PHY_BASIC_FEATURES, .flags = PHY_HAS_INTERRUPT, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = lxt971_ack_interrupt, .config_intr = lxt971_config_intr, }, { @@ -290,7 +286,6 @@ static struct phy_driver lxt97x_driver[] = { .flags = 0, .probe = lxt973_probe, .config_aneg = lxt973_config_aneg, - .read_status = genphy_read_status, } }; module_phy_driver(lxt97x_driver); diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c index 4d02b27df044..6dbb0f4c34eb 100644 --- a/drivers/net/phy/marvell.c +++ b/drivers/net/phy/marvell.c @@ -1958,7 +1958,6 @@ static struct phy_driver marvell_drivers[] = { .probe = marvell_probe, .config_init = &marvell_config_init, .config_aneg = &m88e1101_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &marvell_ack_interrupt, .config_intr = &marvell_config_intr, .resume = &genphy_resume, @@ -1976,7 +1975,6 @@ static struct phy_driver marvell_drivers[] = { .probe = marvell_probe, .config_init = &m88e1111_config_init, .config_aneg = &marvell_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &marvell_ack_interrupt, .config_intr = &marvell_config_intr, .resume = &genphy_resume, @@ -2012,7 +2010,6 @@ static struct phy_driver marvell_drivers[] = { .probe = marvell_probe, .config_init = &m88e1118_config_init, .config_aneg = &m88e1118_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &marvell_ack_interrupt, .config_intr = &marvell_config_intr, .resume = &genphy_resume, @@ -2070,7 +2067,6 @@ static struct phy_driver marvell_drivers[] = { .probe = marvell_probe, .config_init = &m88e1145_config_init, .config_aneg = &marvell_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &marvell_ack_interrupt, .config_intr = &marvell_config_intr, .resume = &genphy_resume, @@ -2088,7 +2084,6 @@ static struct phy_driver marvell_drivers[] = { .probe = marvell_probe, .config_init = &m88e1149_config_init, .config_aneg = &m88e1118_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &marvell_ack_interrupt, .config_intr = &marvell_config_intr, .resume = &genphy_resume, @@ -2106,7 +2101,6 @@ static struct phy_driver marvell_drivers[] = { .probe = marvell_probe, .config_init = &m88e1111_config_init, .config_aneg = &marvell_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &marvell_ack_interrupt, .config_intr = &marvell_config_intr, .resume = &genphy_resume, @@ -2123,8 +2117,6 @@ static struct phy_driver marvell_drivers[] = { .flags = PHY_HAS_INTERRUPT, .probe = marvell_probe, .config_init = &m88e1116r_config_init, - .config_aneg = &genphy_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &marvell_ack_interrupt, .config_intr = &marvell_config_intr, .resume = &genphy_resume, @@ -2200,7 +2192,6 @@ static struct phy_driver marvell_drivers[] = { .features = PHY_BASIC_FEATURES, .flags = PHY_HAS_INTERRUPT, .probe = marvell_probe, - .config_aneg = &genphy_config_aneg, .config_init = &m88e3016_config_init, .aneg_done = &marvell_aneg_done, .read_status = &marvell_read_status, diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c index 2df7b62c1a36..8f8b7747c54b 100644 --- a/drivers/net/phy/mdio_bus.c +++ b/drivers/net/phy/mdio_bus.c @@ -38,6 +38,7 @@ #include <linux/phy.h> #include <linux/io.h> #include <linux/uaccess.h> +#include <linux/gpio/consumer.h> #include <asm/irq.h> @@ -48,9 +49,26 @@ int mdiobus_register_device(struct mdio_device *mdiodev) { + struct gpio_desc *gpiod = NULL; + if (mdiodev->bus->mdio_map[mdiodev->addr]) return -EBUSY; + /* Deassert the optional reset signal */ + if (mdiodev->dev.of_node) + gpiod = fwnode_get_named_gpiod(&mdiodev->dev.of_node->fwnode, + "reset-gpios", 0, GPIOD_OUT_LOW, + "PHY reset"); + if (PTR_ERR(gpiod) == -ENOENT) + gpiod = NULL; + else if (IS_ERR(gpiod)) + return PTR_ERR(gpiod); + + mdiodev->reset = gpiod; + + /* Assert the reset signal again */ + mdio_device_reset(mdiodev, 1); + mdiodev->bus->mdio_map[mdiodev->addr] = mdiodev; return 0; @@ -420,6 +438,9 @@ void mdiobus_unregister(struct mii_bus *bus) if (!mdiodev) continue; + if (mdiodev->reset) + gpiod_put(mdiodev->reset); + mdiodev->device_remove(mdiodev); mdiodev->device_free(mdiodev); } diff --git a/drivers/net/phy/mdio_device.c b/drivers/net/phy/mdio_device.c index e24f28924af8..75d97dd9fb28 100644 --- a/drivers/net/phy/mdio_device.c +++ b/drivers/net/phy/mdio_device.c @@ -12,6 +12,8 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/errno.h> +#include <linux/gpio.h> +#include <linux/gpio/consumer.h> #include <linux/init.h> #include <linux/interrupt.h> #include <linux/kernel.h> @@ -114,6 +116,13 @@ void mdio_device_remove(struct mdio_device *mdiodev) } EXPORT_SYMBOL(mdio_device_remove); +void mdio_device_reset(struct mdio_device *mdiodev, int value) +{ + if (mdiodev->reset) + gpiod_set_value(mdiodev->reset, value); +} +EXPORT_SYMBOL(mdio_device_reset); + /** * mdio_probe - probe an MDIO device * @dev: device to probe @@ -128,8 +137,16 @@ static int mdio_probe(struct device *dev) struct mdio_driver *mdiodrv = to_mdio_driver(drv); int err = 0; - if (mdiodrv->probe) + if (mdiodrv->probe) { + /* Deassert the reset signal */ + mdio_device_reset(mdiodev, 0); + err = mdiodrv->probe(mdiodev); + if (err) { + /* Assert the reset signal */ + mdio_device_reset(mdiodev, 1); + } + } return err; } @@ -140,9 +157,13 @@ static int mdio_remove(struct device *dev) struct device_driver *drv = mdiodev->dev.driver; struct mdio_driver *mdiodrv = to_mdio_driver(drv); - if (mdiodrv->remove) + if (mdiodrv->remove) { mdiodrv->remove(mdiodev); + /* Assert the reset signal */ + mdio_device_reset(mdiodev, 1); + } + return 0; } diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c index 1ea69b7585d9..401e3234be58 100644 --- a/drivers/net/phy/meson-gxl.c +++ b/drivers/net/phy/meson-gxl.c @@ -58,9 +58,7 @@ static struct phy_driver meson_gxl_phy[] = { .features = PHY_BASIC_FEATURES, .flags = PHY_IS_INTERNAL, .config_init = meson_gxl_config_init, - .config_aneg = genphy_config_aneg, .aneg_done = genphy_aneg_done, - .read_status = genphy_read_status, .suspend = genphy_suspend, .resume = genphy_resume, }, diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index ab4614113403..fd500b18e77f 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -801,8 +801,6 @@ static struct phy_driver ksphy_driver[] = { .flags = PHY_HAS_INTERRUPT, .driver_data = &ks8737_type, .config_init = kszphy_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = kszphy_ack_interrupt, .config_intr = kszphy_config_intr, .suspend = genphy_suspend, @@ -816,8 +814,6 @@ static struct phy_driver ksphy_driver[] = { .driver_data = &ksz8021_type, .probe = kszphy_probe, .config_init = kszphy_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = kszphy_ack_interrupt, .config_intr = kszphy_config_intr, .get_sset_count = kszphy_get_sset_count, @@ -834,8 +830,6 @@ static struct phy_driver ksphy_driver[] = { .driver_data = &ksz8021_type, .probe = kszphy_probe, .config_init = kszphy_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = kszphy_ack_interrupt, .config_intr = kszphy_config_intr, .get_sset_count = kszphy_get_sset_count, @@ -853,7 +847,6 @@ static struct phy_driver ksphy_driver[] = { .probe = kszphy_probe, .config_init = ksz8041_config_init, .config_aneg = ksz8041_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = kszphy_ack_interrupt, .config_intr = kszphy_config_intr, .get_sset_count = kszphy_get_sset_count, @@ -870,8 +863,6 @@ static struct phy_driver ksphy_driver[] = { .driver_data = &ksz8041_type, .probe = kszphy_probe, .config_init = kszphy_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = kszphy_ack_interrupt, .config_intr = kszphy_config_intr, .get_sset_count = kszphy_get_sset_count, @@ -888,8 +879,6 @@ static struct phy_driver ksphy_driver[] = { .driver_data = &ksz8051_type, .probe = kszphy_probe, .config_init = kszphy_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = kszphy_ack_interrupt, .config_intr = kszphy_config_intr, .get_sset_count = kszphy_get_sset_count, @@ -906,8 +895,6 @@ static struct phy_driver ksphy_driver[] = { .driver_data = &ksz8041_type, .probe = kszphy_probe, .config_init = kszphy_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = kszphy_ack_interrupt, .config_intr = kszphy_config_intr, .get_sset_count = kszphy_get_sset_count, @@ -924,8 +911,6 @@ static struct phy_driver ksphy_driver[] = { .driver_data = &ksz8081_type, .probe = kszphy_probe, .config_init = kszphy_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = kszphy_ack_interrupt, .config_intr = kszphy_config_intr, .get_sset_count = kszphy_get_sset_count, @@ -940,8 +925,6 @@ static struct phy_driver ksphy_driver[] = { .features = PHY_BASIC_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = kszphy_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = kszphy_ack_interrupt, .config_intr = kszphy_config_intr, .suspend = genphy_suspend, @@ -955,8 +938,6 @@ static struct phy_driver ksphy_driver[] = { .driver_data = &ksz9021_type, .probe = kszphy_probe, .config_init = ksz9021_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = kszphy_ack_interrupt, .config_intr = kszphy_config_intr, .get_sset_count = kszphy_get_sset_count, @@ -975,7 +956,6 @@ static struct phy_driver ksphy_driver[] = { .driver_data = &ksz9021_type, .probe = kszphy_probe, .config_init = ksz9031_config_init, - .config_aneg = genphy_config_aneg, .read_status = ksz9031_read_status, .ack_interrupt = kszphy_ack_interrupt, .config_intr = kszphy_config_intr, @@ -1000,8 +980,6 @@ static struct phy_driver ksphy_driver[] = { .features = PHY_BASIC_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = kszphy_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .suspend = genphy_suspend, .resume = genphy_resume, }, { @@ -1021,8 +999,6 @@ static struct phy_driver ksphy_driver[] = { .name = "Microchip KSZ9477", .features = PHY_GBIT_FEATURES, .config_init = kszphy_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .suspend = genphy_suspend, .resume = genphy_resume, } }; diff --git a/drivers/net/phy/microchip.c b/drivers/net/phy/microchip.c index 37ee856c7680..0f293ef28935 100644 --- a/drivers/net/phy/microchip.c +++ b/drivers/net/phy/microchip.c @@ -153,7 +153,6 @@ static struct phy_driver microchip_phy_driver[] = { .config_init = genphy_config_init, .config_aneg = lan88xx_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = lan88xx_phy_ack_interrupt, .config_intr = lan88xx_phy_config_intr, diff --git a/drivers/net/phy/national.c b/drivers/net/phy/national.c index 2addf1d3f619..2b1e336961f9 100644 --- a/drivers/net/phy/national.c +++ b/drivers/net/phy/national.c @@ -136,8 +136,6 @@ static struct phy_driver dp83865_driver[] = { { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = ns_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = ns_ack_interrupt, .config_intr = ns_config_intr, } }; diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index 2b1e67bc1e73..944143b521d7 100644 --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -493,7 +493,10 @@ static int phy_start_aneg_priv(struct phy_device *phydev, bool sync) /* Invalidate LP advertising flags */ phydev->lp_advertising = 0; - err = phydev->drv->config_aneg(phydev); + if (phydev->drv->config_aneg) + err = phydev->drv->config_aneg(phydev); + else + err = genphy_config_aneg(phydev); if (err < 0) goto out_unlock; @@ -629,9 +632,6 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat) if (PHY_HALTED == phydev->state) return IRQ_NONE; /* It can't be ours. */ - disable_irq_nosync(irq); - atomic_inc(&phydev->irq_disable); - phy_change(phydev); return IRQ_HANDLED; @@ -689,7 +689,6 @@ phy_err: */ int phy_start_interrupts(struct phy_device *phydev) { - atomic_set(&phydev->irq_disable, 0); if (request_threaded_irq(phydev->irq, NULL, phy_interrupt, IRQF_ONESHOT | IRQF_SHARED, phydev_name(phydev), phydev) < 0) { @@ -716,13 +715,6 @@ int phy_stop_interrupts(struct phy_device *phydev) free_irq(phydev->irq, phydev); - /* If work indeed has been cancelled, disable_irq() will have - * been left unbalanced from phy_interrupt() and enable_irq() - * has to be called so that other devices on the line work. - */ - while (atomic_dec_return(&phydev->irq_disable) >= 0) - enable_irq(phydev->irq); - return err; } EXPORT_SYMBOL(phy_stop_interrupts); @@ -736,10 +728,11 @@ void phy_change(struct phy_device *phydev) if (phy_interrupt_is_valid(phydev)) { if (phydev->drv->did_interrupt && !phydev->drv->did_interrupt(phydev)) - goto ignore; + return; - if (phy_disable_interrupts(phydev)) - goto phy_err; + if (phydev->state == PHY_HALTED) + if (phy_disable_interrupts(phydev)) + goto phy_err; } mutex_lock(&phydev->lock); @@ -747,28 +740,13 @@ void phy_change(struct phy_device *phydev) phydev->state = PHY_CHANGELINK; mutex_unlock(&phydev->lock); - if (phy_interrupt_is_valid(phydev)) { - atomic_dec(&phydev->irq_disable); - enable_irq(phydev->irq); - - /* Reenable interrupts */ - if (PHY_HALTED != phydev->state && - phy_config_interrupt(phydev, PHY_INTERRUPT_ENABLED)) - goto irq_enable_err; - } - /* reschedule state queue work to run as soon as possible */ phy_trigger_machine(phydev, true); - return; -ignore: - atomic_dec(&phydev->irq_disable); - enable_irq(phydev->irq); + if (phy_interrupt_is_valid(phydev) && phy_clear_interrupt(phydev)) + goto phy_err; return; -irq_enable_err: - disable_irq(phydev->irq); - atomic_inc(&phydev->irq_disable); phy_err: phy_error(phydev); } @@ -1006,10 +984,6 @@ void phy_state_machine(struct work_struct *work) phydev->state = PHY_NOLINK; phy_link_down(phydev, true); } - - if (phy_interrupt_is_valid(phydev)) - err = phy_config_interrupt(phydev, - PHY_INTERRUPT_ENABLED); break; case PHY_HALTED: if (phydev->link) { diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 67f25ac29025..1de5e242b8b4 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -632,6 +632,9 @@ int phy_device_register(struct phy_device *phydev) if (err) return err; + /* Deassert the reset signal */ + phy_device_reset(phydev, 0); + /* Run all of the fixups for this PHY */ err = phy_scan_fixups(phydev); if (err) { @@ -650,6 +653,9 @@ int phy_device_register(struct phy_device *phydev) return 0; out: + /* Assert the reset signal */ + phy_device_reset(phydev, 1); + mdiobus_unregister_device(&phydev->mdio); return err; } @@ -666,6 +672,10 @@ EXPORT_SYMBOL(phy_device_register); void phy_device_remove(struct phy_device *phydev) { device_del(&phydev->mdio.dev); + + /* Assert the reset signal */ + phy_device_reset(phydev, 1); + mdiobus_unregister_device(&phydev->mdio); } EXPORT_SYMBOL(phy_device_remove); @@ -849,6 +859,9 @@ int phy_init_hw(struct phy_device *phydev) { int ret = 0; + /* Deassert the reset signal */ + phy_device_reset(phydev, 0); + if (!phydev->drv || !phydev->drv->config_init) return 0; @@ -1126,6 +1139,9 @@ void phy_detach(struct phy_device *phydev) put_device(&phydev->mdio.dev); if (ndev_owner != bus->owner) module_put(bus->owner); + + /* Assert the reset signal */ + phy_device_reset(phydev, 1); } EXPORT_SYMBOL(phy_detach); @@ -1811,8 +1827,16 @@ static int phy_probe(struct device *dev) /* Set the state to READY by default */ phydev->state = PHY_READY; - if (phydev->drv->probe) + if (phydev->drv->probe) { + /* Deassert the reset signal */ + phy_device_reset(phydev, 0); + err = phydev->drv->probe(phydev); + if (err) { + /* Assert the reset signal */ + phy_device_reset(phydev, 1); + } + } mutex_unlock(&phydev->lock); @@ -1829,8 +1853,12 @@ static int phy_remove(struct device *dev) phydev->state = PHY_DOWN; mutex_unlock(&phydev->lock); - if (phydev->drv && phydev->drv->remove) + if (phydev->drv && phydev->drv->remove) { phydev->drv->remove(phydev); + + /* Assert the reset signal */ + phy_device_reset(phydev, 1); + } phydev->drv = NULL; return 0; @@ -1907,9 +1935,7 @@ static struct phy_driver genphy_driver = { .features = PHY_GBIT_FEATURES | SUPPORTED_MII | SUPPORTED_AUI | SUPPORTED_FIBRE | SUPPORTED_BNC, - .config_aneg = genphy_config_aneg, .aneg_done = genphy_aneg_done, - .read_status = genphy_read_status, .suspend = genphy_suspend, .resume = genphy_resume, .set_loopback = genphy_loopback, diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index 5dc9668dde34..2bfb548d3dff 100644 --- a/drivers/net/phy/phylink.c +++ b/drivers/net/phy/phylink.c @@ -36,7 +36,11 @@ enum { PHYLINK_DISABLE_LINK, }; +/** + * struct phylink - internal data type for phylink + */ struct phylink { + /* private: */ struct net_device *netdev; const struct phylink_mac_ops *ops; @@ -87,6 +91,13 @@ static inline bool linkmode_empty(const unsigned long *src) return bitmap_empty(src, __ETHTOOL_LINK_MODE_MASK_NBITS); } +/** + * phylink_set_port_modes() - set the port type modes in the ethtool mask + * @mask: ethtool link mode mask + * + * Sets all the port type modes in the ethtool mask. MAC drivers should + * use this in their 'validate' callback. + */ void phylink_set_port_modes(unsigned long *mask) { phylink_set(mask, TP); @@ -117,8 +128,7 @@ static const char *phylink_an_mode_str(unsigned int mode) static const char *modestr[] = { [MLO_AN_PHY] = "phy", [MLO_AN_FIXED] = "fixed", - [MLO_AN_SGMII] = "SGMII", - [MLO_AN_8023Z] = "802.3z", + [MLO_AN_INBAND] = "inband", }; return mode < ARRAY_SIZE(modestr) ? modestr[mode] : "unknown"; @@ -132,59 +142,64 @@ static int phylink_validate(struct phylink *pl, unsigned long *supported, return phylink_is_empty_linkmode(supported) ? -EINVAL : 0; } -static int phylink_parse_fixedlink(struct phylink *pl, struct device_node *np) +static int phylink_parse_fixedlink(struct phylink *pl, + struct fwnode_handle *fwnode) { - struct device_node *fixed_node; + struct fwnode_handle *fixed_node; const struct phy_setting *s; struct gpio_desc *desc; - const __be32 *fixed_prop; u32 speed; - int ret, len; + int ret; - fixed_node = of_get_child_by_name(np, "fixed-link"); + fixed_node = fwnode_get_named_child_node(fwnode, "fixed-link"); if (fixed_node) { - ret = of_property_read_u32(fixed_node, "speed", &speed); + ret = fwnode_property_read_u32(fixed_node, "speed", &speed); pl->link_config.speed = speed; pl->link_config.duplex = DUPLEX_HALF; - if (of_property_read_bool(fixed_node, "full-duplex")) + if (fwnode_property_read_bool(fixed_node, "full-duplex")) pl->link_config.duplex = DUPLEX_FULL; /* We treat the "pause" and "asym-pause" terminology as * defining the link partner's ability. */ - if (of_property_read_bool(fixed_node, "pause")) + if (fwnode_property_read_bool(fixed_node, "pause")) pl->link_config.pause |= MLO_PAUSE_SYM; - if (of_property_read_bool(fixed_node, "asym-pause")) + if (fwnode_property_read_bool(fixed_node, "asym-pause")) pl->link_config.pause |= MLO_PAUSE_ASYM; if (ret == 0) { - desc = fwnode_get_named_gpiod(&fixed_node->fwnode, - "link-gpios", 0, - GPIOD_IN, "?"); + desc = fwnode_get_named_gpiod(fixed_node, "link-gpios", + 0, GPIOD_IN, "?"); if (!IS_ERR(desc)) pl->link_gpio = desc; else if (desc == ERR_PTR(-EPROBE_DEFER)) ret = -EPROBE_DEFER; } - of_node_put(fixed_node); + fwnode_handle_put(fixed_node); if (ret) return ret; } else { - fixed_prop = of_get_property(np, "fixed-link", &len); - if (!fixed_prop) { + u32 prop[5]; + + ret = fwnode_property_read_u32_array(fwnode, "fixed-link", + NULL, 0); + if (ret != ARRAY_SIZE(prop)) { netdev_err(pl->netdev, "broken fixed-link?\n"); return -EINVAL; } - if (len == 5 * sizeof(*fixed_prop)) { - pl->link_config.duplex = be32_to_cpu(fixed_prop[1]) ? + + ret = fwnode_property_read_u32_array(fwnode, "fixed-link", + prop, ARRAY_SIZE(prop)); + if (!ret) { + pl->link_config.duplex = prop[1] ? DUPLEX_FULL : DUPLEX_HALF; - pl->link_config.speed = be32_to_cpu(fixed_prop[2]); - if (be32_to_cpu(fixed_prop[3])) + pl->link_config.speed = prop[2]; + if (prop[3]) pl->link_config.pause |= MLO_PAUSE_SYM; - if (be32_to_cpu(fixed_prop[4])) + if (prop[4]) pl->link_config.pause |= MLO_PAUSE_ASYM; } } @@ -220,17 +235,17 @@ static int phylink_parse_fixedlink(struct phylink *pl, struct device_node *np) return 0; } -static int phylink_parse_mode(struct phylink *pl, struct device_node *np) +static int phylink_parse_mode(struct phylink *pl, struct fwnode_handle *fwnode) { - struct device_node *dn; + struct fwnode_handle *dn; const char *managed; - dn = of_get_child_by_name(np, "fixed-link"); - if (dn || of_find_property(np, "fixed-link", NULL)) + dn = fwnode_get_named_child_node(fwnode, "fixed-link"); + if (dn || fwnode_property_present(fwnode, "fixed-link")) pl->link_an_mode = MLO_AN_FIXED; - of_node_put(dn); + fwnode_handle_put(dn); - if (of_property_read_string(np, "managed", &managed) == 0 && + if (fwnode_property_read_string(fwnode, "managed", &managed) == 0 && strcmp(managed, "in-band-status") == 0) { if (pl->link_an_mode == MLO_AN_FIXED) { netdev_err(pl->netdev, @@ -244,6 +259,7 @@ static int phylink_parse_mode(struct phylink *pl, struct device_node *np) phylink_set(pl->supported, Asym_Pause); phylink_set(pl->supported, Pause); pl->link_config.an_enabled = true; + pl->link_an_mode = MLO_AN_INBAND; switch (pl->link_config.interface) { case PHY_INTERFACE_MODE_SGMII: @@ -253,17 +269,14 @@ static int phylink_parse_mode(struct phylink *pl, struct device_node *np) phylink_set(pl->supported, 100baseT_Full); phylink_set(pl->supported, 1000baseT_Half); phylink_set(pl->supported, 1000baseT_Full); - pl->link_an_mode = MLO_AN_SGMII; break; case PHY_INTERFACE_MODE_1000BASEX: phylink_set(pl->supported, 1000baseX_Full); - pl->link_an_mode = MLO_AN_8023Z; break; case PHY_INTERFACE_MODE_2500BASEX: phylink_set(pl->supported, 2500baseX_Full); - pl->link_an_mode = MLO_AN_8023Z; break; case PHY_INTERFACE_MODE_10GKR: @@ -280,7 +293,6 @@ static int phylink_parse_mode(struct phylink *pl, struct device_node *np) phylink_set(pl->supported, 10000baseLR_Full); phylink_set(pl->supported, 10000baseLRM_Full); phylink_set(pl->supported, 10000baseER_Full); - pl->link_an_mode = MLO_AN_SGMII; break; default: @@ -320,8 +332,7 @@ static void phylink_mac_config(struct phylink *pl, static void phylink_mac_an_restart(struct phylink *pl) { if (pl->link_config.an_enabled && - (pl->link_config.interface == PHY_INTERFACE_MODE_1000BASEX || - pl->link_config.interface == PHY_INTERFACE_MODE_2500BASEX)) + phy_interface_mode_is_8023z(pl->link_config.interface)) pl->ops->mac_an_restart(pl->netdev); } @@ -423,7 +434,7 @@ static void phylink_resolve(struct work_struct *w) phylink_mac_config(pl, &link_state); break; - case MLO_AN_SGMII: + case MLO_AN_INBAND: phylink_get_mac_state(pl, &link_state); if (pl->phydev) { bool changed = false; @@ -449,10 +460,6 @@ static void phylink_resolve(struct work_struct *w) } } break; - - case MLO_AN_8023Z: - phylink_get_mac_state(pl, &link_state); - break; } } @@ -489,15 +496,24 @@ static void phylink_run_resolve(struct phylink *pl) static const struct sfp_upstream_ops sfp_phylink_ops; -static int phylink_register_sfp(struct phylink *pl, struct device_node *np) +static int phylink_register_sfp(struct phylink *pl, + struct fwnode_handle *fwnode) { - struct device_node *sfp_np; + struct fwnode_reference_args ref; + int ret; - sfp_np = of_parse_phandle(np, "sfp", 0); - if (!sfp_np) - return 0; + ret = fwnode_property_get_reference_args(fwnode, "sfp", NULL, + 0, 0, &ref); + if (ret < 0) { + if (ret == -ENOENT) + return 0; - pl->sfp_bus = sfp_register_upstream(sfp_np, pl->netdev, pl, + netdev_err(pl->netdev, "unable to parse \"sfp\" node: %d\n", + ret); + return ret; + } + + pl->sfp_bus = sfp_register_upstream(ref.fwnode, pl->netdev, pl, &sfp_phylink_ops); if (!pl->sfp_bus) return -ENOMEM; @@ -505,7 +521,22 @@ static int phylink_register_sfp(struct phylink *pl, struct device_node *np) return 0; } -struct phylink *phylink_create(struct net_device *ndev, struct device_node *np, +/** + * phylink_create() - create a phylink instance + * @ndev: a pointer to the &struct net_device + * @fwnode: a pointer to a &struct fwnode_handle describing the network + * interface + * @iface: the desired link mode defined by &typedef phy_interface_t + * @ops: a pointer to a &struct phylink_mac_ops for the MAC. + * + * Create a new phylink instance, and parse the link parameters found in @np. + * This will parse in-band modes, fixed-link or SFP configuration. + * + * Returns a pointer to a &struct phylink, or an error-pointer value. Users + * must use IS_ERR() to check for errors from this function. + */ +struct phylink *phylink_create(struct net_device *ndev, + struct fwnode_handle *fwnode, phy_interface_t iface, const struct phylink_mac_ops *ops) { @@ -533,21 +564,21 @@ struct phylink *phylink_create(struct net_device *ndev, struct device_node *np, linkmode_copy(pl->link_config.advertising, pl->supported); phylink_validate(pl, pl->supported, &pl->link_config); - ret = phylink_parse_mode(pl, np); + ret = phylink_parse_mode(pl, fwnode); if (ret < 0) { kfree(pl); return ERR_PTR(ret); } if (pl->link_an_mode == MLO_AN_FIXED) { - ret = phylink_parse_fixedlink(pl, np); + ret = phylink_parse_fixedlink(pl, fwnode); if (ret < 0) { kfree(pl); return ERR_PTR(ret); } } - ret = phylink_register_sfp(pl, np); + ret = phylink_register_sfp(pl, fwnode); if (ret < 0) { kfree(pl); return ERR_PTR(ret); @@ -557,6 +588,13 @@ struct phylink *phylink_create(struct net_device *ndev, struct device_node *np, } EXPORT_SYMBOL_GPL(phylink_create); +/** + * phylink_destroy() - cleanup and destroy the phylink instance + * @pl: a pointer to a &struct phylink returned from phylink_create() + * + * Destroy a phylink instance. Any PHY that has been attached must have been + * cleaned up via phylink_disconnect_phy() prior to calling this function. + */ void phylink_destroy(struct phylink *pl) { if (pl->sfp_bus) @@ -653,10 +691,30 @@ static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy) return 0; } +/** + * phylink_connect_phy() - connect a PHY to the phylink instance + * @pl: a pointer to a &struct phylink returned from phylink_create() + * @phy: a pointer to a &struct phy_device. + * + * Connect @phy to the phylink instance specified by @pl by calling + * phy_attach_direct(). Configure the @phy according to the MAC driver's + * capabilities, start the PHYLIB state machine and enable any interrupts + * that the PHY supports. + * + * This updates the phylink's ethtool supported and advertising link mode + * masks. + * + * Returns 0 on success or a negative errno. + */ int phylink_connect_phy(struct phylink *pl, struct phy_device *phy) { int ret; + if (WARN_ON(pl->link_an_mode == MLO_AN_FIXED || + (pl->link_an_mode == MLO_AN_INBAND && + phy_interface_mode_is_8023z(pl->link_interface)))) + return -EINVAL; + ret = phy_attach_direct(pl->netdev, phy, 0, pl->link_interface); if (ret) return ret; @@ -669,14 +727,27 @@ int phylink_connect_phy(struct phylink *pl, struct phy_device *phy) } EXPORT_SYMBOL_GPL(phylink_connect_phy); +/** + * phylink_of_phy_connect() - connect the PHY specified in the DT mode. + * @pl: a pointer to a &struct phylink returned from phylink_create() + * @dn: a pointer to a &struct device_node. + * + * Connect the phy specified in the device node @dn to the phylink instance + * specified by @pl. Actions specified in phylink_connect_phy() will be + * performed. + * + * Returns 0 on success or a negative errno. + */ int phylink_of_phy_connect(struct phylink *pl, struct device_node *dn) { struct device_node *phy_node; struct phy_device *phy_dev; int ret; - /* Fixed links are handled without needing a PHY */ - if (pl->link_an_mode == MLO_AN_FIXED) + /* Fixed links and 802.3z are handled without needing a PHY */ + if (pl->link_an_mode == MLO_AN_FIXED || + (pl->link_an_mode == MLO_AN_INBAND && + phy_interface_mode_is_8023z(pl->link_interface))) return 0; phy_node = of_parse_phandle(dn, "phy-handle", 0); @@ -708,6 +779,13 @@ int phylink_of_phy_connect(struct phylink *pl, struct device_node *dn) } EXPORT_SYMBOL_GPL(phylink_of_phy_connect); +/** + * phylink_disconnect_phy() - disconnect any PHY attached to the phylink + * instance. + * @pl: a pointer to a &struct phylink returned from phylink_create() + * + * Disconnect any current PHY from the phylink instance described by @pl. + */ void phylink_disconnect_phy(struct phylink *pl) { struct phy_device *phy; @@ -729,6 +807,14 @@ void phylink_disconnect_phy(struct phylink *pl) } EXPORT_SYMBOL_GPL(phylink_disconnect_phy); +/** + * phylink_mac_change() - notify phylink of a change in MAC state + * @pl: a pointer to a &struct phylink returned from phylink_create() + * @up: indicates whether the link is currently up. + * + * The MAC driver should call this driver when the state of its link + * changes (eg, link failure, new negotiation results, etc.) + */ void phylink_mac_change(struct phylink *pl, bool up) { if (!up) @@ -738,6 +824,14 @@ void phylink_mac_change(struct phylink *pl, bool up) } EXPORT_SYMBOL_GPL(phylink_mac_change); +/** + * phylink_start() - start a phylink instance + * @pl: a pointer to a &struct phylink returned from phylink_create() + * + * Start the phylink instance specified by @pl, configuring the MAC for the + * desired link mode(s) and negotiation style. This should be called from the + * network device driver's &struct net_device_ops ndo_open() method. + */ void phylink_start(struct phylink *pl) { WARN_ON(!lockdep_rtnl_is_held()); @@ -753,6 +847,12 @@ void phylink_start(struct phylink *pl) phylink_resolve_flow(pl, &pl->link_config); phylink_mac_config(pl, &pl->link_config); + /* Restart autonegotiation if using 802.3z to ensure that the link + * parameters are properly negotiated. This is necessary for DSA + * switches using 802.3z negotiation to ensure they see our modes. + */ + phylink_mac_an_restart(pl); + clear_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state); phylink_run_resolve(pl); @@ -763,6 +863,15 @@ void phylink_start(struct phylink *pl) } EXPORT_SYMBOL_GPL(phylink_start); +/** + * phylink_stop() - stop a phylink instance + * @pl: a pointer to a &struct phylink returned from phylink_create() + * + * Stop the phylink instance specified by @pl. This should be called from the + * network device driver's &struct net_device_ops ndo_stop() method. The + * network device's carrier state should not be changed prior to calling this + * function. + */ void phylink_stop(struct phylink *pl) { WARN_ON(!lockdep_rtnl_is_held()); @@ -778,6 +887,15 @@ void phylink_stop(struct phylink *pl) } EXPORT_SYMBOL_GPL(phylink_stop); +/** + * phylink_ethtool_get_wol() - get the wake on lan parameters for the PHY + * @pl: a pointer to a &struct phylink returned from phylink_create() + * @wol: a pointer to &struct ethtool_wolinfo to hold the read parameters + * + * Read the wake on lan parameters from the PHY attached to the phylink + * instance specified by @pl. If no PHY is currently attached, report no + * support for wake on lan. + */ void phylink_ethtool_get_wol(struct phylink *pl, struct ethtool_wolinfo *wol) { WARN_ON(!lockdep_rtnl_is_held()); @@ -790,6 +908,17 @@ void phylink_ethtool_get_wol(struct phylink *pl, struct ethtool_wolinfo *wol) } EXPORT_SYMBOL_GPL(phylink_ethtool_get_wol); +/** + * phylink_ethtool_set_wol() - set wake on lan parameters + * @pl: a pointer to a &struct phylink returned from phylink_create() + * @wol: a pointer to &struct ethtool_wolinfo for the desired parameters + * + * Set the wake on lan parameters for the PHY attached to the phylink + * instance specified by @pl. If no PHY is attached, returns %EOPNOTSUPP + * error. + * + * Returns zero on success or negative errno code. + */ int phylink_ethtool_set_wol(struct phylink *pl, struct ethtool_wolinfo *wol) { int ret = -EOPNOTSUPP; @@ -825,6 +954,15 @@ static void phylink_get_ksettings(const struct phylink_link_state *state, AUTONEG_DISABLE; } +/** + * phylink_ethtool_ksettings_get() - get the current link settings + * @pl: a pointer to a &struct phylink returned from phylink_create() + * @kset: a pointer to a &struct ethtool_link_ksettings to hold link settings + * + * Read the current link settings for the phylink instance specified by @pl. + * This will be the link settings read from the MAC, PHY or fixed link + * settings depending on the current negotiation mode. + */ int phylink_ethtool_ksettings_get(struct phylink *pl, struct ethtool_link_ksettings *kset) { @@ -850,14 +988,13 @@ int phylink_ethtool_ksettings_get(struct phylink *pl, phylink_get_ksettings(&link_state, kset); break; - case MLO_AN_SGMII: + case MLO_AN_INBAND: /* If there is a phy attached, then use the reported * settings from the phy with no modification. */ if (pl->phydev) break; - case MLO_AN_8023Z: phylink_get_mac_state(pl, &link_state); /* The MAC is reporting the link results from its own PCS @@ -872,6 +1009,11 @@ int phylink_ethtool_ksettings_get(struct phylink *pl, } EXPORT_SYMBOL_GPL(phylink_ethtool_ksettings_get); +/** + * phylink_ethtool_ksettings_set() - set the link settings + * @pl: a pointer to a &struct phylink returned from phylink_create() + * @kset: a pointer to a &struct ethtool_link_ksettings for the desired modes + */ int phylink_ethtool_ksettings_set(struct phylink *pl, const struct ethtool_link_ksettings *kset) { @@ -965,6 +1107,17 @@ int phylink_ethtool_ksettings_set(struct phylink *pl, } EXPORT_SYMBOL_GPL(phylink_ethtool_ksettings_set); +/** + * phylink_ethtool_nway_reset() - restart negotiation + * @pl: a pointer to a &struct phylink returned from phylink_create() + * + * Restart negotiation for the phylink instance specified by @pl. This will + * cause any attached phy to restart negotiation with the link partner, and + * if the MAC is in a BaseX mode, the MAC will also be requested to restart + * negotiation. + * + * Returns zero on success, or negative error code. + */ int phylink_ethtool_nway_reset(struct phylink *pl) { int ret = 0; @@ -979,6 +1132,11 @@ int phylink_ethtool_nway_reset(struct phylink *pl) } EXPORT_SYMBOL_GPL(phylink_ethtool_nway_reset); +/** + * phylink_ethtool_get_pauseparam() - get the current pause parameters + * @pl: a pointer to a &struct phylink returned from phylink_create() + * @pause: a pointer to a &struct ethtool_pauseparam + */ void phylink_ethtool_get_pauseparam(struct phylink *pl, struct ethtool_pauseparam *pause) { @@ -990,6 +1148,11 @@ void phylink_ethtool_get_pauseparam(struct phylink *pl, } EXPORT_SYMBOL_GPL(phylink_ethtool_get_pauseparam); +/** + * phylink_ethtool_set_pauseparam() - set the current pause parameters + * @pl: a pointer to a &struct phylink returned from phylink_create() + * @pause: a pointer to a &struct ethtool_pauseparam + */ int phylink_ethtool_set_pauseparam(struct phylink *pl, struct ethtool_pauseparam *pause) { @@ -1028,8 +1191,7 @@ int phylink_ethtool_set_pauseparam(struct phylink *pl, phylink_mac_config(pl, config); break; - case MLO_AN_SGMII: - case MLO_AN_8023Z: + case MLO_AN_INBAND: phylink_mac_config(pl, config); phylink_mac_an_restart(pl); break; @@ -1068,19 +1230,16 @@ int phylink_ethtool_get_module_eeprom(struct phylink *pl, } EXPORT_SYMBOL_GPL(phylink_ethtool_get_module_eeprom); -int phylink_init_eee(struct phylink *pl, bool clk_stop_enable) -{ - int ret = -EPROTONOSUPPORT; - - WARN_ON(!lockdep_rtnl_is_held()); - - if (pl->phydev) - ret = phy_init_eee(pl->phydev, clk_stop_enable); - - return ret; -} -EXPORT_SYMBOL_GPL(phylink_init_eee); - +/** + * phylink_ethtool_get_eee_err() - read the energy efficient ethernet error + * counter + * @pl: a pointer to a &struct phylink returned from phylink_create(). + * + * Read the Energy Efficient Ethernet error counter from the PHY associated + * with the phylink instance specified by @pl. + * + * Returns positive error counter value, or negative error code. + */ int phylink_get_eee_err(struct phylink *pl) { int ret = 0; @@ -1094,6 +1253,11 @@ int phylink_get_eee_err(struct phylink *pl) } EXPORT_SYMBOL_GPL(phylink_get_eee_err); +/** + * phylink_ethtool_get_eee() - read the energy efficient ethernet parameters + * @pl: a pointer to a &struct phylink returned from phylink_create() + * @eee: a pointer to a &struct ethtool_eee for the read parameters + */ int phylink_ethtool_get_eee(struct phylink *pl, struct ethtool_eee *eee) { int ret = -EOPNOTSUPP; @@ -1107,6 +1271,11 @@ int phylink_ethtool_get_eee(struct phylink *pl, struct ethtool_eee *eee) } EXPORT_SYMBOL_GPL(phylink_ethtool_get_eee); +/** + * phylink_ethtool_set_eee() - set the energy efficient ethernet parameters + * @pl: a pointer to a &struct phylink returned from phylink_create() + * @eee: a pointer to a &struct ethtool_eee for the desired parameters + */ int phylink_ethtool_set_eee(struct phylink *pl, struct ethtool_eee *eee) { int ret = -EOPNOTSUPP; @@ -1246,9 +1415,7 @@ static int phylink_mii_read(struct phylink *pl, unsigned int phy_id, case MLO_AN_PHY: return -EOPNOTSUPP; - case MLO_AN_SGMII: - /* No phy, fall through to 8023z method */ - case MLO_AN_8023Z: + case MLO_AN_INBAND: if (phy_id == 0) { val = phylink_get_mac_state(pl, &state); if (val < 0) @@ -1273,15 +1440,31 @@ static int phylink_mii_write(struct phylink *pl, unsigned int phy_id, case MLO_AN_PHY: return -EOPNOTSUPP; - case MLO_AN_SGMII: - /* No phy, fall through to 8023z method */ - case MLO_AN_8023Z: + case MLO_AN_INBAND: break; } return 0; } +/** + * phylink_mii_ioctl() - generic mii ioctl interface + * @pl: a pointer to a &struct phylink returned from phylink_create() + * @ifr: a pointer to a &struct ifreq for socket ioctls + * @cmd: ioctl cmd to execute + * + * Perform the specified MII ioctl on the PHY attached to the phylink instance + * specified by @pl. If no PHY is attached, emulate the presence of the PHY. + * + * Returns: zero on success or negative error code. + * + * %SIOCGMIIPHY: + * read register from the current PHY. + * %SIOCGMIIREG: + * read register from the specified PHY. + * %SIOCSMIIREG: + * set a register on the specified PHY. + */ int phylink_mii_ioctl(struct phylink *pl, struct ifreq *ifr, int cmd) { struct mii_ioctl_data *mii = if_mii(ifr); @@ -1290,7 +1473,7 @@ int phylink_mii_ioctl(struct phylink *pl, struct ifreq *ifr, int cmd) WARN_ON(!lockdep_rtnl_is_held()); if (pl->phydev) { - /* PHYs only exist for MLO_AN_PHY and MLO_AN_SGMII */ + /* PHYs only exist for MLO_AN_PHY and SGMII */ switch (cmd) { case SIOCGMIIPHY: mii->phy_id = pl->phydev->mdio.addr; @@ -1359,10 +1542,10 @@ static int phylink_sfp_module_insert(void *upstream, switch (iface) { case PHY_INTERFACE_MODE_SGMII: - mode = MLO_AN_SGMII; - break; case PHY_INTERFACE_MODE_1000BASEX: - mode = MLO_AN_8023Z; + case PHY_INTERFACE_MODE_2500BASEX: + case PHY_INTERFACE_MODE_10GKR: + mode = MLO_AN_INBAND; break; default: return -EINVAL; @@ -1389,7 +1572,7 @@ static int phylink_sfp_module_insert(void *upstream, phylink_an_mode_str(mode), phy_modes(config.interface), __ETHTOOL_LINK_MODE_MASK_NBITS, support); - if (mode == MLO_AN_8023Z && pl->phydev) + if (phy_interface_mode_is_8023z(iface) && pl->phydev) return -EINVAL; changed = !bitmap_equal(pl->supported, support, diff --git a/drivers/net/phy/qsemi.c b/drivers/net/phy/qsemi.c index dbef8002bc28..889a4dce1648 100644 --- a/drivers/net/phy/qsemi.c +++ b/drivers/net/phy/qsemi.c @@ -118,8 +118,6 @@ static struct phy_driver qs6612_driver[] = { { .features = PHY_BASIC_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = qs6612_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = qs6612_ack_interrupt, .config_intr = qs6612_config_intr, } }; diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c index eda0a6e86918..7c1bf688dd48 100644 --- a/drivers/net/phy/realtek.c +++ b/drivers/net/phy/realtek.c @@ -13,29 +13,67 @@ * option) any later version. * */ +#include <linux/bitops.h> #include <linux/phy.h> #include <linux/module.h> -#define RTL821x_PHYSR 0x11 -#define RTL821x_PHYSR_DUPLEX 0x2000 -#define RTL821x_PHYSR_SPEED 0xc000 -#define RTL821x_INER 0x12 -#define RTL821x_INER_INIT 0x6400 -#define RTL821x_INSR 0x13 -#define RTL821x_PAGE_SELECT 0x1f -#define RTL8211E_INER_LINK_STATUS 0x400 +#define RTL821x_PHYSR 0x11 +#define RTL821x_PHYSR_DUPLEX BIT(13) +#define RTL821x_PHYSR_SPEED GENMASK(15, 14) -#define RTL8211F_INER_LINK_STATUS 0x0010 -#define RTL8211F_INSR 0x1d -#define RTL8211F_TX_DELAY 0x100 +#define RTL821x_INER 0x12 +#define RTL8211B_INER_INIT 0x6400 +#define RTL8211E_INER_LINK_STATUS BIT(10) +#define RTL8211F_INER_LINK_STATUS BIT(4) -#define RTL8201F_ISR 0x1e -#define RTL8201F_IER 0x13 +#define RTL821x_INSR 0x13 + +#define RTL821x_PAGE_SELECT 0x1f + +#define RTL8211F_INSR 0x1d + +#define RTL8211F_TX_DELAY BIT(8) + +#define RTL8201F_ISR 0x1e +#define RTL8201F_IER 0x13 MODULE_DESCRIPTION("Realtek PHY driver"); MODULE_AUTHOR("Johnson Leung"); MODULE_LICENSE("GPL"); +static int rtl8211x_page_read(struct phy_device *phydev, u16 page, u16 address) +{ + int ret; + + ret = phy_write(phydev, RTL821x_PAGE_SELECT, page); + if (ret) + return ret; + + ret = phy_read(phydev, address); + + /* restore to default page 0 */ + phy_write(phydev, RTL821x_PAGE_SELECT, 0x0); + + return ret; +} + +static int rtl8211x_page_write(struct phy_device *phydev, u16 page, + u16 address, u16 val) +{ + int ret; + + ret = phy_write(phydev, RTL821x_PAGE_SELECT, page); + if (ret) + return ret; + + ret = phy_write(phydev, address, val); + + /* restore to default page 0 */ + phy_write(phydev, RTL821x_PAGE_SELECT, 0x0); + + return ret; +} + static int rtl8201_ack_interrupt(struct phy_device *phydev) { int err; @@ -58,31 +96,21 @@ static int rtl8211f_ack_interrupt(struct phy_device *phydev) { int err; - phy_write(phydev, RTL821x_PAGE_SELECT, 0xa43); - err = phy_read(phydev, RTL8211F_INSR); - /* restore to default page 0 */ - phy_write(phydev, RTL821x_PAGE_SELECT, 0x0); + err = rtl8211x_page_read(phydev, 0xa43, RTL8211F_INSR); return (err < 0) ? err : 0; } static int rtl8201_config_intr(struct phy_device *phydev) { - int err; - - /* switch to page 7 */ - phy_write(phydev, RTL821x_PAGE_SELECT, 0x7); + u16 val; if (phydev->interrupts == PHY_INTERRUPT_ENABLED) - err = phy_write(phydev, RTL8201F_IER, - BIT(13) | BIT(12) | BIT(11)); + val = BIT(13) | BIT(12) | BIT(11); else - err = phy_write(phydev, RTL8201F_IER, 0); - - /* restore to default page 0 */ - phy_write(phydev, RTL821x_PAGE_SELECT, 0x0); + val = 0; - return err; + return rtl8211x_page_write(phydev, 0x7, RTL8201F_IER, val); } static int rtl8211b_config_intr(struct phy_device *phydev) @@ -91,7 +119,7 @@ static int rtl8211b_config_intr(struct phy_device *phydev) if (phydev->interrupts == PHY_INTERRUPT_ENABLED) err = phy_write(phydev, RTL821x_INER, - RTL821x_INER_INIT); + RTL8211B_INER_INIT); else err = phy_write(phydev, RTL821x_INER, 0); @@ -113,41 +141,41 @@ static int rtl8211e_config_intr(struct phy_device *phydev) static int rtl8211f_config_intr(struct phy_device *phydev) { - int err; + u16 val; - phy_write(phydev, RTL821x_PAGE_SELECT, 0xa42); if (phydev->interrupts == PHY_INTERRUPT_ENABLED) - err = phy_write(phydev, RTL821x_INER, - RTL8211F_INER_LINK_STATUS); + val = RTL8211F_INER_LINK_STATUS; else - err = phy_write(phydev, RTL821x_INER, 0); - phy_write(phydev, RTL821x_PAGE_SELECT, 0); + val = 0; - return err; + return rtl8211x_page_write(phydev, 0xa42, RTL821x_INER, val); } static int rtl8211f_config_init(struct phy_device *phydev) { int ret; - u16 reg; + u16 val; ret = genphy_config_init(phydev); if (ret < 0) return ret; - phy_write(phydev, RTL821x_PAGE_SELECT, 0xd08); - reg = phy_read(phydev, 0x11); + ret = rtl8211x_page_read(phydev, 0xd08, 0x11); + if (ret < 0) + return ret; + + val = ret & 0xffff; /* enable TX-delay for rgmii-id and rgmii-txid, otherwise disable it */ if (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID || phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID) - reg |= RTL8211F_TX_DELAY; + val |= RTL8211F_TX_DELAY; else - reg &= ~RTL8211F_TX_DELAY; + val &= ~RTL8211F_TX_DELAY; - phy_write(phydev, 0x11, reg); - /* restore to default page 0 */ - phy_write(phydev, RTL821x_PAGE_SELECT, 0x0); + ret = rtl8211x_page_write(phydev, 0xd08, 0x11, val); + if (ret) + return ret; return 0; } @@ -159,16 +187,12 @@ static struct phy_driver realtek_drvs[] = { .phy_id_mask = 0x0000ffff, .features = PHY_BASIC_FEATURES, .flags = PHY_HAS_INTERRUPT, - .config_aneg = &genphy_config_aneg, - .read_status = &genphy_read_status, }, { .phy_id = 0x001cc816, .name = "RTL8201F 10/100Mbps Ethernet", .phy_id_mask = 0x001fffff, .features = PHY_BASIC_FEATURES, .flags = PHY_HAS_INTERRUPT, - .config_aneg = &genphy_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &rtl8201_ack_interrupt, .config_intr = &rtl8201_config_intr, .suspend = genphy_suspend, @@ -179,8 +203,6 @@ static struct phy_driver realtek_drvs[] = { .phy_id_mask = 0x001fffff, .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, - .config_aneg = &genphy_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &rtl821x_ack_interrupt, .config_intr = &rtl8211b_config_intr, }, { @@ -189,8 +211,6 @@ static struct phy_driver realtek_drvs[] = { .phy_id_mask = 0x001fffff, .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = rtl821x_ack_interrupt, .config_intr = rtl8211e_config_intr, .suspend = genphy_suspend, @@ -201,8 +221,6 @@ static struct phy_driver realtek_drvs[] = { .phy_id_mask = 0x001fffff, .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, - .config_aneg = &genphy_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &rtl821x_ack_interrupt, .config_intr = &rtl8211e_config_intr, .suspend = genphy_suspend, @@ -213,9 +231,7 @@ static struct phy_driver realtek_drvs[] = { .phy_id_mask = 0x001fffff, .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, - .config_aneg = &genphy_config_aneg, .config_init = &rtl8211f_config_init, - .read_status = &genphy_read_status, .ack_interrupt = &rtl8211f_ack_interrupt, .config_intr = &rtl8211f_config_intr, .suspend = genphy_suspend, diff --git a/drivers/net/phy/rockchip.c b/drivers/net/phy/rockchip.c index c092af137056..f1da70b9b55f 100644 --- a/drivers/net/phy/rockchip.c +++ b/drivers/net/phy/rockchip.c @@ -213,7 +213,6 @@ static struct phy_driver rockchip_phy_driver[] = { .soft_reset = genphy_soft_reset, .config_init = rockchip_integrated_phy_config_init, .config_aneg = rockchip_config_aneg, - .read_status = genphy_read_status, .suspend = genphy_suspend, .resume = rockchip_phy_resume, }, diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c index 8a1b1f4c1b7c..1356dba0d9d3 100644 --- a/drivers/net/phy/sfp-bus.c +++ b/drivers/net/phy/sfp-bus.c @@ -8,10 +8,14 @@ #include "sfp.h" +/** + * struct sfp_bus - internal representation of a sfp bus + */ struct sfp_bus { + /* private: */ struct kref kref; struct list_head node; - struct device_node *device_node; + struct fwnode_handle *fwnode; const struct sfp_socket_ops *socket_ops; struct device *sfp_dev; @@ -26,6 +30,20 @@ struct sfp_bus { bool started; }; +/** + * sfp_parse_port() - Parse the EEPROM base ID, setting the port type + * @bus: a pointer to the &struct sfp_bus structure for the sfp module + * @id: a pointer to the module's &struct sfp_eeprom_id + * @support: optional pointer to an array of unsigned long for the + * ethtool support mask + * + * Parse the EEPROM identification given in @id, and return one of + * %PORT_TP, %PORT_FIBRE or %PORT_OTHER. If @support is non-%NULL, + * also set the ethtool %ETHTOOL_LINK_MODE_xxx_BIT corresponding with + * the connector type. + * + * If the port type is not known, returns %PORT_OTHER. + */ int sfp_parse_port(struct sfp_bus *bus, const struct sfp_eeprom_id *id, unsigned long *support) { @@ -78,6 +96,24 @@ int sfp_parse_port(struct sfp_bus *bus, const struct sfp_eeprom_id *id, } EXPORT_SYMBOL_GPL(sfp_parse_port); +/** + * sfp_parse_interface() - Parse the phy_interface_t + * @bus: a pointer to the &struct sfp_bus structure for the sfp module + * @id: a pointer to the module's &struct sfp_eeprom_id + * + * Derive the phy_interface_t mode for the information found in the + * module's identifying EEPROM. There is no standard or defined way + * to derive this information, so we use some heuristics. + * + * If the encoding is 64b66b, then the module must be >= 10G, so + * return %PHY_INTERFACE_MODE_10GKR. + * + * If it's 8b10b, then it's 1G or slower. If it's definitely a fibre + * module, return %PHY_INTERFACE_MODE_1000BASEX mode, otherwise return + * %PHY_INTERFACE_MODE_SGMII mode. + * + * If the encoding is not known, return %PHY_INTERFACE_MODE_NA. + */ phy_interface_t sfp_parse_interface(struct sfp_bus *bus, const struct sfp_eeprom_id *id) { @@ -117,6 +153,15 @@ phy_interface_t sfp_parse_interface(struct sfp_bus *bus, } EXPORT_SYMBOL_GPL(sfp_parse_interface); +/** + * sfp_parse_support() - Parse the eeprom id for supported link modes + * @bus: a pointer to the &struct sfp_bus structure for the sfp module + * @id: a pointer to the module's &struct sfp_eeprom_id + * @support: pointer to an array of unsigned long for the ethtool support mask + * + * Parse the EEPROM identification information and derive the supported + * ethtool link modes for the module. + */ void sfp_parse_support(struct sfp_bus *bus, const struct sfp_eeprom_id *id, unsigned long *support) { @@ -215,7 +260,7 @@ static const struct sfp_upstream_ops *sfp_get_upstream_ops(struct sfp_bus *bus) return bus->registered ? bus->upstream_ops : NULL; } -static struct sfp_bus *sfp_bus_get(struct device_node *np) +static struct sfp_bus *sfp_bus_get(struct fwnode_handle *fwnode) { struct sfp_bus *sfp, *new, *found = NULL; @@ -224,7 +269,7 @@ static struct sfp_bus *sfp_bus_get(struct device_node *np) mutex_lock(&sfp_mutex); list_for_each_entry(sfp, &sfp_buses, node) { - if (sfp->device_node == np) { + if (sfp->fwnode == fwnode) { kref_get(&sfp->kref); found = sfp; break; @@ -233,7 +278,7 @@ static struct sfp_bus *sfp_bus_get(struct device_node *np) if (!found && new) { kref_init(&new->kref); - new->device_node = np; + new->fwnode = fwnode; list_add(&new->node, &sfp_buses); found = new; new = NULL; @@ -246,7 +291,7 @@ static struct sfp_bus *sfp_bus_get(struct device_node *np) return found; } -static void sfp_bus_release(struct kref *kref) __releases(sfp_mutex) +static void sfp_bus_release(struct kref *kref) { struct sfp_bus *bus = container_of(kref, struct sfp_bus, kref); @@ -293,6 +338,16 @@ static void sfp_unregister_bus(struct sfp_bus *bus) bus->registered = false; } +/** + * sfp_get_module_info() - Get the ethtool_modinfo for a SFP module + * @bus: a pointer to the &struct sfp_bus structure for the sfp module + * @modinfo: a &struct ethtool_modinfo + * + * Fill in the type and eeprom_len parameters in @modinfo for a module on + * the sfp bus specified by @bus. + * + * Returns 0 on success or a negative errno number. + */ int sfp_get_module_info(struct sfp_bus *bus, struct ethtool_modinfo *modinfo) { if (!bus->registered) @@ -301,6 +356,17 @@ int sfp_get_module_info(struct sfp_bus *bus, struct ethtool_modinfo *modinfo) } EXPORT_SYMBOL_GPL(sfp_get_module_info); +/** + * sfp_get_module_eeprom() - Read the SFP module EEPROM + * @bus: a pointer to the &struct sfp_bus structure for the sfp module + * @ee: a &struct ethtool_eeprom + * @data: buffer to contain the EEPROM data (must be at least @ee->len bytes) + * + * Read the EEPROM as specified by the supplied @ee. See the documentation + * for &struct ethtool_eeprom for the region to be read. + * + * Returns 0 on success or a negative errno number. + */ int sfp_get_module_eeprom(struct sfp_bus *bus, struct ethtool_eeprom *ee, u8 *data) { @@ -310,6 +376,15 @@ int sfp_get_module_eeprom(struct sfp_bus *bus, struct ethtool_eeprom *ee, } EXPORT_SYMBOL_GPL(sfp_get_module_eeprom); +/** + * sfp_upstream_start() - Inform the SFP that the network device is up + * @bus: a pointer to the &struct sfp_bus structure for the sfp module + * + * Inform the SFP socket that the network device is now up, so that the + * module can be enabled by allowing TX_DISABLE to be deasserted. This + * should be called from the network device driver's &struct net_device_ops + * ndo_open() method. + */ void sfp_upstream_start(struct sfp_bus *bus) { if (bus->registered) @@ -318,6 +393,15 @@ void sfp_upstream_start(struct sfp_bus *bus) } EXPORT_SYMBOL_GPL(sfp_upstream_start); +/** + * sfp_upstream_stop() - Inform the SFP that the network device is down + * @bus: a pointer to the &struct sfp_bus structure for the sfp module + * + * Inform the SFP socket that the network device is now up, so that the + * module can be disabled by asserting TX_DISABLE, disabling the laser + * in optical modules. This should be called from the network device + * driver's &struct net_device_ops ndo_stop() method. + */ void sfp_upstream_stop(struct sfp_bus *bus) { if (bus->registered) @@ -326,11 +410,24 @@ void sfp_upstream_stop(struct sfp_bus *bus) } EXPORT_SYMBOL_GPL(sfp_upstream_stop); -struct sfp_bus *sfp_register_upstream(struct device_node *np, +/** + * sfp_register_upstream() - Register the neighbouring device + * @np: device node for the SFP bus + * @ndev: network device associated with the interface + * @upstream: the upstream private data + * @ops: the upstream's &struct sfp_upstream_ops + * + * Register the upstream device (eg, PHY) with the SFP bus. MAC drivers + * should use phylink, which will call this function for them. Returns + * a pointer to the allocated &struct sfp_bus. + * + * On error, returns %NULL. + */ +struct sfp_bus *sfp_register_upstream(struct fwnode_handle *fwnode, struct net_device *ndev, void *upstream, const struct sfp_upstream_ops *ops) { - struct sfp_bus *bus = sfp_bus_get(np); + struct sfp_bus *bus = sfp_bus_get(fwnode); int ret = 0; if (bus) { @@ -353,6 +450,13 @@ struct sfp_bus *sfp_register_upstream(struct device_node *np, } EXPORT_SYMBOL_GPL(sfp_register_upstream); +/** + * sfp_unregister_upstream() - Unregister sfp bus + * @bus: a pointer to the &struct sfp_bus structure for the sfp module + * + * Unregister a previously registered upstream connection for the SFP + * module. @bus is returned from sfp_register_upstream(). + */ void sfp_unregister_upstream(struct sfp_bus *bus) { rtnl_lock(); @@ -433,7 +537,7 @@ EXPORT_SYMBOL_GPL(sfp_module_remove); struct sfp_bus *sfp_register_socket(struct device *dev, struct sfp *sfp, const struct sfp_socket_ops *ops) { - struct sfp_bus *bus = sfp_bus_get(dev->of_node); + struct sfp_bus *bus = sfp_bus_get(dev->fwnode); int ret = 0; if (bus) { diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c index 2306bfae057f..a1961ba87e2b 100644 --- a/drivers/net/phy/smsc.c +++ b/drivers/net/phy/smsc.c @@ -227,8 +227,6 @@ static struct phy_driver smsc_phy_driver[] = { .probe = smsc_phy_probe, /* basic functions */ - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .config_init = smsc_phy_config_init, .soft_reset = smsc_phy_reset, @@ -249,8 +247,6 @@ static struct phy_driver smsc_phy_driver[] = { .probe = smsc_phy_probe, /* basic functions */ - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .config_init = smsc_phy_config_init, .soft_reset = smsc_phy_reset, @@ -276,7 +272,6 @@ static struct phy_driver smsc_phy_driver[] = { .probe = smsc_phy_probe, /* basic functions */ - .config_aneg = genphy_config_aneg, .read_status = lan87xx_read_status, .config_init = smsc_phy_config_init, .soft_reset = smsc_phy_reset, @@ -303,8 +298,6 @@ static struct phy_driver smsc_phy_driver[] = { .probe = smsc_phy_probe, /* basic functions */ - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .config_init = lan911x_config_init, /* IRQ related */ @@ -324,7 +317,6 @@ static struct phy_driver smsc_phy_driver[] = { .probe = smsc_phy_probe, /* basic functions */ - .config_aneg = genphy_config_aneg, .read_status = lan87xx_read_status, .config_init = smsc_phy_config_init, .soft_reset = smsc_phy_reset, @@ -351,7 +343,6 @@ static struct phy_driver smsc_phy_driver[] = { .probe = smsc_phy_probe, /* basic functions */ - .config_aneg = genphy_config_aneg, .read_status = lan87xx_read_status, .config_init = smsc_phy_config_init, .soft_reset = smsc_phy_reset, diff --git a/drivers/net/phy/ste10Xp.c b/drivers/net/phy/ste10Xp.c index d00cfb64529e..fbd548a1ad84 100644 --- a/drivers/net/phy/ste10Xp.c +++ b/drivers/net/phy/ste10Xp.c @@ -89,8 +89,6 @@ static struct phy_driver ste10xp_pdriver[] = { .features = PHY_BASIC_FEATURES | SUPPORTED_Pause, .flags = PHY_HAS_INTERRUPT, .config_init = ste10Xp_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = ste10Xp_ack_interrupt, .config_intr = ste10Xp_config_intr, .suspend = genphy_suspend, @@ -102,8 +100,6 @@ static struct phy_driver ste10xp_pdriver[] = { .features = PHY_BASIC_FEATURES | SUPPORTED_Pause, .flags = PHY_HAS_INTERRUPT, .config_init = ste10Xp_config_init, - .config_aneg = genphy_config_aneg, - .read_status = genphy_read_status, .ack_interrupt = ste10Xp_ack_interrupt, .config_intr = ste10Xp_config_intr, .suspend = genphy_suspend, diff --git a/drivers/net/phy/uPD60620.c b/drivers/net/phy/uPD60620.c index 96b33475ea5e..55f48ee3595a 100644 --- a/drivers/net/phy/uPD60620.c +++ b/drivers/net/phy/uPD60620.c @@ -95,7 +95,6 @@ static struct phy_driver upd60620_driver[1] = { { .features = PHY_BASIC_FEATURES, .flags = 0, .config_init = upd60620_config_init, - .config_aneg = genphy_config_aneg, .read_status = upd60620_read_status, } }; diff --git a/drivers/net/phy/vitesse.c b/drivers/net/phy/vitesse.c index f78ff0279648..d9dd8fbfffc7 100644 --- a/drivers/net/phy/vitesse.c +++ b/drivers/net/phy/vitesse.c @@ -267,7 +267,6 @@ static struct phy_driver vsc82xx_driver[] = { .flags = PHY_HAS_INTERRUPT, .config_init = &vsc824x_config_init, .config_aneg = &vsc82x4_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &vsc824x_ack_interrupt, .config_intr = &vsc82xx_config_intr, }, { @@ -278,7 +277,6 @@ static struct phy_driver vsc82xx_driver[] = { .flags = PHY_HAS_INTERRUPT, .config_init = &vsc824x_config_init, .config_aneg = &vsc82x4_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &vsc824x_ack_interrupt, .config_intr = &vsc82xx_config_intr, }, { @@ -289,7 +287,6 @@ static struct phy_driver vsc82xx_driver[] = { .flags = PHY_HAS_INTERRUPT, .config_init = &vsc824x_config_init, .config_aneg = &vsc82x4_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &vsc824x_ack_interrupt, .config_intr = &vsc82xx_config_intr, }, { @@ -300,7 +297,6 @@ static struct phy_driver vsc82xx_driver[] = { .flags = PHY_HAS_INTERRUPT, .config_init = &vsc824x_config_init, .config_aneg = &vsc82x4_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &vsc824x_ack_interrupt, .config_intr = &vsc82xx_config_intr, }, { @@ -311,7 +307,6 @@ static struct phy_driver vsc82xx_driver[] = { .flags = PHY_HAS_INTERRUPT, .config_init = &vsc824x_config_init, .config_aneg = &vsc82x4_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &vsc824x_ack_interrupt, .config_intr = &vsc82xx_config_intr, }, { @@ -321,8 +316,6 @@ static struct phy_driver vsc82xx_driver[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = &vsc8601_config_init, - .config_aneg = &genphy_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &vsc824x_ack_interrupt, .config_intr = &vsc82xx_config_intr, }, { @@ -333,7 +326,6 @@ static struct phy_driver vsc82xx_driver[] = { .flags = PHY_HAS_INTERRUPT, .config_init = &vsc824x_config_init, .config_aneg = &vsc82x4_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &vsc824x_ack_interrupt, .config_intr = &vsc82xx_config_intr, }, { @@ -344,8 +336,6 @@ static struct phy_driver vsc82xx_driver[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = &vsc8221_config_init, - .config_aneg = &genphy_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &vsc824x_ack_interrupt, .config_intr = &vsc82xx_config_intr, }, { @@ -356,8 +346,6 @@ static struct phy_driver vsc82xx_driver[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, .config_init = &vsc8221_config_init, - .config_aneg = &genphy_config_aneg, - .read_status = &genphy_read_status, .ack_interrupt = &vsc824x_ack_interrupt, .config_intr = &vsc82xx_config_intr, } }; diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c index cc63102ca96e..8940417c30e5 100644 --- a/drivers/net/slip/slip.c +++ b/drivers/net/slip/slip.c @@ -731,7 +731,7 @@ static void sl_sync(void) /* Find a free SLIP channel, and link in this `tty' line. */ -static struct slip *sl_alloc(dev_t line) +static struct slip *sl_alloc(void) { int i; char name[IFNAMSIZ]; @@ -809,7 +809,7 @@ static int slip_open(struct tty_struct *tty) /* OK. Find a free SLIP channel to use. */ err = -ENFILE; - sl = sl_alloc(tty_devnum(tty)); + sl = sl_alloc(); if (sl == NULL) goto err_exit; diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 4f4a842a1c9c..e367d6310353 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -195,6 +195,11 @@ struct tun_flow_entry { #define TUN_NUM_FLOW_ENTRIES 1024 +struct tun_steering_prog { + struct rcu_head rcu; + struct bpf_prog *prog; +}; + /* Since the socket were moved to tun_file, to preserve the behavior of persist * device, socket filter, sndbuf and vnet header size were restore when the * file were attached to a persist device. @@ -232,6 +237,7 @@ struct tun_struct { u32 rx_batched; struct tun_pcpu_stats __percpu *pcpu_stats; struct bpf_prog __rcu *xdp_prog; + struct tun_steering_prog __rcu *steering_prog; }; static int tun_napi_receive(struct napi_struct *napi, int budget) @@ -537,15 +543,12 @@ static inline void tun_flow_save_rps_rxhash(struct tun_flow_entry *e, u32 hash) * different rxq no. here. If we could not get rxhash, then we would * hope the rxq no. may help here. */ -static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, - void *accel_priv, select_queue_fallback_t fallback) +static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) { - struct tun_struct *tun = netdev_priv(dev); struct tun_flow_entry *e; u32 txq = 0; u32 numqueues = 0; - rcu_read_lock(); numqueues = READ_ONCE(tun->numqueues); txq = __skb_get_hash_symmetric(skb); @@ -563,10 +566,37 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, txq -= numqueues; } - rcu_read_unlock(); return txq; } +static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) +{ + struct tun_steering_prog *prog; + u16 ret = 0; + + prog = rcu_dereference(tun->steering_prog); + if (prog) + ret = bpf_prog_run_clear_cb(prog->prog, skb); + + return ret % tun->numqueues; +} + +static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, + void *accel_priv, select_queue_fallback_t fallback) +{ + struct tun_struct *tun = netdev_priv(dev); + u16 ret; + + rcu_read_lock(); + if (rcu_dereference(tun->steering_prog)) + ret = tun_ebpf_select_queue(tun, skb); + else + ret = tun_automq_select_queue(tun, skb); + rcu_read_unlock(); + + return ret; +} + static inline bool tun_not_capable(struct tun_struct *tun) { const struct cred *cred = current_cred(); @@ -673,7 +703,6 @@ static void tun_detach(struct tun_file *tfile, bool clean) static void tun_detach_all(struct net_device *dev) { struct tun_struct *tun = netdev_priv(dev); - struct bpf_prog *xdp_prog = rtnl_dereference(tun->xdp_prog); struct tun_file *tfile, *tmp; int i, n = tun->numqueues; @@ -708,9 +737,6 @@ static void tun_detach_all(struct net_device *dev) } BUG_ON(tun->numdisabled != 0); - if (xdp_prog) - bpf_prog_put(xdp_prog); - if (tun->flags & IFF_PERSIST) module_put(THIS_MODULE); } @@ -937,23 +963,10 @@ static int tun_net_close(struct net_device *dev) } /* Net device start xmit */ -static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) +static void tun_automq_xmit(struct tun_struct *tun, struct sk_buff *skb) { - struct tun_struct *tun = netdev_priv(dev); - int txq = skb->queue_mapping; - struct tun_file *tfile; - u32 numqueues = 0; - - rcu_read_lock(); - tfile = rcu_dereference(tun->tfiles[txq]); - numqueues = READ_ONCE(tun->numqueues); - - /* Drop packet if interface is not attached */ - if (txq >= numqueues) - goto drop; - #ifdef CONFIG_RPS - if (numqueues == 1 && static_key_false(&rps_needed)) { + if (tun->numqueues == 1 && static_key_false(&rps_needed)) { /* Select queue was not called for the skbuff, so we extract the * RPS hash and save it into the flow_table here. */ @@ -969,6 +982,24 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) } } #endif +} + +/* Net device start xmit */ +static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) +{ + struct tun_struct *tun = netdev_priv(dev); + int txq = skb->queue_mapping; + struct tun_file *tfile; + + rcu_read_lock(); + tfile = rcu_dereference(tun->tfiles[txq]); + + /* Drop packet if interface is not attached */ + if (txq >= tun->numqueues) + goto drop; + + if (!rcu_dereference(tun->steering_prog)) + tun_automq_xmit(tun, skb); tun_debug(KERN_INFO, tun, "tun_net_xmit %d\n", skb->len); @@ -1551,7 +1582,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, int copylen; bool zerocopy = false; int err; - u32 rxhash; + u32 rxhash = 0; int skb_xdp = 1; bool frags = tun_napi_frags_enabled(tun); @@ -1739,7 +1770,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, rcu_read_unlock(); } - rxhash = __skb_get_hash_symmetric(skb); + rcu_read_lock(); + if (!rcu_dereference(tun->steering_prog)) + rxhash = __skb_get_hash_symmetric(skb); + rcu_read_unlock(); if (frags) { /* Exercise flow dissector code path. */ @@ -1783,7 +1817,9 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, u64_stats_update_end(&stats->syncp); put_cpu_ptr(stats); - tun_flow_update(tun, rxhash, tfile); + if (rxhash) + tun_flow_update(tun, rxhash, tfile); + return total_len; } @@ -1991,6 +2027,39 @@ static ssize_t tun_chr_read_iter(struct kiocb *iocb, struct iov_iter *to) return ret; } +static void tun_steering_prog_free(struct rcu_head *rcu) +{ + struct tun_steering_prog *prog = container_of(rcu, + struct tun_steering_prog, rcu); + + bpf_prog_destroy(prog->prog); + kfree(prog); +} + +static int __tun_set_steering_ebpf(struct tun_struct *tun, + struct bpf_prog *prog) +{ + struct tun_steering_prog *old, *new = NULL; + + if (prog) { + new = kmalloc(sizeof(*new), GFP_KERNEL); + if (!new) + return -ENOMEM; + new->prog = prog; + } + + spin_lock_bh(&tun->lock); + old = rcu_dereference_protected(tun->steering_prog, + lockdep_is_held(&tun->lock)); + rcu_assign_pointer(tun->steering_prog, new); + spin_unlock_bh(&tun->lock); + + if (old) + call_rcu(&old->rcu, tun_steering_prog_free); + + return 0; +} + static void tun_free_netdev(struct net_device *dev) { struct tun_struct *tun = netdev_priv(dev); @@ -1999,6 +2068,7 @@ static void tun_free_netdev(struct net_device *dev) free_percpu(tun->pcpu_stats); tun_flow_uninit(tun); security_tun_dev_free_security(tun->security); + __tun_set_steering_ebpf(tun, NULL); } static void tun_setup(struct net_device *dev) @@ -2287,6 +2357,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr) tun->filter_attached = false; tun->sndbuf = tfile->socket.sk->sk_sndbuf; tun->rx_batched = 0; + RCU_INIT_POINTER(tun->steering_prog, NULL); tun->pcpu_stats = netdev_alloc_pcpu_stats(struct tun_pcpu_stats); if (!tun->pcpu_stats) { @@ -2479,6 +2550,25 @@ unlock: return ret; } +static int tun_set_steering_ebpf(struct tun_struct *tun, void __user *data) +{ + struct bpf_prog *prog; + int fd; + + if (copy_from_user(&fd, data, sizeof(fd))) + return -EFAULT; + + if (fd == -1) { + prog = NULL; + } else { + prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER); + if (IS_ERR(prog)) + return PTR_ERR(prog); + } + + return __tun_set_steering_ebpf(tun, prog); +} + static long __tun_chr_ioctl(struct file *file, unsigned int cmd, unsigned long arg, int ifreq_len) { @@ -2755,6 +2845,10 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd, ret = 0; break; + case TUNSETSTEERINGEBPF: + ret = tun_set_steering_ebpf(tun, argp); + break; + default: ret = -EINVAL; break; diff --git a/drivers/net/veth.c b/drivers/net/veth.c index f5438d0978ca..a69ad39ee57e 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -410,6 +410,9 @@ static int veth_newlink(struct net *src_net, struct net_device *dev, if (ifmp && (dev->ifindex != 0)) peer->ifindex = ifmp->ifi_index; + peer->gso_max_size = dev->gso_max_size; + peer->gso_max_segs = dev->gso_max_segs; + err = register_netdevice(peer); put_net(net); net = NULL; diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 559b215c0169..6fb7b658a6cc 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -261,9 +261,12 @@ static void virtqueue_napi_complete(struct napi_struct *napi, int opaque; opaque = virtqueue_enable_cb_prepare(vq); - if (napi_complete_done(napi, processed) && - unlikely(virtqueue_poll(vq, opaque))) - virtqueue_napi_schedule(napi, vq); + if (napi_complete_done(napi, processed)) { + if (unlikely(virtqueue_poll(vq, opaque))) + virtqueue_napi_schedule(napi, vq); + } else { + virtqueue_disable_cb(vq); + } } static void skb_xmit_done(struct virtqueue *vq) diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h index 9c51b8be0038..5ba222920e80 100644 --- a/drivers/net/vmxnet3/vmxnet3_int.h +++ b/drivers/net/vmxnet3/vmxnet3_int.h @@ -69,10 +69,10 @@ /* * Version numbers */ -#define VMXNET3_DRIVER_VERSION_STRING "1.4.a.0-k" +#define VMXNET3_DRIVER_VERSION_STRING "1.4.11.0-k" /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */ -#define VMXNET3_DRIVER_VERSION_NUM 0x01040a00 +#define VMXNET3_DRIVER_VERSION_NUM 0x01040b00 #if defined(CONFIG_PCI_MSI) /* RSS only makes sense if MSI-X is supported. */ @@ -416,8 +416,8 @@ struct vmxnet3_adapter { /* must be a multiple of VMXNET3_RING_SIZE_ALIGN */ #define VMXNET3_DEF_TX_RING_SIZE 512 -#define VMXNET3_DEF_RX_RING_SIZE 256 -#define VMXNET3_DEF_RX_RING2_SIZE 128 +#define VMXNET3_DEF_RX_RING_SIZE 1024 +#define VMXNET3_DEF_RX_RING2_SIZE 256 #define VMXNET3_DEF_RXDATA_DESC_SIZE 128 diff --git a/include/linux/dsa/lan9303.h b/include/linux/dsa/lan9303.h index f48a85c377de..b6514c29563f 100644 --- a/include/linux/dsa/lan9303.h +++ b/include/linux/dsa/lan9303.h @@ -26,6 +26,7 @@ struct lan9303 { bool phy_addr_sel_strap; struct dsa_switch *ds; struct mutex indirect_mutex; /* protect indexed register access */ + struct mutex alr_mutex; /* protect ALR access */ const struct lan9303_phy_ops *ops; bool is_bridged; /* true if port 1 and 2 are bridged */ diff --git a/include/linux/filter.h b/include/linux/filter.h index 80b5b482cb46..0062302e1285 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -985,6 +985,7 @@ struct bpf_sock_ops_kern { u32 reply; u32 replylong[4]; }; + u32 is_fullsock; }; #endif /* __LINUX_FILTER_H__ */ diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 6c9336626592..93bd6fcd6e62 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -127,28 +127,6 @@ struct hv_ring_buffer_info { u32 priv_read_index; }; -/* - * - * hv_get_ringbuffer_availbytes() - * - * Get number of bytes available to read and to write to - * for the specified ring buffer - */ -static inline void -hv_get_ringbuffer_availbytes(const struct hv_ring_buffer_info *rbi, - u32 *read, u32 *write) -{ - u32 read_loc, write_loc, dsize; - - /* Capture the read/write indices before they changed */ - read_loc = rbi->ring_buffer->read_index; - write_loc = rbi->ring_buffer->write_index; - dsize = rbi->ring_datasize; - - *write = write_loc >= read_loc ? dsize - (write_loc - read_loc) : - read_loc - write_loc; - *read = dsize - *write; -} static inline u32 hv_get_bytes_to_read(const struct hv_ring_buffer_info *rbi) { diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h index bedf54b6f943..4cb7aeeafce0 100644 --- a/include/linux/if_macvlan.h +++ b/include/linux/if_macvlan.h @@ -30,10 +30,10 @@ struct macvlan_dev { enum macvlan_mode mode; u16 flags; int nest_level; + unsigned int macaddr_count; #ifdef CONFIG_NET_POLL_CONTROLLER struct netpoll *netpoll; #endif - unsigned int macaddr_count; }; static inline void macvlan_count_rx(const struct macvlan_dev *vlan, diff --git a/include/linux/mdio.h b/include/linux/mdio.h index ca08ab16ecdc..92d4e55ffe67 100644 --- a/include/linux/mdio.h +++ b/include/linux/mdio.h @@ -12,6 +12,7 @@ #include <uapi/linux/mdio.h> #include <linux/mod_devicetable.h> +struct gpio_desc; struct mii_bus; /* Multiple levels of nesting are possible. However typically this is @@ -39,6 +40,7 @@ struct mdio_device { /* Bus address of the MDIO device (0-31) */ int addr; int flags; + struct gpio_desc *reset; }; #define to_mdio_device(d) container_of(d, struct mdio_device, dev) @@ -71,6 +73,7 @@ void mdio_device_free(struct mdio_device *mdiodev); struct mdio_device *mdio_device_create(struct mii_bus *bus, int addr); int mdio_device_register(struct mdio_device *mdiodev); void mdio_device_remove(struct mdio_device *mdiodev); +void mdio_device_reset(struct mdio_device *mdiodev, int value); int mdio_driver_register(struct mdio_driver *drv); void mdio_driver_unregister(struct mdio_driver *drv); int mdio_device_bus_match(struct device *dev, struct device_driver *drv); diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index ef789e1d679e..cc4ce7456e38 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -820,6 +820,8 @@ struct netdev_bpf { struct { u8 prog_attached; u32 prog_id; + /* flags with which program was installed */ + u32 prog_flags; }; /* BPF_OFFLOAD_VERIFIER_PREP */ struct { @@ -3330,7 +3332,8 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf); int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack, int fd, u32 flags); -u8 __dev_xdp_attached(struct net_device *dev, bpf_op_t xdp_op, u32 *prog_id); +void __dev_xdp_query(struct net_device *dev, bpf_op_t xdp_op, + struct netdev_bpf *xdp); int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb); int dev_forward_skb(struct net_device *dev, struct sk_buff *skb); diff --git a/include/linux/phy.h b/include/linux/phy.h index dc82a07cb4fd..d3037e2ffbc4 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -468,7 +468,6 @@ struct phy_device { /* Interrupt and Polling infrastructure */ struct work_struct phy_queue; struct delayed_work state_queue; - atomic_t irq_disable; struct mutex lock; @@ -497,19 +496,19 @@ struct phy_device { * flags: A bitfield defining certain other features this PHY * supports (like interrupts) * - * The drivers must implement config_aneg and read_status. All - * other functions are optional. Note that none of these - * functions should be called from interrupt time. The goal is - * for the bus read/write functions to be able to block when the - * bus transaction is happening, and be freed up by an interrupt - * (The MPC85xx has this ability, though it is not currently - * supported in the driver). + * All functions are optional. If config_aneg or read_status + * are not implemented, the phy core uses the genphy versions. + * Note that none of these functions should be called from + * interrupt time. The goal is for the bus read/write functions + * to be able to block when the bus transaction is happening, + * and be freed up by an interrupt (The MPC85xx has this ability, + * though it is not currently supported in the driver). */ struct phy_driver { struct mdio_driver_common mdiodrv; u32 phy_id; char *name; - unsigned int phy_id_mask; + u32 phy_id_mask; u32 features; u32 flags; const void *driver_data; @@ -763,6 +762,20 @@ static inline bool phy_interface_mode_is_rgmii(phy_interface_t mode) }; /** + * phy_interface_mode_is_8023z() - does the phy interface mode use 802.3z + * negotiation + * @mode: one of &enum phy_interface_t + * + * Returns true if the phy interface mode uses the 16-bit negotiation + * word as defined in 802.3z. (See 802.3-2015 37.2.1 Config_Reg encoding) + */ +static inline bool phy_interface_mode_is_8023z(phy_interface_t mode) +{ + return mode == PHY_INTERFACE_MODE_1000BASEX || + mode == PHY_INTERFACE_MODE_2500BASEX; +} + +/** * phy_interface_is_rgmii - Convenience function for testing if a PHY interface * is RGMII (all variants) * @phydev: the phy_device struct @@ -841,12 +854,9 @@ int phy_aneg_done(struct phy_device *phydev); int phy_stop_interrupts(struct phy_device *phydev); int phy_restart_aneg(struct phy_device *phydev); -static inline int phy_read_status(struct phy_device *phydev) +static inline void phy_device_reset(struct phy_device *phydev, int value) { - if (!phydev->drv) - return -EIO; - - return phydev->drv->read_status(phydev); + mdio_device_reset(&phydev->mdio, value); } #define phydev_err(_phydev, format, args...) \ @@ -890,6 +900,17 @@ int genphy_c45_read_pma(struct phy_device *phydev); int genphy_c45_pma_setup_forced(struct phy_device *phydev); int genphy_c45_an_disable_aneg(struct phy_device *phydev); +static inline int phy_read_status(struct phy_device *phydev) +{ + if (!phydev->drv) + return -EIO; + + if (phydev->drv->read_status) + return phydev->drv->read_status(phydev); + else + return genphy_read_status(phydev); +} + void phy_driver_unregister(struct phy_driver *drv); void phy_drivers_unregister(struct phy_driver *drv, int n); int phy_driver_register(struct phy_driver *new_driver, struct module *owner); diff --git a/include/linux/phylink.h b/include/linux/phylink.h index af67edd4ae38..4f0f452ff38d 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -7,6 +7,7 @@ struct device_node; struct ethtool_cmd; +struct fwnode_handle; struct net_device; enum { @@ -20,19 +21,31 @@ enum { MLO_AN_PHY = 0, /* Conventional PHY */ MLO_AN_FIXED, /* Fixed-link mode */ - MLO_AN_SGMII, /* Cisco SGMII protocol */ - MLO_AN_8023Z, /* 1000base-X protocol */ + MLO_AN_INBAND, /* In-band protocol */ }; static inline bool phylink_autoneg_inband(unsigned int mode) { - return mode == MLO_AN_SGMII || mode == MLO_AN_8023Z; + return mode == MLO_AN_INBAND; } +/** + * struct phylink_link_state - link state structure + * @advertising: ethtool bitmask containing advertised link modes + * @lp_advertising: ethtool bitmask containing link partner advertised link + * modes + * @interface: link &typedef phy_interface_t mode + * @speed: link speed, one of the SPEED_* constants. + * @duplex: link duplex mode, one of DUPLEX_* constants. + * @pause: link pause state, described by MLO_PAUSE_* constants. + * @link: true if the link is up. + * @an_enabled: true if autonegotiation is enabled/desired. + * @an_complete: true if autonegotiation has completed. + */ struct phylink_link_state { __ETHTOOL_DECLARE_LINK_MODE_MASK(advertising); __ETHTOOL_DECLARE_LINK_MODE_MASK(lp_advertising); - phy_interface_t interface; /* PHY_INTERFACE_xxx */ + phy_interface_t interface; int speed; int duplex; int pause; @@ -41,66 +54,136 @@ struct phylink_link_state { unsigned int an_complete:1; }; +/** + * struct phylink_mac_ops - MAC operations structure. + * @validate: Validate and update the link configuration. + * @mac_link_state: Read the current link state from the hardware. + * @mac_config: configure the MAC for the selected mode and state. + * @mac_an_restart: restart 802.3z BaseX autonegotiation. + * @mac_link_down: take the link down. + * @mac_link_up: allow the link to come up. + * + * The individual methods are described more fully below. + */ struct phylink_mac_ops { - /** - * validate: validate and update the link configuration - * @ndev: net_device structure associated with MAC - * @config: configuration to validate - * - * Update the %config->supported and %config->advertised masks - * clearing bits that can not be supported. - * - * Note: the PHY may be able to transform from one connection - * technology to another, so, eg, don't clear 1000BaseX just - * because the MAC is unable to support it. This is more about - * clearing unsupported speeds and duplex settings. - * - * If the %config->interface mode is %PHY_INTERFACE_MODE_1000BASEX - * or %PHY_INTERFACE_MODE_2500BASEX, select the appropriate mode - * based on %config->advertised and/or %config->speed. - */ void (*validate)(struct net_device *ndev, unsigned long *supported, struct phylink_link_state *state); - - /* Read the current link state from the hardware */ - int (*mac_link_state)(struct net_device *, struct phylink_link_state *); - - /* Configure the MAC */ - /** - * mac_config: configure the MAC for the selected mode and state - * @ndev: net_device structure for the MAC - * @mode: one of MLO_AN_FIXED, MLO_AN_PHY, MLO_AN_8023Z, MLO_AN_SGMII - * @state: state structure - * - * The action performed depends on the currently selected mode: - * - * %MLO_AN_FIXED, %MLO_AN_PHY: - * set the specified speed, duplex, pause mode, and phy interface - * mode in the provided @state. - * %MLO_AN_8023Z: - * place the link in 1000base-X mode, advertising the parameters - * given in advertising in @state. - * %MLO_AN_SGMII: - * place the link in Cisco SGMII mode - there is no advertisment - * to make as the PHY communicates the speed and duplex to the - * MAC over the in-band control word. Configuration of the pause - * mode is as per MLO_AN_PHY since this is not included. - */ + int (*mac_link_state)(struct net_device *ndev, + struct phylink_link_state *state); void (*mac_config)(struct net_device *ndev, unsigned int mode, const struct phylink_link_state *state); - - /** - * mac_an_restart: restart 802.3z BaseX autonegotiation - * @ndev: net_device structure for the MAC - */ void (*mac_an_restart)(struct net_device *ndev); - - void (*mac_link_down)(struct net_device *, unsigned int mode); - void (*mac_link_up)(struct net_device *, unsigned int mode, - struct phy_device *); + void (*mac_link_down)(struct net_device *ndev, unsigned int mode); + void (*mac_link_up)(struct net_device *ndev, unsigned int mode, + struct phy_device *phy); }; -struct phylink *phylink_create(struct net_device *, struct device_node *, +#if 0 /* For kernel-doc purposes only. */ +/** + * validate - Validate and update the link configuration + * @ndev: a pointer to a &struct net_device for the MAC. + * @supported: ethtool bitmask for supported link modes. + * @state: a pointer to a &struct phylink_link_state. + * + * Clear bits in the @supported and @state->advertising masks that + * are not supportable by the MAC. + * + * Note that the PHY may be able to transform from one connection + * technology to another, so, eg, don't clear 1000BaseX just + * because the MAC is unable to BaseX mode. This is more about + * clearing unsupported speeds and duplex settings. + * + * If the @state->interface mode is %PHY_INTERFACE_MODE_1000BASEX + * or %PHY_INTERFACE_MODE_2500BASEX, select the appropriate mode + * based on @state->advertising and/or @state->speed and update + * @state->interface accordingly. + */ +void validate(struct net_device *ndev, unsigned long *supported, + struct phylink_link_state *state); + +/** + * mac_link_state() - Read the current link state from the hardware + * @ndev: a pointer to a &struct net_device for the MAC. + * @state: a pointer to a &struct phylink_link_state. + * + * Read the current link state from the MAC, reporting the current + * speed in @state->speed, duplex mode in @state->duplex, pause mode + * in @state->pause using the %MLO_PAUSE_RX and %MLO_PAUSE_TX bits, + * negotiation completion state in @state->an_complete, and link + * up state in @state->link. + */ +int mac_link_state(struct net_device *ndev, + struct phylink_link_state *state); + +/** + * mac_config() - configure the MAC for the selected mode and state + * @ndev: a pointer to a &struct net_device for the MAC. + * @mode: one of %MLO_AN_FIXED, %MLO_AN_PHY, %MLO_AN_INBAND. + * @state: a pointer to a &struct phylink_link_state. + * + * The action performed depends on the currently selected mode: + * + * %MLO_AN_FIXED, %MLO_AN_PHY: + * Configure the specified @state->speed, @state->duplex and + * @state->pause (%MLO_PAUSE_TX / %MLO_PAUSE_RX) mode. + * + * %MLO_AN_INBAND: + * place the link in an inband negotiation mode (such as 802.3z + * 1000base-X or Cisco SGMII mode depending on the @state->interface + * mode). In both cases, link state management (whether the link + * is up or not) is performed by the MAC, and reported via the + * mac_link_state() callback. Changes in link state must be made + * by calling phylink_mac_change(). + * + * If in 802.3z mode, the link speed is fixed, dependent on the + * @state->interface. Duplex is negotiated, and pause is advertised + * according to @state->an_enabled, @state->pause and + * @state->advertising flags. Beware of MACs which only support full + * duplex at gigabit and higher speeds. + * + * If in Cisco SGMII mode, the link speed and duplex mode are passed + * in the serial bitstream 16-bit configuration word, and the MAC + * should be configured to read these bits and acknowledge the + * configuration word. Nothing is advertised by the MAC. The MAC is + * responsible for reading the configuration word and configuring + * itself accordingly. + */ +void mac_config(struct net_device *ndev, unsigned int mode, + const struct phylink_link_state *state); + +/** + * mac_an_restart() - restart 802.3z BaseX autonegotiation + * @ndev: a pointer to a &struct net_device for the MAC. + */ +void mac_an_restart(struct net_device *ndev); + +/** + * mac_link_down() - take the link down + * @ndev: a pointer to a &struct net_device for the MAC. + * @mode: link autonegotiation mode + * + * If @mode is not an in-band negotiation mode (as defined by + * phylink_autoneg_inband()), force the link down and disable any + * Energy Efficient Ethernet MAC configuration. + */ +void mac_link_down(struct net_device *ndev, unsigned int mode); + +/** + * mac_link_up() - allow the link to come up + * @ndev: a pointer to a &struct net_device for the MAC. + * @mode: link autonegotiation mode + * @phy: any attached phy + * + * If @mode is not an in-band negotiation mode (as defined by + * phylink_autoneg_inband()), allow the link to come up. If @phy + * is non-%NULL, configure Energy Efficient Ethernet by calling + * phy_init_eee() and perform appropriate MAC configuration for EEE. + */ +void mac_link_up(struct net_device *ndev, unsigned int mode, + struct phy_device *phy); +#endif + +struct phylink *phylink_create(struct net_device *, struct fwnode_handle *, phy_interface_t iface, const struct phylink_mac_ops *ops); void phylink_destroy(struct phylink *); @@ -128,7 +211,6 @@ int phylink_ethtool_set_pauseparam(struct phylink *, int phylink_ethtool_get_module_info(struct phylink *, struct ethtool_modinfo *); int phylink_ethtool_get_module_eeprom(struct phylink *, struct ethtool_eeprom *, u8 *); -int phylink_init_eee(struct phylink *, bool); int phylink_get_eee_err(struct phylink *); int phylink_ethtool_get_eee(struct phylink *, struct ethtool_eee *); int phylink_ethtool_set_eee(struct phylink *, struct ethtool_eee *); diff --git a/include/linux/sfp.h b/include/linux/sfp.h index 4a906f560817..47ea32d3e816 100644 --- a/include/linux/sfp.h +++ b/include/linux/sfp.h @@ -3,7 +3,7 @@ #include <linux/phy.h> -struct __packed sfp_eeprom_base { +struct sfp_eeprom_base { u8 phys_id; u8 phys_ext_id; u8 connector; @@ -166,12 +166,12 @@ struct __packed sfp_eeprom_base { union { __be16 optical_wavelength; u8 cable_spec; - }; + } __packed; u8 reserved62; u8 cc_base; -}; +} __packed; -struct __packed sfp_eeprom_ext { +struct sfp_eeprom_ext { __be16 options; u8 br_max; u8 br_min; @@ -181,12 +181,21 @@ struct __packed sfp_eeprom_ext { u8 enhopts; u8 sff8472_compliance; u8 cc_ext; -}; - -struct __packed sfp_eeprom_id { +} __packed; + +/** + * struct sfp_eeprom_id - raw SFP module identification information + * @base: base SFP module identification structure + * @ext: extended SFP module identification structure + * + * See the SFF-8472 specification and related documents for the definition + * of these structure members. This can be obtained from + * ftp://ftp.seagate.com/sff + */ +struct sfp_eeprom_id { struct sfp_eeprom_base base; struct sfp_eeprom_ext ext; -}; +} __packed; /* SFP EEPROM registers */ enum { @@ -347,19 +356,32 @@ enum { SFP_PAGE = 0x7f, }; -struct device_node; +struct fwnode_handle; struct ethtool_eeprom; struct ethtool_modinfo; struct net_device; struct sfp_bus; +/** + * struct sfp_upstream_ops - upstream operations structure + * @module_insert: called after a module has been detected to determine + * whether the module is supported for the upstream device. + * @module_remove: called after the module has been removed. + * @link_down: called when the link is non-operational for whatever + * reason. + * @link_up: called when the link is operational. + * @connect_phy: called when an I2C accessible PHY has been detected + * on the module. + * @disconnect_phy: called when a module with an I2C accessible PHY has + * been removed. + */ struct sfp_upstream_ops { - int (*module_insert)(void *, const struct sfp_eeprom_id *id); - void (*module_remove)(void *); - void (*link_down)(void *); - void (*link_up)(void *); - int (*connect_phy)(void *, struct phy_device *); - void (*disconnect_phy)(void *); + int (*module_insert)(void *priv, const struct sfp_eeprom_id *id); + void (*module_remove)(void *priv); + void (*link_down)(void *priv); + void (*link_up)(void *priv); + int (*connect_phy)(void *priv, struct phy_device *); + void (*disconnect_phy)(void *priv); }; #if IS_ENABLED(CONFIG_SFP) @@ -375,7 +397,7 @@ int sfp_get_module_eeprom(struct sfp_bus *bus, struct ethtool_eeprom *ee, u8 *data); void sfp_upstream_start(struct sfp_bus *bus); void sfp_upstream_stop(struct sfp_bus *bus); -struct sfp_bus *sfp_register_upstream(struct device_node *np, +struct sfp_bus *sfp_register_upstream(struct fwnode_handle *fwnode, struct net_device *ndev, void *upstream, const struct sfp_upstream_ops *ops); void sfp_unregister_upstream(struct sfp_bus *bus); @@ -419,7 +441,8 @@ static inline void sfp_upstream_stop(struct sfp_bus *bus) { } -static inline struct sfp_bus *sfp_register_upstream(struct device_node *np, +static inline struct sfp_bus *sfp_register_upstream( + struct fwnode_handle *fwnode, struct net_device *ndev, void *upstream, const struct sfp_upstream_ops *ops) { diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h index 8621ffdeecbf..c7addf37d119 100644 --- a/include/linux/skb_array.h +++ b/include/linux/skb_array.h @@ -72,6 +72,11 @@ static inline bool __skb_array_empty(struct skb_array *a) return !__ptr_ring_peek(&a->ring); } +static inline struct sk_buff *__skb_array_peek(struct skb_array *a) +{ + return __ptr_ring_peek(&a->ring); +} + static inline bool skb_array_empty(struct skb_array *a) { return ptr_ring_empty(&a->ring); diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a38c80e9f91e..b8e0da6c27d6 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1211,6 +1211,11 @@ static inline bool skb_flow_dissect_flow_keys_buf(struct flow_keys *flow, data, proto, nhoff, hlen, flags); } +void +skb_flow_dissect_tunnel_info(const struct sk_buff *skb, + struct flow_dissector *flow_dissector, + void *target_container); + static inline __u32 skb_get_hash(struct sk_buff *skb) { if (!skb->l4_hash && !skb->sw_hash) diff --git a/include/net/act_api.h b/include/net/act_api.h index fd08df74c466..02bf409140d0 100644 --- a/include/net/act_api.h +++ b/include/net/act_api.h @@ -86,7 +86,7 @@ struct tc_action_ops { int (*act)(struct sk_buff *, const struct tc_action *, struct tcf_result *); int (*dump)(struct sk_buff *, struct tc_action *, int, int); - void (*cleanup)(struct tc_action *, int bind); + void (*cleanup)(struct tc_action *); int (*lookup)(struct net *, struct tc_action **, u32); int (*init)(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **act, int ovr, diff --git a/include/net/addrconf.h b/include/net/addrconf.h index b623b65a79d1..c4185a7b0e90 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -180,7 +180,7 @@ static inline int addrconf_finite_timeout(unsigned long timeout) */ int ipv6_addr_label_init(void); void ipv6_addr_label_cleanup(void); -void ipv6_addr_label_rtnl_register(void); +int ipv6_addr_label_rtnl_register(void); u32 ipv6_addr_label(struct net *net, const struct in6_addr *addr, int type, int ifindex); diff --git a/include/net/dn_route.h b/include/net/dn_route.h index 55df9939bca2..342d2503cba5 100644 --- a/include/net/dn_route.h +++ b/include/net/dn_route.h @@ -69,6 +69,7 @@ int dn_route_rcv(struct sk_buff *skb, struct net_device *dev, */ struct dn_route { struct dst_entry dst; + struct dn_route __rcu *dn_next; struct neighbour *n; diff --git a/include/net/dsa.h b/include/net/dsa.h index 2a05738570d8..6cb602dd970c 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -296,31 +296,39 @@ static inline u32 dsa_user_ports(struct dsa_switch *ds) return mask; } -static inline u8 dsa_upstream_port(struct dsa_switch *ds) +/* Return the local port used to reach an arbitrary switch port */ +static inline unsigned int dsa_towards_port(struct dsa_switch *ds, int device, + int port) { - struct dsa_switch_tree *dst = ds->dst; - - /* - * If this is the root switch (i.e. the switch that connects - * to the CPU), return the cpu port number on this switch. - * Else return the (DSA) port number that connects to the - * switch that is one hop closer to the cpu. - */ - if (dst->cpu_dp->ds == ds) - return dst->cpu_dp->index; + if (device == ds->index) + return port; else - return ds->rtable[dst->cpu_dp->ds->index]; + return ds->rtable[device]; +} + +/* Return the local port used to reach the dedicated CPU port */ +static inline unsigned int dsa_upstream_port(struct dsa_switch *ds, int port) +{ + const struct dsa_port *dp = dsa_to_port(ds, port); + const struct dsa_port *cpu_dp = dp->cpu_dp; + + if (!cpu_dp) + return port; + + return dsa_towards_port(ds, cpu_dp->ds->index, cpu_dp->index); } typedef int dsa_fdb_dump_cb_t(const unsigned char *addr, u16 vid, bool is_static, void *data); struct dsa_switch_ops { +#if IS_ENABLED(CONFIG_NET_DSA_LEGACY) /* * Legacy probing. */ const char *(*probe)(struct device *dsa_dev, struct device *host_dev, int sw_addr, void **priv); +#endif enum dsa_tag_protocol (*get_tag_protocol)(struct dsa_switch *ds, int port); @@ -412,12 +420,10 @@ struct dsa_switch_ops { */ int (*port_vlan_filtering)(struct dsa_switch *ds, int port, bool vlan_filtering); - int (*port_vlan_prepare)(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_vlan *vlan, - struct switchdev_trans *trans); - void (*port_vlan_add)(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_vlan *vlan, - struct switchdev_trans *trans); + int (*port_vlan_prepare)(struct dsa_switch *ds, int port, + const struct switchdev_obj_port_vlan *vlan); + void (*port_vlan_add)(struct dsa_switch *ds, int port, + const struct switchdev_obj_port_vlan *vlan); int (*port_vlan_del)(struct dsa_switch *ds, int port, const struct switchdev_obj_port_vlan *vlan); /* @@ -433,12 +439,10 @@ struct dsa_switch_ops { /* * Multicast database */ - int (*port_mdb_prepare)(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_mdb *mdb, - struct switchdev_trans *trans); - void (*port_mdb_add)(struct dsa_switch *ds, int port, - const struct switchdev_obj_port_mdb *mdb, - struct switchdev_trans *trans); + int (*port_mdb_prepare)(struct dsa_switch *ds, int port, + const struct switchdev_obj_port_mdb *mdb); + void (*port_mdb_add)(struct dsa_switch *ds, int port, + const struct switchdev_obj_port_mdb *mdb); int (*port_mdb_del)(struct dsa_switch *ds, int port, const struct switchdev_obj_port_mdb *mdb); /* @@ -472,11 +476,20 @@ struct dsa_switch_driver { const struct dsa_switch_ops *ops; }; +#if IS_ENABLED(CONFIG_NET_DSA_LEGACY) /* Legacy driver registration */ void register_switch_driver(struct dsa_switch_driver *type); void unregister_switch_driver(struct dsa_switch_driver *type); struct mii_bus *dsa_host_dev_to_mii_bus(struct device *dev); +#else +static inline void register_switch_driver(struct dsa_switch_driver *type) { } +static inline void unregister_switch_driver(struct dsa_switch_driver *type) { } +static inline struct mii_bus *dsa_host_dev_to_mii_bus(struct device *dev) +{ + return NULL; +} +#endif struct net_device *dsa_dev_to_net_device(struct device *dev); /* Keep inline for faster access in hot path */ diff --git a/include/net/dst.h b/include/net/dst.h index b091fd536098..33d2a5433924 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -34,13 +34,9 @@ struct sk_buff; struct dst_entry { struct net_device *dev; - struct rcu_head rcu_head; - struct dst_entry *child; struct dst_ops *ops; unsigned long _metrics; unsigned long expires; - struct dst_entry *path; - struct dst_entry *from; #ifdef CONFIG_XFRM struct xfrm_state *xfrm; #else @@ -59,8 +55,6 @@ struct dst_entry { #define DST_XFRM_QUEUE 0x0040 #define DST_METADATA 0x0080 - short error; - /* A non-zero value of dst->obsolete forces by-hand validation * of the route entry. Positive values are set by the generic * dst layer to indicate that the entry has been forcefully @@ -76,35 +70,24 @@ struct dst_entry { #define DST_OBSOLETE_KILL -2 unsigned short header_len; /* more space at head required */ unsigned short trailer_len; /* space to reserve at tail */ - unsigned short __pad3; -#ifdef CONFIG_IP_ROUTE_CLASSID - __u32 tclassid; -#else - __u32 __pad2; -#endif - -#ifdef CONFIG_64BIT - /* - * Align __refcnt to a 64 bytes alignment - * (L1_CACHE_SIZE would be too much) - */ - long __pad_to_align_refcnt[2]; -#endif /* * __refcnt wants to be on a different cache line from * input/output/ops or performance tanks badly */ - atomic_t __refcnt; /* client references */ +#ifdef CONFIG_64BIT + atomic_t __refcnt; /* 64-bit offset 64 */ +#endif int __use; unsigned long lastuse; struct lwtunnel_state *lwtstate; - union { - struct dst_entry *next; - struct rtable __rcu *rt_next; - struct rt6_info __rcu *rt6_next; - struct dn_route __rcu *dn_next; - }; + struct rcu_head rcu_head; + short error; + short __pad; + __u32 tclassid; +#ifndef CONFIG_64BIT + atomic_t __refcnt; /* 32-bit offset 64 */ +#endif }; struct dst_metrics { @@ -250,7 +233,7 @@ static inline void dst_hold(struct dst_entry *dst) { /* * If your kernel compilation stops here, please check - * __pad_to_align_refcnt declaration in struct dst_entry + * the placement of __refcnt in struct dst_entry */ BUILD_BUG_ON(offsetof(struct dst_entry, __refcnt) & 63); WARN_ON(atomic_inc_not_zero(&dst->__refcnt) == 0); diff --git a/include/net/erspan.h b/include/net/erspan.h index ca94fc86865e..6e758d08c9ee 100644 --- a/include/net/erspan.h +++ b/include/net/erspan.h @@ -58,4 +58,55 @@ struct erspanhdr { struct erspan_metadata md; }; +static inline u8 tos_to_cos(u8 tos) +{ + u8 dscp, cos; + + dscp = tos >> 2; + cos = dscp >> 3; + return cos; +} + +static inline void erspan_build_header(struct sk_buff *skb, + __be32 id, u32 index, + bool truncate, bool is_ipv4) +{ + struct ethhdr *eth = eth_hdr(skb); + enum erspan_encap_type enc_type; + struct erspanhdr *ershdr; + struct qtag_prefix { + __be16 eth_type; + __be16 tci; + } *qp; + u16 vlan_tci = 0; + u8 tos; + + tos = is_ipv4 ? ip_hdr(skb)->tos : + (ipv6_hdr(skb)->priority << 4) + + (ipv6_hdr(skb)->flow_lbl[0] >> 4); + + enc_type = ERSPAN_ENCAP_NOVLAN; + + /* If mirrored packet has vlan tag, extract tci and + * perserve vlan header in the mirrored frame. + */ + if (eth->h_proto == htons(ETH_P_8021Q)) { + qp = (struct qtag_prefix *)(skb->data + 2 * ETH_ALEN); + vlan_tci = ntohs(qp->tci); + enc_type = ERSPAN_ENCAP_INFRAME; + } + + skb_push(skb, sizeof(*ershdr)); + ershdr = (struct erspanhdr *)skb->data; + memset(ershdr, 0, sizeof(*ershdr)); + + ershdr->ver_vlan = htons((vlan_tci & VLAN_MASK) | + (ERSPAN_VERSION << VER_OFFSET)); + ershdr->session_id = htons((u16)(ntohl(id) & ID_MASK) | + ((tos_to_cos(tos) << COS_OFFSET) & COS_MASK) | + (enc_type << EN_OFFSET & EN_MASK) | + ((truncate << T_OFFSET) & T_MASK)); + ershdr->md.index = htonl(index & INDEX_MASK); +} + #endif diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h index 304f7aa9cc01..0304ba2ae353 100644 --- a/include/net/gen_stats.h +++ b/include/net/gen_stats.h @@ -49,6 +49,9 @@ int gnet_stats_copy_rate_est(struct gnet_dump *d, int gnet_stats_copy_queue(struct gnet_dump *d, struct gnet_stats_queue __percpu *cpu_q, struct gnet_stats_queue *q, __u32 qlen); +void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats, + const struct gnet_stats_queue __percpu *cpu_q, + const struct gnet_stats_queue *q, __u32 qlen); int gnet_stats_copy_app(struct gnet_dump *d, void *st, int len); int gnet_stats_finish_copy(struct gnet_dump *d); diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 0358745ea059..8e1bf9ae4a5e 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -77,6 +77,7 @@ struct inet_connection_sock_af_ops { * @icsk_af_ops Operations which are AF_INET{4,6} specific * @icsk_ulp_ops Pluggable ULP control hook * @icsk_ulp_data ULP private data + * @icsk_listen_portaddr_node hash to the portaddr listener hashtable * @icsk_ca_state: Congestion control state * @icsk_retransmits: Number of unrecovered [RTO] timeouts * @icsk_pending: Scheduled timer event @@ -101,6 +102,7 @@ struct inet_connection_sock { const struct inet_connection_sock_af_ops *icsk_af_ops; const struct tcp_ulp_ops *icsk_ulp_ops; void *icsk_ulp_data; + struct hlist_node icsk_listen_portaddr_node; unsigned int (*icsk_sync_mss)(struct sock *sk, u32 pmtu); __u8 icsk_ca_state:6, icsk_ca_setsockopt:1, diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 2dbbbff5e1e3..9141e95529e7 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -111,6 +111,7 @@ struct inet_bind_hashbucket { */ struct inet_listen_hashbucket { spinlock_t lock; + unsigned int count; struct hlist_head head; }; @@ -132,12 +133,13 @@ struct inet_hashinfo { /* Ok, let's try this, I give up, we do need a local binding * TCP hash as well as the others for fast bind/connect. */ + struct kmem_cache *bind_bucket_cachep; struct inet_bind_hashbucket *bhash; - unsigned int bhash_size; - /* 4 bytes hole on 64 bit */ - struct kmem_cache *bind_bucket_cachep; + /* The 2nd listener table hashed by local port and address */ + unsigned int lhash2_mask; + struct inet_listen_hashbucket *lhash2; /* All the above members are written once at bootup and * never written again _or_ are predominantly read-access. @@ -145,14 +147,25 @@ struct inet_hashinfo { * Now align to a new cache line as all the following members * might be often dirty. */ - /* All sockets in TCP_LISTEN state will be in here. This is the only - * table where wildcard'd TCP sockets can exist. Hash function here - * is just local port number. + /* All sockets in TCP_LISTEN state will be in listening_hash. + * This is the only table where wildcard'd TCP sockets can + * exist. listening_hash is only hashed by local port number. + * If lhash2 is initialized, the same socket will also be hashed + * to lhash2 by port and address. */ struct inet_listen_hashbucket listening_hash[INET_LHTABLE_SIZE] ____cacheline_aligned_in_smp; }; +#define inet_lhash2_for_each_icsk_rcu(__icsk, list) \ + hlist_for_each_entry_rcu(__icsk, list, icsk_listen_portaddr_node) + +static inline struct inet_listen_hashbucket * +inet_lhash2_bucket(struct inet_hashinfo *h, u32 hash) +{ + return &h->lhash2[hash & h->lhash2_mask]; +} + static inline struct inet_ehash_bucket *inet_ehash_bucket( struct inet_hashinfo *hashinfo, unsigned int hash) @@ -208,6 +221,10 @@ int __inet_inherit_port(const struct sock *sk, struct sock *child); void inet_put_port(struct sock *sk); void inet_hashinfo_init(struct inet_hashinfo *h); +void inet_hashinfo2_init(struct inet_hashinfo *h, const char *name, + unsigned long numentries, int scale, + unsigned long low_limit, + unsigned long high_limit); bool inet_ehash_insert(struct sock *sk, struct sock *osk); bool inet_ehash_nolisten(struct sock *sk, struct sock *osk); diff --git a/include/net/ip.h b/include/net/ip.h index 9896f46cbbf1..fc9bf1b1fe2c 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -26,12 +26,14 @@ #include <linux/ip.h> #include <linux/in.h> #include <linux/skbuff.h> +#include <linux/jhash.h> #include <net/inet_sock.h> #include <net/route.h> #include <net/snmp.h> #include <net/flow.h> #include <net/flow_dissector.h> +#include <net/netns/hash.h> #define IPV4_MAX_PMTU 65535U /* RFC 2675, Section 5.1 */ @@ -521,6 +523,13 @@ static inline unsigned int ipv4_addr_hash(__be32 ip) return (__force unsigned int) ip; } +static inline u32 ipv4_portaddr_hash(const struct net *net, + __be32 saddr, + unsigned int port) +{ + return jhash_1word((__force u32)saddr, net_hash_mix(net)) ^ port; +} + bool ip_call_ra_chain(struct sk_buff *skb); /* diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index 10c913816032..44d96a91e745 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -129,6 +129,8 @@ struct rt6_exception { struct rt6_info { struct dst_entry dst; + struct rt6_info __rcu *rt6_next; + struct rt6_info *from; /* * Tail elements of dst_entry (__refcnt etc.) @@ -176,11 +178,11 @@ struct rt6_info { #define for_each_fib6_node_rt_rcu(fn) \ for (rt = rcu_dereference((fn)->leaf); rt; \ - rt = rcu_dereference(rt->dst.rt6_next)) + rt = rcu_dereference(rt->rt6_next)) #define for_each_fib6_walker_rt(w) \ for (rt = (w)->leaf; rt; \ - rt = rcu_dereference_protected(rt->dst.rt6_next, 1)) + rt = rcu_dereference_protected(rt->rt6_next, 1)) static inline struct inet6_dev *ip6_dst_idev(struct dst_entry *dst) { @@ -203,11 +205,9 @@ static inline void rt6_update_expires(struct rt6_info *rt0, int timeout) { struct rt6_info *rt; - for (rt = rt0; rt && !(rt->rt6i_flags & RTF_EXPIRES); - rt = (struct rt6_info *)rt->dst.from); + for (rt = rt0; rt && !(rt->rt6i_flags & RTF_EXPIRES); rt = rt->from); if (rt && rt != rt0) rt0->dst.expires = rt->dst.expires; - dst_set_expires(&rt0->dst, timeout); rt0->rt6i_flags |= RTF_EXPIRES; } @@ -242,8 +242,8 @@ static inline u32 rt6_get_cookie(const struct rt6_info *rt) u32 cookie = 0; if (rt->rt6i_flags & RTF_PCPU || - (unlikely(!list_empty(&rt->rt6i_uncached)) && rt->dst.from)) - rt = (struct rt6_info *)(rt->dst.from); + (unlikely(!list_empty(&rt->rt6i_uncached)) && rt->from)) + rt = rt->from; rt6_get_cookie_safe(rt, &cookie); diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h index d66f70f63734..109a5a8877ef 100644 --- a/include/net/ip6_tunnel.h +++ b/include/net/ip6_tunnel.h @@ -36,6 +36,7 @@ struct __ip6_tnl_parm { __be32 o_key; __u32 fwmark; + __u32 index; /* ERSPAN type II index */ }; /* IPv6 tunnel */ diff --git a/include/net/ipv6.h b/include/net/ipv6.h index f73797e2fa60..25be4715578c 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -22,6 +22,7 @@ #include <net/flow.h> #include <net/flow_dissector.h> #include <net/snmp.h> +#include <net/netns/hash.h> #define SIN6_LEN_RFC2133 24 @@ -673,6 +674,22 @@ static inline bool ipv6_addr_v4mapped(const struct in6_addr *a) cpu_to_be32(0x0000ffff))) == 0UL; } +static inline u32 ipv6_portaddr_hash(const struct net *net, + const struct in6_addr *addr6, + unsigned int port) +{ + unsigned int hash, mix = net_hash_mix(net); + + if (ipv6_addr_any(addr6)) + hash = jhash_1word(0, mix); + else if (ipv6_addr_v4mapped(addr6)) + hash = jhash_1word((__force u32)addr6->s6_addr32[3], mix); + else + hash = jhash2((__force u32 *)addr6->s6_addr32, 4, mix); + + return hash ^ port; +} + /* * Check for a RFC 4843 ORCHID address * (Overlay Routable Cryptographic Hash Identifiers) diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index d1f413f06c72..240469228851 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -105,16 +105,18 @@ struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r, void qdisc_put_rtab(struct qdisc_rate_table *tab); void qdisc_put_stab(struct qdisc_size_table *tab); void qdisc_warn_nonwc(const char *txt, struct Qdisc *qdisc); -int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q, - struct net_device *dev, struct netdev_queue *txq, - spinlock_t *root_lock, bool validate); +bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q, + struct net_device *dev, struct netdev_queue *txq, + spinlock_t *root_lock, bool validate); void __qdisc_run(struct Qdisc *q); static inline void qdisc_run(struct Qdisc *q) { - if (qdisc_run_begin(q)) + if (qdisc_run_begin(q)) { __qdisc_run(q); + qdisc_run_end(q); + } } static inline __be16 tc_skb_protocol(const struct sk_buff *skb) diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h index ead018744ff5..14b6b3af8918 100644 --- a/include/net/rtnetlink.h +++ b/include/net/rtnetlink.h @@ -13,10 +13,10 @@ enum rtnl_link_flags { RTNL_FLAG_DOIT_UNLOCKED = 1, }; -int __rtnl_register(int protocol, int msgtype, - rtnl_doit_func, rtnl_dumpit_func, unsigned int flags); void rtnl_register(int protocol, int msgtype, rtnl_doit_func, rtnl_dumpit_func, unsigned int flags); +int rtnl_register_module(struct module *owner, int protocol, int msgtype, + rtnl_doit_func, rtnl_dumpit_func, unsigned int flags); int rtnl_unregister(int protocol, int msgtype); void rtnl_unregister_all(int protocol); diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 65d0d25f2648..8f8c0afe529b 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -71,6 +71,7 @@ struct Qdisc { * qdisc_tree_decrease_qlen() should stop. */ #define TCQ_F_INVISIBLE 0x80 /* invisible by default in dump */ +#define TCQ_F_NOLOCK 0x100 /* qdisc does not require locking */ u32 limit; const struct Qdisc_ops *ops; struct qdisc_size_table __rcu *stab; @@ -87,14 +88,14 @@ struct Qdisc { /* * For performance sake on SMP, we put highly modified fields at the end */ - struct sk_buff *gso_skb ____cacheline_aligned_in_smp; + struct sk_buff_head gso_skb ____cacheline_aligned_in_smp; struct qdisc_skb_head q; struct gnet_stats_basic_packed bstats; seqcount_t running; struct gnet_stats_queue qstats; unsigned long state; struct Qdisc *next_sched; - struct sk_buff *skb_bad_txq; + struct sk_buff_head skb_bad_txq; int padded; refcount_t refcnt; @@ -161,7 +162,8 @@ struct Qdisc_class_ops { void (*walk)(struct Qdisc *, struct qdisc_walker * arg); /* Filter manipulation */ - struct tcf_block * (*tcf_block)(struct Qdisc *, unsigned long); + struct tcf_block * (*tcf_block)(struct Qdisc *sch, + unsigned long arg); unsigned long (*bind_tcf)(struct Qdisc *, unsigned long, u32 classid); void (*unbind_tcf)(struct Qdisc *, unsigned long); @@ -178,6 +180,7 @@ struct Qdisc_ops { const struct Qdisc_class_ops *cl_ops; char id[IFNAMSIZ]; int priv_size; + unsigned int static_flags; int (*enqueue)(struct sk_buff *skb, struct Qdisc *sch, @@ -185,11 +188,12 @@ struct Qdisc_ops { struct sk_buff * (*dequeue)(struct Qdisc *); struct sk_buff * (*peek)(struct Qdisc *); - int (*init)(struct Qdisc *, struct nlattr *arg); + int (*init)(struct Qdisc *sch, struct nlattr *arg); void (*reset)(struct Qdisc *); void (*destroy)(struct Qdisc *); - int (*change)(struct Qdisc *, struct nlattr *arg); - void (*attach)(struct Qdisc *); + int (*change)(struct Qdisc *sch, + struct nlattr *arg); + void (*attach)(struct Qdisc *sch); int (*dump)(struct Qdisc *, struct sk_buff *); int (*dump_stats)(struct Qdisc *, struct gnet_dump *); @@ -278,7 +282,6 @@ struct tcf_block { struct net *net; struct Qdisc *q; struct list_head cb_list; - struct work_struct work; }; static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz) @@ -289,11 +292,31 @@ static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz) BUILD_BUG_ON(sizeof(qcb->data) < sz); } +static inline int qdisc_qlen_cpu(const struct Qdisc *q) +{ + return this_cpu_ptr(q->cpu_qstats)->qlen; +} + static inline int qdisc_qlen(const struct Qdisc *q) { return q->q.qlen; } +static inline int qdisc_qlen_sum(const struct Qdisc *q) +{ + __u32 qlen = 0; + int i; + + if (q->flags & TCQ_F_NOLOCK) { + for_each_possible_cpu(i) + qlen += per_cpu_ptr(q->cpu_qstats, i)->qlen; + } else { + qlen = q->q.qlen; + } + + return qlen; +} + static inline struct qdisc_skb_cb *qdisc_skb_cb(const struct sk_buff *skb) { return (struct qdisc_skb_cb *)skb->cb; @@ -630,12 +653,39 @@ static inline void qdisc_qstats_backlog_dec(struct Qdisc *sch, sch->qstats.backlog -= qdisc_pkt_len(skb); } +static inline void qdisc_qstats_cpu_backlog_dec(struct Qdisc *sch, + const struct sk_buff *skb) +{ + this_cpu_sub(sch->cpu_qstats->backlog, qdisc_pkt_len(skb)); +} + static inline void qdisc_qstats_backlog_inc(struct Qdisc *sch, const struct sk_buff *skb) { sch->qstats.backlog += qdisc_pkt_len(skb); } +static inline void qdisc_qstats_cpu_backlog_inc(struct Qdisc *sch, + const struct sk_buff *skb) +{ + this_cpu_add(sch->cpu_qstats->backlog, qdisc_pkt_len(skb)); +} + +static inline void qdisc_qstats_cpu_qlen_inc(struct Qdisc *sch) +{ + this_cpu_inc(sch->cpu_qstats->qlen); +} + +static inline void qdisc_qstats_cpu_qlen_dec(struct Qdisc *sch) +{ + this_cpu_dec(sch->cpu_qstats->qlen); +} + +static inline void qdisc_qstats_cpu_requeues_inc(struct Qdisc *sch) +{ + this_cpu_inc(sch->cpu_qstats->requeues); +} + static inline void __qdisc_qstats_drop(struct Qdisc *sch, int count) { sch->qstats.drops += count; @@ -766,26 +816,30 @@ static inline struct sk_buff *qdisc_peek_head(struct Qdisc *sch) /* generic pseudo peek method for non-work-conserving qdisc */ static inline struct sk_buff *qdisc_peek_dequeued(struct Qdisc *sch) { + struct sk_buff *skb = skb_peek(&sch->gso_skb); + /* we can reuse ->gso_skb because peek isn't called for root qdiscs */ - if (!sch->gso_skb) { - sch->gso_skb = sch->dequeue(sch); - if (sch->gso_skb) { + if (!skb) { + skb = sch->dequeue(sch); + + if (skb) { + __skb_queue_head(&sch->gso_skb, skb); /* it's still part of the queue */ - qdisc_qstats_backlog_inc(sch, sch->gso_skb); + qdisc_qstats_backlog_inc(sch, skb); sch->q.qlen++; } } - return sch->gso_skb; + return skb; } /* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */ static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch) { - struct sk_buff *skb = sch->gso_skb; + struct sk_buff *skb = skb_peek(&sch->gso_skb); if (skb) { - sch->gso_skb = NULL; + skb = __skb_dequeue(&sch->gso_skb); qdisc_qstats_backlog_dec(sch, skb); sch->q.qlen--; } else { @@ -843,6 +897,14 @@ static inline void rtnl_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch) qdisc_qstats_drop(sch); } +static inline int qdisc_drop_cpu(struct sk_buff *skb, struct Qdisc *sch, + struct sk_buff **to_free) +{ + __qdisc_drop(skb, to_free); + qdisc_qstats_cpu_drop(sch); + + return NET_XMIT_DROP; +} static inline int qdisc_drop(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free) diff --git a/include/net/tc_act/tc_mirred.h b/include/net/tc_act/tc_mirred.h index 21d253c9a8c6..a2e9cbca5c9e 100644 --- a/include/net/tc_act/tc_mirred.h +++ b/include/net/tc_act/tc_mirred.h @@ -8,10 +8,8 @@ struct tcf_mirred { struct tc_action common; int tcfm_eaction; - int tcfm_ifindex; bool tcfm_mac_header_xmit; struct net_device __rcu *tcfm_dev; - struct net *net; struct list_head tcfm_list; }; #define to_mirred(a) ((struct tcf_mirred *)a) @@ -34,9 +32,9 @@ static inline bool is_tcf_mirred_egress_mirror(const struct tc_action *a) return false; } -static inline int tcf_mirred_ifindex(const struct tc_action *a) +static inline struct net_device *tcf_mirred_dev(const struct tc_action *a) { - return to_mirred(a)->tcfm_ifindex; + return rtnl_dereference(to_mirred(a)->tcfm_dev); } #endif /* __NET_TC_MIR_H */ diff --git a/include/net/tcp.h b/include/net/tcp.h index 6da880d2f022..3c3744e52cd1 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2011,10 +2011,12 @@ static inline int tcp_call_bpf(struct sock *sk, int op) struct bpf_sock_ops_kern sock_ops; int ret; - if (sk_fullsock(sk)) + memset(&sock_ops, 0, sizeof(sock_ops)); + if (sk_fullsock(sk)) { + sock_ops.is_fullsock = 1; sock_owned_by_me(sk); + } - memset(&sock_ops, 0, sizeof(sock_ops)); sock_ops.sk = sk; sock_ops.op = op; diff --git a/include/net/xfrm.h b/include/net/xfrm.h index dc28a98ce97c..1ec0c4760646 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -968,7 +968,7 @@ static inline bool xfrm_sec_ctx_match(struct xfrm_sec_ctx *s1, struct xfrm_sec_c /* A struct encoding bundle of transformations to apply to some set of flow. * - * dst->child points to the next element of bundle. + * xdst->child points to the next element of bundle. * dst->xfrm points to an instanse of transformer. * * Due to unfortunate limitations of current routing cache, which we @@ -984,6 +984,8 @@ struct xfrm_dst { struct rt6_info rt6; } u; struct dst_entry *route; + struct dst_entry *child; + struct dst_entry *path; struct xfrm_policy *pols[XFRM_POLICY_TYPE_MAX]; int num_pols, num_xfrms; u32 xfrm_genid; @@ -994,7 +996,35 @@ struct xfrm_dst { u32 path_cookie; }; +static inline struct dst_entry *xfrm_dst_path(const struct dst_entry *dst) +{ #ifdef CONFIG_XFRM + if (dst->xfrm) { + const struct xfrm_dst *xdst = (const struct xfrm_dst *) dst; + + return xdst->path; + } +#endif + return (struct dst_entry *) dst; +} + +static inline struct dst_entry *xfrm_dst_child(const struct dst_entry *dst) +{ +#ifdef CONFIG_XFRM + if (dst->xfrm) { + struct xfrm_dst *xdst = (struct xfrm_dst *) dst; + return xdst->child; + } +#endif + return NULL; +} + +#ifdef CONFIG_XFRM +static inline void xfrm_dst_set_child(struct xfrm_dst *xdst, struct dst_entry *child) +{ + xdst->child = child; +} + static inline void xfrm_dst_destroy(struct xfrm_dst *xdst) { xfrm_pols_put(xdst->pols, xdst->num_pols); @@ -1866,12 +1896,14 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x); static inline bool xfrm_dst_offload_ok(struct dst_entry *dst) { struct xfrm_state *x = dst->xfrm; + struct xfrm_dst *xdst; if (!x || !x->type_offload) return false; - if (x->xso.offload_handle && (x->xso.dev == dst->path->dev) && - !dst->child->xfrm) + xdst = (struct xfrm_dst *) dst; + if (x->xso.offload_handle && (x->xso.dev == xfrm_dst_path(dst)->dev) && + !xdst->child->xfrm) return true; return false; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 4c223ab30293..80d62e88590c 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -941,6 +941,12 @@ struct bpf_sock_ops { __u32 local_ip6[4]; /* Stored in network byte order */ __u32 remote_port; /* Stored in network byte order */ __u32 local_port; /* stored in host byte order */ + __u32 is_fullsock; /* Some TCP fields are only valid if + * there is a full socket. If not, the + * fields read as zero. + */ + __u32 snd_cwnd; + __u32 srtt_us; /* Averaged RTT << 3 in usecs */ }; /* List of known BPF sock_ops operators. diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index ac71559314e7..44a0b675a6bc 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -1686,6 +1686,7 @@ enum ethtool_reset_flags { ETH_RESET_PHY = 1 << 6, /* Transceiver/PHY */ ETH_RESET_RAM = 1 << 7, /* RAM shared between * multiple components */ + ETH_RESET_AP = 1 << 8, /* Application processor */ ETH_RESET_DEDICATED = 0x0000ffff, /* All components dedicated to * this interface */ diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h index 030d3e6d6029..fb38c1797131 100644 --- a/include/uapi/linux/if_tun.h +++ b/include/uapi/linux/if_tun.h @@ -57,6 +57,7 @@ */ #define TUNSETVNETBE _IOW('T', 222, int) #define TUNGETVNETBE _IOR('T', 223, int) +#define TUNSETSTEERINGEBPF _IOR('T', 224, int) /* TUNSETIFF ifr flags */ #define IFF_TUN 0x0001 diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index d4593571c404..7afa92e9b409 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -216,6 +216,17 @@ static const char * const reg_type_str[] = { [PTR_TO_PACKET_END] = "pkt_end", }; +static void print_liveness(struct bpf_verifier_env *env, + enum bpf_reg_liveness live) +{ + if (live & (REG_LIVE_READ | REG_LIVE_WRITTEN)) + verbose(env, "_"); + if (live & REG_LIVE_READ) + verbose(env, "r"); + if (live & REG_LIVE_WRITTEN) + verbose(env, "w"); +} + static void print_verifier_state(struct bpf_verifier_env *env, struct bpf_verifier_state *state) { @@ -228,7 +239,9 @@ static void print_verifier_state(struct bpf_verifier_env *env, t = reg->type; if (t == NOT_INIT) continue; - verbose(env, " R%d=%s", i, reg_type_str[t]); + verbose(env, " R%d", i); + print_liveness(env, reg->live); + verbose(env, "=%s", reg_type_str[t]); if ((t == SCALAR_VALUE || t == PTR_TO_STACK) && tnum_is_const(reg->var_off)) { /* reg->off should be 0 for SCALAR_VALUE */ @@ -277,10 +290,13 @@ static void print_verifier_state(struct bpf_verifier_env *env, } } for (i = 0; i < state->allocated_stack / BPF_REG_SIZE; i++) { - if (state->stack[i].slot_type[0] == STACK_SPILL) - verbose(env, " fp%d=%s", - -MAX_BPF_STACK + i * BPF_REG_SIZE, + if (state->stack[i].slot_type[0] == STACK_SPILL) { + verbose(env, " fp%d", + (-i - 1) * BPF_REG_SIZE); + print_liveness(env, state->stack[i].spilled_ptr.live); + verbose(env, "=%s", reg_type_str[state->stack[i].spilled_ptr.type]); + } } verbose(env, "\n"); } @@ -568,8 +584,8 @@ static void mark_reg_unknown(struct bpf_verifier_env *env, { if (WARN_ON(regno >= MAX_BPF_REG)) { verbose(env, "mark_reg_unknown(regs, %u)\n", regno); - /* Something bad happened, let's kill all regs */ - for (regno = 0; regno < MAX_BPF_REG; regno++) + /* Something bad happened, let's kill all regs except FP */ + for (regno = 0; regno < BPF_REG_FP; regno++) __mark_reg_not_init(regs + regno); return; } @@ -587,8 +603,8 @@ static void mark_reg_not_init(struct bpf_verifier_env *env, { if (WARN_ON(regno >= MAX_BPF_REG)) { verbose(env, "mark_reg_not_init(regs, %u)\n", regno); - /* Something bad happened, let's kill all regs */ - for (regno = 0; regno < MAX_BPF_REG; regno++) + /* Something bad happened, let's kill all regs except FP */ + for (regno = 0; regno < BPF_REG_FP; regno++) __mark_reg_not_init(regs + regno); return; } @@ -779,6 +795,11 @@ static int check_stack_read(struct bpf_verifier_env *env, if (value_regno >= 0) { /* restore register state from stack */ state->regs[value_regno] = state->stack[spi].spilled_ptr; + /* mark reg as written since spilled pointer state likely + * has its liveness marks cleared by is_state_visited() + * which resets stack/reg liveness for state transitions + */ + state->regs[value_regno].live |= REG_LIVE_WRITTEN; mark_stack_slot_read(state, spi); } return 0; @@ -1244,9 +1265,9 @@ static int check_xadd(struct bpf_verifier_env *env, int insn_idx, struct bpf_ins } /* Does this register contain a constant zero? */ -static bool register_is_null(struct bpf_reg_state reg) +static bool register_is_null(struct bpf_reg_state *reg) { - return reg.type == SCALAR_VALUE && tnum_equals_const(reg.var_off, 0); + return reg->type == SCALAR_VALUE && tnum_equals_const(reg->var_off, 0); } /* when register 'regno' is passed into function that will read 'access_size' @@ -1259,31 +1280,31 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno, int access_size, bool zero_size_allowed, struct bpf_call_arg_meta *meta) { + struct bpf_reg_state *reg = cur_regs(env) + regno; struct bpf_verifier_state *state = env->cur_state; - struct bpf_reg_state *regs = state->regs; int off, i, slot, spi; - if (regs[regno].type != PTR_TO_STACK) { + if (reg->type != PTR_TO_STACK) { /* Allow zero-byte read from NULL, regardless of pointer type */ if (zero_size_allowed && access_size == 0 && - register_is_null(regs[regno])) + register_is_null(reg)) return 0; verbose(env, "R%d type=%s expected=%s\n", regno, - reg_type_str[regs[regno].type], + reg_type_str[reg->type], reg_type_str[PTR_TO_STACK]); return -EACCES; } /* Only allow fixed-offset stack reads */ - if (!tnum_is_const(regs[regno].var_off)) { + if (!tnum_is_const(reg->var_off)) { char tn_buf[48]; - tnum_strn(tn_buf, sizeof(tn_buf), regs[regno].var_off); + tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off); verbose(env, "invalid variable stack read R%d var_off=%s\n", regno, tn_buf); } - off = regs[regno].off + regs[regno].var_off.value; + off = reg->off + reg->var_off.value; if (off >= 0 || off < -MAX_BPF_STACK || off + access_size > 0 || access_size < 0 || (access_size == 0 && !zero_size_allowed)) { verbose(env, "invalid stack type R%d off=%d access_size=%d\n", @@ -1391,7 +1412,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno, * passed in as argument, it's a SCALAR_VALUE type. Final test * happens during stack boundary checking. */ - if (register_is_null(*reg) && + if (register_is_null(reg) && arg_type == ARG_PTR_TO_MEM_OR_NULL) /* final test in check_stack_boundary() */; else if (!type_is_pkt_pointer(type) && @@ -2934,8 +2955,9 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env, if (BPF_SRC(insn->code) == BPF_K && (opcode == BPF_JEQ || opcode == BPF_JNE) && dst_reg->type == SCALAR_VALUE && - tnum_equals_const(dst_reg->var_off, insn->imm)) { - if (opcode == BPF_JEQ) { + tnum_is_const(dst_reg->var_off)) { + if ((opcode == BPF_JEQ && dst_reg->var_off.value == insn->imm) || + (opcode == BPF_JNE && dst_reg->var_off.value != insn->imm)) { /* if (imm == imm) goto pc+off; * only follow the goto, ignore fall-through */ diff --git a/net/atm/common.c b/net/atm/common.c index 8a4f99114cd2..5763fd241dc3 100644 --- a/net/atm/common.c +++ b/net/atm/common.c @@ -14,7 +14,7 @@ #include <linux/capability.h> #include <linux/mm.h> #include <linux/sched/signal.h> -#include <linux/time.h> /* struct timeval */ +#include <linux/time64.h> /* 64-bit time for seconds */ #include <linux/skbuff.h> #include <linux/bitops.h> #include <linux/init.h> diff --git a/net/atm/mpc.c b/net/atm/mpc.c index 7c6a1cc760a2..31e0dcb970f8 100644 --- a/net/atm/mpc.c +++ b/net/atm/mpc.c @@ -1089,7 +1089,7 @@ static void MPOA_trigger_rcvd(struct k_message *msg, struct mpoa_client *mpc) msg->type = SND_MPOA_RES_RQST; msg->content.in_info = entry->ctrl_info; msg_to_mpoad(msg, mpc); - do_gettimeofday(&(entry->reply_wait)); + entry->reply_wait = ktime_get_seconds(); mpc->in_ops->put(entry); return; } @@ -1099,7 +1099,7 @@ static void MPOA_trigger_rcvd(struct k_message *msg, struct mpoa_client *mpc) msg->type = SND_MPOA_RES_RQST; msg->content.in_info = entry->ctrl_info; msg_to_mpoad(msg, mpc); - do_gettimeofday(&(entry->reply_wait)); + entry->reply_wait = ktime_get_seconds(); mpc->in_ops->put(entry); return; } @@ -1175,8 +1175,9 @@ static void MPOA_res_reply_rcvd(struct k_message *msg, struct mpoa_client *mpc) } entry->ctrl_info = msg->content.in_info; - do_gettimeofday(&(entry->tv)); - do_gettimeofday(&(entry->reply_wait)); /* Used in refreshing func from now on */ + entry->time = ktime_get_seconds(); + /* Used in refreshing func from now on */ + entry->reply_wait = ktime_get_seconds(); entry->refresh_time = 0; ddprintk_cont("entry->shortcut = %p\n", entry->shortcut); diff --git a/net/atm/mpoa_caches.c b/net/atm/mpoa_caches.c index e01450bb32d6..4bb418313720 100644 --- a/net/atm/mpoa_caches.c +++ b/net/atm/mpoa_caches.c @@ -117,7 +117,7 @@ static in_cache_entry *in_cache_add_entry(__be32 dst_ip, memcpy(entry->MPS_ctrl_ATM_addr, client->mps_ctrl_addr, ATM_ESA_LEN); entry->ctrl_info.in_dst_ip = dst_ip; - do_gettimeofday(&(entry->tv)); + entry->time = ktime_get_seconds(); entry->retry_time = client->parameters.mpc_p4; entry->count = 1; entry->entry_state = INGRESS_INVALID; @@ -148,7 +148,7 @@ static int cache_hit(in_cache_entry *entry, struct mpoa_client *mpc) if (qos != NULL) msg.qos = qos->qos; msg_to_mpoad(&msg, mpc); - do_gettimeofday(&(entry->reply_wait)); + entry->reply_wait = ktime_get_seconds(); entry->entry_state = INGRESS_RESOLVING; } if (entry->shortcut != NULL) @@ -171,7 +171,7 @@ static int cache_hit(in_cache_entry *entry, struct mpoa_client *mpc) if (qos != NULL) msg.qos = qos->qos; msg_to_mpoad(&msg, mpc); - do_gettimeofday(&(entry->reply_wait)); + entry->reply_wait = ktime_get_seconds(); } return CLOSED; @@ -227,17 +227,16 @@ static void in_cache_remove_entry(in_cache_entry *entry, static void clear_count_and_expired(struct mpoa_client *client) { in_cache_entry *entry, *next_entry; - struct timeval now; + time64_t now; - do_gettimeofday(&now); + now = ktime_get_seconds(); write_lock_bh(&client->ingress_lock); entry = client->in_cache; while (entry != NULL) { entry->count = 0; next_entry = entry->next; - if ((now.tv_sec - entry->tv.tv_sec) - > entry->ctrl_info.holding_time) { + if ((now - entry->time) > entry->ctrl_info.holding_time) { dprintk("holding time expired, ip = %pI4\n", &entry->ctrl_info.in_dst_ip); client->in_ops->remove_entry(entry, client); @@ -253,35 +252,35 @@ static void check_resolving_entries(struct mpoa_client *client) struct atm_mpoa_qos *qos; in_cache_entry *entry; - struct timeval now; + time64_t now; struct k_message msg; - do_gettimeofday(&now); + now = ktime_get_seconds(); read_lock_bh(&client->ingress_lock); entry = client->in_cache; while (entry != NULL) { if (entry->entry_state == INGRESS_RESOLVING) { - if ((now.tv_sec - entry->hold_down.tv_sec) < - client->parameters.mpc_p6) { + + if ((now - entry->hold_down) + < client->parameters.mpc_p6) { entry = entry->next; /* Entry in hold down */ continue; } - if ((now.tv_sec - entry->reply_wait.tv_sec) > - entry->retry_time) { + if ((now - entry->reply_wait) > entry->retry_time) { entry->retry_time = MPC_C1 * (entry->retry_time); /* * Retry time maximum exceeded, * put entry in hold down. */ if (entry->retry_time > client->parameters.mpc_p5) { - do_gettimeofday(&(entry->hold_down)); + entry->hold_down = ktime_get_seconds(); entry->retry_time = client->parameters.mpc_p4; entry = entry->next; continue; } /* Ask daemon to send a resolution request. */ - memset(&(entry->hold_down), 0, sizeof(struct timeval)); + memset(&entry->hold_down, 0, sizeof(time64_t)); msg.type = SND_MPOA_RES_RTRY; memcpy(msg.MPS_ctrl, client->mps_ctrl_addr, ATM_ESA_LEN); msg.content.in_info = entry->ctrl_info; @@ -289,7 +288,7 @@ static void check_resolving_entries(struct mpoa_client *client) if (qos != NULL) msg.qos = qos->qos; msg_to_mpoad(&msg, client); - do_gettimeofday(&(entry->reply_wait)); + entry->reply_wait = ktime_get_seconds(); } } entry = entry->next; @@ -300,18 +299,18 @@ static void check_resolving_entries(struct mpoa_client *client) /* Call this every MPC-p5 seconds. */ static void refresh_entries(struct mpoa_client *client) { - struct timeval now; + time64_t now; struct in_cache_entry *entry = client->in_cache; ddprintk("refresh_entries\n"); - do_gettimeofday(&now); + now = ktime_get_seconds(); read_lock_bh(&client->ingress_lock); while (entry != NULL) { if (entry->entry_state == INGRESS_RESOLVED) { if (!(entry->refresh_time)) entry->refresh_time = (2 * (entry->ctrl_info.holding_time))/3; - if ((now.tv_sec - entry->reply_wait.tv_sec) > + if ((now - entry->reply_wait) > entry->refresh_time) { dprintk("refreshing an entry.\n"); entry->entry_state = INGRESS_REFRESHING; @@ -480,7 +479,7 @@ static eg_cache_entry *eg_cache_add_entry(struct k_message *msg, memcpy(entry->MPS_ctrl_ATM_addr, client->mps_ctrl_addr, ATM_ESA_LEN); entry->ctrl_info = msg->content.eg_info; - do_gettimeofday(&(entry->tv)); + entry->time = ktime_get_seconds(); entry->entry_state = EGRESS_RESOLVED; dprintk("new_eg_cache_entry cache_id %u\n", ntohl(entry->ctrl_info.cache_id)); @@ -495,7 +494,7 @@ static eg_cache_entry *eg_cache_add_entry(struct k_message *msg, static void update_eg_cache_entry(eg_cache_entry *entry, uint16_t holding_time) { - do_gettimeofday(&(entry->tv)); + entry->time = ktime_get_seconds(); entry->entry_state = EGRESS_RESOLVED; entry->ctrl_info.holding_time = holding_time; } @@ -503,17 +502,16 @@ static void update_eg_cache_entry(eg_cache_entry *entry, uint16_t holding_time) static void clear_expired(struct mpoa_client *client) { eg_cache_entry *entry, *next_entry; - struct timeval now; + time64_t now; struct k_message msg; - do_gettimeofday(&now); + now = ktime_get_seconds(); write_lock_irq(&client->egress_lock); entry = client->eg_cache; while (entry != NULL) { next_entry = entry->next; - if ((now.tv_sec - entry->tv.tv_sec) - > entry->ctrl_info.holding_time) { + if ((now - entry->time) > entry->ctrl_info.holding_time) { msg.type = SND_EGRESS_PURGE; msg.content.eg_info = entry->ctrl_info; dprintk("egress_cache: holding time expired, cache_id = %u.\n", diff --git a/net/atm/mpoa_caches.h b/net/atm/mpoa_caches.h index 6a266669ebf4..464c4c7f8d1f 100644 --- a/net/atm/mpoa_caches.h +++ b/net/atm/mpoa_caches.h @@ -2,6 +2,7 @@ #ifndef MPOA_CACHES_H #define MPOA_CACHES_H +#include <linux/time64.h> #include <linux/netdevice.h> #include <linux/types.h> #include <linux/atm.h> @@ -16,9 +17,9 @@ void atm_mpoa_init_cache(struct mpoa_client *mpc); typedef struct in_cache_entry { struct in_cache_entry *next; struct in_cache_entry *prev; - struct timeval tv; - struct timeval reply_wait; - struct timeval hold_down; + time64_t time; + time64_t reply_wait; + time64_t hold_down; uint32_t packets_fwded; uint16_t entry_state; uint32_t retry_time; @@ -53,7 +54,7 @@ struct in_cache_ops{ typedef struct eg_cache_entry{ struct eg_cache_entry *next; struct eg_cache_entry *prev; - struct timeval tv; + time64_t time; uint8_t MPS_ctrl_ATM_addr[ATM_ESA_LEN]; struct atm_vcc *shortcut; uint32_t packets_rcvd; diff --git a/net/atm/mpoa_proc.c b/net/atm/mpoa_proc.c index 8a0c17e1c203..2212da9c2da2 100644 --- a/net/atm/mpoa_proc.c +++ b/net/atm/mpoa_proc.c @@ -8,7 +8,7 @@ #include <linux/mm.h> #include <linux/module.h> #include <linux/proc_fs.h> -#include <linux/time.h> +#include <linux/ktime.h> #include <linux/seq_file.h> #include <linux/uaccess.h> #include <linux/atmmpc.h> @@ -138,7 +138,7 @@ static int mpc_show(struct seq_file *m, void *v) int i; in_cache_entry *in_entry; eg_cache_entry *eg_entry; - struct timeval now; + time64_t now; unsigned char ip_string[16]; if (v == SEQ_START_TOKEN) { @@ -148,15 +148,17 @@ static int mpc_show(struct seq_file *m, void *v) seq_printf(m, "\nInterface %d:\n\n", mpc->dev_num); seq_printf(m, "Ingress Entries:\nIP address State Holding time Packets fwded VPI VCI\n"); - do_gettimeofday(&now); + now = ktime_get_seconds(); for (in_entry = mpc->in_cache; in_entry; in_entry = in_entry->next) { + unsigned long seconds_delta = now - in_entry->time; + sprintf(ip_string, "%pI4", &in_entry->ctrl_info.in_dst_ip); seq_printf(m, "%-16s%s%-14lu%-12u", ip_string, ingress_state_string(in_entry->entry_state), in_entry->ctrl_info.holding_time - - (now.tv_sec-in_entry->tv.tv_sec), + seconds_delta, in_entry->packets_fwded); if (in_entry->shortcut) seq_printf(m, " %-3d %-3d", @@ -169,13 +171,14 @@ static int mpc_show(struct seq_file *m, void *v) seq_printf(m, "Egress Entries:\nIngress MPC ATM addr\nCache-id State Holding time Packets recvd Latest IP addr VPI VCI\n"); for (eg_entry = mpc->eg_cache; eg_entry; eg_entry = eg_entry->next) { unsigned char *p = eg_entry->ctrl_info.in_MPC_data_ATM_addr; + unsigned long seconds_delta = now - eg_entry->time; + for (i = 0; i < ATM_ESA_LEN; i++) seq_printf(m, "%02x", p[i]); seq_printf(m, "\n%-16lu%s%-14lu%-15u", (unsigned long)ntohl(eg_entry->ctrl_info.cache_id), egress_state_string(eg_entry->entry_state), - (eg_entry->ctrl_info.holding_time - - (now.tv_sec-eg_entry->tv.tv_sec)), + (eg_entry->ctrl_info.holding_time - seconds_delta), eg_entry->packets_rcvd); /* latest IP address */ diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c index b0f4c734900b..6d9f48bd374a 100644 --- a/net/bridge/br_mdb.c +++ b/net/bridge/br_mdb.c @@ -760,9 +760,9 @@ static int br_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, void br_mdb_init(void) { - rtnl_register(PF_BRIDGE, RTM_GETMDB, NULL, br_mdb_dump, 0); - rtnl_register(PF_BRIDGE, RTM_NEWMDB, br_mdb_add, NULL, 0); - rtnl_register(PF_BRIDGE, RTM_DELMDB, br_mdb_del, NULL, 0); + rtnl_register_module(THIS_MODULE, PF_BRIDGE, RTM_GETMDB, NULL, br_mdb_dump, 0); + rtnl_register_module(THIS_MODULE, PF_BRIDGE, RTM_NEWMDB, br_mdb_add, NULL, 0); + rtnl_register_module(THIS_MODULE, PF_BRIDGE, RTM_DELMDB, br_mdb_del, NULL, 0); } void br_mdb_uninit(void) diff --git a/net/bridge/br_nf_core.c b/net/bridge/br_nf_core.c index 20cbb727df4d..8e2d7cfa4e16 100644 --- a/net/bridge/br_nf_core.c +++ b/net/bridge/br_nf_core.c @@ -78,7 +78,6 @@ void br_netfilter_rtable_init(struct net_bridge *br) atomic_set(&rt->dst.__refcnt, 1); rt->dst.dev = br->dev; - rt->dst.path = &rt->dst; dst_init_metrics(&rt->dst, br_dst_default_metrics, true); rt->dst.flags = DST_NOXFRM | DST_FAKE_RTABLE; rt->dst.ops = &fake_dst_ops; diff --git a/net/can/gw.c b/net/can/gw.c index 73a02af4b5d7..398dd0395ad9 100644 --- a/net/can/gw.c +++ b/net/can/gw.c @@ -1014,6 +1014,8 @@ static struct pernet_operations cangw_pernet_ops = { static __init int cgw_module_init(void) { + int ret; + /* sanitize given module parameter */ max_hops = clamp_t(unsigned int, max_hops, CGW_MIN_HOPS, CGW_MAX_HOPS); @@ -1031,15 +1033,19 @@ static __init int cgw_module_init(void) notifier.notifier_call = cgw_notifier; register_netdevice_notifier(¬ifier); - if (__rtnl_register(PF_CAN, RTM_GETROUTE, NULL, cgw_dump_jobs, 0)) { + ret = rtnl_register_module(THIS_MODULE, PF_CAN, RTM_GETROUTE, + NULL, cgw_dump_jobs, 0); + if (ret) { unregister_netdevice_notifier(¬ifier); kmem_cache_destroy(cgw_cache); return -ENOBUFS; } - /* Only the first call to __rtnl_register can fail */ - __rtnl_register(PF_CAN, RTM_NEWROUTE, cgw_create_job, NULL, 0); - __rtnl_register(PF_CAN, RTM_DELROUTE, cgw_remove_job, NULL, 0); + /* Only the first call to rtnl_register_module can fail */ + rtnl_register_module(THIS_MODULE, PF_CAN, RTM_NEWROUTE, + cgw_create_job, NULL, 0); + rtnl_register_module(THIS_MODULE, PF_CAN, RTM_DELROUTE, + cgw_remove_job, NULL, 0); return 0; } diff --git a/net/core/dev.c b/net/core/dev.c index f47e96b62308..8aa2f70995e8 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3162,6 +3162,21 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q, int rc; qdisc_calculate_pkt_len(skb, q); + + if (q->flags & TCQ_F_NOLOCK) { + if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) { + __qdisc_drop(skb, &to_free); + rc = NET_XMIT_DROP; + } else { + rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK; + __qdisc_run(q); + } + + if (unlikely(to_free)) + kfree_skb_list(to_free); + return rc; + } + /* * Heuristic to force contended enqueues to serialize on a * separate lock before trying to get qdisc main lock. @@ -3192,9 +3207,9 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q, contended = false; } __qdisc_run(q); - } else - qdisc_run_end(q); + } + qdisc_run_end(q); rc = NET_XMIT_SUCCESS; } else { rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK; @@ -3204,6 +3219,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q, contended = false; } __qdisc_run(q); + qdisc_run_end(q); } } spin_unlock(root_lock); @@ -4143,19 +4159,22 @@ static __latent_entropy void net_tx_action(struct softirq_action *h) while (head) { struct Qdisc *q = head; - spinlock_t *root_lock; + spinlock_t *root_lock = NULL; head = head->next_sched; - root_lock = qdisc_lock(q); - spin_lock(root_lock); + if (!(q->flags & TCQ_F_NOLOCK)) { + root_lock = qdisc_lock(q); + spin_lock(root_lock); + } /* We need to make sure head->next_sched is read * before clearing __QDISC_STATE_SCHED */ smp_mb__before_atomic(); clear_bit(__QDISC_STATE_SCHED, &q->state); qdisc_run(q); - spin_unlock(root_lock); + if (root_lock) + spin_unlock(root_lock); } } } @@ -7073,17 +7092,21 @@ int dev_change_proto_down(struct net_device *dev, bool proto_down) } EXPORT_SYMBOL(dev_change_proto_down); -u8 __dev_xdp_attached(struct net_device *dev, bpf_op_t bpf_op, u32 *prog_id) +void __dev_xdp_query(struct net_device *dev, bpf_op_t bpf_op, + struct netdev_bpf *xdp) { - struct netdev_bpf xdp; - - memset(&xdp, 0, sizeof(xdp)); - xdp.command = XDP_QUERY_PROG; + memset(xdp, 0, sizeof(*xdp)); + xdp->command = XDP_QUERY_PROG; /* Query must always succeed. */ - WARN_ON(bpf_op(dev, &xdp) < 0); - if (prog_id) - *prog_id = xdp.prog_id; + WARN_ON(bpf_op(dev, xdp) < 0); +} + +static u8 __dev_xdp_attached(struct net_device *dev, bpf_op_t bpf_op) +{ + struct netdev_bpf xdp; + + __dev_xdp_query(dev, bpf_op, &xdp); return xdp.prog_attached; } @@ -7106,6 +7129,27 @@ static int dev_xdp_install(struct net_device *dev, bpf_op_t bpf_op, return bpf_op(dev, &xdp); } +static void dev_xdp_uninstall(struct net_device *dev) +{ + struct netdev_bpf xdp; + bpf_op_t ndo_bpf; + + /* Remove generic XDP */ + WARN_ON(dev_xdp_install(dev, generic_xdp_install, NULL, 0, NULL)); + + /* Remove from the driver */ + ndo_bpf = dev->netdev_ops->ndo_bpf; + if (!ndo_bpf) + return; + + __dev_xdp_query(dev, ndo_bpf, &xdp); + if (xdp.prog_attached == XDP_ATTACHED_NONE) + return; + + /* Program removal should always succeed */ + WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags, NULL)); +} + /** * dev_change_xdp_fd - set or clear a bpf program for a device rx path * @dev: device @@ -7134,10 +7178,10 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack, bpf_chk = generic_xdp_install; if (fd >= 0) { - if (bpf_chk && __dev_xdp_attached(dev, bpf_chk, NULL)) + if (bpf_chk && __dev_xdp_attached(dev, bpf_chk)) return -EEXIST; if ((flags & XDP_FLAGS_UPDATE_IF_NOEXIST) && - __dev_xdp_attached(dev, bpf_op, NULL)) + __dev_xdp_attached(dev, bpf_op)) return -EBUSY; prog = bpf_prog_get_type_dev(fd, BPF_PROG_TYPE_XDP, @@ -7236,6 +7280,7 @@ static void rollback_registered_many(struct list_head *head) /* Shutdown queueing discipline. */ dev_shutdown(dev); + dev_xdp_uninstall(dev); /* Notify protocols, that we are about to destroy * this device. They should clean all the things. @@ -8195,7 +8240,6 @@ EXPORT_SYMBOL(alloc_netdev_mqs); void free_netdev(struct net_device *dev) { struct napi_struct *p, *n; - struct bpf_prog *prog; might_sleep(); netif_free_tx_queues(dev); @@ -8214,12 +8258,6 @@ void free_netdev(struct net_device *dev) free_percpu(dev->pcpu_refcnt); dev->pcpu_refcnt = NULL; - prog = rcu_dereference_protected(dev->xdp_prog, 1); - if (prog) { - bpf_prog_put(prog); - static_key_slow_dec(&generic_xdp_needed); - } - /* Compatibility with error handling in drivers */ if (dev->reg_state == NETREG_UNINITIALIZED) { netdev_freemem(dev); diff --git a/net/core/dst.c b/net/core/dst.c index 662a2d4a3d19..007aa0b08291 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -21,6 +21,7 @@ #include <linux/sched.h> #include <linux/prefetch.h> #include <net/lwtunnel.h> +#include <net/xfrm.h> #include <net/dst.h> #include <net/dst_metadata.h> @@ -62,15 +63,12 @@ void dst_init(struct dst_entry *dst, struct dst_ops *ops, struct net_device *dev, int initial_ref, int initial_obsolete, unsigned short flags) { - dst->child = NULL; dst->dev = dev; if (dev) dev_hold(dev); dst->ops = ops; dst_init_metrics(dst, dst_default_metrics.metrics, true); dst->expires = 0UL; - dst->path = dst; - dst->from = NULL; #ifdef CONFIG_XFRM dst->xfrm = NULL; #endif @@ -88,7 +86,6 @@ void dst_init(struct dst_entry *dst, struct dst_ops *ops, dst->__use = 0; dst->lastuse = jiffies; dst->flags = flags; - dst->next = NULL; if (!(flags & DST_NOCOUNT)) dst_entries_add(ops, 1); } @@ -116,12 +113,17 @@ EXPORT_SYMBOL(dst_alloc); struct dst_entry *dst_destroy(struct dst_entry * dst) { - struct dst_entry *child; + struct dst_entry *child = NULL; smp_rmb(); - child = dst->child; +#ifdef CONFIG_XFRM + if (dst->xfrm) { + struct xfrm_dst *xdst = (struct xfrm_dst *) dst; + child = xdst->child; + } +#endif if (!(dst->flags & DST_NOCOUNT)) dst_entries_add(dst->ops, -1); diff --git a/net/core/filter.c b/net/core/filter.c index 6a85e67fafce..754abe1041b7 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3013,6 +3013,8 @@ BPF_CALL_4(bpf_skb_set_tunnel_key, struct sk_buff *, skb, info->key.tun_flags = TUNNEL_KEY | TUNNEL_CSUM | TUNNEL_NOCACHE; if (flags & BPF_F_DONT_FRAGMENT) info->key.tun_flags |= TUNNEL_DONT_FRAGMENT; + if (flags & BPF_F_ZERO_CSUM_TX) + info->key.tun_flags &= ~TUNNEL_CSUM; info->key.tun_id = cpu_to_be64(from->tunnel_id); info->key.tos = from->tunnel_tos; @@ -3026,8 +3028,6 @@ BPF_CALL_4(bpf_skb_set_tunnel_key, struct sk_buff *, skb, IPV6_FLOWLABEL_MASK; } else { info->key.u.ipv4.dst = cpu_to_be32(from->remote_ipv4); - if (flags & BPF_F_ZERO_CSUM_TX) - info->key.tun_flags &= ~TUNNEL_CSUM; } return 0; @@ -4437,6 +4437,42 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type, *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg, offsetof(struct sock_common, skc_num)); break; + + case offsetof(struct bpf_sock_ops, is_fullsock): + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( + struct bpf_sock_ops_kern, + is_fullsock), + si->dst_reg, si->src_reg, + offsetof(struct bpf_sock_ops_kern, + is_fullsock)); + break; + +/* Helper macro for adding read access to tcp_sock fields. */ +#define SOCK_OPS_GET_TCP32(FIELD_NAME) \ + do { \ + BUILD_BUG_ON(FIELD_SIZEOF(struct tcp_sock, FIELD_NAME) != 4); \ + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \ + struct bpf_sock_ops_kern, \ + is_fullsock), \ + si->dst_reg, si->src_reg, \ + offsetof(struct bpf_sock_ops_kern, \ + is_fullsock)); \ + *insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 2); \ + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \ + struct bpf_sock_ops_kern, sk),\ + si->dst_reg, si->src_reg, \ + offsetof(struct bpf_sock_ops_kern, sk));\ + *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, \ + offsetof(struct tcp_sock, FIELD_NAME)); \ + } while (0) + + case offsetof(struct bpf_sock_ops, snd_cwnd): + SOCK_OPS_GET_TCP32(snd_cwnd); + break; + + case offsetof(struct bpf_sock_ops, srtt_us): + SOCK_OPS_GET_TCP32(srtt_us); + break; } return insn - insn_buf; } diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 15ce30063765..cc75488d3653 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -133,10 +133,10 @@ skb_flow_dissect_set_enc_addr_type(enum flow_dissector_key_id type, ctrl->addr_type = type; } -static void -__skb_flow_dissect_tunnel_info(const struct sk_buff *skb, - struct flow_dissector *flow_dissector, - void *target_container) +void +skb_flow_dissect_tunnel_info(const struct sk_buff *skb, + struct flow_dissector *flow_dissector, + void *target_container) { struct ip_tunnel_info *info; struct ip_tunnel_key *key; @@ -212,6 +212,7 @@ __skb_flow_dissect_tunnel_info(const struct sk_buff *skb, tp->dst = key->tp_dst; } } +EXPORT_SYMBOL(skb_flow_dissect_tunnel_info); static enum flow_dissect_ret __skb_flow_dissect_mpls(const struct sk_buff *skb, @@ -576,9 +577,6 @@ bool __skb_flow_dissect(const struct sk_buff *skb, FLOW_DISSECTOR_KEY_BASIC, target_container); - __skb_flow_dissect_tunnel_info(skb, flow_dissector, - target_container); - if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ETH_ADDRS)) { struct ethhdr *eth = eth_hdr(skb); diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c index 87f28557b329..b2b2323bdc84 100644 --- a/net/core/gen_stats.c +++ b/net/core/gen_stats.c @@ -252,10 +252,10 @@ __gnet_stats_copy_queue_cpu(struct gnet_stats_queue *qstats, } } -static void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats, - const struct gnet_stats_queue __percpu *cpu, - const struct gnet_stats_queue *q, - __u32 qlen) +void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats, + const struct gnet_stats_queue __percpu *cpu, + const struct gnet_stats_queue *q, + __u32 qlen) { if (cpu) { __gnet_stats_copy_queue_cpu(qstats, cpu); @@ -269,6 +269,7 @@ static void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats, qstats->qlen = qlen; } +EXPORT_SYMBOL(__gnet_stats_copy_queue); /** * gnet_stats_copy_queue - copy queue statistics into statistics TLV diff --git a/net/core/pktgen.c b/net/core/pktgen.c index f95a15086225..b9ce241cd28c 100644 --- a/net/core/pktgen.c +++ b/net/core/pktgen.c @@ -399,7 +399,7 @@ struct pktgen_dev { __u8 ipsmode; /* IPSEC mode (config) */ __u8 ipsproto; /* IPSEC type (config) */ __u32 spi; - struct dst_entry dst; + struct xfrm_dst xdst; struct dst_ops dstops; #endif char result[512]; @@ -2609,7 +2609,7 @@ static int pktgen_output_ipsec(struct sk_buff *skb, struct pktgen_dev *pkt_dev) * supports both transport/tunnel mode + ESP/AH type. */ if ((x->props.mode == XFRM_MODE_TUNNEL) && (pkt_dev->spi != 0)) - skb->_skb_refdst = (unsigned long)&pkt_dev->dst | SKB_DST_NOREF; + skb->_skb_refdst = (unsigned long)&pkt_dev->xdst.u.dst | SKB_DST_NOREF; rcu_read_lock_bh(); err = x->outer_mode->output(x, skb); @@ -3742,10 +3742,10 @@ static int pktgen_add_device(struct pktgen_thread *t, const char *ifname) * performance under such circumstance. */ pkt_dev->dstops.family = AF_INET; - pkt_dev->dst.dev = pkt_dev->odev; - dst_init_metrics(&pkt_dev->dst, pktgen_dst_metrics, false); - pkt_dev->dst.child = &pkt_dev->dst; - pkt_dev->dst.ops = &pkt_dev->dstops; + pkt_dev->xdst.u.dst.dev = pkt_dev->odev; + dst_init_metrics(&pkt_dev->xdst.u.dst, pktgen_dst_metrics, false); + pkt_dev->xdst.child = &pkt_dev->xdst.u.dst; + pkt_dev->xdst.u.dst.ops = &pkt_dev->dstops; #endif return add_dev_to_thread(t, pkt_dev); diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index dabba2a91fc8..412ebf0b09c6 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -62,7 +62,9 @@ struct rtnl_link { rtnl_doit_func doit; rtnl_dumpit_func dumpit; + struct module *owner; unsigned int flags; + struct rcu_head rcu; }; static DEFINE_MUTEX(rtnl_mutex); @@ -127,8 +129,7 @@ bool lockdep_rtnl_is_held(void) EXPORT_SYMBOL(lockdep_rtnl_is_held); #endif /* #ifdef CONFIG_PROVE_LOCKING */ -static struct rtnl_link __rcu *rtnl_msg_handlers[RTNL_FAMILY_MAX + 1]; -static refcount_t rtnl_msg_handlers_ref[RTNL_FAMILY_MAX + 1]; +static struct rtnl_link *__rcu *rtnl_msg_handlers[RTNL_FAMILY_MAX + 1]; static inline int rtm_msgindex(int msgtype) { @@ -144,72 +145,127 @@ static inline int rtm_msgindex(int msgtype) return msgindex; } -/** - * __rtnl_register - Register a rtnetlink message type - * @protocol: Protocol family or PF_UNSPEC - * @msgtype: rtnetlink message type - * @doit: Function pointer called for each request message - * @dumpit: Function pointer called for each dump request (NLM_F_DUMP) message - * @flags: rtnl_link_flags to modifiy behaviour of doit/dumpit functions - * - * Registers the specified function pointers (at least one of them has - * to be non-NULL) to be called whenever a request message for the - * specified protocol family and message type is received. - * - * The special protocol family PF_UNSPEC may be used to define fallback - * function pointers for the case when no entry for the specific protocol - * family exists. - * - * Returns 0 on success or a negative error code. - */ -int __rtnl_register(int protocol, int msgtype, - rtnl_doit_func doit, rtnl_dumpit_func dumpit, - unsigned int flags) +static struct rtnl_link *rtnl_get_link(int protocol, int msgtype) +{ + struct rtnl_link **tab; + + if (protocol >= ARRAY_SIZE(rtnl_msg_handlers)) + protocol = PF_UNSPEC; + + tab = rcu_dereference_rtnl(rtnl_msg_handlers[protocol]); + if (!tab) + tab = rcu_dereference_rtnl(rtnl_msg_handlers[PF_UNSPEC]); + + return tab[msgtype]; +} + +static int rtnl_register_internal(struct module *owner, + int protocol, int msgtype, + rtnl_doit_func doit, rtnl_dumpit_func dumpit, + unsigned int flags) { - struct rtnl_link *tab; + struct rtnl_link *link, *old; + struct rtnl_link __rcu **tab; int msgindex; + int ret = -ENOBUFS; BUG_ON(protocol < 0 || protocol > RTNL_FAMILY_MAX); msgindex = rtm_msgindex(msgtype); - tab = rcu_dereference_raw(rtnl_msg_handlers[protocol]); + rtnl_lock(); + tab = rtnl_msg_handlers[protocol]; if (tab == NULL) { - tab = kcalloc(RTM_NR_MSGTYPES, sizeof(*tab), GFP_KERNEL); - if (tab == NULL) - return -ENOBUFS; + tab = kcalloc(RTM_NR_MSGTYPES, sizeof(void *), GFP_KERNEL); + if (!tab) + goto unlock; + /* ensures we see the 0 stores */ rcu_assign_pointer(rtnl_msg_handlers[protocol], tab); } + old = rtnl_dereference(tab[msgindex]); + if (old) { + link = kmemdup(old, sizeof(*old), GFP_KERNEL); + if (!link) + goto unlock; + } else { + link = kzalloc(sizeof(*link), GFP_KERNEL); + if (!link) + goto unlock; + } + + WARN_ON(link->owner && link->owner != owner); + link->owner = owner; + + WARN_ON(doit && link->doit && link->doit != doit); if (doit) - tab[msgindex].doit = doit; + link->doit = doit; + WARN_ON(dumpit && link->dumpit && link->dumpit != dumpit); if (dumpit) - tab[msgindex].dumpit = dumpit; - tab[msgindex].flags |= flags; + link->dumpit = dumpit; - return 0; + link->flags |= flags; + + /* publish protocol:msgtype */ + rcu_assign_pointer(tab[msgindex], link); + ret = 0; + if (old) + kfree_rcu(old, rcu); +unlock: + rtnl_unlock(); + return ret; +} + +/** + * rtnl_register_module - Register a rtnetlink message type + * + * @owner: module registering the hook (THIS_MODULE) + * @protocol: Protocol family or PF_UNSPEC + * @msgtype: rtnetlink message type + * @doit: Function pointer called for each request message + * @dumpit: Function pointer called for each dump request (NLM_F_DUMP) message + * @flags: rtnl_link_flags to modifiy behaviour of doit/dumpit functions + * + * Like rtnl_register, but for use by removable modules. + */ +int rtnl_register_module(struct module *owner, + int protocol, int msgtype, + rtnl_doit_func doit, rtnl_dumpit_func dumpit, + unsigned int flags) +{ + return rtnl_register_internal(owner, protocol, msgtype, + doit, dumpit, flags); } -EXPORT_SYMBOL_GPL(__rtnl_register); +EXPORT_SYMBOL_GPL(rtnl_register_module); /** * rtnl_register - Register a rtnetlink message type + * @protocol: Protocol family or PF_UNSPEC + * @msgtype: rtnetlink message type + * @doit: Function pointer called for each request message + * @dumpit: Function pointer called for each dump request (NLM_F_DUMP) message + * @flags: rtnl_link_flags to modifiy behaviour of doit/dumpit functions * - * Identical to __rtnl_register() but panics on failure. This is useful - * as failure of this function is very unlikely, it can only happen due - * to lack of memory when allocating the chain to store all message - * handlers for a protocol. Meant for use in init functions where lack - * of memory implies no sense in continuing. + * Registers the specified function pointers (at least one of them has + * to be non-NULL) to be called whenever a request message for the + * specified protocol family and message type is received. + * + * The special protocol family PF_UNSPEC may be used to define fallback + * function pointers for the case when no entry for the specific protocol + * family exists. */ void rtnl_register(int protocol, int msgtype, rtnl_doit_func doit, rtnl_dumpit_func dumpit, unsigned int flags) { - if (__rtnl_register(protocol, msgtype, doit, dumpit, flags) < 0) - panic("Unable to register rtnetlink message handler, " - "protocol = %d, message type = %d\n", - protocol, msgtype); + int err; + + err = rtnl_register_internal(NULL, protocol, msgtype, doit, dumpit, + flags); + if (err) + pr_err("Unable to register rtnetlink message handler, " + "protocol = %d, message type = %d\n", protocol, msgtype); } -EXPORT_SYMBOL_GPL(rtnl_register); /** * rtnl_unregister - Unregister a rtnetlink message type @@ -220,24 +276,25 @@ EXPORT_SYMBOL_GPL(rtnl_register); */ int rtnl_unregister(int protocol, int msgtype) { - struct rtnl_link *handlers; + struct rtnl_link **tab, *link; int msgindex; BUG_ON(protocol < 0 || protocol > RTNL_FAMILY_MAX); msgindex = rtm_msgindex(msgtype); rtnl_lock(); - handlers = rtnl_dereference(rtnl_msg_handlers[protocol]); - if (!handlers) { + tab = rtnl_dereference(rtnl_msg_handlers[protocol]); + if (!tab) { rtnl_unlock(); return -ENOENT; } - handlers[msgindex].doit = NULL; - handlers[msgindex].dumpit = NULL; - handlers[msgindex].flags = 0; + link = tab[msgindex]; + rcu_assign_pointer(tab[msgindex], NULL); rtnl_unlock(); + kfree_rcu(link, rcu); + return 0; } EXPORT_SYMBOL_GPL(rtnl_unregister); @@ -251,20 +308,27 @@ EXPORT_SYMBOL_GPL(rtnl_unregister); */ void rtnl_unregister_all(int protocol) { - struct rtnl_link *handlers; + struct rtnl_link **tab, *link; + int msgindex; BUG_ON(protocol < 0 || protocol > RTNL_FAMILY_MAX); rtnl_lock(); - handlers = rtnl_dereference(rtnl_msg_handlers[protocol]); + tab = rtnl_msg_handlers[protocol]; RCU_INIT_POINTER(rtnl_msg_handlers[protocol], NULL); + for (msgindex = 0; msgindex < RTM_NR_MSGTYPES; msgindex++) { + link = tab[msgindex]; + if (!link) + continue; + + rcu_assign_pointer(tab[msgindex], NULL); + kfree_rcu(link, rcu); + } rtnl_unlock(); synchronize_net(); - while (refcount_read(&rtnl_msg_handlers_ref[protocol]) > 1) - schedule(); - kfree(handlers); + kfree(tab); } EXPORT_SYMBOL_GPL(rtnl_unregister_all); @@ -1261,6 +1325,7 @@ static u8 rtnl_xdp_attached_mode(struct net_device *dev, u32 *prog_id) { const struct net_device_ops *ops = dev->netdev_ops; const struct bpf_prog *generic_xdp_prog; + struct netdev_bpf xdp; ASSERT_RTNL(); @@ -1273,7 +1338,10 @@ static u8 rtnl_xdp_attached_mode(struct net_device *dev, u32 *prog_id) if (!ops->ndo_bpf) return XDP_ATTACHED_NONE; - return __dev_xdp_attached(dev, ops->ndo_bpf, prog_id); + __dev_xdp_query(dev, ops->ndo_bpf, &xdp); + *prog_id = xdp.prog_id; + + return xdp.prog_attached; } static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev) @@ -1569,6 +1637,8 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = { [IFLA_PROMISCUITY] = { .type = NLA_U32 }, [IFLA_NUM_TX_QUEUES] = { .type = NLA_U32 }, [IFLA_NUM_RX_QUEUES] = { .type = NLA_U32 }, + [IFLA_GSO_MAX_SEGS] = { .type = NLA_U32 }, + [IFLA_GSO_MAX_SIZE] = { .type = NLA_U32 }, [IFLA_PHYS_PORT_ID] = { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN }, [IFLA_CARRIER_CHANGES] = { .type = NLA_U32 }, /* ignored */ [IFLA_PHYS_SWITCH_ID] = { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN }, @@ -2219,6 +2289,34 @@ static int do_setlink(const struct sk_buff *skb, } } + if (tb[IFLA_GSO_MAX_SIZE]) { + u32 max_size = nla_get_u32(tb[IFLA_GSO_MAX_SIZE]); + + if (max_size > GSO_MAX_SIZE) { + err = -EINVAL; + goto errout; + } + + if (dev->gso_max_size ^ max_size) { + netif_set_gso_max_size(dev, max_size); + status |= DO_SETLINK_MODIFIED; + } + } + + if (tb[IFLA_GSO_MAX_SEGS]) { + u32 max_segs = nla_get_u32(tb[IFLA_GSO_MAX_SEGS]); + + if (max_segs > GSO_MAX_SEGS) { + err = -EINVAL; + goto errout; + } + + if (dev->gso_max_segs ^ max_segs) { + dev->gso_max_segs = max_segs; + status |= DO_SETLINK_MODIFIED; + } + } + if (tb[IFLA_OPERSTATE]) set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE])); @@ -2583,6 +2681,10 @@ struct net_device *rtnl_create_link(struct net *net, dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]); if (tb[IFLA_GROUP]) dev_set_group(dev, nla_get_u32(tb[IFLA_GROUP])); + if (tb[IFLA_GSO_MAX_SIZE]) + netif_set_gso_max_size(dev, nla_get_u32(tb[IFLA_GSO_MAX_SIZE])); + if (tb[IFLA_GSO_MAX_SEGS]) + dev->gso_max_size = nla_get_u32(tb[IFLA_GSO_MAX_SEGS]); return dev; } @@ -2973,18 +3075,26 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb) s_idx = 1; for (idx = 1; idx <= RTNL_FAMILY_MAX; idx++) { + struct rtnl_link **tab; int type = cb->nlh->nlmsg_type-RTM_BASE; - struct rtnl_link *handlers; + struct rtnl_link *link; rtnl_dumpit_func dumpit; if (idx < s_idx || idx == PF_PACKET) continue; - handlers = rtnl_dereference(rtnl_msg_handlers[idx]); - if (!handlers) + if (type < 0 || type >= RTM_NR_MSGTYPES) continue; - dumpit = READ_ONCE(handlers[type].dumpit); + tab = rcu_dereference_rtnl(rtnl_msg_handlers[idx]); + if (!tab) + continue; + + link = tab[type]; + if (!link) + continue; + + dumpit = link->dumpit; if (!dumpit) continue; @@ -4314,7 +4424,8 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct net *net = sock_net(skb->sk); - struct rtnl_link *handlers; + struct rtnl_link *link; + struct module *owner; int err = -EOPNOTSUPP; rtnl_doit_func doit; unsigned int flags; @@ -4338,79 +4449,85 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, if (kind != 2 && !netlink_net_capable(skb, CAP_NET_ADMIN)) return -EPERM; - if (family >= ARRAY_SIZE(rtnl_msg_handlers)) - family = PF_UNSPEC; - rcu_read_lock(); - handlers = rcu_dereference(rtnl_msg_handlers[family]); - if (!handlers) { - family = PF_UNSPEC; - handlers = rcu_dereference(rtnl_msg_handlers[family]); - } - if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) { struct sock *rtnl; rtnl_dumpit_func dumpit; u16 min_dump_alloc = 0; - dumpit = READ_ONCE(handlers[type].dumpit); - if (!dumpit) { + link = rtnl_get_link(family, type); + if (!link || !link->dumpit) { family = PF_UNSPEC; - handlers = rcu_dereference(rtnl_msg_handlers[PF_UNSPEC]); - if (!handlers) - goto err_unlock; - - dumpit = READ_ONCE(handlers[type].dumpit); - if (!dumpit) + link = rtnl_get_link(family, type); + if (!link || !link->dumpit) goto err_unlock; } - - refcount_inc(&rtnl_msg_handlers_ref[family]); + owner = link->owner; + dumpit = link->dumpit; if (type == RTM_GETLINK - RTM_BASE) min_dump_alloc = rtnl_calcit(skb, nlh); + err = 0; + /* need to do this before rcu_read_unlock() */ + if (!try_module_get(owner)) + err = -EPROTONOSUPPORT; + rcu_read_unlock(); rtnl = net->rtnl; - { + if (err == 0) { struct netlink_dump_control c = { .dump = dumpit, .min_dump_alloc = min_dump_alloc, + .module = owner, }; err = netlink_dump_start(rtnl, skb, nlh, &c); + /* netlink_dump_start() will keep a reference on + * module if dump is still in progress. + */ + module_put(owner); } - refcount_dec(&rtnl_msg_handlers_ref[family]); return err; } - doit = READ_ONCE(handlers[type].doit); - if (!doit) { + link = rtnl_get_link(family, type); + if (!link || !link->doit) { family = PF_UNSPEC; - handlers = rcu_dereference(rtnl_msg_handlers[family]); + link = rtnl_get_link(PF_UNSPEC, type); + if (!link || !link->doit) + goto out_unlock; } - flags = READ_ONCE(handlers[type].flags); + owner = link->owner; + if (!try_module_get(owner)) { + err = -EPROTONOSUPPORT; + goto out_unlock; + } + + flags = link->flags; if (flags & RTNL_FLAG_DOIT_UNLOCKED) { - refcount_inc(&rtnl_msg_handlers_ref[family]); - doit = READ_ONCE(handlers[type].doit); + doit = link->doit; rcu_read_unlock(); if (doit) err = doit(skb, nlh, extack); - refcount_dec(&rtnl_msg_handlers_ref[family]); + module_put(owner); return err; } - rcu_read_unlock(); rtnl_lock(); - handlers = rtnl_dereference(rtnl_msg_handlers[family]); - if (handlers) { - doit = READ_ONCE(handlers[type].doit); - if (doit) - err = doit(skb, nlh, extack); - } + link = rtnl_get_link(family, type); + if (link && link->doit) + err = link->doit(skb, nlh, extack); rtnl_unlock(); + + module_put(owner); + + return err; + +out_unlock: + rcu_read_unlock(); return err; err_unlock: @@ -4498,11 +4615,6 @@ static struct pernet_operations rtnetlink_net_ops = { void __init rtnetlink_init(void) { - int i; - - for (i = 0; i < ARRAY_SIZE(rtnl_msg_handlers_ref); i++) - refcount_set(&rtnl_msg_handlers_ref[i], 1); - if (register_pernet_subsys(&rtnetlink_net_ops)) panic("rtnetlink_init: cannot initialize rtnetlink\n"); diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c index 5eeb1d20cc38..c5bb52bc73a1 100644 --- a/net/core/sock_reuseport.c +++ b/net/core/sock_reuseport.c @@ -235,7 +235,9 @@ struct sock *reuseport_select_sock(struct sock *sk, if (prog && skb) sk2 = run_bpf(reuse, socks, prog, skb, hdr_len); - else + + /* no bpf or invalid bpf result: fall back to hash usage */ + if (!sk2) sk2 = reuse->socks[reciprocal_scale(hash, socks)]; } diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c index 9153247dad28..d1885cf59319 100644 --- a/net/decnet/dn_dev.c +++ b/net/decnet/dn_dev.c @@ -1418,9 +1418,12 @@ void __init dn_dev_init(void) dn_dev_devices_on(); - rtnl_register(PF_DECnet, RTM_NEWADDR, dn_nl_newaddr, NULL, 0); - rtnl_register(PF_DECnet, RTM_DELADDR, dn_nl_deladdr, NULL, 0); - rtnl_register(PF_DECnet, RTM_GETADDR, NULL, dn_nl_dump_ifaddr, 0); + rtnl_register_module(THIS_MODULE, PF_DECnet, RTM_NEWADDR, + dn_nl_newaddr, NULL, 0); + rtnl_register_module(THIS_MODULE, PF_DECnet, RTM_DELADDR, + dn_nl_deladdr, NULL, 0); + rtnl_register_module(THIS_MODULE, PF_DECnet, RTM_GETADDR, + NULL, dn_nl_dump_ifaddr, 0); proc_create("decnet_dev", S_IRUGO, init_net.proc_net, &dn_dev_seq_fops); diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c index b37a1b833c77..fce94cbd4378 100644 --- a/net/decnet/dn_fib.c +++ b/net/decnet/dn_fib.c @@ -792,8 +792,10 @@ void __init dn_fib_init(void) register_dnaddr_notifier(&dn_fib_dnaddr_notifier); - rtnl_register(PF_DECnet, RTM_NEWROUTE, dn_fib_rtm_newroute, NULL, 0); - rtnl_register(PF_DECnet, RTM_DELROUTE, dn_fib_rtm_delroute, NULL, 0); + rtnl_register_module(THIS_MODULE, PF_DECnet, RTM_NEWROUTE, + dn_fib_rtm_newroute, NULL, 0); + rtnl_register_module(THIS_MODULE, PF_DECnet, RTM_DELROUTE, + dn_fib_rtm_delroute, NULL, 0); } diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c index 324cb9f2f551..73160d4aebbe 100644 --- a/net/decnet/dn_route.c +++ b/net/decnet/dn_route.c @@ -199,11 +199,11 @@ static void dn_dst_check_expire(struct timer_list *unused) lockdep_is_held(&dn_rt_hash_table[i].lock))) != NULL) { if (atomic_read(&rt->dst.__refcnt) > 1 || (now - rt->dst.lastuse) < expire) { - rtp = &rt->dst.dn_next; + rtp = &rt->dn_next; continue; } - *rtp = rt->dst.dn_next; - rt->dst.dn_next = NULL; + *rtp = rt->dn_next; + rt->dn_next = NULL; dst_dev_put(&rt->dst); dst_release(&rt->dst); } @@ -233,11 +233,11 @@ static int dn_dst_gc(struct dst_ops *ops) lockdep_is_held(&dn_rt_hash_table[i].lock))) != NULL) { if (atomic_read(&rt->dst.__refcnt) > 1 || (now - rt->dst.lastuse) < expire) { - rtp = &rt->dst.dn_next; + rtp = &rt->dn_next; continue; } - *rtp = rt->dst.dn_next; - rt->dst.dn_next = NULL; + *rtp = rt->dn_next; + rt->dn_next = NULL; dst_dev_put(&rt->dst); dst_release(&rt->dst); break; @@ -333,8 +333,8 @@ static int dn_insert_route(struct dn_route *rt, unsigned int hash, struct dn_rou lockdep_is_held(&dn_rt_hash_table[hash].lock))) != NULL) { if (compare_keys(&rth->fld, &rt->fld)) { /* Put it first */ - *rthp = rth->dst.dn_next; - rcu_assign_pointer(rth->dst.dn_next, + *rthp = rth->dn_next; + rcu_assign_pointer(rth->dn_next, dn_rt_hash_table[hash].chain); rcu_assign_pointer(dn_rt_hash_table[hash].chain, rth); @@ -345,10 +345,10 @@ static int dn_insert_route(struct dn_route *rt, unsigned int hash, struct dn_rou *rp = rth; return 0; } - rthp = &rth->dst.dn_next; + rthp = &rth->dn_next; } - rcu_assign_pointer(rt->dst.dn_next, dn_rt_hash_table[hash].chain); + rcu_assign_pointer(rt->dn_next, dn_rt_hash_table[hash].chain); rcu_assign_pointer(dn_rt_hash_table[hash].chain, rt); dst_hold_and_use(&rt->dst, now); @@ -369,8 +369,8 @@ static void dn_run_flush(struct timer_list *unused) goto nothing_to_declare; for(; rt; rt = next) { - next = rcu_dereference_raw(rt->dst.dn_next); - RCU_INIT_POINTER(rt->dst.dn_next, NULL); + next = rcu_dereference_raw(rt->dn_next); + RCU_INIT_POINTER(rt->dn_next, NULL); dst_dev_put(&rt->dst); dst_release(&rt->dst); } @@ -1183,6 +1183,7 @@ make_route: if (rt == NULL) goto e_nobufs; + rt->dn_next = NULL; memset(&rt->fld, 0, sizeof(rt->fld)); rt->fld.saddr = oldflp->saddr; rt->fld.daddr = oldflp->daddr; @@ -1252,7 +1253,7 @@ static int __dn_route_output_key(struct dst_entry **pprt, const struct flowidn * if (!(flags & MSG_TRYHARD)) { rcu_read_lock_bh(); for (rt = rcu_dereference_bh(dn_rt_hash_table[hash].chain); rt; - rt = rcu_dereference_bh(rt->dst.dn_next)) { + rt = rcu_dereference_bh(rt->dn_next)) { if ((flp->daddr == rt->fld.daddr) && (flp->saddr == rt->fld.saddr) && (flp->flowidn_mark == rt->fld.flowidn_mark) && @@ -1448,6 +1449,7 @@ make_route: if (rt == NULL) goto e_nobufs; + rt->dn_next = NULL; memset(&rt->fld, 0, sizeof(rt->fld)); rt->rt_saddr = fld.saddr; rt->rt_daddr = fld.daddr; @@ -1529,7 +1531,7 @@ static int dn_route_input(struct sk_buff *skb) rcu_read_lock(); for(rt = rcu_dereference(dn_rt_hash_table[hash].chain); rt != NULL; - rt = rcu_dereference(rt->dst.dn_next)) { + rt = rcu_dereference(rt->dn_next)) { if ((rt->fld.saddr == cb->src) && (rt->fld.daddr == cb->dst) && (rt->fld.flowidn_oif == 0) && @@ -1749,7 +1751,7 @@ int dn_cache_dump(struct sk_buff *skb, struct netlink_callback *cb) rcu_read_lock_bh(); for(rt = rcu_dereference_bh(dn_rt_hash_table[h].chain), idx = 0; rt; - rt = rcu_dereference_bh(rt->dst.dn_next), idx++) { + rt = rcu_dereference_bh(rt->dn_next), idx++) { if (idx < s_idx) continue; skb_dst_set(skb, dst_clone(&rt->dst)); @@ -1795,7 +1797,7 @@ static struct dn_route *dn_rt_cache_get_next(struct seq_file *seq, struct dn_rou { struct dn_rt_cache_iter_state *s = seq->private; - rt = rcu_dereference_bh(rt->dst.dn_next); + rt = rcu_dereference_bh(rt->dn_next); while (!rt) { rcu_read_unlock_bh(); if (--s->bucket < 0) @@ -1921,11 +1923,11 @@ void __init dn_route_init(void) &dn_rt_cache_seq_fops); #ifdef CONFIG_DECNET_ROUTER - rtnl_register(PF_DECnet, RTM_GETROUTE, dn_cache_getroute, - dn_fib_dump, 0); + rtnl_register_module(THIS_MODULE, PF_DECnet, RTM_GETROUTE, + dn_cache_getroute, dn_fib_dump, 0); #else - rtnl_register(PF_DECnet, RTM_GETROUTE, dn_cache_getroute, - dn_cache_dump, 0); + rtnl_register_module(THIS_MODULE, PF_DECnet, RTM_GETROUTE, + dn_cache_getroute, dn_cache_dump, 0); #endif } diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig index 03c3bdf25468..bbf2c82cf7b2 100644 --- a/net/dsa/Kconfig +++ b/net/dsa/Kconfig @@ -16,6 +16,15 @@ config NET_DSA if NET_DSA +config NET_DSA_LEGACY + bool "Support for older platform device and Device Tree registration" + default y + ---help--- + Say Y if you want to enable support for the older platform device and + deprecated Device Tree binding registration. + + This feature is scheduled for removal in 4.17. + # tagging formats config NET_DSA_TAG_BRCM bool diff --git a/net/dsa/Makefile b/net/dsa/Makefile index 0e13c1f95d13..9e4d3536f977 100644 --- a/net/dsa/Makefile +++ b/net/dsa/Makefile @@ -1,7 +1,8 @@ # SPDX-License-Identifier: GPL-2.0 # the core obj-$(CONFIG_NET_DSA) += dsa_core.o -dsa_core-y += dsa.o dsa2.o legacy.o master.o port.o slave.o switch.o +dsa_core-y += dsa.o dsa2.o master.o port.o slave.o switch.o +dsa_core-$(CONFIG_NET_DSA_LEGACY) += legacy.o # tagging formats dsa_core-$(CONFIG_NET_DSA_TAG_BRCM) += tag_brcm.o diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c index 1e287420ff49..21f9bed11988 100644 --- a/net/dsa/dsa2.c +++ b/net/dsa/dsa2.c @@ -241,7 +241,7 @@ static int dsa_tree_setup_default_cpu(struct dsa_switch_tree *dst) for (port = 0; port < ds->num_ports; port++) { dp = &ds->ports[port]; - if (dsa_port_is_user(dp)) + if (dsa_port_is_user(dp) || dsa_port_is_dsa(dp)) dp->cpu_dp = dst->cpu_dp; } } diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h index 7d036696e8c4..b03665e8fb4e 100644 --- a/net/dsa/dsa_priv.h +++ b/net/dsa/dsa_priv.h @@ -97,8 +97,17 @@ const struct dsa_device_ops *dsa_resolve_tag_protocol(int tag_protocol); bool dsa_schedule_work(struct work_struct *work); /* legacy.c */ +#if IS_ENABLED(CONFIG_NET_DSA_LEGACY) int dsa_legacy_register(void); void dsa_legacy_unregister(void); +#else +static inline int dsa_legacy_register(void) +{ + return -ENODEV; +} + +static inline void dsa_legacy_unregister(void) { } +#endif int dsa_legacy_fdb_add(struct ndmsg *ndm, struct nlattr *tb[], struct net_device *dev, const unsigned char *addr, u16 vid, diff --git a/net/dsa/legacy.c b/net/dsa/legacy.c index 84611d7fcfa2..aa56d3fb5da4 100644 --- a/net/dsa/legacy.c +++ b/net/dsa/legacy.c @@ -718,26 +718,6 @@ static int dsa_resume(struct device *d) } #endif -/* legacy way, bypassing the bridge *****************************************/ -int dsa_legacy_fdb_add(struct ndmsg *ndm, struct nlattr *tb[], - struct net_device *dev, - const unsigned char *addr, u16 vid, - u16 flags) -{ - struct dsa_port *dp = dsa_slave_to_port(dev); - - return dsa_port_fdb_add(dp, addr, vid); -} - -int dsa_legacy_fdb_del(struct ndmsg *ndm, struct nlattr *tb[], - struct net_device *dev, - const unsigned char *addr, u16 vid) -{ - struct dsa_port *dp = dsa_slave_to_port(dev); - - return dsa_port_fdb_del(dp, addr, vid); -} - static SIMPLE_DEV_PM_OPS(dsa_pm_ops, dsa_suspend, dsa_resume); static const struct of_device_id dsa_of_match_table[] = { diff --git a/net/dsa/slave.c b/net/dsa/slave.c index d6e7a642493b..5d6475a6cc5d 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -709,14 +709,12 @@ static int dsa_slave_add_cls_matchall(struct net_device *dev, struct dsa_slave_priv *p = netdev_priv(dev); struct dsa_mall_tc_entry *mall_tc_entry; __be16 protocol = cls->common.protocol; - struct net *net = dev_net(dev); struct dsa_switch *ds = dp->ds; struct net_device *to_dev; const struct tc_action *a; struct dsa_port *to_dp; int err = -EOPNOTSUPP; LIST_HEAD(actions); - int ifindex; if (!ds->ops->port_mirror_add) return err; @@ -730,8 +728,7 @@ static int dsa_slave_add_cls_matchall(struct net_device *dev, if (is_tcf_mirred_egress_mirror(a) && protocol == htons(ETH_P_ALL)) { struct dsa_mall_mirror_tc_entry *mirror; - ifindex = tcf_mirred_ifindex(a); - to_dev = __dev_get_by_index(net, ifindex); + to_dev = tcf_mirred_dev(a); if (!to_dev) return -EINVAL; @@ -944,6 +941,26 @@ static const struct ethtool_ops dsa_slave_ethtool_ops = { .set_rxnfc = dsa_slave_set_rxnfc, }; +/* legacy way, bypassing the bridge *****************************************/ +int dsa_legacy_fdb_add(struct ndmsg *ndm, struct nlattr *tb[], + struct net_device *dev, + const unsigned char *addr, u16 vid, + u16 flags) +{ + struct dsa_port *dp = dsa_slave_to_port(dev); + + return dsa_port_fdb_add(dp, addr, vid); +} + +int dsa_legacy_fdb_del(struct ndmsg *ndm, struct nlattr *tb[], + struct net_device *dev, + const unsigned char *addr, u16 vid) +{ + struct dsa_port *dp = dsa_slave_to_port(dev); + + return dsa_port_fdb_del(dp, addr, vid); +} + static const struct net_device_ops dsa_slave_netdev_ops = { .ndo_open = dsa_slave_open, .ndo_stop = dsa_slave_close, diff --git a/net/dsa/switch.c b/net/dsa/switch.c index 29608d087a7c..b93511726069 100644 --- a/net/dsa/switch.c +++ b/net/dsa/switch.c @@ -83,29 +83,52 @@ static int dsa_switch_bridge_leave(struct dsa_switch *ds, static int dsa_switch_fdb_add(struct dsa_switch *ds, struct dsa_notifier_fdb_info *info) { - /* Do not care yet about other switch chips of the fabric */ - if (ds->index != info->sw_index) - return 0; + int port = dsa_towards_port(ds, info->sw_index, info->port); if (!ds->ops->port_fdb_add) return -EOPNOTSUPP; - return ds->ops->port_fdb_add(ds, info->port, info->addr, - info->vid); + return ds->ops->port_fdb_add(ds, port, info->addr, info->vid); } static int dsa_switch_fdb_del(struct dsa_switch *ds, struct dsa_notifier_fdb_info *info) { - /* Do not care yet about other switch chips of the fabric */ - if (ds->index != info->sw_index) - return 0; + int port = dsa_towards_port(ds, info->sw_index, info->port); if (!ds->ops->port_fdb_del) return -EOPNOTSUPP; - return ds->ops->port_fdb_del(ds, info->port, info->addr, - info->vid); + return ds->ops->port_fdb_del(ds, port, info->addr, info->vid); +} + +static int +dsa_switch_mdb_prepare_bitmap(struct dsa_switch *ds, + const struct switchdev_obj_port_mdb *mdb, + const unsigned long *bitmap) +{ + int port, err; + + if (!ds->ops->port_mdb_prepare || !ds->ops->port_mdb_add) + return -EOPNOTSUPP; + + for_each_set_bit(port, bitmap, ds->num_ports) { + err = ds->ops->port_mdb_prepare(ds, port, mdb); + if (err) + return err; + } + + return 0; +} + +static void dsa_switch_mdb_add_bitmap(struct dsa_switch *ds, + const struct switchdev_obj_port_mdb *mdb, + const unsigned long *bitmap) +{ + int port; + + for_each_set_bit(port, bitmap, ds->num_ports) + ds->ops->port_mdb_add(ds, port, mdb); } static int dsa_switch_mdb_add(struct dsa_switch *ds, @@ -114,7 +137,7 @@ static int dsa_switch_mdb_add(struct dsa_switch *ds, const struct switchdev_obj_port_mdb *mdb = info->mdb; struct switchdev_trans *trans = info->trans; DECLARE_BITMAP(group, ds->num_ports); - int port, err; + int port; /* Build a mask of Multicast group members */ bitmap_zero(group, ds->num_ports); @@ -124,21 +147,10 @@ static int dsa_switch_mdb_add(struct dsa_switch *ds, if (dsa_is_dsa_port(ds, port)) set_bit(port, group); - if (switchdev_trans_ph_prepare(trans)) { - if (!ds->ops->port_mdb_prepare || !ds->ops->port_mdb_add) - return -EOPNOTSUPP; - - for_each_set_bit(port, group, ds->num_ports) { - err = ds->ops->port_mdb_prepare(ds, port, mdb, trans); - if (err) - return err; - } - - return 0; - } + if (switchdev_trans_ph_prepare(trans)) + return dsa_switch_mdb_prepare_bitmap(ds, mdb, group); - for_each_set_bit(port, group, ds->num_ports) - ds->ops->port_mdb_add(ds, port, mdb, trans); + dsa_switch_mdb_add_bitmap(ds, mdb, group); return 0; } @@ -157,13 +169,43 @@ static int dsa_switch_mdb_del(struct dsa_switch *ds, return 0; } +static int +dsa_switch_vlan_prepare_bitmap(struct dsa_switch *ds, + const struct switchdev_obj_port_vlan *vlan, + const unsigned long *bitmap) +{ + int port, err; + + if (!ds->ops->port_vlan_prepare || !ds->ops->port_vlan_add) + return -EOPNOTSUPP; + + for_each_set_bit(port, bitmap, ds->num_ports) { + err = ds->ops->port_vlan_prepare(ds, port, vlan); + if (err) + return err; + } + + return 0; +} + +static void +dsa_switch_vlan_add_bitmap(struct dsa_switch *ds, + const struct switchdev_obj_port_vlan *vlan, + const unsigned long *bitmap) +{ + int port; + + for_each_set_bit(port, bitmap, ds->num_ports) + ds->ops->port_vlan_add(ds, port, vlan); +} + static int dsa_switch_vlan_add(struct dsa_switch *ds, struct dsa_notifier_vlan_info *info) { const struct switchdev_obj_port_vlan *vlan = info->vlan; struct switchdev_trans *trans = info->trans; DECLARE_BITMAP(members, ds->num_ports); - int port, err; + int port; /* Build a mask of VLAN members */ bitmap_zero(members, ds->num_ports); @@ -173,21 +215,10 @@ static int dsa_switch_vlan_add(struct dsa_switch *ds, if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port)) set_bit(port, members); - if (switchdev_trans_ph_prepare(trans)) { - if (!ds->ops->port_vlan_prepare || !ds->ops->port_vlan_add) - return -EOPNOTSUPP; - - for_each_set_bit(port, members, ds->num_ports) { - err = ds->ops->port_vlan_prepare(ds, port, vlan, trans); - if (err) - return err; - } - - return 0; - } + if (switchdev_trans_ph_prepare(trans)) + return dsa_switch_vlan_prepare_bitmap(ds, vlan, members); - for_each_set_bit(port, members, ds->num_ports) - ds->ops->port_vlan_add(ds, port, vlan, trans); + dsa_switch_vlan_add_bitmap(ds, vlan, members); return 0; } diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index e7d15fb0d94d..f6f58108b4c5 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -19,6 +19,7 @@ #include <linux/slab.h> #include <linux/wait.h> #include <linux/vmalloc.h> +#include <linux/bootmem.h> #include <net/addrconf.h> #include <net/inet_connection_sock.h> @@ -168,6 +169,60 @@ int __inet_inherit_port(const struct sock *sk, struct sock *child) } EXPORT_SYMBOL_GPL(__inet_inherit_port); +static struct inet_listen_hashbucket * +inet_lhash2_bucket_sk(struct inet_hashinfo *h, struct sock *sk) +{ + u32 hash; + +#if IS_ENABLED(CONFIG_IPV6) + if (sk->sk_family == AF_INET6) + hash = ipv6_portaddr_hash(sock_net(sk), + &sk->sk_v6_rcv_saddr, + inet_sk(sk)->inet_num); + else +#endif + hash = ipv4_portaddr_hash(sock_net(sk), + inet_sk(sk)->inet_rcv_saddr, + inet_sk(sk)->inet_num); + return inet_lhash2_bucket(h, hash); +} + +static void inet_hash2(struct inet_hashinfo *h, struct sock *sk) +{ + struct inet_listen_hashbucket *ilb2; + + if (!h->lhash2) + return; + + ilb2 = inet_lhash2_bucket_sk(h, sk); + + spin_lock(&ilb2->lock); + if (sk->sk_reuseport && sk->sk_family == AF_INET6) + hlist_add_tail_rcu(&inet_csk(sk)->icsk_listen_portaddr_node, + &ilb2->head); + else + hlist_add_head_rcu(&inet_csk(sk)->icsk_listen_portaddr_node, + &ilb2->head); + ilb2->count++; + spin_unlock(&ilb2->lock); +} + +static void inet_unhash2(struct inet_hashinfo *h, struct sock *sk) +{ + struct inet_listen_hashbucket *ilb2; + + if (!h->lhash2 || + WARN_ON_ONCE(hlist_unhashed(&inet_csk(sk)->icsk_listen_portaddr_node))) + return; + + ilb2 = inet_lhash2_bucket_sk(h, sk); + + spin_lock(&ilb2->lock); + hlist_del_init_rcu(&inet_csk(sk)->icsk_listen_portaddr_node); + ilb2->count--; + spin_unlock(&ilb2->lock); +} + static inline int compute_score(struct sock *sk, struct net *net, const unsigned short hnum, const __be32 daddr, const int dif, const int sdif, bool exact_dif) @@ -207,6 +262,40 @@ static inline int compute_score(struct sock *sk, struct net *net, */ /* called with rcu_read_lock() : No refcount taken on the socket */ +static struct sock *inet_lhash2_lookup(struct net *net, + struct inet_listen_hashbucket *ilb2, + struct sk_buff *skb, int doff, + const __be32 saddr, __be16 sport, + const __be32 daddr, const unsigned short hnum, + const int dif, const int sdif) +{ + bool exact_dif = inet_exact_dif_match(net, skb); + struct inet_connection_sock *icsk; + struct sock *sk, *result = NULL; + int score, hiscore = 0; + u32 phash = 0; + + inet_lhash2_for_each_icsk_rcu(icsk, &ilb2->head) { + sk = (struct sock *)icsk; + score = compute_score(sk, net, hnum, daddr, + dif, sdif, exact_dif); + if (score > hiscore) { + if (sk->sk_reuseport) { + phash = inet_ehashfn(net, daddr, hnum, + saddr, sport); + result = reuseport_select_sock(sk, phash, + skb, doff); + if (result) + return result; + } + result = sk; + hiscore = score; + } + } + + return result; +} + struct sock *__inet_lookup_listener(struct net *net, struct inet_hashinfo *hashinfo, struct sk_buff *skb, int doff, @@ -216,32 +305,57 @@ struct sock *__inet_lookup_listener(struct net *net, { unsigned int hash = inet_lhashfn(net, hnum); struct inet_listen_hashbucket *ilb = &hashinfo->listening_hash[hash]; - int score, hiscore = 0, matches = 0, reuseport = 0; bool exact_dif = inet_exact_dif_match(net, skb); + struct inet_listen_hashbucket *ilb2; struct sock *sk, *result = NULL; + int score, hiscore = 0; + unsigned int hash2; u32 phash = 0; + if (ilb->count <= 10 || !hashinfo->lhash2) + goto port_lookup; + + /* Too many sk in the ilb bucket (which is hashed by port alone). + * Try lhash2 (which is hashed by port and addr) instead. + */ + + hash2 = ipv4_portaddr_hash(net, daddr, hnum); + ilb2 = inet_lhash2_bucket(hashinfo, hash2); + if (ilb2->count > ilb->count) + goto port_lookup; + + result = inet_lhash2_lookup(net, ilb2, skb, doff, + saddr, sport, daddr, hnum, + dif, sdif); + if (result) + return result; + + /* Lookup lhash2 with INADDR_ANY */ + + hash2 = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum); + ilb2 = inet_lhash2_bucket(hashinfo, hash2); + if (ilb2->count > ilb->count) + goto port_lookup; + + return inet_lhash2_lookup(net, ilb2, skb, doff, + saddr, sport, daddr, hnum, + dif, sdif); + +port_lookup: sk_for_each_rcu(sk, &ilb->head) { score = compute_score(sk, net, hnum, daddr, dif, sdif, exact_dif); if (score > hiscore) { - reuseport = sk->sk_reuseport; - if (reuseport) { + if (sk->sk_reuseport) { phash = inet_ehashfn(net, daddr, hnum, saddr, sport); result = reuseport_select_sock(sk, phash, skb, doff); if (result) return result; - matches = 1; } result = sk; hiscore = score; - } else if (score == hiscore && reuseport) { - matches++; - if (reciprocal_scale(phash, matches) == 0) - result = sk; - phash = next_pseudo_random32(phash); } } return result; @@ -483,6 +597,8 @@ int __inet_hash(struct sock *sk, struct sock *osk) hlist_add_tail_rcu(&sk->sk_node, &ilb->head); else hlist_add_head_rcu(&sk->sk_node, &ilb->head); + inet_hash2(hashinfo, sk); + ilb->count++; sock_set_flag(sk, SOCK_RCU_FREE); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); unlock: @@ -509,28 +625,35 @@ EXPORT_SYMBOL_GPL(inet_hash); void inet_unhash(struct sock *sk) { struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo; + struct inet_listen_hashbucket *ilb; spinlock_t *lock; bool listener = false; - int done; if (sk_unhashed(sk)) return; if (sk->sk_state == TCP_LISTEN) { - lock = &hashinfo->listening_hash[inet_sk_listen_hashfn(sk)].lock; + ilb = &hashinfo->listening_hash[inet_sk_listen_hashfn(sk)]; + lock = &ilb->lock; listener = true; } else { lock = inet_ehash_lockp(hashinfo, sk->sk_hash); } spin_lock_bh(lock); + if (sk_unhashed(sk)) + goto unlock; + if (rcu_access_pointer(sk->sk_reuseport_cb)) reuseport_detach_sock(sk); - if (listener) - done = __sk_del_node_init(sk); - else - done = __sk_nulls_del_node_init_rcu(sk); - if (done) - sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); + if (listener) { + inet_unhash2(hashinfo, sk); + __sk_del_node_init(sk); + ilb->count--; + } else { + __sk_nulls_del_node_init_rcu(sk); + } + sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); +unlock: spin_unlock_bh(lock); } EXPORT_SYMBOL_GPL(inet_unhash); @@ -665,10 +788,37 @@ void inet_hashinfo_init(struct inet_hashinfo *h) for (i = 0; i < INET_LHTABLE_SIZE; i++) { spin_lock_init(&h->listening_hash[i].lock); INIT_HLIST_HEAD(&h->listening_hash[i].head); + h->listening_hash[i].count = 0; } + + h->lhash2 = NULL; } EXPORT_SYMBOL_GPL(inet_hashinfo_init); +void __init inet_hashinfo2_init(struct inet_hashinfo *h, const char *name, + unsigned long numentries, int scale, + unsigned long low_limit, + unsigned long high_limit) +{ + unsigned int i; + + h->lhash2 = alloc_large_system_hash(name, + sizeof(*h->lhash2), + numentries, + scale, + 0, + NULL, + &h->lhash2_mask, + low_limit, + high_limit); + + for (i = 0; i <= h->lhash2_mask; i++) { + spin_lock_init(&h->lhash2[i].lock); + INIT_HLIST_HEAD(&h->lhash2[i].head); + h->lhash2[i].count = 0; + } +} + int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo) { unsigned int locksz = sizeof(spinlock_t); diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index bb6239169b1a..d828821d88d7 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -114,7 +114,8 @@ MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN"); static struct rtnl_link_ops ipgre_link_ops __read_mostly; static int ipgre_tunnel_init(struct net_device *dev); static void erspan_build_header(struct sk_buff *skb, - __be32 id, u32 index, bool truncate); + __be32 id, u32 index, + bool truncate, bool is_ipv4); static unsigned int ipgre_net_id __read_mostly; static unsigned int gre_tap_net_id __read_mostly; @@ -589,7 +590,7 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev, goto err_free_rt; erspan_build_header(skb, tunnel_id_to_key32(key->tun_id), - ntohl(md->index), truncate); + ntohl(md->index), truncate, true); gre_build_header(skb, 8, TUNNEL_SEQ, htons(ETH_P_ERSPAN), 0, htonl(tunnel->o_seqno++)); @@ -668,52 +669,6 @@ free_skb: return NETDEV_TX_OK; } -static inline u8 tos_to_cos(u8 tos) -{ - u8 dscp, cos; - - dscp = tos >> 2; - cos = dscp >> 3; - return cos; -} - -static void erspan_build_header(struct sk_buff *skb, - __be32 id, u32 index, bool truncate) -{ - struct iphdr *iphdr = ip_hdr(skb); - struct ethhdr *eth = eth_hdr(skb); - enum erspan_encap_type enc_type; - struct erspanhdr *ershdr; - struct qtag_prefix { - __be16 eth_type; - __be16 tci; - } *qp; - u16 vlan_tci = 0; - - enc_type = ERSPAN_ENCAP_NOVLAN; - - /* If mirrored packet has vlan tag, extract tci and - * perserve vlan header in the mirrored frame. - */ - if (eth->h_proto == htons(ETH_P_8021Q)) { - qp = (struct qtag_prefix *)(skb->data + 2 * ETH_ALEN); - vlan_tci = ntohs(qp->tci); - enc_type = ERSPAN_ENCAP_INFRAME; - } - - skb_push(skb, sizeof(*ershdr)); - ershdr = (struct erspanhdr *)skb->data; - memset(ershdr, 0, sizeof(*ershdr)); - - ershdr->ver_vlan = htons((vlan_tci & VLAN_MASK) | - (ERSPAN_VERSION << VER_OFFSET)); - ershdr->session_id = htons((u16)(ntohl(id) & ID_MASK) | - ((tos_to_cos(iphdr->tos) << COS_OFFSET) & COS_MASK) | - (enc_type << EN_OFFSET & EN_MASK) | - ((truncate << T_OFFSET) & T_MASK)); - ershdr->md.index = htonl(index & INDEX_MASK); -} - static netdev_tx_t erspan_xmit(struct sk_buff *skb, struct net_device *dev) { @@ -737,7 +692,8 @@ static netdev_tx_t erspan_xmit(struct sk_buff *skb, } /* Push ERSPAN header */ - erspan_build_header(skb, tunnel->parms.o_key, tunnel->index, truncate); + erspan_build_header(skb, tunnel->parms.o_key, tunnel->index, + truncate, true); tunnel->parms.o_flags &= ~TUNNEL_KEY; __gre_xmit(skb, dev, &tunnel->parms.iph, htons(ETH_P_ERSPAN)); return NETDEV_TX_OK; diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 43b69af242e1..f0ed031f3594 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1106,7 +1106,7 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) new = true; } - __ip_rt_update_pmtu((struct rtable *) rt->dst.path, &fl4, mtu); + __ip_rt_update_pmtu((struct rtable *) xfrm_dst_path(&rt->dst), &fl4, mtu); if (!dst_check(&rt->dst, 0)) { if (new) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index f08eebe60446..c470fec9062f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3578,6 +3578,9 @@ void __init tcp_init(void) percpu_counter_init(&tcp_sockets_allocated, 0, GFP_KERNEL); percpu_counter_init(&tcp_orphan_count, 0, GFP_KERNEL); inet_hashinfo_init(&tcp_hashinfo); + inet_hashinfo2_init(&tcp_hashinfo, "tcp_listen_portaddr_hash", + thash_entries, 21, /* one slot per 2 MB*/ + 0, 64 * 1024); tcp_hashinfo.bind_bucket_cachep = kmem_cache_create("tcp_bind_bucket", sizeof(struct inet_bind_bucket), 0, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index e4ff25c947c5..e9c0d1e1772e 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -357,18 +357,12 @@ fail: } EXPORT_SYMBOL(udp_lib_get_port); -static u32 udp4_portaddr_hash(const struct net *net, __be32 saddr, - unsigned int port) -{ - return jhash_1word((__force u32)saddr, net_hash_mix(net)) ^ port; -} - int udp_v4_get_port(struct sock *sk, unsigned short snum) { unsigned int hash2_nulladdr = - udp4_portaddr_hash(sock_net(sk), htonl(INADDR_ANY), snum); + ipv4_portaddr_hash(sock_net(sk), htonl(INADDR_ANY), snum); unsigned int hash2_partial = - udp4_portaddr_hash(sock_net(sk), inet_sk(sk)->inet_rcv_saddr, 0); + ipv4_portaddr_hash(sock_net(sk), inet_sk(sk)->inet_rcv_saddr, 0); /* precompute partial secondary hash */ udp_sk(sk)->udp_portaddr_hash = hash2_partial; @@ -445,7 +439,7 @@ static struct sock *udp4_lib_lookup2(struct net *net, struct sk_buff *skb) { struct sock *sk, *result; - int score, badness, matches = 0, reuseport = 0; + int score, badness; u32 hash = 0; result = NULL; @@ -454,23 +448,16 @@ static struct sock *udp4_lib_lookup2(struct net *net, score = compute_score(sk, net, saddr, sport, daddr, hnum, dif, sdif, exact_dif); if (score > badness) { - reuseport = sk->sk_reuseport; - if (reuseport) { + if (sk->sk_reuseport) { hash = udp_ehashfn(net, daddr, hnum, saddr, sport); result = reuseport_select_sock(sk, hash, skb, sizeof(struct udphdr)); if (result) return result; - matches = 1; } badness = score; result = sk; - } else if (score == badness && reuseport) { - matches++; - if (reciprocal_scale(hash, matches) == 0) - result = sk; - hash = next_pseudo_random32(hash); } } return result; @@ -488,11 +475,11 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, unsigned int hash2, slot2, slot = udp_hashfn(net, hnum, udptable->mask); struct udp_hslot *hslot2, *hslot = &udptable->hash[slot]; bool exact_dif = udp_lib_exact_dif_match(net, skb); - int score, badness, matches = 0, reuseport = 0; + int score, badness; u32 hash = 0; if (hslot->count > 10) { - hash2 = udp4_portaddr_hash(net, daddr, hnum); + hash2 = ipv4_portaddr_hash(net, daddr, hnum); slot2 = hash2 & udptable->mask; hslot2 = &udptable->hash2[slot2]; if (hslot->count < hslot2->count) @@ -503,7 +490,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, exact_dif, hslot2, skb); if (!result) { unsigned int old_slot2 = slot2; - hash2 = udp4_portaddr_hash(net, htonl(INADDR_ANY), hnum); + hash2 = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum); slot2 = hash2 & udptable->mask; /* avoid searching the same slot again. */ if (unlikely(slot2 == old_slot2)) @@ -526,23 +513,16 @@ begin: score = compute_score(sk, net, saddr, sport, daddr, hnum, dif, sdif, exact_dif); if (score > badness) { - reuseport = sk->sk_reuseport; - if (reuseport) { + if (sk->sk_reuseport) { hash = udp_ehashfn(net, daddr, hnum, saddr, sport); result = reuseport_select_sock(sk, hash, skb, sizeof(struct udphdr)); if (result) return result; - matches = 1; } result = sk; badness = score; - } else if (score == badness && reuseport) { - matches++; - if (reciprocal_scale(hash, matches) == 0) - result = sk; - hash = next_pseudo_random32(hash); } } return result; @@ -1775,7 +1755,7 @@ EXPORT_SYMBOL(udp_lib_rehash); static void udp_v4_rehash(struct sock *sk) { - u16 new_hash = udp4_portaddr_hash(sock_net(sk), + u16 new_hash = ipv4_portaddr_hash(sock_net(sk), inet_sk(sk)->inet_rcv_saddr, inet_sk(sk)->inet_num); udp_lib_rehash(sk, new_hash); @@ -1966,9 +1946,9 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb, struct sk_buff *nskb; if (use_hash2) { - hash2_any = udp4_portaddr_hash(net, htonl(INADDR_ANY), hnum) & + hash2_any = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum) & udptable->mask; - hash2 = udp4_portaddr_hash(net, daddr, hnum) & udptable->mask; + hash2 = ipv4_portaddr_hash(net, daddr, hnum) & udptable->mask; start_lookup: hslot = &udptable->hash2[hash2]; offset = offsetof(typeof(*sk), __sk_common.skc_portaddr_node); @@ -2200,7 +2180,7 @@ static struct sock *__udp4_lib_demux_lookup(struct net *net, int dif, int sdif) { unsigned short hnum = ntohs(loc_port); - unsigned int hash2 = udp4_portaddr_hash(net, loc_addr, hnum); + unsigned int hash2 = ipv4_portaddr_hash(net, loc_addr, hnum); unsigned int slot2 = hash2 & udp_table.mask; struct udp_hslot *hslot2 = &udp_table.hash2[slot2]; INET_ADDR_COOKIE(acookie, rmt_addr, loc_addr); diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c index e6265e2c274e..7d885a44dc9d 100644 --- a/net/ipv4/xfrm4_mode_tunnel.c +++ b/net/ipv4/xfrm4_mode_tunnel.c @@ -62,7 +62,7 @@ static int xfrm4_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb) top_iph->frag_off = (flags & XFRM_STATE_NOPMTUDISC) ? 0 : (XFRM_MODE_SKB_CB(skb)->frag_off & htons(IP_DF)); - top_iph->ttl = ip4_dst_hoplimit(dst->child); + top_iph->ttl = ip4_dst_hoplimit(xfrm_dst_child(dst)); top_iph->saddr = x->props.saddr.a4; top_iph->daddr = x->id.daddr.a4; diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index f49bd7897e95..ed06b1190f05 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -6595,27 +6595,45 @@ int __init addrconf_init(void) rtnl_af_register(&inet6_ops); - err = __rtnl_register(PF_INET6, RTM_GETLINK, NULL, inet6_dump_ifinfo, - 0); + err = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETLINK, + NULL, inet6_dump_ifinfo, 0); if (err < 0) goto errout; - /* Only the first call to __rtnl_register can fail */ - __rtnl_register(PF_INET6, RTM_NEWADDR, inet6_rtm_newaddr, NULL, 0); - __rtnl_register(PF_INET6, RTM_DELADDR, inet6_rtm_deladdr, NULL, 0); - __rtnl_register(PF_INET6, RTM_GETADDR, inet6_rtm_getaddr, - inet6_dump_ifaddr, RTNL_FLAG_DOIT_UNLOCKED); - __rtnl_register(PF_INET6, RTM_GETMULTICAST, NULL, - inet6_dump_ifmcaddr, 0); - __rtnl_register(PF_INET6, RTM_GETANYCAST, NULL, - inet6_dump_ifacaddr, 0); - __rtnl_register(PF_INET6, RTM_GETNETCONF, inet6_netconf_get_devconf, - inet6_netconf_dump_devconf, RTNL_FLAG_DOIT_UNLOCKED); - - ipv6_addr_label_rtnl_register(); + err = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_NEWADDR, + inet6_rtm_newaddr, NULL, 0); + if (err < 0) + goto errout; + err = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_DELADDR, + inet6_rtm_deladdr, NULL, 0); + if (err < 0) + goto errout; + err = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETADDR, + inet6_rtm_getaddr, inet6_dump_ifaddr, + RTNL_FLAG_DOIT_UNLOCKED); + if (err < 0) + goto errout; + err = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETMULTICAST, + NULL, inet6_dump_ifmcaddr, 0); + if (err < 0) + goto errout; + err = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETANYCAST, + NULL, inet6_dump_ifacaddr, 0); + if (err < 0) + goto errout; + err = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETNETCONF, + inet6_netconf_get_devconf, + inet6_netconf_dump_devconf, + RTNL_FLAG_DOIT_UNLOCKED); + if (err < 0) + goto errout; + err = ipv6_addr_label_rtnl_register(); + if (err < 0) + goto errout; return 0; errout: + rtnl_unregister_all(PF_INET6); rtnl_af_unregister(&inet6_ops); unregister_netdevice_notifier(&ipv6_dev_notf); errlo: diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c index 00e1f8ee08f8..1d6ced37ad71 100644 --- a/net/ipv6/addrlabel.c +++ b/net/ipv6/addrlabel.c @@ -547,13 +547,22 @@ static int ip6addrlbl_get(struct sk_buff *in_skb, struct nlmsghdr *nlh, return err; } -void __init ipv6_addr_label_rtnl_register(void) +int __init ipv6_addr_label_rtnl_register(void) { - __rtnl_register(PF_INET6, RTM_NEWADDRLABEL, ip6addrlbl_newdel, - NULL, RTNL_FLAG_DOIT_UNLOCKED); - __rtnl_register(PF_INET6, RTM_DELADDRLABEL, ip6addrlbl_newdel, - NULL, RTNL_FLAG_DOIT_UNLOCKED); - __rtnl_register(PF_INET6, RTM_GETADDRLABEL, ip6addrlbl_get, - ip6addrlbl_dump, RTNL_FLAG_DOIT_UNLOCKED); -} + int ret; + ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_NEWADDRLABEL, + ip6addrlbl_newdel, + NULL, RTNL_FLAG_DOIT_UNLOCKED); + if (ret < 0) + return ret; + ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_DELADDRLABEL, + ip6addrlbl_newdel, + NULL, RTNL_FLAG_DOIT_UNLOCKED); + if (ret < 0) + return ret; + ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETADDRLABEL, + ip6addrlbl_get, + ip6addrlbl_dump, RTNL_FLAG_DOIT_UNLOCKED); + return ret; +} diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index b01858f5deb1..2febe26de6a1 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -125,6 +125,40 @@ static inline int compute_score(struct sock *sk, struct net *net, } /* called with rcu_read_lock() */ +static struct sock *inet6_lhash2_lookup(struct net *net, + struct inet_listen_hashbucket *ilb2, + struct sk_buff *skb, int doff, + const struct in6_addr *saddr, + const __be16 sport, const struct in6_addr *daddr, + const unsigned short hnum, const int dif, const int sdif) +{ + bool exact_dif = inet6_exact_dif_match(net, skb); + struct inet_connection_sock *icsk; + struct sock *sk, *result = NULL; + int score, hiscore = 0; + u32 phash = 0; + + inet_lhash2_for_each_icsk_rcu(icsk, &ilb2->head) { + sk = (struct sock *)icsk; + score = compute_score(sk, net, hnum, daddr, dif, sdif, + exact_dif); + if (score > hiscore) { + if (sk->sk_reuseport) { + phash = inet6_ehashfn(net, daddr, hnum, + saddr, sport); + result = reuseport_select_sock(sk, phash, + skb, doff); + if (result) + return result; + } + result = sk; + hiscore = score; + } + } + + return result; +} + struct sock *inet6_lookup_listener(struct net *net, struct inet_hashinfo *hashinfo, struct sk_buff *skb, int doff, @@ -134,31 +168,56 @@ struct sock *inet6_lookup_listener(struct net *net, { unsigned int hash = inet_lhashfn(net, hnum); struct inet_listen_hashbucket *ilb = &hashinfo->listening_hash[hash]; - int score, hiscore = 0, matches = 0, reuseport = 0; bool exact_dif = inet6_exact_dif_match(net, skb); + struct inet_listen_hashbucket *ilb2; struct sock *sk, *result = NULL; + int score, hiscore = 0; + unsigned int hash2; u32 phash = 0; + if (ilb->count <= 10 || !hashinfo->lhash2) + goto port_lookup; + + /* Too many sk in the ilb bucket (which is hashed by port alone). + * Try lhash2 (which is hashed by port and addr) instead. + */ + + hash2 = ipv6_portaddr_hash(net, daddr, hnum); + ilb2 = inet_lhash2_bucket(hashinfo, hash2); + if (ilb2->count > ilb->count) + goto port_lookup; + + result = inet6_lhash2_lookup(net, ilb2, skb, doff, + saddr, sport, daddr, hnum, + dif, sdif); + if (result) + return result; + + /* Lookup lhash2 with in6addr_any */ + + hash2 = ipv6_portaddr_hash(net, &in6addr_any, hnum); + ilb2 = inet_lhash2_bucket(hashinfo, hash2); + if (ilb2->count > ilb->count) + goto port_lookup; + + return inet6_lhash2_lookup(net, ilb2, skb, doff, + saddr, sport, daddr, hnum, + dif, sdif); + +port_lookup: sk_for_each(sk, &ilb->head) { score = compute_score(sk, net, hnum, daddr, dif, sdif, exact_dif); if (score > hiscore) { - reuseport = sk->sk_reuseport; - if (reuseport) { + if (sk->sk_reuseport) { phash = inet6_ehashfn(net, daddr, hnum, saddr, sport); result = reuseport_select_sock(sk, phash, skb, doff); if (result) return result; - matches = 1; } result = sk; hiscore = score; - } else if (score == hiscore && reuseport) { - matches++; - if (reciprocal_scale(phash, matches) == 0) - result = sk; - phash = next_pseudo_random32(phash); } } return result; diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index f5285f4e1d08..a64d559fa513 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -893,7 +893,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt, ins = &fn->leaf; for (iter = leaf; iter; - iter = rcu_dereference_protected(iter->dst.rt6_next, + iter = rcu_dereference_protected(iter->rt6_next, lockdep_is_held(&rt->rt6i_table->tb6_lock))) { /* * Search for duplicates @@ -950,7 +950,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt, break; next_iter: - ins = &iter->dst.rt6_next; + ins = &iter->rt6_next; } if (fallback_ins && !found) { @@ -979,7 +979,7 @@ next_iter: &sibling->rt6i_siblings); break; } - sibling = rcu_dereference_protected(sibling->dst.rt6_next, + sibling = rcu_dereference_protected(sibling->rt6_next, lockdep_is_held(&rt->rt6i_table->tb6_lock)); } /* For each sibling in the list, increment the counter of @@ -1009,7 +1009,7 @@ add: if (err) return err; - rcu_assign_pointer(rt->dst.rt6_next, iter); + rcu_assign_pointer(rt->rt6_next, iter); atomic_inc(&rt->rt6i_ref); rcu_assign_pointer(rt->rt6i_node, fn); rcu_assign_pointer(*ins, rt); @@ -1040,7 +1040,7 @@ add: atomic_inc(&rt->rt6i_ref); rcu_assign_pointer(rt->rt6i_node, fn); - rt->dst.rt6_next = iter->dst.rt6_next; + rt->rt6_next = iter->rt6_next; rcu_assign_pointer(*ins, rt); call_fib6_entry_notifiers(info->nl_net, FIB_EVENT_ENTRY_REPLACE, rt, extack); @@ -1059,14 +1059,14 @@ add: if (nsiblings) { /* Replacing an ECMP route, remove all siblings */ - ins = &rt->dst.rt6_next; + ins = &rt->rt6_next; iter = rcu_dereference_protected(*ins, lockdep_is_held(&rt->rt6i_table->tb6_lock)); while (iter) { if (iter->rt6i_metric > rt->rt6i_metric) break; if (rt6_qualify_for_ecmp(iter)) { - *ins = iter->dst.rt6_next; + *ins = iter->rt6_next; iter->rt6i_node = NULL; fib6_purge_rt(iter, fn, info->nl_net); if (rcu_access_pointer(fn->rr_ptr) == iter) @@ -1075,7 +1075,7 @@ add: nsiblings--; info->nl_net->ipv6.rt6_stats->fib_rt_entries--; } else { - ins = &iter->dst.rt6_next; + ins = &iter->rt6_next; } iter = rcu_dereference_protected(*ins, lockdep_is_held(&rt->rt6i_table->tb6_lock)); @@ -1644,7 +1644,7 @@ static void fib6_del_route(struct fib6_table *table, struct fib6_node *fn, WARN_ON_ONCE(rt->rt6i_flags & RTF_CACHE); /* Unlink it */ - *rtp = rt->dst.rt6_next; + *rtp = rt->rt6_next; rt->rt6i_node = NULL; net->ipv6.rt6_stats->fib_rt_entries--; net->ipv6.rt6_stats->fib_discarded_routes++; @@ -1672,7 +1672,7 @@ static void fib6_del_route(struct fib6_table *table, struct fib6_node *fn, FOR_WALKERS(net, w) { if (w->state == FWS_C && w->leaf == rt) { RT6_TRACE("walker %p adjusted by delroute\n", w); - w->leaf = rcu_dereference_protected(rt->dst.rt6_next, + w->leaf = rcu_dereference_protected(rt->rt6_next, lockdep_is_held(&table->tb6_lock)); if (!w->leaf) w->state = FWS_U; @@ -1731,7 +1731,7 @@ int fib6_del(struct rt6_info *rt, struct nl_info *info) fib6_del_route(table, fn, rtp, info); return 0; } - rtp_next = &cur->dst.rt6_next; + rtp_next = &cur->rt6_next; } return -ENOENT; } @@ -2142,8 +2142,8 @@ int __init fib6_init(void) if (ret) goto out_kmem_cache_create; - ret = __rtnl_register(PF_INET6, RTM_GETROUTE, NULL, inet6_dump_fib, - 0); + ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETROUTE, NULL, + inet6_dump_fib, 0); if (ret) goto out_unregister_subsys; @@ -2208,7 +2208,7 @@ static int ipv6_route_yield(struct fib6_walker *w) do { iter->w.leaf = rcu_dereference_protected( - iter->w.leaf->dst.rt6_next, + iter->w.leaf->rt6_next, lockdep_is_held(&iter->tbl->tb6_lock)); iter->skip--; if (!iter->skip && iter->w.leaf) @@ -2274,7 +2274,7 @@ static void *ipv6_route_seq_next(struct seq_file *seq, void *v, loff_t *pos) if (!v) goto iter_table; - n = rcu_dereference_bh(((struct rt6_info *)v)->dst.rt6_next); + n = rcu_dereference_bh(((struct rt6_info *)v)->rt6_next); if (n) { ++*pos; return n; diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index 4cfd8e0696fe..4562579797d1 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -55,6 +55,8 @@ #include <net/ip6_route.h> #include <net/ip6_tunnel.h> #include <net/gre.h> +#include <net/erspan.h> +#include <net/dst_metadata.h> static bool log_ecn_error = true; @@ -68,11 +70,13 @@ static unsigned int ip6gre_net_id __read_mostly; struct ip6gre_net { struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE]; + struct ip6_tnl __rcu *collect_md_tun; struct net_device *fb_tunnel_dev; }; static struct rtnl_link_ops ip6gre_link_ops __read_mostly; static struct rtnl_link_ops ip6gre_tap_ops __read_mostly; +static struct rtnl_link_ops ip6erspan_tap_ops __read_mostly; static int ip6gre_tunnel_init(struct net_device *dev); static void ip6gre_tunnel_setup(struct net_device *dev); static void ip6gre_tunnel_link(struct ip6gre_net *ign, struct ip6_tnl *t); @@ -121,7 +125,8 @@ static struct ip6_tnl *ip6gre_tunnel_lookup(struct net_device *dev, unsigned int h1 = HASH_KEY(key); struct ip6_tnl *t, *cand = NULL; struct ip6gre_net *ign = net_generic(net, ip6gre_net_id); - int dev_type = (gre_proto == htons(ETH_P_TEB)) ? + int dev_type = (gre_proto == htons(ETH_P_TEB) || + gre_proto == htons(ETH_P_ERSPAN)) ? ARPHRD_ETHER : ARPHRD_IP6GRE; int score, cand_score = 4; @@ -226,6 +231,10 @@ static struct ip6_tnl *ip6gre_tunnel_lookup(struct net_device *dev, if (cand) return cand; + t = rcu_dereference(ign->collect_md_tun); + if (t && t->dev->flags & IFF_UP) + return t; + dev = ign->fb_tunnel_dev; if (dev->flags & IFF_UP) return netdev_priv(dev); @@ -261,6 +270,9 @@ static void ip6gre_tunnel_link(struct ip6gre_net *ign, struct ip6_tnl *t) { struct ip6_tnl __rcu **tp = ip6gre_bucket(ign, t); + if (t->parms.collect_md) + rcu_assign_pointer(ign->collect_md_tun, t); + rcu_assign_pointer(t->next, rtnl_dereference(*tp)); rcu_assign_pointer(*tp, t); } @@ -270,6 +282,9 @@ static void ip6gre_tunnel_unlink(struct ip6gre_net *ign, struct ip6_tnl *t) struct ip6_tnl __rcu **tp; struct ip6_tnl *iter; + if (t->parms.collect_md) + rcu_assign_pointer(ign->collect_md_tun, NULL); + for (tp = ip6gre_bucket(ign, t); (iter = rtnl_dereference(*tp)) != NULL; tp = &iter->next) { @@ -460,7 +475,86 @@ static int ip6gre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi) &ipv6h->saddr, &ipv6h->daddr, tpi->key, tpi->proto); if (tunnel) { - ip6_tnl_rcv(tunnel, skb, tpi, NULL, log_ecn_error); + if (tunnel->parms.collect_md) { + struct metadata_dst *tun_dst; + __be64 tun_id; + __be16 flags; + + flags = tpi->flags; + tun_id = key32_to_tunnel_id(tpi->key); + + tun_dst = ipv6_tun_rx_dst(skb, flags, tun_id, 0); + if (!tun_dst) + return PACKET_REJECT; + + ip6_tnl_rcv(tunnel, skb, tpi, tun_dst, log_ecn_error); + } else { + ip6_tnl_rcv(tunnel, skb, tpi, NULL, log_ecn_error); + } + + return PACKET_RCVD; + } + + return PACKET_REJECT; +} + +static int ip6erspan_rcv(struct sk_buff *skb, int gre_hdr_len, + struct tnl_ptk_info *tpi) +{ + const struct ipv6hdr *ipv6h; + struct erspanhdr *ershdr; + struct ip6_tnl *tunnel; + __be32 index; + + ipv6h = ipv6_hdr(skb); + ershdr = (struct erspanhdr *)skb->data; + + if (unlikely(!pskb_may_pull(skb, sizeof(*ershdr)))) + return PACKET_REJECT; + + tpi->key = cpu_to_be32(ntohs(ershdr->session_id) & ID_MASK); + index = ershdr->md.index; + + tunnel = ip6gre_tunnel_lookup(skb->dev, + &ipv6h->saddr, &ipv6h->daddr, tpi->key, + tpi->proto); + if (tunnel) { + if (__iptunnel_pull_header(skb, sizeof(*ershdr), + htons(ETH_P_TEB), + false, false) < 0) + return PACKET_REJECT; + + if (tunnel->parms.collect_md) { + struct metadata_dst *tun_dst; + struct ip_tunnel_info *info; + struct erspan_metadata *md; + __be64 tun_id; + __be16 flags; + + tpi->flags |= TUNNEL_KEY; + flags = tpi->flags; + tun_id = key32_to_tunnel_id(tpi->key); + + tun_dst = ipv6_tun_rx_dst(skb, flags, tun_id, + sizeof(*md)); + if (!tun_dst) + return PACKET_REJECT; + + info = &tun_dst->u.tun_info; + md = ip_tunnel_info_opts(info); + if (!md) + return PACKET_REJECT; + + md->index = index; + info->key.tun_flags |= TUNNEL_ERSPAN_OPT; + info->options_len = sizeof(*md); + + ip6_tnl_rcv(tunnel, skb, tpi, tun_dst, log_ecn_error); + + } else { + tunnel->parms.index = ntohl(index); + ip6_tnl_rcv(tunnel, skb, tpi, NULL, log_ecn_error); + } return PACKET_RCVD; } @@ -481,6 +575,12 @@ static int gre_rcv(struct sk_buff *skb) if (iptunnel_pull_header(skb, hdr_len, tpi.proto, false)) goto drop; + if (unlikely(tpi.proto == htons(ETH_P_ERSPAN))) { + if (ip6erspan_rcv(skb, hdr_len, &tpi) == PACKET_RCVD) + return 0; + goto drop; + } + if (ip6gre_rcv(skb, &tpi) == PACKET_RCVD) return 0; @@ -496,6 +596,78 @@ static int gre_handle_offloads(struct sk_buff *skb, bool csum) csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE); } +static void prepare_ip6gre_xmit_ipv4(struct sk_buff *skb, + struct net_device *dev, + struct flowi6 *fl6, __u8 *dsfield, + int *encap_limit) +{ + const struct iphdr *iph = ip_hdr(skb); + struct ip6_tnl *t = netdev_priv(dev); + + if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) + *encap_limit = t->parms.encap_limit; + + memcpy(fl6, &t->fl.u.ip6, sizeof(*fl6)); + + if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS) + *dsfield = ipv4_get_dsfield(iph); + else + *dsfield = ip6_tclass(t->parms.flowinfo); + + if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK) + fl6->flowi6_mark = skb->mark; + else + fl6->flowi6_mark = t->parms.fwmark; + + fl6->flowi6_uid = sock_net_uid(dev_net(dev), NULL); +} + +static int prepare_ip6gre_xmit_ipv6(struct sk_buff *skb, + struct net_device *dev, + struct flowi6 *fl6, __u8 *dsfield, + int *encap_limit) +{ + struct ipv6hdr *ipv6h = ipv6_hdr(skb); + struct ip6_tnl *t = netdev_priv(dev); + __u16 offset; + + offset = ip6_tnl_parse_tlv_enc_lim(skb, skb_network_header(skb)); + /* ip6_tnl_parse_tlv_enc_lim() might have reallocated skb->head */ + + if (offset > 0) { + struct ipv6_tlv_tnl_enc_lim *tel; + + tel = (struct ipv6_tlv_tnl_enc_lim *)&skb_network_header(skb)[offset]; + if (tel->encap_limit == 0) { + icmpv6_send(skb, ICMPV6_PARAMPROB, + ICMPV6_HDR_FIELD, offset + 2); + return -1; + } + *encap_limit = tel->encap_limit - 1; + } else if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) { + *encap_limit = t->parms.encap_limit; + } + + memcpy(fl6, &t->fl.u.ip6, sizeof(*fl6)); + + if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS) + *dsfield = ipv6_get_dsfield(ipv6h); + else + *dsfield = ip6_tclass(t->parms.flowinfo); + + if (t->parms.flags & IP6_TNL_F_USE_ORIG_FLOWLABEL) + fl6->flowlabel |= ip6_flowlabel(ipv6h); + + if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK) + fl6->flowi6_mark = skb->mark; + else + fl6->flowi6_mark = t->parms.fwmark; + + fl6->flowi6_uid = sock_net_uid(dev_net(dev), NULL); + + return 0; +} + static netdev_tx_t __gre6_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield, struct flowi6 *fl6, int encap_limit, @@ -517,8 +689,38 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb, /* Push GRE header. */ protocol = (dev->type == ARPHRD_ETHER) ? htons(ETH_P_TEB) : proto; - gre_build_header(skb, tunnel->tun_hlen, tunnel->parms.o_flags, - protocol, tunnel->parms.o_key, htonl(tunnel->o_seqno)); + + if (tunnel->parms.collect_md) { + struct ip_tunnel_info *tun_info; + const struct ip_tunnel_key *key; + __be16 flags; + + tun_info = skb_tunnel_info(skb); + if (unlikely(!tun_info || + !(tun_info->mode & IP_TUNNEL_INFO_TX) || + ip_tunnel_info_af(tun_info) != AF_INET6)) + return -EINVAL; + + key = &tun_info->key; + memset(fl6, 0, sizeof(*fl6)); + fl6->flowi6_proto = IPPROTO_GRE; + fl6->daddr = key->u.ipv6.dst; + fl6->flowlabel = key->label; + fl6->flowi6_uid = sock_net_uid(dev_net(dev), NULL); + + dsfield = key->tos; + flags = key->tun_flags & (TUNNEL_CSUM | TUNNEL_KEY); + tunnel->tun_hlen = gre_calc_hlen(flags); + + gre_build_header(skb, tunnel->tun_hlen, + flags, protocol, + tunnel_id_to_key32(tun_info->key.tun_id), 0); + + } else { + gre_build_header(skb, tunnel->tun_hlen, tunnel->parms.o_flags, + protocol, tunnel->parms.o_key, + htonl(tunnel->o_seqno)); + } return ip6_tnl_xmit(skb, dev, dsfield, fl6, encap_limit, pmtu, NEXTHDR_GRE); @@ -527,30 +729,17 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb, static inline int ip6gre_xmit_ipv4(struct sk_buff *skb, struct net_device *dev) { struct ip6_tnl *t = netdev_priv(dev); - const struct iphdr *iph = ip_hdr(skb); int encap_limit = -1; struct flowi6 fl6; - __u8 dsfield; + __u8 dsfield = 0; __u32 mtu; int err; memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt)); - if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) - encap_limit = t->parms.encap_limit; - - memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6)); - - if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS) - dsfield = ipv4_get_dsfield(iph); - else - dsfield = ip6_tclass(t->parms.flowinfo); - if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK) - fl6.flowi6_mark = skb->mark; - else - fl6.flowi6_mark = t->parms.fwmark; - - fl6.flowi6_uid = sock_net_uid(dev_net(dev), NULL); + if (!t->parms.collect_md) + prepare_ip6gre_xmit_ipv4(skb, dev, &fl6, + &dsfield, &encap_limit); err = gre_handle_offloads(skb, !!(t->parms.o_flags & TUNNEL_CSUM)); if (err) @@ -574,46 +763,17 @@ static inline int ip6gre_xmit_ipv6(struct sk_buff *skb, struct net_device *dev) struct ip6_tnl *t = netdev_priv(dev); struct ipv6hdr *ipv6h = ipv6_hdr(skb); int encap_limit = -1; - __u16 offset; struct flowi6 fl6; - __u8 dsfield; + __u8 dsfield = 0; __u32 mtu; int err; if (ipv6_addr_equal(&t->parms.raddr, &ipv6h->saddr)) return -1; - offset = ip6_tnl_parse_tlv_enc_lim(skb, skb_network_header(skb)); - /* ip6_tnl_parse_tlv_enc_lim() might have reallocated skb->head */ - ipv6h = ipv6_hdr(skb); - - if (offset > 0) { - struct ipv6_tlv_tnl_enc_lim *tel; - tel = (struct ipv6_tlv_tnl_enc_lim *)&skb_network_header(skb)[offset]; - if (tel->encap_limit == 0) { - icmpv6_send(skb, ICMPV6_PARAMPROB, - ICMPV6_HDR_FIELD, offset + 2); - return -1; - } - encap_limit = tel->encap_limit - 1; - } else if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) - encap_limit = t->parms.encap_limit; - - memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6)); - - if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS) - dsfield = ipv6_get_dsfield(ipv6h); - else - dsfield = ip6_tclass(t->parms.flowinfo); - - if (t->parms.flags & IP6_TNL_F_USE_ORIG_FLOWLABEL) - fl6.flowlabel |= ip6_flowlabel(ipv6h); - if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK) - fl6.flowi6_mark = skb->mark; - else - fl6.flowi6_mark = t->parms.fwmark; - - fl6.flowi6_uid = sock_net_uid(dev_net(dev), NULL); + if (!t->parms.collect_md && + prepare_ip6gre_xmit_ipv6(skb, dev, &fl6, &dsfield, &encap_limit)) + return -1; if (gre_handle_offloads(skb, !!(t->parms.o_flags & TUNNEL_CSUM))) return -1; @@ -660,7 +820,8 @@ static int ip6gre_xmit_other(struct sk_buff *skb, struct net_device *dev) if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) encap_limit = t->parms.encap_limit; - memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6)); + if (!t->parms.collect_md) + memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6)); err = gre_handle_offloads(skb, !!(t->parms.o_flags & TUNNEL_CSUM)); if (err) @@ -705,6 +866,119 @@ tx_err: return NETDEV_TX_OK; } +static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb, + struct net_device *dev) +{ + struct ipv6hdr *ipv6h = ipv6_hdr(skb); + struct ip6_tnl *t = netdev_priv(dev); + struct dst_entry *dst = skb_dst(skb); + struct net_device_stats *stats; + bool truncate = false; + int encap_limit = -1; + __u8 dsfield = false; + struct flowi6 fl6; + int err = -EINVAL; + __u32 mtu; + + if (!ip6_tnl_xmit_ctl(t, &t->parms.laddr, &t->parms.raddr)) + goto tx_err; + + if (gre_handle_offloads(skb, false)) + goto tx_err; + + if (skb->len > dev->mtu + dev->hard_header_len) { + pskb_trim(skb, dev->mtu + dev->hard_header_len); + truncate = true; + } + + t->parms.o_flags &= ~TUNNEL_KEY; + IPCB(skb)->flags = 0; + + /* For collect_md mode, derive fl6 from the tunnel key, + * for native mode, call prepare_ip6gre_xmit_{ipv4,ipv6}. + */ + if (t->parms.collect_md) { + struct ip_tunnel_info *tun_info; + const struct ip_tunnel_key *key; + struct erspan_metadata *md; + + tun_info = skb_tunnel_info(skb); + if (unlikely(!tun_info || + !(tun_info->mode & IP_TUNNEL_INFO_TX) || + ip_tunnel_info_af(tun_info) != AF_INET6)) + return -EINVAL; + + key = &tun_info->key; + memset(&fl6, 0, sizeof(fl6)); + fl6.flowi6_proto = IPPROTO_GRE; + fl6.daddr = key->u.ipv6.dst; + fl6.flowlabel = key->label; + fl6.flowi6_uid = sock_net_uid(dev_net(dev), NULL); + + dsfield = key->tos; + md = ip_tunnel_info_opts(tun_info); + if (!md) + goto tx_err; + + erspan_build_header(skb, tunnel_id_to_key32(key->tun_id), + ntohl(md->index), truncate, false); + + } else { + switch (skb->protocol) { + case htons(ETH_P_IP): + memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt)); + prepare_ip6gre_xmit_ipv4(skb, dev, &fl6, + &dsfield, &encap_limit); + break; + case htons(ETH_P_IPV6): + if (ipv6_addr_equal(&t->parms.raddr, &ipv6h->saddr)) + goto tx_err; + if (prepare_ip6gre_xmit_ipv6(skb, dev, &fl6, + &dsfield, &encap_limit)) + goto tx_err; + break; + default: + memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6)); + break; + } + + erspan_build_header(skb, t->parms.o_key, t->parms.index, + truncate, false); + fl6.daddr = t->parms.raddr; + } + + /* Push GRE header. */ + gre_build_header(skb, 8, TUNNEL_SEQ, + htons(ETH_P_ERSPAN), 0, htonl(t->o_seqno++)); + + /* TooBig packet may have updated dst->dev's mtu */ + if (!t->parms.collect_md && dst && dst_mtu(dst) > dst->dev->mtu) + dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu); + + err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu, + NEXTHDR_GRE); + if (err != 0) { + /* XXX: send ICMP error even if DF is not set. */ + if (err == -EMSGSIZE) { + if (skb->protocol == htons(ETH_P_IP)) + icmp_send(skb, ICMP_DEST_UNREACH, + ICMP_FRAG_NEEDED, htonl(mtu)); + else + icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu); + } + + goto tx_err; + } + return NETDEV_TX_OK; + +tx_err: + stats = &t->dev->stats; + stats->tx_errors++; + stats->tx_dropped++; + kfree_skb(skb); + return NETDEV_TX_OK; +} + static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu) { struct net_device *dev = t->dev; @@ -1048,6 +1322,11 @@ static int ip6gre_tunnel_init_common(struct net_device *dev) if (!(tunnel->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) dev->mtu -= 8; + if (tunnel->parms.collect_md) { + dev->features |= NETIF_F_NETNS_LOCAL; + netif_keep_dst(dev); + } + return 0; } @@ -1062,6 +1341,9 @@ static int ip6gre_tunnel_init(struct net_device *dev) tunnel = netdev_priv(dev); + if (tunnel->parms.collect_md) + return 0; + memcpy(dev->dev_addr, &tunnel->parms.laddr, sizeof(struct in6_addr)); memcpy(dev->broadcast, &tunnel->parms.raddr, sizeof(struct in6_addr)); @@ -1084,7 +1366,6 @@ static void ip6gre_fb_tunnel_init(struct net_device *dev) dev_hold(dev); } - static struct inet6_protocol ip6gre_protocol __read_mostly = { .handler = gre_rcv, .err_handler = ip6gre_err, @@ -1099,7 +1380,8 @@ static void ip6gre_destroy_tunnels(struct net *net, struct list_head *head) for_each_netdev_safe(net, dev, aux) if (dev->rtnl_link_ops == &ip6gre_link_ops || - dev->rtnl_link_ops == &ip6gre_tap_ops) + dev->rtnl_link_ops == &ip6gre_tap_ops || + dev->rtnl_link_ops == &ip6erspan_tap_ops) unregister_netdevice_queue(dev, head); for (prio = 0; prio < 4; prio++) { @@ -1221,6 +1503,47 @@ out: return ip6gre_tunnel_validate(tb, data, extack); } +static int ip6erspan_tap_validate(struct nlattr *tb[], struct nlattr *data[], + struct netlink_ext_ack *extack) +{ + __be16 flags = 0; + int ret; + + if (!data) + return 0; + + ret = ip6gre_tap_validate(tb, data, extack); + if (ret) + return ret; + + /* ERSPAN should only have GRE sequence and key flag */ + if (data[IFLA_GRE_OFLAGS]) + flags |= nla_get_be16(data[IFLA_GRE_OFLAGS]); + if (data[IFLA_GRE_IFLAGS]) + flags |= nla_get_be16(data[IFLA_GRE_IFLAGS]); + if (!data[IFLA_GRE_COLLECT_METADATA] && + flags != (GRE_SEQ | GRE_KEY)) + return -EINVAL; + + /* ERSPAN Session ID only has 10-bit. Since we reuse + * 32-bit key field as ID, check it's range. + */ + if (data[IFLA_GRE_IKEY] && + (ntohl(nla_get_be32(data[IFLA_GRE_IKEY])) & ~ID_MASK)) + return -EINVAL; + + if (data[IFLA_GRE_OKEY] && + (ntohl(nla_get_be32(data[IFLA_GRE_OKEY])) & ~ID_MASK)) + return -EINVAL; + + if (data[IFLA_GRE_ERSPAN_INDEX]) { + u32 index = nla_get_u32(data[IFLA_GRE_ERSPAN_INDEX]); + + if (index & ~INDEX_MASK) + return -EINVAL; + } + return 0; +} static void ip6gre_netlink_parms(struct nlattr *data[], struct __ip6_tnl_parm *parms) @@ -1267,6 +1590,12 @@ static void ip6gre_netlink_parms(struct nlattr *data[], if (data[IFLA_GRE_FWMARK]) parms->fwmark = nla_get_u32(data[IFLA_GRE_FWMARK]); + + if (data[IFLA_GRE_ERSPAN_INDEX]) + parms->index = nla_get_u32(data[IFLA_GRE_ERSPAN_INDEX]); + + if (data[IFLA_GRE_COLLECT_METADATA]) + parms->collect_md = true; } static int ip6gre_tap_init(struct net_device *dev) @@ -1303,6 +1632,59 @@ static const struct net_device_ops ip6gre_tap_netdev_ops = { NETIF_F_HIGHDMA | \ NETIF_F_HW_CSUM) +static int ip6erspan_tap_init(struct net_device *dev) +{ + struct ip6_tnl *tunnel; + int t_hlen; + int ret; + + tunnel = netdev_priv(dev); + + tunnel->dev = dev; + tunnel->net = dev_net(dev); + strcpy(tunnel->parms.name, dev->name); + + dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats); + if (!dev->tstats) + return -ENOMEM; + + ret = dst_cache_init(&tunnel->dst_cache, GFP_KERNEL); + if (ret) { + free_percpu(dev->tstats); + dev->tstats = NULL; + return ret; + } + + tunnel->tun_hlen = 8; + tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen + + sizeof(struct erspanhdr); + t_hlen = tunnel->hlen + sizeof(struct ipv6hdr); + + dev->hard_header_len = LL_MAX_HEADER + t_hlen; + dev->mtu = ETH_DATA_LEN - t_hlen; + if (dev->type == ARPHRD_ETHER) + dev->mtu -= ETH_HLEN; + if (!(tunnel->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) + dev->mtu -= 8; + + dev->priv_flags |= IFF_LIVE_ADDR_CHANGE; + tunnel = netdev_priv(dev); + ip6gre_tnl_link_config(tunnel, 1); + + return 0; +} + +static const struct net_device_ops ip6erspan_netdev_ops = { + .ndo_init = ip6erspan_tap_init, + .ndo_uninit = ip6gre_tunnel_uninit, + .ndo_start_xmit = ip6erspan_tunnel_xmit, + .ndo_set_mac_address = eth_mac_addr, + .ndo_validate_addr = eth_validate_addr, + .ndo_change_mtu = ip6_tnl_change_mtu, + .ndo_get_stats64 = ip_tunnel_get_stats64, + .ndo_get_iflink = ip6_tnl_get_iflink, +}; + static void ip6gre_tap_setup(struct net_device *dev) { @@ -1372,8 +1754,13 @@ static int ip6gre_newlink(struct net *src_net, struct net_device *dev, ip6gre_netlink_parms(data, &nt->parms); - if (ip6gre_tunnel_find(net, &nt->parms, dev->type)) - return -EEXIST; + if (nt->parms.collect_md) { + if (rtnl_dereference(ign->collect_md_tun)) + return -EEXIST; + } else { + if (ip6gre_tunnel_find(net, &nt->parms, dev->type)) + return -EEXIST; + } if (dev->type == ARPHRD_ETHER && !tb[IFLA_ADDRESS]) eth_hw_addr_random(dev); @@ -1492,8 +1879,12 @@ static size_t ip6gre_get_size(const struct net_device *dev) nla_total_size(2) + /* IFLA_GRE_ENCAP_DPORT */ nla_total_size(2) + + /* IFLA_GRE_COLLECT_METADATA */ + nla_total_size(0) + /* IFLA_GRE_FWMARK */ nla_total_size(4) + + /* IFLA_GRE_ERSPAN_INDEX */ + nla_total_size(4) + 0; } @@ -1515,7 +1906,8 @@ static int ip6gre_fill_info(struct sk_buff *skb, const struct net_device *dev) nla_put_u8(skb, IFLA_GRE_ENCAP_LIMIT, p->encap_limit) || nla_put_be32(skb, IFLA_GRE_FLOWINFO, p->flowinfo) || nla_put_u32(skb, IFLA_GRE_FLAGS, p->flags) || - nla_put_u32(skb, IFLA_GRE_FWMARK, p->fwmark)) + nla_put_u32(skb, IFLA_GRE_FWMARK, p->fwmark) || + nla_put_u32(skb, IFLA_GRE_ERSPAN_INDEX, p->index)) goto nla_put_failure; if (nla_put_u16(skb, IFLA_GRE_ENCAP_TYPE, @@ -1528,6 +1920,11 @@ static int ip6gre_fill_info(struct sk_buff *skb, const struct net_device *dev) t->encap.flags)) goto nla_put_failure; + if (p->collect_md) { + if (nla_put_flag(skb, IFLA_GRE_COLLECT_METADATA)) + goto nla_put_failure; + } + return 0; nla_put_failure: @@ -1550,9 +1947,25 @@ static const struct nla_policy ip6gre_policy[IFLA_GRE_MAX + 1] = { [IFLA_GRE_ENCAP_FLAGS] = { .type = NLA_U16 }, [IFLA_GRE_ENCAP_SPORT] = { .type = NLA_U16 }, [IFLA_GRE_ENCAP_DPORT] = { .type = NLA_U16 }, + [IFLA_GRE_COLLECT_METADATA] = { .type = NLA_FLAG }, [IFLA_GRE_FWMARK] = { .type = NLA_U32 }, + [IFLA_GRE_ERSPAN_INDEX] = { .type = NLA_U32 }, }; +static void ip6erspan_tap_setup(struct net_device *dev) +{ + ether_setup(dev); + + dev->netdev_ops = &ip6erspan_netdev_ops; + dev->needs_free_netdev = true; + dev->priv_destructor = ip6gre_dev_free; + + dev->features |= NETIF_F_NETNS_LOCAL; + dev->priv_flags &= ~IFF_TX_SKB_SHARING; + dev->priv_flags |= IFF_LIVE_ADDR_CHANGE; + netif_keep_dst(dev); +} + static struct rtnl_link_ops ip6gre_link_ops __read_mostly = { .kind = "ip6gre", .maxtype = IFLA_GRE_MAX, @@ -1582,6 +1995,20 @@ static struct rtnl_link_ops ip6gre_tap_ops __read_mostly = { .get_link_net = ip6_tnl_get_link_net, }; +static struct rtnl_link_ops ip6erspan_tap_ops __read_mostly = { + .kind = "ip6erspan", + .maxtype = IFLA_GRE_MAX, + .policy = ip6gre_policy, + .priv_size = sizeof(struct ip6_tnl), + .setup = ip6erspan_tap_setup, + .validate = ip6erspan_tap_validate, + .newlink = ip6gre_newlink, + .changelink = ip6gre_changelink, + .get_size = ip6gre_get_size, + .fill_info = ip6gre_fill_info, + .get_link_net = ip6_tnl_get_link_net, +}; + /* * And now the modules code and kernel interface. */ @@ -1610,9 +2037,15 @@ static int __init ip6gre_init(void) if (err < 0) goto tap_ops_failed; + err = rtnl_link_register(&ip6erspan_tap_ops); + if (err < 0) + goto erspan_link_failed; + out: return err; +erspan_link_failed: + rtnl_link_unregister(&ip6gre_tap_ops); tap_ops_failed: rtnl_link_unregister(&ip6gre_link_ops); rtnl_link_failed: @@ -1626,6 +2059,7 @@ static void __exit ip6gre_fini(void) { rtnl_link_unregister(&ip6gre_tap_ops); rtnl_link_unregister(&ip6gre_link_ops); + rtnl_link_unregister(&ip6erspan_tap_ops); inet6_del_protocol(&ip6gre_protocol, IPPROTO_GRE); unregister_pernet_device(&ip6gre_net_ops); } diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 5110a418cc4d..176d74fb3b4d 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1201,13 +1201,13 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork, rt->dst.dev->mtu : dst_mtu(&rt->dst); else mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ? - rt->dst.dev->mtu : dst_mtu(rt->dst.path); + rt->dst.dev->mtu : dst_mtu(xfrm_dst_path(&rt->dst)); if (np->frag_size < mtu) { if (np->frag_size) mtu = np->frag_size; } cork->base.fragsize = mtu; - if (dst_allfrag(rt->dst.path)) + if (dst_allfrag(xfrm_dst_path(&rt->dst))) cork->base.flags |= IPCORK_ALLFRAG; cork->base.length = 0; diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index db84f523656d..6ff2f21ae3fc 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -861,7 +861,7 @@ int ip6_tnl_rcv(struct ip6_tnl *t, struct sk_buff *skb, struct metadata_dst *tun_dst, bool log_ecn_err) { - return __ip6_tnl_rcv(t, skb, tpi, NULL, ip6ip6_dscp_ecn_decapsulate, + return __ip6_tnl_rcv(t, skb, tpi, tun_dst, ip6ip6_dscp_ecn_decapsulate, log_ecn_err); } EXPORT_SYMBOL(ip6_tnl_rcv); @@ -979,6 +979,9 @@ int ip6_tnl_xmit_ctl(struct ip6_tnl *t, int ret = 0; struct net *net = t->net; + if (t->parms.collect_md) + return 1; + if ((p->flags & IP6_TNL_F_CAP_XMIT) || ((p->flags & IP6_TNL_F_CAP_PER_PACKET) && (ip6_tnl_get_cap(t, laddr, raddr) & IP6_TNL_F_CAP_XMIT))) { diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index a2e1a864eb46..890f9bda06a4 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -1425,10 +1425,13 @@ int __init ip6_mr_init(void) goto add_proto_fail; } #endif - rtnl_register(RTNL_FAMILY_IP6MR, RTM_GETROUTE, NULL, - ip6mr_rtm_dumproute, 0); - return 0; + err = rtnl_register_module(THIS_MODULE, RTNL_FAMILY_IP6MR, RTM_GETROUTE, + NULL, ip6mr_rtm_dumproute, 0); + if (err == 0) + return 0; + #ifdef CONFIG_IPV6_PIMSM_V2 + inet6_del_protocol(&pim6_protocol, IPPROTO_PIM); add_proto_fail: unregister_netdevice_notifier(&ip6_mr_notifier); #endif diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 7a8d1500d374..b3f4d19b3ca5 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -186,7 +186,7 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev) static u32 *rt6_pcpu_cow_metrics(struct rt6_info *rt) { - return dst_metrics_write_ptr(rt->dst.from); + return dst_metrics_write_ptr(&rt->from->dst); } static u32 *ipv6_cow_metrics(struct dst_entry *dst, unsigned long old) @@ -391,7 +391,7 @@ static void ip6_dst_destroy(struct dst_entry *dst) { struct rt6_info *rt = (struct rt6_info *)dst; struct rt6_exception_bucket *bucket; - struct dst_entry *from = dst->from; + struct rt6_info *from = rt->from; struct inet6_dev *idev; dst_destroy_metrics_generic(dst); @@ -409,8 +409,8 @@ static void ip6_dst_destroy(struct dst_entry *dst) kfree(bucket); } - dst->from = NULL; - dst_release(from); + rt->from = NULL; + dst_release(&from->dst); } static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev, @@ -443,9 +443,9 @@ static bool rt6_check_expired(const struct rt6_info *rt) if (rt->rt6i_flags & RTF_EXPIRES) { if (time_after(jiffies, rt->dst.expires)) return true; - } else if (rt->dst.from) { + } else if (rt->from) { return rt->dst.obsolete != DST_OBSOLETE_FORCE_CHK || - rt6_check_expired((struct rt6_info *)rt->dst.from); + rt6_check_expired(rt->from); } return false; } @@ -502,7 +502,7 @@ static inline struct rt6_info *rt6_device_match(struct net *net, if (!oif && ipv6_addr_any(saddr)) goto out; - for (sprt = rt; sprt; sprt = rcu_dereference(sprt->dst.rt6_next)) { + for (sprt = rt; sprt; sprt = rcu_dereference(sprt->rt6_next)) { struct net_device *dev = sprt->dst.dev; if (oif) { @@ -721,7 +721,7 @@ static struct rt6_info *find_rr_leaf(struct fib6_node *fn, match = NULL; cont = NULL; - for (rt = rr_head; rt; rt = rcu_dereference(rt->dst.rt6_next)) { + for (rt = rr_head; rt; rt = rcu_dereference(rt->rt6_next)) { if (rt->rt6i_metric != metric) { cont = rt; break; @@ -731,7 +731,7 @@ static struct rt6_info *find_rr_leaf(struct fib6_node *fn, } for (rt = leaf; rt && rt != rr_head; - rt = rcu_dereference(rt->dst.rt6_next)) { + rt = rcu_dereference(rt->rt6_next)) { if (rt->rt6i_metric != metric) { cont = rt; break; @@ -743,7 +743,7 @@ static struct rt6_info *find_rr_leaf(struct fib6_node *fn, if (match || !cont) return match; - for (rt = cont; rt; rt = rcu_dereference(rt->dst.rt6_next)) + for (rt = cont; rt; rt = rcu_dereference(rt->rt6_next)) match = find_match(rt, oif, strict, &mpri, match, do_rr); return match; @@ -781,7 +781,7 @@ static struct rt6_info *rt6_select(struct net *net, struct fib6_node *fn, &do_rr); if (do_rr) { - struct rt6_info *next = rcu_dereference(rt0->dst.rt6_next); + struct rt6_info *next = rcu_dereference(rt0->rt6_next); /* no entries matched; do round-robin */ if (!next || next->rt6i_metric != rt0->rt6i_metric) @@ -1054,7 +1054,7 @@ static struct rt6_info *ip6_rt_cache_alloc(struct rt6_info *ort, */ if (ort->rt6i_flags & (RTF_CACHE | RTF_PCPU)) - ort = (struct rt6_info *)ort->dst.from; + ort = ort->from; rcu_read_lock(); dev = ip6_rt_get_dev_rcu(ort); @@ -1274,7 +1274,7 @@ static int rt6_insert_exception(struct rt6_info *nrt, /* ort can't be a cache or pcpu route */ if (ort->rt6i_flags & (RTF_CACHE | RTF_PCPU)) - ort = (struct rt6_info *)ort->dst.from; + ort = ort->from; WARN_ON_ONCE(ort->rt6i_flags & (RTF_CACHE | RTF_PCPU)); spin_lock_bh(&rt6_exception_lock); @@ -1415,8 +1415,8 @@ static struct rt6_info *rt6_find_cached_rt(struct rt6_info *rt, /* Remove the passed in cached rt from the hash table that contains it */ int rt6_remove_exception_rt(struct rt6_info *rt) { - struct rt6_info *from = (struct rt6_info *)rt->dst.from; struct rt6_exception_bucket *bucket; + struct rt6_info *from = rt->from; struct in6_addr *src_key = NULL; struct rt6_exception *rt6_ex; int err; @@ -1460,8 +1460,8 @@ int rt6_remove_exception_rt(struct rt6_info *rt) */ static void rt6_update_exception_stamp_rt(struct rt6_info *rt) { - struct rt6_info *from = (struct rt6_info *)rt->dst.from; struct rt6_exception_bucket *bucket; + struct rt6_info *from = rt->from; struct in6_addr *src_key = NULL; struct rt6_exception *rt6_ex; @@ -1929,9 +1929,9 @@ struct dst_entry *ip6_blackhole_route(struct net *net, struct dst_entry *dst_ori static void rt6_dst_from_metrics_check(struct rt6_info *rt) { - if (rt->dst.from && - dst_metrics_ptr(&rt->dst) != dst_metrics_ptr(rt->dst.from)) - dst_init_metrics(&rt->dst, dst_metrics_ptr(rt->dst.from), true); + if (rt->from && + dst_metrics_ptr(&rt->dst) != dst_metrics_ptr(&rt->from->dst)) + dst_init_metrics(&rt->dst, dst_metrics_ptr(&rt->from->dst), true); } static struct dst_entry *rt6_check(struct rt6_info *rt, u32 cookie) @@ -1951,7 +1951,7 @@ static struct dst_entry *rt6_dst_from_check(struct rt6_info *rt, u32 cookie) { if (!__rt6_check_expired(rt) && rt->dst.obsolete == DST_OBSOLETE_FORCE_CHK && - rt6_check((struct rt6_info *)(rt->dst.from), cookie)) + rt6_check(rt->from, cookie)) return &rt->dst; else return NULL; @@ -1971,7 +1971,7 @@ static struct dst_entry *ip6_dst_check(struct dst_entry *dst, u32 cookie) rt6_dst_from_metrics_check(rt); if (rt->rt6i_flags & RTF_PCPU || - (unlikely(!list_empty(&rt->rt6i_uncached)) && rt->dst.from)) + (unlikely(!list_empty(&rt->rt6i_uncached)) && rt->from)) return rt6_dst_from_check(rt, cookie); else return rt6_check(rt, cookie); @@ -3055,11 +3055,11 @@ out: static void rt6_set_from(struct rt6_info *rt, struct rt6_info *from) { - BUG_ON(from->dst.from); + BUG_ON(from->from); rt->rt6i_flags &= ~RTF_EXPIRES; dst_hold(&from->dst); - rt->dst.from = &from->dst; + rt->from = from; dst_init_metrics(&rt->dst, dst_metrics_ptr(&from->dst), true); } @@ -4596,8 +4596,6 @@ static int __net_init ip6_route_net_init(struct net *net) GFP_KERNEL); if (!net->ipv6.ip6_null_entry) goto out_ip6_dst_entries; - net->ipv6.ip6_null_entry->dst.path = - (struct dst_entry *)net->ipv6.ip6_null_entry; net->ipv6.ip6_null_entry->dst.ops = &net->ipv6.ip6_dst_ops; dst_init_metrics(&net->ipv6.ip6_null_entry->dst, ip6_template_metrics, true); @@ -4609,8 +4607,6 @@ static int __net_init ip6_route_net_init(struct net *net) GFP_KERNEL); if (!net->ipv6.ip6_prohibit_entry) goto out_ip6_null_entry; - net->ipv6.ip6_prohibit_entry->dst.path = - (struct dst_entry *)net->ipv6.ip6_prohibit_entry; net->ipv6.ip6_prohibit_entry->dst.ops = &net->ipv6.ip6_dst_ops; dst_init_metrics(&net->ipv6.ip6_prohibit_entry->dst, ip6_template_metrics, true); @@ -4620,8 +4616,6 @@ static int __net_init ip6_route_net_init(struct net *net) GFP_KERNEL); if (!net->ipv6.ip6_blk_hole_entry) goto out_ip6_prohibit_entry; - net->ipv6.ip6_blk_hole_entry->dst.path = - (struct dst_entry *)net->ipv6.ip6_blk_hole_entry; net->ipv6.ip6_blk_hole_entry->dst.ops = &net->ipv6.ip6_dst_ops; dst_init_metrics(&net->ipv6.ip6_blk_hole_entry->dst, ip6_template_metrics, true); @@ -4778,11 +4772,20 @@ int __init ip6_route_init(void) if (ret) goto fib6_rules_init; - ret = -ENOBUFS; - if (__rtnl_register(PF_INET6, RTM_NEWROUTE, inet6_rtm_newroute, NULL, 0) || - __rtnl_register(PF_INET6, RTM_DELROUTE, inet6_rtm_delroute, NULL, 0) || - __rtnl_register(PF_INET6, RTM_GETROUTE, inet6_rtm_getroute, NULL, - RTNL_FLAG_DOIT_UNLOCKED)) + ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_NEWROUTE, + inet6_rtm_newroute, NULL, 0); + if (ret < 0) + goto out_register_late_subsys; + + ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_DELROUTE, + inet6_rtm_delroute, NULL, 0); + if (ret < 0) + goto out_register_late_subsys; + + ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETROUTE, + inet6_rtm_getroute, NULL, + RTNL_FLAG_DOIT_UNLOCKED); + if (ret < 0) goto out_register_late_subsys; ret = register_netdevice_notifier(&ip6_route_dev_notifier); @@ -4800,6 +4803,7 @@ out: return ret; out_register_late_subsys: + rtnl_unregister_all(PF_INET6); unregister_pernet_subsys(&ip6_route_net_late_ops); fib6_rules_init: fib6_rules_cleanup(); diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 3f30fa313bf2..eecf9f0faf29 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -89,28 +89,12 @@ static u32 udp6_ehashfn(const struct net *net, udp_ipv6_hash_secret + net_hash_mix(net)); } -static u32 udp6_portaddr_hash(const struct net *net, - const struct in6_addr *addr6, - unsigned int port) -{ - unsigned int hash, mix = net_hash_mix(net); - - if (ipv6_addr_any(addr6)) - hash = jhash_1word(0, mix); - else if (ipv6_addr_v4mapped(addr6)) - hash = jhash_1word((__force u32)addr6->s6_addr32[3], mix); - else - hash = jhash2((__force u32 *)addr6->s6_addr32, 4, mix); - - return hash ^ port; -} - int udp_v6_get_port(struct sock *sk, unsigned short snum) { unsigned int hash2_nulladdr = - udp6_portaddr_hash(sock_net(sk), &in6addr_any, snum); + ipv6_portaddr_hash(sock_net(sk), &in6addr_any, snum); unsigned int hash2_partial = - udp6_portaddr_hash(sock_net(sk), &sk->sk_v6_rcv_saddr, 0); + ipv6_portaddr_hash(sock_net(sk), &sk->sk_v6_rcv_saddr, 0); /* precompute partial secondary hash */ udp_sk(sk)->udp_portaddr_hash = hash2_partial; @@ -119,7 +103,7 @@ int udp_v6_get_port(struct sock *sk, unsigned short snum) static void udp_v6_rehash(struct sock *sk) { - u16 new_hash = udp6_portaddr_hash(sock_net(sk), + u16 new_hash = ipv6_portaddr_hash(sock_net(sk), &sk->sk_v6_rcv_saddr, inet_sk(sk)->inet_num); @@ -184,7 +168,7 @@ static struct sock *udp6_lib_lookup2(struct net *net, struct udp_hslot *hslot2, struct sk_buff *skb) { struct sock *sk, *result; - int score, badness, matches = 0, reuseport = 0; + int score, badness; u32 hash = 0; result = NULL; @@ -193,8 +177,7 @@ static struct sock *udp6_lib_lookup2(struct net *net, score = compute_score(sk, net, saddr, sport, daddr, hnum, dif, sdif, exact_dif); if (score > badness) { - reuseport = sk->sk_reuseport; - if (reuseport) { + if (sk->sk_reuseport) { hash = udp6_ehashfn(net, daddr, hnum, saddr, sport); @@ -202,15 +185,9 @@ static struct sock *udp6_lib_lookup2(struct net *net, sizeof(struct udphdr)); if (result) return result; - matches = 1; } result = sk; badness = score; - } else if (score == badness && reuseport) { - matches++; - if (reciprocal_scale(hash, matches) == 0) - result = sk; - hash = next_pseudo_random32(hash); } } return result; @@ -228,11 +205,11 @@ struct sock *__udp6_lib_lookup(struct net *net, unsigned int hash2, slot2, slot = udp_hashfn(net, hnum, udptable->mask); struct udp_hslot *hslot2, *hslot = &udptable->hash[slot]; bool exact_dif = udp6_lib_exact_dif_match(net, skb); - int score, badness, matches = 0, reuseport = 0; + int score, badness; u32 hash = 0; if (hslot->count > 10) { - hash2 = udp6_portaddr_hash(net, daddr, hnum); + hash2 = ipv6_portaddr_hash(net, daddr, hnum); slot2 = hash2 & udptable->mask; hslot2 = &udptable->hash2[slot2]; if (hslot->count < hslot2->count) @@ -243,7 +220,7 @@ struct sock *__udp6_lib_lookup(struct net *net, hslot2, skb); if (!result) { unsigned int old_slot2 = slot2; - hash2 = udp6_portaddr_hash(net, &in6addr_any, hnum); + hash2 = ipv6_portaddr_hash(net, &in6addr_any, hnum); slot2 = hash2 & udptable->mask; /* avoid searching the same slot again. */ if (unlikely(slot2 == old_slot2)) @@ -267,23 +244,16 @@ begin: score = compute_score(sk, net, saddr, sport, daddr, hnum, dif, sdif, exact_dif); if (score > badness) { - reuseport = sk->sk_reuseport; - if (reuseport) { + if (sk->sk_reuseport) { hash = udp6_ehashfn(net, daddr, hnum, saddr, sport); result = reuseport_select_sock(sk, hash, skb, sizeof(struct udphdr)); if (result) return result; - matches = 1; } result = sk; badness = score; - } else if (score == badness && reuseport) { - matches++; - if (reciprocal_scale(hash, matches) == 0) - result = sk; - hash = next_pseudo_random32(hash); } } return result; @@ -719,9 +689,9 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb, struct sk_buff *nskb; if (use_hash2) { - hash2_any = udp6_portaddr_hash(net, &in6addr_any, hnum) & + hash2_any = ipv6_portaddr_hash(net, &in6addr_any, hnum) & udptable->mask; - hash2 = udp6_portaddr_hash(net, daddr, hnum) & udptable->mask; + hash2 = ipv6_portaddr_hash(net, daddr, hnum) & udptable->mask; start_lookup: hslot = &udptable->hash2[hash2]; offset = offsetof(typeof(*sk), __sk_common.skc_portaddr_node); @@ -909,7 +879,7 @@ static struct sock *__udp6_lib_demux_lookup(struct net *net, int dif, int sdif) { unsigned short hnum = ntohs(loc_port); - unsigned int hash2 = udp6_portaddr_hash(net, loc_addr, hnum); + unsigned int hash2 = ipv6_portaddr_hash(net, loc_addr, hnum); unsigned int slot2 = hash2 & udp_table.mask; struct udp_hslot *hslot2 = &udp_table.hash2[slot2]; const __portpair ports = INET_COMBINED_PORTS(rmt_port, hnum); diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c index 02556e356f87..e66b94f46532 100644 --- a/net/ipv6/xfrm6_mode_tunnel.c +++ b/net/ipv6/xfrm6_mode_tunnel.c @@ -59,7 +59,7 @@ static int xfrm6_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb) if (x->props.flags & XFRM_STATE_NOECN) dsfield &= ~INET_ECN_MASK; ipv6_change_dsfield(top_iph, 0, dsfield); - top_iph->hop_limit = ip6_dst_hoplimit(dst->child); + top_iph->hop_limit = ip6_dst_hoplimit(xfrm_dst_child(dst)); top_iph->saddr = *(struct in6_addr *)&x->props.saddr; top_iph->daddr = *(struct in6_addr *)&x->id.daddr; return 0; diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index 885ade234a49..09fb44ee3b45 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -265,7 +265,7 @@ static void xfrm6_dst_ifdown(struct dst_entry *dst, struct net_device *dev, in6_dev_put(xdst->u.rt6.rt6i_idev); xdst->u.rt6.rt6i_idev = loopback_idev; in6_dev_hold(loopback_idev); - xdst = (struct xfrm_dst *)xdst->u.dst.child; + xdst = (struct xfrm_dst *)xfrm_dst_child(&xdst->u.dst); } while (xdst->u.dst.xfrm); __in6_dev_put(loopback_idev); diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c index 8ca9915befc8..5dce8336d33f 100644 --- a/net/mpls/af_mpls.c +++ b/net/mpls/af_mpls.c @@ -2510,12 +2510,15 @@ static int __init mpls_init(void) rtnl_af_register(&mpls_af_ops); - rtnl_register(PF_MPLS, RTM_NEWROUTE, mpls_rtm_newroute, NULL, 0); - rtnl_register(PF_MPLS, RTM_DELROUTE, mpls_rtm_delroute, NULL, 0); - rtnl_register(PF_MPLS, RTM_GETROUTE, mpls_getroute, mpls_dump_routes, - 0); - rtnl_register(PF_MPLS, RTM_GETNETCONF, mpls_netconf_get_devconf, - mpls_netconf_dump_devconf, 0); + rtnl_register_module(THIS_MODULE, PF_MPLS, RTM_NEWROUTE, + mpls_rtm_newroute, NULL, 0); + rtnl_register_module(THIS_MODULE, PF_MPLS, RTM_DELROUTE, + mpls_rtm_delroute, NULL, 0); + rtnl_register_module(THIS_MODULE, PF_MPLS, RTM_GETROUTE, + mpls_getroute, mpls_dump_routes, 0); + rtnl_register_module(THIS_MODULE, PF_MPLS, RTM_GETNETCONF, + mpls_netconf_get_devconf, + mpls_netconf_dump_devconf, 0); err = ipgre_tunnel_encap_add_mpls_ops(); if (err) pr_err("Can't add mpls over gre tunnel ops\n"); diff --git a/net/netfilter/xt_policy.c b/net/netfilter/xt_policy.c index 2b4ab189bba7..5639fb03bdd9 100644 --- a/net/netfilter/xt_policy.c +++ b/net/netfilter/xt_policy.c @@ -93,7 +93,8 @@ match_policy_out(const struct sk_buff *skb, const struct xt_policy_info *info, if (dst->xfrm == NULL) return -1; - for (i = 0; dst && dst->xfrm; dst = dst->child, i++) { + for (i = 0; dst && dst->xfrm; + dst = ((struct xfrm_dst *)dst)->child, i++) { pos = strict ? i : 0; if (pos >= info->len) return 0; diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c index dbe2379329c5..76d050aba7a4 100644 --- a/net/openvswitch/flow.c +++ b/net/openvswitch/flow.c @@ -56,12 +56,12 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies) { - struct timespec cur_ts; + struct timespec64 cur_ts; u64 cur_ms, idle_ms; - ktime_get_ts(&cur_ts); + ktime_get_ts64(&cur_ts); idle_ms = jiffies_to_msecs(jiffies - flow_jiffies); - cur_ms = (u64)cur_ts.tv_sec * MSEC_PER_SEC + + cur_ms = (u64)(u32)cur_ts.tv_sec * MSEC_PER_SEC + cur_ts.tv_nsec / NSEC_PER_MSEC; return cur_ms - idle_ms; diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c index 04a3128adcf0..3e7747549f90 100644 --- a/net/openvswitch/vport-internal_dev.c +++ b/net/openvswitch/vport-internal_dev.c @@ -126,18 +126,12 @@ internal_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats) } } -static void internal_set_rx_headroom(struct net_device *dev, int new_hr) -{ - dev->needed_headroom = new_hr < 0 ? 0 : new_hr; -} - static const struct net_device_ops internal_dev_netdev_ops = { .ndo_open = internal_dev_open, .ndo_stop = internal_dev_stop, .ndo_start_xmit = internal_dev_xmit, .ndo_set_mac_address = eth_mac_addr, .ndo_get_stats64 = internal_get_stats, - .ndo_set_rx_headroom = internal_set_rx_headroom, }; static struct rtnl_link_ops internal_dev_link_ops __read_mostly = { @@ -154,7 +148,7 @@ static void do_setup(struct net_device *netdev) netdev->priv_flags &= ~IFF_TX_SKB_SHARING; netdev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_OPENVSWITCH | - IFF_PHONY_HEADROOM | IFF_NO_QUEUE; + IFF_NO_QUEUE; netdev->needs_free_netdev = true; netdev->priv_destructor = internal_dev_destructor; netdev->ethtool_ops = &internal_dev_ethtool_ops; @@ -195,7 +189,6 @@ static struct vport *internal_dev_create(const struct vport_parms *parms) err = -ENOMEM; goto error_free_netdev; } - vport->dev->needed_headroom = vport->dp->max_headroom; dev_net_set(vport->dev, ovs_dp_get_net(vport->dp)); internal_dev = internal_dev_priv(vport->dev); diff --git a/net/phonet/pn_netlink.c b/net/phonet/pn_netlink.c index da754fc926e7..871eaf2cb85e 100644 --- a/net/phonet/pn_netlink.c +++ b/net/phonet/pn_netlink.c @@ -299,16 +299,21 @@ out: int __init phonet_netlink_register(void) { - int err = __rtnl_register(PF_PHONET, RTM_NEWADDR, addr_doit, - NULL, 0); + int err = rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_NEWADDR, + addr_doit, NULL, 0); if (err) return err; - /* Further __rtnl_register() cannot fail */ - __rtnl_register(PF_PHONET, RTM_DELADDR, addr_doit, NULL, 0); - __rtnl_register(PF_PHONET, RTM_GETADDR, NULL, getaddr_dumpit, 0); - __rtnl_register(PF_PHONET, RTM_NEWROUTE, route_doit, NULL, 0); - __rtnl_register(PF_PHONET, RTM_DELROUTE, route_doit, NULL, 0); - __rtnl_register(PF_PHONET, RTM_GETROUTE, NULL, route_dumpit, 0); + /* Further rtnl_register_module() cannot fail */ + rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_DELADDR, + addr_doit, NULL, 0); + rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_GETADDR, + NULL, getaddr_dumpit, 0); + rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_NEWROUTE, + route_doit, NULL, 0); + rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_DELROUTE, + route_doit, NULL, 0); + rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_GETROUTE, + NULL, route_dumpit, 0); return 0; } diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c index 77ab05e23001..5fb3929e3d7d 100644 --- a/net/qrtr/qrtr.c +++ b/net/qrtr/qrtr.c @@ -1116,9 +1116,13 @@ static int __init qrtr_proto_init(void) return rc; } - rtnl_register(PF_QIPCRTR, RTM_NEWADDR, qrtr_addr_doit, NULL, 0); + rc = rtnl_register_module(THIS_MODULE, PF_QIPCRTR, RTM_NEWADDR, qrtr_addr_doit, NULL, 0); + if (rc) { + sock_unregister(qrtr_family.family); + proto_unregister(&qrtr_proto); + } - return 0; + return rc; } postcore_initcall(qrtr_proto_init); diff --git a/net/rds/connection.c b/net/rds/connection.c index 7ee2d5d68b78..6492c0b608a4 100644 --- a/net/rds/connection.c +++ b/net/rds/connection.c @@ -230,8 +230,8 @@ static struct rds_connection *__rds_conn_create(struct net *net, rdsdebug("allocated conn %p for %pI4 -> %pI4 over %s %s\n", conn, &laddr, &faddr, - trans->t_name ? trans->t_name : "[unknown]", - is_outgoing ? "(outgoing)" : ""); + strnlen(trans->t_name, sizeof(trans->t_name)) ? trans->t_name : + "[unknown]", is_outgoing ? "(outgoing)" : ""); /* * Since we ran without holding the conn lock, someone could @@ -366,6 +366,8 @@ void rds_conn_shutdown(struct rds_conn_path *cp) * to the conn hash, so we never trigger a reconnect on this * conn - the reconnect is always triggered by the active peer. */ cancel_delayed_work_sync(&cp->cp_conn_w); + if (conn->c_destroy_in_prog) + return; rcu_read_lock(); if (!hlist_unhashed(&conn->c_hash_node)) { rcu_read_unlock(); @@ -445,7 +447,6 @@ void rds_conn_destroy(struct rds_connection *conn) */ rds_cong_remove_conn(conn); - put_net(conn->c_net); kfree(conn->c_path); kmem_cache_free(rds_conn_slab, conn); diff --git a/net/rds/rds.h b/net/rds/rds.h index c349c71babff..d09f6c1facb4 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -150,7 +150,7 @@ struct rds_connection { /* Protocol version */ unsigned int c_version; - struct net *c_net; + possible_net_t c_net; struct list_head c_map_item; unsigned long c_map_queued; @@ -165,13 +165,13 @@ struct rds_connection { static inline struct net *rds_conn_net(struct rds_connection *conn) { - return conn->c_net; + return read_pnet(&conn->c_net); } static inline void rds_conn_net_set(struct rds_connection *conn, struct net *net) { - conn->c_net = get_net(net); + write_pnet(&conn->c_net, net); } #define RDS_FLAG_CONG_BITMAP 0x01 diff --git a/net/rds/tcp.c b/net/rds/tcp.c index 6b7ee71f40c6..39f502d47969 100644 --- a/net/rds/tcp.c +++ b/net/rds/tcp.c @@ -306,7 +306,8 @@ static void rds_tcp_conn_free(void *arg) rdsdebug("freeing tc %p\n", tc); spin_lock_irqsave(&rds_tcp_conn_lock, flags); - list_del(&tc->t_tcp_node); + if (!tc->t_tcp_node_detached) + list_del(&tc->t_tcp_node); spin_unlock_irqrestore(&rds_tcp_conn_lock, flags); kmem_cache_free(rds_tcp_conn_slab, tc); @@ -495,27 +496,6 @@ static struct pernet_operations rds_tcp_net_ops = { .size = sizeof(struct rds_tcp_net), }; -/* explicitly send a RST on each socket, thereby releasing any socket refcnts - * that may otherwise hold up netns deletion. - */ -static void rds_tcp_conn_paths_destroy(struct rds_connection *conn) -{ - struct rds_conn_path *cp; - struct rds_tcp_connection *tc; - int i; - struct sock *sk; - - for (i = 0; i < RDS_MPATH_WORKERS; i++) { - cp = &conn->c_path[i]; - tc = cp->cp_transport_data; - if (!tc->t_sock) - continue; - sk = tc->t_sock->sk; - sk->sk_prot->disconnect(sk, 0); - tcp_done(sk); - } -} - static void rds_tcp_kill_sock(struct net *net) { struct rds_tcp_connection *tc, *_tc; @@ -527,18 +507,20 @@ static void rds_tcp_kill_sock(struct net *net) rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w); spin_lock_irq(&rds_tcp_conn_lock); list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) { - struct net *c_net = tc->t_cpath->cp_conn->c_net; + struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net); if (net != c_net || !tc->t_sock) continue; - if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn)) + if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn)) { list_move_tail(&tc->t_tcp_node, &tmp_list); + } else { + list_del(&tc->t_tcp_node); + tc->t_tcp_node_detached = true; + } } spin_unlock_irq(&rds_tcp_conn_lock); - list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) { - rds_tcp_conn_paths_destroy(tc->t_cpath->cp_conn); + list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) rds_conn_destroy(tc->t_cpath->cp_conn); - } } void *rds_tcp_listen_sock_def_readable(struct net *net) @@ -586,7 +568,7 @@ static void rds_tcp_sysctl_reset(struct net *net) spin_lock_irq(&rds_tcp_conn_lock); list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) { - struct net *c_net = tc->t_cpath->cp_conn->c_net; + struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net); if (net != c_net || !tc->t_sock) continue; diff --git a/net/rds/tcp.h b/net/rds/tcp.h index 1aafbf7c3011..e7858ee8ed8b 100644 --- a/net/rds/tcp.h +++ b/net/rds/tcp.h @@ -12,6 +12,7 @@ struct rds_tcp_incoming { struct rds_tcp_connection { struct list_head t_tcp_node; + bool t_tcp_node_detached; struct rds_conn_path *t_cpath; /* t_conn_path_lock synchronizes the connection establishment between * rds_tcp_accept_one and rds_tcp_conn_path_connect diff --git a/net/sched/act_api.c b/net/sched/act_api.c index 4d33a50a8a6d..52622a3d2517 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -99,7 +99,7 @@ int __tcf_idr_release(struct tc_action *p, bool bind, bool strict) p->tcfa_refcnt--; if (p->tcfa_bindcnt <= 0 && p->tcfa_refcnt <= 0) { if (p->ops->cleanup) - p->ops->cleanup(p, bind); + p->ops->cleanup(p); tcf_idr_remove(p->idrinfo, p); ret = ACT_P_DELETED; } diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c index 5ef8ce8c83d4..e6c477fa9ca5 100644 --- a/net/sched/act_bpf.c +++ b/net/sched/act_bpf.c @@ -357,7 +357,7 @@ out: return ret; } -static void tcf_bpf_cleanup(struct tc_action *act, int bind) +static void tcf_bpf_cleanup(struct tc_action *act) { struct tcf_bpf_cfg tmp; diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c index 3007cb1310ea..dee9cf22686c 100644 --- a/net/sched/act_ife.c +++ b/net/sched/act_ife.c @@ -387,7 +387,7 @@ out_nlmsg_trim: } /* under ife->tcf_lock */ -static void _tcf_ife_cleanup(struct tc_action *a, int bind) +static void _tcf_ife_cleanup(struct tc_action *a) { struct tcf_ife_info *ife = to_ife(a); struct tcf_meta_info *e, *n; @@ -405,13 +405,13 @@ static void _tcf_ife_cleanup(struct tc_action *a, int bind) } } -static void tcf_ife_cleanup(struct tc_action *a, int bind) +static void tcf_ife_cleanup(struct tc_action *a) { struct tcf_ife_info *ife = to_ife(a); struct tcf_ife_params *p; spin_lock_bh(&ife->tcf_lock); - _tcf_ife_cleanup(a, bind); + _tcf_ife_cleanup(a); spin_unlock_bh(&ife->tcf_lock); p = rcu_dereference_protected(ife->params, 1); @@ -546,7 +546,7 @@ metadata_parse_err: if (exists) tcf_idr_release(*a, bind); if (ret == ACT_P_CREATED) - _tcf_ife_cleanup(*a, bind); + _tcf_ife_cleanup(*a); if (exists) spin_unlock_bh(&ife->tcf_lock); @@ -567,7 +567,7 @@ metadata_parse_err: err = use_all_metadata(ife); if (err) { if (ret == ACT_P_CREATED) - _tcf_ife_cleanup(*a, bind); + _tcf_ife_cleanup(*a); if (exists) spin_unlock_bh(&ife->tcf_lock); diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c index d9e399a7e3d5..2479b255dc1d 100644 --- a/net/sched/act_ipt.c +++ b/net/sched/act_ipt.c @@ -77,7 +77,7 @@ static void ipt_destroy_target(struct xt_entry_target *t) module_put(par.target->me); } -static void tcf_ipt_release(struct tc_action *a, int bind) +static void tcf_ipt_release(struct tc_action *a) { struct tcf_ipt *ipt = to_ipt(a); ipt_destroy_target(ipt->tcfi_t); diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index 8b3e59388480..cee2d413bf57 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -29,7 +29,6 @@ #include <net/tc_act/tc_mirred.h> static LIST_HEAD(mirred_list); -static DEFINE_SPINLOCK(mirred_list_lock); static bool tcf_mirred_is_act_redirect(int action) { @@ -50,18 +49,15 @@ static bool tcf_mirred_act_wants_ingress(int action) } } -static void tcf_mirred_release(struct tc_action *a, int bind) +static void tcf_mirred_release(struct tc_action *a) { struct tcf_mirred *m = to_mirred(a); struct net_device *dev; - /* We could be called either in a RCU callback or with RTNL lock held. */ - spin_lock_bh(&mirred_list_lock); list_del(&m->tcfm_list); - dev = rcu_dereference_protected(m->tcfm_dev, 1); + dev = rtnl_dereference(m->tcfm_dev); if (dev) dev_put(dev); - spin_unlock_bh(&mirred_list_lock); } static const struct nla_policy mirred_policy[TCA_MIRRED_MAX + 1] = { @@ -139,8 +135,6 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla, m->tcf_action = parm->action; m->tcfm_eaction = parm->eaction; if (dev != NULL) { - m->tcfm_ifindex = parm->ifindex; - m->net = net; if (ret != ACT_P_CREATED) dev_put(rcu_dereference_protected(m->tcfm_dev, 1)); dev_hold(dev); @@ -149,9 +143,7 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla, } if (ret == ACT_P_CREATED) { - spin_lock_bh(&mirred_list_lock); list_add(&m->tcfm_list, &mirred_list); - spin_unlock_bh(&mirred_list_lock); tcf_idr_insert(tn, *a); } @@ -247,13 +239,14 @@ static int tcf_mirred_dump(struct sk_buff *skb, struct tc_action *a, int bind, { unsigned char *b = skb_tail_pointer(skb); struct tcf_mirred *m = to_mirred(a); + struct net_device *dev = rtnl_dereference(m->tcfm_dev); struct tc_mirred opt = { .index = m->tcf_index, .action = m->tcf_action, .refcnt = m->tcf_refcnt - ref, .bindcnt = m->tcf_bindcnt - bind, .eaction = m->tcfm_eaction, - .ifindex = m->tcfm_ifindex, + .ifindex = dev ? dev->ifindex : 0, }; struct tcf_t t; @@ -294,7 +287,6 @@ static int mirred_device_event(struct notifier_block *unused, ASSERT_RTNL(); if (event == NETDEV_UNREGISTER) { - spin_lock_bh(&mirred_list_lock); list_for_each_entry(m, &mirred_list, tcfm_list) { if (rcu_access_pointer(m->tcfm_dev) == dev) { dev_put(dev); @@ -304,7 +296,6 @@ static int mirred_device_event(struct notifier_block *unused, RCU_INIT_POINTER(m->tcfm_dev, NULL); } } - spin_unlock_bh(&mirred_list_lock); } return NOTIFY_DONE; @@ -318,7 +309,7 @@ static struct net_device *tcf_mirred_get_dev(const struct tc_action *a) { struct tcf_mirred *m = to_mirred(a); - return __dev_get_by_index(m->net, m->tcfm_ifindex); + return rtnl_dereference(m->tcfm_dev); } static struct tc_action_ops act_mirred_ops = { diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c index 491fe5deb09e..dba996bcd6dc 100644 --- a/net/sched/act_pedit.c +++ b/net/sched/act_pedit.c @@ -216,7 +216,7 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, return ret; } -static void tcf_pedit_cleanup(struct tc_action *a, int bind) +static void tcf_pedit_cleanup(struct tc_action *a) { struct tcf_pedit *p = to_pedit(a); struct tc_pedit_key *keys = p->tcfp_keys; diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c index 9438969290a6..859a93903339 100644 --- a/net/sched/act_sample.c +++ b/net/sched/act_sample.c @@ -96,7 +96,7 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla, return ret; } -static void tcf_sample_cleanup(struct tc_action *a, int bind) +static void tcf_sample_cleanup(struct tc_action *a) { struct tcf_sample *s = to_sample(a); struct psample_group *psample_group; diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c index e7b57e5071a3..eda57b47a6b6 100644 --- a/net/sched/act_simple.c +++ b/net/sched/act_simple.c @@ -47,7 +47,7 @@ static int tcf_simp(struct sk_buff *skb, const struct tc_action *a, return d->tcf_action; } -static void tcf_simp_release(struct tc_action *a, int bind) +static void tcf_simp_release(struct tc_action *a) { struct tcf_defact *d = to_defact(a); kfree(d->tcfd_defdata); diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c index b642ad3d39dd..f090bba1a79e 100644 --- a/net/sched/act_skbmod.c +++ b/net/sched/act_skbmod.c @@ -184,7 +184,7 @@ static int tcf_skbmod_init(struct net *net, struct nlattr *nla, return ret; } -static void tcf_skbmod_cleanup(struct tc_action *a, int bind) +static void tcf_skbmod_cleanup(struct tc_action *a) { struct tcf_skbmod *d = to_skbmod(a); struct tcf_skbmod_params *p; diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c index 30c96274c638..57b63bdec3ae 100644 --- a/net/sched/act_tunnel_key.c +++ b/net/sched/act_tunnel_key.c @@ -201,7 +201,7 @@ err_out: return ret; } -static void tunnel_key_release(struct tc_action *a, int bind) +static void tunnel_key_release(struct tc_action *a) { struct tcf_tunnel_key *t = to_tunnel_key(a); struct tcf_tunnel_key_params *params; diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c index 97f717a13ad5..41f0878ad26e 100644 --- a/net/sched/act_vlan.c +++ b/net/sched/act_vlan.c @@ -219,7 +219,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla, return ret; } -static void tcf_vlan_cleanup(struct tc_action *a, int bind) +static void tcf_vlan_cleanup(struct tc_action *a) { struct tcf_vlan *v = to_vlan(a); struct tcf_vlan_params *p; diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index ddcf04b4ab43..5b9b8a61e8c4 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -218,8 +218,12 @@ static void tcf_chain_flush(struct tcf_chain *chain) static void tcf_chain_destroy(struct tcf_chain *chain) { + struct tcf_block *block = chain->block; + list_del(&chain->list); kfree(chain); + if (list_empty(&block->chain_list)) + kfree(block); } static void tcf_chain_hold(struct tcf_chain *chain) @@ -330,47 +334,32 @@ int tcf_block_get(struct tcf_block **p_block, } EXPORT_SYMBOL(tcf_block_get); -static void tcf_block_put_final(struct work_struct *work) -{ - struct tcf_block *block = container_of(work, struct tcf_block, work); - struct tcf_chain *chain, *tmp; - - rtnl_lock(); - - /* At this point, all the chains should have refcnt == 1. */ - list_for_each_entry_safe(chain, tmp, &block->chain_list, list) - tcf_chain_put(chain); - rtnl_unlock(); - kfree(block); -} - /* XXX: Standalone actions are not allowed to jump to any chain, and bound * actions should be all removed after flushing. */ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q, struct tcf_block_ext_info *ei) { - struct tcf_chain *chain; + struct tcf_chain *chain, *tmp; - /* Hold a refcnt for all chains, except 0, so that they don't disappear + /* Hold a refcnt for all chains, so that they don't disappear * while we are iterating. */ list_for_each_entry(chain, &block->chain_list, list) - if (chain->index) - tcf_chain_hold(chain); + tcf_chain_hold(chain); list_for_each_entry(chain, &block->chain_list, list) tcf_chain_flush(chain); tcf_block_offload_unbind(block, q, ei); - INIT_WORK(&block->work, tcf_block_put_final); - /* Wait for existing RCU callbacks to cool down, make sure their works - * have been queued before this. We can not flush pending works here - * because we are holding the RTNL lock. - */ - rcu_barrier(); - tcf_queue_work(&block->work); + /* At this point, all the chains should have refcnt >= 1. */ + list_for_each_entry_safe(chain, tmp, &block->chain_list, list) + tcf_chain_put(chain); + + /* Finally, put chain 0 and allow block to be freed. */ + chain = list_first_entry(&block->chain_list, struct tcf_chain, list); + tcf_chain_put(chain); } EXPORT_SYMBOL(tcf_block_put_ext); diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index 543a3e875d05..6132a7317efa 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -166,6 +166,7 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp, * so do it rather here. */ skb_key.basic.n_proto = skb->protocol; + skb_flow_dissect_tunnel_info(skb, &head->dissector, &skb_key); skb_flow_dissect(skb, &head->dissector, &skb_key, 0); fl_set_masked_key(&skb_mkey, &skb_key, &head->mask); diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index b6c4f536876b..a904276b657d 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -797,7 +797,8 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid, goto nla_put_failure; if (q->ops->dump && q->ops->dump(q, skb) < 0) goto nla_put_failure; - qlen = q->q.qlen; + + qlen = qdisc_qlen_sum(q); stab = rtnl_dereference(q->stab); if (stab && qdisc_dump_stab(skb, stab) < 0) @@ -954,6 +955,11 @@ skip: } else { const struct Qdisc_class_ops *cops = parent->ops->cl_ops; + /* Only support running class lockless if parent is lockless */ + if (new && (new->flags & TCQ_F_NOLOCK) && + parent && !(parent->flags & TCQ_F_NOLOCK)) + new->flags &= ~TCQ_F_NOLOCK; + err = -EOPNOTSUPP; if (cops && cops->graft) { unsigned long cl = cops->find(parent, classid); @@ -1020,7 +1026,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev, #endif err = -ENOENT; - if (ops == NULL) + if (!ops) goto err_out; sch = qdisc_alloc(dev_queue, ops); @@ -1060,54 +1066,60 @@ static struct Qdisc *qdisc_create(struct net_device *dev, netdev_info(dev, "Caught tx_queue_len zero misconfig\n"); } - if (!ops->init || (err = ops->init(sch, tca[TCA_OPTIONS])) == 0) { - if (qdisc_is_percpu_stats(sch)) { - sch->cpu_bstats = - netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu); - if (!sch->cpu_bstats) - goto err_out4; + if (ops->init) { + err = ops->init(sch, tca[TCA_OPTIONS]); + if (err != 0) + goto err_out5; + } - sch->cpu_qstats = alloc_percpu(struct gnet_stats_queue); - if (!sch->cpu_qstats) - goto err_out4; - } + if (qdisc_is_percpu_stats(sch)) { + sch->cpu_bstats = + netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu); + if (!sch->cpu_bstats) + goto err_out4; - if (tca[TCA_STAB]) { - stab = qdisc_get_stab(tca[TCA_STAB]); - if (IS_ERR(stab)) { - err = PTR_ERR(stab); - goto err_out4; - } - rcu_assign_pointer(sch->stab, stab); - } - if (tca[TCA_RATE]) { - seqcount_t *running; - - err = -EOPNOTSUPP; - if (sch->flags & TCQ_F_MQROOT) - goto err_out4; + sch->cpu_qstats = alloc_percpu(struct gnet_stats_queue); + if (!sch->cpu_qstats) + goto err_out4; + } - if ((sch->parent != TC_H_ROOT) && - !(sch->flags & TCQ_F_INGRESS) && - (!p || !(p->flags & TCQ_F_MQROOT))) - running = qdisc_root_sleeping_running(sch); - else - running = &sch->running; - - err = gen_new_estimator(&sch->bstats, - sch->cpu_bstats, - &sch->rate_est, - NULL, - running, - tca[TCA_RATE]); - if (err) - goto err_out4; + if (tca[TCA_STAB]) { + stab = qdisc_get_stab(tca[TCA_STAB]); + if (IS_ERR(stab)) { + err = PTR_ERR(stab); + goto err_out4; } + rcu_assign_pointer(sch->stab, stab); + } + if (tca[TCA_RATE]) { + seqcount_t *running; - qdisc_hash_add(sch, false); + err = -EOPNOTSUPP; + if (sch->flags & TCQ_F_MQROOT) + goto err_out4; - return sch; + if (sch->parent != TC_H_ROOT && + !(sch->flags & TCQ_F_INGRESS) && + (!p || !(p->flags & TCQ_F_MQROOT))) + running = qdisc_root_sleeping_running(sch); + else + running = &sch->running; + + err = gen_new_estimator(&sch->bstats, + sch->cpu_bstats, + &sch->rate_est, + NULL, + running, + tca[TCA_RATE]); + if (err) + goto err_out4; } + + qdisc_hash_add(sch, false); + + return sch; + +err_out5: /* ops->init() failed, we call ->destroy() like qdisc_create_dflt() */ if (ops->destroy) ops->destroy(sch); @@ -1139,7 +1151,7 @@ static int qdisc_change(struct Qdisc *sch, struct nlattr **tca) int err = 0; if (tca[TCA_OPTIONS]) { - if (sch->ops->change == NULL) + if (!sch->ops->change) return -EINVAL; err = sch->ops->change(sch, tca[TCA_OPTIONS]); if (err) @@ -1344,7 +1356,8 @@ replay: goto create_n_graft; if (n->nlmsg_flags & NLM_F_EXCL) return -EEXIST; - if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND], q->ops->id)) + if (tca[TCA_KIND] && + nla_strcmp(tca[TCA_KIND], q->ops->id)) return -EINVAL; if (q == p || (p && check_loop(q, p, 0))) @@ -1389,7 +1402,7 @@ replay: } /* Change qdisc parameters */ - if (q == NULL) + if (!q) return -ENOENT; if (n->nlmsg_flags & NLM_F_EXCL) return -EEXIST; diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index cd1b200acae7..981c08fe810b 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -26,6 +26,7 @@ #include <linux/list.h> #include <linux/slab.h> #include <linux/if_vlan.h> +#include <linux/skb_array.h> #include <linux/if_macvlan.h> #include <net/sch_generic.h> #include <net/pkt_sched.h> @@ -47,9 +48,70 @@ EXPORT_SYMBOL(default_qdisc_ops); * - updates to tree and tree walking are only done under the rtnl mutex. */ -static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q) +static inline struct sk_buff *__skb_dequeue_bad_txq(struct Qdisc *q) +{ + const struct netdev_queue *txq = q->dev_queue; + spinlock_t *lock = NULL; + struct sk_buff *skb; + + if (q->flags & TCQ_F_NOLOCK) { + lock = qdisc_lock(q); + spin_lock(lock); + } + + skb = skb_peek(&q->skb_bad_txq); + if (skb) { + /* check the reason of requeuing without tx lock first */ + txq = skb_get_tx_queue(txq->dev, skb); + if (!netif_xmit_frozen_or_stopped(txq)) { + skb = __skb_dequeue(&q->skb_bad_txq); + if (qdisc_is_percpu_stats(q)) { + qdisc_qstats_cpu_backlog_dec(q, skb); + qdisc_qstats_cpu_qlen_dec(q); + } else { + qdisc_qstats_backlog_dec(q, skb); + q->q.qlen--; + } + } else { + skb = NULL; + } + } + + if (lock) + spin_unlock(lock); + + return skb; +} + +static inline struct sk_buff *qdisc_dequeue_skb_bad_txq(struct Qdisc *q) +{ + struct sk_buff *skb = skb_peek(&q->skb_bad_txq); + + if (unlikely(skb)) + skb = __skb_dequeue_bad_txq(q); + + return skb; +} + +static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q, + struct sk_buff *skb) { - q->gso_skb = skb; + spinlock_t *lock = NULL; + + if (q->flags & TCQ_F_NOLOCK) { + lock = qdisc_lock(q); + spin_lock(lock); + } + + __skb_queue_tail(&q->skb_bad_txq, skb); + + if (lock) + spin_unlock(lock); +} + +static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q) +{ + __skb_queue_head(&q->gso_skb, skb); q->qstats.requeues++; qdisc_qstats_backlog_inc(q, skb); q->q.qlen++; /* it's still part of the queue */ @@ -58,6 +120,30 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q) return 0; } +static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q) +{ + spinlock_t *lock = qdisc_lock(q); + + spin_lock(lock); + __skb_queue_tail(&q->gso_skb, skb); + spin_unlock(lock); + + qdisc_qstats_cpu_requeues_inc(q); + qdisc_qstats_cpu_backlog_inc(q, skb); + qdisc_qstats_cpu_qlen_inc(q); + __netif_schedule(q); + + return 0; +} + +static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q) +{ + if (q->flags & TCQ_F_NOLOCK) + return dev_requeue_skb_locked(skb, q); + else + return __dev_requeue_skb(skb, q); +} + static void try_bulk_dequeue_skb(struct Qdisc *q, struct sk_buff *skb, const struct netdev_queue *txq, @@ -95,9 +181,15 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q, if (!nskb) break; if (unlikely(skb_get_queue_mapping(nskb) != mapping)) { - q->skb_bad_txq = nskb; - qdisc_qstats_backlog_inc(q, nskb); - q->q.qlen++; + qdisc_enqueue_skb_bad_txq(q, nskb); + + if (qdisc_is_percpu_stats(q)) { + qdisc_qstats_cpu_backlog_inc(q, nskb); + qdisc_qstats_cpu_qlen_inc(q); + } else { + qdisc_qstats_backlog_inc(q, nskb); + q->q.qlen++; + } break; } skb->next = nskb; @@ -113,40 +205,60 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q, static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate, int *packets) { - struct sk_buff *skb = q->gso_skb; const struct netdev_queue *txq = q->dev_queue; + struct sk_buff *skb = NULL; *packets = 1; - if (unlikely(skb)) { + if (unlikely(!skb_queue_empty(&q->gso_skb))) { + spinlock_t *lock = NULL; + + if (q->flags & TCQ_F_NOLOCK) { + lock = qdisc_lock(q); + spin_lock(lock); + } + + skb = skb_peek(&q->gso_skb); + + /* skb may be null if another cpu pulls gso_skb off in between + * empty check and lock. + */ + if (!skb) { + if (lock) + spin_unlock(lock); + goto validate; + } + /* skb in gso_skb were already validated */ *validate = false; /* check the reason of requeuing without tx lock first */ txq = skb_get_tx_queue(txq->dev, skb); if (!netif_xmit_frozen_or_stopped(txq)) { - q->gso_skb = NULL; - qdisc_qstats_backlog_dec(q, skb); - q->q.qlen--; - } else + skb = __skb_dequeue(&q->gso_skb); + if (qdisc_is_percpu_stats(q)) { + qdisc_qstats_cpu_backlog_dec(q, skb); + qdisc_qstats_cpu_qlen_dec(q); + } else { + qdisc_qstats_backlog_dec(q, skb); + q->q.qlen--; + } + } else { skb = NULL; - goto trace; - } - *validate = true; - skb = q->skb_bad_txq; - if (unlikely(skb)) { - /* check the reason of requeuing without tx lock first */ - txq = skb_get_tx_queue(txq->dev, skb); - if (!netif_xmit_frozen_or_stopped(txq)) { - q->skb_bad_txq = NULL; - qdisc_qstats_backlog_dec(q, skb); - q->q.qlen--; - goto bulk; } - skb = NULL; + if (lock) + spin_unlock(lock); goto trace; } - if (!(q->flags & TCQ_F_ONETXQUEUE) || - !netif_xmit_frozen_or_stopped(txq)) - skb = q->dequeue(q); +validate: + *validate = true; + + if ((q->flags & TCQ_F_ONETXQUEUE) && + netif_xmit_frozen_or_stopped(txq)) + return skb; + + skb = qdisc_dequeue_skb_bad_txq(q); + if (unlikely(skb)) + goto bulk; + skb = q->dequeue(q); if (skb) { bulk: if (qdisc_may_bulk(q)) @@ -165,17 +277,18 @@ trace: * only one CPU can execute this function. * * Returns to the caller: - * 0 - queue is empty or throttled. - * >0 - queue is not empty. + * false - hardware queue frozen backoff + * true - feel free to send more pkts */ -int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q, - struct net_device *dev, struct netdev_queue *txq, - spinlock_t *root_lock, bool validate) +bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q, + struct net_device *dev, struct netdev_queue *txq, + spinlock_t *root_lock, bool validate) { int ret = NETDEV_TX_BUSY; /* And release qdisc */ - spin_unlock(root_lock); + if (root_lock) + spin_unlock(root_lock); /* Note that we validate skb (GSO, checksum, ...) outside of locks */ if (validate) @@ -188,27 +301,28 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q, HARD_TX_UNLOCK(dev, txq); } else { - spin_lock(root_lock); - return qdisc_qlen(q); + if (root_lock) + spin_lock(root_lock); + return true; } - spin_lock(root_lock); - if (dev_xmit_complete(ret)) { - /* Driver sent out skb successfully or skb was consumed */ - ret = qdisc_qlen(q); - } else { + if (root_lock) + spin_lock(root_lock); + + if (!dev_xmit_complete(ret)) { /* Driver returned NETDEV_TX_BUSY - requeue skb */ if (unlikely(ret != NETDEV_TX_BUSY)) net_warn_ratelimited("BUG %s code %d qlen %d\n", dev->name, ret, q->q.qlen); - ret = dev_requeue_skb(skb, q); + dev_requeue_skb(skb, q); + return false; } if (ret && netif_xmit_frozen_or_stopped(txq)) - ret = 0; + return false; - return ret; + return true; } /* @@ -230,20 +344,22 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q, * >0 - queue is not empty. * */ -static inline int qdisc_restart(struct Qdisc *q, int *packets) +static inline bool qdisc_restart(struct Qdisc *q, int *packets) { + spinlock_t *root_lock = NULL; struct netdev_queue *txq; struct net_device *dev; - spinlock_t *root_lock; struct sk_buff *skb; bool validate; /* Dequeue packet */ skb = dequeue_skb(q, &validate, packets); if (unlikely(!skb)) - return 0; + return false; + + if (!(q->flags & TCQ_F_NOLOCK)) + root_lock = qdisc_lock(q); - root_lock = qdisc_lock(q); dev = qdisc_dev(q); txq = skb_get_tx_queue(dev, skb); @@ -267,8 +383,6 @@ void __qdisc_run(struct Qdisc *q) break; } } - - qdisc_run_end(q); } unsigned long dev_trans_start(struct net_device *dev) @@ -468,93 +582,93 @@ static const u8 prio2band[TC_PRIO_MAX + 1] = { /* * Private data for a pfifo_fast scheduler containing: - * - queues for the three band - * - bitmap indicating which of the bands contain skbs + * - rings for priority bands */ struct pfifo_fast_priv { - u32 bitmap; - struct qdisc_skb_head q[PFIFO_FAST_BANDS]; + struct skb_array q[PFIFO_FAST_BANDS]; }; -/* - * Convert a bitmap to the first band number where an skb is queued, where: - * bitmap=0 means there are no skbs on any band. - * bitmap=1 means there is an skb on band 0. - * bitmap=7 means there are skbs on all 3 bands, etc. - */ -static const int bitmap2band[] = {-1, 0, 1, 0, 2, 0, 1, 0}; - -static inline struct qdisc_skb_head *band2list(struct pfifo_fast_priv *priv, - int band) +static inline struct skb_array *band2list(struct pfifo_fast_priv *priv, + int band) { - return priv->q + band; + return &priv->q[band]; } static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc, struct sk_buff **to_free) { - if (qdisc->q.qlen < qdisc_dev(qdisc)->tx_queue_len) { - int band = prio2band[skb->priority & TC_PRIO_MAX]; - struct pfifo_fast_priv *priv = qdisc_priv(qdisc); - struct qdisc_skb_head *list = band2list(priv, band); - - priv->bitmap |= (1 << band); - qdisc->q.qlen++; - return __qdisc_enqueue_tail(skb, qdisc, list); - } + int band = prio2band[skb->priority & TC_PRIO_MAX]; + struct pfifo_fast_priv *priv = qdisc_priv(qdisc); + struct skb_array *q = band2list(priv, band); + int err; + + err = skb_array_produce(q, skb); - return qdisc_drop(skb, qdisc, to_free); + if (unlikely(err)) + return qdisc_drop_cpu(skb, qdisc, to_free); + + qdisc_qstats_cpu_qlen_inc(qdisc); + qdisc_qstats_cpu_backlog_inc(qdisc, skb); + return NET_XMIT_SUCCESS; } static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc) { struct pfifo_fast_priv *priv = qdisc_priv(qdisc); - int band = bitmap2band[priv->bitmap]; - - if (likely(band >= 0)) { - struct qdisc_skb_head *qh = band2list(priv, band); - struct sk_buff *skb = __qdisc_dequeue_head(qh); + struct sk_buff *skb = NULL; + int band; - if (likely(skb != NULL)) { - qdisc_qstats_backlog_dec(qdisc, skb); - qdisc_bstats_update(qdisc, skb); - } + for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) { + struct skb_array *q = band2list(priv, band); - qdisc->q.qlen--; - if (qh->qlen == 0) - priv->bitmap &= ~(1 << band); + if (__skb_array_empty(q)) + continue; - return skb; + skb = skb_array_consume_bh(q); + } + if (likely(skb)) { + qdisc_qstats_cpu_backlog_dec(qdisc, skb); + qdisc_bstats_cpu_update(qdisc, skb); + qdisc_qstats_cpu_qlen_dec(qdisc); } - return NULL; + return skb; } static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc) { struct pfifo_fast_priv *priv = qdisc_priv(qdisc); - int band = bitmap2band[priv->bitmap]; + struct sk_buff *skb = NULL; + int band; - if (band >= 0) { - struct qdisc_skb_head *qh = band2list(priv, band); + for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) { + struct skb_array *q = band2list(priv, band); - return qh->head; + skb = __skb_array_peek(q); } - return NULL; + return skb; } static void pfifo_fast_reset(struct Qdisc *qdisc) { - int prio; + int i, band; struct pfifo_fast_priv *priv = qdisc_priv(qdisc); - for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) - __qdisc_reset_queue(band2list(priv, prio)); + for (band = 0; band < PFIFO_FAST_BANDS; band++) { + struct skb_array *q = band2list(priv, band); + struct sk_buff *skb; - priv->bitmap = 0; - qdisc->qstats.backlog = 0; - qdisc->q.qlen = 0; + while ((skb = skb_array_consume_bh(q)) != NULL) + kfree_skb(skb); + } + + for_each_possible_cpu(i) { + struct gnet_stats_queue *q = per_cpu_ptr(qdisc->cpu_qstats, i); + + q->backlog = 0; + q->qlen = 0; + } } static int pfifo_fast_dump(struct Qdisc *qdisc, struct sk_buff *skb) @@ -572,17 +686,48 @@ nla_put_failure: static int pfifo_fast_init(struct Qdisc *qdisc, struct nlattr *opt) { - int prio; + unsigned int qlen = qdisc_dev(qdisc)->tx_queue_len; struct pfifo_fast_priv *priv = qdisc_priv(qdisc); + int prio; - for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) - qdisc_skb_head_init(band2list(priv, prio)); + /* guard against zero length rings */ + if (!qlen) + return -EINVAL; + + for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) { + struct skb_array *q = band2list(priv, prio); + int err; + + err = skb_array_init(q, qlen, GFP_KERNEL); + if (err) + return -ENOMEM; + } /* Can by-pass the queue discipline */ qdisc->flags |= TCQ_F_CAN_BYPASS; return 0; } +static void pfifo_fast_destroy(struct Qdisc *sch) +{ + struct pfifo_fast_priv *priv = qdisc_priv(sch); + int prio; + + for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) { + struct skb_array *q = band2list(priv, prio); + + /* NULL ring is possible if destroy path is due to a failed + * skb_array_init() in pfifo_fast_init() case. + */ + if (!&q->ring.queue) + continue; + /* Destroy ring but no need to kfree_skb because a call to + * pfifo_fast_reset() has already done that work. + */ + ptr_ring_cleanup(&q->ring, NULL); + } +} + struct Qdisc_ops pfifo_fast_ops __read_mostly = { .id = "pfifo_fast", .priv_size = sizeof(struct pfifo_fast_priv), @@ -590,9 +735,11 @@ struct Qdisc_ops pfifo_fast_ops __read_mostly = { .dequeue = pfifo_fast_dequeue, .peek = pfifo_fast_peek, .init = pfifo_fast_init, + .destroy = pfifo_fast_destroy, .reset = pfifo_fast_reset, .dump = pfifo_fast_dump, .owner = THIS_MODULE, + .static_flags = TCQ_F_NOLOCK | TCQ_F_CPUSTATS, }; EXPORT_SYMBOL(pfifo_fast_ops); @@ -630,9 +777,24 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue, sch = (struct Qdisc *) QDISC_ALIGN((unsigned long) p); sch->padded = (char *) sch - (char *) p; } + __skb_queue_head_init(&sch->gso_skb); + __skb_queue_head_init(&sch->skb_bad_txq); qdisc_skb_head_init(&sch->q); spin_lock_init(&sch->q.lock); + if (ops->static_flags & TCQ_F_CPUSTATS) { + sch->cpu_bstats = + netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu); + if (!sch->cpu_bstats) + goto errout1; + + sch->cpu_qstats = alloc_percpu(struct gnet_stats_queue); + if (!sch->cpu_qstats) { + free_percpu(sch->cpu_bstats); + goto errout1; + } + } + spin_lock_init(&sch->busylock); lockdep_set_class(&sch->busylock, dev->qdisc_tx_busylock ?: &qdisc_tx_busylock); @@ -642,6 +804,7 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue, dev->qdisc_running_key ?: &qdisc_running_key); sch->ops = ops; + sch->flags = ops->static_flags; sch->enqueue = ops->enqueue; sch->dequeue = ops->dequeue; sch->dev_queue = dev_queue; @@ -649,6 +812,8 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue, refcount_set(&sch->refcnt, 1); return sch; +errout1: + kfree(p); errout: return ERR_PTR(err); } @@ -682,17 +847,21 @@ EXPORT_SYMBOL(qdisc_create_dflt); void qdisc_reset(struct Qdisc *qdisc) { const struct Qdisc_ops *ops = qdisc->ops; + struct sk_buff *skb, *tmp; if (ops->reset) ops->reset(qdisc); - kfree_skb(qdisc->skb_bad_txq); - qdisc->skb_bad_txq = NULL; + skb_queue_walk_safe(&qdisc->gso_skb, skb, tmp) { + __skb_unlink(skb, &qdisc->gso_skb); + kfree_skb_list(skb); + } - if (qdisc->gso_skb) { - kfree_skb_list(qdisc->gso_skb); - qdisc->gso_skb = NULL; + skb_queue_walk_safe(&qdisc->skb_bad_txq, skb, tmp) { + __skb_unlink(skb, &qdisc->skb_bad_txq); + kfree_skb_list(skb); } + qdisc->q.qlen = 0; qdisc->qstats.backlog = 0; } @@ -711,6 +880,7 @@ static void qdisc_free(struct Qdisc *qdisc) void qdisc_destroy(struct Qdisc *qdisc) { const struct Qdisc_ops *ops = qdisc->ops; + struct sk_buff *skb, *tmp; if (qdisc->flags & TCQ_F_BUILTIN || !refcount_dec_and_test(&qdisc->refcnt)) @@ -730,8 +900,16 @@ void qdisc_destroy(struct Qdisc *qdisc) module_put(ops->owner); dev_put(qdisc_dev(qdisc)); - kfree_skb_list(qdisc->gso_skb); - kfree_skb(qdisc->skb_bad_txq); + skb_queue_walk_safe(&qdisc->gso_skb, skb, tmp) { + __skb_unlink(skb, &qdisc->gso_skb); + kfree_skb_list(skb); + } + + skb_queue_walk_safe(&qdisc->skb_bad_txq, skb, tmp) { + __skb_unlink(skb, &qdisc->skb_bad_txq); + kfree_skb_list(skb); + } + qdisc_free(qdisc); } EXPORT_SYMBOL(qdisc_destroy); @@ -746,10 +924,6 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue, root_lock = qdisc_lock(oqdisc); spin_lock_bh(root_lock); - /* Prune old scheduler */ - if (oqdisc && refcount_read(&oqdisc->refcnt) <= 1) - qdisc_reset(oqdisc); - /* ... and graft new one */ if (qdisc == NULL) qdisc = &noop_qdisc; @@ -885,14 +1059,18 @@ static bool some_qdisc_is_busy(struct net_device *dev) dev_queue = netdev_get_tx_queue(dev, i); q = dev_queue->qdisc_sleeping; - root_lock = qdisc_lock(q); - spin_lock_bh(root_lock); + if (q->flags & TCQ_F_NOLOCK) { + val = test_bit(__QDISC_STATE_SCHED, &q->state); + } else { + root_lock = qdisc_lock(q); + spin_lock_bh(root_lock); - val = (qdisc_is_running(q) || - test_bit(__QDISC_STATE_SCHED, &q->state)); + val = (qdisc_is_running(q) || + test_bit(__QDISC_STATE_SCHED, &q->state)); - spin_unlock_bh(root_lock); + spin_unlock_bh(root_lock); + } if (val) return true; @@ -900,6 +1078,16 @@ static bool some_qdisc_is_busy(struct net_device *dev) return false; } +static void dev_qdisc_reset(struct net_device *dev, + struct netdev_queue *dev_queue, + void *none) +{ + struct Qdisc *qdisc = dev_queue->qdisc_sleeping; + + if (qdisc) + qdisc_reset(qdisc); +} + /** * dev_deactivate_many - deactivate transmissions on several devices * @head: list of devices to deactivate @@ -910,7 +1098,6 @@ static bool some_qdisc_is_busy(struct net_device *dev) void dev_deactivate_many(struct list_head *head) { struct net_device *dev; - bool sync_needed = false; list_for_each_entry(dev, head, close_list) { netdev_for_each_tx_queue(dev, dev_deactivate_queue, @@ -920,20 +1107,25 @@ void dev_deactivate_many(struct list_head *head) &noop_qdisc); dev_watchdog_down(dev); - sync_needed |= !dev->dismantle; } /* Wait for outstanding qdisc-less dev_queue_xmit calls. * This is avoided if all devices are in dismantle phase : * Caller will call synchronize_net() for us */ - if (sync_needed) - synchronize_net(); + synchronize_net(); /* Wait for outstanding qdisc_run calls. */ - list_for_each_entry(dev, head, close_list) + list_for_each_entry(dev, head, close_list) { while (some_qdisc_is_busy(dev)) yield(); + /* The new qdisc is assigned at this point so we can safely + * unwind stale skb lists and qdisc statistics + */ + netdev_for_each_tx_queue(dev, dev_qdisc_reset, NULL); + if (dev_ingress_queue(dev)) + dev_qdisc_reset(dev, dev_ingress_queue(dev), NULL); + } } void dev_deactivate(struct net_device *dev) @@ -954,6 +1146,8 @@ static void dev_init_scheduler_queue(struct net_device *dev, rcu_assign_pointer(dev_queue->qdisc, qdisc); dev_queue->qdisc_sleeping = qdisc; + __skb_queue_head_init(&qdisc->gso_skb); + __skb_queue_head_init(&qdisc->skb_bad_txq); } void dev_init_scheduler(struct net_device *dev) diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c index 213b586a06a0..8cbb5c829d59 100644 --- a/net/sched/sch_mq.c +++ b/net/sched/sch_mq.c @@ -17,6 +17,7 @@ #include <linux/skbuff.h> #include <net/netlink.h> #include <net/pkt_sched.h> +#include <net/sch_generic.h> struct mq_sched { struct Qdisc **qdiscs; @@ -97,23 +98,42 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb) struct net_device *dev = qdisc_dev(sch); struct Qdisc *qdisc; unsigned int ntx; + __u32 qlen = 0; sch->q.qlen = 0; memset(&sch->bstats, 0, sizeof(sch->bstats)); memset(&sch->qstats, 0, sizeof(sch->qstats)); + /* MQ supports lockless qdiscs. However, statistics accounting needs + * to account for all, none, or a mix of locked and unlocked child + * qdiscs. Percpu stats are added to counters in-band and locking + * qdisc totals are added at end. + */ for (ntx = 0; ntx < dev->num_tx_queues; ntx++) { qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping; spin_lock_bh(qdisc_lock(qdisc)); - sch->q.qlen += qdisc->q.qlen; - sch->bstats.bytes += qdisc->bstats.bytes; - sch->bstats.packets += qdisc->bstats.packets; - sch->qstats.backlog += qdisc->qstats.backlog; - sch->qstats.drops += qdisc->qstats.drops; - sch->qstats.requeues += qdisc->qstats.requeues; - sch->qstats.overlimits += qdisc->qstats.overlimits; + + if (qdisc_is_percpu_stats(qdisc)) { + qlen = qdisc_qlen_sum(qdisc); + __gnet_stats_copy_basic(NULL, &sch->bstats, + qdisc->cpu_bstats, + &qdisc->bstats); + __gnet_stats_copy_queue(&sch->qstats, + qdisc->cpu_qstats, + &qdisc->qstats, qlen); + } else { + sch->q.qlen += qdisc->q.qlen; + sch->bstats.bytes += qdisc->bstats.bytes; + sch->bstats.packets += qdisc->bstats.packets; + sch->qstats.backlog += qdisc->qstats.backlog; + sch->qstats.drops += qdisc->qstats.drops; + sch->qstats.requeues += qdisc->qstats.requeues; + sch->qstats.overlimits += qdisc->qstats.overlimits; + } + spin_unlock_bh(qdisc_lock(qdisc)); } + return 0; } diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c index b85885a9d8a1..8622745f3cd9 100644 --- a/net/sched/sch_mqprio.c +++ b/net/sched/sch_mqprio.c @@ -388,22 +388,40 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb) struct nlattr *nla = (struct nlattr *)skb_tail_pointer(skb); struct tc_mqprio_qopt opt = { 0 }; struct Qdisc *qdisc; - unsigned int i; + unsigned int ntx, tc; sch->q.qlen = 0; memset(&sch->bstats, 0, sizeof(sch->bstats)); memset(&sch->qstats, 0, sizeof(sch->qstats)); - for (i = 0; i < dev->num_tx_queues; i++) { - qdisc = rtnl_dereference(netdev_get_tx_queue(dev, i)->qdisc); + /* MQ supports lockless qdiscs. However, statistics accounting needs + * to account for all, none, or a mix of locked and unlocked child + * qdiscs. Percpu stats are added to counters in-band and locking + * qdisc totals are added at end. + */ + for (ntx = 0; ntx < dev->num_tx_queues; ntx++) { + qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping; spin_lock_bh(qdisc_lock(qdisc)); - sch->q.qlen += qdisc->q.qlen; - sch->bstats.bytes += qdisc->bstats.bytes; - sch->bstats.packets += qdisc->bstats.packets; - sch->qstats.backlog += qdisc->qstats.backlog; - sch->qstats.drops += qdisc->qstats.drops; - sch->qstats.requeues += qdisc->qstats.requeues; - sch->qstats.overlimits += qdisc->qstats.overlimits; + + if (qdisc_is_percpu_stats(qdisc)) { + __u32 qlen = qdisc_qlen_sum(qdisc); + + __gnet_stats_copy_basic(NULL, &sch->bstats, + qdisc->cpu_bstats, + &qdisc->bstats); + __gnet_stats_copy_queue(&sch->qstats, + qdisc->cpu_qstats, + &qdisc->qstats, qlen); + } else { + sch->q.qlen += qdisc->q.qlen; + sch->bstats.bytes += qdisc->bstats.bytes; + sch->bstats.packets += qdisc->bstats.packets; + sch->qstats.backlog += qdisc->qstats.backlog; + sch->qstats.drops += qdisc->qstats.drops; + sch->qstats.requeues += qdisc->qstats.requeues; + sch->qstats.overlimits += qdisc->qstats.overlimits; + } + spin_unlock_bh(qdisc_lock(qdisc)); } @@ -411,9 +429,9 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb) memcpy(opt.prio_tc_map, dev->prio_tc_map, sizeof(opt.prio_tc_map)); opt.hw = priv->hw_offload; - for (i = 0; i < netdev_get_num_tc(dev); i++) { - opt.count[i] = dev->tc_to_txq[i].count; - opt.offset[i] = dev->tc_to_txq[i].offset; + for (tc = 0; tc < netdev_get_num_tc(dev); tc++) { + opt.count[tc] = dev->tc_to_txq[tc].count; + opt.offset[tc] = dev->tc_to_txq[tc].offset; } if (nla_put(skb, TCA_OPTIONS, NLA_ALIGN(sizeof(opt)), &opt)) @@ -495,7 +513,6 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl, if (cl >= TC_H_MIN_PRIORITY) { int i; __u32 qlen = 0; - struct Qdisc *qdisc; struct gnet_stats_queue qstats = {0}; struct gnet_stats_basic_packed bstats = {0}; struct net_device *dev = qdisc_dev(sch); @@ -511,18 +528,26 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl, for (i = tc.offset; i < tc.offset + tc.count; i++) { struct netdev_queue *q = netdev_get_tx_queue(dev, i); + struct Qdisc *qdisc = rtnl_dereference(q->qdisc); + struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL; + struct gnet_stats_queue __percpu *cpu_qstats = NULL; - qdisc = rtnl_dereference(q->qdisc); spin_lock_bh(qdisc_lock(qdisc)); - qlen += qdisc->q.qlen; - bstats.bytes += qdisc->bstats.bytes; - bstats.packets += qdisc->bstats.packets; - qstats.backlog += qdisc->qstats.backlog; - qstats.drops += qdisc->qstats.drops; - qstats.requeues += qdisc->qstats.requeues; - qstats.overlimits += qdisc->qstats.overlimits; + if (qdisc_is_percpu_stats(qdisc)) { + cpu_bstats = qdisc->cpu_bstats; + cpu_qstats = qdisc->cpu_qstats; + } + + qlen = qdisc_qlen_sum(qdisc); + __gnet_stats_copy_basic(NULL, &sch->bstats, + cpu_bstats, &qdisc->bstats); + __gnet_stats_copy_queue(&sch->qstats, + cpu_qstats, + &qdisc->qstats, + qlen); spin_unlock_bh(qdisc_lock(qdisc)); } + /* Reclaim root sleeping lock before completing stats */ if (d->lock) spin_lock_bh(d->lock); diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 6451c5013e06..daf8075f5a4c 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -520,7 +520,7 @@ decline_rdma: smc->use_fallback = true; if (reason_code && (reason_code != SMC_CLC_DECL_REPLY)) { rc = smc_clc_send_decline(smc, reason_code); - if (rc < sizeof(struct smc_clc_msg_decline)) + if (rc < 0) goto out_err; } goto out_connected; @@ -751,14 +751,16 @@ static void smc_listen_work(struct work_struct *work) { struct smc_sock *new_smc = container_of(work, struct smc_sock, smc_listen_work); + struct smc_clc_msg_proposal_prefix *pclc_prfx; struct socket *newclcsock = new_smc->clcsock; struct smc_sock *lsmc = new_smc->listen_smc; struct smc_clc_msg_accept_confirm cclc; int local_contact = SMC_REUSE_CONTACT; struct sock *newsmcsk = &new_smc->sk; - struct smc_clc_msg_proposal pclc; + struct smc_clc_msg_proposal *pclc; struct smc_ib_device *smcibdev; struct sockaddr_in peeraddr; + u8 buf[SMC_CLC_MAX_LEN]; struct smc_link *link; int reason_code = 0; int rc = 0, len; @@ -775,7 +777,7 @@ static void smc_listen_work(struct work_struct *work) /* do inband token exchange - *wait for and receive SMC Proposal CLC message */ - reason_code = smc_clc_wait_msg(new_smc, &pclc, sizeof(pclc), + reason_code = smc_clc_wait_msg(new_smc, &buf, sizeof(buf), SMC_CLC_PROPOSAL); if (reason_code < 0) goto out_err; @@ -804,8 +806,11 @@ static void smc_listen_work(struct work_struct *work) reason_code = SMC_CLC_DECL_CNFERR; /* configuration error */ goto decline_rdma; } - if ((pclc.outgoing_subnet != subnet) || - (pclc.prefix_len != prefix_len)) { + + pclc = (struct smc_clc_msg_proposal *)&buf; + pclc_prfx = smc_clc_proposal_get_prefix(pclc); + if (pclc_prfx->outgoing_subnet != subnet || + pclc_prfx->prefix_len != prefix_len) { reason_code = SMC_CLC_DECL_CNFERR; /* configuration error */ goto decline_rdma; } @@ -816,7 +821,7 @@ static void smc_listen_work(struct work_struct *work) /* allocate connection / link group */ mutex_lock(&smc_create_lgr_pending); local_contact = smc_conn_create(new_smc, peeraddr.sin_addr.s_addr, - smcibdev, ibport, &pclc.lcl, 0); + smcibdev, ibport, &pclc->lcl, 0); if (local_contact < 0) { rc = local_contact; if (rc == -ENOMEM) @@ -879,11 +884,9 @@ static void smc_listen_work(struct work_struct *work) } /* QP confirmation over RoCE fabric */ reason_code = smc_serv_conf_first_link(new_smc); - if (reason_code < 0) { + if (reason_code < 0) /* peer is not aware of a problem */ - rc = reason_code; goto out_err_unlock; - } if (reason_code > 0) goto decline_rdma_unlock; } @@ -916,8 +919,7 @@ decline_rdma: smc_conn_free(&new_smc->conn); new_smc->use_fallback = true; if (reason_code && (reason_code != SMC_CLC_DECL_REPLY)) { - rc = smc_clc_send_decline(new_smc, reason_code); - if (rc < sizeof(struct smc_clc_msg_decline)) + if (smc_clc_send_decline(new_smc, reason_code) < 0) goto out_err; } goto out_connected; diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c index 87f7bede6eab..d4155ff6acde 100644 --- a/net/smc/smc_cdc.c +++ b/net/smc/smc_cdc.c @@ -213,6 +213,9 @@ static void smc_cdc_msg_recv_action(struct smc_sock *smc, /* guarantee 0 <= bytes_to_rcv <= rmbe_size */ smp_mb__after_atomic(); smc->sk.sk_data_ready(&smc->sk); + } else if ((conn->local_rx_ctrl.prod_flags.write_blocked) || + (conn->local_rx_ctrl.prod_flags.cons_curs_upd_req)) { + smc->sk.sk_data_ready(&smc->sk); } if (conn->local_rx_ctrl.conn_state_flags.peer_conn_abort) { @@ -234,15 +237,6 @@ static void smc_cdc_msg_recv_action(struct smc_sock *smc, /* trigger socket release if connection closed */ smc_close_wake_tx_prepared(smc); } - - /* socket connected but not accepted */ - if (!smc->sk.sk_socket) - return; - - /* data available */ - if ((conn->local_rx_ctrl.prod_flags.write_blocked) || - (conn->local_rx_ctrl.prod_flags.cons_curs_upd_req)) - smc_tx_consumer_update(conn); } /* called under tasklet context */ diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c index 1800e16b2a02..abf7ceb6690b 100644 --- a/net/smc/smc_clc.c +++ b/net/smc/smc_clc.c @@ -22,6 +22,54 @@ #include "smc_clc.h" #include "smc_ib.h" +/* check if received message has a correct header length and contains valid + * heading and trailing eyecatchers + */ +static bool smc_clc_msg_hdr_valid(struct smc_clc_msg_hdr *clcm) +{ + struct smc_clc_msg_proposal_prefix *pclc_prfx; + struct smc_clc_msg_accept_confirm *clc; + struct smc_clc_msg_proposal *pclc; + struct smc_clc_msg_decline *dclc; + struct smc_clc_msg_trail *trl; + + if (memcmp(clcm->eyecatcher, SMC_EYECATCHER, sizeof(SMC_EYECATCHER))) + return false; + switch (clcm->type) { + case SMC_CLC_PROPOSAL: + pclc = (struct smc_clc_msg_proposal *)clcm; + pclc_prfx = smc_clc_proposal_get_prefix(pclc); + if (ntohs(pclc->hdr.length) != + sizeof(*pclc) + ntohs(pclc->iparea_offset) + + sizeof(*pclc_prfx) + + pclc_prfx->ipv6_prefixes_cnt * + sizeof(struct smc_clc_ipv6_prefix) + + sizeof(*trl)) + return false; + trl = (struct smc_clc_msg_trail *) + ((u8 *)pclc + ntohs(pclc->hdr.length) - sizeof(*trl)); + break; + case SMC_CLC_ACCEPT: + case SMC_CLC_CONFIRM: + clc = (struct smc_clc_msg_accept_confirm *)clcm; + if (ntohs(clc->hdr.length) != sizeof(*clc)) + return false; + trl = &clc->trl; + break; + case SMC_CLC_DECLINE: + dclc = (struct smc_clc_msg_decline *)clcm; + if (ntohs(dclc->hdr.length) != sizeof(*dclc)) + return false; + trl = &dclc->trl; + break; + default: + return false; + } + if (memcmp(trl->eyecatcher, SMC_EYECATCHER, sizeof(SMC_EYECATCHER))) + return false; + return true; +} + /* Wait for data on the tcp-socket, analyze received data * Returns: * 0 if success and it was not a decline that we received. @@ -72,9 +120,7 @@ int smc_clc_wait_msg(struct smc_sock *smc, void *buf, int buflen, } datlen = ntohs(clcm->length); if ((len < sizeof(struct smc_clc_msg_hdr)) || - (datlen < sizeof(struct smc_clc_msg_decline)) || - (datlen > sizeof(struct smc_clc_msg_accept_confirm)) || - memcmp(clcm->eyecatcher, SMC_EYECATCHER, sizeof(SMC_EYECATCHER)) || + (datlen > buflen) || ((clcm->type != SMC_CLC_DECLINE) && (clcm->type != expected_type))) { smc->sk.sk_err = EPROTO; @@ -89,7 +135,7 @@ int smc_clc_wait_msg(struct smc_sock *smc, void *buf, int buflen, krflags = MSG_WAITALL; smc->clcsock->sk->sk_rcvtimeo = CLC_WAIT_TIME; len = kernel_recvmsg(smc->clcsock, &msg, &vec, 1, datlen, krflags); - if (len < datlen) { + if (len < datlen || !smc_clc_msg_hdr_valid(clcm)) { smc->sk.sk_err = EPROTO; reason_code = -EPROTO; goto out; @@ -133,7 +179,7 @@ int smc_clc_send_decline(struct smc_sock *smc, u32 peer_diag_info) smc->sk.sk_err = EPROTO; if (len < 0) smc->sk.sk_err = -len; - return len; + return sock_error(&smc->sk); } /* send CLC PROPOSAL message across internal TCP socket */ @@ -141,33 +187,43 @@ int smc_clc_send_proposal(struct smc_sock *smc, struct smc_ib_device *smcibdev, u8 ibport) { + struct smc_clc_msg_proposal_prefix pclc_prfx; struct smc_clc_msg_proposal pclc; + struct smc_clc_msg_trail trl; int reason_code = 0; + struct kvec vec[3]; struct msghdr msg; - struct kvec vec; - int len, rc; + int len, plen, rc; /* send SMC Proposal CLC message */ + plen = sizeof(pclc) + sizeof(pclc_prfx) + sizeof(trl); memset(&pclc, 0, sizeof(pclc)); memcpy(pclc.hdr.eyecatcher, SMC_EYECATCHER, sizeof(SMC_EYECATCHER)); pclc.hdr.type = SMC_CLC_PROPOSAL; - pclc.hdr.length = htons(sizeof(pclc)); + pclc.hdr.length = htons(plen); pclc.hdr.version = SMC_CLC_V1; /* SMC version */ memcpy(pclc.lcl.id_for_peer, local_systemid, sizeof(local_systemid)); memcpy(&pclc.lcl.gid, &smcibdev->gid[ibport - 1], SMC_GID_SIZE); memcpy(&pclc.lcl.mac, &smcibdev->mac[ibport - 1], ETH_ALEN); + pclc.iparea_offset = htons(0); + memset(&pclc_prfx, 0, sizeof(pclc_prfx)); /* determine subnet and mask from internal TCP socket */ - rc = smc_netinfo_by_tcpsk(smc->clcsock, &pclc.outgoing_subnet, - &pclc.prefix_len); + rc = smc_netinfo_by_tcpsk(smc->clcsock, &pclc_prfx.outgoing_subnet, + &pclc_prfx.prefix_len); if (rc) return SMC_CLC_DECL_CNFERR; /* configuration error */ - memcpy(pclc.trl.eyecatcher, SMC_EYECATCHER, sizeof(SMC_EYECATCHER)); + pclc_prfx.ipv6_prefixes_cnt = 0; + memcpy(trl.eyecatcher, SMC_EYECATCHER, sizeof(SMC_EYECATCHER)); memset(&msg, 0, sizeof(msg)); - vec.iov_base = &pclc; - vec.iov_len = sizeof(pclc); + vec[0].iov_base = &pclc; + vec[0].iov_len = sizeof(pclc); + vec[1].iov_base = &pclc_prfx; + vec[1].iov_len = sizeof(pclc_prfx); + vec[2].iov_base = &trl; + vec[2].iov_len = sizeof(trl); /* due to the few bytes needed for clc-handshake this cannot block */ - len = kernel_sendmsg(smc->clcsock, &msg, &vec, 1, sizeof(pclc)); + len = kernel_sendmsg(smc->clcsock, &msg, vec, 3, plen); if (len < sizeof(pclc)) { if (len >= 0) { reason_code = -ENETUNREACH; diff --git a/net/smc/smc_clc.h b/net/smc/smc_clc.h index 12a9af1539a2..c145a0f36a68 100644 --- a/net/smc/smc_clc.h +++ b/net/smc/smc_clc.h @@ -44,7 +44,7 @@ struct smc_clc_msg_hdr { /* header1 of clc messages */ #if defined(__BIG_ENDIAN_BITFIELD) u8 version : 4, flag : 1, - rsvd : 3; + rsvd : 3; #elif defined(__LITTLE_ENDIAN_BITFIELD) u8 rsvd : 3, flag : 1, @@ -62,17 +62,31 @@ struct smc_clc_msg_local { /* header2 of clc messages */ u8 mac[6]; /* mac of ib_device port */ }; -struct smc_clc_msg_proposal { /* clc proposal message */ - struct smc_clc_msg_hdr hdr; - struct smc_clc_msg_local lcl; - __be16 iparea_offset; /* offset to IP address information area */ +struct smc_clc_ipv6_prefix { + u8 prefix[4]; + u8 prefix_len; +} __packed; + +struct smc_clc_msg_proposal_prefix { /* prefix part of clc proposal message*/ __be32 outgoing_subnet; /* subnet mask */ u8 prefix_len; /* number of significant bits in mask */ u8 reserved[2]; u8 ipv6_prefixes_cnt; /* number of IPv6 prefixes in prefix array */ - struct smc_clc_msg_trail trl; /* eye catcher "SMCR" EBCDIC */ } __aligned(4); +struct smc_clc_msg_proposal { /* clc proposal message sent by Linux */ + struct smc_clc_msg_hdr hdr; + struct smc_clc_msg_local lcl; + __be16 iparea_offset; /* offset to IP address information area */ +} __aligned(4); + +#define SMC_CLC_PROPOSAL_MAX_OFFSET 0x28 +#define SMC_CLC_PROPOSAL_MAX_PREFIX (8 * sizeof(struct smc_clc_ipv6_prefix)) +#define SMC_CLC_MAX_LEN (sizeof(struct smc_clc_msg_proposal) + \ + SMC_CLC_PROPOSAL_MAX_OFFSET + \ + SMC_CLC_PROPOSAL_MAX_PREFIX + \ + sizeof(struct smc_clc_msg_trail)) + struct smc_clc_msg_accept_confirm { /* clc accept / confirm message */ struct smc_clc_msg_hdr hdr; struct smc_clc_msg_local lcl; @@ -102,6 +116,14 @@ struct smc_clc_msg_decline { /* clc decline message */ struct smc_clc_msg_trail trl; /* eye catcher "SMCR" EBCDIC */ } __aligned(4); +/* determine start of the prefix area within the proposal message */ +static inline struct smc_clc_msg_proposal_prefix * +smc_clc_proposal_get_prefix(struct smc_clc_msg_proposal *pclc) +{ + return (struct smc_clc_msg_proposal_prefix *) + ((u8 *)pclc + sizeof(*pclc) + ntohs(pclc->iparea_offset)); +} + struct smc_sock; struct smc_ib_device; diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c index 48615d2ac4aa..e194c6cc308a 100644 --- a/net/smc/smc_close.c +++ b/net/smc/smc_close.c @@ -113,7 +113,7 @@ static int smc_close_abort(struct smc_connection *conn) /* terminate smc socket abnormally - active abort * RDMA communication no longer possible */ -void smc_close_active_abort(struct smc_sock *smc) +static void smc_close_active_abort(struct smc_sock *smc) { struct smc_cdc_conn_state_flags *txflags = &smc->conn.local_tx_ctrl.conn_state_flags; diff --git a/net/smc/smc_close.h b/net/smc/smc_close.h index ed82506b1b0a..8c498885d758 100644 --- a/net/smc/smc_close.h +++ b/net/smc/smc_close.h @@ -20,7 +20,6 @@ #define SMC_CLOSE_SOCK_PUT_DELAY HZ void smc_close_wake_tx_prepared(struct smc_sock *smc); -void smc_close_active_abort(struct smc_sock *smc); int smc_close_active(struct smc_sock *smc); void smc_close_sock_put_work(struct work_struct *work); int smc_close_shutdown_write(struct smc_sock *smc); diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c index cbf58637ee14..9dc392ca06bf 100644 --- a/net/smc/smc_rx.c +++ b/net/smc/smc_rx.c @@ -65,7 +65,6 @@ static int smc_rx_wait_data(struct smc_sock *smc, long *timeo) rc = sk_wait_event(sk, timeo, sk->sk_err || sk->sk_shutdown & RCV_SHUTDOWN || - sock_flag(sk, SOCK_DONE) || atomic_read(&conn->bytes_to_rcv) || smc_cdc_rxed_any_close_or_senddone(conn), &wait); @@ -116,7 +115,7 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len, if (read_done) { if (sk->sk_err || sk->sk_state == SMC_CLOSED || - (sk->sk_shutdown & RCV_SHUTDOWN) || + sk->sk_shutdown & RCV_SHUTDOWN || !timeo || signal_pending(current) || smc_cdc_rxed_any_close_or_senddone(conn) || @@ -124,8 +123,6 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len, peer_conn_abort) break; } else { - if (sock_flag(sk, SOCK_DONE)) - break; if (sk->sk_err) { read_done = sock_error(sk); break; diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c index c48dc2d5fd3a..2e50fddf8ce9 100644 --- a/net/smc/smc_tx.c +++ b/net/smc/smc_tx.c @@ -104,14 +104,12 @@ static int smc_tx_wait_memory(struct smc_sock *smc, int flags) if (atomic_read(&conn->sndbuf_space)) break; /* at least 1 byte of free space available */ set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); - sk->sk_write_pending++; sk_wait_event(sk, &timeo, sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN) || smc_cdc_rxed_any_close_or_senddone(conn) || atomic_read(&conn->sndbuf_space), &wait); - sk->sk_write_pending--; } remove_wait_queue(sk_sleep(sk), &wait); return rc; @@ -450,9 +448,7 @@ static void smc_tx_work(struct work_struct *work) void smc_tx_consumer_update(struct smc_connection *conn) { union smc_host_cursor cfed, cons; - struct smc_cdc_tx_pend *pend; - struct smc_wr_buf *wr_buf; - int to_confirm, rc; + int to_confirm; smc_curs_write(&cons, smc_curs_read(&conn->local_tx_ctrl.cons, conn), @@ -466,10 +462,7 @@ void smc_tx_consumer_update(struct smc_connection *conn) ((to_confirm > conn->rmbe_update_limit) && ((to_confirm > (conn->rmbe_size / 2)) || conn->local_rx_ctrl.prod_flags.write_blocked))) { - rc = smc_cdc_get_free_slot(conn, &wr_buf, &pend); - if (!rc) - rc = smc_cdc_msg_send(conn, wr_buf, pend); - if (rc < 0) { + if (smc_cdc_get_slot_and_msg_send(conn) < 0) { schedule_delayed_work(&conn->tx_work, SMC_TX_WORK_DELAY); return; diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index 329325bd553e..37892b3909af 100644 --- a/net/tipc/bcast.c +++ b/net/tipc/bcast.c @@ -1,7 +1,7 @@ /* * net/tipc/bcast.c: TIPC broadcast code * - * Copyright (c) 2004-2006, 2014-2016, Ericsson AB + * Copyright (c) 2004-2006, 2014-2017, Ericsson AB * Copyright (c) 2004, Intel Corporation. * Copyright (c) 2005, 2010-2011, Wind River Systems * All rights reserved. @@ -42,8 +42,8 @@ #include "link.h" #include "name_table.h" -#define BCLINK_WIN_DEFAULT 50 /* bcast link window size (default) */ -#define BCLINK_WIN_MIN 32 /* bcast minimum link window size */ +#define BCLINK_WIN_DEFAULT 50 /* bcast link window size (default) */ +#define BCLINK_WIN_MIN 32 /* bcast minimum link window size */ const char tipc_bclink_name[] = "broadcast-link"; @@ -74,6 +74,10 @@ static struct tipc_bc_base *tipc_bc_base(struct net *net) return tipc_net(net)->bcbase; } +/* tipc_bcast_get_mtu(): -get the MTU currently used by broadcast link + * Note: the MTU is decremented to give room for a tunnel header, in + * case the message needs to be sent as replicast + */ int tipc_bcast_get_mtu(struct net *net) { return tipc_link_mtu(tipc_bc_sndlink(net)) - INT_H_SIZE; @@ -515,7 +519,7 @@ int tipc_bcast_init(struct net *net) spin_lock_init(&tipc_net(net)->bclock); if (!tipc_link_bc_create(net, 0, 0, - U16_MAX, + FB_MTU, BCLINK_WIN_DEFAULT, 0, &bb->inputq, diff --git a/net/tipc/link.c b/net/tipc/link.c index 6bce0b1117bd..2d6b2aed30e0 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -483,7 +483,7 @@ bool tipc_link_create(struct net *net, char *if_name, int bearer_id, /** * tipc_link_bc_create - create new link to be used for broadcast * @n: pointer to associated node - * @mtu: mtu to be used + * @mtu: mtu to be used initially if no peers * @window: send window to be used * @inputq: queue to put messages ready for delivery * @namedq: queue to put binding table update messages ready for delivery diff --git a/net/tipc/msg.c b/net/tipc/msg.c index b0d07b35909d..55d8ba92291d 100644 --- a/net/tipc/msg.c +++ b/net/tipc/msg.c @@ -251,20 +251,23 @@ bool tipc_msg_validate(struct sk_buff **_skb) * @pktmax: Max packet size that can be used * @list: Buffer or chain of buffers to be returned to caller * + * Note that the recursive call we are making here is safe, since it can + * logically go only one further level down. + * * Returns message data size or errno: -ENOMEM, -EFAULT */ -int tipc_msg_build(struct tipc_msg *mhdr, struct msghdr *m, - int offset, int dsz, int pktmax, struct sk_buff_head *list) +int tipc_msg_build(struct tipc_msg *mhdr, struct msghdr *m, int offset, + int dsz, int pktmax, struct sk_buff_head *list) { int mhsz = msg_hdr_sz(mhdr); + struct tipc_msg pkthdr; int msz = mhsz + dsz; - int pktno = 1; - int pktsz; int pktrem = pktmax; - int drem = dsz; - struct tipc_msg pkthdr; struct sk_buff *skb; + int drem = dsz; + int pktno = 1; char *pktpos; + int pktsz; int rc; msg_set_size(mhdr, msz); @@ -272,8 +275,18 @@ int tipc_msg_build(struct tipc_msg *mhdr, struct msghdr *m, /* No fragmentation needed? */ if (likely(msz <= pktmax)) { skb = tipc_buf_acquire(msz, GFP_KERNEL); - if (unlikely(!skb)) + + /* Fall back to smaller MTU if node local message */ + if (unlikely(!skb)) { + if (pktmax != MAX_MSG_SIZE) + return -ENOMEM; + rc = tipc_msg_build(mhdr, m, offset, dsz, FB_MTU, list); + if (rc != dsz) + return rc; + if (tipc_msg_assemble(list)) + return dsz; return -ENOMEM; + } skb_orphan(skb); __skb_queue_tail(list, skb); skb_copy_to_linear_data(skb, mhdr, mhsz); @@ -589,6 +602,30 @@ bool tipc_msg_lookup_dest(struct net *net, struct sk_buff *skb, int *err) return true; } +/* tipc_msg_assemble() - assemble chain of fragments into one message + */ +bool tipc_msg_assemble(struct sk_buff_head *list) +{ + struct sk_buff *skb, *tmp = NULL; + + if (skb_queue_len(list) == 1) + return true; + + while ((skb = __skb_dequeue(list))) { + skb->next = NULL; + if (tipc_buf_append(&tmp, &skb)) { + __skb_queue_tail(list, skb); + return true; + } + if (!tmp) + break; + } + __skb_queue_purge(list); + __skb_queue_head_init(list); + pr_warn("Failed do assemble buffer\n"); + return false; +} + /* tipc_msg_reassemble() - clone a buffer chain of fragments and * reassemble the clones into one message */ diff --git a/net/tipc/msg.h b/net/tipc/msg.h index 3e4384c222f7..b4ba1b4f9ae7 100644 --- a/net/tipc/msg.h +++ b/net/tipc/msg.h @@ -98,7 +98,7 @@ struct plist; #define MAX_H_SIZE 60 /* Largest possible TIPC header size */ #define MAX_MSG_SIZE (MAX_H_SIZE + TIPC_MAX_USER_MSG_SIZE) - +#define FB_MTU 3744 #define TIPC_MEDIA_INFO_OFFSET 5 struct tipc_skb_cb { @@ -943,6 +943,7 @@ bool tipc_msg_extract(struct sk_buff *skb, struct sk_buff **iskb, int *pos); int tipc_msg_build(struct tipc_msg *mhdr, struct msghdr *m, int offset, int dsz, int mtu, struct sk_buff_head *list); bool tipc_msg_lookup_dest(struct net *net, struct sk_buff *skb, int *err); +bool tipc_msg_assemble(struct sk_buff_head *list); bool tipc_msg_reassemble(struct sk_buff_head *list, struct sk_buff_head *rcvq); bool tipc_msg_pskb_copy(u32 dst, struct sk_buff_head *msg, struct sk_buff_head *cpy); diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c index 30e5746085b8..c61a7d46b412 100644 --- a/net/xfrm/xfrm_device.c +++ b/net/xfrm/xfrm_device.c @@ -120,8 +120,8 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x) if (!x->type_offload || x->encap) return false; - if ((x->xso.offload_handle && (dev == dst->path->dev)) && - !dst->child->xfrm && x->type->get_mtu) { + if ((x->xso.offload_handle && (dev == xfrm_dst_path(dst)->dev)) && + !xdst->child->xfrm && x->type->get_mtu) { mtu = x->type->get_mtu(x, xdst->child_mtu_cached); if (skb->len <= mtu) diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c index 73ad8c8ef344..23468672a767 100644 --- a/net/xfrm/xfrm_output.c +++ b/net/xfrm/xfrm_output.c @@ -44,7 +44,7 @@ static int xfrm_skb_check_space(struct sk_buff *skb) static struct dst_entry *skb_dst_pop(struct sk_buff *skb) { - struct dst_entry *child = dst_clone(skb_dst(skb)->child); + struct dst_entry *child = dst_clone(xfrm_dst_child(skb_dst(skb))); skb_dst_drop(skb); return child; diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 9542975eb2f9..22e3350013b4 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -54,7 +54,7 @@ static struct xfrm_policy_afinfo const __rcu *xfrm_policy_afinfo[AF_INET6 + 1] static struct kmem_cache *xfrm_dst_cache __read_mostly; static __read_mostly seqcount_t xfrm_policy_hash_generation; -static void xfrm_init_pmtu(struct dst_entry *dst); +static void xfrm_init_pmtu(struct xfrm_dst **bundle, int nr); static int stale_bundle(struct dst_entry *dst); static int xfrm_bundle_ok(struct xfrm_dst *xdst); static void xfrm_policy_queue_process(struct timer_list *t); @@ -1538,7 +1538,9 @@ static inline int xfrm_fill_dst(struct xfrm_dst *xdst, struct net_device *dev, */ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy, - struct xfrm_state **xfrm, int nx, + struct xfrm_state **xfrm, + struct xfrm_dst **bundle, + int nx, const struct flowi *fl, struct dst_entry *dst) { @@ -1546,8 +1548,8 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy, unsigned long now = jiffies; struct net_device *dev; struct xfrm_mode *inner_mode; - struct dst_entry *dst_prev = NULL; - struct dst_entry *dst0 = NULL; + struct xfrm_dst *xdst_prev = NULL; + struct xfrm_dst *xdst0 = NULL; int i = 0; int err; int header_len = 0; @@ -1573,13 +1575,14 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy, goto put_states; } - if (!dst_prev) - dst0 = dst1; + bundle[i] = xdst; + if (!xdst_prev) + xdst0 = xdst; else /* Ref count is taken during xfrm_alloc_dst() * No need to do dst_clone() on dst1 */ - dst_prev->child = dst1; + xfrm_dst_set_child(xdst_prev, &xdst->u.dst); if (xfrm[i]->sel.family == AF_UNSPEC) { inner_mode = xfrm_ip2inner_mode(xfrm[i], @@ -1616,8 +1619,7 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy, dst1->input = dst_discard; dst1->output = inner_mode->afinfo->output; - dst1->next = dst_prev; - dst_prev = dst1; + xdst_prev = xdst; header_len += xfrm[i]->props.header_len; if (xfrm[i]->type->flags & XFRM_TYPE_NON_FRAGMENT) @@ -1625,40 +1627,39 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy, trailer_len += xfrm[i]->props.trailer_len; } - dst_prev->child = dst; - dst0->path = dst; + xfrm_dst_set_child(xdst_prev, dst); + xdst0->path = dst; err = -ENODEV; dev = dst->dev; if (!dev) goto free_dst; - xfrm_init_path((struct xfrm_dst *)dst0, dst, nfheader_len); - xfrm_init_pmtu(dst_prev); + xfrm_init_path(xdst0, dst, nfheader_len); + xfrm_init_pmtu(bundle, nx); - for (dst_prev = dst0; dst_prev != dst; dst_prev = dst_prev->child) { - struct xfrm_dst *xdst = (struct xfrm_dst *)dst_prev; - - err = xfrm_fill_dst(xdst, dev, fl); + for (xdst_prev = xdst0; xdst_prev != (struct xfrm_dst *)dst; + xdst_prev = (struct xfrm_dst *) xfrm_dst_child(&xdst_prev->u.dst)) { + err = xfrm_fill_dst(xdst_prev, dev, fl); if (err) goto free_dst; - dst_prev->header_len = header_len; - dst_prev->trailer_len = trailer_len; - header_len -= xdst->u.dst.xfrm->props.header_len; - trailer_len -= xdst->u.dst.xfrm->props.trailer_len; + xdst_prev->u.dst.header_len = header_len; + xdst_prev->u.dst.trailer_len = trailer_len; + header_len -= xdst_prev->u.dst.xfrm->props.header_len; + trailer_len -= xdst_prev->u.dst.xfrm->props.trailer_len; } out: - return dst0; + return &xdst0->u.dst; put_states: for (; i < nx; i++) xfrm_state_put(xfrm[i]); free_dst: - if (dst0) - dst_release_immediate(dst0); - dst0 = ERR_PTR(err); + if (xdst0) + dst_release_immediate(&xdst0->u.dst); + xdst0 = ERR_PTR(err); goto out; } @@ -1800,7 +1801,7 @@ static bool xfrm_xdst_can_reuse(struct xfrm_dst *xdst, for (i = 0; i < num; i++) { if (!dst || dst->xfrm != xfrm[i]) return false; - dst = dst->child; + dst = xfrm_dst_child(dst); } return xfrm_bundle_ok(xdst); @@ -1813,6 +1814,7 @@ xfrm_resolve_and_create_bundle(struct xfrm_policy **pols, int num_pols, { struct net *net = xp_net(pols[0]); struct xfrm_state *xfrm[XFRM_MAX_DEPTH]; + struct xfrm_dst *bundle[XFRM_MAX_DEPTH]; struct xfrm_dst *xdst, *old; struct dst_entry *dst; int err; @@ -1840,7 +1842,7 @@ xfrm_resolve_and_create_bundle(struct xfrm_policy **pols, int num_pols, old = xdst; - dst = xfrm_bundle_create(pols[0], xfrm, err, fl, dst_orig); + dst = xfrm_bundle_create(pols[0], xfrm, bundle, err, fl, dst_orig); if (IS_ERR(dst)) { XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTBUNDLEGENERROR); return ERR_CAST(dst); @@ -1880,8 +1882,8 @@ static void xfrm_policy_queue_process(struct timer_list *t) xfrm_decode_session(skb, &fl, dst->ops->family); spin_unlock(&pq->hold_queue.lock); - dst_hold(dst->path); - dst = xfrm_lookup(net, dst->path, &fl, sk, 0); + dst_hold(xfrm_dst_path(dst)); + dst = xfrm_lookup(net, xfrm_dst_path(dst), &fl, sk, 0); if (IS_ERR(dst)) goto purge_queue; @@ -1910,8 +1912,8 @@ static void xfrm_policy_queue_process(struct timer_list *t) skb = __skb_dequeue(&list); xfrm_decode_session(skb, &fl, skb_dst(skb)->ops->family); - dst_hold(skb_dst(skb)->path); - dst = xfrm_lookup(net, skb_dst(skb)->path, &fl, skb->sk, 0); + dst_hold(xfrm_dst_path(skb_dst(skb))); + dst = xfrm_lookup(net, xfrm_dst_path(skb_dst(skb)), &fl, skb->sk, 0); if (IS_ERR(dst)) { kfree_skb(skb); continue; @@ -2012,8 +2014,8 @@ static struct xfrm_dst *xfrm_create_dummy_bundle(struct net *net, dst1->output = xdst_queue_output; dst_hold(dst); - dst1->child = dst; - dst1->path = dst; + xfrm_dst_set_child(xdst, dst); + xdst->path = dst; xfrm_init_path((struct xfrm_dst *)dst1, dst, 0); @@ -2576,7 +2578,7 @@ static int stale_bundle(struct dst_entry *dst) void xfrm_dst_ifdown(struct dst_entry *dst, struct net_device *dev) { - while ((dst = dst->child) && dst->xfrm && dst->dev == dev) { + while ((dst = xfrm_dst_child(dst)) && dst->xfrm && dst->dev == dev) { dst->dev = dev_net(dev)->loopback_dev; dev_hold(dst->dev); dev_put(dev); @@ -2600,13 +2602,15 @@ static struct dst_entry *xfrm_negative_advice(struct dst_entry *dst) return dst; } -static void xfrm_init_pmtu(struct dst_entry *dst) +static void xfrm_init_pmtu(struct xfrm_dst **bundle, int nr) { - do { - struct xfrm_dst *xdst = (struct xfrm_dst *)dst; + while (nr--) { + struct xfrm_dst *xdst = bundle[nr]; u32 pmtu, route_mtu_cached; + struct dst_entry *dst; - pmtu = dst_mtu(dst->child); + dst = &xdst->u.dst; + pmtu = dst_mtu(xfrm_dst_child(dst)); xdst->child_mtu_cached = pmtu; pmtu = xfrm_state_mtu(dst->xfrm, pmtu); @@ -2618,7 +2622,7 @@ static void xfrm_init_pmtu(struct dst_entry *dst) pmtu = route_mtu_cached; dst_metric_set(dst, RTAX_MTU, pmtu); - } while ((dst = dst->next)); + } } /* Check that the bundle accepts the flow and its components are @@ -2627,19 +2631,20 @@ static void xfrm_init_pmtu(struct dst_entry *dst) static int xfrm_bundle_ok(struct xfrm_dst *first) { + struct xfrm_dst *bundle[XFRM_MAX_DEPTH]; struct dst_entry *dst = &first->u.dst; - struct xfrm_dst *last; + struct xfrm_dst *xdst; + int start_from, nr; u32 mtu; - if (!dst_check(dst->path, ((struct xfrm_dst *)dst)->path_cookie) || + if (!dst_check(xfrm_dst_path(dst), ((struct xfrm_dst *)dst)->path_cookie) || (dst->dev && !netif_running(dst->dev))) return 0; if (dst->flags & DST_XFRM_QUEUE) return 1; - last = NULL; - + start_from = nr = 0; do { struct xfrm_dst *xdst = (struct xfrm_dst *)dst; @@ -2651,9 +2656,11 @@ static int xfrm_bundle_ok(struct xfrm_dst *first) xdst->policy_genid != atomic_read(&xdst->pols[0]->genid)) return 0; - mtu = dst_mtu(dst->child); + bundle[nr++] = xdst; + + mtu = dst_mtu(xfrm_dst_child(dst)); if (xdst->child_mtu_cached != mtu) { - last = xdst; + start_from = nr; xdst->child_mtu_cached = mtu; } @@ -2661,30 +2668,30 @@ static int xfrm_bundle_ok(struct xfrm_dst *first) return 0; mtu = dst_mtu(xdst->route); if (xdst->route_mtu_cached != mtu) { - last = xdst; + start_from = nr; xdst->route_mtu_cached = mtu; } - dst = dst->child; + dst = xfrm_dst_child(dst); } while (dst->xfrm); - if (likely(!last)) + if (likely(!start_from)) return 1; - mtu = last->child_mtu_cached; - for (;;) { - dst = &last->u.dst; + xdst = bundle[start_from - 1]; + mtu = xdst->child_mtu_cached; + while (start_from--) { + dst = &xdst->u.dst; mtu = xfrm_state_mtu(dst->xfrm, mtu); - if (mtu > last->route_mtu_cached) - mtu = last->route_mtu_cached; + if (mtu > xdst->route_mtu_cached) + mtu = xdst->route_mtu_cached; dst_metric_set(dst, RTAX_MTU, mtu); - - if (last == first) + if (!start_from) break; - last = (struct xfrm_dst *)last->u.dst.next; - last->child_mtu_cached = mtu; + xdst = bundle[start_from - 1]; + xdst->child_mtu_cached = mtu; } return 1; @@ -2692,22 +2699,20 @@ static int xfrm_bundle_ok(struct xfrm_dst *first) static unsigned int xfrm_default_advmss(const struct dst_entry *dst) { - return dst_metric_advmss(dst->path); + return dst_metric_advmss(xfrm_dst_path(dst)); } static unsigned int xfrm_mtu(const struct dst_entry *dst) { unsigned int mtu = dst_metric_raw(dst, RTAX_MTU); - return mtu ? : dst_mtu(dst->path); + return mtu ? : dst_mtu(xfrm_dst_path(dst)); } static const void *xfrm_get_dst_nexthop(const struct dst_entry *dst, const void *daddr) { - const struct dst_entry *path = dst->path; - - for (; dst != path; dst = dst->child) { + while (dst->xfrm) { const struct xfrm_state *xfrm = dst->xfrm; if (xfrm->props.mode == XFRM_MODE_TRANSPORT) @@ -2716,6 +2721,8 @@ static const void *xfrm_get_dst_nexthop(const struct dst_entry *dst, daddr = xfrm->coaddr; else if (!(xfrm->type->flags & XFRM_TYPE_LOCAL_COADDR)) daddr = &xfrm->id.daddr; + + dst = xfrm_dst_child(dst); } return daddr; } @@ -2724,7 +2731,7 @@ static struct neighbour *xfrm_neigh_lookup(const struct dst_entry *dst, struct sk_buff *skb, const void *daddr) { - const struct dst_entry *path = dst->path; + const struct dst_entry *path = xfrm_dst_path(dst); if (!skb) daddr = xfrm_get_dst_nexthop(dst, daddr); @@ -2733,7 +2740,7 @@ static struct neighbour *xfrm_neigh_lookup(const struct dst_entry *dst, static void xfrm_confirm_neigh(const struct dst_entry *dst, const void *daddr) { - const struct dst_entry *path = dst->path; + const struct dst_entry *path = xfrm_dst_path(dst); daddr = xfrm_get_dst_nexthop(dst, daddr); path->ops->confirm_neigh(path, daddr); diff --git a/samples/bpf/tcbpf2_kern.c b/samples/bpf/tcbpf2_kern.c index 370b749f5ee6..79ad061079dd 100644 --- a/samples/bpf/tcbpf2_kern.c +++ b/samples/bpf/tcbpf2_kern.c @@ -81,6 +81,49 @@ int _gre_get_tunnel(struct __sk_buff *skb) return TC_ACT_OK; } +SEC("ip6gretap_set_tunnel") +int _ip6gretap_set_tunnel(struct __sk_buff *skb) +{ + struct bpf_tunnel_key key; + int ret; + + __builtin_memset(&key, 0x0, sizeof(key)); + key.remote_ipv6[3] = _htonl(0x11); /* ::11 */ + key.tunnel_id = 2; + key.tunnel_tos = 0; + key.tunnel_ttl = 64; + key.tunnel_label = 0xabcde; + + ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), + BPF_F_TUNINFO_IPV6 | BPF_F_ZERO_CSUM_TX); + if (ret < 0) { + ERROR(ret); + return TC_ACT_SHOT; + } + + return TC_ACT_OK; +} + +SEC("ip6gretap_get_tunnel") +int _ip6gretap_get_tunnel(struct __sk_buff *skb) +{ + char fmt[] = "key %d remote ip6 ::%x label %x\n"; + struct bpf_tunnel_key key; + int ret; + + ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), + BPF_F_TUNINFO_IPV6); + if (ret < 0) { + ERROR(ret); + return TC_ACT_SHOT; + } + + bpf_trace_printk(fmt, sizeof(fmt), + key.tunnel_id, key.remote_ipv6[3], key.tunnel_label); + + return TC_ACT_OK; +} + SEC("erspan_set_tunnel") int _erspan_set_tunnel(struct __sk_buff *skb) { @@ -138,6 +181,64 @@ int _erspan_get_tunnel(struct __sk_buff *skb) return TC_ACT_OK; } +SEC("ip4ip6erspan_set_tunnel") +int _ip4ip6erspan_set_tunnel(struct __sk_buff *skb) +{ + struct bpf_tunnel_key key; + struct erspan_metadata md; + int ret; + + __builtin_memset(&key, 0x0, sizeof(key)); + key.remote_ipv6[3] = _htonl(0x11); + key.tunnel_id = 2; + key.tunnel_tos = 0; + key.tunnel_ttl = 64; + + ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), + BPF_F_TUNINFO_IPV6); + if (ret < 0) { + ERROR(ret); + return TC_ACT_SHOT; + } + + md.index = htonl(123); + ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md)); + if (ret < 0) { + ERROR(ret); + return TC_ACT_SHOT; + } + + return TC_ACT_OK; +} + +SEC("ip4ip6erspan_get_tunnel") +int _ip4ip6erspan_get_tunnel(struct __sk_buff *skb) +{ + char fmt[] = "key %d remote ip6 ::%x erspan index 0x%x\n"; + struct bpf_tunnel_key key; + struct erspan_metadata md; + u32 index; + int ret; + + ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6); + if (ret < 0) { + ERROR(ret); + return TC_ACT_SHOT; + } + + ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md)); + if (ret < 0) { + ERROR(ret); + return TC_ACT_SHOT; + } + + index = bpf_ntohl(md.index); + bpf_trace_printk(fmt, sizeof(fmt), + key.tunnel_id, key.remote_ipv6[0], index); + + return TC_ACT_OK; +} + SEC("vxlan_set_tunnel") int _vxlan_set_tunnel(struct __sk_buff *skb) { diff --git a/samples/bpf/test_cgrp2_attach2.c b/samples/bpf/test_cgrp2_attach2.c index 3e8232cc04a8..1af412ec6007 100644 --- a/samples/bpf/test_cgrp2_attach2.c +++ b/samples/bpf/test_cgrp2_attach2.c @@ -78,7 +78,8 @@ static int test_foo_bar(void) if (join_cgroup(FOO)) goto err; - if (bpf_prog_attach(drop_prog, foo, BPF_CGROUP_INET_EGRESS, 1)) { + if (bpf_prog_attach(drop_prog, foo, BPF_CGROUP_INET_EGRESS, + BPF_F_ALLOW_OVERRIDE)) { log_err("Attaching prog to /foo"); goto err; } @@ -97,7 +98,8 @@ static int test_foo_bar(void) printf("Attached DROP prog. This ping in cgroup /foo/bar should fail...\n"); assert(system(PING_CMD) != 0); - if (bpf_prog_attach(allow_prog, bar, BPF_CGROUP_INET_EGRESS, 1)) { + if (bpf_prog_attach(allow_prog, bar, BPF_CGROUP_INET_EGRESS, + BPF_F_ALLOW_OVERRIDE)) { log_err("Attaching prog to /foo/bar"); goto err; } @@ -114,7 +116,8 @@ static int test_foo_bar(void) "This ping in cgroup /foo/bar should fail...\n"); assert(system(PING_CMD) != 0); - if (bpf_prog_attach(allow_prog, bar, BPF_CGROUP_INET_EGRESS, 1)) { + if (bpf_prog_attach(allow_prog, bar, BPF_CGROUP_INET_EGRESS, + BPF_F_ALLOW_OVERRIDE)) { log_err("Attaching prog to /foo/bar"); goto err; } @@ -128,7 +131,8 @@ static int test_foo_bar(void) "This ping in cgroup /foo/bar should pass...\n"); assert(system(PING_CMD) == 0); - if (bpf_prog_attach(allow_prog, bar, BPF_CGROUP_INET_EGRESS, 1)) { + if (bpf_prog_attach(allow_prog, bar, BPF_CGROUP_INET_EGRESS, + BPF_F_ALLOW_OVERRIDE)) { log_err("Attaching prog to /foo/bar"); goto err; } @@ -161,13 +165,15 @@ static int test_foo_bar(void) goto err; } - if (!bpf_prog_attach(allow_prog, bar, BPF_CGROUP_INET_EGRESS, 1)) { + if (!bpf_prog_attach(allow_prog, bar, BPF_CGROUP_INET_EGRESS, + BPF_F_ALLOW_OVERRIDE)) { errno = 0; log_err("Unexpected success attaching overridable prog to /foo/bar"); goto err; } - if (!bpf_prog_attach(allow_prog, foo, BPF_CGROUP_INET_EGRESS, 1)) { + if (!bpf_prog_attach(allow_prog, foo, BPF_CGROUP_INET_EGRESS, + BPF_F_ALLOW_OVERRIDE)) { errno = 0; log_err("Unexpected success attaching overridable prog to /foo"); goto err; @@ -273,27 +279,33 @@ static int test_multiprog(void) if (join_cgroup("/cg1/cg2/cg3/cg4/cg5")) goto err; - if (bpf_prog_attach(allow_prog[0], cg1, BPF_CGROUP_INET_EGRESS, 2)) { + if (bpf_prog_attach(allow_prog[0], cg1, BPF_CGROUP_INET_EGRESS, + BPF_F_ALLOW_MULTI)) { log_err("Attaching prog to cg1"); goto err; } - if (!bpf_prog_attach(allow_prog[0], cg1, BPF_CGROUP_INET_EGRESS, 2)) { + if (!bpf_prog_attach(allow_prog[0], cg1, BPF_CGROUP_INET_EGRESS, + BPF_F_ALLOW_MULTI)) { log_err("Unexpected success attaching the same prog to cg1"); goto err; } - if (bpf_prog_attach(allow_prog[1], cg1, BPF_CGROUP_INET_EGRESS, 2)) { + if (bpf_prog_attach(allow_prog[1], cg1, BPF_CGROUP_INET_EGRESS, + BPF_F_ALLOW_MULTI)) { log_err("Attaching prog2 to cg1"); goto err; } - if (bpf_prog_attach(allow_prog[2], cg2, BPF_CGROUP_INET_EGRESS, 1)) { + if (bpf_prog_attach(allow_prog[2], cg2, BPF_CGROUP_INET_EGRESS, + BPF_F_ALLOW_OVERRIDE)) { log_err("Attaching prog to cg2"); goto err; } - if (bpf_prog_attach(allow_prog[3], cg3, BPF_CGROUP_INET_EGRESS, 2)) { + if (bpf_prog_attach(allow_prog[3], cg3, BPF_CGROUP_INET_EGRESS, + BPF_F_ALLOW_MULTI)) { log_err("Attaching prog to cg3"); goto err; } - if (bpf_prog_attach(allow_prog[4], cg4, BPF_CGROUP_INET_EGRESS, 1)) { + if (bpf_prog_attach(allow_prog[4], cg4, BPF_CGROUP_INET_EGRESS, + BPF_F_ALLOW_OVERRIDE)) { log_err("Attaching prog to cg4"); goto err; } diff --git a/samples/bpf/test_tunnel_bpf.sh b/samples/bpf/test_tunnel_bpf.sh index 312e1722a39f..f53efb62f699 100755 --- a/samples/bpf/test_tunnel_bpf.sh +++ b/samples/bpf/test_tunnel_bpf.sh @@ -33,6 +33,30 @@ function add_gre_tunnel { ip addr add dev $DEV 10.1.1.200/24 } +function add_ip6gretap_tunnel { + + # assign ipv6 address + ip netns exec at_ns0 ip addr add ::11/96 dev veth0 + ip netns exec at_ns0 ip link set dev veth0 up + ip addr add dev veth1 ::22/96 + ip link set dev veth1 up + + # in namespace + ip netns exec at_ns0 \ + ip link add dev $DEV_NS type $TYPE flowlabel 0xbcdef key 2 \ + local ::11 remote ::22 + + ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24 + ip netns exec at_ns0 ip addr add dev $DEV_NS fc80::100/96 + ip netns exec at_ns0 ip link set dev $DEV_NS up + + # out of namespace + ip link add dev $DEV type $TYPE external + ip addr add dev $DEV 10.1.1.200/24 + ip addr add dev $DEV fc80::200/24 + ip link set dev $DEV up +} + function add_erspan_tunnel { # in namespace ip netns exec at_ns0 \ @@ -46,6 +70,28 @@ function add_erspan_tunnel { ip addr add dev $DEV 10.1.1.200/24 } +function add_ip6erspan_tunnel { + + # assign ipv6 address + ip netns exec at_ns0 ip addr add ::11/96 dev veth0 + ip netns exec at_ns0 ip link set dev veth0 up + ip addr add dev veth1 ::22/96 + ip link set dev veth1 up + + # in namespace + ip netns exec at_ns0 \ + ip link add dev $DEV_NS type $TYPE seq key 2 erspan 123 \ + local ::11 remote ::22 + + ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24 + ip netns exec at_ns0 ip link set dev $DEV_NS up + + # out of namespace + ip link add dev $DEV type $TYPE external + ip addr add dev $DEV 10.1.1.200/24 + ip link set dev $DEV up +} + function add_vxlan_tunnel { # Set static ARP entry here because iptables set-mark works # on L3 packet, as a result not applying to ARP packets, @@ -113,6 +159,41 @@ function test_gre { cleanup } +function test_ip6gre { + TYPE=ip6gre + DEV_NS=ip6gre00 + DEV=ip6gre11 + config_device + # reuse the ip6gretap function + add_ip6gretap_tunnel + attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel + # underlay + ping6 -c 4 ::11 + # overlay: ipv4 over ipv6 + ip netns exec at_ns0 ping -c 1 10.1.1.200 + ping -c 1 10.1.1.100 + # overlay: ipv6 over ipv6 + ip netns exec at_ns0 ping6 -c 1 fc80::200 + cleanup +} + +function test_ip6gretap { + TYPE=ip6gretap + DEV_NS=ip6gretap00 + DEV=ip6gretap11 + config_device + add_ip6gretap_tunnel + attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel + # underlay + ping6 -c 4 ::11 + # overlay: ipv4 over ipv6 + ip netns exec at_ns0 ping -i .2 -c 1 10.1.1.200 + ping -c 1 10.1.1.100 + # overlay: ipv6 over ipv6 + ip netns exec at_ns0 ping6 -c 1 fc80::200 + cleanup +} + function test_erspan { TYPE=erspan DEV_NS=erspan00 @@ -125,6 +206,18 @@ function test_erspan { cleanup } +function test_ip6erspan { + TYPE=ip6erspan + DEV_NS=ip6erspan00 + DEV=ip6erspan11 + config_device + add_ip6erspan_tunnel + attach_bpf $DEV ip4ip6erspan_set_tunnel ip4ip6erspan_get_tunnel + ping6 -c 3 ::11 + ip netns exec at_ns0 ping -c 1 10.1.1.200 + cleanup +} + function test_vxlan { TYPE=vxlan DEV_NS=vxlan00 @@ -175,9 +268,12 @@ function cleanup { ip link del veth1 ip link del ipip11 ip link del gretap11 + ip link del ip6gre11 + ip link del ip6gretap11 ip link del vxlan11 ip link del geneve11 ip link del erspan11 + ip link del ip6erspan11 pkill tcpdump pkill cat set -ex @@ -187,8 +283,14 @@ trap cleanup 0 2 3 6 9 cleanup echo "Testing GRE tunnel..." test_gre +echo "Testing IP6GRE tunnel..." +test_ip6gre +echo "Testing IP6GRETAP tunnel..." +test_ip6gretap echo "Testing ERSPAN tunnel..." test_erspan +echo "Testing IP6ERSPAN tunnel..." +test_ip6erspan echo "Testing VXLAN tunnel..." test_vxlan echo "Testing GENEVE tunnel..." diff --git a/security/selinux/xfrm.c b/security/selinux/xfrm.c index 56e354fcdfc6..928188902901 100644 --- a/security/selinux/xfrm.c +++ b/security/selinux/xfrm.c @@ -452,7 +452,7 @@ int selinux_xfrm_postroute_last(u32 sk_sid, struct sk_buff *skb, if (dst) { struct dst_entry *iter; - for (iter = dst; iter != NULL; iter = iter->child) { + for (iter = dst; iter != NULL; iter = xfrm_dst_child(iter)) { struct xfrm_state *x = iter->xfrm; if (x && selinux_authorizable_xfrm(x)) diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index 21a2d76b67dc..f309ab9afd9b 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -29,9 +29,10 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \ test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o \ - sockmap_verdict_prog.o dev_cgroup.o + sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o -TEST_PROGS := test_kmod.sh test_xdp_redirect.sh test_xdp_meta.sh +TEST_PROGS := test_kmod.sh test_xdp_redirect.sh test_xdp_meta.sh \ + test_offload.py include ../lib.mk diff --git a/tools/testing/selftests/bpf/sample_ret0.c b/tools/testing/selftests/bpf/sample_ret0.c new file mode 100644 index 000000000000..fec99750d6ea --- /dev/null +++ b/tools/testing/selftests/bpf/sample_ret0.c @@ -0,0 +1,7 @@ +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) */ + +/* Sample program which should always load for testing control paths. */ +int func() +{ + return 0; +} diff --git a/tools/testing/selftests/bpf/test_align.c b/tools/testing/selftests/bpf/test_align.c index 8591c89c0828..fe916d29e166 100644 --- a/tools/testing/selftests/bpf/test_align.c +++ b/tools/testing/selftests/bpf/test_align.c @@ -64,11 +64,11 @@ static struct bpf_align_test tests[] = { .matches = { {1, "R1=ctx(id=0,off=0,imm=0)"}, {1, "R10=fp0"}, - {1, "R3=inv2"}, - {2, "R3=inv4"}, - {3, "R3=inv8"}, - {4, "R3=inv16"}, - {5, "R3=inv32"}, + {1, "R3_w=inv2"}, + {2, "R3_w=inv4"}, + {3, "R3_w=inv8"}, + {4, "R3_w=inv16"}, + {5, "R3_w=inv32"}, }, }, { @@ -92,17 +92,17 @@ static struct bpf_align_test tests[] = { .matches = { {1, "R1=ctx(id=0,off=0,imm=0)"}, {1, "R10=fp0"}, - {1, "R3=inv1"}, - {2, "R3=inv2"}, - {3, "R3=inv4"}, - {4, "R3=inv8"}, - {5, "R3=inv16"}, - {6, "R3=inv1"}, - {7, "R4=inv32"}, - {8, "R4=inv16"}, - {9, "R4=inv8"}, - {10, "R4=inv4"}, - {11, "R4=inv2"}, + {1, "R3_w=inv1"}, + {2, "R3_w=inv2"}, + {3, "R3_w=inv4"}, + {4, "R3_w=inv8"}, + {5, "R3_w=inv16"}, + {6, "R3_w=inv1"}, + {7, "R4_w=inv32"}, + {8, "R4_w=inv16"}, + {9, "R4_w=inv8"}, + {10, "R4_w=inv4"}, + {11, "R4_w=inv2"}, }, }, { @@ -121,12 +121,12 @@ static struct bpf_align_test tests[] = { .matches = { {1, "R1=ctx(id=0,off=0,imm=0)"}, {1, "R10=fp0"}, - {1, "R3=inv4"}, - {2, "R3=inv8"}, - {3, "R3=inv10"}, - {4, "R4=inv8"}, - {5, "R4=inv12"}, - {6, "R4=inv14"}, + {1, "R3_w=inv4"}, + {2, "R3_w=inv8"}, + {3, "R3_w=inv10"}, + {4, "R4_w=inv8"}, + {5, "R4_w=inv12"}, + {6, "R4_w=inv14"}, }, }, { @@ -143,10 +143,10 @@ static struct bpf_align_test tests[] = { .matches = { {1, "R1=ctx(id=0,off=0,imm=0)"}, {1, "R10=fp0"}, - {1, "R3=inv7"}, - {2, "R3=inv7"}, - {3, "R3=inv14"}, - {4, "R3=inv56"}, + {1, "R3_w=inv7"}, + {2, "R3_w=inv7"}, + {3, "R3_w=inv14"}, + {4, "R3_w=inv56"}, }, }, @@ -185,18 +185,18 @@ static struct bpf_align_test tests[] = { .prog_type = BPF_PROG_TYPE_SCHED_CLS, .matches = { {7, "R0=pkt(id=0,off=8,r=8,imm=0)"}, - {7, "R3=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, - {8, "R3=inv(id=0,umax_value=510,var_off=(0x0; 0x1fe))"}, - {9, "R3=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, - {10, "R3=inv(id=0,umax_value=2040,var_off=(0x0; 0x7f8))"}, - {11, "R3=inv(id=0,umax_value=4080,var_off=(0x0; 0xff0))"}, + {7, "R3_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, + {8, "R3_w=inv(id=0,umax_value=510,var_off=(0x0; 0x1fe))"}, + {9, "R3_w=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, + {10, "R3_w=inv(id=0,umax_value=2040,var_off=(0x0; 0x7f8))"}, + {11, "R3_w=inv(id=0,umax_value=4080,var_off=(0x0; 0xff0))"}, {18, "R3=pkt_end(id=0,off=0,imm=0)"}, - {18, "R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, - {19, "R4=inv(id=0,umax_value=8160,var_off=(0x0; 0x1fe0))"}, - {20, "R4=inv(id=0,umax_value=4080,var_off=(0x0; 0xff0))"}, - {21, "R4=inv(id=0,umax_value=2040,var_off=(0x0; 0x7f8))"}, - {22, "R4=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, - {23, "R4=inv(id=0,umax_value=510,var_off=(0x0; 0x1fe))"}, + {18, "R4_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, + {19, "R4_w=inv(id=0,umax_value=8160,var_off=(0x0; 0x1fe0))"}, + {20, "R4_w=inv(id=0,umax_value=4080,var_off=(0x0; 0xff0))"}, + {21, "R4_w=inv(id=0,umax_value=2040,var_off=(0x0; 0x7f8))"}, + {22, "R4_w=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, + {23, "R4_w=inv(id=0,umax_value=510,var_off=(0x0; 0x1fe))"}, }, }, { @@ -217,16 +217,16 @@ static struct bpf_align_test tests[] = { }, .prog_type = BPF_PROG_TYPE_SCHED_CLS, .matches = { - {7, "R3=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, - {8, "R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, - {9, "R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, - {10, "R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, - {11, "R4=inv(id=0,umax_value=510,var_off=(0x0; 0x1fe))"}, - {12, "R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, - {13, "R4=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, - {14, "R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, - {15, "R4=inv(id=0,umax_value=2040,var_off=(0x0; 0x7f8))"}, - {16, "R4=inv(id=0,umax_value=4080,var_off=(0x0; 0xff0))"}, + {7, "R3_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, + {8, "R4_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, + {9, "R4_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, + {10, "R4_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, + {11, "R4_w=inv(id=0,umax_value=510,var_off=(0x0; 0x1fe))"}, + {12, "R4_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, + {13, "R4_w=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, + {14, "R4_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, + {15, "R4_w=inv(id=0,umax_value=2040,var_off=(0x0; 0x7f8))"}, + {16, "R4_w=inv(id=0,umax_value=4080,var_off=(0x0; 0xff0))"}, }, }, { @@ -257,14 +257,14 @@ static struct bpf_align_test tests[] = { }, .prog_type = BPF_PROG_TYPE_SCHED_CLS, .matches = { - {4, "R5=pkt(id=0,off=0,r=0,imm=0)"}, - {5, "R5=pkt(id=0,off=14,r=0,imm=0)"}, - {6, "R4=pkt(id=0,off=14,r=0,imm=0)"}, + {4, "R5_w=pkt(id=0,off=0,r=0,imm=0)"}, + {5, "R5_w=pkt(id=0,off=14,r=0,imm=0)"}, + {6, "R4_w=pkt(id=0,off=14,r=0,imm=0)"}, {10, "R2=pkt(id=0,off=0,r=18,imm=0)"}, {10, "R5=pkt(id=0,off=14,r=18,imm=0)"}, - {10, "R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, - {14, "R4=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff))"}, - {15, "R4=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff))"}, + {10, "R4_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"}, + {14, "R4_w=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff))"}, + {15, "R4_w=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff))"}, }, }, { @@ -320,11 +320,11 @@ static struct bpf_align_test tests[] = { * alignment of 4. */ {8, "R2=pkt(id=0,off=0,r=8,imm=0)"}, - {8, "R6=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, + {8, "R6_w=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, /* Offset is added to packet pointer R5, resulting in * known fixed offset, and variable offset from R6. */ - {11, "R5=pkt(id=1,off=14,r=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, + {11, "R5_w=pkt(id=1,off=14,r=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, /* At the time the word size load is performed from R5, * it's total offset is NET_IP_ALIGN + reg->off (0) + * reg->aux_off (14) which is 16. Then the variable @@ -336,11 +336,11 @@ static struct bpf_align_test tests[] = { /* Variable offset is added to R5 packet pointer, * resulting in auxiliary alignment of 4. */ - {18, "R5=pkt(id=2,off=0,r=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, + {18, "R5_w=pkt(id=2,off=0,r=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, /* Constant offset is added to R5, resulting in * reg->off of 14. */ - {19, "R5=pkt(id=2,off=14,r=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, + {19, "R5_w=pkt(id=2,off=14,r=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, /* At the time the word size load is performed from R5, * its total fixed offset is NET_IP_ALIGN + reg->off * (14) which is 16. Then the variable offset is 4-byte @@ -352,18 +352,18 @@ static struct bpf_align_test tests[] = { /* Constant offset is added to R5 packet pointer, * resulting in reg->off value of 14. */ - {26, "R5=pkt(id=0,off=14,r=8"}, + {26, "R5_w=pkt(id=0,off=14,r=8"}, /* Variable offset is added to R5, resulting in a * variable offset of (4n). */ - {27, "R5=pkt(id=3,off=14,r=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, + {27, "R5_w=pkt(id=3,off=14,r=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, /* Constant is added to R5 again, setting reg->off to 18. */ - {28, "R5=pkt(id=3,off=18,r=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, + {28, "R5_w=pkt(id=3,off=18,r=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, /* And once more we add a variable; resulting var_off * is still (4n), fixed offset is not changed. * Also, we create a new reg->id. */ - {29, "R5=pkt(id=4,off=18,r=0,umax_value=2040,var_off=(0x0; 0x7fc))"}, + {29, "R5_w=pkt(id=4,off=18,r=0,umax_value=2040,var_off=(0x0; 0x7fc))"}, /* At the time the word size load is performed from R5, * its total fixed offset is NET_IP_ALIGN + reg->off (18) * which is 20. Then the variable offset is (4n), so @@ -410,11 +410,11 @@ static struct bpf_align_test tests[] = { * alignment of 4. */ {8, "R2=pkt(id=0,off=0,r=8,imm=0)"}, - {8, "R6=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, + {8, "R6_w=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, /* Adding 14 makes R6 be (4n+2) */ - {9, "R6=inv(id=0,umin_value=14,umax_value=1034,var_off=(0x2; 0x7fc))"}, + {9, "R6_w=inv(id=0,umin_value=14,umax_value=1034,var_off=(0x2; 0x7fc))"}, /* Packet pointer has (4n+2) offset */ - {11, "R5=pkt(id=1,off=0,r=0,umin_value=14,umax_value=1034,var_off=(0x2; 0x7fc))"}, + {11, "R5_w=pkt(id=1,off=0,r=0,umin_value=14,umax_value=1034,var_off=(0x2; 0x7fc))"}, {13, "R4=pkt(id=1,off=4,r=0,umin_value=14,umax_value=1034,var_off=(0x2; 0x7fc))"}, /* At the time the word size load is performed from R5, * its total fixed offset is NET_IP_ALIGN + reg->off (0) @@ -426,11 +426,11 @@ static struct bpf_align_test tests[] = { /* Newly read value in R6 was shifted left by 2, so has * known alignment of 4. */ - {18, "R6=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, + {18, "R6_w=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, /* Added (4n) to packet pointer's (4n+2) var_off, giving * another (4n+2). */ - {19, "R5=pkt(id=2,off=0,r=0,umin_value=14,umax_value=2054,var_off=(0x2; 0xffc))"}, + {19, "R5_w=pkt(id=2,off=0,r=0,umin_value=14,umax_value=2054,var_off=(0x2; 0xffc))"}, {21, "R4=pkt(id=2,off=4,r=0,umin_value=14,umax_value=2054,var_off=(0x2; 0xffc))"}, /* At the time the word size load is performed from R5, * its total fixed offset is NET_IP_ALIGN + reg->off (0) @@ -473,11 +473,11 @@ static struct bpf_align_test tests[] = { .prog_type = BPF_PROG_TYPE_SCHED_CLS, .result = REJECT, .matches = { - {4, "R5=pkt(id=0,off=0,r=0,imm=0)"}, + {4, "R5_w=pkt(id=0,off=0,r=0,imm=0)"}, /* ptr & 0x40 == either 0 or 0x40 */ - {5, "R5=inv(id=0,umax_value=64,var_off=(0x0; 0x40))"}, + {5, "R5_w=inv(id=0,umax_value=64,var_off=(0x0; 0x40))"}, /* ptr << 2 == unknown, (4n) */ - {7, "R5=inv(id=0,smax_value=9223372036854775804,umax_value=18446744073709551612,var_off=(0x0; 0xfffffffffffffffc))"}, + {7, "R5_w=inv(id=0,smax_value=9223372036854775804,umax_value=18446744073709551612,var_off=(0x0; 0xfffffffffffffffc))"}, /* (4n) + 14 == (4n+2). We blow our bounds, because * the add could overflow. */ @@ -485,7 +485,7 @@ static struct bpf_align_test tests[] = { /* Checked s>=0 */ {10, "R5=inv(id=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"}, /* packet pointer + nonnegative (4n+2) */ - {12, "R6=pkt(id=1,off=0,r=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"}, + {12, "R6_w=pkt(id=1,off=0,r=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"}, {14, "R4=pkt(id=1,off=4,r=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"}, /* NET_IP_ALIGN + (4n+2) == (4n), alignment is fine. * We checked the bounds, but it might have been able @@ -530,11 +530,11 @@ static struct bpf_align_test tests[] = { * alignment of 4. */ {7, "R2=pkt(id=0,off=0,r=8,imm=0)"}, - {9, "R6=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, + {9, "R6_w=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, /* Adding 14 makes R6 be (4n+2) */ - {10, "R6=inv(id=0,umin_value=14,umax_value=1034,var_off=(0x2; 0x7fc))"}, + {10, "R6_w=inv(id=0,umin_value=14,umax_value=1034,var_off=(0x2; 0x7fc))"}, /* New unknown value in R7 is (4n) */ - {11, "R7=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, + {11, "R7_w=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, /* Subtracting it from R6 blows our unsigned bounds */ {12, "R6=inv(id=0,smin_value=-1006,smax_value=1034,var_off=(0x2; 0xfffffffffffffffc))"}, /* Checked s>= 0 */ @@ -583,15 +583,15 @@ static struct bpf_align_test tests[] = { * alignment of 4. */ {7, "R2=pkt(id=0,off=0,r=8,imm=0)"}, - {10, "R6=inv(id=0,umax_value=60,var_off=(0x0; 0x3c))"}, + {10, "R6_w=inv(id=0,umax_value=60,var_off=(0x0; 0x3c))"}, /* Adding 14 makes R6 be (4n+2) */ - {11, "R6=inv(id=0,umin_value=14,umax_value=74,var_off=(0x2; 0x7c))"}, + {11, "R6_w=inv(id=0,umin_value=14,umax_value=74,var_off=(0x2; 0x7c))"}, /* Subtracting from packet pointer overflows ubounds */ - {13, "R5=pkt(id=1,off=0,r=8,umin_value=18446744073709551542,umax_value=18446744073709551602,var_off=(0xffffffffffffff82; 0x7c))"}, + {13, "R5_w=pkt(id=1,off=0,r=8,umin_value=18446744073709551542,umax_value=18446744073709551602,var_off=(0xffffffffffffff82; 0x7c))"}, /* New unknown value in R7 is (4n), >= 76 */ - {15, "R7=inv(id=0,umin_value=76,umax_value=1096,var_off=(0x0; 0x7fc))"}, + {15, "R7_w=inv(id=0,umin_value=76,umax_value=1096,var_off=(0x0; 0x7fc))"}, /* Adding it to packet pointer gives nice bounds again */ - {16, "R5=pkt(id=2,off=0,r=0,umin_value=2,umax_value=1082,var_off=(0x2; 0x7fc))"}, + {16, "R5_w=pkt(id=2,off=0,r=0,umin_value=2,umax_value=1082,var_off=(0x2; 0x7fc))"}, /* At the time the word size load is performed from R5, * its total fixed offset is NET_IP_ALIGN + reg->off (0) * which is 2. Then the variable offset is (4n+2), so diff --git a/tools/testing/selftests/bpf/test_offload.py b/tools/testing/selftests/bpf/test_offload.py new file mode 100755 index 000000000000..3914f7a4585a --- /dev/null +++ b/tools/testing/selftests/bpf/test_offload.py @@ -0,0 +1,681 @@ +#!/usr/bin/python3 + +# Copyright (C) 2017 Netronome Systems, Inc. +# +# This software is licensed under the GNU General License Version 2, +# June 1991 as shown in the file COPYING in the top-level directory of this +# source tree. +# +# THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" +# WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, +# BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +# FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE +# OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME +# THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + +from datetime import datetime +import argparse +import json +import os +import pprint +import subprocess +import time + +logfile = None +log_level = 1 +bpf_test_dir = os.path.dirname(os.path.realpath(__file__)) +pp = pprint.PrettyPrinter() +devs = [] # devices we created for clean up +files = [] # files to be removed + +def log_get_sec(level=0): + return "*" * (log_level + level) + +def log_level_inc(add=1): + global log_level + log_level += add + +def log_level_dec(sub=1): + global log_level + log_level -= sub + +def log_level_set(level): + global log_level + log_level = level + +def log(header, data, level=None): + """ + Output to an optional log. + """ + if logfile is None: + return + if level is not None: + log_level_set(level) + + if not isinstance(data, str): + data = pp.pformat(data) + + if len(header): + logfile.write("\n" + log_get_sec() + " ") + logfile.write(header) + if len(header) and len(data.strip()): + logfile.write("\n") + logfile.write(data) + +def skip(cond, msg): + if not cond: + return + print("SKIP: " + msg) + log("SKIP: " + msg, "", level=1) + os.sys.exit(0) + +def fail(cond, msg): + if not cond: + return + print("FAIL: " + msg) + log("FAIL: " + msg, "", level=1) + os.sys.exit(1) + +def start_test(msg): + log(msg, "", level=1) + log_level_inc() + print(msg) + +def cmd(cmd, shell=True, include_stderr=False, background=False, fail=True): + """ + Run a command in subprocess and return tuple of (retval, stdout); + optionally return stderr as well as third value. + """ + proc = subprocess.Popen(cmd, shell=shell, stdout=subprocess.PIPE, + stderr=subprocess.PIPE) + if background: + msg = "%s START: %s" % (log_get_sec(1), + datetime.now().strftime("%H:%M:%S.%f")) + log("BKG " + proc.args, msg) + return proc + + return cmd_result(proc, include_stderr=include_stderr, fail=fail) + +def cmd_result(proc, include_stderr=False, fail=False): + stdout, stderr = proc.communicate() + stdout = stdout.decode("utf-8") + stderr = stderr.decode("utf-8") + proc.stdout.close() + proc.stderr.close() + + stderr = "\n" + stderr + if stderr[-1] == "\n": + stderr = stderr[:-1] + + sec = log_get_sec(1) + log("CMD " + proc.args, + "RETCODE: %d\n%s STDOUT:\n%s%s STDERR:%s\n%s END: %s" % + (proc.returncode, sec, stdout, sec, stderr, + sec, datetime.now().strftime("%H:%M:%S.%f"))) + + if proc.returncode != 0 and fail: + if len(stderr) > 0 and stderr[-1] == "\n": + stderr = stderr[:-1] + raise Exception("Command failed: %s\n%s" % (proc.args, stderr)) + + if include_stderr: + return proc.returncode, stdout, stderr + else: + return proc.returncode, stdout + +def rm(f): + cmd("rm -f %s" % (f)) + if f in files: + files.remove(f) + +def tool(name, args, flags, JSON=True, fail=True): + params = "" + if JSON: + params += "%s " % (flags["json"]) + + ret, out = cmd(name + " " + params + args, fail=fail) + if JSON and len(out.strip()) != 0: + return ret, json.loads(out) + else: + return ret, out + +def bpftool(args, JSON=True, fail=True): + return tool("bpftool", args, {"json":"-p"}, JSON=JSON, fail=fail) + +def bpftool_prog_list(expected=None): + _, progs = bpftool("prog show", JSON=True, fail=True) + if expected is not None: + if len(progs) != expected: + fail(True, "%d BPF programs loaded, expected %d" % + (len(progs), expected)) + return progs + +def bpftool_prog_list_wait(expected=0, n_retry=20): + for i in range(n_retry): + nprogs = len(bpftool_prog_list()) + if nprogs == expected: + return + time.sleep(0.05) + raise Exception("Time out waiting for program counts to stabilize want %d, have %d" % (expected, nprogs)) + +def ip(args, force=False, JSON=True, fail=True): + if force: + args = "-force " + args + return tool("ip", args, {"json":"-j"}, JSON=JSON, fail=fail) + +def tc(args, JSON=True, fail=True): + return tool("tc", args, {"json":"-p"}, JSON=JSON, fail=fail) + +def ethtool(dev, opt, args, fail=True): + return cmd("ethtool %s %s %s" % (opt, dev["ifname"], args), fail=fail) + +def bpf_obj(name, sec=".text", path=bpf_test_dir,): + return "obj %s sec %s" % (os.path.join(path, name), sec) + +def bpf_pinned(name): + return "pinned %s" % (name) + +def bpf_bytecode(bytecode): + return "bytecode \"%s\"" % (bytecode) + +class DebugfsDir: + """ + Class for accessing DebugFS directories as a dictionary. + """ + + def __init__(self, path): + self.path = path + self._dict = self._debugfs_dir_read(path) + + def __len__(self): + return len(self._dict.keys()) + + def __getitem__(self, key): + if type(key) is int: + key = list(self._dict.keys())[key] + return self._dict[key] + + def __setitem__(self, key, value): + log("DebugFS set %s = %s" % (key, value), "") + log_level_inc() + + cmd("echo '%s' > %s/%s" % (value, self.path, key)) + log_level_dec() + + _, out = cmd('cat %s/%s' % (self.path, key)) + self._dict[key] = out.strip() + + def _debugfs_dir_read(self, path): + dfs = {} + + log("DebugFS state for %s" % (path), "") + log_level_inc(add=2) + + _, out = cmd('ls ' + path) + for f in out.split(): + p = os.path.join(path, f) + if os.path.isfile(p): + _, out = cmd('cat %s/%s' % (path, f)) + dfs[f] = out.strip() + elif os.path.isdir(p): + dfs[f] = DebugfsDir(p) + else: + raise Exception("%s is neither file nor directory" % (p)) + + log_level_dec() + log("DebugFS state", dfs) + log_level_dec() + + return dfs + +class NetdevSim: + """ + Class for netdevsim netdevice and its attributes. + """ + + def __init__(self): + self.dev = self._netdevsim_create() + devs.append(self) + + self.dfs_dir = '/sys/kernel/debug/netdevsim/%s' % (self.dev['ifname']) + self.dfs_refresh() + + def __getitem__(self, key): + return self.dev[key] + + def _netdevsim_create(self): + _, old = ip("link show") + ip("link add sim%d type netdevsim") + _, new = ip("link show") + + for dev in new: + f = filter(lambda x: x["ifname"] == dev["ifname"], old) + if len(list(f)) == 0: + return dev + + raise Exception("failed to create netdevsim device") + + def remove(self): + devs.remove(self) + ip("link del dev %s" % (self.dev["ifname"])) + + def dfs_refresh(self): + self.dfs = DebugfsDir(self.dfs_dir) + return self.dfs + + def dfs_num_bound_progs(self): + path = os.path.join(self.dfs_dir, "bpf_bound_progs") + _, progs = cmd('ls %s' % (path)) + return len(progs.split()) + + def dfs_get_bound_progs(self, expected): + progs = DebugfsDir(os.path.join(self.dfs_dir, "bpf_bound_progs")) + if expected is not None: + if len(progs) != expected: + fail(True, "%d BPF programs bound, expected %d" % + (len(progs), expected)) + return progs + + def wait_for_flush(self, bound=0, total=0, n_retry=20): + for i in range(n_retry): + nbound = self.dfs_num_bound_progs() + nprogs = len(bpftool_prog_list()) + if nbound == bound and nprogs == total: + return + time.sleep(0.05) + raise Exception("Time out waiting for program counts to stabilize want %d/%d, have %d bound, %d loaded" % (bound, total, nbound, nprogs)) + + def set_mtu(self, mtu, fail=True): + return ip("link set dev %s mtu %d" % (self.dev["ifname"], mtu), + fail=fail) + + def set_xdp(self, bpf, mode, force=False, fail=True): + return ip("link set dev %s xdp%s %s" % (self.dev["ifname"], mode, bpf), + force=force, fail=fail) + + def unset_xdp(self, mode, force=False, fail=True): + return ip("link set dev %s xdp%s off" % (self.dev["ifname"], mode), + force=force, fail=fail) + + def ip_link_show(self, xdp): + _, link = ip("link show dev %s" % (self['ifname'])) + if len(link) > 1: + raise Exception("Multiple objects on ip link show") + if len(link) < 1: + return {} + fail(xdp != "xdp" in link, + "XDP program not reporting in iplink (reported %s, expected %s)" % + ("xdp" in link, xdp)) + return link[0] + + def tc_add_ingress(self): + tc("qdisc add dev %s ingress" % (self['ifname'])) + + def tc_del_ingress(self): + tc("qdisc del dev %s ingress" % (self['ifname'])) + + def tc_flush_filters(self, bound=0, total=0): + self.tc_del_ingress() + self.tc_add_ingress() + self.wait_for_flush(bound=bound, total=total) + + def tc_show_ingress(self, expected=None): + # No JSON support, oh well... + flags = ["skip_sw", "skip_hw", "in_hw"] + named = ["protocol", "pref", "chain", "handle", "id", "tag"] + + args = "-s filter show dev %s ingress" % (self['ifname']) + _, out = tc(args, JSON=False) + + filters = [] + lines = out.split('\n') + for line in lines: + words = line.split() + if "handle" not in words: + continue + fltr = {} + for flag in flags: + fltr[flag] = flag in words + for name in named: + try: + idx = words.index(name) + fltr[name] = words[idx + 1] + except ValueError: + pass + filters.append(fltr) + + if expected is not None: + fail(len(filters) != expected, + "%d ingress filters loaded, expected %d" % + (len(filters), expected)) + return filters + + def cls_bpf_add_filter(self, bpf, da=False, skip_sw=False, skip_hw=False, + fail=True): + params = "" + if da: + params += " da" + if skip_sw: + params += " skip_sw" + if skip_hw: + params += " skip_hw" + return tc("filter add dev %s ingress bpf %s %s" % + (self['ifname'], bpf, params), fail=fail) + + def set_ethtool_tc_offloads(self, enable, fail=True): + args = "hw-tc-offload %s" % ("on" if enable else "off") + return ethtool(self, "-K", args, fail=fail) + +################################################################################ +def clean_up(): + for dev in devs: + dev.remove() + for f in files: + cmd("rm -f %s" % (f)) + +def pin_prog(file_name, idx=0): + progs = bpftool_prog_list(expected=(idx + 1)) + prog = progs[idx] + bpftool("prog pin id %d %s" % (prog["id"], file_name)) + files.append(file_name) + + return file_name, bpf_pinned(file_name) + +# Parse command line +parser = argparse.ArgumentParser() +parser.add_argument("--log", help="output verbose log to given file") +args = parser.parse_args() +if args.log: + logfile = open(args.log, 'w+') + logfile.write("# -*-Org-*-") + +log("Prepare...", "", level=1) +log_level_inc() + +# Check permissions +skip(os.getuid() != 0, "test must be run as root") + +# Check tools +ret, progs = bpftool("prog", fail=False) +skip(ret != 0, "bpftool not installed") +# Check no BPF programs are loaded +skip(len(progs) != 0, "BPF programs already loaded on the system") + +# Check netdevsim +ret, out = cmd("modprobe netdevsim", fail=False) +skip(ret != 0, "netdevsim module could not be loaded") + +# Check debugfs +_, out = cmd("mount") +if out.find("/sys/kernel/debug type debugfs") == -1: + cmd("mount -t debugfs none /sys/kernel/debug") + +# Check samples are compiled +samples = ["sample_ret0.o"] +for s in samples: + ret, out = cmd("ls %s/%s" % (bpf_test_dir, s), fail=False) + skip(ret != 0, "sample %s/%s not found, please compile it" % + (bpf_test_dir, s)) + +try: + obj = bpf_obj("sample_ret0.o") + bytecode = bpf_bytecode("1,6 0 0 4294967295,") + + start_test("Test destruction of generic XDP...") + sim = NetdevSim() + sim.set_xdp(obj, "generic") + sim.remove() + bpftool_prog_list_wait(expected=0) + + sim = NetdevSim() + sim.tc_add_ingress() + + start_test("Test TC non-offloaded...") + ret, _ = sim.cls_bpf_add_filter(obj, skip_hw=True, fail=False) + fail(ret != 0, "Software TC filter did not load") + + start_test("Test TC non-offloaded isn't getting bound...") + ret, _ = sim.cls_bpf_add_filter(obj, fail=False) + fail(ret != 0, "Software TC filter did not load") + sim.dfs_get_bound_progs(expected=0) + + sim.tc_flush_filters() + + start_test("Test TC offloads are off by default...") + ret, _ = sim.cls_bpf_add_filter(obj, skip_sw=True, fail=False) + fail(ret == 0, "TC filter loaded without enabling TC offloads") + sim.wait_for_flush() + + sim.set_ethtool_tc_offloads(True) + sim.dfs["bpf_tc_non_bound_accept"] = "Y" + + start_test("Test TC offload by default...") + ret, _ = sim.cls_bpf_add_filter(obj, fail=False) + fail(ret != 0, "Software TC filter did not load") + sim.dfs_get_bound_progs(expected=0) + ingress = sim.tc_show_ingress(expected=1) + fltr = ingress[0] + fail(not fltr["in_hw"], "Filter not offloaded by default") + + sim.tc_flush_filters() + + start_test("Test TC cBPF bytcode tries offload by default...") + ret, _ = sim.cls_bpf_add_filter(bytecode, fail=False) + fail(ret != 0, "Software TC filter did not load") + sim.dfs_get_bound_progs(expected=0) + ingress = sim.tc_show_ingress(expected=1) + fltr = ingress[0] + fail(not fltr["in_hw"], "Bytecode not offloaded by default") + + sim.tc_flush_filters() + sim.dfs["bpf_tc_non_bound_accept"] = "N" + + start_test("Test TC cBPF unbound bytecode doesn't offload...") + ret, _ = sim.cls_bpf_add_filter(bytecode, skip_sw=True, fail=False) + fail(ret == 0, "TC bytecode loaded for offload") + sim.wait_for_flush() + + start_test("Test TC offloads work...") + ret, _ = sim.cls_bpf_add_filter(obj, skip_sw=True, fail=False) + fail(ret != 0, "TC filter did not load with TC offloads enabled") + + start_test("Test TC offload basics...") + dfs = sim.dfs_get_bound_progs(expected=1) + progs = bpftool_prog_list(expected=1) + ingress = sim.tc_show_ingress(expected=1) + + dprog = dfs[0] + prog = progs[0] + fltr = ingress[0] + fail(fltr["skip_hw"], "TC does reports 'skip_hw' on offloaded filter") + fail(not fltr["in_hw"], "TC does not report 'in_hw' for offloaded filter") + fail(not fltr["skip_sw"], "TC does not report 'skip_sw' back") + + start_test("Test TC offload is device-bound...") + fail(str(prog["id"]) != fltr["id"], "Program IDs don't match") + fail(prog["tag"] != fltr["tag"], "Program tags don't match") + fail(fltr["id"] != dprog["id"], "Program IDs don't match") + fail(dprog["state"] != "xlated", "Offloaded program state not translated") + fail(dprog["loaded"] != "Y", "Offloaded program is not loaded") + + start_test("Test disabling TC offloads is rejected while filters installed...") + ret, _ = sim.set_ethtool_tc_offloads(False, fail=False) + fail(ret == 0, "Driver should refuse to disable TC offloads with filters installed...") + + start_test("Test qdisc removal frees things...") + sim.tc_flush_filters() + sim.tc_show_ingress(expected=0) + + start_test("Test disabling TC offloads is OK without filters...") + ret, _ = sim.set_ethtool_tc_offloads(False, fail=False) + fail(ret != 0, + "Driver refused to disable TC offloads without filters installed...") + + sim.set_ethtool_tc_offloads(True) + + start_test("Test destroying device gets rid of TC filters...") + sim.cls_bpf_add_filter(obj, skip_sw=True) + sim.remove() + bpftool_prog_list_wait(expected=0) + + sim = NetdevSim() + sim.set_ethtool_tc_offloads(True) + + start_test("Test destroying device gets rid of XDP...") + sim.set_xdp(obj, "offload") + sim.remove() + bpftool_prog_list_wait(expected=0) + + sim = NetdevSim() + sim.set_ethtool_tc_offloads(True) + + start_test("Test XDP prog reporting...") + sim.set_xdp(obj, "drv") + ipl = sim.ip_link_show(xdp=True) + progs = bpftool_prog_list(expected=1) + fail(ipl["xdp"]["prog"]["id"] != progs[0]["id"], + "Loaded program has wrong ID") + + start_test("Test XDP prog replace without force...") + ret, _ = sim.set_xdp(obj, "drv", fail=False) + fail(ret == 0, "Replaced XDP program without -force") + sim.wait_for_flush(total=1) + + start_test("Test XDP prog replace with force...") + ret, _ = sim.set_xdp(obj, "drv", force=True, fail=False) + fail(ret != 0, "Could not replace XDP program with -force") + bpftool_prog_list_wait(expected=1) + ipl = sim.ip_link_show(xdp=True) + progs = bpftool_prog_list(expected=1) + fail(ipl["xdp"]["prog"]["id"] != progs[0]["id"], + "Loaded program has wrong ID") + + start_test("Test XDP prog replace with bad flags...") + ret, _ = sim.set_xdp(obj, "offload", force=True, fail=False) + fail(ret == 0, "Replaced XDP program with a program in different mode") + ret, _ = sim.set_xdp(obj, "", force=True, fail=False) + fail(ret == 0, "Replaced XDP program with a program in different mode") + + start_test("Test XDP prog remove with bad flags...") + ret, _ = sim.unset_xdp("offload", force=True, fail=False) + fail(ret == 0, "Removed program with a bad mode mode") + ret, _ = sim.unset_xdp("", force=True, fail=False) + fail(ret == 0, "Removed program with a bad mode mode") + + start_test("Test MTU restrictions...") + ret, _ = sim.set_mtu(9000, fail=False) + fail(ret == 0, + "Driver should refuse to increase MTU to 9000 with XDP loaded...") + sim.unset_xdp("drv") + bpftool_prog_list_wait(expected=0) + sim.set_mtu(9000) + ret, _ = sim.set_xdp(obj, "drv", fail=False) + fail(ret == 0, "Driver should refuse to load program with MTU of 9000...") + sim.set_mtu(1500) + + sim.wait_for_flush() + start_test("Test XDP offload...") + sim.set_xdp(obj, "offload") + ipl = sim.ip_link_show(xdp=True) + link_xdp = ipl["xdp"]["prog"] + progs = bpftool_prog_list(expected=1) + prog = progs[0] + fail(link_xdp["id"] != prog["id"], "Loaded program has wrong ID") + + start_test("Test XDP offload is device bound...") + dfs = sim.dfs_get_bound_progs(expected=1) + dprog = dfs[0] + + fail(prog["id"] != link_xdp["id"], "Program IDs don't match") + fail(prog["tag"] != link_xdp["tag"], "Program tags don't match") + fail(str(link_xdp["id"]) != dprog["id"], "Program IDs don't match") + fail(dprog["state"] != "xlated", "Offloaded program state not translated") + fail(dprog["loaded"] != "Y", "Offloaded program is not loaded") + + start_test("Test removing XDP program many times...") + sim.unset_xdp("offload") + sim.unset_xdp("offload") + sim.unset_xdp("drv") + sim.unset_xdp("drv") + sim.unset_xdp("") + sim.unset_xdp("") + bpftool_prog_list_wait(expected=0) + + start_test("Test attempt to use a program for a wrong device...") + sim2 = NetdevSim() + sim2.set_xdp(obj, "offload") + pin_file, pinned = pin_prog("/sys/fs/bpf/tmp") + + ret, _ = sim.set_xdp(pinned, "offload", fail=False) + fail(ret == 0, "Pinned program loaded for a different device accepted") + sim2.remove() + ret, _ = sim.set_xdp(pinned, "offload", fail=False) + fail(ret == 0, "Pinned program loaded for a removed device accepted") + rm(pin_file) + bpftool_prog_list_wait(expected=0) + + start_test("Test mixing of TC and XDP...") + sim.tc_add_ingress() + sim.set_xdp(obj, "offload") + ret, _ = sim.cls_bpf_add_filter(obj, skip_sw=True, fail=False) + fail(ret == 0, "Loading TC when XDP active should fail") + sim.unset_xdp("offload") + sim.wait_for_flush() + + sim.cls_bpf_add_filter(obj, skip_sw=True) + ret, _ = sim.set_xdp(obj, "offload", fail=False) + fail(ret == 0, "Loading XDP when TC active should fail") + + start_test("Test binding TC from pinned...") + pin_file, pinned = pin_prog("/sys/fs/bpf/tmp") + sim.tc_flush_filters(bound=1, total=1) + sim.cls_bpf_add_filter(pinned, da=True, skip_sw=True) + sim.tc_flush_filters(bound=1, total=1) + + start_test("Test binding XDP from pinned...") + sim.set_xdp(obj, "offload") + pin_file, pinned = pin_prog("/sys/fs/bpf/tmp2", idx=1) + + sim.set_xdp(pinned, "offload", force=True) + sim.unset_xdp("offload") + sim.set_xdp(pinned, "offload", force=True) + sim.unset_xdp("offload") + + start_test("Test offload of wrong type fails...") + ret, _ = sim.cls_bpf_add_filter(pinned, da=True, skip_sw=True, fail=False) + fail(ret == 0, "Managed to attach XDP program to TC") + + start_test("Test asking for TC offload of two filters...") + sim.cls_bpf_add_filter(obj, da=True, skip_sw=True) + sim.cls_bpf_add_filter(obj, da=True, skip_sw=True) + # The above will trigger a splat until TC cls_bpf drivers are fixed + + sim.tc_flush_filters(bound=2, total=2) + + start_test("Test if netdev removal waits for translation...") + delay_msec = 500 + sim.dfs["bpf_bind_verifier_delay"] = delay_msec + start = time.time() + cmd_line = "tc filter add dev %s ingress bpf %s da skip_sw" % \ + (sim['ifname'], obj) + tc_proc = cmd(cmd_line, background=True, fail=False) + # Wait for the verifier to start + while sim.dfs_num_bound_progs() <= 2: + pass + sim.remove() + end = time.time() + ret, _ = cmd_result(tc_proc, fail=False) + time_diff = end - start + log("Time", "start:\t%s\nend:\t%s\ndiff:\t%s" % (start, end, time_diff)) + + fail(ret == 0, "Managed to load TC filter on a unregistering device") + delay_sec = delay_msec * 0.001 + fail(time_diff < delay_sec, "Removal process took %s, expected %s" % + (time_diff, delay_sec)) + + print("%s: OK" % (os.path.basename(__file__))) + +finally: + log("Clean up...", "", level=1) + log_level_inc() + clean_up() |