summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-10-15dt-bindings: net: dsa: qca8k: Document qca,led-open-drain bindingAnsuel Smith1-0/+11
Document new binding qca,ignore-power-on-sel used to ignore power on strapping and use sw regs instead. Document qca,led-open.drain to set led to open drain mode, the qca,ignore-power-on-sel is mandatory with this enabled or an error will be reported. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15net: dsa: qca8k: add explicit SGMII PLL enableAnsuel Smith2-2/+18
Support enabling PLL on the SGMII CPU port. Some device require this special configuration or no traffic is transmitted and the switch doesn't work at all. A dedicated binding is added to the CPU node port to apply the correct reg on mac config. Fail to correctly configure sgmii with qca8327 switch and warn if pll is used on qca8337 with a revision greater than 1. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15dt-bindings: net: dsa: qca8k: Document qca,sgmii-enable-pllAnsuel Smith1-0/+10
Document qca,sgmii-enable-pll binding used in the CPU nodes to enable SGMII PLL on MAC config. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15net: dsa: qca8k: rework rgmii delay logic and scan for cpu port 6Ansuel Smith2-86/+89
Future proof commit. This switch have 2 CPU ports and one valid configuration is first CPU port set to sgmii and second CPU port set to rgmii-id. The current implementation detects delay only for CPU port zero set to rgmii and doesn't count any delay set in a secondary CPU port. Drop the current delay scan function and move it to the sgmii parser function to generalize and implicitly add support for secondary CPU port set to rgmii-id. Introduce new logic where delay is enabled also with internal delay binding declared and rgmii set as PHY mode. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15net: dsa: qca8k: add support for cpu port 6Ansuel Smith2-17/+36
Currently CPU port is always hardcoded to port 0. This switch have 2 CPU ports. The original intention of this driver seems to be use the mac06_exchange bit to swap MAC0 with MAC6 in the strange configuration where device have connected only the CPU port 6. To skip the introduction of a new binding, rework the driver to address the secondary CPU port as primary and drop any reference of hardcoded port. With configuration of mac06 exchange, just skip the definition of port0 and define the CPU port as a secondary. The driver will autoconfigure the switch to use that as the primary CPU port. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15dt-bindings: net: dsa: qca8k: Document support for CPU port 6Ansuel Smith1-1/+5
The switch now support CPU port to be set 6 instead of be hardcoded to 0. Document support for it and describe logic selection. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15net: dsa: qca8k: add support for sgmii falling edgeAnsuel Smith2-0/+67
Add support for this in the qca8k driver. Also add support for SGMII rx/tx clock falling edge. This is only present for pad0, pad5 and pad6 have these bit reserved from Documentation. Add a comment that this is hardcoded to PAD0 as qca8327/28/34/37 have an unique sgmii line and setting falling in port0 applies to both configuration with sgmii used for port0 or port6. Co-developed-by: Matthew Hagan <mnhagan88@gmail.com> Signed-off-by: Matthew Hagan <mnhagan88@gmail.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15dt-bindings: net: dsa: qca8k: Add SGMII clock phase propertiesAnsuel Smith1-0/+4
Add names and descriptions of additional PORT0_PAD_CTRL properties. qca,sgmii-(rx|tx)clk-falling-edge are for setting the respective clock phase to failling edge. Co-developed-by: Matthew Hagan <mnhagan88@gmail.com> Signed-off-by: Matthew Hagan <mnhagan88@gmail.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15dsa: qca8k: add mac_power_sel supportAnsuel Smith2-0/+36
Add missing mac power sel support needed for ipq8064/5 SoC that require 1.8v for the internal regulator port instead of the default 1.5v. If other device needs this, consider adding a dedicated binding to support this. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15xen-netback: Remove redundant initialization of variable errColin Ian King1-1/+1
The variable err is being initialized with a value that is never read, it is being updated immediately afterwards. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15page_pool: disable dma mapping support for 32-bit arch with 64-bit DMAYunsheng Lin3-27/+8
As the 32-bit arch with 64-bit DMA seems to rare those days, and page pool might carry a lot of code and complexity for systems that possibly. So disable dma mapping support for such systems, if drivers really want to work on such systems, they have to implement their own DMA-mapping fallback tracking outside page_pool. Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15Merge branch 'octeontx2-af-miscellaneous-changes-for-cpt'Jakub Kicinski7-15/+517
Srujana Challa says: ==================== octeontx2-af: Miscellaneous changes for CPT This patchset consists of miscellaneous changes for CPT. First patch enables the CPT HW interrupts, second patch adds support for CPT LF teardown in non FLR path and final patch does CPT CTX flush in FLR handler. v2: - Fixed a warning reported by kernel test robot. ==================== Link: https://lore.kernel.org/r/20211013055621.1812301-1-schalla@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15octeontx2-af: Add support to flush full CPT CTX cacheSrujana Challa5-0/+232
Adds support to flush or invalidate CPT CTX entries as part of FLR and also provides a mailbox to flush CPT CTX entries in case of graceful exit. This patch also adds support for AF -> CPT PF uplink mailbox messages and adds a new mbox message to submit a CPT instruction from AF. Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15octeontx2-af: Perform cpt lf teardown in non FLR pathNithin Dabilpuram3-16/+22
Perform CPT LF teardown in non FLR path as well via cpt_lf_free() Currently CPT LF teardown and reset sequence is only done when FLR is handled with CPT LF still attached. This patch also fixes cpt_lf_alloc() to set EXEC_LDWB in CPT_AF_LFX_CTL2 when being completely overwritten as that is the default value and is better for performance. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15octeontx2-af: Enable CPT HW interruptsSrujana Challa4-0/+264
This patch enables and registers interrupt handler for CPT HW interrupts. Signed-off-by: Srujana Challa <schalla@marvell.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15net: tulip: winbond-840: fix build for UMLRandy Dunlap1-1/+1
On i386, when builtin (not a loadable module), the winbond-840 driver inspects boot_cpu_data to see what CPU family it is running on, and then acts on that data. The "family" struct member (x86) does not exist when running on UML, so prevent that test and do the default action. Prevents this build error on UML + i386: ../drivers/net/ethernet/dec/tulip/winbond-840.c: In function ‘init_registers’: ../drivers/net/ethernet/dec/tulip/winbond-840.c:882:19: error: ‘struct cpuinfo_um’ has no member named ‘x86’ if (boot_cpu_data.x86 <= 4) { Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: linux-um@lists.infradead.org Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20211014050606.7288-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15net: intel: igc_ptp: fix build for UMLRandy Dunlap1-1/+1
On a UML build, the igc_ptp driver uses CONFIG_X86_TSC for timestamp conversion. The function that is used is not available on UML builds, so have the function use the default system_counterval_t timestamp instead for UML builds. Prevents this build error on UML: ../drivers/net/ethernet/intel/igc/igc_ptp.c: In function ‘igc_device_tstamp_to_system’: ../drivers/net/ethernet/intel/igc/igc_ptp.c:777:9: error: implicit declaration of function ‘convert_art_ns_to_tsc’ [-Werror=implicit-function-declaration] return convert_art_ns_to_tsc(tstamp); ../drivers/net/ethernet/intel/igc/igc_ptp.c:777:9: error: incompatible types when returning type ‘int’ but ‘struct system_counterval_t’ was expected return convert_art_ns_to_tsc(tstamp); Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: linux-um@lists.infradead.org Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: intel-wired-lan@lists.osuosl.org Link: https://lore.kernel.org/r/20211014050516.6846-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15net: fealnx: fix build for UMLRandy Dunlap1-1/+1
On i386, when builtin (not a loadable module), the fealnx driver inspects boot_cpu_data to see what CPU family it is running on, and then acts on that data. The "family" struct member (x86) does not exist when running on UML, so prevent that test and do the default action. Prevents this build error on UML + i386: ../drivers/net/ethernet/fealnx.c: In function ‘netdev_open’: ../drivers/net/ethernet/fealnx.c:861:19: error: ‘struct cpuinfo_um’ has no member named ‘x86’ Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: linux-um@lists.infradead.org Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20211014050500.5620-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15hv_netvsc: Add comment of netvsc_xdp_xmit()Jiasheng Jiang1-0/+1
Adding comment to avoid the misusing of netvsc_xdp_xmit(). Otherwise the value of skb->queue_mapping could be 0 and then the return value of skb_get_rx_queue() could be MAX_U16 cause by overflow. Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://lore.kernel.org/r/1634174786-1810351-1-git-send-email-jiasheng@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15Merge branch 'minor-managed-neighbor-follow-ups'Jakub Kicinski1-8/+12
Daniel Borkmann says: ==================== Minor managed neighbor follow-ups Minor follow-up series to address prior feedback from David and Jakub. Patch 1 adds a build time assertion to prevent overflows when shifting in extended flags, patch 2 is a cleanup to use NLA_POLICY_MASK instead of open-coding invalid flags rejection and patch 3 rejects creating new neighbors with NUD_PERMANENT & NTF_MANAGED. For details, see individual patches. ==================== Link: https://lore.kernel.org/r/20211013132140.11143-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15net, neigh: Reject creating NUD_PERMANENT with NTF_MANAGED entriesDaniel Borkmann1-3/+8
The combination of NUD_PERMANENT + NTF_MANAGED is not supported and does not make sense either given the former indicates a static/fixed neighbor entry whereas the latter a dynamically resolved one. While it is possible to transition from one over to the other, we should however reject such creation attempts. Fixes: 7482e3841d52 ("net, neigh: Add NTF_MANAGED flag for managed neighbor entries") Suggested-by: David Ahern <dsahern@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15net, neigh: Use NLA_POLICY_MASK helper for NDA_FLAGS_EXT attributeDaniel Borkmann1-5/+1
Instead of open-coding a check for invalid bits in NTF_EXT_MASK, we can just use the NLA_POLICY_MASK() helper instead, and simplify NDA_FLAGS_EXT sanity check this way. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15net, neigh: Add build-time assertion to avoid neigh->flags overflowDaniel Borkmann1-0/+3
Currently, NDA_FLAGS_EXT flags allow a maximum of 24 bits to be used for extended neighbor flags. These are eventually fed into neigh->flags by shifting with NTF_EXT_SHIFT as per commit 2c611ad97a82 ("net, neigh: Extend neigh->flags to 32 bit to allow for extensions"). If really ever needed in future, the full 32 bits from NDA_FLAGS_EXT can be used, it would only require to move neigh->flags from u32 to u64 inside the kernel. Add a build-time assertion such that when extending the NTF_EXT_MASK with new bits, we'll trigger an error once we surpass the 24th bit. This assumes that no bit holes in new NTF_EXT_* flags will slip in from UAPI, but I think this is reasonable to assume. Suggested-by: David Ahern <dsahern@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15net: mvneta: Delete unused variableYuval Shaia1-6/+5
The variable pp is not in use - delete it. Signed-off-by: Yuval Shaia <yshaia@marvell.com> Link: https://lore.kernel.org/r/20211013064921.26346-1-yshaia@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15net: phy: dp83867: introduce critical chip default init for non-of platformLay, Kuan Loon1-0/+19
PHY driver dp83867 has rich supports for OF-platform to fine-tune the PHY chip during phy configuration. However, for non-OF platform, certain PHY tunable parameters such as IO impedance and RX & TX internal delays are critical and should be initialized to its default during PHY driver probe. Tested-by: Clement <clement@intel.com> Signed-off-by: Lay, Kuan Loon <kuan.loon.lay@intel.com> Co-developed-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20211013065941.2124858-1-boon.leong.ong@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15net: microchip: lan743x: add support for PTP pulse width (duty cycle)Yuiko Oshino2-11/+81
If the PTP_PEROUT_DUTY_CYCLE flag is set, then check if the request_on value in ptp_perout_request matches the pre-defined values or a toggle option. Return a failure if the value is not supported. Preserve the old behaviors if the PTP_PEROUT_DUTY_CYCLE flag is not set. Tested with an oscilloscope on EVB-LAN7430: e.g., to output PPS 1sec period 500mS on (high) to GPIO 2. ./testptp -L 2,2 ./testptp -p 1000000000 -w 500000000 Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com> Link: https://lore.kernel.org/r/1634046593-64312-1-git-send-email-yuiko.oshino@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15net: phy: micrel: make *-skew-ps check more lenientMatthias Schiffer1-2/+2
It seems reasonable to fine-tune only some of the skew values when using one of the rgmii-*id PHY modes, and even when all skew values are specified, using the correct ID PHY mode makes sense for documentation purposes. Such a configuration also appears in the binding docs in Documentation/devicetree/bindings/net/micrel-ksz90x1.txt, so the driver should not warn about it. Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Link: https://lore.kernel.org/r/20211012103402.21438-1-matthias.schiffer@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski284-1438/+3016
tools/testing/selftests/net/ioam6.sh 7b1700e009cc ("selftests: net: modify IOAM tests for undef bits") bf77b1400a56 ("selftests: net: Test for the IOAM encapsulation with IPv6") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15net: of: fix stub of_net helpers for CONFIG_NET=nArnd Bergmann1-1/+1
Moving the of_net code from drivers/of/ to net/core means we no longer stub out the helpers when networking is disabled, which leads to a randconfig build failure with at least one ARM platform that calls this from non-networking code: arm-linux-gnueabi-ld: arch/arm/mach-mvebu/kirkwood.o: in function `kirkwood_dt_eth_fixup': kirkwood.c:(.init.text+0x54): undefined reference to `of_get_mac_address' Restore the way this worked before by changing that #ifdef check back to testing for both CONFIG_OF and CONFIG_NET. Fixes: e330fb14590c ("of: net: move of_net under net/") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20211014090055.2058949-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15Merge tag 'net-5.15-rc6' of ↵Linus Torvalds74-585/+1009
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Quite calm. The noisy DSA driver (embedded switches) changes, and adjustment to IPv6 IOAM behavior add to diffstat's bottom line but are not scary. Current release - regressions: - af_unix: rename UNIX-DGRAM to UNIX to maintain backwards compatibility - procfs: revert "add seq_puts() statement for dev_mcast", minor format change broke user space Current release - new code bugs: - dsa: fix bridge_num not getting cleared after ports leaving the bridge, resource leak - dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0, prevent packet drops if pvid is removed - dsa: mv88e6xxx: keep the pvid at 0 when VLAN-unaware, prevent HW getting confused about station to VLAN mapping Previous releases - regressions: - virtio-net: fix for skb_over_panic inside big mode - phy: do not shutdown PHYs in READY state - dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's, fix link LED staying lit after ifdown - mptcp: fix possible infinite wait on recvmsg(MSG_WAITALL) - mqprio: Correct stats in mqprio_dump_class_stats() - ice: fix deadlock for Tx timestamp tracking flush - stmmac: fix feature detection on old hardware Previous releases - always broken: - sctp: account stream padding length for reconf chunk - icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe() - isdn: cpai: check ctr->cnr to avoid array index out of bound - isdn: mISDN: fix sleeping function called from invalid context - nfc: nci: fix potential UAF of rf_conn_info object - dsa: microchip: prevent ksz_mib_read_work from kicking back in after it's canceled in .remove and crashing - dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged ports - dsa: sja1105, ocelot: break circular dependency between switch and tag drivers - dsa: felix: improve timestamping in presence of packe loss - mlxsw: thermal: fix out-of-bounds memory accesses Misc: - ipv6: ioam: move the check for undefined bits to improve interoperability" * tag 'net-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits) icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe MAINTAINERS: Update the devicetree documentation path of imx fec driver sctp: account stream padding length for reconf chunk mlxsw: thermal: Fix out-of-bounds memory accesses ethernet: s2io: fix setting mac address during resume NFC: digital: fix possible memory leak in digital_in_send_sdd_req() NFC: digital: fix possible memory leak in digital_tg_listen_mdaa() nfc: fix error handling of nfc_proto_register() Revert "net: procfs: add seq_puts() statement for dev_mcast" net: encx24j600: check error in devm_regmap_init_encx24j600 net: korina: select CRC32 net: arc: select CRC32 net: dsa: felix: break at first CPU port during init and teardown net: dsa: tag_ocelot_8021q: fix inability to inject STP BPDUs into BLOCKING ports net: dsa: felix: purge skb from TX timestamping queue if it cannot be sent net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver net: mscc: ocelot: cross-check the sequence id from the timestamp FIFO with the skb PTP header net: mscc: ocelot: deny TX timestamping of non-PTP packets net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb ...
2021-10-15netfilter: ipvs: merge ipv4 + ipv6 icmp reply handlersFlorian Westphal1-23/+14
Similar to earlier patches: allow ipv4 and ipv6 to use the same handler. ipv4 and ipv6 specific actions can be done by checking state->pf. v2: split the pf == NFPROTO_IPV4 check (Julian Anastasov) Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-15netfilter: ipvs: remove unneeded input wrappersFlorian Westphal1-53/+4
After earlier patch ip_vs_hook_in can be used directly. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-15netfilter: ipvs: remove unneeded output wrappersFlorian Westphal1-56/+6
After earlier patch we can use ip_vs_out_hook directly. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-15netfilter: ipvs: prepare for hook function reductionFlorian Westphal1-10/+16
ipvs has multiple one-line wrappers for hooks, compact them. To avoid a large patch make the two most common helpers use the same function signature as hooks. Next patches can then remove the oneline wrappers. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-15netfilter: ebtables: allow use of ebt_do_table as hookfnFlorian Westphal5-26/+12
This is possible now that the xt_table structure is passed via *priv. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-15netfilter: ip6tables: allow use of ip6t_do_table as hookfnFlorian Westphal7-47/+16
This is possible now that the xt_table structure is passed via *priv. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-15netfilter: arp_tables: allow use of arpt_do_table as hookfnFlorian Westphal3-15/+7
This is possible now that the xt_table structure is passed in via *priv. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-15netfilter: iptables: allow use of ipt_do_table as hookfnFlorian Westphal7-46/+18
This is possible now that the xt_table structure is passed in via *priv. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-15af_packet: Introduce egress hookPablo Neira Ayuso1-0/+35
Add egress hook for AF_PACKET sockets that have the PACKET_QDISC_BYPASS socket option set to on, which allows packets to escape without being filtered in the egress path. This patch only updates the AF_PACKET path, it does not update dev_direct_xmit() so the XDP infrastructure has a chance to bypass Netfilter. [lukas: acquire rcu_read_lock, fix typos, rebase] Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-15netfilter: Introduce egress hookLukas Wunner10-10/+168
Support classifying packets with netfilter on egress to satisfy user requirements such as: * outbound security policies for containers (Laura) * filtering and mangling intra-node Direct Server Return (DSR) traffic on a load balancer (Laura) * filtering locally generated traffic coming in through AF_PACKET, such as local ARP traffic generated for clustering purposes or DHCP (Laura; the AF_PACKET plumbing is contained in a follow-up commit) * L2 filtering from ingress and egress for AVB (Audio Video Bridging) and gPTP with nftables (Pablo) * in the future: in-kernel NAT64/NAT46 (Pablo) The egress hook introduced herein complements the ingress hook added by commit e687ad60af09 ("netfilter: add netfilter ingress hook after handle_ing() under unique static key"). A patch for nftables to hook up egress rules from user space has been submitted separately, so users may immediately take advantage of the feature. Alternatively or in addition to netfilter, packets can be classified with traffic control (tc). On ingress, packets are classified first by tc, then by netfilter. On egress, the order is reversed for symmetry. Conceptually, tc and netfilter can be thought of as layers, with netfilter layered above tc. Traffic control is capable of redirecting packets to another interface (man 8 tc-mirred). E.g., an ingress packet may be redirected from the host namespace to a container via a veth connection: tc ingress (host) -> tc egress (veth host) -> tc ingress (veth container) In this case, netfilter egress classifying is not performed when leaving the host namespace! That's because the packet is still on the tc layer. If tc redirects the packet to a physical interface in the host namespace such that it leaves the system, the packet is never subjected to netfilter egress classifying. That is only logical since it hasn't passed through netfilter ingress classifying either. Packets can alternatively be redirected at the netfilter layer using nft fwd. Such a packet *is* subjected to netfilter egress classifying since it has reached the netfilter layer. Internally, the skb->nf_skip_egress flag controls whether netfilter is invoked on egress by __dev_queue_xmit(). Because __dev_queue_xmit() may be called recursively by tunnel drivers such as vxlan, the flag is reverted to false after sch_handle_egress(). This ensures that netfilter is applied both on the overlay and underlying network. Interaction between tc and netfilter is possible by setting and querying skb->mark. If netfilter egress classifying is not enabled on any interface, it is patched out of the data path by way of a static_key and doesn't make a performance difference that is discernible from noise: Before: 1537 1538 1538 1537 1538 1537 Mb/sec After: 1536 1534 1539 1539 1539 1540 Mb/sec Before + tc accept: 1418 1418 1418 1419 1419 1418 Mb/sec After + tc accept: 1419 1424 1418 1419 1422 1420 Mb/sec Before + tc drop: 1620 1619 1619 1619 1620 1620 Mb/sec After + tc drop: 1616 1624 1625 1624 1622 1619 Mb/sec When netfilter egress classifying is enabled on at least one interface, a minimal performance penalty is incurred for every egress packet, even if the interface it's transmitted over doesn't have any netfilter egress rules configured. That is caused by checking dev->nf_hooks_egress against NULL. Measurements were performed on a Core i7-3615QM. Commands to reproduce: ip link add dev foo type dummy ip link set dev foo up modprobe pktgen echo "add_device foo" > /proc/net/pktgen/kpktgend_3 samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1 Accept all traffic with tc: tc qdisc add dev foo clsact tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,' Drop all traffic with tc: tc qdisc add dev foo clsact tc filter add dev foo egress bpf da bytecode '1,6 0 0 2,' Apply this patch when measuring packet drops to avoid errors in dmesg: https://lore.kernel.org/netdev/a73dda33-57f4-95d8-ea51-ed483abd6a7a@iogearbox.net/ Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: Laura García Liébana <nevola@gmail.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-15netfilter: Generalize ingress hook include fileLukas Wunner2-10/+12
Prepare for addition of a netfilter egress hook by generalizing the ingress hook include file. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-15netfilter: Rename ingress hook include fileLukas Wunner2-1/+1
Prepare for addition of a netfilter egress hook by renaming <linux/netfilter_ingress.h> to <linux/netfilter_netdev.h>. The egress hook also necessitates a refactoring of the include file, but that is done in a separate commit to ease reviewing. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-14ethernet: remove random_ether_addr()Jakub Kicinski2-3/+1
random_ether_addr() was the original name of the helper which was kept for backward compatibility (?) after the rename in commit 0a4dd594982a ("etherdevice: Rename random_ether_addr to eth_random_addr"). We have a single random_ether_addr() caller left in tree while there are 70 callers of eth_random_addr(). Time to drop this define. Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20211013205450.328092-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14Merge branch 'ethernet-more-netdev-dev_addr-write-removals'Jakub Kicinski164-277/+295
Jakub Kicinski says: ==================== ethernet: more netdev->dev_addr write removals Another series removing direct writes to netdev->dev_addr. ==================== Link: https://lore.kernel.org/r/20211013204435.322561-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14ethernet: replace netdev->dev_addr 16bit writesJakub Kicinski14-37/+58
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. This patch takes care of drivers which cast netdev->dev_addr to a 16bit type, often with an explicit byte order. Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14ethernet: replace netdev->dev_addr assignment loopsJakub Kicinski12-38/+17
A handful of drivers contains loops assigning the mac addr byte by byte. Convert those to eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14ethernet: ibm/emac: use of_get_ethdev_address() to load dev_addrJakub Kicinski1-1/+1
A straggler I somehow missed in the automated conversion in commit 9ca01b25dfff ("ethernet: use of_get_ethdev_address()"). Use the new helper instead of using netdev->dev_addr directly. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14ethernet: manually convert memcpy(dev_addr,..., sizeof(addr))Jakub Kicinski5-5/+5
A handful of drivers use sizeof(addr) for the size of the address, after manually confirming the size is indeed 6 convert them to eth_hw_addr_set(). Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Petko Manolov <petkan@nucleusys.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14ethernet: make use of eth_hw_addr_random() where appropriateJakub Kicinski5-6/+6
Number of drivers use eth_random_addr(netdev->dev_addr) thus writing to netdev->dev_addr directly, and not setting the address type. Switch them to eth_hw_addr_random(). @@ expression netdev; @@ - eth_random_addr(netdev->dev_addr); + eth_hw_addr_random(netdev); Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14ethernet: make eth_hw_addr_random() use dev_addr_set()Jakub Kicinski1-1/+4
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Signed-off-by: Jakub Kicinski <kuba@kernel.org>