Age  Commit message  Author  Files  Lines
2021-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski113-488/+1258
Build failure in drivers/net/wwan/mhi_wwan_mbim.c: add missing parameter (0, assuming we don't want buffer pre-alloc). Conflict in drivers/net/dsa/sja1105/sja1105_main.c between: 589918df9322 ("net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too") 0fac6aa098ed ("net: dsa: sja1105: delete the best_effort_vlan_filtering mode") Follow the instructions from the commit message of the former commit - removed the if conditions. When looking at commit 589918df9322 ("net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too") note that the mask_iotag fields get removed by the following patch. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-05Merge tag 'net-5.14-rc5' of ↵Linus Torvalds48-197/+583
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from ipsec. Current release - regressions: - sched: taprio: fix init procedure to avoid inf loop when dumping - sctp: move the active_key update after sh_keys is added Current release - new code bugs: - sparx5: fix build with old GCC & bitmask on 32-bit targets Previous releases - regressions: - xfrm: redo the PREEMPT_RT RCU vs hash_resize_mutex deadlock fix - xfrm: fixes for the compat netlink attribute translator - phy: micrel: Fix detection of ksz87xx switch Previous releases - always broken: - gro: set inner transport header offset in tcp/udp GRO hook to avoid crashes when such packets reach GSO - vsock: handle VIRTIO_VSOCK_OP_CREDIT_REQUEST, as required by spec - dsa: sja1105: fix static FDB entries on SJA1105P/Q/R/S and SJA1110 - bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry - usb: lan78xx: don't modify phy_device state concurrently - usb: pegasus: check for errors of IO routines" * tag 'net-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (48 commits) net: vxge: fix use-after-free in vxge_device_unregister net: fec: fix use-after-free in fec_drv_remove net: pegasus: fix uninit-value in get_interrupt_interval net: ethernet: ti: am65-cpsw: fix crash in am65_cpsw_port_offload_fwd_mark_update() bnx2x: fix an error code in bnx2x_nic_load() net: wwan: iosm: fix recursive lock acquire in unregister net: wwan: iosm: correct data protocol mask bit net: wwan: iosm: endianness type correction net: wwan: iosm: fix lkp buildbot warning net: usb: lan78xx: don't modify phy_device state concurrently docs: networking: netdevsim rules net: usb: pegasus: Remove the changelog and DRIVER_VERSION. 
net: usb: pegasus: Check the return value of get_geristers() and friends; net/prestera: Fix devlink groups leakage in error flow net: sched: fix lockdep_set_class() typo error for sch->seqlock net: dsa: qca: ar9331: reorder MDIO write sequence VSOCK: handle VIRTIO_VSOCK_OP_CREDIT_REQUEST mptcp: drop unused rcu member in mptcp_pm_addr_entry net: ipv6: fix returned variable type in ip6_skb_dst_mtu nfp: update ethtool reporting of pauseframe control ...
2021-08-05Bluetooth: defer cleanup of resources in hci_unregister_dev()Tetsuo Handa4-24/+45
syzbot is hitting a might_sleep() warning at hci_sock_dev_event() because lock_sock() is called with an rw spinlock held [1]. The history of this locking problem looks like trial and error. Commit b40df5743ee8 ("[PATCH] bluetooth: fix socket locking in hci_sock_dev_event()") in 2.6.21-rc4 changed bh_lock_sock() to lock_sock() in an attempt to fix a lockdep warning. Then, commit 4ce61d1c7a8e ("[BLUETOOTH]: Fix locking in hci_sock_dev_event().") in 2.6.22-rc2 changed lock_sock() to local_bh_disable() + bh_lock_sock_nested() in an attempt to fix the sleep-in-atomic-context warning. Then, commit 4b5dd696f81b ("Bluetooth: Remove local_bh_disable() from hci_sock.c") in 3.3-rc1 removed local_bh_disable(). Then, commit e305509e678b ("Bluetooth: use correct lock to prevent UAF of hdev object") in 5.13-rc5 changed bh_lock_sock_nested() back to lock_sock() in an attempt to fix CVE-2021-3573. The difficulty comes from the current implementation, in which hci_sock_dev_event(HCI_DEV_UNREG) is responsible for dropping all references from sockets, because hci_unregister_dev() reclaims resources as soon as hci_sock_dev_event(HCI_DEV_UNREG) returns. But the history suggests that hci_sock_dev_event(HCI_DEV_UNREG) was not doing what it should do. Therefore, instead of trying to detach sockets from the device, let's accept not detaching sockets from the device at hci_sock_dev_event(HCI_DEV_UNREG), by moving the actual cleanup of resources from hci_unregister_dev() to hci_cleanup_dev(), which is called by bt_host_release() when all references to this unregistered device (which is a kobject) are gone. Since hci_sock_dev_event(HCI_DEV_UNREG) no longer resets hci_pi(sk)->hdev, we need to check whether this device was unregistered and return an error based on the HCI_UNREGISTER flag. There might be a subtle behavioral difference in the "monitor the hdev" functionality; please report if you find that something went wrong due to this patch. 
Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 [1] Reported-by: syzbot <syzbot+a5df189917e79d5e59c9@syzkaller.appspotmail.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: e305509e678b ("Bluetooth: use correct lock to prevent UAF of hdev object") Acked-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
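The deferral described in this commit can be illustrated outside the kernel with a plain reference count. What follows is a minimal userspace sketch under stated assumptions, not the Bluetooth code: `struct dev`, `dev_alloc()`, `dev_get()`, `dev_put()`, `dev_unregister()` and `dev_send()` are hypothetical names standing in for the hdev kobject, its reference helpers, hci_unregister_dev() and a socket operation. The `-ENODEV` return value is also an assumption chosen for illustration, and the counter is not atomic, so the sketch is single-threaded only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device object shared by several sockets. */
struct dev {
	int refcount;       /* references held by the registry and by sockets */
	bool unregistered;  /* plays the role of the HCI_UNREGISTER flag */
	bool *cleaned_up;   /* set when the final cleanup runs (for observation) */
};

struct dev *dev_alloc(bool *cleaned_up)
{
	struct dev *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->refcount = 1;    /* reference owned by the registry */
	d->cleaned_up = cleaned_up;
	return d;
}

void dev_get(struct dev *d)
{
	d->refcount++;
}

/* Resources are reclaimed only when the last reference goes away. */
void dev_put(struct dev *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		*d->cleaned_up = true;
		free(d);
	}
}

/* Unregistering marks the device dead and drops only the registry reference. */
void dev_unregister(struct dev *d)
{
	d->unregistered = true;
	dev_put(d);
}

/* A socket operation checks the flag instead of relying on a reset pointer. */
int dev_send(const struct dev *d)
{
	return d->unregistered ? -ENODEV : 0;
}
```

In this sketch a socket that still holds a reference after `dev_unregister()` keeps a valid pointer and simply gets an error from `dev_send()`; the memory is released by whichever `dev_put()` happens to be the last one.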
2021-08-05Merge tag 'selinux-pr-20210805' of ↵Linus Torvalds1-6/+4
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fix from Paul Moore: "One small SELinux fix for a problem where an error code was not being propagated back up to userspace when a bogus SELinux policy is loaded into the kernel" * tag 'selinux-pr-20210805' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: correct the return value when loads initial sids
2021-08-05Merge branch 'for-v5.14' of ↵Linus Torvalds1-3/+7
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ucounts fix from Eric Biederman: "Fix a subtle locking versus reference counting bug in the ucount changes, found by syzbot" * 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucounts: Fix race condition between alloc_ucounts and put_ucounts
2021-08-05Merge tag 'trace-v5.14-rc4' of ↵Linus Torvalds6-47/+30
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Various tracing fixes: - Fix NULL pointer dereference caused by an error path - Give histogram calculation fields a size, otherwise it breaks synthetic creation based on them. - Reject strings being used for number calculations. - Fix recordmcount.pl warning on llvm building RISC-V allmodconfig - Fix the draw_functrace.py script to handle the new trace output - Fix warning of smp_processor_id() in preemptible code" * tag 'trace-v5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Quiet smp_processor_id() use in preemptable warning in hwlat scripts/tracing: fix the bug that can't parse raw_trace_func scripts/recordmcount.pl: Remove check_objcopy() and $can_use_local tracing: Reject string operand in the histogram expression tracing / histogram: Give calculation hist_fields a size tracing: Fix NULL pointer dereference in start_creating
2021-08-05Merge tag 's390-5.14-4' of ↵Linus Torvalds6-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Heiko Carstens: - fix zstd build for -march=z900 (undefined reference to __clzdi2) - add missing .got.plts to vdso linker scripts to fix kpatch build errors - update defconfigs * tag 's390-5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: update defconfigs s390/boot: fix zstd build for -march=z900 s390/vdso: add .got.plt in vdso linker script
2021-08-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds9-34/+125
Pull kvm fixes from Paolo Bonzini: "Mostly bugfixes; plus, support for XMM arguments to Hyper-V hypercalls now obeys KVM_CAP_HYPERV_ENFORCE_CPUID. Both the XMM arguments feature and KVM_CAP_HYPERV_ENFORCE_CPUID are new in 5.14, and each did not know of the other" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds KVM: selftests: fix hyperv_clock test KVM: SVM: improve the code readability for ASID management KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB KVM: Do not leak memory for duplicate debugfs directories KVM: selftests: Test access to XMM fast hypercalls KVM: x86: hyper-v: Check if guest is allowed to use XMM registers for hypercall input KVM: x86: Introduce trace_kvm_hv_hypercall_done() KVM: x86: hyper-v: Check access to hypercall before reading XMM registers KVM: x86: accept userspace interrupt only if no event is injected
2021-08-05Merge branch 'pcmcia-next' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux Pull pcmcia fix from Dominik Brodowski: "Zheyu Ma found and fixed a null pointer dereference bug in the device driver for the i82092 card reader" * 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: pcmcia: i82092: fix a null pointer dereference bug
2021-08-05pipe: increase minimum default pipe size to 2 pagesAlex Xu (Hello71)1-2/+17
This program always prints 4096 and hangs before the patch, and always prints 8192 and exits successfully after:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main()
    {
        int pipefd[2];
        for (int i = 0; i < 1025; i++)
            if (pipe(pipefd) == -1)
                return 1;
        size_t bufsz = fcntl(pipefd[1], F_GETPIPE_SZ);
        printf("%zu\n", bufsz);
        char *buf = calloc(bufsz, 1);
        write(pipefd[1], buf, bufsz);    /* fill the pipe completely */
        read(pipefd[0], buf, bufsz-1);   /* drain all but one byte */
        write(pipefd[1], buf, 1);        /* hangs before the patch */
    }

Note that you may need to increase your RLIMIT_NOFILE before running the program. Fixes: 759c01142a ("pipe: limit the per-user amount of pages allocated in pipes") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/lkml/1628086770.5rn8p04n6j.none@localhost/ Link: https://lore.kernel.org/lkml/1628127094.lxxn016tj7.none@localhost/ Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-05Merge branch 'net-fix-use-after-free-bugs'Jakub Kicinski2-4/+4
Pavel Skripkin says: ==================== net: fix use-after-free bugs I added a new checker to smatch yesterday. It warns about using a netdev_priv() pointer after a free_{netdev,candev}() call. I hope it will get into the next smatch release. Some of the reported bugs are fixed and upstreamed already, but Dan ran the new smatch with allmodconfig and found 2 more. Big thanks to Dan for doing it, because I totally forgot to do it. ==================== Link: https://lore.kernel.org/r/cover.1628091954.git.paskripkin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-05net: vxge: fix use-after-free in vxge_device_unregisterPavel Skripkin1-3/+3
Smatch says: drivers/net/ethernet/neterion/vxge/vxge-main.c:3518 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev); drivers/net/ethernet/neterion/vxge/vxge-main.c:3518 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev); drivers/net/ethernet/neterion/vxge/vxge-main.c:3520 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev); drivers/net/ethernet/neterion/vxge/vxge-main.c:3520 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev); Since the vdev pointer is netdev private data, accessing it after the free_netdev() call can cause a use-after-free bug. Fix it by moving the free_netdev() call to the end of the function. Fixes: 6cca200362b4 ("vxge: cleanup probe error paths") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-05net: fec: fix use-after-free in fec_drv_removePavel Skripkin1-1/+1
Smatch says: drivers/net/ethernet/freescale/fec_main.c:3994 fec_drv_remove() error: Using fep after free_{netdev,candev}(ndev); drivers/net/ethernet/freescale/fec_main.c:3995 fec_drv_remove() error: Using fep after free_{netdev,candev}(ndev); Since the fep pointer is netdev private data, accessing it after the free_netdev() call can cause a use-after-free bug. Fix it by moving the free_netdev() call to the end of the function. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: a31eda65ba21 ("net: fec: fix clock count mis-match") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Reviewed-by: Joakim Zhang <qiangqing.zhang@nxp.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
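The ordering rule behind the two use-after-free fixes can be shown with a small userspace sketch. It assumes, as with netdev_priv(), that the private data lives in the same allocation as the device, so freeing the device also frees the private data. The names `fake_netdev`, `fake_priv`, `fake_alloc()`, `fake_priv_of()` and `fake_remove()` are hypothetical and do not come from the vxge or fec drivers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device header; the private area follows it in memory. */
struct fake_netdev {
	int id;
};

/* Hypothetical driver-private state. */
struct fake_priv {
	int irq;
	int clk_enabled;
};

/* One allocation holds both the device header and the private area. */
struct fake_netdev *fake_alloc(void)
{
	return calloc(1, sizeof(struct fake_netdev) + sizeof(struct fake_priv));
}

/* Analogue of netdev_priv(): the private area starts right after the header. */
struct fake_priv *fake_priv_of(struct fake_netdev *dev)
{
	return (struct fake_priv *)(dev + 1);
}

/* Tears the device down and returns the irq read from the private area. */
int fake_remove(struct fake_netdev *dev)
{
	struct fake_priv *priv;
	int irq;

	assert(dev != NULL);
	priv = fake_priv_of(dev);

	/* Finish every access to priv first ... */
	priv->clk_enabled = 0;
	irq = priv->irq;

	/* ... and free the device last: priv is invalid after this call. */
	free(dev);
	return irq;
}
```

Moving the `free(dev)` call above the accesses to `priv` would reproduce the kind of bug Smatch reported, because `priv` points into the block that was just released.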
2021-08-05net: pegasus: fix uninit-value in get_interrupt_intervalPavel Skripkin1-3/+11
Syzbot reported an uninit-value bug in pegasus_probe(). The problem was missing error handling. get_interrupt_interval() internally calls read_eprom_word(), which can fail in some cases, for example when it fails to receive a USB control message. These cases should be handled to prevent the uninit-value bug, since read_eprom_word() will not initialize the passed stack variable in case of an internal failure. Fail log: BUG: KMSAN: uninit-value in get_interrupt_interval drivers/net/usb/pegasus.c:746 [inline] BUG: KMSAN: uninit-value in pegasus_probe+0x10e7/0x4080 drivers/net/usb/pegasus.c:1152 CPU: 1 PID: 825 Comm: kworker/1:1 Not tainted 5.12.0-rc6-syzkaller #0 ... Workqueue: usb_hub_wq hub_event Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x24c/0x2e0 lib/dump_stack.c:120 kmsan_report+0xfb/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x5c/0xa0 mm/kmsan/kmsan_instr.c:197 get_interrupt_interval drivers/net/usb/pegasus.c:746 [inline] pegasus_probe+0x10e7/0x4080 drivers/net/usb/pegasus.c:1152 .... Local variable ----data.i@pegasus_probe created at: get_interrupt_interval drivers/net/usb/pegasus.c:1151 [inline] pegasus_probe+0xe57/0x4080 drivers/net/usb/pegasus.c:1152 get_interrupt_interval drivers/net/usb/pegasus.c:1151 [inline] pegasus_probe+0xe57/0x4080 drivers/net/usb/pegasus.c:1152 Reported-and-tested-by: syzbot+02c9f70f3afae308464a@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Link: https://lore.kernel.org/r/20210804143005.439-1-paskripkin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-05tracing: Quiet smp_processor_id() use in preemptable warning in hwlatSteven Rostedt (VMware)1-1/+1
The hardware latency detector (hwlat) has a mode in which it runs one thread across CPUs. The logic to move from the currently running CPU to the next one in the list calls smp_processor_id() to find where it currently is. Unfortunately, this is done with preemption enabled, which triggers a warning for using smp_processor_id() in a preempt-enabled section. As it only uses smp_processor_id() to find out where it currently is in order to move to the next CPU, it doesn't really care if it got moved in the meantime. It will simply balance out later if such a case arises. Switch smp_processor_id() to raw_smp_processor_id() to quiet that warning. Link: https://lkml.kernel.org/r/20210804141848.79edadc0@oasis.local.home Acked-by: Daniel Bristot de Oliveira <bristot@redhat.com> Fixes: 8fa826b7344d ("trace/hwlat: Implement the mode config option") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-05net: ethernet: ti: am65-cpsw: fix crash in ↵Grygorii Strashko1-1/+5
am65_cpsw_port_offload_fwd_mark_update() The am65_cpsw_port_offload_fwd_mark_update() function causes a NULL pointer dereference crash when there is at least one disabled port and any other port is added to the bridge for the first time. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000858 pc : am65_cpsw_port_offload_fwd_mark_update+0x54/0x68 lr : am65_cpsw_netdevice_event+0x8c/0xf0 Call trace: am65_cpsw_port_offload_fwd_mark_update+0x54/0x68 notifier_call_chain+0x54/0x98 raw_notifier_call_chain+0x14/0x20 call_netdevice_notifiers_info+0x34/0x78 __netdev_upper_dev_link+0x1c8/0x290 netdev_master_upper_dev_link+0x1c/0x28 br_add_if+0x3f0/0x6d0 [bridge] Fix it by adding a proper check for port->ndev != NULL. Fixes: 2934db9bcb30 ("net: ti: am65-cpsw-nuss: Add netdevice notifiers") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05bnx2x: fix an error code in bnx2x_nic_load()Dan Carpenter1-1/+2
Set the error code if bnx2x_alloc_fw_stats_mem() fails. The current code returns success. Fixes: ad5afc89365e ("bnx2x: Separate VF and PF logic") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05netdevsim: Forbid devlink reload when adding or deleting portsLeon Romanovsky2-12/+11
In order to remove complexity in the devlink core related to devlink_reload_enable/disable, let's rewrite the new_port/del_port logic to rely on a lock internal to netdevsim. We should protect only the reload_down flow because it destroys nsim_dev, which is needed for nsim_dev_port_add/nsim_dev_port_del to hold port_list_lock. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net: dsa: tag_sja1105: optionally build as module when switch driver is ↵Vladimir Oltean1-0/+1
module if PTP is enabled TX timestamps are sent by SJA1110 as Ethernet packets containing metadata, so they are received by the tagging driver but must be processed by the switch driver - the one that is stateful since it keeps the TX timestamp queue. This means that there is an sja1110_process_meta_tstamp() symbol exported by the switch driver which is called by the tagging driver. There is a shim definition for that function when the switch driver is not compiled, which does nothing, but that shim is not effective when the tagging protocol driver is built-in and the switch driver is a module, because built-in code cannot call symbols exported by modules. So add an optional dependency between the tagger and the switch driver, if PTP support is enabled in the switch driver. If PTP is not enabled, sja1110_process_meta_tstamp() will translate into the shim "do nothing with these meta frames" function. Fixes: 566b18c8b752 ("net: dsa: sja1105: implement TX timestamping for SJA1110") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05netdevice: add the case if dev is NULLYajun Deng1-4/+8
Handle the case where dev is NULL in dev_{put, hold}, so the caller doesn't need to care whether dev is NULL or not. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
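The idea of moving the NULL check into the helper can be sketched in userspace like this. The names `struct obj`, `obj_hold()` and `obj_put()` are hypothetical and only mimic the shape of dev_hold() and dev_put(); the real helpers manipulate the netdevice reference count, which this plain integer does not model faithfully.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted object. */
struct obj {
	int refcnt;
};

/* Taking a reference on NULL is a no-op, so callers need no check. */
void obj_hold(struct obj *o)
{
	if (o)
		o->refcnt++;
}

/* Dropping a reference on NULL is a no-op as well. */
void obj_put(struct obj *o)
{
	if (o) {
		assert(o->refcnt > 0);
		o->refcnt--;
	}
}
```

With helpers of this shape, a call site that used to read `if (o) obj_put(o);` can shrink to `obj_put(o);`, which is the kind of cleanup the next commit in this log performs.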
2021-08-05net: Remove redundant if statementsYajun Deng39-168/+82
The 'if (dev)' check has already moved into dev_{put, hold}, so remove the redundant if statements. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05Revert "wwan: mhi: Fix build."David S. Miller1-1/+1
This reverts commit ab996c420508761f3313c15c5f72d06ca7dc1a5b. Only applicable when net is merged into net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05Merge branch 'GRO-Toeplitz-selftests'David S. Miller7-0/+2119
Coco Li says: ==================== GRO and Toeplitz hash selftests This patch contains two selftests in net, as well as respective scripts to run the tests on a single machine in loopback mode. GRO: tests the Linux kernel GRO behavior Toeplitz: tests the toeplitz hash implementation ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05selftests/net: toeplitz testCoco Li4-0/+813
Add a test to verify that the reported rx_hash implements the Toeplitz hash function. Additionally, provide a script toeplitz.sh to run the test in loopback mode on a networking device of choice (see setup_loopback.sh). Since the script modifies the NIC setup, it will not be run by selftests automatically. Tested: ./toeplitz.sh -i eth0 -irq_prefix <eth0_pattern> -t -6 carrier ready rxq 0: cpu 14 rxq 1: cpu 20 rxq 2: cpu 17 rxq 3: cpu 23 cpu 14: rx_hash 0x69103ebc [saddr fda8::2 daddr fda8::1 sport 58938 dport 8000] OK rxq 0 (cpu 14) ... cpu 20: rx_hash 0x257118b9 [saddr fda8::2 daddr fda8::1 sport 59258 dport 8000] OK rxq 1 (cpu 20) count: pass=111 nohash=0 fail=0 Test Succeeded! Signed-off-by: Coco Li <lixiaoyan@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
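For readers unfamiliar with the function being tested, here is a minimal sketch of a Toeplitz hash in C. It is written for illustration and is not copied from the selftest; the function name `toeplitz_hash()` and its signature are assumptions. The input would typically be the packet's addresses and ports in network byte order, and the key is the NIC's RSS key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash: for every set bit of the input (most significant bit
 * first), XOR in the 32-bit window of the key that starts at that bit
 * position. The key must be at least len + 4 bytes long.
 */
uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
		       const uint8_t *data, size_t len)
{
	uint32_t window, hash = 0;
	size_t i;
	int bit;

	assert(key_len >= len + 4);

	/* The window starts as the first 32 bits of the key. */
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		 ((uint32_t)key[2] << 8) | key[3];

	for (i = 0; i < len; i++) {
		for (bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;
			/* Slide the window left by one bit of the key. */
			window <<= 1;
			if (key[i + 4] & (1u << bit))
				window |= 1;
		}
	}
	return hash;
}
```

Because the hash is a XOR of key windows selected by the input bits, it is linear over XOR: hashing `a ^ b` gives the same result as XOR-ing the hashes of `a` and `b`, and an all-zero input hashes to zero. Those properties are a convenient sanity check for any implementation.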
2021-08-05selftests/net: GRO coalesce testCoco Li4-0/+1306
Implement a GRO testsuite that expects Linux kernel GRO behavior. All tests pass with the kernel software GRO stack. Run against a device with hardware GRO to verify that it matches the software stack. gro.c generates packets and sends them out through a packet socket. The receiver in gro.c (run separately) receives the packets on a packet socket, filters them by destination ports using BPF and checks the packet geometry to see whether GRO was applied. gro.sh provides a wrapper to run the gro.c in NIC loopback mode. It is not included in continuous testing because it modifies network configuration around a physical NIC: gro.sh sets the NIC in loopback mode, creates macvlan devices on the physical device in separate namespaces, and sends traffic generated by gro.c between the two namespaces to observe coalescing behavior. GRO coalescing is time sensitive. Some tests may prove flaky on some hardware. Note that this test suite tests for software GRO unless hardware GRO is enabled (ethtool -K $DEV rx-gro-hw on). To test, run ./gro.sh. The wrapper will output success or failed test names, and generate log.txt and stderr. Sample log.txt result: ... pure data packet of same size: Test succeeded large data packets followed by a smaller one: Test succeeded small data packets followed by a larger one: Test succeeded ... Sample stderr result: ... carrier ready running test ipv4 data Expected {200 }, Total 1 packets Received {200 }, Total 1 packets. ... Signed-off-by: Coco Li <lixiaoyan@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05wwan: mhi: Fix build.David S. Miller1-1/+1
Reported-by: Mark Brown <broonie@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net/ipv6/mcast: Use struct_size() helperGustavo A. R. Silva2-10/+13
Replace IP6_SFLSIZE() with struct_size() helper in order to avoid any potential type mistakes or integer overflows that, in the worst scenario, could lead to heap overflows. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
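The benefit of a struct_size()-style helper over an open-coded size macro is that the multiplication and addition cannot silently wrap around. The following userspace sketch imitates that behaviour; `struct src_list`, `flex_size()` and `src_list_size()` are hypothetical names, and saturating at SIZE_MAX is modelled on how the kernel helper is documented to behave rather than copied from its implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure with a trailing flexible array member. */
struct src_list {
	uint32_t count;
	uint32_t addrs[];
};

/*
 * Computes base + n * elem, saturating at SIZE_MAX instead of wrapping
 * around when the result does not fit in a size_t.
 */
size_t flex_size(size_t base, size_t elem, size_t n)
{
	assert(elem > 0);
	if (n > (SIZE_MAX - base) / elem)
		return SIZE_MAX;
	return base + n * elem;
}

/* Bytes needed for a src_list holding n addresses. */
size_t src_list_size(size_t n)
{
	return flex_size(sizeof(struct src_list), sizeof(uint32_t), n);
}
```

A saturated result is useful because an allocation request of SIZE_MAX bytes is expected to fail, whereas a wrapped-around small value would succeed and leave the caller with a buffer that is too short.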
2021-08-05net/ipv4/igmp: Use struct_size() helperGustavo A. R. Silva2-10/+13
Replace IP_SFLSIZE() with struct_size() helper in order to avoid any potential type mistakes or integer overflows that, in the worst scenario, could lead to heap overflows. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net/ipv4/ipv6: Replace one-element arraya with flexible-array membersGustavo A. R. Silva4-30/+55
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. Use an anonymous union with a couple of anonymous structs in order to keep userspace unchanged and refactor the related code accordingly:

    $ pahole -C group_filter net/ipv4/ip_sockglue.o
    struct group_filter {
        union {
            struct {
                __u32 gf_interface_aux; /* 0 4 */
                /* XXX 4 bytes hole, try to pack */
                struct __kernel_sockaddr_storage gf_group_aux; /* 8 128 */
                /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
                __u32 gf_fmode_aux; /* 136 4 */
                __u32 gf_numsrc_aux; /* 140 4 */
                struct __kernel_sockaddr_storage gf_slist[1]; /* 144 128 */
            }; /* 0 272 */
            struct {
                __u32 gf_interface; /* 0 4 */
                /* XXX 4 bytes hole, try to pack */
                struct __kernel_sockaddr_storage gf_group; /* 8 128 */
                /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
                __u32 gf_fmode; /* 136 4 */
                __u32 gf_numsrc; /* 140 4 */
                struct __kernel_sockaddr_storage gf_slist_flex[0]; /* 144 0 */
            }; /* 0 144 */
        }; /* 0 272 */
        /* size: 272, cachelines: 5, members: 1 */
        /* last cacheline: 16 bytes */
    };

    $ pahole -C compat_group_filter net/ipv4/ip_sockglue.o
    struct compat_group_filter {
        union {
            struct {
                __u32 gf_interface_aux; /* 0 4 */
                struct __kernel_sockaddr_storage gf_group_aux __attribute__((__aligned__(4))); /* 4 128 */
                /* --- cacheline 2 boundary (128 bytes) was 4 bytes ago --- */
                __u32 gf_fmode_aux; /* 132 4 */
                __u32 gf_numsrc_aux; /* 136 4 */
                struct __kernel_sockaddr_storage gf_slist[1] __attribute__((__aligned__(4))); /* 140 128 */
            } __attribute__((__packed__)) __attribute__((__aligned__(4))); /* 0 268 */
            struct {
                __u32 gf_interface; /* 0 4 */
                struct __kernel_sockaddr_storage gf_group __attribute__((__aligned__(4))); /* 4 128 */
                /* --- cacheline 2 boundary (128 bytes) was 4 bytes ago --- */
                __u32 gf_fmode; /* 132 4 */
                __u32 gf_numsrc; /* 136 4 */
                struct __kernel_sockaddr_storage gf_slist_flex[0] __attribute__((__aligned__(4))); /* 140 0 */
            } __attribute__((__packed__)) __attribute__((__aligned__(4))); /* 0 140 */
        } __attribute__((__aligned__(1))); /* 0 268 */
        /* size: 268, cachelines: 5, members: 1 */
        /* forced alignments: 1 */
        /* last cacheline: 12 bytes */
    } __attribute__((__packed__));

This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/79 Link: https://github.com/KSPP/linux/issues/109 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05Merge branch 'bridge-ioctl-fixes'David S. Miller3-18/+34
Nikolay Aleksandrov says: ==================== net: bridge: fix recent ioctl changes These are three fixes for the recent bridge removal of ndo_do_ioctl done by commit ad2f99aedf8f ("net: bridge: move bridge ioctls out of .ndo_do_ioctl"). Patch 01 fixes a deadlock between the new bridge ioctl hook lock and rtnl by taking a netdev reference and always taking the bridge ioctl lock first, then rtnl, from within the bridge hook. Patch 02 fixes the device name argument of old_deviceless() bridge calls, and patch 03 checks in dev_ifsioc()'s SIOCBRADD/DELIF cases whether the netdevice is actually a bridge before interpreting its private ptr as net_bridge. Patch 01 was tested by running old bridge-utils commands with lockdep enabled. Patch 02 was tested again by using bridge-utils and using the respective ioctl calls on an "up" bridge device. Patch 03 was tested by using the addif ioctl on a non-bridge device (e.g. loopback). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net: core: don't call SIOCBRADD/DELIF for non-bridge devicesNikolay Aleksandrov1-0/+2
Commit ad2f99aedf8f ("net: bridge: move bridge ioctls out of .ndo_do_ioctl") changed SIOCBRADD/DELIF to use bridge's ioctl hook (br_ioctl_hook) without checking whether the target netdevice is actually a bridge, which can cause crashes and, more generally, leads to other devices' private pointers being interpreted as net_bridge pointers.

Crash example (lo - loopback):

    $ brctl addif lo ens16
    BUG: kernel NULL pointer dereference, address: 0000000000000598
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP NOPTI
    CPU: 2 PID: 1376 Comm: brctl Kdump: loaded Tainted: G W 5.14.0-rc3+ #405
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
    RIP: 0010:add_del_if+0x1f/0x7c [bridge]
    Code: 80 bf 1b a0 41 5c e9 c0 3c 03 e1 0f 1f 44 00 00 41 55 41 54 41 89 f4 be 0c 00 00 00 55 48 89 fd 53 48 8b 87 88 00 00 00 89 d3 <4c> 8b a8 98 05 00 00 49 8b bd d0 00 00 00 e8 17 d7 f3 e0 84 c0 74
    RSP: 0018:ffff888109d97cb0 EFLAGS: 00010202
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff888101239bc0
    RBP: ffff888101239bc0 R08: 0000000000000001 R09: 0000000000000000
    R10: ffff888109d97cd8 R11: 00000000000000a3 R12: 0000000000000012
    R13: 0000000000000000 R14: ffff888101239bc0 R15: ffff888109d97e10
    FS: 00007fc1e365b540(0000) GS:ffff88822be80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000598 CR3: 0000000106506000 CR4: 00000000000006e0
    Call Trace:
     br_ioctl_stub+0x7c/0x441 [bridge]
     br_ioctl_call+0x6d/0x8a
     dev_ifsioc+0x325/0x4e8
     dev_ioctl+0x46b/0x4e1
     sock_do_ioctl+0x7b/0xad
     sock_ioctl+0x2de/0x2f2
     vfs_ioctl+0x1e/0x2b
     __do_sys_ioctl+0x63/0x86
     do_syscall_64+0xcb/0xf2
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    RIP: 0033:0x7fc1e3589427
    Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
    RSP: 002b:00007ffc8d501d38 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
    RAX: ffffffffffffffda RBX: 0000000000000012 RCX: 00007fc1e3589427
    RDX: 00007ffc8d501d60 RSI: 00000000000089a3 RDI: 0000000000000003
    RBP: 00007ffc8d501d60 R08: 0000000000000000 R09: fefefeff77686d74
    R10: fffffffffffff8f9 R11: 0000000000000202 R12: 00007ffc8d502e06
    R13: 00007ffc8d502e06 R14: 0000000000000000 R15: 0000000000000000
    Modules linked in: bridge stp llc bonding ipv6 virtio_net [last unloaded: llc]
    CR2: 0000000000000598

Reported-by: syzbot+79f4a8692e267bdb7227@syzkaller.appspotmail.com Fixes: ad2f99aedf8f ("net: bridge: move bridge ioctls out of .ndo_do_ioctl") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net: bridge: fix ioctl old_deviceless bridge argumentNikolay Aleksandrov1-1/+1
Commit ad2f99aedf8f ("net: bridge: move bridge ioctls out of .ndo_do_ioctl") changed the source of the argument copy in bridge's old_deviceless() from args[1] (user ptr to device name) to uarg (ptr to ioctl arguments), causing the wrong device name to be used. Example (broken, bridge exists but is up): $ brctl delbr bridge bridge bridge doesn't exist; can't delete it Example (working): $ brctl delbr bridge bridge bridge is still up; can't delete it Fixes: ad2f99aedf8f ("net: bridge: move bridge ioctls out of .ndo_do_ioctl") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net: bridge: fix ioctl lockingNikolay Aleksandrov3-17/+31
Before commit ad2f99aedf8f ("net: bridge: move bridge ioctls out of .ndo_do_ioctl") the bridge ioctl calls were divided into two parts: one was deviceless, called by sock_ioctl, and didn't expect rtnl to be held; the other was with a device, called by dev_ifsioc(), and expected rtnl to be held. After the commit above they were united in a single ioctl stub, but it didn't take care of the locking expectations. For sock_ioctl we now acquire (1) br_ioctl_mutex, (2) rtnl, and for dev_ifsioc we acquire (1) rtnl, (2) br_ioctl_mutex, so the two paths take the locks in opposite order and can deadlock. The fix is to get a refcnt on the netdev for dev_ifsioc calls and drop rtnl, then reacquire it in the bridge ioctl stub after br_ioctl_mutex has been acquired. That will avoid playing locking games and make the rules straightforward: we always take br_ioctl_mutex first, and then rtnl. Reported-by: syzbot+34fe5894623c4ab1b379@syzkaller.appspotmail.com Fixes: ad2f99aedf8f ("net: bridge: move bridge ioctls out of .ndo_do_ioctl") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net/ipv4: Revert use of struct_size() helperGustavo A. R. Silva2-9/+7
Revert the use of struct_size() and stay with IP_MSFILTER_SIZE() for now, as in this case, the size of struct ip_msfilter didn't change with the addition of the flexible array imsf_slist_flex[]. So, if we use struct_size() we will be allocating and calculating the size of struct ip_msfilter with one too many items for imsf_slist_flex[]. We might use struct_size() in the future, but for now let's stay with IP_MSFILTER_SIZE(). Fixes: 2d3e5caf96b9 ("net/ipv4: Replace one-element array with flexible-array member") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
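The off-by-one can be checked with plain arithmetic. The following is a minimal sketch that assumes a simplified stand-in structure; the struct and both helper names are hypothetical and only mirror the idea that sizeof() of the structure still contains one source slot.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in for a structure whose sizeof() still includes
 * a one-element trailing array (field names are illustrative).
 */
struct msfilter_model {
	uint32_t multiaddr;
	uint32_t ifaddr;
	uint32_t fmode;
	uint32_t numsrc;
	uint32_t slist[1];
};

/* Size computed by subtracting the built-in slot first. */
static size_t msfilter_size(size_t numsrc)
{
	return sizeof(struct msfilter_model) - sizeof(uint32_t)
		+ numsrc * sizeof(uint32_t);
}

/*
 * Size computed as "whole struct plus n elements", without the
 * overflow checking a real helper would add.
 */
static size_t naive_struct_size(size_t numsrc)
{
	return sizeof(struct msfilter_model) + numsrc * sizeof(uint32_t);
}
```

For any numsrc, the second helper returns exactly sizeof(uint32_t) more than the first, which is the "one too many items" the message refers to.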
2021-08-05net: fix GRO skb truesize updatePaolo Abeni1-1/+1
commit 5e10da5385d2 ("skbuff: allow 'slow_gro' for skb carring sock reference") introduces a serious regression at the GRO layer setting the wrong truesize for stolen-head skbs. Restore the correct truesize: SKB_DATA_ALIGN(...) instead of SKB_TRUESIZE(...) Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Fixes: 5e10da5385d2 ("skbuff: allow 'slow_gro' for skb carring sock reference") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05Merge branch 'eean-iosm-fixes'David S. Miller5-8/+8
M Chetan Kumar says: ==================== net: wwan: iosm: fixes This patch series contains IOSM Driver fixes. Below is the patch series breakdown. PATCH1: * Correct the td buffer type casting & format specifier to fix lkp buildbot warning. PATCH2: * Endianness type correction for nr_of_bytes. This field is exchanged as part of host-device protocol communication. PATCH3: * Correct ul/dl data protocol mask bit to identify which protocol capability the device implements. PATCH4: * Calling unregister_netdevice() inside wwan del link is trying to acquire the held lock in ndo_stop_cb(). Instead, queue net dev to be unregistered later. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net: wwan: iosm: fix recursive lock acquire in unregisterM Chetan Kumar1-1/+1
Calling unregister_netdevice() inside wwan del link is trying to acquire the held lock in ndo_stop_cb(). Instead, queue net dev to be unregistered later. Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net: wwan: iosm: correct data protocol mask bitM Chetan Kumar1-2/+2
Correct ul/dl data protocol mask bit to identify which protocol capability the device implements. Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net: wwan: iosm: endianness type correctionM Chetan Kumar2-3/+3
Endianness type correction for nr_of_bytes. This field is exchanged as part of host-device protocol communication. Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
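Why a fixed byte order matters for a field exchanged between host and device can be sketched as follows. The helpers are hypothetical and the choice of a 32-bit little-endian layout is an assumption for illustration; the message does not state the width or byte order of nr_of_bytes.

```c
#include <stdint.h>

/* Store a 32-bit value in little-endian order, whatever the host order. */
static void put_le32(uint8_t out[4], uint32_t value)
{
	out[0] = (uint8_t)(value & 0xff);
	out[1] = (uint8_t)((value >> 8) & 0xff);
	out[2] = (uint8_t)((value >> 16) & 0xff);
	out[3] = (uint8_t)((value >> 24) & 0xff);
}

/* Read a 32-bit little-endian value back into host order. */
static uint32_t get_le32(const uint8_t in[4])
{
	return (uint32_t)in[0] |
	       ((uint32_t)in[1] << 8) |
	       ((uint32_t)in[2] << 16) |
	       ((uint32_t)in[3] << 24);
}
```

Annotating such a field with an explicit endianness type lets static checkers flag any access that skips a conversion like these.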
2021-08-05net: wwan: iosm: fix lkp buildbot warningM Chetan Kumar1-2/+2
Correct td buffer type casting & format specifier to fix lkp buildbot warning. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05Merge branch 'ipa-runtime-pm'David S. Miller5-128/+171
Alex Elder says: ==================== net: ipa: more work toward runtime PM The first two patches in this series are basically bug fixes, but in practice I don't think we've seen the problems they might cause. The third patch moves clock and interconnect related error messages around a bit, reporting better information and doing so in the functions where they are enabled or disabled (rather than those functions' callers). The last three patches move power-related code into "ipa_clock.c", as a step toward generalizing the purpose of that source file. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net: ipa: move IPA flags fieldAlex Elder2-14/+14
The ipa->flags field is only ever used in "ipa_clock.c", related to suspend/resume activity. Move the definition of the ipa_flag enumerated type to "ipa_clock.c". And move the flags field from the ipa structure to the ipa_clock structure. Rename the type and its values to include "power" or "POWER" in the name. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net: ipa: move ipa_suspend_handler()Alex Elder3-31/+53
Move ipa_suspend_handler() into "ipa_clock.c" from "ipa_main.c", to group with the rest of the suspend/resume code. This IPA interrupt is triggered if an IPA RX endpoint is suspended but has a packet to be delivered. Introduce ipa_power_setup() and ipa_power_teardown() to add and remove the handler for the IPA SUSPEND interrupt at the same place as before, while allowing the handler to remain private. The "power" naming convention will be adopted elsewhere in this file as well (soon). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net: ipa: move IPA power operations to ipa_clock.cAlex Elder3-59/+65
Move ipa_suspend() and ipa_resume(), as well as the definition of the ipa_pm_ops structure into "ipa_clock.c". Make ipa_pm_ops public and declare it as extern in "ipa_clock.h". This is part of centralizing IPA power management functionality into "ipa_clock.c" (the file will eventually get a name change). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net: ipa: improve IPA clock error messagesAlex Elder1-17/+22
Rearrange messages reported when errors occur in the IPA clock code, so that the specific interconnect is identified when an error occurs enabling or disabling it, or the core clock is indicated when an error occurs enabling it. Have ipa_interconnect_disable() return zero or the negative error value returned by the first interconnect that produced an error when disabled. For now, the callers ignore the returned value. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
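The "disable everything, report each failure, return the first error" pattern might look like the sketch below. It is a stand-alone model, not the driver code: struct icc_model, its fail_with field and both function names are hypothetical, and the iteration order is an assumption.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of one interconnect. */
struct icc_model {
	const char *name;
	int fail_with;	/* 0, or a negative error the disable should return */
	int enabled;
};

/* Disable one interconnect; returns 0 or a negative error value. */
static int icc_disable_one(struct icc_model *icc)
{
	if (icc->fail_with)
		return icc->fail_with;
	icc->enabled = 0;
	return 0;
}

/*
 * Try to disable every interconnect even if one fails. Each failure
 * is reported with the interconnect's name; the first error seen is
 * the value returned to the caller.
 */
static int icc_disable_all(struct icc_model *iccs, size_t count)
{
	int result = 0;

	for (size_t i = 0; i < count; i++) {
		int ret = icc_disable_one(&iccs[i]);

		if (!ret)
			continue;
		fprintf(stderr, "error %d disabling %s interconnect\n",
			ret, iccs[i].name);
		if (!result)
			result = ret;
	}

	return result;
}
```

Continuing past a failure keeps the remaining interconnects from being left enabled, while the returned value still tells the caller that something went wrong.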
2021-08-05net: ipa: reorder netdev pointer assignmentsAlex Elder1-7/+9
Assign the ipa->modem_netdev and endpoint->netdev pointers *before* registering the network device. As soon as the device is registered it can be opened, and by that time we'll want those pointers valid. Similarly, don't make those pointers NULL until *after* the modem network device is unregistered in ipa_modem_stop(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net: ipa: don't suspend/resume modem if not upAlex Elder1-2/+10
The modem network device is set up by ipa_modem_start(). But its TX queue is not actually started and endpoints enabled until it is opened. So avoid stopping the modem network device TX queue and disabling endpoints on suspend or stop unless the netdev is marked UP. And skip attempting to resume unless it is UP. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05Merge branch 'sja1105-H'David S. Miller2-66/+215
Vladimir Oltean says: ==================== NXP SJA1105 driver support for "H" switch topologies Changes in v3: Preserve the behavior of dsa_tree_setup_default_cpu() which is to pick the first CPU port and not the last. Changes in v2: Send as non-RFC, drop the patches for discarding DSA-tagged packets on user ports and DSA-untagged packets on DSA and CPU ports for now. NXP builds boards like the Bluebox 3 where there are multiple SJA1110 switches connected to an LX2160A, but they are also connected to each other. I call this topology an "H" tree because of the lateral connection between switches. A piece extracted from a non-upstream device tree looks like this: &spi_bridge { /* SW1 */ ethernet-switch@0 { compatible = "nxp,sja1110a"; reg = <0>; dsa,member = <0 0>; ethernet-ports { #address-cells = <1>; #size-cells = <0>; /* SW1_P1 */ port@1 { reg = <1>; label = "con_2x20"; phy-mode = "sgmii"; fixed-link { speed = <1000>; full-duplex; }; }; port@2 { reg = <2>; ethernet = <&dpmac17>; phy-mode = "rgmii-id"; fixed-link { speed = <1000>; full-duplex; }; }; port@3 { reg = <3>; label = "1ge_p1"; phy-mode = "rgmii-id"; phy-handle = <&sw1_mii3_phy>; }; sw1p4: port@4 { reg = <4>; link = <&sw2p1>; phy-mode = "sgmii"; fixed-link { speed = <1000>; full-duplex; }; }; port@5 { reg = <5>; label = "trx1"; phy-mode = "internal"; phy-handle = <&sw1_port5_base_t1_phy>; }; port@6 { reg = <6>; label = "trx2"; phy-mode = "internal"; phy-handle = <&sw1_port6_base_t1_phy>; }; port@7 { reg = <7>; label = "trx3"; phy-mode = "internal"; phy-handle = <&sw1_port7_base_t1_phy>; }; port@8 { reg = <8>; label = "trx4"; phy-mode = "internal"; phy-handle = <&sw1_port8_base_t1_phy>; }; port@9 { reg = <9>; label = "trx5"; phy-mode = "internal"; phy-handle = <&sw1_port9_base_t1_phy>; }; port@a { reg = <10>; label = "trx6"; phy-mode = "internal"; phy-handle = <&sw1_port10_base_t1_phy>; }; }; }; /* SW2 */ ethernet-switch@2 { compatible = "nxp,sja1110a"; reg = <2>; dsa,member = <0 1>; ethernet-ports { 
#address-cells = <1>; #size-cells = <0>; sw2p1: port@1 { reg = <1>; link = <&sw1p4>; phy-mode = "sgmii"; fixed-link { speed = <1000>; full-duplex; }; }; port@2 { reg = <2>; ethernet = <&dpmac18>; phy-mode = "rgmii-id"; fixed-link { speed = <1000>; full-duplex; }; }; port@3 { reg = <3>; label = "1ge_p2"; phy-mode = "rgmii-id"; phy-handle = <&sw2_mii3_phy>; }; port@4 { reg = <4>; label = "to_sw3"; phy-mode = "2500base-x"; fixed-link { speed = <2500>; full-duplex; }; }; port@5 { reg = <5>; label = "trx7"; phy-mode = "internal"; phy-handle = <&sw2_port5_base_t1_phy>; }; port@6 { reg = <6>; label = "trx8"; phy-mode = "internal"; phy-handle = <&sw2_port6_base_t1_phy>; }; port@7 { reg = <7>; label = "trx9"; phy-mode = "internal"; phy-handle = <&sw2_port7_base_t1_phy>; }; port@8 { reg = <8>; label = "trx10"; phy-mode = "internal"; phy-handle = <&sw2_port8_base_t1_phy>; }; port@9 { reg = <9>; label = "trx11"; phy-mode = "internal"; phy-handle = <&sw2_port9_base_t1_phy>; }; port@a { reg = <10>; label = "trx12"; phy-mode = "internal"; phy-handle = <&sw2_port10_base_t1_phy>; }; }; }; }; Basically it is a single DSA tree with 2 "ethernet" properties, i.e. a multi-CPU-port system. There is also a DSA link between the switches, but it is not a daisy chain topology, i.e. there is no "upstream" and "downstream" switch, the DSA link is only to be used for the bridge data plane (autonomous forwarding between switches, between the RJ-45 ports and the automotive Ethernet ports), otherwise all traffic that should reach the host should do so through the dedicated CPU port of the switch. Of course, plain forwarding in this topology is bound to create packet loops. I have thought long and hard about strategies to cut forwarding in such a way as to prevent loops but also not impede normal operation of the network on such a system, and I believe I have found a solution that does work as expected. 
This relies heavily on DSA's recent ability to perform RX filtering towards the host by installing MAC addresses as static FDB entries. Since we have 2 distinct DSA masters, we have 2 distinct MAC addresses, and if the bridge is configured to have its own MAC address that makes it 3 distinct MAC addresses. The bridge core, plus the switchdev_handle_fdb_add_to_device() extension, handle each MAC address by replicating it to each port of the DSA switch tree. So the end result is that both switch 1 and switch 2 will have static FDB entries towards their respective CPU ports for the 3 MAC addresses corresponding to the DSA masters and to the bridge net device (and of course, towards any station learned on a foreign interface). So I think the basic design works, and it is basically just as fragile as any other multi-CPU-port system is bound to be in terms of reliance on static FDB entries towards the host (if hardware address learning on the CPU port is to be used, MAC addresses would randomly bounce between one CPU port and the other otherwise). In fact, I think it is even better to start DSA's support of multi-CPU-port systems with something small like the NXP Bluebox 3, because we allow some time for the code paths like dsa_switch_host_address_match(), which were specifically designed for it, to break in, and this board needs no user space configuration of CPU ports, like static assignments between user and CPU ports, or bonding between the CPU ports/DSA masters. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net: dsa: sja1105: enable address learning on cascade portsVladimir Oltean1-2/+7
Right now, address learning is disabled on DSA ports, which means that a packet received over a DSA port from a cross-chip switch will be flooded to unrelated ports. It is desirable to eliminate that, but for that we need a breakdown of the possibilities for the sja1105 driver. A DSA port can be: - a downstream-facing cascade port. This is simple because it will always receive packets from a downstream switch, and there should be no other route to reach that downstream switch in the first place, which means it should be safe to learn that MAC address towards that switch. - an upstream-facing cascade port. This receives packets either: * autonomously forwarded by an upstream switch (and therefore these packets belong to the data plane of a bridge, so address learning should be ok), or * injected from the CPU. This deserves further discussion, as normally, an upstream-facing cascade port is no different than the CPU port itself. But with "H" topologies (a DSA link towards a switch that has its own CPU port), these are more "laterally-facing" cascade ports than they are "upstream-facing". Here, there is a risk that the port might learn the host addresses on the wrong port (on the DSA port instead of on its own CPU port), but this is solved by DSA's RX filtering infrastructure, which installs the host addresses as static FDB entries on the CPU port of all switches in a "H" tree. So even if there will be an attempt from the switch to migrate the FDB entry from the CPU port to the laterally-facing cascade port, it will fail to do that, because the FDB entry that already exists is static and cannot migrate. So address learning should be safe for this configuration too. Ok, so what about other MAC addresses coming from the host, not necessarily the bridge local FDB entries? 
What about MAC addresses dynamically learned on foreign interfaces, isn't there a risk that cascade ports will learn these entries dynamically when they are supposed to be delivered towards the CPU port? Well, that is correct, and this is why we also need to enable the assisted learning feature, to snoop for these addresses and write them to hardware as static FDB entries towards the CPU, to make the switch's learning process on the cascade ports ineffective for them. With assisted learning enabled, the hardware learning on the CPU port must be disabled. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
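The argument that a static entry "cannot migrate" can be modelled with a toy FDB. This is a sketch under stated assumptions and not how the hardware or the driver stores entries: the table layout, slot count, and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN	6
#define FDB_SLOTS	8

struct fdb_entry {
	uint8_t mac[ETH_ALEN];
	int port;
	bool is_static;
	bool in_use;
};

struct fdb_model {
	struct fdb_entry slot[FDB_SLOTS];
};

static struct fdb_entry *fdb_find(struct fdb_model *fdb,
				  const uint8_t mac[ETH_ALEN])
{
	for (int i = 0; i < FDB_SLOTS; i++)
		if (fdb->slot[i].in_use &&
		    !memcmp(fdb->slot[i].mac, mac, ETH_ALEN))
			return &fdb->slot[i];
	return NULL;
}

static struct fdb_entry *fdb_alloc(struct fdb_model *fdb,
				   const uint8_t mac[ETH_ALEN])
{
	for (int i = 0; i < FDB_SLOTS; i++) {
		if (fdb->slot[i].in_use)
			continue;
		fdb->slot[i].in_use = true;
		memcpy(fdb->slot[i].mac, mac, ETH_ALEN);
		return &fdb->slot[i];
	}
	return NULL;	/* table full */
}

/* Install a static entry, e.g. a host address towards the CPU port. */
static bool fdb_add_static(struct fdb_model *fdb,
			   const uint8_t mac[ETH_ALEN], int port)
{
	struct fdb_entry *e = fdb_find(fdb, mac);

	if (!e)
		e = fdb_alloc(fdb, mac);
	if (!e)
		return false;
	e->port = port;
	e->is_static = true;
	return true;
}

/* Dynamic learning: creates or migrates entries, but never static ones. */
static void fdb_learn(struct fdb_model *fdb,
		      const uint8_t mac[ETH_ALEN], int port)
{
	struct fdb_entry *e = fdb_find(fdb, mac);

	if (e && e->is_static)
		return;
	if (!e)
		e = fdb_alloc(fdb, mac);
	if (!e)
		return;
	e->port = port;
	e->is_static = false;
}

/* Returns the destination port, or -1 when the frame would be flooded. */
static int fdb_lookup(struct fdb_model *fdb, const uint8_t mac[ETH_ALEN])
{
	struct fdb_entry *e = fdb_find(fdb, mac);

	return e ? e->port : -1;
}
```

In this model, a host address installed statically on the CPU port stays there even when the same source address is later seen on a cascade port, whereas an ordinary station address follows the port it was last seen on.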
2021-08-05net: dsa: sja1105: suppress TX packets from looping back in "H" topologiesVladimir Oltean1-0/+29
H topologies like this one have a problem:

         eth0                                                     eth1
          |                                                        |
       CPU port                                                CPU port
          |                        DSA link                        |
 sw0p0  sw0p1  sw0p2  sw0p3  sw0p4 -------- sw1p4  sw1p3  sw1p2  sw1p1  sw1p0
   |             |      |                            |      |             |
 user          user   user                         user   user          user
 port          port   port                         port   port          port

Basically any packet sent by the eth0 DSA master can be flooded on the interconnecting DSA link sw0p4 <-> sw1p4 and it will be received by the eth1 DSA master too. Basically we are talking to ourselves. In VLAN-unaware mode, these packets are encoded using a tag_8021q TX VLAN, which dsa_8021q_rcv() rightfully cannot decode and complains. Whereas in VLAN-aware mode, the packets are encoded with a bridge VLAN which _can_ be decoded by the tagger running on eth1, so it will attempt to reinject that packet into the network stack (the bridge, if there is any port under eth1 that is under a bridge). In the case where the ports under eth1 are under the same cross-chip bridge as the ports under eth0, the TX packets will even be learned as RX packets. The only thing that will prevent loops with the software bridging path, and therefore disaster, is that the source port and the destination port are in the same hardware domain, and the bridge will receive packets from the driver with skb->offload_fwd_mark = true and will not forward between the two. The proper solution to this problem is to detect H topologies and enforce that all packets are received through the local switch and we do not attempt to receive packets on our CPU port from switches that have their own. This is a viable solution which works thanks to the fact that MAC addresses which should be filtered towards the host are installed by DSA as static MAC addresses towards the CPU port of each switch. TX from a CPU port towards the DSA port continues to be allowed, this is because sja1105 supports bridge TX forwarding offload, and the skb->dev used initially for xmit does not have any direct correlation with where the station that will respond to that packet is connected.
It may very well happen that when we send a ping through a br0 interface that spans all switch ports, the xmit packet will exit the system through a DSA switch interface under eth1 (say sw1p2), but the destination station is connected to a switch port under eth0, like sw0p0. So the switch under eth1 needs to communicate on TX with the switch under eth0. The response, however, will not follow the same path, but instead, this patch enforces that the response is sent by the first switch directly to its DSA master which is eth0. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
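The forwarding rule described above can be written down as a small predicate. This is only a sketch of the policy as stated in the message, assuming a switch in an "H" tree that has its own CPU port; the enum and the function name are hypothetical and do not reflect how the driver programs its forwarding tables.

```c
#include <stdbool.h>

enum port_role { PORT_USER, PORT_CPU, PORT_DSA };

/*
 * For one switch of an "H" tree that has its own CPU port: may a frame
 * that ingressed on a port with role `from` egress on a port with
 * role `to`?
 */
static bool h_topology_allows(enum port_role from, enum port_role to)
{
	/*
	 * Frames arriving over the lateral DSA link must not reach our
	 * CPU port: the other switch delivers to its own DSA master.
	 */
	if (from == PORT_DSA && to == PORT_CPU)
		return false;

	/*
	 * Everything else stays allowed, including CPU -> DSA, which is
	 * needed for TX towards stations behind the other switch.
	 */
	return true;
}
```

The asymmetry is the point: TX from the CPU port across the DSA link is kept, while RX from the DSA link to the CPU port is cut, so a reply reaches the host only through the switch where the station is attached.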