summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-02-17net: allow out-of-order netdev unregistrationJakub Kicinski1-27/+37
Sprinkle for each loops to allow netdevices to be unregistered out of order, as their refs are released. This prevents problems caused by dependencies between netdevs which want to release references in their ->priv_destructor. See commit d6ff94afd90b ("vlan: move dev_put into vlan_dev_uninit") for example. Eric has removed the only known ordering requirement in commit c002496babfd ("Merge branch 'ipv6-loopback'") so let's try this and see if anything explodes... Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20220215225310.3679266-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17net: transition netdev reg state earlier in run_todoJakub Kicinski1-9/+9
In prep for unregistering netdevs out of order move the netdev state validation and change outside of the loop. While at it modernize this code and use WARN() instead of pr_err() + dump_stack(). Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20220215225310.3679266-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17ucounts: Handle wrapping in is_ucounts_overlimitEric W. Biederman1-1/+2
While examining is_ucounts_overlimit and reading the various messages I realized that is_ucounts_overlimit fails to deal with counts that may have wrapped. Being wrapped should be a transitory state for counts and they should never be wrapped for long, but it can happen so handle it. Cc: stable@vger.kernel.org Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts") Link: https://lkml.kernel.org/r/20220216155832.680775-5-ebiederm@xmission.com Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-02-17ucounts: Move RLIMIT_NPROC handling after set_userEric W. Biederman1-5/+14
During set*id() which cred->ucounts to charge the the current process to is not known until after set_cred_ucounts. So move the RLIMIT_NPROC checking into a new helper flag_nproc_exceeded and call flag_nproc_exceeded after set_cred_ucounts. This is very much an arbitrary subset of the places where we currently change the RLIMIT_NPROC accounting, designed to preserve the existing logic. Fixing the existing logic will be the subject of another series of changes. Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220216155832.680775-4-ebiederm@xmission.com Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-02-17ucounts: Base set_cred_ucounts changes on the real userEric W. Biederman1-7/+2
Michal Koutný <mkoutny@suse.com> wrote: > Tasks are associated to multiple users at once. Historically and as per > setrlimit(2) RLIMIT_NPROC is enforce based on real user ID. > > The commit 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts") > made the accounting structure "indexed" by euid and hence potentially > account tasks differently. > > The effective user ID may be different e.g. for setuid programs but > those are exec'd into already existing task (i.e. below limit), so > different accounting is moot. > > Some special setresuid(2) users may notice the difference, justifying > this fix. I looked at cred->ucount and it is only used for rlimit operations that were previously stored in cred->user. Making the fact cred->ucount can refer to a different user from cred->user a bug, affecting all uses of cred->ulimit not just RLIMIT_NPROC. Fix set_cred_ucounts to always use the real uid not the effective uid. Further simplify set_cred_ucounts by noticing that set_cred_ucounts somehow retained a draft version of the check to see if alloc_ucounts was needed that checks the new->user and new->user_ns against the current_real_cred(). Remove that draft version of the check. All that matters for setting the cred->ucounts are the user_ns and uid fields in the cred. Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220207121800.5079-4-mkoutny@suse.com Link: https://lkml.kernel.org/r/20220216155832.680775-3-ebiederm@xmission.com Reported-by: Michal Koutný <mkoutny@suse.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-02-17ucounts: Enforce RLIMIT_NPROC not RLIMIT_NPROC+1Eric W. Biederman1-5/+5
Michal Koutný <mkoutny@suse.com> wrote: > It was reported that v5.14 behaves differently when enforcing > RLIMIT_NPROC limit, namely, it allows one more task than previously. > This is consequence of the commit 21d1c5e386bc ("Reimplement > RLIMIT_NPROC on top of ucounts") that missed the sharpness of > equality in the forking path. This can be fixed either by fixing the test or by moving the increment to be before the test. Fix it my moving copy_creds which contains the increment before is_ucounts_overlimit. In the case of CLONE_NEWUSER the ucounts in the task_cred changes. The function is_ucounts_overlimit needs to use the final version of the ucounts for the new process. Which means moving the is_ucounts_overlimit test after copy_creds is necessary. Both the test in fork and the test in set_user were semantically changed when the code moved to ucounts. The change of the test in fork was bad because it was before the increment. The test in set_user was wrong and the change to ucounts fixed it. So this fix only restores the old behavior in one lcation not two. Link: https://lkml.kernel.org/r/20220204181144.24462-1-mkoutny@suse.com Link: https://lkml.kernel.org/r/20220216155832.680775-2-ebiederm@xmission.com Cc: stable@vger.kernel.org Reported-by: Michal Koutný <mkoutny@suse.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-02-17libbpf: Fix memleak in libbpf_netlink_recv()Andrii Nakryiko1-3/+5
Ensure that libbpf_netlink_recv() frees dynamically allocated buffer in all code paths. Fixes: 9c3de619e13e ("libbpf: Use dynamically allocated buffer when receiving netlink messages") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20220217073958.276959-1-andrii@kernel.org
2022-02-17rlimit: Fix RLIMIT_NPROC enforcement failure caused by capability calls in ↵Eric W. Biederman1-2/+1
set_user Solar Designer <solar@openwall.com> wrote: > I'm not aware of anyone actually running into this issue and reporting > it. The systems that I personally know use suexec along with rlimits > still run older/distro kernels, so would not yet be affected. > > So my mention was based on my understanding of how suexec works, and > code review. Specifically, Apache httpd has the setting RLimitNPROC, > which makes it set RLIMIT_NPROC: > > https://httpd.apache.org/docs/2.4/mod/core.html#rlimitnproc > > The above documentation for it includes: > > "This applies to processes forked from Apache httpd children servicing > requests, not the Apache httpd children themselves. This includes CGI > scripts and SSI exec commands, but not any processes forked from the > Apache httpd parent, such as piped logs." > > In code, there are: > > ./modules/generators/mod_cgid.c: ( (cgid_req.limits.limit_nproc_set) && ((rc = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC, > ./modules/generators/mod_cgi.c: ((rc = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC, > ./modules/filters/mod_ext_filter.c: rv = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC, conf->limit_nproc); > > For example, in mod_cgi.c this is in run_cgi_child(). > > I think this means an httpd child sets RLIMIT_NPROC shortly before it > execs suexec, which is a SUID root program. suexec then switches to the > target user and execs the CGI script. > > Before 2863643fb8b9, the setuid() in suexec would set the flag, and the > target user's process count would be checked against RLIMIT_NPROC on > execve(). After 2863643fb8b9, the setuid() in suexec wouldn't set the > flag because setuid() is (naturally) called when the process is still > running as root (thus, has those limits bypass capabilities), and > accordingly execve() would not check the target user's process count > against RLIMIT_NPROC. In commit 2863643fb8b9 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds") capable calls were added to set_user to make it more consistent with fork. Unfortunately because of call site differences those capable calls were checking the credentials of the user before set*id() instead of after set*id(). This breaks enforcement of RLIMIT_NPROC for applications that set the rlimit and then call set*id() while holding a full set of capabilities. The capabilities are only changed in the new credential in security_task_fix_setuid(). The code in apache suexec appears to follow this pattern. Commit 909cc4ae86f3 ("[PATCH] Fix two bugs with process limits (RLIMIT_NPROC)") where this check was added describes the targes of this capability check as: 2/ When a root-owned process (e.g. cgiwrap) sets up process limits and then calls setuid, the setuid should fail if the user would then be running more than rlim_cur[RLIMIT_NPROC] processes, but it doesn't. This patch adds an appropriate test. With this patch, and per-user process limit imposed in cgiwrap really works. So the original use case of this check also appears to match the broken pattern. Restore the enforcement of RLIMIT_NPROC by removing the bad capable checks added in set_user. This unfortunately restores the inconsistent state the code has been in for the last 11 years, but dealing with the inconsistencies looks like a larger problem. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20210907213042.GA22626@openwall.com/ Link: https://lkml.kernel.org/r/20220212221412.GA29214@openwall.com Link: https://lkml.kernel.org/r/20220216155832.680775-1-ebiederm@xmission.com Fixes: 2863643fb8b9 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds") History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Reviewed-by: Solar Designer <solar@openwall.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-02-17ping: fix the dif and sdif check in ping_lookupXin Long1-2/+9
When 'ping' changes to use PING socket instead of RAW socket by: # sysctl -w net.ipv4.ping_group_range="0 100" There is another regression caused when matching sk_bound_dev_if and dif, RAW socket is using inet_iif() while PING socket lookup is using skb->dev->ifindex, the cmd below fails due to this: # ip link add dummy0 type dummy # ip link set dummy0 up # ip addr add 192.168.111.1/24 dev dummy0 # ping -I dummy0 192.168.111.1 -c1 The issue was also reported on: https://github.com/iputils/iputils/issues/104 But fixed in iputils in a wrong way by not binding to device when destination IP is on device, and it will cause some of kselftests to fail, as Jianlin noticed. This patch is to use inet(6)_iif and inet(6)_sdif to get dif and sdif for PING socket, and keep consistent with RAW socket. Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17block/wbt: fix negative inflight counter when remove scsi deviceLaibin Qiu2-2/+2
Now that we disable wbt by set WBT_STATE_OFF_DEFAULT in wbt_disable_default() when switch elevator to bfq. And when we remove scsi device, wbt will be enabled by wbt_enable_default. If it become false positive between wbt_wait() and wbt_track() when submit write request. The following is the scenario that triggered the problem. T1 T2 T3 elevator_switch_mq bfq_init_queue wbt_disable_default <= Set rwb->enable_state (OFF) Submit_bio blk_mq_make_request rq_qos_throttle <= rwb->enable_state (OFF) scsi_remove_device sd_remove del_gendisk blk_unregister_queue elv_unregister_queue wbt_enable_default <= Set rwb->enable_state (ON) q_qos_track <= rwb->enable_state (ON) ^^^^^^ this request will mark WBT_TRACKED without inflight add and will lead to drop rqw->inflight to -1 in wbt_done() which will trigger IO hung. Fix this by move wbt_enable_default() from elv_unregister to bfq_exit_queue(). Only re-enable wbt when bfq exit. Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly") Remove oneline stale comment, and kill one oneshot local variable. Signed-off-by: Ming Lei <ming.lei@rehdat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/linux-block/20211214133103.551813-1-qiulaibin@huawei.com/ Signed-off-by: Laibin Qiu <qiulaibin@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-17block: fix surprise removal for drivers calling blk_set_queue_dyingChristoph Hellwig9-15/+24
Various block drivers call blk_set_queue_dying to mark a disk as dead due to surprise removal events, but since commit 8e141f9eb803 that doesn't work given that the GD_DEAD flag needs to be set to stop I/O. Replace the driver calls to blk_set_queue_dying with a new (and properly documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the only remaining caller. Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk") Reported-by: Markus Blöchl <markus.bloechl@ipetronik.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-17block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kernHaimin Zhang1-1/+1
Add __GFP_ZERO flag for alloc_page in function bio_copy_kern to initialize the buffer of a bio. Signed-off-by: Haimin Zhang <tcs.kernel@gmail.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220216084038.15635-1-tcs.kernel@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-17net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990Daniele Palmas1-0/+5
Add quirk CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE for Telit FN990 0x1071 composition in order to avoid bind error. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17Merge branch 'ping6-SOL_IPV6'David S. Miller4-28/+320
Jakub Kicinski says: ==================== net: ping6: support setting basic SOL_IPV6 options via cmsg Support for IPV6_HOPLIMIT, IPV6_TCLASS, IPV6_DONTFRAG on ICMPv6 sockets and associated tests. I have no immediate plans to implement IPV6_FLOWINFO and all the extension header stuff. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17selftests: net: basic test for IPV6_2292*Jakub Kicinski2-1/+41
Add a basic test to make sure ping sockets don't crash with IPV6_2292* options. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17selftests: net: test IPV6_HOPLIMITJakub Kicinski2-1/+49
Test setting IPV6_HOPLIMIT via setsockopt and cmsg across socket types. Output without the kernel support (this series): Case HOPLIMIT ICMP cmsg - packet data returned 1, expected 0 Case HOPLIMIT ICMP diff - packet data returned 1, expected 0 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17selftests: net: test IPV6_TCLASSJakub Kicinski2-1/+69
Test setting IPV6_TCLASS via setsockopt and cmsg across socket types. Output without the kernel support (this series): Case TCLASS ICMP cmsg - packet data returned 1, expected 0 Case TCLASS ICMP cmsg - rejection returned 0, expected 1 Case TCLASS ICMP diff - pass returned 1, expected 0 Case TCLASS ICMP diff - packet data returned 1, expected 0 Case TCLASS ICMP diff - rejection returned 0, expected 1 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17selftests: net: test IPV6_DONTFRAGJakub Kicinski2-23/+147
Test setting IPV6_DONTFRAG via setsockopt and cmsg across socket types. Output without the kernel support (this series): Case DONTFRAG ICMP setsock returned 0, expected 1 Case DONTFRAG ICMP cmsg returned 0, expected 1 Case DONTFRAG ICMP both returned 0, expected 1 Case DONTFRAG ICMP diff returned 0, expected 1 FAIL - 4/24 cases failed Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: ping6: support setting basic SOL_IPV6 options via cmsgJakub Kicinski2-5/+17
Support setting IPV6_HOPLIMIT, IPV6_TCLASS, IPV6_DONTFRAG during sendmsg via SOL_IPV6 cmsgs. tclass and dontfrag are init'ed from struct ipv6_pinfo in ipcm6_init_sk(), while hlimit is inited to -1, so we need to handle it being populated via cmsg explicitly. Leave extension headers and flowlabel unimplemented. Those are slightly more laborious to test and users seem to primarily care about IPV6_TCLASS. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17Merge branch 'switchdev-BRENTRY'David S. Miller6-31/+6
Vladimir Oltean says: ==================== kRemove BRENTRY checks from switchdev drivers As discussed here: https://patchwork.kernel.org/project/netdevbpf/patch/20220214233111.1586715-2-vladimir.oltean@nxp.com/#24738869 no switchdev driver makes use of VLAN port objects that lack the BRIDGE_VLAN_INFO_BRENTRY flag. Notifying them in the first place rather seems like an omission of commit 9c86ce2c1ae3 ("net: bridge: Notify about bridge VLANs"). Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flag") that was just merged, the bridge no longer notifies switchdev upon creation of these VLANs, so we can remove the checks from drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: ti: cpsw: remove guards against !BRIDGE_VLAN_INFO_BRENTRYVladimir Oltean1-4/+0
Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flag"), the bridge no longer emits switchdev notifiers for VLANs that don't have the BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code. Remove them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: ti: am65-cpsw-nuss: remove guards against !BRIDGE_VLAN_INFO_BRENTRYVladimir Oltean1-4/+0
Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flag"), the bridge no longer emits switchdev notifiers for VLANs that don't have the BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code. Remove them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: sparx5: remove guards against !BRIDGE_VLAN_INFO_BRENTRYVladimir Oltean1-6/+4
Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flag"), the bridge no longer emits switchdev notifiers for VLANs that don't have the BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code. Remove them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: lan966x: remove guards against !BRIDGE_VLAN_INFO_BRENTRYVladimir Oltean1-12/+0
Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flag"), the bridge no longer emits switchdev notifiers for VLANs that don't have the BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code. Remove them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17mlxsw: spectrum: remove guards against !BRIDGE_VLAN_INFO_BRENTRYVladimir Oltean2-5/+2
Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flag"), the bridge no longer emits switchdev notifiers for VLANs that don't have the BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code. Remove them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17Merge branch 'ptp-over-udp-dsa'David S. Miller9-220/+194
Vladimir Oltean says: ==================== Support PTP over UDP with the ocelot-8021q DSA tagging protocol The alternative tag_8021q-based tagger for Ocelot switches, added here: https://patchwork.kernel.org/project/netdevbpf/cover/20210129010009.3959398-1-olteanv@gmail.com/ gained support for PTP over L2 here: https://patchwork.kernel.org/project/netdevbpf/cover/20210213223801.1334216-1-olteanv@gmail.com/ mostly as a minimum viable requirement. That PTP support was mostly self-contained code that installed some rules to replicate PTP packets on the CPU queue, in felix_setup_mmio_filtering(). However ocelot-8021q starts to look more interesting for general purpose usage, so it is now time to reduce the technical debt by integrating the PTP traps used by Felix for tag_8021q with the rest of the Ocelot driver. There is further consolidation of traps to be done. The cookies used by MRP traps overlap with the cookies used for tag_8021q PTP traps, so those features could not be used at the same time. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: dsa: tag_ocelot_8021q: calculate TX checksum in software for deferred ↵Vladimir Oltean1-0/+7
packets DSA inherits NETIF_F_CSUM_MASK from master->vlan_features, and the expectation is that TX checksumming is offloaded and not done in software. Normally the DSA master takes care of this, but packets handled by ocelot_defer_xmit() are a very special exception, because they are actually injected into the switch through register-based MMIO. So the DSA master is not involved at all for these packets => no one calculates the checksum. This allows PTP over UDP to work using the ocelot-8021q tagging protocol. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: dsa: felix: update destinations of existing traps with ocelot-8021qVladimir Oltean2-114/+77
Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: dsa: felix: remove dead code in felix_setup_mmio_filtering()Vladimir Oltean1-5/+3
There has been some controversy related to the sanity check that a CPU port exists, and commit e8b1d7698038 ("net: dsa: felix: Fix memory leak in felix_setup_mmio_filtering") even "corrected" an apparent memory leak as static analysis tools see it. However, the check is completely dead code, since the earliest point at which felix_setup_mmio_filtering() can be called is: felix_pci_probe -> dsa_register_switch -> dsa_switch_probe -> dsa_tree_setup -> dsa_tree_setup_cpu_ports -> dsa_tree_setup_default_cpu -> contains the "DSA: tree %d has no CPU port\n" check -> dsa_tree_setup_master -> dsa_master_setup -> sysfs_create_group(&dev->dev.kobj, &dsa_group); -> makes tagging_store() callable -> dsa_tree_change_tag_proto -> dsa_tree_notify -> dsa_switch_event -> dsa_switch_change_tag_proto -> ds->ops->change_tag_protocol -> felix_change_tag_protocol -> felix_set_tag_protocol -> felix_setup_tag_8021q -> felix_setup_mmio_filtering -> breaks at first CPU port So probing would have failed earlier if there wasn't any CPU port defined. To avoid all confusion, delete the dead code and replace it with a comment. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: mscc: ocelot: annotate which traps need PTP timestampingVladimir Oltean4-8/+12
The ocelot switch library does not need this information, but the felix DSA driver does. As a reminder, the VSC9959 switch in LS1028A doesn't have an IRQ line for packet extraction, so to be notified that a PTP packet needs to be dequeued, it receives that packet also over Ethernet, by setting up a packet trap. The Felix driver needs to install special kinds of traps for packets in need of RX timestamps, such that the packets are replicated both over Ethernet and over the CPU port module. But the Ocelot switch library sets up more than one trap for PTP event messages; it also traps PTP general messages, MRP control messages etc. Those packets don't need PTP timestamps, so there's no reason for the Felix driver to send them to the CPU port module. By knowing which traps need PTP timestamps, the Felix driver can adjust the traps installed using ocelot_trap_add() such that only those will actually get delivered to the CPU port module. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: mscc: ocelot: keep traps in a listVladimir Oltean5-2/+14
When using the ocelot-8021q tagging protocol, the CPU port isn't configured as an NPI port, but is a regular port. So a "trap to CPU" operation is actually a "redirect" operation. So DSA needs to set up the trapping action one way or another, depending on the tagging protocol in use. To ease DSA's work of modifying the action, keep all currently installed traps in a list, so that DSA can live-patch them when the tagging protocol changes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: dsa: felix: use DSA port iteration helpersVladimir Oltean1-48/+27
Use the helpers that avoid the quadratic complexity associated with calling dsa_to_port() indirectly: dsa_is_unused_port(), dsa_is_cpu_port(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: mscc: ocelot: avoid overlap in VCAP IS2 between PTP and MRP trapsVladimir Oltean1-8/+8
OCELOT_VCAP_IS2_TAG_8021Q_TXVLAN overlaps with OCELOT_VCAP_IS2_MRP_REDIRECT. To avoid this, make OCELOT_VCAP_IS2_MRP_REDIRECT take the cookie region from N to 2 * N - 1 (where N is ocelot->num_phys_ports). To avoid any risk that the singleton (not per port) VCAP IS2 filters overlap with per-port VCAP IS2 filters, we must ensure that the number of singleton filters is smaller than the number of physical ports. This is true right now, but may change in the future as switches with less ports get supported, or more singleton filters get added. So to be future-proof, let's move the singleton filters at the end of the range, where they won't overlap with anything to their right. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: mscc: ocelot: use a single VCAP filter for all MRP trapsVladimir Oltean4-31/+30
The MRP assist code installs a VCAP IS2 trapping rule for each port, but since the key and the action is the same, just the ingress port mask differs, there isn't any need to do this. We can save some space in the TCAM by using a single filter and adjusting the ingress port mask. Reuse the ocelot_trap_add() and ocelot_trap_del() functions for this purpose. Now that the cookies are no longer per port, we need to change the allocation scheme such that MRP traps use a fixed number. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: mscc: ocelot: delete OCELOT_MRP_CPUQVladimir Oltean2-3/+0
MRP frames are configured to be trapped to the CPU queue 7, and this number is reflected in the extraction header. However, the information isn't used anywhere, so just leave MRP frames to go to CPU queue 0 unless needed otherwise. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: mscc: ocelot: consolidate cookie allocation for private VCAP rulesVladimir Oltean4-26/+40
Every use case that needed VCAP filters (in order: DSA tag_8021q, MRP, PTP traps) has hardcoded filter identifiers that worked well enough for that use case alone. But when two or more of those use cases would be used together, some of those identifiers would overlap, leading to breakage. Add definitions for each cookie and centralize them in ocelot_vcap.h, such that the overlaps are more obvious. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: mscc: ocelot: use a consistent cookie for MRP trapsVladimir Oltean1-1/+2
The driver uses an identifier equal to (ocelot->num_phys_ports + port) for MRP traps installed when the system is in the role of an MRC, and an identifier equal to (port) otherwise. Use the same identifier in both cases as a consolidation for the various cookie values spread throughout the driver. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17Merge tag 'mlx5-updates-2022-02-16' of ↵David S. Miller31-244/+714
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2022-02-16 Misc updates for mlx5: 1) Alex Liu Adds support for using xdp->data_meta 2) Aya Levin Adds PTP counters and port time stamp mode for representors and switchdev mode. 3) Tariq Toukan, Striding RQ simple improvements. 4) Roi Dayan (7): Create multiple attr instances per flow Some TC actions use post actions for their implementation. For example CT and sample actions. Create a new flow attr instance after each multi table action and create a post action rule for it as a generic parsing step. Now multi table actions like CT, sample don't require to do it. When flow has multiple attr instances, the first flow attr is being offloaded normally and linked to the next attr (post action rule) by setting an id on reg_c for matching. Post action rule (rule created from second attr instance) match the id on reg_c and does rest of the actions. Example rule with actions CT,goto will be created with 2 attr instances as following: attr1(CT)->attr2(goto) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17perf bpf: Defer freeing string after possible strlen() on itArnaldo Carvalho de Melo1-1/+2
This was detected by the gcc in Fedora Rawhide's gcc: 50 11.01 fedora:rawhide : FAIL gcc version 12.0.1 20220205 (Red Hat 12.0.1-0) (GCC) inlined from 'bpf__config_obj' at util/bpf-loader.c:1242:9: util/bpf-loader.c:1225:34: error: pointer 'map_opt' may be used after 'free' [-Werror=use-after-free] 1225 | *key_scan_pos += strlen(map_opt); | ^~~~~~~~~~~~~~~ util/bpf-loader.c:1223:9: note: call to 'free' here 1223 | free(map_name); | ^~~~~~~~~~~~~~ cc1: all warnings being treated as errors So do the calculations on the pointer before freeing it. Fixes: 04f9bf2bac72480c ("perf bpf-loader: Add missing '*' for key_scan_pos") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Link: https://lore.kernel.org/lkml/Yg1VtQxKrPpS3uNA@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-17Merge tag 'amd-drm-fixes-5.17-2022-02-16' of ↵Dave Airlie5-13/+41
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.17-2022-02-16: amdgpu: - Stable pstate clock fixes for Dimgrey Cavefish and Beige Goby - S0ix SDMA fix - Yellow Carp GPU reset fix radeon: - Backlight fix for iMac 12,1 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220217035242.8084-1-alexander.deucher@amd.com
2022-02-17ASoC: intel: skylake: Set max DMA segment sizeTakashi Iwai1-0/+1
The recent code refactoring to use the standard DMA helper requires the max DMA segment size setup for SG list management. Without it, the kernel may spew warnings when a large buffer is allocated. This patch sets up dma_set_max_seg_size() for avoiding spurious warnings. Fixes: 2c95b92ecd92 ("ALSA: memalloc: Unify x86 SG-buffer handling (take#3)") Acked-by: Cezary Rojewski <cezary.rojewski@intel.com> Acked-by: Mark Brown <broonie@kernel.org> Cc: <stable@vger.kernel.org> BugLink: https://github.com/thesofproject/linux/issues/3430 Link: https://lore.kernel.org/r/20220215132756.31236-4-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-02-17ASoC: SOF: hda: Set max DMA segment sizeTakashi Iwai1-0/+1
The recent code refactoring to use the standard DMA helper requires the max DMA segment size setup for SG list management. Without it, the kernel may spew warnings when a large buffer is allocated. This patch sets up dma_set_max_seg_size() for avoiding spurious warnings. Fixes: 2c95b92ecd92 ("ALSA: memalloc: Unify x86 SG-buffer handling (take#3)") Acked-by: Mark Brown <broonie@kernel.org> Cc: <stable@vger.kernel.org> BugLink: https://github.com/thesofproject/linux/issues/3430 Link: https://lore.kernel.org/r/20220215132756.31236-3-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-02-17ALSA: hda: Set max DMA segment sizeTakashi Iwai1-0/+1
The recent code refactoring to use the standard DMA helper requires the max DMA segment size setup for SG list management. Without it, the kernel may spew warnings when a large buffer is allocated. This patch sets up dma_set_max_seg_size() for avoiding spurious warnings. Fixes: 2c95b92ecd92 ("ALSA: memalloc: Unify x86 SG-buffer handling (take#3)") Cc: <stable@vger.kernel.org> BugLink: https://github.com/thesofproject/linux/issues/3430 Link: https://lore.kernel.org/r/20220215132756.31236-2-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-02-17net/mlx5e: TC, Allow sample action with CTRoi Dayan2-9/+5
Allow sample+CT actions but still block sample+CT NAT as it is not supported. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-17net/mlx5e: TC, Make post_act parse CT and sample actionsRoi Dayan1-2/+3
Before this commit post_act can be used for normal rules and didn't handle special cases like CT and sample. With this commit post_act rule can also handle the special cases when needed. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-17net/mlx5e: TC, Clean redundant counter flag from tc action parsersRoi Dayan8-15/+7
When tc actions being parsed only the last flow attr created needs the counter flag and the previous flags being reset. Clean the flag from the tc action parsers. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-17net/mlx5e: Use multi table support for CT and sample actionsRoi Dayan6-94/+80
CT and sample actions use post actions for their implementation. Flag those actions as multi table actions so the post act infrastructure will handle the post actions allocation. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-17net/mlx5e: Create new flow attr for multi table actionsRoi Dayan9-67/+483
Some TC actions use post actions for their implementation. For example CT and sample actions. Create a new flow attr after each multi table action and create a post action rule for it. First flow attr being offloaded normally and linked to the next attr (post action rule) with setting an id on reg_c. Post action rules match the id on reg_c and continue to the next one. The flow counter is allocated on the last rule. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-17net/mlx5e: Add post act offload/unoffload APIRoi Dayan2-21/+54
Introduce mlx5e_tc_post_act_offload() and mlx5e_tc_post_act_unoffload() to be able to unoffload and reoffload existing post action rules handles. For example in neigh update events, the driver removes and readds rules in hardware. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-17net/mlx5e: Pass actions param to actions_match_supported()Roi Dayan2-3/+8
Currently the mlx5_flow object contains a single mlx5_attr instance. However, multi table actions (e.g. CT) instantiate multiple attr instances. Currently action_match_supported() reads the actions flag from the flow's attribute instance. Modify the function to receive the action flags as a parameter which is set by the calling function and pass the aggregated actions to actions_match_supported(). Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>