diff options
| author | Jakub Kicinski <kuba@kernel.org> | 2022-02-08 07:12:48 +0300 |
|---|---|---|
| committer | Jakub Kicinski <kuba@kernel.org> | 2022-02-08 07:12:48 +0300 |
| commit | c3e676b98326a419f30dd5d956c68fc33323f4fd (patch) | |
| tree | 02229287b1b9ab91b42bde52ad5ff2f1eb69e6e2 /tools | |
| parent | 642436a1ad34a28c45bbc2bdc131640a73782356 (diff) | |
| parent | 32ccf1107980e8ed5c62cf6666da7a47a4fc7ecf (diff) | |
| download | linux-c3e676b98326a419f30dd5d956c68fc33323f4fd.tar.xz | |
Merge branch 'inet-separate-dscp-from-ecn-bits-using-new-dscp_t-type'
Guillaume Nault says:
====================
inet: Separate DSCP from ECN bits using new dscp_t type
The networking stack currently doesn't clearly distinguish between DSCP
and ECN bits. The entire DSCP+ECN bits are stored in u8 variables (or
structure fields), and each part of the stack handles them in their own
way, using different macros. This has created several bugs in the past
and some uncommon code paths are still unfixed.
Such bugs generally manifest by selecting invalid routes because of ECN
bits interfering with FIB routes and rules lookups (more details in the
LPC 2021 talk[1] and in the RFC of this series[2]).
This patch series aims at preventing the introduction of such bugs (and
detecting existing ones), by introducing a dscp_t type, representing
"sanitised" DSCP values (that is, with no ECN information), as opposed
to plain u8 values that contain both DSCP and ECN information. dscp_t
makes it clear for the reader what we're working on, and Sparse can
flag invalid interactions between dscp_t and plain u8.
This series converts only a few variables and structures:
* Patch 1 converts the tclass field of struct fib6_rule. It
effectively forbids the use of ECN bits in the tos/dsfield option
of ip -6 rule. Rules now match packets solely based on their DSCP
bits, so ECN doesn't influence the result any more. This contrasts
with the previous behaviour where all 8 bits of the Traffic Class
field were used. It is believed that this change is acceptable as
matching ECN bits wasn't usable for IPv4, so only IPv6-only
deployments could be depending on it. Also the previous behaviour
made DSCP-based ip6-rules fail for packets with both a DSCP and an
ECN mark, which is another reason why any such deploy is unlikely.
* Patch 2 converts the tos field of struct fib4_rule. This one too
effectively forbids defining ECN bits, this time in ip -4 rule.
Before that, setting ECN bit 1 was accepted, while ECN bit 0 was
rejected. But even when accepted, the rule would never match, as
the packets would have their ECN bits cleared before doing the
rule lookup.
* Patch 3 converts the fc_tos field of struct fib_config. This is
equivalent to patch 2, but for IPv4 routes. Routes using a
tos/dsfield option with any ECN bit set is now rejected. Before
this patch, they were accepted but, as with ip4 rules, these routes
couldn't match any packet, since their ECN bits are cleared before
the lookup.
* Patch 4 converts the fa_tos field of struct fib_alias. This one is
pure internal u8 to dscp_t conversion. While patches 1-3 had user
facing consequences, this patch shouldn't have any side effect and
is there to give an overview of what future conversion patches will
look like. Conversions are quite mechanical, but imply some code
churn, which is the price for the extra clarity a possibility of
type checking.
To summarise, all the behaviour changes required for the dscp_t type
approach to work should be contained in patches 1-3. These changes are
edge cases of ip-route and ip-rule that don't currently work properly.
So they should be safe. Also, a kernel selftest is added for each of
them.
Finally, this work also paves the way for allowing the usage of the 3
high order DSCP bits in IPv4 (a few call paths already handle them, but
in general the stack clears them before IPv4 rule and route lookups).
References:
[1] LPC 2021 talk:
- https://linuxplumbersconf.org/event/11/contributions/943/
- Direct link to slide deck:
https://linuxplumbersconf.org/event/11/contributions/943/attachments/901/1780/inet_tos_lpc2021.pdf
[2] RFC version of this series:
- https://lore.kernel.org/netdev/cover.1638814614.git.gnault@redhat.com/
====================
Link: https://lore.kernel.org/r/cover.1643981839.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Diffstat (limited to 'tools')
| -rwxr-xr-x | tools/testing/selftests/net/fib_rule_tests.sh | 60 | ||||
| -rwxr-xr-x | tools/testing/selftests/net/fib_tests.sh | 76 |
2 files changed, 134 insertions, 2 deletions
diff --git a/tools/testing/selftests/net/fib_rule_tests.sh b/tools/testing/selftests/net/fib_rule_tests.sh index 3b0489910422..4f70baad867d 100755 --- a/tools/testing/selftests/net/fib_rule_tests.sh +++ b/tools/testing/selftests/net/fib_rule_tests.sh @@ -114,10 +114,25 @@ fib_rule6_test_match_n_redirect() log_test $? 0 "rule6 del by pref: $description" } +fib_rule6_test_reject() +{ + local match="$1" + local rc + + $IP -6 rule add $match table $RTABLE 2>/dev/null + rc=$? + log_test $rc 2 "rule6 check: $match" + + if [ $rc -eq 0 ]; then + $IP -6 rule del $match table $RTABLE + fi +} + fib_rule6_test() { local getmatch local match + local cnt # setup the fib rule redirect route $IP -6 route add table $RTABLE default via $GW_IP6 dev $DEV onlink @@ -128,8 +143,21 @@ fib_rule6_test() match="from $SRC_IP6 iif $DEV" fib_rule6_test_match_n_redirect "$match" "$match" "iif redirect to table" + # Reject dsfield (tos) options which have ECN bits set + for cnt in $(seq 1 3); do + match="dsfield $cnt" + fib_rule6_test_reject "$match" + done + + # Don't take ECN bits into account when matching on dsfield match="tos 0x10" - fib_rule6_test_match_n_redirect "$match" "$match" "tos redirect to table" + for cnt in "0x10" "0x11" "0x12" "0x13"; do + # Using option 'tos' instead of 'dsfield' as old iproute2 + # versions don't support 'dsfield' in ip rule show. + getmatch="tos $cnt" + fib_rule6_test_match_n_redirect "$match" "$getmatch" \ + "$getmatch redirect to table" + done match="fwmark 0x64" getmatch="mark 0x64" @@ -187,10 +215,25 @@ fib_rule4_test_match_n_redirect() log_test $? 0 "rule4 del by pref: $description" } +fib_rule4_test_reject() +{ + local match="$1" + local rc + + $IP rule add $match table $RTABLE 2>/dev/null + rc=$? + log_test $rc 2 "rule4 check: $match" + + if [ $rc -eq 0 ]; then + $IP rule del $match table $RTABLE + fi +} + fib_rule4_test() { local getmatch local match + local cnt # setup the fib rule redirect route $IP route add table $RTABLE default via $GW_IP4 dev $DEV onlink @@ -206,8 +249,21 @@ fib_rule4_test() fib_rule4_test_match_n_redirect "$match" "$match" "iif redirect to table" ip netns exec testns sysctl -qw net.ipv4.ip_forward=0 + # Reject dsfield (tos) options which have ECN bits set + for cnt in $(seq 1 3); do + match="dsfield $cnt" + fib_rule4_test_reject "$match" + done + + # Don't take ECN bits into account when matching on dsfield match="tos 0x10" - fib_rule4_test_match_n_redirect "$match" "$match" "tos redirect to table" + for cnt in "0x10" "0x11" "0x12" "0x13"; do + # Using option 'tos' instead of 'dsfield' as old iproute2 + # versions don't support 'dsfield' in ip rule show. + getmatch="tos $cnt" + fib_rule4_test_match_n_redirect "$match" "$getmatch" \ + "$getmatch redirect to table" + done match="fwmark 0x64" getmatch="mark 0x64" diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh index 996af1ae3d3d..bb73235976b3 100755 --- a/tools/testing/selftests/net/fib_tests.sh +++ b/tools/testing/selftests/net/fib_tests.sh @@ -1447,6 +1447,81 @@ ipv4_local_rt_cache() log_test $? 0 "Cached route removed from VRF port device" } +ipv4_rt_dsfield() +{ + echo + echo "IPv4 route with dsfield tests" + + run_cmd "$IP route flush 172.16.102.0/24" + + # New routes should reject dsfield options that interfere with ECN + run_cmd "$IP route add 172.16.102.0/24 dsfield 0x01 via 172.16.101.2" + log_test $? 2 "Reject route with dsfield 0x01" + + run_cmd "$IP route add 172.16.102.0/24 dsfield 0x02 via 172.16.101.2" + log_test $? 2 "Reject route with dsfield 0x02" + + run_cmd "$IP route add 172.16.102.0/24 dsfield 0x03 via 172.16.101.2" + log_test $? 2 "Reject route with dsfield 0x03" + + # A generic route that doesn't take DSCP into account + run_cmd "$IP route add 172.16.102.0/24 via 172.16.101.2" + + # A more specific route for DSCP 0x10 + run_cmd "$IP route add 172.16.102.0/24 dsfield 0x10 via 172.16.103.2" + + # DSCP 0x10 should match the specific route, no matter the ECN bits + $IP route get fibmatch 172.16.102.1 dsfield 0x10 | \ + grep -q "via 172.16.103.2" + log_test $? 0 "IPv4 route with DSCP and ECN:Not-ECT" + + $IP route get fibmatch 172.16.102.1 dsfield 0x11 | \ + grep -q "via 172.16.103.2" + log_test $? 0 "IPv4 route with DSCP and ECN:ECT(1)" + + $IP route get fibmatch 172.16.102.1 dsfield 0x12 | \ + grep -q "via 172.16.103.2" + log_test $? 0 "IPv4 route with DSCP and ECN:ECT(0)" + + $IP route get fibmatch 172.16.102.1 dsfield 0x13 | \ + grep -q "via 172.16.103.2" + log_test $? 0 "IPv4 route with DSCP and ECN:CE" + + # Unknown DSCP should match the generic route, no matter the ECN bits + $IP route get fibmatch 172.16.102.1 dsfield 0x14 | \ + grep -q "via 172.16.101.2" + log_test $? 0 "IPv4 route with unknown DSCP and ECN:Not-ECT" + + $IP route get fibmatch 172.16.102.1 dsfield 0x15 | \ + grep -q "via 172.16.101.2" + log_test $? 0 "IPv4 route with unknown DSCP and ECN:ECT(1)" + + $IP route get fibmatch 172.16.102.1 dsfield 0x16 | \ + grep -q "via 172.16.101.2" + log_test $? 0 "IPv4 route with unknown DSCP and ECN:ECT(0)" + + $IP route get fibmatch 172.16.102.1 dsfield 0x17 | \ + grep -q "via 172.16.101.2" + log_test $? 0 "IPv4 route with unknown DSCP and ECN:CE" + + # Null DSCP should match the generic route, no matter the ECN bits + $IP route get fibmatch 172.16.102.1 dsfield 0x00 | \ + grep -q "via 172.16.101.2" + log_test $? 0 "IPv4 route with no DSCP and ECN:Not-ECT" + + $IP route get fibmatch 172.16.102.1 dsfield 0x01 | \ + grep -q "via 172.16.101.2" + log_test $? 0 "IPv4 route with no DSCP and ECN:ECT(1)" + + $IP route get fibmatch 172.16.102.1 dsfield 0x02 | \ + grep -q "via 172.16.101.2" + log_test $? 0 "IPv4 route with no DSCP and ECN:ECT(0)" + + $IP route get fibmatch 172.16.102.1 dsfield 0x03 | \ + grep -q "via 172.16.101.2" + log_test $? 0 "IPv4 route with no DSCP and ECN:CE" +} + ipv4_route_test() { route_setup @@ -1454,6 +1529,7 @@ ipv4_route_test() ipv4_rt_add ipv4_rt_replace ipv4_local_rt_cache + ipv4_rt_dsfield route_cleanup } |
