<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include, branch v6.1.81</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.81</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.81'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-03-06T14:45:20+00:00</updated>
<entry>
<title>bpf: Derive source IP addr via bpf_*_fib_lookup()</title>
<updated>2024-03-06T14:45:20+00:00</updated>
<author>
<name>Martynas Pumputis</name>
<email>m@lambda.lt</email>
</author>
<published>2023-10-07T08:14:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2d7ebcb5d878b4311db56eeaf7bdd76dbe9b9a13'/>
<id>urn:sha1:2d7ebcb5d878b4311db56eeaf7bdd76dbe9b9a13</id>
<content type='text'>
commit dab4e1f06cabb6834de14264394ccab197007302 upstream.

Extend the bpf_fib_lookup() helper by making it to return the source
IPv4/IPv6 address if the BPF_FIB_LOOKUP_SRC flag is set.

For example, the following snippet can be used to derive the desired
source IP address:

    struct bpf_fib_lookup p = { .ipv4_dst = ip4-&gt;daddr };

    ret = bpf_skb_fib_lookup(skb, p, sizeof(p),
            BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_SKIP_NEIGH);
    if (ret != BPF_FIB_LKUP_RET_SUCCESS)
        return TC_ACT_SHOT;

    /* the p.ipv4_src now contains the source address */

The inability to derive the proper source address may cause malfunctions
in BPF-based dataplanes for hosts containing netdevs with more than one
routable IP address or for multi-homed hosts.

For example, Cilium implements packet masquerading in BPF. If an
egressing netdev to which the Cilium's BPF prog is attached has
multiple IP addresses, then only one [hardcoded] IP address can be used for
masquerading. This breaks connectivity if any other IP address should have
been selected instead, for example, when a public and private addresses
are attached to the same egress interface.

The change was tested with Cilium [1].

Nikolay Aleksandrov helped to figure out the IPv6 addr selection.

[1]: https://github.com/cilium/cilium/pull/28283

Signed-off-by: Martynas Pumputis &lt;m@lambda.lt&gt;
Link: https://lore.kernel.org/r/20231007081415.33502-2-m@lambda.lt
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: Add table ID to bpf_fib_lookup BPF helper</title>
<updated>2024-03-06T14:45:20+00:00</updated>
<author>
<name>Louis DeLosSantos</name>
<email>louis.delos.devel@gmail.com</email>
</author>
<published>2023-05-31T19:38:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5fafd8254add75d8337df44ba8536e407ffe8928'/>
<id>urn:sha1:5fafd8254add75d8337df44ba8536e407ffe8928</id>
<content type='text'>
commit 8ad77e72caae22a1ddcfd0c03f2884929e93b7a4 upstream.

Add ability to specify routing table ID to the `bpf_fib_lookup` BPF
helper.

A new field `tbid` is added to `struct bpf_fib_lookup` used as
parameters to the `bpf_fib_lookup` BPF helper.

When the helper is called with the `BPF_FIB_LOOKUP_DIRECT` and
`BPF_FIB_LOOKUP_TBID` flags the `tbid` field in `struct bpf_fib_lookup`
will be used as the table ID for the fib lookup.

If the `tbid` does not exist the fib lookup will fail with
`BPF_FIB_LKUP_RET_NOT_FWDED`.

The `tbid` field becomes a union over the vlan related output fields
in `struct bpf_fib_lookup` and will be zeroed immediately after usage.

This functionality is useful in containerized environments.

For instance, if a CNI wants to dictate the next-hop for traffic leaving
a container it can create a container-specific routing table and perform
a fib lookup against this table in a "host-net-namespace-side" TC program.

This functionality also allows `ip rule` like functionality at the TC
layer, allowing an eBPF program to pick a routing table based on some
aspect of the sk_buff.

As a concrete use case, this feature will be used in Cilium's SRv6 L3VPN
datapath.

When egress traffic leaves a Pod an eBPF program attached by Cilium will
determine which VRF the egress traffic should target, and then perform a
FIB lookup in a specific table representing this VRF's FIB.

Signed-off-by: Louis DeLosSantos &lt;louis.delos.devel@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20230505-bpf-add-tbid-fib-lookup-v2-1-0a31c22c748c@gmail.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>block: define bvec_iter as __packed __aligned(4)</title>
<updated>2024-03-06T14:45:19+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2024-02-25T03:01:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0e351d1aa2e4c1a7a4cb2a5753b86db89796d3c8'/>
<id>urn:sha1:0e351d1aa2e4c1a7a4cb2a5753b86db89796d3c8</id>
<content type='text'>
[ Upstream commit 7838b4656110d950afdd92a081cc0f33e23e0ea8 ]

In commit 19416123ab3e ("block: define 'struct bvec_iter' as packed"),
what we need is to save the 4byte padding, and avoid `bio` to spread on
one extra cache line.

It is enough to define it as '__packed __aligned(4)', as '__packed'
alone means byte aligned, and can cause compiler to generate horrible
code on architectures that don't support unaligned access in case that
bvec_iter is embedded in other structures.

Cc: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Fixes: 19416123ab3e ("block: define 'struct bvec_iter' as packed")
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>efi/libstub: Add memory attribute protocol definitions</title>
<updated>2024-03-06T14:45:18+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb+git@google.com</email>
</author>
<published>2024-03-04T11:19:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8ff6d88c0443acdd4199aacb69f1dd4a24120e8e'/>
<id>urn:sha1:8ff6d88c0443acdd4199aacb69f1dd4a24120e8e</id>
<content type='text'>
From: Evgeniy Baskov &lt;baskov@ispras.ru&gt;

[ Commit 79729f26b074a5d2722c27fa76cc45ef721e65cd upstream ]

EFI_MEMORY_ATTRIBUTE_PROTOCOL servers as a better alternative to
DXE services for setting memory attributes in EFI Boot Services
environment. This protocol is better since it is a part of UEFI
specification itself and not UEFI PI specification like DXE
services.

Add EFI_MEMORY_ATTRIBUTE_PROTOCOL definitions.
Support mixed mode properly for its calls.

Tested-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Signed-off-by: Evgeniy Baskov &lt;baskov@ispras.ru&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>NFSD: add CB_RECALL_ANY tracepoints</title>
<updated>2024-03-06T14:45:17+00:00</updated>
<author>
<name>Dai Ngo</name>
<email>dai.ngo@oracle.com</email>
</author>
<published>2022-11-17T03:44:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7b2b8a6c75f0c0175f626d61a74e4f7f75d38df4'/>
<id>urn:sha1:7b2b8a6c75f0c0175f626d61a74e4f7f75d38df4</id>
<content type='text'>
[ Upstream commit 638593be55c0b37a1930038460a9918215d5c24b ]

Add tracepoints to trace start and end of CB_RECALL_ANY operation.

Signed-off-by: Dai Ngo &lt;dai.ngo@oracle.com&gt;
[ cel: added show_rca_mask() macro ]
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>NFSD: add delegation reaper to react to low memory condition</title>
<updated>2024-03-06T14:45:17+00:00</updated>
<author>
<name>Dai Ngo</name>
<email>dai.ngo@oracle.com</email>
</author>
<published>2022-11-17T03:44:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f28dae54632c5ea45f32ddc6fba494f5efc15007'/>
<id>urn:sha1:f28dae54632c5ea45f32ddc6fba494f5efc15007</id>
<content type='text'>
[ Upstream commit 44df6f439a1790a5f602e3842879efa88f346672 ]

The delegation reaper is called by nfsd memory shrinker's on
the 'count' callback. It scans the client list and sends the
courtesy CB_RECALL_ANY to the clients that hold delegations.

To avoid flooding the clients with CB_RECALL_ANY requests, the
delegation reaper sends only one CB_RECALL_ANY request to each
client per 5 seconds.

Signed-off-by: Dai Ngo &lt;dai.ngo@oracle.com&gt;
[ cel: moved definition of RCA4_TYPE_MASK_RDATA_DLG ]
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>trace: Relocate event helper files</title>
<updated>2024-03-06T14:45:17+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2022-11-14T13:57:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=371e1c1b326b5de0a12204f217709e2f626c7fe7'/>
<id>urn:sha1:371e1c1b326b5de0a12204f217709e2f626c7fe7</id>
<content type='text'>
[ Upstream commit 247c01ff5f8d66e62a404c91733be52fecb8b7f6 ]

Steven Rostedt says:
&gt; The include/trace/events/ directory should only hold files that
&gt; are to create events, not headers that hold helper functions.
&gt;
&gt; Can you please move them out of include/trace/events/ as that
&gt; directory is "special" in the creation of events.

Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Acked-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Acked-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Acked-by: Anna Schumaker &lt;Anna.Schumaker@Netapp.com&gt;
Stable-dep-of: 638593be55c0 ("NFSD: add CB_RECALL_ANY tracepoints")
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>filelock: add a new locks_inode_context accessor function</title>
<updated>2024-03-06T14:45:16+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2022-11-16T14:02:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1f76cb66ff2257675666172aba42b4e661809a20'/>
<id>urn:sha1:1f76cb66ff2257675666172aba42b4e661809a20</id>
<content type='text'>
[ Upstream commit 401a8b8fd5acd51582b15238d72a8d0edd580e9f ]

There are a number of places in the kernel that are accessing the
inode-&gt;i_flctx field without smp_load_acquire. This is required to
ensure that the caller doesn't see a partially-initialized structure.

Add a new accessor function for it to make this clear and convert all of
the relevant accesses in locks.c to use it. Also, convert
locks_free_lock_context to use the helper as well instead of just doing
a "bare" assignment.

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Stable-dep-of: 77c67530e1f9 ("nfsd: use locks_inode_context helper")
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>decompress: Use 8 byte alignment</title>
<updated>2024-03-06T14:45:14+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2023-08-07T16:27:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bf0ca988e250af95824c121873b2f76fccfc91df'/>
<id>urn:sha1:bf0ca988e250af95824c121873b2f76fccfc91df</id>
<content type='text'>
commit 8217ad0a435ff06d651d7298ea8ae8d72388179e upstream.

The ZSTD decompressor requires malloc() allocations to be 8 byte
aligned, so ensure that this the case.

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20230807162720.545787-19-ardb@kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>netfilter: bridge: confirm multicast packets before passing them up the stack</title>
<updated>2024-03-06T14:45:08+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2024-02-27T15:17:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2b1414d5e94e477edff1d2c79030f1d742625ea0'/>
<id>urn:sha1:2b1414d5e94e477edff1d2c79030f1d742625ea0</id>
<content type='text'>
[ Upstream commit 62e7151ae3eb465e0ab52a20c941ff33bb6332e9 ]

conntrack nf_confirm logic cannot handle cloned skbs referencing
the same nf_conn entry, which will happen for multicast (broadcast)
frames on bridges.

 Example:
    macvlan0
       |
      br0
     /  \
  ethX    ethY

 ethX (or Y) receives a L2 multicast or broadcast packet containing
 an IP packet, flow is not yet in conntrack table.

 1. skb passes through bridge and fake-ip (br_netfilter)Prerouting.
    -&gt; skb-&gt;_nfct now references a unconfirmed entry
 2. skb is broad/mcast packet. bridge now passes clones out on each bridge
    interface.
 3. skb gets passed up the stack.
 4. In macvlan case, macvlan driver retains clone(s) of the mcast skb
    and schedules a work queue to send them out on the lower devices.

    The clone skb-&gt;_nfct is not a copy, it is the same entry as the
    original skb.  The macvlan rx handler then returns RX_HANDLER_PASS.
 5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb.

The Macvlan broadcast worker and normal confirm path will race.

This race will not happen if step 2 already confirmed a clone. In that
case later steps perform skb_clone() with skb-&gt;_nfct already confirmed (in
hash table).  This works fine.

But such confirmation won't happen when eb/ip/nftables rules dropped the
packets before they reached the nf_confirm step in postrouting.

Pablo points out that nf_conntrack_bridge doesn't allow use of stateful
nat, so we can safely discard the nf_conn entry and let inet call
conntrack again.

This doesn't work for bridge netfilter: skb could have a nat
transformation. Also bridge nf prevents re-invocation of inet prerouting
via 'sabotage_in' hook.

Work around this problem by explicit confirmation of the entry at LOCAL_IN
time, before upper layer has a chance to clone the unconfirmed entry.

The downside is that this disables NAT and conntrack helpers.

Alternative fix would be to add locking to all code parts that deal with
unconfirmed packets, but even if that could be done in a sane way this
opens up other problems, for example:

-m physdev --physdev-out eth0 -j SNAT --snat-to 1.2.3.4
-m physdev --physdev-out eth1 -j SNAT --snat-to 1.2.3.5

For multicast case, only one of such conflicting mappings will be
created, conntrack only handles 1:1 NAT mappings.

Users should set create a setup that explicitly marks such traffic
NOTRACK (conntrack bypass) to avoid this, but we cannot auto-bypass
them, ruleset might have accept rules for untracked traffic already,
so user-visible behaviour would change.

Suggested-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217777
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
