Age | Commit message (Collapse) | Author | Files | Lines |
|
Petr Machata says:
====================
mlxsw: Refactor reference counting code
Amit Cohen writes:
This set converts all reference counters defined as 'unsigned int' to
refcount_t type. The reference counting of LAGs can be simplified, so first
refactor the related code and then change the type of the reference
counter.
Patch set overview:
Patches #1-#4 are preparations for LAG refactor
Patch #5 refactors LAG code and change the type of reference counter
Patch #6 converts the remaining reference counters in mlxsw driver
====================
Link: https://lore.kernel.org/r/cover.1706293430.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
mlxsw driver uses 'unsigned int' for reference counters in several
structures. Instead, use refcount_t type which allows us to catch overflow
and underflow issues. Change the type of the counters and use the
appropriate API.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
mlxsw_sp stores an array of LAGs. When a port joins a LAG, in case that
this LAG is already in use, we only have to increase the reference counter.
Otherwise, we have to search for an unused LAG ID and configure it in
hardware. When a port leaves a LAG, we have to destroy it only for the last
user. This code can be simplified, for such requirements we usually add
get() and put() functions which create and destroy the object.
Add mlxsw_sp_lag_{get,put}() and use them. These functions take care of
the reference counter and hardware configuration if needed. Change the
reference counter to refcount_t type which catches overflow and underflow
issues.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently, the function mlxsw_sp_lag_index_get() is called twice - first
as part of NETDEV_PRECHANGEUPPER event and later as part of
NETDEV_CHANGEUPPER. This function will be changed in the next patch. To
simplify the code, call it only once as part of NETDEV_CHANGEUPPER
event and set an error message using 'extack' in case of failure.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The maximum number of LAGs is queried from core several times. It is
used to allocate LAG array, and then to iterate over it. In addition, it
is used for PGT initialization. To simplify the code, instead of
querying it several times, store the value as part of 'mlxsw_sp' and use
it.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
A next patch will add mlxsw_sp_lag_{get,put}() functions to handle LAG
reference counting and create/destroy it only for first user/last user.
Remove mlxsw_sp_lag_get() function and access LAG array directly.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The structure mlxsw_sp_upper is used only as LAG. Rename it to
mlxsw_sp_lag and move it to spectrum.c file, as it is used only there.
Move the function mlxsw_sp_lag_get() with the structure.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
syzbot reported an interesting trace [1] caused by a stale sk->sk_wq
pointer in a closed llc socket.
In commit ff7b11aa481f ("net: socket: set sock->sk to NULL after
calling proto_ops::release()") Eric Biggers hinted that some protocols
are missing a sock_orphan(), we need to perform a full audit.
In net-next, I plan to clear sock->sk from sock_orphan() and
amend Eric patch to add a warning.
[1]
BUG: KASAN: slab-use-after-free in list_empty include/linux/list.h:373 [inline]
BUG: KASAN: slab-use-after-free in waitqueue_active include/linux/wait.h:127 [inline]
BUG: KASAN: slab-use-after-free in sock_def_write_space_wfree net/core/sock.c:3384 [inline]
BUG: KASAN: slab-use-after-free in sock_wfree+0x9a8/0x9d0 net/core/sock.c:2468
Read of size 8 at addr ffff88802f4fc880 by task ksoftirqd/1/27
CPU: 1 PID: 27 Comm: ksoftirqd/1 Not tainted 6.8.0-rc1-syzkaller-00049-g6098d87eaf31 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:377 [inline]
print_report+0xc4/0x620 mm/kasan/report.c:488
kasan_report+0xda/0x110 mm/kasan/report.c:601
list_empty include/linux/list.h:373 [inline]
waitqueue_active include/linux/wait.h:127 [inline]
sock_def_write_space_wfree net/core/sock.c:3384 [inline]
sock_wfree+0x9a8/0x9d0 net/core/sock.c:2468
skb_release_head_state+0xa3/0x2b0 net/core/skbuff.c:1080
skb_release_all net/core/skbuff.c:1092 [inline]
napi_consume_skb+0x119/0x2b0 net/core/skbuff.c:1404
e1000_unmap_and_free_tx_resource+0x144/0x200 drivers/net/ethernet/intel/e1000/e1000_main.c:1970
e1000_clean_tx_irq drivers/net/ethernet/intel/e1000/e1000_main.c:3860 [inline]
e1000_clean+0x4a1/0x26e0 drivers/net/ethernet/intel/e1000/e1000_main.c:3801
__napi_poll.constprop.0+0xb4/0x540 net/core/dev.c:6576
napi_poll net/core/dev.c:6645 [inline]
net_rx_action+0x956/0xe90 net/core/dev.c:6778
__do_softirq+0x21a/0x8de kernel/softirq.c:553
run_ksoftirqd kernel/softirq.c:921 [inline]
run_ksoftirqd+0x31/0x60 kernel/softirq.c:913
smpboot_thread_fn+0x660/0xa10 kernel/smpboot.c:164
kthread+0x2c6/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
Allocated by task 5167:
kasan_save_stack+0x33/0x50 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
unpoison_slab_object mm/kasan/common.c:314 [inline]
__kasan_slab_alloc+0x81/0x90 mm/kasan/common.c:340
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slub.c:3813 [inline]
slab_alloc_node mm/slub.c:3860 [inline]
kmem_cache_alloc_lru+0x142/0x6f0 mm/slub.c:3879
alloc_inode_sb include/linux/fs.h:3019 [inline]
sock_alloc_inode+0x25/0x1c0 net/socket.c:308
alloc_inode+0x5d/0x220 fs/inode.c:260
new_inode_pseudo+0x16/0x80 fs/inode.c:1005
sock_alloc+0x40/0x270 net/socket.c:634
__sock_create+0xbc/0x800 net/socket.c:1535
sock_create net/socket.c:1622 [inline]
__sys_socket_create net/socket.c:1659 [inline]
__sys_socket+0x14c/0x260 net/socket.c:1706
__do_sys_socket net/socket.c:1720 [inline]
__se_sys_socket net/socket.c:1718 [inline]
__x64_sys_socket+0x72/0xb0 net/socket.c:1718
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd3/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
Freed by task 0:
kasan_save_stack+0x33/0x50 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
kasan_save_free_info+0x3f/0x60 mm/kasan/generic.c:640
poison_slab_object mm/kasan/common.c:241 [inline]
__kasan_slab_free+0x121/0x1b0 mm/kasan/common.c:257
kasan_slab_free include/linux/kasan.h:184 [inline]
slab_free_hook mm/slub.c:2121 [inline]
slab_free mm/slub.c:4299 [inline]
kmem_cache_free+0x129/0x350 mm/slub.c:4363
i_callback+0x43/0x70 fs/inode.c:249
rcu_do_batch kernel/rcu/tree.c:2158 [inline]
rcu_core+0x819/0x1680 kernel/rcu/tree.c:2433
__do_softirq+0x21a/0x8de kernel/softirq.c:553
Last potentially related work creation:
kasan_save_stack+0x33/0x50 mm/kasan/common.c:47
__kasan_record_aux_stack+0xba/0x100 mm/kasan/generic.c:586
__call_rcu_common.constprop.0+0x9a/0x7b0 kernel/rcu/tree.c:2683
destroy_inode+0x129/0x1b0 fs/inode.c:315
iput_final fs/inode.c:1739 [inline]
iput.part.0+0x560/0x7b0 fs/inode.c:1765
iput+0x5c/0x80 fs/inode.c:1755
dentry_unlink_inode+0x292/0x430 fs/dcache.c:400
__dentry_kill+0x1ca/0x5f0 fs/dcache.c:603
dput.part.0+0x4ac/0x9a0 fs/dcache.c:845
dput+0x1f/0x30 fs/dcache.c:835
__fput+0x3b9/0xb70 fs/file_table.c:384
task_work_run+0x14d/0x240 kernel/task_work.c:180
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0xa8a/0x2ad0 kernel/exit.c:871
do_group_exit+0xd4/0x2a0 kernel/exit.c:1020
__do_sys_exit_group kernel/exit.c:1031 [inline]
__se_sys_exit_group kernel/exit.c:1029 [inline]
__x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1029
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd3/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
The buggy address belongs to the object at ffff88802f4fc800
which belongs to the cache sock_inode_cache of size 1408
The buggy address is located 128 bytes inside of
freed 1408-byte region [ffff88802f4fc800, ffff88802f4fcd80)
The buggy address belongs to the physical page:
page:ffffea0000bd3e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2f4f8
head:ffffea0000bd3e00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
anon flags: 0xfff00000000840(slab|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000840 ffff888013b06b40 0000000000000000 0000000000000001
raw: 0000000000000000 0000000080150015 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Reclaimable, gfp_mask 0xd20d0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE), pid 4956, tgid 4956 (sshd), ts 31423924727, free_ts 0
set_page_owner include/linux/page_owner.h:31 [inline]
post_alloc_hook+0x2d0/0x350 mm/page_alloc.c:1533
prep_new_page mm/page_alloc.c:1540 [inline]
get_page_from_freelist+0xa28/0x3780 mm/page_alloc.c:3311
__alloc_pages+0x22f/0x2440 mm/page_alloc.c:4567
__alloc_pages_node include/linux/gfp.h:238 [inline]
alloc_pages_node include/linux/gfp.h:261 [inline]
alloc_slab_page mm/slub.c:2190 [inline]
allocate_slab mm/slub.c:2354 [inline]
new_slab+0xcc/0x3a0 mm/slub.c:2407
___slab_alloc+0x4af/0x19a0 mm/slub.c:3540
__slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3625
__slab_alloc_node mm/slub.c:3678 [inline]
slab_alloc_node mm/slub.c:3850 [inline]
kmem_cache_alloc_lru+0x379/0x6f0 mm/slub.c:3879
alloc_inode_sb include/linux/fs.h:3019 [inline]
sock_alloc_inode+0x25/0x1c0 net/socket.c:308
alloc_inode+0x5d/0x220 fs/inode.c:260
new_inode_pseudo+0x16/0x80 fs/inode.c:1005
sock_alloc+0x40/0x270 net/socket.c:634
__sock_create+0xbc/0x800 net/socket.c:1535
sock_create net/socket.c:1622 [inline]
__sys_socket_create net/socket.c:1659 [inline]
__sys_socket+0x14c/0x260 net/socket.c:1706
__do_sys_socket net/socket.c:1720 [inline]
__se_sys_socket net/socket.c:1718 [inline]
__x64_sys_socket+0x72/0xb0 net/socket.c:1718
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd3/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
page_owner free stack trace missing
Memory state around the buggy address:
ffff88802f4fc780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88802f4fc800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88802f4fc880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88802f4fc900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88802f4fc980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes: 43815482370c ("net: sock_def_readable() and friends RCU conversion")
Reported-and-tested-by: syzbot+32b89eaa102b372ff76d@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240126165532.3396702-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The config file contains a partial kernel configuration to be used by
`virtme-configkernel --custom'. The presumption is that the config file
contains all Kconfig options needed by the selftests from the directory.
In net/forwarding/config, many are missing, which manifests as spurious
failures when running the selftests, with messages about unknown device
types, qdisc kinds or classifier actions. Add the missing configurations.
Tested the resulting configuration using virtme-ng as follows:
# vng -b -f tools/testing/selftests/net/forwarding/config
# vng --user root
(within the VM:)
# make -C tools/testing/selftests TARGETS=net/forwarding run_tests
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/025abded7ff9cea5874a7fe35dcd3fd41bf5e6ac.1706286755.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Esben Haabendal says:
====================
net: stmmac: dwmac-imx: Time Based Scheduling support
This small patch series allows using TBS support of the i.MX Ethernet QOS
controller for etf qdisc offload.
It achieves this in a similar manner that it is done in dwmac-intel.c,
dwmac-mediatek.c and stmmac_pci.c.
Changes since v1:
- Simplified for loop by starting at index 1.
- Fixed problem with indentation.
====================
Link: https://lore.kernel.org/r/cover.1706256158.git.esben@geanix.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
TSO and TBS cannot coexist. For now we set i.MX Ethernet QOS controller to
use the first TX queue with TSO and the rest for TBS.
TX queues with TBS can support etf qdisc hw offload.
Signed-off-by: Esben Haabendal <esben@geanix.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
With the dma conf being reallocated on each call to stmmac_open(), any
information in there is lost, unless we specifically handle it.
The STMMAC_TBS_EN bit is set when adding an etf qdisc, and the etf qdisc
therefore would stop working when link was set down and then back up.
Fixes: ba39b344e924 ("net: ethernet: stmicro: stmmac: generate stmmac dma conf before open")
Cc: stable@vger.kernel.org
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
On a parisc64 kernel I sometimes notice this kernel warning:
Kernel unaligned access to 0x40ff8814 at ndisc_send_skb+0xc0/0x4d8
The address 0x40ff8814 points to the in6addr_linklocal_allrouters
variable and the warning simply means that some ipv6 function tries to
read a 64-bit word directly from the not-64-bit aligned
in6addr_linklocal_allrouters variable.
Unaligned accesses are non-critical as the architecture or exception
handlers usually will fix it up at runtime. Nevertheless it may trigger
a performance penality for some architectures. For details read the
"unaligned-memory-access" kernel documentation.
The patch below ensures that the ipv6 loopback and router addresses will
always be naturally aligned. This prevents the unaligned accesses for
all architectures.
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 034dfc5df99eb ("ipv6: export in6addr_loopback to modules")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/ZbNuFM1bFqoH-UoY@p100
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When setting or getting PHC time, the higher bits of the second time (>32
bits) they were ignored. Meaning that setting some time in the future like
year 2150, it was failing to set this.
The issue can be reproduced like this:
# phc_ctl /dev/ptp1 set 10000000000
phc_ctl[12.290]: set clock time to 10000000000.000000000 or Sat Nov 20 17:46:40 2286
# phc_ctl /dev/ptp1 get
phc_ctl[15.309]: clock time is 1410065411.018055420 or Sun Sep 7 04:50:11 2014
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Divya Koppera <divya.koppera@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240126073042.1845153-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Use the inline function reciprocal_scale rather than open coding
the scale optimization. Also, remove unnecessary initializations.
Resulting compiled code is unchanged (according to godbolt).
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20240126002550.169608-1-stephen@networkplumber.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Tony Nguyen says:
====================
ice: fix timestamping in reset process
Karol Kolacinski says:
PTP reset process has multiple places where timestamping can end up in
an incorrect state.
This series introduces a proper state machine for PTP and refactors
a large part of the code to ensure that timestamping does not break.
The following are changes since commit 91374ba537bd60caa9ae052c9f1c0fe055b39149:
net: dsa: mt7530: support OF-based registration of switch MDIO bus
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue 100GbE
====================
Link: https://lore.kernel.org/r/20240125215757.2601799-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The ice driver currently attempts to destroy and re-initialize the Tx
timestamp tracker during the reset flow. The release of the Tx tracker
only happened during CORE reset or GLOBAL reset. The ice_ptp_rebuild()
function always calls the ice_ptp_init_tx function which will allocate
a new tracker data structure, resulting in memory leaks during PF reset.
Certainly the driver should not be allocating a new tracker without
removing the old tracker data, as this results in a memory leak.
Additionally, there's no reason to remove the tracker memory during a
reset. Remove this logic from the reset and rebuild flow. Instead of
releasing the Tx tracker, flush outstanding timestamps just before we
reset the PHY timestamp block in ice_ptp_cfg_phy_interrupt().
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The ice_ptp_reset() function uses a goto to skip past clock owner
operations if performing a PF reset or if the device is not the clock
owner. This is a bit confusing. Factor this out into
ice_ptp_rebuild_owner() instead.
The ice_ptp_reset() function is called by ice_rebuild() to restore PTP
functionality after a device reset. Follow the convention set by the
ice_main.c file and rename this function to ice_ptp_rebuild(), in the
same way that we have ice_prepare_for_reset() and
ice_ptp_prepare_for_reset().
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The ice_ptp_tx_cfg_intr() function sends a control queue message to
configure the PHY timestamp interrupt block. This is a very similar name
to a function which is used to configure the MAC Other Interrupt Cause
Enable register.
Rename this function to ice_ptp_cfg_phy_interrupt in order to make it
more obvious to the reader what action it performs, and distinguish it
from other similarly named functions.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
E810 hardware does not have a Tx timestamp ready bitmap. Don't check
has_ready_bitmap in E810-specific functions.
Add has_ready_bitmap check in ice_ptp_process_tx_tstamp() to stop
relying on the fact that ice_get_phy_tx_tstamp_ready() returns all 1s.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The tx->verify_cached flag is used to inform the Tx timestamp tracking
code whether it needs to verify the cached Tx timestamp value against
a previous captured value. This is necessary on E810 hardware which does
not have a Tx timestamp ready bitmap.
In addition, we currently rely on the fact that the
ice_get_phy_tx_tstamp_ready() function returns all 1s for E810 hardware.
Instead of introducing a brand new flag, rename and verify_cached to
has_ready_bitmap, inverting the relevant checks.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The ice_ptp_prepare_for_reset() and ice_ptp_reset() functions currently
check the pf->flags ICE_FLAG_PFR_REQ bit to determine if the current
reset is a PF reset or not.
This is problematic, because it is possible that a PF reset and a higher
level reset (CORE reset, GLOBAL reset, EMP reset) are requested
simultaneously. In that case, the driver performs the highest level
reset requested. However, the ICE_FLAG_PFR_REQ flag will still be set.
The main driver reset functions take an enum ice_reset_req indicating
which reset is actually being performed. Pass this data into the PTP
functions and rely on this instead of relying on the driver flags.
This ensures that the PTP code performs the proper level of reset that
the driver is actually undergoing.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add PTP state machine so that the driver can correctly identify PTP
state around resets.
When the driver got information about ungraceful reset, PTP was not
prepared for reset and it returned error. When this situation occurs,
prepare PTP before rebuilding its structures.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Pull jfs fix from David Kleikamp:
"Revert a bad sanity check"
* tag 'jfs-6.8-rc3' of github.com:kleikamp/linux-shaggy:
Revert "jfs: fix shift-out-of-bounds in dbJoin"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"Two small fixes for tracefs and eventfs:
- Fix register_snapshot_trigger() on allocation error
If the snapshot fails to allocate, the register_snapshot_trigger()
can still return success. If the call to
tracing_alloc_snapshot_instance() returned anything but 0, it
returned 0, but it should have been returning the error code from
that allocation function.
- Remove leftover code from tracefs doing a dentry walk on remount.
The update_gid() function was called by the tracefs code on remount
to update the gid of eventfs, but that is no longer the case, but
that code wasn't deleted. Nothing calls it. Remove it"
* tag 'trace-v6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracefs: remove stale 'update_gid' code
tracing/trigger: Fix to return error if failed to alloc snapshot
|
|
Modern OSes use iptables implementation with nf_tables as a backend,
e.g.:
$ iptables -V
iptables v1.8.8 (nf_tables)
Pablo points out that we need CONFIG_NFT_COMPAT to make that work,
otherwise we see a lot of:
Warning: Extension DNAT revision 0 not supported, missing kernel module?
with DNAT being just an example here, other modules we need
include udp, TTL, length etc.
Link: https://lore.kernel.org/r/20240126201308.2903602-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When working with GPIO, its direction must be set either when the GPIO is
requested by gpiod_get*() or later on by one of the gpiod_direction_*()
functions. Neither of this is done here which results in undefined
behavior on some systems.
As the reset GPIO is used right after it is requested here, it makes sense
to configure it as GPIOD_OUT_HIGH right away. With that, the following
gpiod_set_value_cansleep(1) becomes redundant and can be safely
removed.
Fixes: a653f2f538f9 ("net: dsa: qca8k: introduce reset via gpio feature")
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/1706266175-3408-1-git-send-email-michal.vokac@ysoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For now, the packet with an old ack is not accepted if we are in
FIN_WAIT1 state, which can cause retransmission. Taking the following
case as an example:
Client Server
| |
FIN_WAIT1(Send FIN, seq=10) FIN_WAIT1(Send FIN, seq=20, ack=10)
| |
| Send ACK(seq=21, ack=11)
Recv ACK(seq=21, ack=11)
|
Recv FIN(seq=20, ack=10)
In the case above, simultaneous close is happening, and the FIN and ACK
packet that send from the server is out of order. Then, the FIN will be
dropped by the client, as it has an old ack. Then, the server has to
retransmit the FIN, which can cause delay if the server has set the
SO_LINGER on the socket.
Old ack is accepted in the ESTABLISHED and TIME_WAIT state, and I think
it should be better to keep the same logic.
In this commit, we accept old ack in FIN_WAIT1/FIN_WAIT2/CLOSING/LAST_ACK
states. Maybe we should limit it to FIN_WAIT1 for now?
Signed-off-by: Menglong Dong <menglong8.dong@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240126040519.1846345-1-menglong8.dong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Arınç ÜNAL says:
====================
MT7530 DSA Subdriver Improvements Act I
This patch series simplifies the MT7530 DSA subdriver and improves the
logic of the support for MT7530, MT7531, and the switch on the MT7988 SoC.
I have done a simple ping test to confirm basic communication on all switch
ports on MCM and standalone MT7530, and MT7531 switch with this patch
series applied.
MT7621 Unielec, MCM MT7530:
rgmii-only-gmac0-mt7621-unielec-u7621-06-16m.dtb
gmac0-and-gmac1-mt7621-unielec-u7621-06-16m.dtb
tftpboot 0x80008000 mips-uzImage.bin; tftpboot 0x83000000 mips-rootfs.cpio.uboot; tftpboot 0x83f00000 $dtb; bootm 0x80008000 0x83000000 0x83f00000
MT7622 Bananapi, MT7531:
gmac0-and-gmac1-mt7622-bananapi-bpi-r64.dtb
tftpboot 0x40000000 arm64-Image; tftpboot 0x45000000 arm64-rootfs.cpio.uboot; tftpboot 0x4a000000 $dtb; booti 0x40000000 0x45000000 0x4a000000
MT7623 Bananapi, standalone MT7530:
rgmii-only-gmac0-mt7623n-bananapi-bpi-r2.dtb
gmac0-and-gmac1-mt7623n-bananapi-bpi-r2.dtb
tftpboot 0x80008000 arm-zImage; tftpboot 0x83000000 arm-rootfs.cpio.uboot; tftpboot 0x83f00000 $dtb; bootz 0x80008000 0x83000000 0x83f00000
This patch series is the continuation of the patch series linked below.
https://lore.kernel.org/r/20230522121532.86610-1-arinc.unal@arinc9.com
v2: https://lore.kernel.org/r/20231227044347.107291-1-arinc.unal@arinc9.com
v1: https://lore.kernel.org/r/20231118123205.266819-1-arinc.unal@arinc9.com
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
====================
Link: https://lore.kernel.org/r/20240122-for-netnext-mt7530-improvements-1-v3-0-042401f2b279@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There's no need to run all the code on mt7530_setup_port5() if port 5 is
disabled. The only case for calling mt7530_setup_port5() from
mt7530_setup() is when PHY muxing is enabled. That is because port 5 is not
defined as a port on the devicetree, therefore, it cannot be controlled by
phylink.
Because of this, run mt7530_setup_port5() if priv->p5_intf_sel is
P5_INTF_SEL_PHY_P0 or P5_INTF_SEL_PHY_P4. Remove the P5_DISABLED case from
mt7530_setup_port5().
Stop initialising the interface variable as the remaining cases will always
call mt7530_setup_port5() with it initialised.
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240122-for-netnext-mt7530-improvements-1-v3-7-042401f2b279@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Running mt7530_setup_port5() from mt7530_setup() used to handle all cases
of configuring port 5, including phylink.
Setting priv->p5_interface under mt7530_setup_port5() makes sure that
mt7530_setup_port5() from mt753x_phylink_mac_config() won't run.
The commit ("net: dsa: mt7530: improve code path for setting up port 5")
makes so that mt7530_setup_port5() from mt7530_setup() runs only on
non-phylink cases.
Get rid of unnecessarily setting priv->p5_interface under
mt7530_setup_port5() as port 5 phylink configuration will be done by
running mt7530_setup_port5() from mt753x_phylink_mac_config() now.
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240122-for-netnext-mt7530-improvements-1-v3-6-042401f2b279@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There're two code paths for setting up port 5:
mt7530_setup()
-> mt7530_setup_port5()
mt753x_phylink_mac_config()
-> mt753x_mac_config()
-> mt7530_mac_config()
-> mt7530_setup_port5()
Currently mt7530_setup_port5() from mt7530_setup() always runs. If port 5
is used as a CPU, DSA, or user port, mt7530_setup_port5() from
mt753x_phylink_mac_config() won't run. That is because priv->p5_interface
set on mt7530_setup_port5() will match state->interface on
mt753x_phylink_mac_config() which will stop running mt7530_setup_port5()
again.
Therefore, mt7530_setup_port5() will never run from
mt753x_phylink_mac_config().
Address this by not running mt7530_setup_port5() from mt7530_setup() if
port 5 is used as a CPU, DSA, or user port. This driver isn't in the
dsa_switches_apply_workarounds[] array so phylink will always be present.
To keep the cases where port 5 isn't controlled by phylink working as
before, preserve the mt7530_setup_port5() call from mt7530_setup().
Do not set priv->p5_intf_sel to P5_DISABLED. It is already set to that when
"priv" is allocated.
Move setting the interface to a more specific location. It's supposed to be
overwritten if PHY muxing is detected.
Improve the comment which explains the process.
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240122-for-netnext-mt7530-improvements-1-v3-5-042401f2b279@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There's no logic to numerically order the CPU ports. Just state the port
number instead.
Remove the irrelevant PHY muxing information from
mt7530_mac_port_get_caps(). Explain the supported MII modes instead.
Remove the out of place PHY muxing information from
mt753x_phylink_mac_config(). The function is for MT7530, MT7531, and the
switch on the MT7988 SoC but there's no PHY muxing on MT7531 or the switch
on the MT7988 SoC.
These comments were gradually introduced with the commits below.
commit ca366d6c889b ("net: dsa: mt7530: Convert to PHYLINK API")
commit 38f790a80560 ("net: dsa: mt7530: Add support for port 5")
commit 88bdef8be9f6 ("net: dsa: mt7530: Extend device data ready for adding
a new hardware")
commit c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch")
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240122-for-netnext-mt7530-improvements-1-v3-4-042401f2b279@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce the p5_sgmii field to store the information for whether port 5
has got SGMII or not. Instead of reading the MT7531_TOP_SIG_SR register
multiple times, the register will be read once and the value will be
stored on the p5_sgmii field. This saves unnecessary reads of the
register.
Move the comment about MT7531AE and MT7531BE to mt7531_setup(), where the
switch is identified.
Get rid of mt7531_dual_sgmii_supported() now that priv->p5_sgmii stores the
information. Address the code where mt7531_dual_sgmii_supported() is used.
Get rid of mt7531_is_rgmii_port() which just prints the opposite of
priv->p5_sgmii.
Instead of calling mt7531_pll_setup() then returning, do not call it if
port 5 is SGMII.
Remove P5_INTF_SEL_GMAC5_SGMII. The p5_interface_select enum is supposed to
represent the mode that port 5 is being used in, not the hardware
information of port 5. Set p5_intf_sel to P5_INTF_SEL_GMAC5 instead, if
port 5 is not dsa_is_unused_port().
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240122-for-netnext-mt7530-improvements-1-v3-3-042401f2b279@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use the p5_interface_select enumeration as the data type for the
p5_intf_sel field. This ensures p5_intf_sel can only take the values
defined in the p5_interface_select enumeration.
Remove the explicit assignment of 0 to P5_DISABLED as the first enum item
is automatically assigned 0.
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240122-for-netnext-mt7530-improvements-1-v3-2-042401f2b279@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On the MT7530 switch, the CPU_PORT field indicates which CPU port to trap
frames to, regardless of the affinity of the inbound user port.
When multiple CPU ports are in use, if the DSA conduit interface is down,
trapped frames won't be passed to the conduit interface.
To make trapping frames work including this case, implement
ds->ops->conduit_state_change() on this subdriver and set the CPU_PORT
field to the numerically smallest CPU port whose conduit interface is up.
Introduce the active_cpu_ports field to store the information of the active
CPU ports. Correct the macros, CPU_PORT is bits 4 through 6 of the
register.
Add a comment to explain frame trapping for this switch.
Currently, the driver doesn't support the use of multiple CPU ports so this
is not necessarily a bug fix.
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240122-for-netnext-mt7530-improvements-1-v3-1-042401f2b279@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"22 hotfixes. 11 are cc:stable and the remainder address post-6.7
issues or aren't considered appropriate for backporting"
* tag 'mm-hotfixes-stable-2024-01-28-23-21' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
mm: thp_get_unmapped_area must honour topdown preference
mm: huge_memory: don't force huge page alignment on 32 bit
userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb
selftests/mm: ksm_tests should only MADV_HUGEPAGE valid memory
scs: add CONFIG_MMU dependency for vfree_atomic()
mm/memory: fix folio_set_dirty() vs. folio_mark_dirty() in zap_pte_range()
mm/huge_memory: fix folio_set_dirty() vs. folio_mark_dirty()
selftests/mm: Update va_high_addr_switch.sh to check CPU for la57 flag
selftests: mm: fix map_hugetlb failure on 64K page size systems
MAINTAINERS: supplement of zswap maintainers update
stackdepot: make fast paths lock-less again
stackdepot: add stats counters exported via debugfs
mm, kmsan: fix infinite recursion due to RCU critical section
mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
selftests/mm: switch to bash from sh
MAINTAINERS: add man-pages git trees
mm: memcontrol: don't throttle dying tasks on memory.high
mm: mmap: map MAP_STACK to VM_NOHUGEPAGE
uprobes: use pagesize-aligned virtual address when replacing pages
selftests/mm: mremap_test: fix build warning
...
|
|
Make sure the rpc timeout was assigned with the correct value for
initial timeout and max number of retries.
Fixes: 57331a59ac0d ("NFSv4.1: Use the nfs_client's rpc timeouts for backchannel")
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
All error handling paths, except this one, go to 'out' where
release_swfw_sync() is called.
This call balances the acquire_swfw_sync() call done at the beginning of
the function.
Branch to the error handling path in order to correctly release some
resources in case of error.
Fixes: ae14a1d8e104 ("ixgbe: Fix IOSF SB access issues")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The e1000e driver supports hardware with a variety of different clock
speeds, and thus a variety of different increment values used for
programming its PTP hardware clock.
The values currently programmed in e1000e_ptp_init are incorrect. In
particular, only two maximum adjustments are used: 24000000 - 1, and
600000000 - 1. These were originally intended to be used with the 96 MHz
clock and the 25 MHz clock.
Both of these values are actually slightly too high. For the 96 MHz clock,
the actual maximum value that can safely be programmed is 23,999,938. For
the 25 MHz clock, the maximum value is 599,999,904.
Worse, several devices use a 24 MHz clock or a 38.4 MHz clock. These parts
are incorrectly assigned one of either the 24million or 600million values.
For the 24 MHz clock, this is not a significant issue: its current
increment value can support an adjustment up to 7billion in the positive
direction. However, the 38.4 KHz clock uses an increment value which can
only support up to 230,769,157 before it starts overflowing.
To understand where these values come from, consider that frequency
adjustments have the form of:
new_incval = base_incval + (base_incval * adjustment) / (unit of adjustment)
The maximum adjustment is reported in terms of parts per billion:
new_incval = base_incval + (base_incval * adjustment) / 1 billion
The largest possible adjustment is thus given by the following:
max_incval = base_incval + (base_incval * max_adj) / 1 billion
Re-arranging to solve for max_adj:
max_adj = (max_incval - base_incval) * 1 billion / base_incval
We also need to ensure that negative adjustments cannot underflow. This can
be achieved simply by ensuring max_adj is always less than 1 billion.
Introduce new macros in e1000.h codifying the maximum adjustment in PPB for
each frequency given its associated increment values. Also clarify where
these values come from by commenting about the above equations.
Replace the switch statement in e1000e_ptp_init with one which mirrors the
increment value switch statement from e1000e_get_base_timinica. For each
device, assign the appropriate maximum adjustment based on its frequency.
Some parts can have one of two frequency modes as determined by
E1000_TSYNCRXCTL_SYSCFI.
Since the new flow directly matches the assignments in
e1000e_get_base_timinca, and uses well defined macro names, it is much
easier to verify that the resulting maximum adjustments are correct. It
also avoids difficult to parse construction such as the "hw->mac.type <
e1000_phc_lpt", and the use of fallthrough which was especially confusing
when combined with a conditional block.
Note that I believe the current increment value configuration used for
24MHz clocks is sub-par, as it leaves at least 3 extra bits available in
the INCVALUE register. However, fixing that requires more careful review of
the clock rate and associated values.
Reported-by: Trey Harrison <harrisondigitalmedia@gmail.com>
Fixes: 68fe1d5da548 ("e1000e: Add Support for 38.4MHZ frequency")
Fixes: d89777bf0e42 ("e1000e: add support for IEEE-1588 PTP")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
This reverts commit cca974daeb6c43ea971f8ceff5a7080d7d49ee30.
The added sanity check is incorrect. BUDMIN is not the wrong value and
is too small.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Same patch as previous one, but for ebtables.
To build a kernel that only supports ebtables-nft, the builtin tables
need to be disabled, i.e.:
CONFIG_BRIDGE_EBT_BROUTE=n
CONFIG_BRIDGE_EBT_T_FILTER=n
CONFIG_BRIDGE_EBT_T_NAT=n
The ebtables specific extensions can then be used nftables'
NFT_COMPAT interface.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Add hidden IP(6)_NF_IPTABLES_LEGACY symbol.
When any of the "old" builtin tables are enabled the "old" iptables
interface will be supported.
To disable the old set/getsockopt interface the existing options
for the builtin tables need to be turned off:
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_FILTER is not set
CONFIG_IP_NF_NAT is not set
CONFIG_IP_NF_MANGLE is not set
CONFIG_IP_NF_RAW is not set
CONFIG_IP_NF_SECURITY is not set
Same for CONFIG_IP6_NF_ variants.
This allows to build a kernel that only supports ip(6)tables-nft
(iptables-over-nftables api).
In the future the _LEGACY symbol will become visible and the select
statements will be turned into 'depends on', but for now be on safe side
so "make oldconfig" won't break things.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Allows to build kernel that supports the arptables mangle target
via nftables' compat infra but without the arptables get/setsockopt
interface or the old arptables filter interpreter.
IOW, setting IP_NF_ARPFILTER=n will break arptables-legacy, but
arptables-nft will continue to work as long as nftables compat
support is enabled.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Phil Sutter <phil@nwl.cc>
|
|
Use the new KMEM_CACHE() macro instead of direct kmem_cache_create
to simplify the creation of SLAB caches.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Acked-by: Simon Horman <horms@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Use the new KMEM_CACHE() macro instead of direct kmem_cache_create
to simplify the creation of SLAB caches.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
No need to refetch the flag from the netlink attribute, pass the
existing flags variable which already provide validated flags.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Allow a new process to take ownership of a previously owned table,
useful mostly for firewall management services restarting or suspending
when idle.
By extending __NFT_TABLE_F_UPDATE, the on/off/on check in
nf_tables_updtable() also covers table adoption, although it is actually
not needed: Table adoption is irreversible because nf_tables_updtable()
rejects attempts to drop NFT_TABLE_F_OWNER so table->nlpid setting can
happen just once within the transaction.
If the transaction commences, table's nlpid and flags fields are already
set and no further action is required. If it aborts, the table returns
to orphaned state.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
This companion flag to NFT_TABLE_F_OWNER requests the kernel to keep the
table around after the process has exited. It marks such table as
orphaned (by dropping OWNER flag but keeping PERSIST flag in place),
which opens it for other processes to manipulate. For the sake of
simplicity, PERSIST flag may not be altered though.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Add at least this one-liner describing the obvious.
Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
|