Age | Commit message (Collapse) | Author | Files | Lines |
|
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Filling in the padding slot in the bpf structure as a bug fix in 'ne'
overlapped with actually using that padding area for something in
'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
1) Infinite loop in _decode_session6(), from Eric Dumazet.
2) Pass correct argument to nla_strlcpy() in netfilter, also from Eric
Dumazet.
3) Out of bounds memory access in ipv6 srh code, from Mathieu Xhonneux.
4) NULL deref in XDP_REDIRECT handling of tun driver, from Toshiaki
Makita.
5) Incorrect idr release in cls_flower, from Paul Blakey.
6) Probe error handling fix in davinci_emac, from Dan Carpenter.
7) Memory leak in XPS configuration, from Alexander Duyck.
8) Use after free with cloned sockets in kcm, from Kirill Tkhai.
9) MTU handling fixes fo ip_tunnel and ip6_tunnel, from Nicolas
Dichtel.
10) Fix UAPI hole in bpf data structure for 32-bit compat applications,
from Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
bpf: fix uapi hole for 32 bit compat applications
net: usb: cdc_mbim: add flag FLAG_SEND_ZLP
ip6_tunnel: remove magic mtu value 0xFFF8
ip_tunnel: restore binding to ifaces with a large mtu
net: dsa: b53: Add BCM5389 support
kcm: Fix use-after-free caused by clonned sockets
net-sysfs: Fix memory leak in XPS configuration
ixgbe: fix parsing of TC actions for HW offload
net: ethernet: davinci_emac: fix error handling in probe()
net/ncsi: Fix array size in dumpit handler
cls_flower: Fix incorrect idr release when failing to modify rule
net/sonic: Use dma_mapping_error()
xfrm Fix potential error pointer dereference in xfrm_bundle_create.
vhost_net: flush batched heads before trying to busy polling
tun: Fix NULL pointer dereference in XDP redirect
be2net: Fix error detection logic for BE3
net: qmi_wwan: Add Netgear Aircard 779S
mlxsw: spectrum: Forbid creation of VLAN 1 over port/LAG
atm: zatm: fix memcmp casting
iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs
...
|
|
If there is a significant amount of chains list search is too slow, so
add an rhlist table for this.
This speeds up ruleset loading: for every new rule we have to check if
the name already exists in current generation.
We need to be able to cope with duplicate chain names in case a transaction
drops the nfnl mutex (for request_module) and the abort of this old
transaction is still pending.
The list is kept -- we need a way to iterate chains even if hash resize is
in progress without missing an entry.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This features which allows you to limit the maximum number of
connections per arbitrary key. The connlimit expression is stateful,
therefore it can be used from meters to dynamically populate a set, this
provides a mapping to the iptables' connlimit match. This patch also
comes that allows you define static connlimit policies.
This extension depends on the nf_conncount infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"A few final fixes:
i915:
- fix for potential Spectre vector in the new query uAPI
- fix NULL pointer deref (FDO #106559)
- DMI fix to hide LVDS for Radiant P845 (FDO #105468)
amdgpu:
- suspend/resume DC regression fix
- underscan flicker fix on fiji
- gamma setting fix after dpms
omap:
- fix oops regression
core:
- fix PSR timing
dw-hdmi:
- fix oops regression"
* tag 'drm-fixes-for-v4.17-rc8' of git://people.freedesktop.org/~airlied/linux:
drm/amd/display: Update color props when modeset is required
drm/amd/display: Make atomic-check validate underscan changes
drm/bridge/synopsys: dw-hdmi: fix dw_hdmi_setup_rx_sense
drm/amd/display: Fix BUG_ON during CRTC atomic check update
drm/i915/query: nospec expects no more than an unsigned long
drm/i915/query: Protect tainted function pointer lookup
drm/i915/lvds: Move acpi lid notification registration to registration phase
drm/i915: Disable LVDS on Radiant P845
drm/omap: fix NULL deref crash with SDI displays
drm/psr: Fix missed entry in PSR setup time table.
|
|
Before this patch, cloned expressions are released via ->destroy. This
is a problem for the new connlimit expression since the ->destroy path
drop a reference on the conntrack modules and it unregisters hooks. The
new ->destroy_clone provides context that this expression is being
released from the packet path, so it is mirroring ->clone(), where
neither module reference is dropped nor hooks need to be unregistered -
because this done from the control plane path from the ->init() path.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Use garbage collector to schedule removal of elements based of feedback
from expression that this element comes with. Therefore, the garbage
collector is not guided by timeout expirations in this new mode.
The new connlimit expression sets on the NFT_EXPR_GC flag to enable this
behaviour, the dynset expression needs to explicitly enable the garbage
collector via set->ops->gc_init call.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
nft_set_elem_destroy() can be called from call_rcu context. Annotate
netns and table in set object so we can populate the context object.
Moreover, pass context object to nf_tables_set_elem_destroy() from the
commit phase, since it is already available from there.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch provides an interface to maintain the list of connections and
the lookup function to obtain the number of connections in the list.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The new connlimit object needs this to properly deal with conntrack
dependencies.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The extracted functions will likely be usefull to implement tproxy
support in nf_tables.
Extrancted functions:
- nf_tproxy_sk_is_transparent
- nf_tproxy_laddr4
- nf_tproxy_handle_time_wait4
- nf_tproxy_get_sock_v4
- nf_tproxy_laddr6
- nf_tproxy_handle_time_wait6
- nf_tproxy_get_sock_v6
(nf_)tproxy_handle_time_wait6 also needed some refactor as its current
implementation was xtables-specific.
Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
There is a function in include/net/netfilter/nf_socket.h to decide if a
socket has IP(V6)_TRANSPARENT socket option set or not. However this
does the same as inet_sk_transparent() in include/net/tcp.h
include/net/tcp.h:1733
/* This helper checks if socket has IP_TRANSPARENT set */
static inline bool inet_sk_transparent(const struct sock *sk)
{
switch (sk->sk_state) {
case TCP_TIME_WAIT:
return inet_twsk(sk)->tw_transparent;
case TCP_NEW_SYN_RECV:
return inet_rsk(inet_reqsk(sk))->no_srccheck;
}
return inet_sk(sk)->transparent;
}
tproxy_sk_is_transparent has also been refactored to use this function
instead of reimplementing it.
Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull IIO driver fixes from Greg KH:
"Here are some old IIO driver fixes that were sitting in my tree for a
few weeks. Sorry about not getting them to you sooner. They fix a
number of small IIO driver issues that have been reported.
All of these have been in linux-next for a while with no reported
problems"
* tag 'staging-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: adc: select buffer for at91-sama5d2_adc
iio: hid-sensor-trigger: Fix sometimes not powering up the sensor after resume
iio: adc: at91-sama5d2_adc: fix channel configuration for differential channels
iio:kfifo_buf: check for uint overflow
iio:buffer: make length types match kfifo types
iio: adc: stm32-dfsdm: fix sample rate for div2 spi clock
iio: adc: stm32-dfsdm: fix successive oversampling settings
iio: ad7793: implement IIO_CHAN_INFO_SAMP_FREQ
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next
tree, the most relevant things in this batch are:
1) Compile masquerade infrastructure into NAT module, from Florian Westphal.
Same thing with the redirection support.
2) Abort transaction if early initialization of the commit phase fails.
Also from Florian.
3) Get rid of synchronize_rcu() by using rule array in nf_tables, from
Florian.
4) Abort nf_tables batch if fatal signal is pending, from Florian.
5) Use .call_rcu nfnetlink from nf_tables to make dumps fully lockless.
From Florian Westphal.
6) Support to match transparent sockets from nf_tables, from Máté Eckl.
7) Audit support for nf_tables, from Phil Sutter.
8) Validate chain dependencies from commit phase, fall back to fine grain
validation only in case of errors.
9) Attach dst to skbuff from netfilter flowtable packet path, from
Jason A. Donenfeld.
10) Use artificial maximum attribute cap to remove VLA from nfnetlink.
Patch from Kees Cook.
11) Add extension to allow to forward packets through neighbour layer.
12) Add IPv6 conntrack helper support to IPVS, from Julian Anastasov.
13) Add IPv6 FTP conntrack support to IPVS, from Julian Anastasov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In 64 bit, we have a 4 byte hole between ifindex and netns_dev in the
case of struct bpf_map_info but also struct bpf_prog_info. In net-next
commit b85fab0e67b ("bpf: Add gpl_compatible flag to struct bpf_prog_info")
added a bitfield into it to expose some flags related to programs. Thus,
add an unnamed __u32 bitfield for both so that alignment keeps the same
in both 32 and 64 bit cases, and can be naturally extended from there
as in b85fab0e67b.
Before:
# file test.o
test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
# pahole test.o
struct bpf_map_info {
__u32 type; /* 0 4 */
__u32 id; /* 4 4 */
__u32 key_size; /* 8 4 */
__u32 value_size; /* 12 4 */
__u32 max_entries; /* 16 4 */
__u32 map_flags; /* 20 4 */
char name[16]; /* 24 16 */
__u32 ifindex; /* 40 4 */
__u64 netns_dev; /* 44 8 */
__u64 netns_ino; /* 52 8 */
/* size: 64, cachelines: 1, members: 10 */
/* padding: 4 */
};
After (same as on 64 bit):
# file test.o
test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
# pahole test.o
struct bpf_map_info {
__u32 type; /* 0 4 */
__u32 id; /* 4 4 */
__u32 key_size; /* 8 4 */
__u32 value_size; /* 12 4 */
__u32 max_entries; /* 16 4 */
__u32 map_flags; /* 20 4 */
char name[16]; /* 24 16 */
__u32 ifindex; /* 40 4 */
/* XXX 4 bytes hole, try to pack */
__u64 netns_dev; /* 48 8 */
__u64 netns_ino; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
/* size: 64, cachelines: 1, members: 10 */
/* sum members: 60, holes: 1, sum holes: 4 */
};
Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
Fixes: 52775b33bb507 ("bpf: offload: report device information about offloaded maps")
Fixes: 675fc275a3a2d ("bpf: offload: report device information for offloaded programs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add support for FTP commands with extended format (RFC 2428):
- FTP EPRT: IPv4 and IPv6, active mode, similar to PORT
- FTP EPSV: IPv4 and IPv6, passive mode, similar to PASV.
EPSV response usually contains only port but we allow real
server to provide different address
We restrict control and data connection to be from same
address family.
Allow the "(" and ")" to be optional in PASV response.
Also, add ipvsh argument to the pkt_in/pkt_out handlers to better
access the payload after transport header.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This allows us to forward packets from the netdev family via neighbour
layer, so you don't need an explicit link-layer destination when using
this expression from rules. The ttl/hop_limit field is decremented.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The following ruleset:
add table ip filter
add chain ip filter input { type filter hook input priority 4; }
add chain ip filter ap
add rule ip filter input jump ap
add rule ip filter ap masquerade
results in a panic, because the masquerade extension should be rejected
from the filter chain. The existing validation is missing a chain
dependency check when the rule is added to the non-base chain.
This patch fixes the problem by walking down the rules from the
basechains, searching for either immediate or lookup expressions, then
jumping to non-base chains and again walking down the rules to perform
the expression validation, so we make sure the full ruleset graph is
validated. This is done only once from the commit phase, in case of
problem, we abort the transaction and perform fine grain validation for
error reporting. This patch requires 003087911af2 ("netfilter:
nfnetlink: allow commit to fail") to achieve this behaviour.
This patch also adds a cleanup callback to nfnl batch interface to reset
the validate state from the exit path.
As a result of this patch, nf_tables_check_loops() doesn't use
->validate to check for loops, instead it just checks for immediate
expressions.
Reported-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This extends log statement to support the behaviour achieved with
AUDIT target in iptables.
Audit logging is enabled via a pseudo log level 8. In this case any
other settings like log prefix are ignored since audit log format is
fixed.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Now it can only match the transparent flag of an ip/ipv6 socket.
Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In the quest to remove all stack VLA usage from the kernel[1], this
allocates the maximum size expected for all possible types and adds
sanity-checks at both registration and usage to make sure nothing gets
out of sync. This matches the proposed VLA solution for nfnetlink[2]. The
values chosen here were based on finding assignments for .maxtype and
.slave_maxtype and manually counting the enums:
slave_maxtype (max 33):
IFLA_BRPORT_MAX 33
IFLA_BOND_SLAVE_MAX 9
maxtype (max 45):
IFLA_BOND_MAX 28
IFLA_BR_MAX 45
__IFLA_CAIF_HSI_MAX 8
IFLA_CAIF_MAX 4
IFLA_CAN_MAX 16
IFLA_GENEVE_MAX 12
IFLA_GRE_MAX 25
IFLA_GTP_MAX 5
IFLA_HSR_MAX 7
IFLA_IPOIB_MAX 4
IFLA_IPTUN_MAX 21
IFLA_IPVLAN_MAX 3
IFLA_MACSEC_MAX 15
IFLA_MACVLAN_MAX 7
IFLA_PPP_MAX 2
__IFLA_RMNET_MAX 4
IFLA_VLAN_MAX 6
IFLA_VRF_MAX 2
IFLA_VTI_MAX 7
IFLA_VXLAN_MAX 28
VETH_INFO_MAX 2
VXCAN_INFO_MAX 2
This additionally changes maxtype and slave_maxtype fields to unsigned,
since they're only ever using positive values.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
[2] https://patchwork.kernel.org/patch/10439647/
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The FPGA queue pair (QP) event fires whenever a QP on the FPGA
transitions to the error state.
At this stage, this event is unrecoverable, it may become recoverable
in the future.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Temperature warning event is sent by FW to indicate high temperature
as detected by one of the sensors on the board.
Add handling of this event by writing the numbers of the alert sensors
to the kernel log.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
FRRouting installs routes into the kernel associated with
the originating protocol. Add these values to the well
known values in rtnetlink.h.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds driver changes for capturing the link change count in
ethtool statistics display.
Please consider applying this to "net-next".
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is additional to the
commit ea1627c20c34 ("tcp: minor optimizations around tcp_hdr() usage").
At this point, skb->data is same with tcp_hdr() as tcp header has not
been pulled yet. So use the less expensive one to get the tcp header.
Remove the third parameter of tcp_rcv_established() and put it into
the function body.
Furthermore, the local variables are listed as a reverse christmas tree :)
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
dw-hdmi: Fix Oops regression from rc1 (Neil)
Cc: Neil Armstrong <narmstrong@baylibre.com>
* tag 'drm-misc-fixes-2018-05-30' of git://anongit.freedesktop.org/drm/drm-misc:
drm/bridge/synopsys: dw-hdmi: fix dw_hdmi_setup_rx_sense
|
|
The dw_hdmi_setup_rx_sense exported function should not use struct device
to recover the dw-hdmi context using drvdata, but take struct dw_hdmi
directly like other exported functions.
This caused a regression using Meson DRM on S905X since v4.17-rc1 :
Internal error: Oops: 96000007 [#1] PREEMPT SMP
[...]
CPU: 0 PID: 124 Comm: irq/32-dw_hdmi_ Not tainted 4.17.0-rc7 #2
Hardware name: Libre Technology CC (DT)
[...]
pc : osq_lock+0x54/0x188
lr : __mutex_lock.isra.0+0x74/0x530
[...]
Process irq/32-dw_hdmi_ (pid: 124, stack limit = 0x00000000adf418cb)
Call trace:
osq_lock+0x54/0x188
__mutex_lock_slowpath+0x10/0x18
mutex_lock+0x30/0x38
__dw_hdmi_setup_rx_sense+0x28/0x98
dw_hdmi_setup_rx_sense+0x10/0x18
dw_hdmi_top_thread_irq+0x2c/0x50
irq_thread_fn+0x28/0x68
irq_thread+0x10c/0x1a0
kthread+0x128/0x130
ret_from_fork+0x10/0x18
Code: 34000964 d00050a2 51000484 9135c042 (f864d844)
---[ end trace 945641e1fbbc07da ]---
note: irq/32-dw_hdmi_[124] exited with preempt_count 1
genirq: exiting task "irq/32-dw_hdmi_" (124) is an active IRQ thread (irq 32)
Fixes: eea034af90c6 ("drm/bridge/synopsys: dw-hdmi: don't clobber drvdata")
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Tested-by: Koen Kooi <koen@dominion.thruhere.net>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/1527673438-20643-1-git-send-email-narmstrong@baylibre.com
|
|
skb->len is meaningless to user.
data length could be more helpful, with which we can easily filter out
the packet without payload.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for IFA_RT_PRIORITY to ipv6 addresses.
If the metric is changed on an existing address then the new route
is inserted before removing the old one. Since the metric is one
of the route keys, the prefix route can not be atomically replaced.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for IFA_RT_PRIORITY to ipv4 addresses.
If the metric is changed on an existing address then the new route
is inserted before removing the old one. Since the metric is one
of the route keys, the prefix route can not be replaced.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, if two interfaces have addresses in the same connected route,
then the order of the prefix route entries is determined by the order in
which the addresses are assigned or the links brought up.
Add IFA_RT_PRIORITY to allow user to specify the metric of the prefix
route associated with an address giving interface managers better
control of the order of prefix routes.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move config parameters for adding an ipv6 address to a struct. struct
names stem from inet6_rtm_newaddr which is the modern handler for
adding an address.
Start the conversion to ifa6_config with ipv6_add_addr. This is an argument
move only; no functional change intended. Mapping of variable changes:
addr --> cfg->pfx
peer_addr --> cfg->peer_pfx
pfxlen --> cfg->plen
flags --> cfg->ifa_flags
scope, valid_lft, prefered_lft have the same names within cfg
(with corrected spelling).
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
MQ doesn't hold any statistics on its own, however, statistic
from offloads are requested starting from the root, hence MQ
will read the old values for its sums. Call into the drivers,
because of the additive nature of the stats drivers are aware
of how much "pending updates" they have to children of the MQ.
Since MQ reset its stats on every dump we can simply offset
the stats, predicting how stats of offloaded children will
change.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mq offload is trivial, we just need to let the device know
that the root qdisc is mq. Alternative approach would be
to export qdisc_lookup() and make drivers check the root
type themselves, but notification via ndo_setup_tc is more
in line with other qdiscs.
Note that mq doesn't hold any stats on it's own, it just
adds up stats of its children.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
AFAICT struct gnet_stats_queue.qlen is not used in Qdiscs.
It may, however, be useful for offloads to report HW queue
length there. Add that value to the result of qdisc_qlen_sum().
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5e-updates-2018-05-25
This series includes updates for mlx5e netdev driver.
1) Allowr flow based VF vport mirroring under sriov switchdev scheme,
added support for offloading the TC mirred mirror sub-action, from
Chris Mi.
=================
From: Or Gerlitz <ogerlitz@mellanox.com>
The user will typically set the actions order such that the mirror
port (mirror VF) sees packets as the original port (VF under
mirroring) sent them or as it will receive them. In the general case,
it means that packets are potentially sent to the mirror port before
or after some actions were applied on them.
To properly do that, we follow on the exact action order as set for
the flow and make sure this will also be the case when we program the
HW offload.
If all the actions should apply before forwarding to the mirror and dest port,
mirroring is just multicasting to the two vports. Otherwise, we split
the TC flow to two HW rules, where the 1st applies only the actions
needed up to the mirror (if there are such) and the 2nd the rest of
the actions plus the forwarding to the dest vport.
=================
2) Move to order-0 only allocations (using fragmented work queues) for all
work queues used by the driver, RX and TX descriptor rings
(RQs, SQs and Completion Queues (CQs)), from Tariq Toukan.
3) Avoid resetting netdevice statistics on netdevice
state changes, from Eran Ben Elisha.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
synchronize_rcu() is expensive.
The commit phase currently enforces an unconditional
synchronize_rcu() after incrementing the generation counter.
This is to make sure that a packet always sees a consistent chain, either
nft_do_chain is still using old generation (it will skip the newly added
rules), or the new one (it will skip old ones that might still be linked
into the list).
We could just remove the synchronize_rcu(), it would not cause a crash but
it could cause us to evaluate a rule that was removed and new rule for the
same packet, instead of either-or.
To resolve this, add rule pointer array holding two generations, the
current one and the future generation.
In commit phase, allocate the rule blob and populate it with the rules that
will be active in the new generation.
Then, make this rule blob public, replacing the old generation pointer.
Then the generation counter can be incremented.
nft_do_chain() will either continue to use the current generation
(in case loop was invoked right before increment), or the new one.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The struct Qdisc has a lot of holes, especially after commit
a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback"),
which as a side effect, moved the fields just after 'busylock'
on a new cacheline.
Since both 'padded' and 'refcnt' are not updated frequently, and
there is a hole before 'gso_skb', we can move such fields there,
saving a cacheline without any performance side effect.
Before this commit:
pahole -C Qdisc net/sche/sch_generic.o
# ...
/* size: 384, cachelines: 6, members: 25 */
/* sum members: 236, holes: 3, sum holes: 92 */
/* padding: 56 */
After this commit:
pahole -C Qdisc net/sche/sch_generic.o
# ...
/* size: 320, cachelines: 5, members: 25 */
/* sum members: 236, holes: 2, sum holes: 28 */
/* padding: 56 */
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch is to move some definitions in ptp_qoriq.c
to the header file.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This feature bit can be used by hypervisor to indicate virtio_net device to
act as a standby for another device with the same MAC address.
VIRTIO_NET_F_STANDBY is defined as bit 62 as it is a device feature bit.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The net_failover driver provides an automated failover mechanism via APIs
to create and destroy a failover master netdev and manages a primary and
standby slave netdevs that get registered via the generic failover
infrastructure.
The failover netdev acts a master device and controls 2 slave devices. The
original paravirtual interface gets registered as 'standby' slave netdev and
a passthru/vf device with the same MAC gets registered as 'primary' slave
netdev. Both 'standby' and 'failover' netdevs are associated with the same
'pci' device. The user accesses the network interface via 'failover' netdev.
The 'failover' netdev chooses 'primary' netdev as default for transmits when
it is available with link up and running.
This can be used by paravirtual drivers to enable an alternate low latency
datapath. It also enables hypervisor controlled live migration of a VM with
direct attached VF by failing over to the paravirtual datapath when the VF
is unplugged.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The failover module provides a generic interface for paravirtual drivers
to register a netdev and a set of ops with a failover instance. The ops
are used as event handlers that get called to handle netdev register/
unregister/link change/name change events on slave pci ethernet devices
with the same mac address as the failover netdev.
This enables paravirtual drivers to use a VF as an accelerated low latency
datapath. It also allows migration of VMs with direct attached VFs by
failing over to the paravirtual datapath when the VF is unplugged.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These have to be included always when nf_socket.h is included.
Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Lots of easy overlapping changes in the confict
resolutions here.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"Three fixes for scheduler and kthread code:
- allow calling kthread_park() on an already parked thread
- restore the sched_pi_setprio() tracepoint behaviour
- clarify the unclear string for the scheduling domain debug output"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched, tracing: Fix trace_sched_pi_setprio() for deboosting
kthread: Allow kthread_park() on a parked kthread
sched/topology: Clarify root domain(s) debug string
|
|
Merge misc fixes from Andrew Morton:
"16 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kasan: fix memory hotplug during boot
kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
checkpatch: fix macro argument precedence test
init/main.c: include <linux/mem_encrypt.h>
kernel/sys.c: fix potential Spectre v1 issue
mm/memory_hotplug: fix leftover use of struct page during hotplug
proc: fix smaps and meminfo alignment
mm: do not warn on offline nodes unless the specific node is explicitly requested
mm, memory_hotplug: make has_unmovable_pages more robust
mm/kasan: don't vfree() nonexistent vm_area
MAINTAINERS: change hugetlbfs maintainer and update files
ipc/shm: fix shmat() nil address after round-down when remapping
Revert "ipc/shm: Fix shmat mmap nil-page protection"
idr: fix invalid ptr dereference on item delete
ocfs2: revert "ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio"
mm: fix nr_rotate_swap leak in swapon() error case
|
|
Pull networking fixes from David Miller:
"Let's begin the holiday weekend with some networking fixes:
1) Whoops need to restrict cfg80211 wiphy names even more to 64
bytes. From Eric Biggers.
2) Fix flags being ignored when using kernel_connect() with SCTP,
from Xin Long.
3) Use after free in DCCP, from Alexey Kodanev.
4) Need to check rhltable_init() return value in ipmr code, from Eric
Dumazet.
5) XDP handling fixes in virtio_net from Jason Wang.
6) Missing RTA_TABLE in rtm_ipv4_policy[], from Roopa Prabhu.
7) Need to use IRQ disabling spinlocks in mlx4_qp_lookup(), from Jack
Morgenstein.
8) Prevent out-of-bounds speculation using indexes in BPF, from
Daniel Borkmann.
9) Fix regression added by AF_PACKET link layer cure, from Willem de
Bruijn.
10) Correct ENIC dma mask, from Govindarajulu Varadarajan.
11) Missing config options for PMTU tests, from Stefano Brivio"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits)
ibmvnic: Fix partial success login retries
selftests/net: Add missing config options for PMTU tests
mlx4_core: allocate ICM memory in page size chunks
enic: set DMA mask to 47 bit
ppp: remove the PPPIOCDETACH ioctl
ipv4: remove warning in ip_recv_error
net : sched: cls_api: deal with egdev path only if needed
vhost: synchronize IOTLB message with dev cleanup
packet: fix reserve calculation
net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands
net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
bpf: properly enforce index mask to prevent out-of-bounds speculation
net/mlx4: Fix irq-unsafe spinlock usage
net: phy: broadcom: Fix bcm_write_exp()
net: phy: broadcom: Fix auxiliary control register reads
net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message
ibmvnic: Only do H_EOI for mobility events
tuntap: correctly set SOCKWQ_ASYNC_NOSPACE
virtio-net: fix leaking page for gso packet during mergeable XDP
...
|
|
The case of a new numa node got missed in avoiding using the node info
from page_struct during hotplug. In this path we have a call to
register_mem_sect_under_node (which allows us to specify it is hotplug
so don't change the node), via link_mem_sections which unfortunately
does not.
Fix is to pass check_nid through link_mem_sections as well and disable
it in the new numa node path.
Note the bug only 'sometimes' manifests depending on what happens to be
in the struct page structures - there are lots of them and it only needs
to match one of them.
The result of the bug is that (with a new memory only node) we never
successfully call register_mem_sect_under_node so don't get the memory
associated with the node in sysfs and meminfo for the node doesn't
report it.
It came up whilst testing some arm64 hotplug patches, but appears to be
universal. Whilst I'm triggering it by removing then reinserting memory
to a node with no other elements (thus making the node disappear then
appear again), it appears it would happen on hotplugging memory where
there was none before and it doesn't seem to be related the arm64
patches.
These patches call __add_pages (where most of the issue was fixed by
Pavel's patch). If there is a node at the time of the __add_pages call
then all is well as it calls register_mem_sect_under_node from there
with check_nid set to false. Without a node that function returns
having not done the sysfs related stuff as there is no node to use.
This is expected but it is the resulting path that fails...
Exact path to the problem is as follows:
mm/memory_hotplug.c: add_memory_resource()
The node is not online so we enter the 'if (new_node)' twice, on the
second such block there is a call to link_mem_sections which calls
into
drivers/node.c: link_mem_sections() which calls
drivers/node.c: register_mem_sect_under_node() which calls
get_nid_for_pfn and keeps trying until the output of that matches
the expected node (passed all the way down from
add_memory_resource)
It is effectively the same fix as the one referred to in the fixes tag
just in the code path for a new node where the comments point out we
have to rerun the link creation because it will have failed in
register_new_memory (as there was no node at the time). (actually that
comment is wrong now as we don't have register_new_memory any more it
got renamed to hotplug_memory_register in Pavel's patch).
Link: http://lkml.kernel.org/r/20180504085311.1240-1-Jonathan.Cameron@huawei.com
Fixes: fc44f7f9231a ("mm/memory_hotplug: don't read nid from struct page during hotplug")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|