Age | Commit message (Collapse) | Author | Files | Lines |
|
We've switched all users to nvmem_get_mac_address(). Remove the now
dead code.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We already have of_get_nvmem_mac_address() but some non-DT systems want
to read the MAC address from NVMEM too. Implement a generalized routine
that takes struct device as argument.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
invalidated a lookup
Some users of rhashtables might need to move an object from one table
to another - this appears to be the reason for the incomplete usage
of NULLS markers.
To support these, we store a unique NULLS_MARKER at the end of
each chain, and when a search fails to find a match, we check
if the NULLS marker found was the expected one. If not, the search
may not have examined all objects in the target bucket, so it is
repeated.
The unique NULLS_MARKER is derived from the address of the
head of the chain. As this cannot be derived at load-time the
static rhnull in rht_bucket_nested() needs to be initialised
at run time.
Any caller of a lookup function must still be prepared for the
possibility that the object returned is in a different table - it
might have been there for some time.
Note that this does NOT provide support for other uses of
NULLS_MARKERs such as allocating with SLAB_TYPESAFE_BY_RCU or changing
the key of an object and re-inserting it in the same table.
These could only be done safely if new objects were inserted
at the *start* of a hash chain, and that is not currently the case.
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Existing functions to retreive the l3mdev of a device did not walk the
master chain to find the upper master. This patch adds a function to
find the l3mdev, even indirect through e.g. a bridge:
+----------+
| |
| vrf-blue |
| |
+----+-----+
|
|
+----+-----+
| |
| br-blue |
| |
+----+-----+
|
|
+----+-----+
| |
| eth0 |
| |
+----------+
This will properly resolve the l3mdev of eth0 to vrf-blue.
Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
UDP tunnel sockets are always opened unbound to a specific device. This
patch allow the socket to be bound on a custom device, which
incidentally makes UDP tunnels VRF-aware if binding to an l3mdev.
Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Many drivers load the device's firmware image during the initialization
flow either from the flash or from the disk. Currently this option is not
controlled by the user and the driver decides from where to load the
firmware image.
'fw_load_policy' gives the ability to control this option which allows the
user to choose between different loading policies supported by the driver.
This parameter can be useful while testing and/or debugging the device. For
example, testing a firmware bug fix.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The userspace may need to control the carrier state.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
the flowi* structures are used and memsetted by server functions
in critical path. Currently flowi_common has a couple of holes that
we can eliminate reordering the struct fields. As a side effect,
both flowi4 and flowi6 shrink by 8 bytes.
Before:
pahole -EC flowi_common
struct flowi_common {
// ...
/* size: 40, cachelines: 1, members: 10 */
/* sum members: 32, holes: 1, sum holes: 4 */
/* padding: 4 */
/* last cacheline: 40 bytes */
};
pahole -EC flowi6
struct flowi6 {
// ...
/* size: 88, cachelines: 2, members: 6 */
/* padding: 4 */
/* last cacheline: 24 bytes */
};
pahole -EC flowi4
struct flowi4 {
// ...
/* size: 56, cachelines: 1, members: 4 */
/* padding: 4 */
/* last cacheline: 56 bytes */
};
After:
struct flowi_common {
// ...
/* size: 32, cachelines: 1, members: 10 */
/* last cacheline: 32 bytes */
};
struct flowi6 {
// ...
/* size: 80, cachelines: 2, members: 6 */
/* padding: 4 */
/* last cacheline: 16 bytes */
};
struct flowi4 {
// ...
/* size: 48, cachelines: 1, members: 4 */
/* padding: 4 */
/* last cacheline: 48 bytes */
};
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Most of the doorbelling entities are outside of the core module.
L2 queues, Roce queues, iscsi and fcoe all need to register.
Make the APIs available for these drivers.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the database used to register doorbelling entities, and APIs for adding
and deleting entries, and logic for traversing the database and doorbelling
once on behalf of all entities.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Most linux hosts never setup TCP MD5 keys. We can avoid a
cache line miss (accessing tp->md5ig_info) on RX and TX
using a jump label.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case GRO is not as efficient as it should be or disabled,
we might have a user thread trapped in __release_sock() while
softirq handler flood packets up to the point we have to drop.
This patch balances work done from user thread and softirq,
to give more chances to __release_sock() to complete its work
before new packets are added the the backlog.
This also helps if we receive many ACK packets, since GRO
does not aggregate them.
This patch brings ~60% throughput increase on a receiver
without GRO, but the spectacular gain is really on
1000x release_sock() latency reduction I have measured.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jean-Louis Dupond reported poor iscsi TCP receive performance
that we tracked to backlog drops.
Apparently we fail to send window updates reflecting the
fact that we are under stress.
Note that we might lack a proper window increase when
backlog is fully processed, since __release_sock() clears
sk->sk_backlog.len _after_ all skbs have been processed.
This should not matter in practice. If we had a significant
load through socket backlog, we are in a dangerous
situation.
Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Tested-by: Jean-Louis Dupond<jean-louis@dupond.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tell the compiler that most TCP flows are using SACK these days.
There is no need to add the unlikely() clause in tcp_is_reno(),
the compiler is able to infer it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Trace events are already present for the receive entry points, to indicate
how the reception entered the stack.
This patch adds the corresponding exit trace events that will bound the
reception such that all events occurring between the entry and the exit
can be considered as part of the reception context. This greatly helps
for dependency and root cause analyses.
Without this, it is not possible with tracepoint instrumentation to
determine whether a sched_wakeup event following a netif_receive_skb
event is the result of the packet reception or a simple coincidence after
further processing by the thread. It is possible using other mechanisms
like kretprobes, but considering the "entry" points are already present,
it would be good to add the matching exit events.
In addition to linking packets with wakeups, the entry/exit event pair
can also be used to perform network stack latency analyses.
Signed-off-by: Geneviève Bastien <gbastien@versatic.net>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: David S. Miller <davem@davemloft.net>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> (tracing side)
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are no such structs flow_dissector_key_flow_vlan or
flow_dissector_key_flow_tags, the actual structs used are struct
flow_dissector_key_vlan and struct flow_dissector_key_tags. So correct the
comments against FLOW_DISSECTOR_KEY_VLAN, FLOW_DISSECTOR_KEY_FLOW_LABEL and
FLOW_DISSECTOR_KEY_CVLAN to refer to those.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
bpf-next 2018-11-30
The following pull-request contains BPF updates for your *net-next* tree.
(Getting out bit earlier this time to pull in a dependency from bpf.)
The main changes are:
1) Add libbpf ABI versioning and document API naming conventions
as well as ABI versioning process, from Andrey.
2) Add a new sk_msg_pop_data() helper for sk_msg based BPF
programs that is used in conjunction with sk_msg_push_data()
for adding / removing meta data to the msg data, from John.
3) Optimize convert_bpf_ld_abs() for 0 offset and fix various
lib and testsuite build failures on 32 bit, from David.
4) Make BPF prog dump for !JIT identical to how we dump subprogs
when JIT is in use, from Yonghong.
5) Rename btf_get_from_id() to make it more conform with libbpf
API naming conventions, from Martin.
6) Add a missing BPF kselftest config item, from Naresh.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Trivial conflict in net/core/filter.c, a locally computed
'sdif' is now an argument to the function.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds a BPF SK_MSG program helper so that we can pop data from a
msg. We use this to pop metadata from a previous push data call.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Pull networking fixes from David Miller:
1) ARM64 JIT fixes for subprog handling from Daniel Borkmann.
2) Various sparc64 JIT bug fixes (fused branch convergance, frame
pointer usage detection logic, PSEODU call argument handling).
3) Fix to use BH locking in nf_conncount, from Taehee Yoo.
4) Fix race of TX skb freeing in ipheth driver, from Bernd Eckstein.
5) Handle return value of TX NAPI completion properly in lan743x
driver, from Bryan Whitehead.
6) MAC filter deletion in i40e driver clears wrong state bit, from
Lihong Yang.
7) Fix use after free in rionet driver, from Pan Bian.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
s390/qeth: fix length check in SNMP processing
net: hisilicon: remove unexpected free_netdev
rapidio/rionet: do not free skb before reading its length
i40e: fix kerneldoc for xsk methods
ixgbe: recognize 1000BaseLX SFP modules as 1Gbps
i40e: Fix deletion of MAC filters
igb: fix uninitialized variables
netfilter: nf_tables: deactivate expressions in rule replecement routine
lan743x: Enable driver to work with LAN7431
tipc: fix lockdep warning during node delete
lan743x: fix return value for lan743x_tx_napi_poll
net: via: via-velocity: fix spelling mistake "alignement" -> "alignment"
qed: fix spelling mistake "attnetion" -> "attention"
net: thunderx: fix NULL pointer dereference in nic_remove
sctp: increase sk_wmem_alloc when head->truesize is increased
firestream: fix spelling mistake: "Inititing" -> "Initializing"
net: phy: add workaround for issue where PHY driver doesn't bind to the device
usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2
sparc: Adjust bpf JIT prologue for PSEUDO calls.
bpf, doc: add entries of who looks over which jits
...
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Disable BH while holding list spinlock in nf_conncount, from
Taehee Yoo.
2) List corruption in nf_conncount, also from Taehee.
3) Fix race that results in leaving around an empty list node in
nf_conncount, from Taehee Yoo.
4) Proper chain handling for inactive chains from the commit path,
from Florian Westphal. This includes a selftest for this.
5) Do duplicate rule handles when replacing rules, also from Florian.
6) Remove net_exit path in xt_RATEEST that results in splat, from Taehee.
7) Possible use-after-free in nft_compat when releasing extensions.
From Florian.
8) Memory leak in xt_hashlimit, from Taehee.
9) Call ip_vs_dst_notifier after ipv6_dev_notf, from Xin Long.
10) Fix cttimeout with udplite and gre, from Florian.
11) Preserve oif for IPv6 link-local generated traffic from mangle
table, from Alin Nastac.
12) Missing error handling in masquerade notifiers, from Taehee Yoo.
13) Use mutex to protect registration/unregistration of masquerade
extensions in order to prevent a race, from Taehee.
14) Incorrect condition check in tree_nodes_free(), also from Taehee.
15) Fix chain counter leak in rule replacement path, from Taehee.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Like the previous patch, the goal is to ease to convert nsids from one
netns to another netns.
A new attribute (NETNSA_CURRENT_NSID) is added to the kernel answer when
NETNSA_TARGET_NSID is provided, thus the user can easily convert nsids.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Like it was done for link and address, add the ability to perform get/dump
in another netns by specifying a target nsid attribute.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the new boolopt API to add an option which disables learning from
link-local packets. The default is kept as before and learning is
enabled. This is a simple map from a boolopt bit to a bridge private
flag that is tested before learning.
v2: pass NULL for extack via sysfs
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We have been adding many new bridge options, a big number of which are
boolean but still take up netlink attribute ids and waste space in the skb.
Recently we discussed learning from link-local packets[1] and decided
yet another new boolean option will be needed, thus introducing this API
to save some bridge nl space.
The API supports changing the value of multiple boolean options at once
via the br_boolopt_multi struct which has an optmask (which options to
set, bit per opt) and optval (options' new values). Future boolean
options will only be added to the br_boolopt_id enum and then will have
to be handled in br_boolopt_toggle/get. The API will automatically
add the ability to change and export them via netlink, sysfs can use the
single boolopt function versions to do the same. The behaviour with
failing/succeeding is the same as with normal netlink option changing.
If an option requires mapping to internal kernel flag or needs special
configuration to be enabled then it should be handled in
br_boolopt_toggle. It should also be able to retrieve an option's current
state via br_boolopt_get.
v2: WARN_ON() on unsupported option as that shouldn't be possible and
also will help catch people who add new options without handling
them for both set and get. Pass down extack so if an option desires
it could set it on error and be more user-friendly.
[1] https://www.spinics.net/lists/netdev/msg532698.html
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add types and macros for packed ring.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 838e96904ff3 ("bpf: Introduce bpf_func_info")
added bpf func info support. The userspace is able
to get better ksym's for bpf programs with jit, and
is able to print out func prototypes.
For a program containing func-to-func calls, the existing
implementation returns user specified number of function
calls and BTF types if jit is enabled. If the jit is not
enabled, it only returns the type for the main function.
This is undesirable. Interpreter may still be used
and we should keep feature identical regardless of
whether jit is enabled or not.
This patch fixed this discrepancy.
Fixes: 838e96904ff3 ("bpf: Introduce bpf_func_info")
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Make fetching of the BPF call address from ppc64 JIT generic. ppc64
was using a slightly different variant rather than through the insns'
imm field encoding as the target address would not fit into that space.
Therefore, the target subprog number was encoded into the insns' offset
and fetched through fp->aux->func[off]->bpf_func instead. Given there
are other JITs with this issue and the mechanism of fetching the address
is JIT-generic, move it into the core as a helper instead. On the JIT
side, we get information on whether the retrieved address is a fixed
one, that is, not changing through JIT passes, or a dynamic one. For
the former, JITs can optimize their imm emission because this doesn't
change jump offsets throughout JIT process.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Tested-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
register_{netdevice/inetaddr/inet6addr}_notifier may return an error
value, this patch adds the code to handle these error paths.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-11-26
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Extend BTF to support function call types and improve the BPF
symbol handling with this info for kallsyms and bpftool program
dump to make debugging easier, from Martin and Yonghong.
2) Optimize LPM lookups by making longest_prefix_match() handle
multiple bytes at a time, from Eric.
3) Adds support for loading and attaching flow dissector BPF progs
from bpftool, from Stanislav.
4) Extend the sk_lookup() helper to be supported from XDP, from Nitin.
5) Enable verifier to support narrow context loads with offset > 0
to adapt to LLVM code generation (currently only offset of 0 was
supported). Add test cases as well, from Andrey.
6) Simplify passing device functions for offloaded BPF progs by
adding callbacks to bpf_prog_offload_ops instead of ndo_bpf.
Also convert nfp and netdevsim to make use of them, from Quentin.
7) Add support for sock_ops based BPF programs to send events to
the perf ring-buffer through perf_event_output helper, from
Sowmini and Daniel.
8) Add read / write support for skb->tstamp from tc BPF and cg BPF
programs to allow for supporting rate-limiting in EDT qdiscs
like fq from BPF side, from Vlad.
9) Extend libbpf API to support map in map types and add test cases
for it as well to BPF kselftests, from Nikita.
10) Account the maximum packet offset accessed by a BPF program in
the verifier and use it for optimizing nfp JIT, from Jiong.
11) Fix error handling regarding kprobe_events in BPF sample loader,
from Daniel T.
12) Add support for queue and stack map type in bpftool, from David.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot was able to trigger the WARN in cttimeout_default_get() by
passing UDPLITE as l4protocol. Alias UDPLITE to UDP, both use
same timeout values.
Furthermore, also fetch GRE timeouts. GRE is a bit more complicated,
as it still can be a module and its netns_proto_gre struct layout isn't
visible outside of the gre module. Can't move timeouts around, it
appears conntrack sysctl unregister assumes net_generic() returns
nf_proto_net, so we get crash. Expose layout of netns_proto_gre instead.
A followup nf-next patch could make gre tracker be built-in as well
if needed, its not that large.
Last, make the WARN() mention the missing protocol value in case
anything else is missing.
Reported-by: syzbot+2fae8fa157dd92618cae@syzkaller.appspotmail.com
Fixes: 8866df9264a3 ("netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
I do not see how one can effectively use skb_insert() without holding
some kind of lock. Otherwise other cpus could have changed the list
right before we have a chance of acquiring list->lock.
Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this
one probably meant to use __skb_insert() since it appears nesqp->pau_list
is protected by nesqp->pau_lock. This looks like nesqp->pau_lock
could be removed, since nesqp->pau_list.lock could be used instead.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma <linux-rdma@vger.kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Similar to netdev_sent_queue add helper __netdev_sent_queue as variant
of __netdev_tx_sent_queue.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull dma-mapping fixes from Christoph Hellwig:
"Two dma-direct / swiotlb regressions fixes:
- zero is a valid physical address on some arm boards, we can't use
it as the error value
- don't try to cache flush the error return value (no matter what it
is)"
* tag 'dma-mapping-4.20-3' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: Skip cache maintenance on map error
dma-direct: Make DIRECT_MAPPING_ERROR viable for SWIOTLB
|
|
Pull XArray updates from Matthew Wilcox:
"We found some bugs in the DAX conversion to XArray (and one bug which
predated the XArray conversion). There were a couple of bugs in some
of the higher-level functions, which aren't actually being called in
today's kernel, but surfaced as a result of converting existing radix
tree & IDR users over to the XArray.
Some of the other changes to how the higher-level APIs work were also
motivated by converting various users; again, they're not in use in
today's kernel, so changing them has a low probability of introducing
a bug.
Dan can still trigger a bug in the DAX code with hot-offline/online,
and we're working on tracking that down"
* tag 'xarray-4.20-rc4' of git://git.infradead.org/users/willy/linux-dax:
XArray tests: Add missing locking
dax: Avoid losing wakeup in dax_lock_mapping_entry
dax: Fix huge page faults
dax: Fix dax_unlock_mapping_entry for PMD pages
dax: Reinstate RCU protection of inode
dax: Make sure the unlocking entry isn't locked
dax: Remove optimisation from dax_lock_mapping_entry
XArray tests: Correct some 64-bit assumptions
XArray: Correct xa_store_range
XArray: Fix Documentation
XArray: Handle NULL pointers differently for allocation
XArray: Unify xa_store and __xa_store
XArray: Add xa_store_bh() and xa_store_irq()
XArray: Turn xa_erase into an exported function
XArray: Unify xa_cmpxchg and __xa_cmpxchg
XArray: Regularise xa_reserve
nilfs2: Use xa_erase_irq
XArray: Export __xa_foo to non-GPL modules
XArray: Fix xa_for_each with a single element at 0
|
|
Return code should be formally "netdev_tx_t".
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- revert of the high-resolution scrolling feature, as it breaks certain
hardware due to incompatibilities between Logitech and Microsoft
worlds. Peter Hutterer is working on a fixed implementation. Until
that is finished, revert by Benjamin Tissoires.
- revert of incorrect strncpy->strlcpy conversion in uhid, from David
Herrmann
- fix for buggy sendfile() implementation on uhid device node, from
Eric Biggers
- a few assorted device-ID specific quirks
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
Revert "Input: Add the `REL_WHEEL_HI_RES` event code"
Revert "HID: input: Create a utility class for counting scroll events"
Revert "HID: logitech: Add function to enable HID++ 1.0 "scrolling acceleration""
Revert "HID: logitech: Enable high-resolution scrolling on Logitech mice"
Revert "HID: logitech: Use LDJ_DEVICE macro for existing Logitech mice"
Revert "HID: logitech: fix a used uninitialized GCC warning"
Revert "HID: input: simplify/fix high-res scroll event handling"
HID: Add quirk for Primax PIXART OEM mice
HID: i2c-hid: Disable runtime PM for LG touchscreen
HID: multitouch: Add pointstick support for Cirque Touchpad
HID: steam: remove input device when a hid client is running.
Revert "HID: uhid: use strlcpy() instead of strncpy()"
HID: uhid: forbid UHID_CREATE under KERNEL_DS or elevated privileges
HID: input: Ignore battery reported by Symbol DS4308
HID: Add quirk for Microsoft PIXART OEM mouse
|
|
Pull networking fixes from David Miller:
1) Need to take mutex in ath9k_add_interface(), from Dan Carpenter.
2) Fix mt76 build without CONFIG_LEDS_CLASS, from Arnd Bergmann.
3) Fix socket wmem accounting in SCTP, from Xin Long.
4) Fix failed resume crash in ena driver, from Arthur Kiyanovski.
5) qed driver passes bytes instead of bits into second arg of
bitmap_weight(). From Denis Bolotin.
6) Fix reset deadlock in ibmvnic, from Juliet Kim.
7) skb_scrube_packet() needs to scrub the fwd marks too, from Petr
Machata.
8) Make sure older TCP stacks see enough dup ACKs, and avoid doing SACK
compression during this period, from Eric Dumazet.
9) Add atomicity to SMC protocol cursor handling, from Ursula Braun.
10) Don't leave dangling error pointer if bpf_prog_add() fails in
thunderx driver, from Lorenzo Bianconi. Also, when we unmap TSO
headers, set sq->tso_hdrs to NULL.
11) Fix race condition over state variables in act_police, from Davide
Caratti.
12) Disable guest csum in the presence of XDP in virtio_net, from Jason
Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits)
net: gemini: Fix copy/paste error
net: phy: mscc: fix deadlock in vsc85xx_default_config
dt-bindings: dsa: Fix typo in "probed"
net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue
net: amd: add missing of_node_put()
team: no need to do team_notify_peers or team_mcast_rejoin when disabling port
virtio-net: fail XDP set if guest csum is negotiated
virtio-net: disable guest csum during XDP set
net/sched: act_police: add missing spinlock initialization
net: don't keep lonely packets forever in the gro hash
net/ipv6: re-do dad when interface has IFF_NOARP flag change
packet: copy user buffers before orphan or clone
ibmvnic: Update driver queues after change in ring size support
ibmvnic: Fix RX queue buffer cleanup
net: thunderx: set xdp_prog to NULL if bpf_prog_add fails
net/dim: Update DIM start sample after each DIM iteration
net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts
net/smc: use after free fix in smc_wr_tx_put_slot()
net/smc: atomic SMCD cursor handling
net/smc: add SMC-D shutdown signal
...
|
|
Drop switchdev_ops.switchdev_port_obj_add and _del. Drop the uses of
this field from all clients, which were migrated to use switchdev
notification in the previous patches.
Add a new function switchdev_port_obj_notify() that sends the switchdev
notifications SWITCHDEV_PORT_OBJ_ADD and _DEL.
Update switchdev_port_obj_del_now() to dispatch to this new function.
Drop __switchdev_port_obj_add() and update switchdev_port_obj_add()
likewise.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After the transition from switchdev operations to notifier chain (which
will take place in following patches), the onus is on the driver to find
its own devices below possible layer of LAG or other uppers.
The logic to do so is fairly repetitive: each driver is looking for its
own devices among the lowers of the notified device. For those that it
finds, it calls a handler. To indicate that the event was handled,
struct switchdev_notifier_port_obj_info.handled is set. The differences
lie only in what constitutes an "own" device and what handler to call.
Therefore abstract this logic into two helpers,
switchdev_handle_port_obj_add() and switchdev_handle_port_obj_del(). If
a driver only supports physical ports under a bridge device, it will
simply avoid this layer of indirection.
One area where this helper diverges from the current switchdev behavior
is the case of mixed lowers, some of which are switchdev ports and some
of which are not. Previously, such scenario would fail with -EOPNOTSUPP.
The helper could do that for lowers for which the passed-in predicate
doesn't hold. That would however break the case that switchdev ports
from several different drivers are stashed under one master, a scenario
that switchdev currently happily supports. Therefore tolerate any and
all unknown netdevices, whether they are backed by a switchdev driver
or not.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
An offloading driver may need to have access to switchdev events on
ports that aren't directly under its control. An example is a VXLAN port
attached to a bridge offloaded by a driver. The driver needs to know
about VLANs configured on the VXLAN device. However the VXLAN device
isn't stashed between the bridge and a front-panel-port device (such as
is the case e.g. for LAG devices), so the usual switchdev ops don't
reach the driver.
VXLAN is likely not the only device type like this: in theory any L2
tunnel device that needs offloading will prompt requirement of this
sort. This falsifies the assumption that only the lower devices of a
front panel port need to be notified to achieve flawless offloading.
A way to fix this is to give up the notion of port object addition /
deletion as a switchdev operation, which assumes somewhat tight coupling
between the message producer and consumer. And instead send the message
over a notifier chain.
To that end, introduce two new switchdev notifier types,
SWITCHDEV_PORT_OBJ_ADD and SWITCHDEV_PORT_OBJ_DEL. These notifier types
communicate the same event as the corresponding switchdev op, except in
a form of a notification. struct switchdev_notifier_port_obj_info was
added to carry the fields that the switchdev op carries. An additional
field, handled, will be used to communicate back to switchdev that the
event has reached an interested party, which will be important for the
two-phase commit.
The two switchdev operations themselves are kept in place. Following
patches first convert individual clients to the notifier protocol, and
only then are the operations removed.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In general one can't assume that a switchdev notifier is called in a
non-atomic context, and correspondingly, the switchdev notifier chain is
an atomic one.
However, port object addition and deletion messages are delivered from a
process context. Even the MDB addition messages, whose delivery is
scheduled from atomic context, are queued and the delivery itself takes
place in blocking context. For VLAN messages in particular, keeping the
blocking nature is important for error reporting.
Therefore introduce a blocking notifier chain and related service
functions to distribute the notifications for which a blocking context
can be assumed.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The two macros SWITCHDEV_OBJ_PORT_VLAN() and SWITCHDEV_OBJ_PORT_MDB()
expand to a container_of() call, yielding an appropriate container of
their sole argument. However, due to a name collision, the first
argument, i.e. the contained object pointer, is not the only one to get
expanded. The third argument, which is a structure member name, and
should be kept literal, gets expanded as well. The only safe way to use
these two macros is therefore to name the local variable passed to them
"obj".
To fix this, rename the sole argument of the two macros from
"obj" (which collides with the member name) to "OBJ". Additionally,
instead of passing "OBJ" to container_of() verbatim, parenthesize it, so
that a comma in the passed-in expression doesn't pollute the
container_of() invocation.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Check that the values received by the portal interrupt coalesce
change APIs are in range.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: Roy Pledge <roy.pledge@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tpacket_snd sends packets with user pages linked into skb frags. It
notifies that pages can be reused when the skb is released by setting
skb->destructor to tpacket_destruct_skb.
This can cause data corruption if the skb is orphaned (e.g., on
transmit through veth) or cloned (e.g., on mirror to another psock).
Create a kernel-private copy of data in these cases, same as tun/tap
zerocopy transmission. Reuse that infrastructure: mark the skb as
SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx).
Unlike other zerocopy packets, do not set shinfo destructor_arg to
struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify
when the original skb is released and a timestamp is recorded. Do not
change this timestamp behavior. The ubuf_info->callback is not needed
anyway, as no zerocopy notification is expected.
Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The
resulting value is not a valid ubuf_info pointer, nor a valid
tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this.
The fix relies on features introduced in commit 52267790ef52 ("sock:
add MSG_ZEROCOPY"), so can be backported as is only to 4.14.
Tested with from `./in_netns.sh ./txring_overwrite` from
http://github.com/wdebruij/kerneltools/tests
Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap")
Reported-by: Anand H. Krishnan <anandhkrishnan@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This could be used to rate limit egress traffic in concert with a qdisc
which supports Earliest Departure Time, such as FQ.
Write access from cg skb progs only with CAP_SYS_ADMIN, since the value
will be used by downstream qdiscs. It might make sense to relax this.
Changes v1 -> v2:
- allow access from cg skb, write only with CAP_SYS_ADMIN
Signed-off-by: Vlad Dumitrescu <vladum@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Perform CQ initialization in the driver when the capability is supported
by the FW. When passing the CQ to HW indicate that the CQ buffer has
been pre-initialized.
Doing so decreases CQ creation time. Testing on P8 showed a single 2048
entry CQ creation time was reduced from ~395us to ~170us, which is
2.3x faster.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On every iteration of net_dim, the algorithm may choose to
check for the system state by comparing current data sample
with previous data sample. After each of these comparison,
regardless of the action taken, the sample used as baseline
is needed to be updated.
This patch fixes a bug that causes DIM to take wrong decisions,
due to never updating the baseline sample for comparison between
iterations. This way, DIM always compares current sample with
zeros.
Although this is a functional fix, it also improves and stabilizes
performance as the algorithm works properly now.
Performance:
Tested single UDP TX stream with pktgen:
samples/pktgen/pktgen_sample03_burst_single_flow.sh -i p4p2 -d 1.1.1.1
-m 24:8a:07:88:26:8b -f 3 -b 128
ConnectX-5 100GbE packet rate improved from 15-19Mpps to 19-20Mpps.
Also, toggling between profiles is less frequent with the fix.
Fixes: 8115b750dbcb ("net/dim: use struct net_dim_sample as arg to net_dim")
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of small USB fixes for 4.20-rc4.
There's the usual xhci and dwc2/3 fixes as well as a few minor other
issues resolved for problems that have been reported. Full details are
in the shortlog.
All have been in linux-next for a while with no reported issues"
* tag 'usb-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: cdc-acm: add entry for Hiro (Conexant) modem
usb: xhci: Prevent bus suspend if a port connect change or polling state is detected
usb: core: Fix hub port connection events lost
usb: dwc3: gadget: fix ISOC TRB type on unaligned transfers
Revert "usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers"
usb: dwc2: pci: Fix an error code in probe
usb: dwc3: Fix NULL pointer exception in dwc3_pci_remove()
xhci: Add quirk to workaround the errata seen on Cavium Thunder-X2 Soc
usb: xhci: fix timeout for transition from RExit to U0
usb: xhci: fix uninitialized completion when USB3 port got wrong status
xhci: Add check for invalid byte size error when UAS devices are connected.
xhci: handle port status events for removed USB3 hcd
xhci: Fix leaking USB3 shared_hcd at xhci removal
USB: misc: appledisplay: add 20" Apple Cinema Display
USB: quirks: Add no-lpm quirk for Raydium touchscreens
usb: quirks: Add delay-init quirk for Corsair K70 LUX RGB
USB: Wait for extra delay time after USB_PORT_FEAT_RESET for quirky hub
usb: dwc3: gadget: Properly check last unaligned/zero chain TRB
usb: dwc3: core: Clean up ULPI device
|