Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch introduces the RX checksum function to check the
status of the hardware calculated checksum and its error and
appropriately convey status to the upper stack in skb->ip_summed
field.
In hardware, we only support checksum for the following
protocols:
1) IPv4,
2) TCP(over IPv4 or IPv6),
3) UDP(over IPv4 or IPv6),
4) SCTP(over IPv4 or IPv6)
but we support many L3(IPv4, IPv6, MPLS, PPPoE etc) and
L4(TCP, UDP, GRE, SCTP, IGMP, ICMP etc.) protocols.
Hardware limitation:
Our present hardware RX Descriptor lacks L3/L4 checksum
"Status & Error" bit (which usually can be used to indicate whether
checksum was calculated by the hardware and if there was any error
encountered during checksum calculation).
Software workaround:
We do get info within the RX descriptor about the kind of
L3/L4 protocol coming in the packet and the error status. These
errors might not just be checksum errors but could be related to
version, length of IPv4, UDP, TCP etc.
Because there is no-way of knowing if it is a L3/L4 error due
to bad checksum or any other L3/L4 error, we will not (cannot)
convey hardware checksum status(CHECKSUM_UNNECESSARY) for such
cases to upper stack and will not maintain the RX L3/L4 checksum
counters as well.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The budget split function requires the phy speed to be known.
While ndo open a phy speed identification is postponed till the
moment link is up. Hence, move it to appropriate callback, when link
is up.
Reported-by: Grygorii Strashko <grygorii.strashko@ti.com>
Fixes: 8feb0a196507 ("net: ethernet: ti: cpsw: split tx budget according between channels")
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Neal Cardwell says:
If I am reading the code correctly, then I would have two concerns:
1) Has that been tested? That seems like an extremely dramatic
decrease in cwnd. For example, if the cwnd is 80, and there are 40
ACKs, and half the ACKs are ECE marked, then my back-of-the-envelope
calculations seem to suggest that after just 11 ACKs the cwnd would be
down to a minimal value of 2 [..]
2) That seems to contradict another passage in the draft [..] where it
sazs:
Just as specified in [RFC3168], DCTCP does not react to congestion
indications more than once for every window of data.
Neal is right. Fortunately we don't have to complicate this by testing
vs. current rtt estimate, we can just revert the patch.
Normal stack already handles this for us: receiving ACKs with ECE
set causes a call to tcp_enter_cwr(), from there on the ssthresh gets
adjusted and prr will take care of cwnd adjustment.
Fixes: 4780566784b396 ("dctcp: update cwnd on congestion event")
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: rework reset and PPU code
Old Marvell chips (like 88E6060) don't have a PHY Polling Unit (PPU).
Next chips (like 88E6185) have a PPU, which has exclusive access to the
PHY registers, thus must be disabled before access.
Newer chips (like 88E6352) have an indirect mechanism to access the PHY
registers whenever, thus loose control over the PPU (always enabled).
Here's a summary:
Model | PPU? | Has PPU ctrl? | PPU state readable? | PHY access
----- | ---- | -------------- | ------------------- | ----------
6060 | no | no | no | direct
6185 | yes | yes, PPUEn bit | yes, PPUState 2-bit | direct w/ PPU dis.
6352 | yes | no | yes, PPUState 1-bit | indirect
6390 | yes | no | yes, InitState bit | indirect
Depending on the PPU control, a switch may have to restart the PPU when
resetting the switch. Once the switch is reset, we must wait for the PPU
state to be active polling again before accessing the registers.
For that purpose, add new operations to the chips to enable/disable the
PPU, and execute software reset. With these new ops in place, rework the
switch reset code and finally get rid of the MV88E6XXX_FLAG_PPU* flags.
Changes in v3:
- consider 6097 as 6352 (no PPU ops and use mv88e6352_g1_reset).
Changes in v2:
- wait in ppu/reset ops so that ppu_polling is not needed anymore.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some Marvell chips can enable/disable the PPU on demand. This is needed
to access the PHY registers when there is no indirection mechanism.
Add two new ppu_enable and ppu_disable ops to describe this and finally
get rid of the MV88E6XXX_FLAG_PPU* flags.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Marvell chips have different way to issue a software reset.
Old chips (such as 88E6060) have a reset bit in an ATU control register.
Newer chips moved this bit in a Global control register. Chips with
controllable PPU should reset the PPU when resetting the switch.
Add a new reset operation to implement these differences and introduce a
mv88e6xxx_software_reset() helper to wrap it conveniently.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add an helper to toggle the eventual GPIO connected to the reset pin.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before resetting a switch, the ports should be set to the Disabled state
and the transmit queues should be drained.
Add an helper to explicit that.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Lino Sanfilippo says:
====================
Gigabit ethernet driver for Alacritechs SLIC devices (v4)
this is the forth version of the slicoss gigabit ethernet driver (which is a
rework of the driver from Alacritech which can currently be found under
drivers/staging/slicoss). The driver is supposed to support Mojave, Oasis and
Kalahari cards, for both copper and fiber.
If this code is accepted the staging version can be removed.
The driver has been tested on a SEN2104ET adapter (4 Port PCIe copper).
v4:
- fix wrong driver name in Kconfig file (reported by Rami Rosen)
- remove unused variable from driver struct (reported by Rami Rosen)
- return "err" instead of 0 in slic_load_rcvseq_firmware() (reported by Rami Rosen)
- Fix typos in constants, comments and error message (reported by Markus Böhme)
- fix various warnings concerning signedness (reported by Markus Böhme)
- improve line formatting (reported by Markus Böhme)
- add comment describing the need for SLIC_MAX_TX_COMPLETIONS (suggested by Florian Fainelli)
- do not zero out complete rx descriptor (suggested by Florian Fainelli)
- add missing write barrier (reported by Florian Fainelli)
- remove unneeded assignment of net_device to skb (reported by Florian Fainelli)
- use napi_complete_done() instead of napi_complete (suggested by Florian Fainelli)
- use napi_schedule_irqoff() instead of napi_schedule (suggested by Florian Fainelli)
- do not map error returned by slic_init() to -ENOMEM
- do proper dma syncs before and after rx descriptor status is set to 0
- if after dma sync for CPU rx descriptor is not used return it to HW by means of dma sync for device
v3:
- dont add defines to pci_ids.h but instead put it into the drivers header file
(requested by Greg Kroah-Hartman)
v2:
- remove unusual padding in statistic strings (suggested by Andrew Lunn)
- for mdio register and bit names use defines from mii.h instead of own ones
(suggested by Andrew Lunn)
- remove unused defines
- ensure PCI flush at two more places
- use mmiowb before lock to prevent mmio writes leaking out of lock
- fix some typos in comments
- add copyright and GPL header
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add myself as maintainer for the slicoss ethernet driver.
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add driver for Alacritech gigabit ethernet cards with SLIC (session-layer
interface control) technology. The driver provides basic support without
SLIC for the following devices:
- Mojave cards (single port PCI Gigabit) both copper and fiber
- Oasis cards (single and dual port PCI-x Gigabit) copper and fiber
- Kalahari cards (dual and quad port PCI-e Gigabit) copper and fiber
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In UDP recvmsg() path we currently access 3 cache lines from an skb
while holding receive queue lock, plus another one if packet is
dequeued, since we need to change skb->next->prev
1st cache line (contains ->next/prev pointers, offsets 0x00 and 0x08)
2nd cache line (skb->len & skb->peeked, offsets 0x80 and 0x8e)
3rd cache line (skb->truesize/users, offsets 0xe0 and 0xe4)
skb->peeked is only needed to make sure 0-length packets are properly
handled while MSG_PEEK is operated.
I had first the intent to remove skb->peeked but the "MSG_PEEK at
non-zero offset" support added by Sam Kumar makes this not possible.
This patch avoids one cache line miss during the locked section, when
skb->len and skb->peeked do not have to be read.
It also avoids the skb_set_peeked() cost for non empty UDP datagrams.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dongpo Li says:
====================
net: hix5hd2_gmac: add tx sg feature and reset/clock control signals
The "hix5hd2" is SoC name, add the generic ethernet driver compatible string.
The "hisi-gemac-v1" is the basic version and "hisi-gemac-v2" adds
the SG/TXCSUM/TSO/UFO features.
This patch set only adds the SG(scatter-gather) driver for transmitting,
the drivers of other features will be submitted later.
Add the MAC reset control signals and clock signals.
We make these signals optional to be backward compatible with
the hix5hd2 SoC.
Changes in v2:
- Make the compatible string changes be a separate patch and
the most specific string come first than the generic string
as advised by Rob.
- Make the MAC reset control signals and clock signals optional
to be backward compatible with the hix5hd2 SoC.
- Change the compatible string and give the clock a specific name
in hix5hd2 dts file.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add gmac generic compatible and clock names.
Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add three reset control signals, "mac_core_rst", "mac_ifc_rst" and
"phy_rst".
The following diagram explained how the reset signals work.
SoC
|-----------------------------------------------------
| ------ |
| | cpu | |
| ------ |
| | |
| ------------ AMBA bus |
| GMAC | |
| ---------------------- |
| ------------- mac_core_rst | -------------- | |
| |clock and |-------------->| mac core | | |
| |reset | | -------------- | |
| |generator |---- | | | |
| ------------- | | ---------------- | |
| | ---------->| mac interface | | |
| | mac_ifc_rst | ---------------- | |
| | | | | |
| | | ------------------ | |
| |phy_rst | | RGMII interface | | |
| | | ------------------ | |
| | ---------------------- |
|----------|------------------------------------------|
| |
| ----------
|--------------------- |PHY chip |
----------
The "mac_core_rst" represents "mac core reset signal", it resets
the mac core including packet processing unit, descriptor processing unit,
tx engine, rx engine, control unit.
The "mac_ifc_rst" represents "mac interface reset signal", it resets
the mac interface. The mac interface unit connects mac core and
data interface like MII/RMII/RGMII. After we set a new value of
interface mode, we must reset mac interface to reload the new mode value.
The "mac_core_rst" and "mac_ifc_rst" are both optional to be
backward compatible with the hix5hd2 SoC.
The "phy_rst" represents "phy reset signal", it does a hardware reset
on the PHY chip. This reset signal is optional if the PHY can work well
without the hardware reset.
Add one more clock signal, the existing is MAC core clock,
and the new one is MAC interface clock.
The MAC interface clock is optional to be backward compatible with
the hix5hd2 SoC.
Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
"hisi-gemac-v2" adds the SG/TXCSUM/TSO/UFO features.
This patch only adds the SG(scatter-gather) driver for transmitting,
the drivers of other features will be submitted later.
Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The "hix5hd2" is SoC name, add the generic ethernet driver name.
The "hisi-gemac-v1" is the basic version and "hisi-gemac-v2" adds
the SG/TXCSUM/TSO/UFO features.
Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use DSA_TAG_PROTO_EDSA as tag_protocol for the mv88e6097. The
initialisation was missing before.
Fixes: a1f482aa8c33 ("net: dsa: mv88e6xxx: Move the tagging protocol into info")
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- direct packet read is allowed for LWT_*
- direct packet write for LWT_IN/LWT_OUT is prohibited
- direct packet write for LWT_XMIT is allowed
- access to skb->tc_classid is prohibited for LWT_*
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We found network manager is necessary on RHEL to make the synthetic
NIC, VF NIC bonding operations handled automatically. So, enabling
network manager here.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
Minor BPF cleanups and digest
First two patches are minor cleanups, and the third one adds
a prog digest. For details, please see individual patches.
After this one, I have a set with tracepoint support that makes
use of this facility as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When loading a BPF program via bpf(2), calculate the digest over
the program's instruction stream and store it in struct bpf_prog's
digest member. This is done at a point in time before any instructions
are rewritten by the verifier. Any unstable map file descriptor
number part of the imm field will be zeroed for the hash.
fdinfo example output for progs:
# cat /proc/1590/fdinfo/5
pos: 0
flags: 02000002
mnt_id: 11
prog_type: 1
prog_jited: 1
prog_digest: b27e8b06da22707513aa97363dfb11c7c3675d28
memlock: 4096
When programs are pinned and retrieved by an ELF loader, the loader
can check the program's digest through fdinfo and compare it against
one that was generated over the ELF file's program section to see
if the program needs to be reloaded. Furthermore, this can also be
exposed through other means such as netlink in case of a tc cls/act
dump (or xdp in future), but also through tracepoints or other
facilities to identify the program. Other than that, the digest can
also serve as a base name for the work in progress kallsyms support
of programs. The digest doesn't depend/select the crypto layer, since
we need to keep dependencies to a minimum. iproute2 will get support
for this facility.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 18cdb37ebf4c ("net: sched: do not use tcf_proto 'tp' argument from
call_rcu") removed the last usage of tp from cls_bpf_delete_prog(), so also
remove it from the function as argument to not give a wrong impression. tp
is illegal to access from this callback, since it could already have been
freed.
Refactor the deletion code a bit, so that cls_bpf_destroy() can call into
the same code for prog deletion as cls_bpf_delete() op, instead of having
it unnecessarily duplicated.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit d691f9e8d440 ("bpf: allow programs to write to certain skb
fields") pushed access type check outside of __is_valid_access()
to have different restrictions for socket filters and tc programs.
type is thus not used anymore within __is_valid_access() and should
be removed as a function argument. Same for __is_valid_xdp_access()
introduced by 6a773a15a1e8 ("bpf: add XDP prog type for early driver
filter").
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Florian Fainelli says:
====================
net: ethoc: Misc improvements
This patch series fixes/improves a few things:
- implement a proper PHYLIB adjust_link callback to set the duplex mode
accordingly
- do not open code the fetching of a MAC address in OF/DT environments
- demote an error message that occurs more frequently than expected in low
CPU/memory/bandwidth environments
Tested on a Cirrus Logic EP93xx / TS7300 board.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Spamming the console with: net eth1: packet dropped can happen
fairly frequently if the adapter is busy transmitting, demote the
message to a debug print.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Do not open code getting the MAC address exclusively from the
"local-mac-address" property, but instead use of_get_mac_address() which
looks up the MAC address using the 3 typical property names.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ethoc_mdio_poll() which is our PHYLIB adjust_link callback does nothing,
we should at least react to duplex changes and change MODER accordingly.
Speed changes is not a problem, since the OpenCores Ethernet core seems
to be reacting okay without us telling it.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
1) Old code was hard to maintain, due to complex lock chains.
(We probably will be able to remove some kfree_rcu() in callers)
2) Using a single timer to update all estimators does not scale.
3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity
is not supposed to work well)
In this rewrite :
- I removed the RB tree that had to be scanned in
gen_estimator_active(). qdisc dumps should be much faster.
- Each estimator has its own timer.
- Estimations are maintained in net_rate_estimator structure,
instead of dirtying the qdisc. Minor, but part of the simplification.
- Reading the estimator uses RCU and a seqcount to provide proper
support for 32bit kernels.
- We reduce memory need when estimators are not used, since
we store a pointer, instead of the bytes/packets counters.
- xt_rateest_mt() no longer has to grab a spinlock.
(In the future, xt_rateest_tg() could be switched to per cpu counters)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Check if the returned device from tcf_exts_get_dev function supports tc
offload and in case the rule can't be offloaded, set the filter hw_dev
parameter to the original device given by the user.
The filter hw_device parameter should always be set by fl_hw_replace_filter
function, since this pointer is used by dump stats and destroy
filter for each flower rule (offloaded or not).
Fixes: 7091d8c7055d ('net/sched: cls_flower: Add offload support using egress Hardware device')
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reported-by: Simon Horman <horms@verge.net.au>
Tested-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Made kernel accept IPv6 routes with IPv4-mapped address as next-hop.
It is possible to configure IP interfaces with IPv4-mapped addresses, and
one can add IPv6 routes for IPv4-mapped destinations/prefixes, yet prior
to this fix the kernel returned an EINVAL when attempting to add an IPv6
route with an IPv4-mapped address as a nexthop/gateway.
RFC 4798 (a proposed standard RFC) uses IPv4-mapped addresses as nexthops,
thus in order to support that type of address configuration the kernel
needs to allow IPv4-mapped addresses as nexthops.
Signed-off-by: Erik Nordmark <nordmark@arista.com>
Signed-off-by: Bob Gilligan <gilligan@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Occasionally, clang (e.g. version 3.8.1) translates a sum between two
constant operands using a BPF_OR instead of a BPF_ADD. The verifier is
currently not handling this scenario, and the destination register type
becomes UNKNOWN_VALUE even if it's still storing a constant. As a result,
the destination register cannot be used as argument to a helper function
expecting a ARG_CONST_STACK_*, limiting some use cases.
Modify the verifier to handle this case, and add a few tests to make sure
all combinations are supported, and stack boundaries are still verified
even with BPF_OR.
Signed-off-by: Gianluca Borello <g.borello@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement ethtooll::nway_restart by utilizing mii_nway_restart.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2016-12-03
Here's a set of Bluetooth & 802.15.4 patches for net-next (i.e. 4.10
kernel):
- Fix for a potential NULL deref in the ieee802154 netlink code
- Fix for the ED values of the at86rf2xx driver
- Documentation updates to ieee802154
- Cleanups to u8 vs __u8 usage
- Timer API usage cleanups in HCI drivers
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Eric Dumazet says:
====================
tcp: tsq: performance series
Under very high TX stress, CPU handling NIC TX completions can spend
considerable amount of cycles handling TSQ (TCP Small Queues) logic.
This patch series avoids some atomic operations, but most notable
patch is the 3rd one, allowing other cpus processing ACK packets and
calling tcp_write_xmit() to grab TCP_TSQ_DEFERRED so that
tcp_tasklet_func() can skip already processed sockets.
This avoid lots of lock acquisitions and cache lines accesses,
particularly under load.
In v2, I added :
- tcp_small_queue_check() change to allow 1st and 2nd packets
in write queue to be sent, even in the case TX completion of
already acknowledged packets did not happen yet.
This helps when TX completion coalescing parameters are set
even to insane values, and/or busy polling is used.
- A reorganization of struct sock fields to
lower false sharing and increase data locality.
- Then I moved tsq_flags from tcp_sock to struct sock also
to reduce cache line misses during TX completions.
I measured an overall throughput gain of 22 % for heavy TCP use
over a single TX queue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tsq_flags being in the same cache line than sk_wmem_alloc
makes a lot of sense. Both fields are changed from tcp_wfree()
and more generally by various TSQ related functions.
Prior patch made room in struct sock and added sk_tsq_flags,
this patch deletes tsq_flags from struct tcp_sock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Group fields used in TX path, and keep some cache lines mostly read
to permit sharing among cpus.
Gained two 4 bytes holes on 64bit arches.
Added a place holder for tcp tsq_flags, next to sk_wmem_alloc
to speed up tcp_wfree() in the following patch.
I have not added ____cacheline_aligned_in_smp, this might be done later.
I prefer doing this once inet and tcp/udp sockets reorg is also done.
Tested with both TCP and UDP.
UDP receiver performance under flood increased by ~20 % :
Accessing sk_filter/sk_wq/sk_napi_id no longer stalls because sk_drops
was moved away from a critical cache line, now mostly read and shared.
/* --- cacheline 4 boundary (256 bytes) --- */
unsigned int sk_napi_id; /* 0x100 0x4 */
int sk_rcvbuf; /* 0x104 0x4 */
struct sk_filter * sk_filter; /* 0x108 0x8 */
union {
struct socket_wq * sk_wq; /* 0x8 */
struct socket_wq * sk_wq_raw; /* 0x8 */
}; /* 0x110 0x8 */
struct xfrm_policy * sk_policy[2]; /* 0x118 0x10 */
struct dst_entry * sk_rx_dst; /* 0x128 0x8 */
struct dst_entry * sk_dst_cache; /* 0x130 0x8 */
atomic_t sk_omem_alloc; /* 0x138 0x4 */
int sk_sndbuf; /* 0x13c 0x4 */
/* --- cacheline 5 boundary (320 bytes) --- */
int sk_wmem_queued; /* 0x140 0x4 */
atomic_t sk_wmem_alloc; /* 0x144 0x4 */
long unsigned int sk_tsq_flags; /* 0x148 0x8 */
struct sk_buff * sk_send_head; /* 0x150 0x8 */
struct sk_buff_head sk_write_queue; /* 0x158 0x18 */
__s32 sk_peek_off; /* 0x170 0x4 */
int sk_write_pending; /* 0x174 0x4 */
long int sk_sndtimeo; /* 0x178 0x8 */
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adding a likely() in tcp_mtu_probe() moves its code which used to
be inlined in front of tcp_write_xmit()
We still have a cache line miss to access icsk->icsk_mtup.enabled,
we will probably have to reorganize fields to help data locality.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Always allow the two first skbs in write queue to be sent,
regardless of sk_wmem_alloc/sk_pacing_rate values.
This helps a lot in situations where TX completions are delayed either
because of driver latencies or softirq latencies.
Test is done with no cache line misses.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Under high load, tcp_wfree() has an atomic operation trying
to schedule a tasklet over and over.
We can schedule it only if our per cpu list was empty.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Under high stress, I've seen tcp_tasklet_func() consuming
~700 usec, handling ~150 tcp sockets.
By setting TCP_TSQ_DEFERRED in tcp_wfree(), we give a chance
for other cpus/threads entering tcp_write_xmit() to grab it,
allowing tcp_tasklet_func() to skip sockets that already did
an xmit cycle.
In the future, we might give to ACK processing an increased
budget to reduce even more tcp_tasklet_func() amount of work.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of atomically clear TSQ_THROTTLED and atomically set TSQ_QUEUED
bits, use one cmpxchg() to perform a single locked operation.
Since the following patch will also set TCP_TSQ_DEFERRED here,
this cmpxchg() will make this addition free.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is a cleanup, to ease code review of following patches.
Old 'enum tsq_flags' is renamed, and a new enumeration is added
with the flags used in cmpxchg() operations as opposed to
single bit operations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Michael Chan says:
====================
bnxt_en: Add DCBNL support.
This series adds DCBNL operations to support host-based IEEE DCBX.
v2: Updated to the latest firmware interface spec.
David, please consider this series for net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Report PFC statistics to ethtool -S and DCBNL.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Support only IEEE DCBX initially. Add IEEE DCBNL ops and functions to
get and set the hardware DCBX parameters. The DCB code is conditional on
Kconfig CONFIG_BNXT_DCB.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Latest interface has the latest DCB command structs. Get and store the
max number of lossless TCs the hardware can support.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new function bnxt_setup_mq_tc() to handle MQPRIO. This new function
will be called during ETS setup when we add DCBNL in the next patch.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
According to the documentation, the PHYs supported by this driver
can also support pause frames. Announce this to be so.
Tested with a TI83822I.
Acked-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|