summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-04-14Merge branch 'cxgb4-next'David S. Miller2-17/+62
Hariprasad Shenai says: ==================== cxgb4: Misc. fixes for sge Increases value of MAX_IMM_TX_PKT_LEN to improve latency, fill freelist starving threshold based on adapter type, add comments for tx flits and sge length code and don't call t4_slow_intr_handler when we are not master PF. This patch series has been created against net-next tree and includes patches on cxgb4 driver We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14cxgb4: Don't call t4_slow_intr_handler when we're not the Master PFHariprasad Shenai2-3/+6
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14cxgb4: Add comment for calculate tx flits and sge length codeHariprasad Shenai1-1/+35
Add comment for tx filt and sge length calucaltion code, also remove a hardcoded value Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14cxgb4: Use device node in page allocationHariprasad Shenai1-2/+4
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14cxgb4: Freelist starving threshold varies from adapter to adapterHariprasad Shenai1-10/+16
fl_starv_thres could be different from adapter to adapter, don't use hardcoded values Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14cxgb4: Increased the value of MAX_IMM_TX_PKT_LEN from 128 to 256 bytesHariprasad Shenai1-1/+1
This allows a significant latency drop for packets of sizes between 128 and 192 bytes Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14bgmac: drop ring->num_slotsFelix Fietkau2-15/+15
The ring size is always known at compile time, so make the code a bit more efficient Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14bgmac: fix DMA rx corruptionFelix Fietkau1-9/+18
The driver needs to inform the hardware about the first invalid (not yet filled) rx slot, by writing its DMA descriptor pointer offset to the BGMAC_DMA_RX_INDEX register. This register was set to a value exceeding the rx ring size, effectively allowing the hardware constant access to the full ring, regardless of which slots are initialized. To fix this issue, always mark the last filled rx slot as invalid. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14bgmac: simplify dma init/cleanupFelix Fietkau1-39/+40
Instead of allocating buffers at device init time and initializing descriptors at device open, do both at the same time (during open). Free all buffers when closing the device. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Acked-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14bgmac: increase rx ring size from 511 to 512Felix Fietkau1-1/+1
Limiting it to 511 looks like a failed attempt at leaving one descriptor empty to allow the hardware to stop processing a buffer that has not been prepared yet. However, this doesn't work because this affects the total ring size as well Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14bgmac: add check for oversized packetsFelix Fietkau1-0/+7
In very rare cases, the MAC can catch an internal buffer that is bigger than it's supposed to be. Instead of crashing the kernel, simply pass the buffer back to the hardware Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14bgmac: simplify/optimize rx DMA error handlingFelix Fietkau1-37/+34
Allocate a new buffer before processing the completed one. If allocation fails, reuse the old buffer. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Acked-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14bgmac: set received skb headroom to NET_SKB_PADFelix Fietkau2-7/+11
A packet buffer offset of 30 bytes is inefficient, because the first 2 bytes end up in a different cacheline. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14bgmac: leave interrupts disabled as long as there is work to doFelix Fietkau2-22/+10
Always poll rx and tx during NAPI poll instead of relying on the status of the first interrupt. This prevents bgmac_poll from leaving unfinished work around until the next IRQ. In my tests this makes bridging/routing throughput under heavy load more stable and ensures that no new IRQs arrive as long as bgmac_poll uses up the entire budget. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14bgmac: simplify tx ring index handlingFelix Fietkau2-29/+23
Keep incrementing ring->start and ring->end instead of pointing it to the actual ring slot entry. This simplifies the calculation of the number of free slots. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Acked-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14toshiba: Remove celleb from Kconfig optionsDaniel Axtens1-2/+2
The toshiba drivers had celleb as an optional dependency. celleb has been dropped [1], so clean that out of Kconfig. [1] http://patchwork.ozlabs.org/patch/451730/ CC: netdev@vger.kernel.org CC: Valentin Rothberg <valentinrothberg@gmail.com> CC: mpe@ellerman.id.au CC: linuxppc-dev@lists.ozlabs.org Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14hv_netvsc: Implement partial copy into send bufferHaiyang Zhang3-19/+46
If remaining space in a send buffer slot is too small for the whole message, we only copy the RNDIS header and PPI data into send buffer, so we can batch one more packet each time. It reduces the vmbus per-message overhead. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14Merge branch 'for-davem' of ↵David S. Miller87-387/+494
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Al Viro says: ==================== netdev-related stuff in vfs.git There are several commits sitting in vfs.git that probably ought to go in via net-next.git. First of all, there's merge with vfs.git#iocb - that's Christoph's aio rework, which has triggered conflicts with the ->sendmsg() and ->recvmsg() patches a while ago. It's not so much Christoph's stuff that ought to be in net-next, as (pretty simple) conflict resolution on merge. The next chunk is switch to {compat_,}import_iovec/import_single_range - new safer primitives for initializing iov_iter. The primitives themselves come from vfs/git#iov_iter (and they are used quite a lot in vfs part of queue), conversion of net/socket.c syscalls belongs in net-next, IMO. Next there's afs and rxrpc stuff from dhowells. And then there's sanitizing kernel_sendmsg et.al. + missing inlined helper for "how much data is left in msg->msg_iter" - this stuff is used in e.g. cifs stuff, but it belongs in net-next. That pile is pullable from git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-davem I'll post the individual patches in there in followups; could you take a look and tell if everything in there is OK with you? ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13tcp/dccp: get rid of central timewait timerEric Dumazet11-386/+69
Using a timer wheel for timewait sockets was nice ~15 years ago when memory was expensive and machines had a single processor. This does not scale, code is ugly and source of huge latencies (Typically 30 ms have been seen, cpus spinning on death_lock spinlock.) We can afford to use an extra 64 bytes per timewait sock and spread timewait load to all cpus to have better behavior. Tested: On following test, /proc/sys/net/ipv4/tcp_tw_recycle is set to 1 on the target (lpaa24) Before patch : lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 419594 lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 437171 While test is running, we can observe 25 or even 33 ms latencies. lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23 ... 1000 packets transmitted, 1000 received, 0% packet loss, time 20601ms rtt min/avg/max/mdev = 0.020/0.217/25.771/1.535 ms, pipe 2 lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23 ... 1000 packets transmitted, 1000 received, 0% packet loss, time 20702ms rtt min/avg/max/mdev = 0.019/0.183/33.761/1.441 ms, pipe 2 After patch : About 90% increase of throughput : lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 810442 lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 800992 And latencies are kept to minimal values during this load, even if network utilization is 90% higher : lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23 ... 1000 packets transmitted, 1000 received, 0% packet loss, time 19991ms rtt min/avg/max/mdev = 0.023/0.064/0.360/0.042 ms Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13netfilter: Fix format string of nfnetlink_log proc fileRichard Weinberger1-1/+1
The printed values are all of type unsigned integer, therefore use %u instead of %d. Otherwise an user can face negative values. Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13netfilter: Fix format string of nfnetlink_queue proc fileRichard Weinberger1-1/+1
The printed values are all of type unsigned integer, therefore use %u instead of %d. Otherwise an user can face negative values. Fixes: $ cat /proc/net/netfilter/nfnetlink_queue 0 29508 278 2 65531 0 2004213241 -2129885586 1 1 -27747 0 2 65531 0 0 0 1 2 -27748 0 2 65531 0 0 0 1 Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13netfilter: Fix portid typesRichard Weinberger2-6/+5
The netlink portid is an unsigned integer, use this type also in netfilter. Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13nfc: Fix portid type in urelease_workRichard Weinberger1-1/+1
portid is an unsigned integer. Fix urelease_work to match all other portid user in the kernel. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13netlink: Fix portid type in netlink_notifyRichard Weinberger1-1/+1
portid is an unsigned integer. Fix netlink_notify to match all other portid user in the kernel. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13tcp: fix bogus RTT for CC when retransmissions are ackedKenneth Klette Jonassen1-6/+4
Since retransmitted segments are not used for RTT estimation, previously SACKed segments present in the rtx queue are used. This estimation can be several times larger than the actual RTT. When a cumulative ack covers both previously SACKed and retransmitted segments, CC may thus get a bogus RTT. Such segments previously had an RTT estimation in tcp_sacktag_one(), so it seems reasonable to not reuse them in tcp_clean_rtx_queue() at all. Afaik, this has had no effect on SRTT/RTO because of Karn's check. Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13net: use jump label patching for ingress qdisc in __netif_receive_skb_coreDaniel Borkmann3-8/+47
Even if we make use of classifier and actions from the egress path, we're going into handle_ing() executing additional code on a per-packet cost for ingress qdisc, just to realize that nothing is attached on ingress. Instead, this can just be blinded out as a no-op entirely with the use of a static key. On input fast-path, we already make use of static keys in various places, e.g. skb time stamping, in RPS, etc. It makes sense to not waste time when we're assured that no ingress qdisc is attached anywhere. Enabling/disabling of that code path is being done via two helpers, namely net_{inc,dec}_ingress_queue(), that are being invoked under RTNL mutex when a ingress qdisc is being either initialized or destructed. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13Merge branch 'netdev_diet'David S. Miller3-29/+26
Thomas Graf says: ==================== Bring sizeof(net_device) down to < 2K bytes The size of struct net_device crossed the 2K boundary a while ago which is a waste in combination with many net namespaces. This series brings the size of struct net_device down to well below 2K in total size with a typical configuration. Some reserves a several holes leave room for further expansion. Before: /* size: 2176, cachelines: 34, members: 121 */ After: /* size: 1984, cachelines: 31, members: 120 */ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13net_device: Reorder members to fill holesThomas Graf1-24/+21
Some trivial reorders while preserving the RX/TX cache lines split to fill a couple of holes. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13e1000e: Move pm_qos_req to e1000e adapterThomas Graf3-5/+5
e1000e is the only driver requiring pm_qos_req, instead of causing every device to waste up to 240 bytes. Allocate it for the specific driver. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13selinux/nlmsg: add a build time check for rtnl/xfrm cmdsNicolas Dichtel1-0/+3
When a new rtnl or xfrm command is added, this part of the code is frequently missing. Let's help the developer with a build time test. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13Merge branch 'master' of ↵David S. Miller16-53/+505
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2015-04-11 This series contains updates to iflink, ixgbe and ixgbevf. The entire set of changes come from Vlad Zolotarov to ultimately add the ethtool ops to VF driver to allow querying the RSS indirection table and RSS random key. Currently we support only 82599 and x540 devices. On those devices, VFs share the RSS redirection table and hash key with a PF. Letting the VF query this information may introduce some security risks, therefore this feature will be disabled by default. The new netdev op allows a system administrator to change the default behaviour with "ip link set" command. The relevant iproute2 patch has already been sent and awaits for this series upstream. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13Merge branch 'fou-next'David S. Miller2-45/+186
Cong Wang says: ==================== fou: some fixes and updates Patch 1~3 fix some minor bugs in net/ipv4/fou.c, the only thing I am not sure is if it's too late to change the byte order of FOU_ATTR_PORT, if so we have to fix iproute2 instead of kernel. Patch 4~5 add some new features to make it complete. v2: make fou->port be16 too ==================== Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13fou: implement FOU_CMD_GETWANG Cong2-0/+110
Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13fou: add network namespace supportWANG Cong1-39/+67
Also convert the spinlock to a mutex. Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13fou: always use be16 for portWANG Cong1-3/+3
udp_config.local_udp_port is be16. And iproute2 passes network order for FOU_ATTR_PORT. This doesn't fix any bug, just for consistency. Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13fou: exit early when parsing config failsWANG Cong1-1/+4
Not a big deal, just for corretness. Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13fou: avoid calling udp_del_offload() twiceWANG Cong1-2/+2
This fixes the following harmless warning: ./ip/ip fou del port 7777 [ 122.907516] udp_del_offload: didn't find offload for port 7777 Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13Merge branch 'selinux_xfrm_nl_cmd'David S. Miller1-0/+3
Nicolas Dichtel says: ==================== selinux: add missing xfrm nl cmd With this series, xfrm commands are fully synchronized. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13selinux/nlmsg: add XFRM_MSG_MAPPINGNicolas Dichtel1-0/+1
This command is missing. Fixes: 3a2dfbe8acb1 ("xfrm: Notify changes in UDP encapsulation via netlink") CC: Martin Willi <martin@strongswan.org> Reported-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13selinux/nlmsg: add XFRM_MSG_MIGRATENicolas Dichtel1-0/+1
This command is missing. Fixes: 5c79de6e79cd ("[XFRM]: User interface for handling XFRM_MSG_MIGRATE") Reported-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13selinux/nlmsg: add XFRM_MSG_REPORTNicolas Dichtel1-0/+1
This command is missing. Fixes: 97a64b4577ae ("[XFRM]: Introduce XFRM_MSG_REPORT.") Reported-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13tcp: do not cache align timewait socketsEric Dumazet1-2/+1
With recent adoption of skc_cookie in struct sock_common, struct tcp_timewait_sock size increased from 192 to 200 bytes on 64bit arches. SLAB rounds then to 256 bytes. It is time to drop SLAB_HWCACHE_ALIGN constraint for twsk_slab. This saves about 12 MB of memory on typical configuration reaching 262144 timewait sockets, and has no noticeable impact on performance. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13Merge tag 'mac80211-next-for-davem-2015-04-10' of ↵David S. Miller27-406/+1249
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== There isn't much left, but we have * new mac80211 internal software queue to allow drivers to have shorter hardware queues and pull on-demand * use rhashtable for mac80211 station table * minstrel rate control debug improvements and some refactoring * fix noisy message about TX power reduction * fix continuous message printing and activity if CRDA doesn't respond * fix VHT-related capabilities with "iw connect" or "iwconfig ..." * fix Kconfig for cfg80211 wireless extensions compatibility ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13net/macb: sqe_test_errors are TX errors, not RX errorsWolfgang Steinwender1-2/+2
The statistics are grouped by TX and RX errors. The SQE Test Errors Register indicates problems with TX. Signed-off-by: Wolfgang Steinwender <wsteinwender@pcs.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-11new helper: msg_data_left()Al Viro8-23/+27
convert open-coded instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11Merge remote-tracking branch 'dh/afs' into for-davemAl Viro7-35/+166
2015-04-11get rid of the size argument of sock_sendmsg()Al Viro3-15/+16
it's equal to iov_iter_count(&msg->msg_iter) in all cases Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11ixgbevf: Add the appropriate ethtool ops to query RSS indirection table and keyVlad Zolotarov1-0/+69
Added get_rxfh_indir_size, get_rxfh_key_size and get_rxfh ethtool_ops callbacks implementations. This enables the ethtool's "-x" and "--show-rxfh[-indir]" options for VF devices. This patch adds the support for 82599 and x540 devices only. Support for other devices will be added later. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-11ixgbevf: Add RSS Key query codeVlad Zolotarov4-0/+57
Add the ixgbevf_get_rss_key() function that queries the PF for an RSS Random Key using a new VF-PF channel IXGBE_VF_GET_RSS_KEY command. This patch adds the support for 82599 and x540 devices only. Support for other devices will be added later. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-11ixgbe: Add GET_RSS_KEY command to VF-PF channel commands setVlad Zolotarov2-0/+22
For 82599 and x540 VFs and PF share the same RSS Key. Therefore we will return the same RSS key for all VFs. Support for other devices will be added later. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>