summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-05-09can: mcp251x: Replace create_freezable_workqueue with alloc_workqueueAmitoj Kaur Chawla1-1/+2
Replace scheduled to be removed create_freezable_workqueue with alloc_workqueue. priv->wq should be explicitly set as freezable to ensure it is frozen in the suspend sequence and work items are drained so that no new work item starts execution until thawed. Thus, use of WQ_FREEZABLE flag here is required. WQ_MEM_RECLAIM flag has been set here to ensure forward progress regardless of memory pressure. The order of execution is not important so set @max_active as 0. Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2016-05-09can: sja1000: plx_pci: Add support for Marathon CAN-bus-PCIe cardNikita Edward Baruzdin1-8/+56
This patch adds support for the Marathon CAN-bus-PCIe card to the sja1000 driver. For more information see: http://can.marathon.ru/page/devices/can-bus-pcie Signed-off-by: Nikita Edward Baruzdin <nebaruzdin@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2016-05-09can: sja1000: Fix error location forwardingAlexander Gerasiov1-1/+5
According to SJA1000 documentation the location of error is available regardless of an error type. Therefore it should always be forwarded to SocketCAN. Signed-off-by: Nikita Edward Baruzdin <nebaruzdin@lvk.cs.msu.su> Signed-off-by: Alexander GQ Gerasiov <gq@cs.msu.su> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2016-05-09Merge branch 'mlx5-build-fix'David S. Miller5-4/+24
Saeed Mahameed says: ==================== net/mlx5e: Kconfig fixes for VxLAN Reposting to net the build errors fixes posted by Arnd last week. Originally Arnd posted those fixes to net-next, while the issue is also seen in net. For net-next a different approach is required for fixing the issue as VXLAN and Device Drivers are no longer dependent, but there is no harm for those fixes to get into net-next. Optionally, once net is merged into net-next we can Revert "net/mlx5e: make VXLAN support conditional" as the CONFIG_MLX5_CORE_EN_VXLAN will no longer be required. Applied on top: 288928658583 ('mlxsw: spectrum: Add missing rollback in flood configuration') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09net/mlx5e: make VXLAN support conditionalArnd Bergmann5-3/+24
VXLAN can be disabled at compile-time or it can be a loadable module while mlx5 is built-in, which leads to a link error: drivers/net/built-in.o: In function `mlx5e_create_netdev': ntb_netdev.c:(.text+0x106de4): undefined reference to `vxlan_get_rx_port' This avoids the link error and makes the vxlan code optional, like the other ethernet drivers do as well. Link: https://patchwork.ozlabs.org/patch/589296/ Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09Revert "net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue"Arnd Bergmann1-1/+0
This reverts commit 69976fb1045850a742deb9790ea49cbc6f497531. We cannot select VXLAN when IPv4 support is disabled, that just gives us additional build errors, including: warning: (MLX5_CORE_EN) selects VXLAN which has unmet direct dependencies (NETDEVICES && NET_CORE && INET) In file included from ../drivers/net/vxlan.c:36:0: include/net/udp_tunnel.h: In function 'udp_tunnel_handle_offloads': include/net/udp_tunnel.h:112:9: error: implicit declaration of function 'iptunnel_handle_offloads' [-Werror=implicit-function-declaration] return iptunnel_handle_offloads(skb, type); ^~~~~~~~~~~~~~~~~~~~~~~~ I'm sending a proper fix for the original bug in a separate patch. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09Merge branch 'sh_eth-sw-reset'David S. Miller1-9/+3
Sergei Shtylyov says: ==================== sh_eth: couple of software reset bit cleanups Here's a set of 2 patches against DaveM's 'net-next.git' repo. We can save on the repetitive chip reset code... [1/2] sh_eth: call sh_eth_tsu_write() from sh_eth_chip_reset_giga() [2/2] sh_eth: reuse sh_eth_chip_reset() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09sh_eth: reuse sh_eth_chip_reset()Sergei Shtylyov1-9/+2
All the chip_reset() methods repeat the code writing to the ARSTR register and delaying for 1 ms, so that we can reuse sh_eth_chip_reset() twice. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09sh_eth: call sh_eth_tsu_write() from sh_eth_chip_reset_giga()Sergei Shtylyov1-2/+3
sh_eth_chip_reset_giga() doesn't really need to use direct iowrite32() when writing to the ARSTR register, it can use sh_eth_tsu_write() as all other chip_reset() methods. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09pxa168_eth: mdiobus_scan() doesn't return NULL anymoreSergei Shtylyov1-2/+0
Now that mdiobus_scan() doesn't return NULL on failure anymore, this driver no longer needs to check for it... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09Merge branch 'for-upstream' of ↵David S. Miller8-53/+180
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2016-05-07 Here are a few more Bluetooth patches for the 4.7 kernel: - NULL pointer fix in hci_intel driver - New Intel Bluetooth controller id in btusb driver - Added device tree binding documentation for Marvel's bt-sd8xxx - Platform specific wakeup interrupt support for btmrvl driver Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09macsec: key identifier is 128 bits, not 64Sabrina Dubroca2-7/+16
The MACsec standard mentions a key identifier for each key, but doesn't specify anything about it, so I arbitrarily chose 64 bits. IEEE 802.1X-2010 specifies MKA (MACsec Key Agreement), and defines the key identifier to be 128 bits (96 bits "member identifier" + 32 bits "key number"). Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09tcp: refactor struct tcp_skb_cbLawrence Brakmo1-3/+8
Refactor tcp_skb_cb to create two overlaping areas to store state for incoming or outgoing skbs based on comments by Neal Cardwell to tcp_nv patch: AFAICT this patch would not require an increase in the size of sk_buff cb[] if it were to take advantage of the fact that the tcp_skb_cb header.h4 and header.h6 fields are only used in the packet reception code path, and this in_flight field is only used on the transmit side. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09ifb: support more featuresEric Dumazet1-0/+3
When using ifb+netem on ingress on SIT/IPIP/GRE traffic, GRO packets are not properly processed. Segmentation should not be forced, since ifb is already adding quite a performance hit. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09net: make sch_handle_ingress() drop monitor readyEric Dumazet1-1/+3
TC_ACT_STOLEN is used when ingress traffic is mirred/redirected to say ifb. Packet is not dropped, but consumed. Only TC_ACT_SHOT is a clear indication something went wrong. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09ISDN: eicon: replace custom hex_asc_lo() / hex_pack_byte()Andy Shevchenko1-14/+7
Instead of custom approach re-use generic helpers to convert byte to hex format. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09Documentation/networking: more accurate LCO explanationShmulik Ladkani1-7/+7
In few places the term "ones-complement sum" was used but the actual meaning is "the complement of the ones-complement sum". Also, avoid enclosing long statements with underscore, to ease readability. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09fq_codel: add memory limitation per queueEric Dumazet2-3/+27
On small embedded routers, one wants to control maximal amount of memory used by fq_codel, instead of controlling number of packets or bytes, since GRO/TSO make these not practical. Assuming skb->truesize is accurate, we have to keep track of skb->truesize sum for skbs in queue. This patch adds a new TCA_FQ_CODEL_MEMORY_LIMIT attribute. I chose a default value of 32 MBytes, which looks reasonable even for heavy duty usages. (Prior fq_codel users should not be hurt when they upgrade their kernels) Two fields are added to tc_fq_codel_qd_stats to report : - Current memory usage - Number of drops caused by memory limits # tc qd replace dev eth1 root est 1sec 4sec fq_codel memory_limit 4M .. # tc -s -d qd sh dev eth1 qdisc fq_codel 8008: root refcnt 257 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn Sent 2083566791363 bytes 1376214889 pkt (dropped 4994406, overlimits 0 requeues 21705223) rate 9841Mbit 812549pps backlog 3906120b 376p requeues 21705223 maxpacket 68130 drop_overlimit 4994406 new_flow_count 28855414 ecn_mark 0 memory_used 4190048 drop_overmemory 4994406 new_flows_len 1 old_flows_len 177 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Dave Täht <dave.taht@gmail.com> Cc: Sebastian Möller <moeller0@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09net: Add Qualcomm IPC routerCourtney Cavin9-1/+1198
Add an implementation of Qualcomm's IPC router protocol, used to communicate with service providing remote processors. Signed-off-by: Courtney Cavin <courtney.cavin@sonymobile.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com> [bjorn: Cope with 0 being a valid node id and implement RTM_NEWADDR] Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09soc: qcom: smd: Introduce compile stubsBjorn Andersson1-1/+27
Introduce compile stubs for the SMD API, allowing consumers to be compile tested. Acked-by: Andy Gross <andy.gross@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09macvtap: segmented packet is consumedEric Dumazet1-1/+1
If GSO packet is segmented and its segments are properly queued, we call consume_skb() instead of kfree_skb() to be drop monitor friendly. Fixes: 3e4f8b7873709 ("macvtap: Perform GSO on forwarding path.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09tools: bpf_jit_disasm: check for klogctl failureColin Ian King1-0/+3
klogctl can fail and return -ve len, so check for this and return NULL to avoid passing a (size_t)-1 to malloc. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09qede: uninitialized variable in qede_start_xmit()Dan Carpenter1-1/+1
"data_split" was never set to false. It's just uninitialized. Fixes: 2950219d87b0 ('qede: Add basic network device support') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09Linux 4.6-rc7v4.6-rc7Linus Torvalds1-1/+1
2016-05-07netxen: netxen_rom_fast_read() doesn't return -1Dan Carpenter1-1/+2
The error handling is broken here. netxen_rom_fast_read() returns zero on success and -EIO on error. It never returns -1. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07netxen: reversed condition in netxen_nic_set_link_parameters()Dan Carpenter1-1/+1
My static checker complains that we are using "autoneg" without initializing it. The problem is the ->phy_read() condition is reversed so we only set this on error instead of success. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07netxen: fix error handling in netxen_get_flash_block()Dan Carpenter1-4/+8
My static checker complained that "v" can be used unintialized if netxen_rom_fast_read() returns -EIO. That function never actually returns -1. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07cxgb4: Reset dcb state machine and tx queue prio only if dcb is enabledHariprasad Shenai1-18/+20
When cxgb4 is enabled with CONFIG_CHELSIO_T4_DCB set, VI enable command gets called with DCB enabled. But when we have a back to back setup with DCB enabled on one side and non-DCB on the Peer side. Firmware doesn't send any DCB_L2_CFG, and DCB priority is never set for Tx queue. But driver resets the queue priority and state machine whenever there is a link down, this patch fixes it by adding a check to reset only if cxgb4_dcb_enabled() returns true. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07Merge tag 'char-misc-4.6-rc7' of ↵Linus Torvalds3-8/+27
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull misc driver fixes from Gfreg KH: "Here are three small fixes for some driver problems that were reported. Full details in the shortlog below. All of these have been in linux-next with no reported issues" * tag 'char-misc-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: nvmem: mxs-ocotp: fix buffer overflow in read Drivers: hv: vmbus: Fix signaling logic in hv_need_to_signal_on_read() misc: mic: Fix for double fetch security bug in VOP driver
2016-05-07Merge tag 'staging-4.6-rc7' of ↵Linus Torvalds4-7/+34
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull IIO driver fixes from Grek KH: "It's really just IIO drivers here, some small fixes that resolve some 'crash on boot' errors that have shown up in the -rc series, and other bugfixes that are required. All have been in linux-next with no reported problems" * tag 'staging-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: iio: imu: mpu6050: Fix name/chip_id when using ACPI iio: imu: mpu6050: fix possible NULL dereferences iio:adc:at91-sama5d2: Repair crash on module removal iio: ak8975: fix maybe-uninitialized warning iio: ak8975: Fix NULL pointer exception on early interrupt
2016-05-07Merge tag 'usb-4.6-rc7' of ↵Linus Torvalds6-19/+11
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some last-remaining fixes for USB drivers to resolve issues that have shown up in testing. And two new device ids as well. All of these have been in linux-next with no reported issues" * tag 'usb-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: Revert "USB / PM: Allow USB devices to remain runtime-suspended when sleeping" usb: musb: jz4740: fix error check of usb_get_phy() Revert "usb: musb: musb_host: Enable HCD_BH flag to handle urb return in bottom half" usb: musb: gadget: nuke endpoint before setting its descriptor to NULL USB: serial: cp210x: add Straizona Focusers device ids USB: serial: cp210x: add ID for Link ECU
2016-05-07Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds3-8/+20
Pull ARM fixes from Russell King: "These are a number of updates to fix a few problems found in the ARM nommu code over the last couple of years, caused mostly by changes on the mmu side" * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8573/1: domain: move {set,get}_domain under config guard ARM: 8572/1: nommu: change memory reserve for the vectors ARM: 8571/1: nommu: fix PMSAv7 setup
2016-05-07Merge tag 'media/v4.6-5' of ↵Linus Torvalds3-24/+9
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - deadlock fixes on driver probe at exynos4-is and s43-camif drivers - a build breakage if media controller is enabled and USB or PCI is built as module. * tag 'media/v4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: [media] media-device: fix builds when USB or PCI is compiled as module [media] media: s3c-camif: fix deadlock on driver probe() [media] media: exynos4-is: fix deadlock on driver probe
2016-05-07Merge branch 'for-4.6-fixes' of ↵Linus Torvalds7-1/+229
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "An ahci driver addition and updates to ahci port enable handling for some platform devices" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ata: add AMD Seattle platform driver ARM: dts: apq8064: add ahci ports-implemented mask ata: ahci-platform: Add ports-implemented DT bindings. libahci: save port map for forced port map
2016-05-07Merge tag 'for-linus' of ↵Linus Torvalds1-4/+10
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fix from Doug Ledford: "Fix for max sector calculation in iSER" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/iser: Fix max_sectors calculation
2016-05-07mlxsw: spectrum: Fix ordering in mlxsw_sp_finiJiri Pirko1-1/+1
Fixes: 0f433fa0ec ("mlxsw: spectrum_buffers: Implement shared buffer configuration") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07mlxsw: spectrum: Add missing rollback in flood configurationIdo Schimmel1-0/+8
When we fail to set the flooding configuration for the broadcast and unregistered multicast traffic, we should revert the flooding configuration of the unknown unicast traffic. Fixes: 0293038e0c36 ("mlxsw: spectrum: Add support for flood control") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07mlxsw: spectrum: Fix rollback order in LAG join failureIdo Schimmel1-2/+2
Make the leave procedure in the error path symmetric to the join procedure and first remove the port from the collector before potentially destroying the LAG. Fixes: 0d65fc13042f ("mlxsw: spectrum: Implement LAG port join/leave") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07udp_offload: Set encapsulation before inner completes.Jarno Rajahalme5-3/+18
UDP tunnel segmentation code relies on the inner offsets being set for an UDP tunnel GSO packet, but the inner *_complete() functions will set the inner offsets only if 'encapsulation' is set before calling them. Currently, udp_gro_complete() sets 'encapsulation' only after the inner *_complete() functions are done. This causes the inner offsets having invalid values after udp_gro_complete() returns, which in turn will make it impossible to properly segment the packet in case it needs to be forwarded, which would be visible to the user either as invalid packets being sent or as packet loss. This patch fixes this by setting skb's 'encapsulation' in udp_gro_complete() before calling into the inner complete functions, and by making each possible UDP tunnel gro_complete() callback set the inner_mac_header to the beginning of the tunnel payload. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Reviewed-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07udp_tunnel: Remove redundant udp_tunnel_gro_complete().Jarno Rajahalme4-15/+0
The setting of the UDP tunnel GSO type is already performed by udp[46]_gro_complete(). Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07macvtap: add namespace support to the sysfs device classMarc Angel1-10/+26
When creating macvtaps that are expected to have the same ifindex in different network namespaces, only the first one will succeed. The others will fail with a sysfs_warn_dup warning due to them trying to create the following sysfs link (with 'NN' the ifindex of macvtapX): /sys/class/macvtap/tapNN -> /sys/devices/virtual/net/macvtapX/tapNN This is reproducible by running the following commands: ip netns add ns1 ip netns add ns2 ip link add veth0 type veth peer name veth1 ip link set veth0 netns ns1 ip link set veth1 netns ns2 ip netns exec ns1 ip l add link veth0 macvtap0 type macvtap ip netns exec ns2 ip l add link veth1 macvtap1 type macvtap The last command will fail with "RTNETLINK answers: File exists" (along with the kernel warning) but retrying it will work because the ifindex was incremented. The 'net' device class is isolated between network namespaces so each one has its own hierarchy of net devices. This isn't the case for the 'macvtap' device class. The problem occurs half-way through the netdev registration, when `macvtap_device_event` is called-back to create the 'tapNN' macvtap class device under the 'macvtapX' net class device. This patch adds namespace support to the 'macvtap' device class so that /sys/class/macvtap is no longer shared between net namespaces. However, making the macvtap sysfs class namespace-aware has the side effect of changing /sys/devices/virtual/net/macvtapX/tapNN into /sys/devices/virtual/net/macvtapX/macvtap/tapNN. This is due to Commit 24b1442 ("Driver-core: Always create class directories for classses that support namespaces") and the fact that class devices supporting namespaces are really not supposed to be placed directly under other class devices. To avoid breaking userland, a tapNN symlink pointing to macvtap/tapNN is created inside the macvtapX directory. Signed-off-by: Marc Angel <marc@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds1-2/+4
Pull writeback fix from Jens Axboe: "Just a single fix for domain aware writeback, fixing a regression that can cause balance_dirty_pages() to keep looping while not getting any work done" * 'for-linus' of git://git.kernel.dk/linux-block: writeback: Fix performance regression in wb_over_bg_thresh()
2016-05-06ipv4: tcp: ip_send_unicast_reply() is not BH safeEric Dumazet1-4/+4
I forgot that ip_send_unicast_reply() is not BH safe (yet). Disabling preemption before calling it was not a good move. Fixes: c10d9310edf5 ("tcp: do not assume TCP code is non preemptible") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andres Lagar-Cavilla <andreslc@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06Merge branch 'bpf-direct-pkt-access'David S. Miller14-82/+1004
Alexei Starovoitov says: ==================== bpf: introduce direct packet access This set of patches introduce 'direct packet access' from cls_bpf and act_bpf programs (which are root only). Current bpf programs use LD_ABS, LD_INS instructions which have to do 'if (off < skb_headlen)' for every packet access. It's ok for socket filters, but too slow for XDP, since single LD_ABS insn consumes 3% of cpu. Therefore we have to amortize the cost of length check over multiple packet accesses via direct access to skb->data, data_end pointers. The existing packet parser typically look like: if (load_half(skb, offsetof(struct ethhdr, h_proto)) != ETH_P_IP) return 0; if (load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol)) != IPPROTO_UDP || load_byte(skb, ETH_HLEN) != 0x45) return 0; ... with 'direct packet access' the bpf program becomes: void *data = (void *)(long)skb->data; void *data_end = (void *)(long)skb->data_end; struct eth_hdr *eth = data; struct iphdr *iph = data + sizeof(*eth); if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end) return 0; if (eth->h_proto != htons(ETH_P_IP)) return 0; if (iph->protocol != IPPROTO_UDP || iph->ihl != 5) return 0; ... which is more natural to write and significantly faster. See patch 6 for performance tests: 21Mpps(old) vs 24Mpps(new) with just 5 loads. For more complex parsers the performance gain is higher. The other approach implemented in [1] was adding two new instructions to interpreter and JITs and was too hard to use from llvm side. The approach presented here doesn't need any instruction changes, but the verifier has to work harder to check safety of the packet access. Patch 1 prepares the code and Patch 2 adds new checks for direct packet access and all of them are gated with 'env->allow_ptr_leaks' which is true for root only. Patch 3 improves search pruning for large programs. Patch 4 wires in verifier's changes with net/core/filter side. Patch 5 updates docs Patches 6 and 7 add tests. [1] https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/?h=ld_abs_dw ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06samples/bpf: add verifier testsAlexei Starovoitov1-0/+80
add few tests for "pointer to packet" logic of the verifier Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06samples/bpf: add 'pointer to packet' testsAlexei Starovoitov5-0/+281
parse_simple.c - packet parser exapmle with single length check that filters out udp packets for port 9 parse_varlen.c - variable length parser that understand multiple vlan headers, ipip, ipip6 and ip options to filter out udp or tcp packets on port 9. The packet is parsed layer by layer with multitple length checks. parse_ldabs.c - classic style of packet parsing using LD_ABS instruction. Same functionality as parse_simple. simple = 24.1Mpps per core varlen = 22.7Mpps ldabs = 21.4Mpps Parser with LD_ABS instructions is slower than full direct access parser which does more packet accesses and checks. These examples demonstrate the choice bpf program authors can make between flexibility of the parser vs speed. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06bpf: add documentation for 'direct packet access'Alexei Starovoitov1-2/+83
explain how verifier checks safety of packet access and update email addresses. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06bpf: wire in data and data_end for cls_act_bpfAlexei Starovoitov4-6/+65
allow cls_bpf and act_bpf programs access skb->data and skb->data_end pointers. The bpf helpers that change skb->data need to update data_end pointer as well. The verifier checks that programs always reload data, data_end pointers after calls to such bpf helpers. We cannot add 'data_end' pointer to struct qdisc_skb_cb directly, since it's embedded as-is by infiniband ipoib, so wrapper struct is needed. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06bpf: improve verifier state equivalenceAlexei Starovoitov1-20/+3
since UNKNOWN_VALUE type is weaker than CONST_IMM we can un-teach verifier its recognition of constants in conditional branches without affecting safety. Ex: if (reg == 123) { .. here verifier was marking reg->type as CONST_IMM instead keep reg as UNKNOWN_VALUE } Two verifier states with UNKNOWN_VALUE are equivalent, whereas CONST_IMM_X != CONST_IMM_Y, since CONST_IMM is used for stack range verification and other cases. So help search pruning by marking registers as UNKNOWN_VALUE where possible instead of CONST_IMM. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06bpf: direct packet accessAlexei Starovoitov3-8/+440
Extended BPF carried over two instructions from classic to access packet data: LD_ABS and LD_IND. They're highly optimized in JITs, but due to their design they have to do length check for every access. When BPF is processing 20M packets per second single LD_ABS after JIT is consuming 3% cpu. Hence the need to optimize it further by amortizing the cost of 'off < skb_headlen' over multiple packet accesses. One option is to introduce two new eBPF instructions LD_ABS_DW and LD_IND_DW with similar usage as skb_header_pointer(). The kernel part for interpreter and x64 JIT was implemented in [1], but such new insns behave like old ld_abs and abort the program with 'return 0' if access is beyond linear data. Such hidden control flow is hard to workaround plus changing JITs and rolling out new llvm is incovenient. Therefore allow cls_bpf/act_bpf program access skb->data directly: int bpf_prog(struct __sk_buff *skb) { struct iphdr *ip; if (skb->data + sizeof(struct iphdr) + ETH_HLEN > skb->data_end) /* packet too small */ return 0; ip = skb->data + ETH_HLEN; /* access IP header fields with direct loads */ if (ip->version != 4 || ip->saddr == 0x7f000001) return 1; [...] } This solution avoids introduction of new instructions. llvm stays the same and all JITs stay the same, but verifier has to work extra hard to prove safety of the above program. For XDP the direct store instructions can be allowed as well. The skb->data is NET_IP_ALIGNED, so for common cases the verifier can check the alignment. The complex packet parsers where packet pointer is adjusted incrementally cannot be tracked for alignment, so allow byte access in such cases and misaligned access on architectures that define efficient_unaligned_access [1] https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/?h=ld_abs_dw Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>