summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-04-15crypto: ccp - Prevent information leakage on exportTom Lendacky2-0/+6
Prevent information from leaking to userspace by doing a memset to 0 of the export state structure before setting the structure values and copying it. This prevents un-initialized padding areas from being copied into the export area. Cc: <stable@vger.kernel.org> # 3.14.x- Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-04-15crypto: sha1-mb - use corrcet pointer while completing jobsXiaodong Liu1-2/+2
In sha_complete_job, incorrect mcryptd_hash_request_ctx pointer is used when check and complete other jobs. If the memory of first completed req is freed, while still completing other jobs in the func, kernel will crash since NULL pointer is assigned to RIP. Cc: <stable@vger.kernel.org> Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-04-15crypto: rsa-pkcs1pad - fix dst lenTadeusz Struk1-6/+6
The output buffer length has to be at least as big as the key_size. It is then updated to the actual output size by the implementation. Cc: <stable@vger.kernel.org> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-04-15ARM: 8551/2: DMA: Fix kzalloc flags in __dma_allocAlexandre Courbot1-1/+2
Commit 19e6e5e5392b ("ARM: 8547/1: dma-mapping: store buffer information") allocates a structure meant for internal buffer management with the GFP flags of the buffer itself. This can trigger the following safeguard in the slab/slub allocator: if (unlikely(flags & GFP_SLAB_BUG_MASK)) { pr_emerg("gfp: %un", flags & GFP_SLAB_BUG_MASK); BUG(); } Fix this by filtering the flags that make the slab allocator unhappy. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Acked-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-04-15hp_accel: Add support for HP ProBook 440 G3Martin Vajnar1-0/+2
HP ProBook 440 G3 laptop needs a non-standard mapping (x_inverted_usd). Signed-off-by: Martin Vajnar <martin.vajnar@gmail.com> Acked-by: Éric Piel <eric.piel@tremplin-utc.net> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-04-15tun: use per cpu variables for stats accountingPaolo Abeni1-12/+83
Currently the tun device accounting uses dev->stats without applying any kind of protection, regardless that accounting happens in preemptible process context. This patch move the tun stats to a per cpu data structure, and protect the updates with u64_stats_update_begin()/u64_stats_update_end() or this_cpu_inc according to the stat type. The per cpu stats are aggregated by the newly added ndo_get_stats64 ops. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds8-5/+89
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: a binutils fix, an lguest fix, an mcelog fix and a missing documentation fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Avoid using object after free in genpool lguest, x86/entry/32: Fix handling of guest syscalls using interrupt gates x86/build: Build compressed x86 kernels as PIE x86/mm/pkeys: Add missing Documentation
2016-04-15Merge branch 'mm-urgent-for-linus' of ↵Linus Torvalds3-143/+17
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull mm gup cleanup from Ingo Molnar: "This removes the ugly get-user-pages API hack, now that all upstream code has been migrated to it" ("ugly" is putting it mildly. But it worked.. - Linus) * 'mm-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: mm/gup: Remove the macro overload API migration helpers from the get_user*() APIs
2016-04-15Merge tag 'dm-4.6-fixes' of ↵Linus Torvalds2-25/+43
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - fix a 4.6-rc1 bio-based DM 'struct dm_target_io' leak in an error path - stable@ fix for DM cache metadata's READ_LOCK macros that were incorrectly returning error if the block manager was in read-only mode; also cleanup multi-statement macros to use do {} while(0) * tag 'dm-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros dm: fix dm_target_io leak if clone_bio() returns an error
2016-04-15Merge tag 'pwm/for-4.6-rc4' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm fix from Thierry Reding: "A single one-line fix to turn the regmap cache from an RB-tree to a flat cache to avoid lockdep and abort issues" * tag 'pwm/for-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: pwm: fsl-ftm: Use flat regmap cache
2016-04-15Merge tag 'sound-4.6-rc4' of ↵Linus Torvalds6-7/+45
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "We've had a very calm development cycle, so far. Here are the few fixes for HD-audio and USB-audio, all of which are small and easy" * tag 'sound-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Fix inconsistent monitor_present state until repoll ALSA: hda - Fix regression of monitor_present flag in eld proc file ALSA: usb-audio: Skip volume controls triggers hangup on Dell USB Dock ALSA: hda/realtek - Enable the ALC292 dock fixup on the Thinkpad T460s ALSA: sscape: Use correct format identifier for size_t ALSA: usb-audio: Add a quirk for Plantronics BT300 ALSA: usb-audio: Add a sample rate quirk for Phoenix Audio TMX320 ALSA: hda - Bind with i915 only when Intel graphics is present
2016-04-15Merge branch 'bpf-ARG_PTR_TO_RAW_STACK'David S. Miller10-57/+421
Merge branch 'bpf-ARG_PTR_TO_RAW_STACK' Daniel Borkmann says: ==================== BPF updates This series adds a new verifier argument type called ARG_PTR_TO_RAW_STACK and converts related helpers to make use of it. Basic idea is that we can save init of stack memory when the helper function is guaranteed to fully fill out the passed buffer in every path. Series also adds test cases and converts samples. For more details, please see individual patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15Merge branch 'mailbox-devel' of ↵Linus Torvalds3-11/+13
git://git.linaro.org/landing-teams/working/fujitsu/integration Pull mailbox fixes from Jussi Brar: "Misc fixes: mailbox-test driver: - prevent memory leak and another cosmetic change mailbox: - change the returned error code Xgene driver: - return -ENOMEM instead of PTR_ERR for failed devm_kzalloc" * 'mailbox-devel' of git://git.linaro.org/landing-teams/working/fujitsu/integration: mailbox: Stop using ENOSYS for anything other than unimplemented syscalls mailbox: mailbox-test: Prevent memory leak mailbox: mailbox-test: Use more consistent format for calling copy_from_user() mailbox: xgene-slimpro: Fix wrong test for devm_kzalloc
2016-04-15bpf, samples: add test cases for raw stackDaniel Borkmann1-0/+268
This adds test cases mostly around ARG_PTR_TO_RAW_STACK to check the verifier behaviour. [...] #84 raw_stack: no skb_load_bytes OK #85 raw_stack: skb_load_bytes, no init OK #86 raw_stack: skb_load_bytes, init OK #87 raw_stack: skb_load_bytes, spilled regs around bounds OK #88 raw_stack: skb_load_bytes, spilled regs corruption OK #89 raw_stack: skb_load_bytes, spilled regs corruption 2 OK #90 raw_stack: skb_load_bytes, spilled regs + data OK #91 raw_stack: skb_load_bytes, invalid access 1 OK #92 raw_stack: skb_load_bytes, invalid access 2 OK #93 raw_stack: skb_load_bytes, invalid access 3 OK #94 raw_stack: skb_load_bytes, invalid access 4 OK #95 raw_stack: skb_load_bytes, invalid access 5 OK #96 raw_stack: skb_load_bytes, invalid access 6 OK #97 raw_stack: skb_load_bytes, large access OK Summary: 98 PASSED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15bpf, samples: don't zero data when not neededDaniel Borkmann4-12/+12
Remove the zero initialization in the sample programs where appropriate. Note that this is an optimization which is now possible, old programs still doing the zero initialization are just fine as well. Also, make sure we don't have padding issues when we don't memset() the entire struct anymore. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15bpf: convert relevant helper args to ARG_PTR_TO_RAW_STACKDaniel Borkmann3-24/+60
This patch converts all helpers that can use ARG_PTR_TO_RAW_STACK as argument type. For tc programs this is bpf_skb_load_bytes(), bpf_skb_get_tunnel_key(), bpf_skb_get_tunnel_opt(). For tracing, this optimizes bpf_get_current_comm() and bpf_probe_read(). The check in bpf_skb_load_bytes() for MAX_BPF_STACK can also be removed since the verifier already makes sure we stay within bounds on stack buffers. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15bpf, verifier: add ARG_PTR_TO_RAW_STACK typeDaniel Borkmann2-5/+59
When passing buffers from eBPF stack space into a helper function, we have ARG_PTR_TO_STACK argument type for helpers available. The verifier makes sure that such buffers are initialized, within boundaries, etc. However, the downside with this is that we have a couple of helper functions such as bpf_skb_load_bytes() that fill out the passed buffer in the expected success case anyway, so zero initializing them prior to the helper call is unneeded/wasted instructions in the eBPF program that can be avoided. Therefore, add a new helper function argument type called ARG_PTR_TO_RAW_STACK. The idea is to skip the STACK_MISC check in check_stack_boundary() and color the related stack slots as STACK_MISC after we checked all call arguments. Helper functions using ARG_PTR_TO_RAW_STACK must make sure that every path of the helper function will fill the provided buffer area, so that we cannot leak any uninitialized stack memory. This f.e. means that error paths need to memset() the buffers, but the expected fast-path doesn't have to do this anymore. Since there's no such helper needing more than at most one ARG_PTR_TO_RAW_STACK argument, we can keep it simple and don't need to check for multiple areas. Should in future such a use-case really appear, we have check_raw_mode() that will make sure we implement support for it first. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15bpf, verifier: add bpf_call_arg_meta for passing meta dataDaniel Borkmann1-17/+23
Currently, when the verifier checks calls in check_call() function, we call check_func_arg() for all 5 arguments e.g. to make sure expected types are correct. In some cases, we collect meta data (here: map pointer) to perform additional checks such as checking stack boundary on key/value sizes for subsequent arguments. As we're going to extend the meta data, add a generic struct bpf_call_arg_meta that we can use for passing into check_func_arg(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15sctp: add support for RPS and RFSMarcelo Ricardo Leitner2-0/+6
This patch adds what's missing to properly support RPS and RFS on SCTP, as some of it is already implemented in common calls. Having support for RPS and RFS allows better scaling specially because not all NICs support hashing SCTP headers. Save the hash right when we dequeue a skb from inqueue so we do it only once per skb instead of per chunk. New sockets will then inherit the hash through sctp_copy_sock(). Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15net: validate_xmit_skb() changesEric Dumazet1-5/+2
skbs given to validate_xmit_skb() should not have a next pointer anymore. Also if a packet is dropped, increment dev->tx_dropped __dev_queue_xmit() no longer has to change tx_dropped in this case. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15Merge tag 'for-linus-4.6-rc4' of ↵Linus Torvalds5-29/+61
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs/fscrypto fixes from Jaegeuk Kim: "In addition to f2fs/fscrypto fixes, I've added one patch which prevents RCU mode lookup in d_revalidate, as Al mentioned. These patches fix f2fs and fscrypto based on -rc3 bug fixes in ext4 crypto, which have not yet been fully propagated as follows. - use of dget_parent and file_dentry to avoid crashes - disallow RCU-mode lookup in d_invalidate - disallow -ENOMEM in the core data encryption path" * tag 'for-linus-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: ext4/fscrypto: avoid RCU lookup in d_revalidate fscrypto: don't let data integrity writebacks fail with ENOMEM f2fs: use dget_parent and file_dentry in f2fs_file_open fscrypto: use dget_parent() in fscrypt_d_revalidate()
2016-04-15bgmac: fix MAC soft-reset bit for corerev > 4Felix Fietkau1-3/+3
Only core revisions older than 4 use BGMAC_CMDCFG_SR_REV0. This mainly fixes support for BCM4708A0KF SoCs with Ethernet core rev 5 (it means only some devices as most of BCM4708A0KF-s got core rev 4). This was tested for regressions on BCM47094 which doesn't seem to care which bit gets used. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15Merge branch 'linus' of ↵Linus Torvalds3-3/+9
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes an NFS regression caused by the skcipher/hash conversion in sunrpc. It also fixes a build problem in certain configurations with bcm63xx" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: hwrng: bcm63xx - fix device tree compilation sunrpc: Fix skcipher/shash conversion
2016-04-15Merge branch 'soreuseport-mixed-v4-v6-fixes'David S. Miller6-4/+261
Craig Gallek says: ==================== Fixes for SO_REUSEPORT and mixed v4/v6 sockets Recent changes to the datastructures associated with SO_REUSEPORT broke an existing behavior when equivalent SO_REUSEPORT sockets are created using both AF_INET and AF_INET6. This patch series restores the previous behavior and includes a test to validate it. This series should be a trivial merge to stable kernels (if deemed necessary), but will have conflicts in net-next. The following patches recently replaced the use of hlist_nulls with hlists for UDP and TCP socket lists: ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU") 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood") If this series is accepted, I will send an RFC for the net-next change to assist with the merge. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15soreuseport: test mixed v4/v6 socketsCraig Gallek3-1/+210
Test to validate the behavior of SO_REUSEPORT sockets that are created with both AF_INET and AF_INET6. See the commit prior to this for a description of this behavior. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15soreuseport: fix ordering for mixed v4/v6 socketsCraig Gallek3-3/+51
With the SO_REUSEPORT socket option, it is possible to create sockets in the AF_INET and AF_INET6 domains which are bound to the same IPv4 address. This is only possible with SO_REUSEPORT and when not using IPV6_V6ONLY on the AF_INET6 sockets. Prior to the commits referenced below, an incoming IPv4 packet would always be routed to a socket of type AF_INET when this mixed-mode was used. After those changes, the same packet would be routed to the most recently bound socket (if this happened to be an AF_INET6 socket, it would have an IPv4 mapped IPv6 address). The change in behavior occurred because the recent SO_REUSEPORT optimizations short-circuit the socket scoring logic as soon as they find a match. They did not take into account the scoring logic that favors AF_INET sockets over AF_INET6 sockets in the event of a tie. To fix this problem, this patch changes the insertion order of AF_INET and AF_INET6 addresses in the TCP and UDP socket lists when the sockets have SO_REUSEPORT set. AF_INET sockets will be inserted at the head of the list and AF_INET6 sockets with SO_REUSEPORT set will always be inserted at the tail of the list. This will force AF_INET sockets to always be considered first. Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection") Fixes: 125e80b88687 ("soreuseport: fast reuseport TCP socket selection") Reported-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15cdc_mbim: apply "NDP to end" quirk to all Huawei devicesBjørn Mork1-2/+7
We now have a positive report of another Huawei device needing this quirk: The ME906s-158 (12d1:15c1). This is an m.2 form factor modem with no obvious relationship to the E3372 (12d1:157d) we already have a quirk entry for. This is reason enough to believe the quirk might be necessary for any number of current and future Huawei devices. Applying the quirk to all Huawei devices, since it is crucial to any device affected by the firmware bug, while the impact on non-affected devices is negligible. The quirk can if necessary be disabled per-device by writing N to /sys/class/net/<iface>/cdc_ncm/ndp_to_end Reported-by: Andreas Fett <andreas.fett@secunet.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15Merge branch 'for-linus' of ↵Linus Torvalds2-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull keys bugfixes from James Morris: "Two bugfixes for Keys related code" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: ASN.1: fix open failure check on headername assoc_array: don't call compare_object() on a node
2016-04-15packet: uses kfree_skb() for errors.Weongyo Jeong1-2/+12
consume_skb() isn't for error cases that kfree_skb() is more proper one. At this patch, it fixed tpacket_rcv() and packet_rcv() to be consistent for error or non-error cases letting perf trace its event properly. Signed-off-by: Weongyo Jeong <weongyo.linux@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macrosMike Snitzer1-24/+40
The READ_LOCK macro was incorrectly returning -EINVAL if dm_bm_is_read_only() was true -- it will always be true once the cache metadata transitions to read-only by dm_cache_metadata_set_read_only(). Wrap READ_LOCK and WRITE_LOCK multi-statement macros in do {} while(0). Also, all accesses of the 'cmd' argument passed to these related macros are now encapsulated in parenthesis. A follow-up patch can be developed to eliminate the use of macros in favor of pure C code. Avoiding that now given that this needs to apply to stable@. Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Fixes: d14fcf3dd79 ("dm cache: make sure every metadata function checks fail_io") Cc: stable@vger.kernel.org
2016-04-15bgmac: reset & enable Ethernet core before using itRafał Miłecki1-0/+5
This fixes Ethernet on D-Link DIR-885L with BCM47094 SoC. Felix reported similar fix was needed for his BCM4709 device (Buffalo WXR-1900DHP?). I tested this for regressions on BCM4706, BCM4708A0 and BCM47081A0. Cc: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14tipc: fix a race condition leading to subscriber refcnt bugParthasarathy Bhuvaragan3-10/+17
Until now, the requests sent to topology server are queued to a workqueue by the generic server framework. These messages are processed by worker threads and trigger the registered callbacks. To reduce latency on uniprocessor systems, explicit rescheduling is performed using cond_resched() after MAX_RECV_MSG_COUNT(25) messages. This implementation on SMP systems leads to an subscriber refcnt error as described below: When a worker thread yields by calling cond_resched() in a SMP system, a new worker is created on another CPU to process the pending workitem. Sometimes the sleeping thread wakes up before the new thread finishes execution. This breaks the assumption on ordering and being single threaded. The fault is more frequent when MAX_RECV_MSG_COUNT is lowered. If the first thread was processing subscription create and the second thread processing close(), the close request will free the subscriber and the create request oops as follows: [31.224137] WARNING: CPU: 2 PID: 266 at include/linux/kref.h:46 tipc_subscrb_rcv_cb+0x317/0x380 [tipc] [31.228143] CPU: 2 PID: 266 Comm: kworker/u8:1 Not tainted 4.5.0+ #97 [31.228377] Workqueue: tipc_rcv tipc_recv_work [tipc] [...] [31.228377] Call Trace: [31.228377] [<ffffffff812fbb6b>] dump_stack+0x4d/0x72 [31.228377] [<ffffffff8105a311>] __warn+0xd1/0xf0 [31.228377] [<ffffffff8105a3fd>] warn_slowpath_null+0x1d/0x20 [31.228377] [<ffffffffa0098067>] tipc_subscrb_rcv_cb+0x317/0x380 [tipc] [31.228377] [<ffffffffa00a4984>] tipc_receive_from_sock+0xd4/0x130 [tipc] [31.228377] [<ffffffffa00a439b>] tipc_recv_work+0x2b/0x50 [tipc] [31.228377] [<ffffffff81071925>] process_one_work+0x145/0x3d0 [31.246554] ---[ end trace c3882c9baa05a4fd ]--- [31.248327] BUG: spinlock bad magic on CPU#2, kworker/u8:1/266 [31.249119] BUG: unable to handle kernel NULL pointer dereference at 0000000000000428 [31.249323] IP: [<ffffffff81099d0c>] spin_dump+0x5c/0xe0 [31.249323] PGD 0 [31.249323] Oops: 0000 [#1] SMP In this commit, we - rename tipc_conn_shutdown() to tipc_conn_release(). - move connection release callback execution from tipc_close_conn() to a new function tipc_sock_release(), which is executed before we free the connection. Thus we release the subscriber during connection release procedure rather than connection shutdown procedure. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14Merge branch 'ipv6-dgram-dst-cache'David S. Miller4-63/+121
Martin KaFai Lau says: ==================== ipv6: datagram: Update dst cache of a connected udp sk during pmtu update v2: ~ Protect __sk_dst_get() operations with rcu_read_lock in release_cb() because another thread may do ip6_dst_store() for a udp sk without taking the sk lock (e.g. in sendmsg). ~ Do a ipv6_addr_v4mapped(&sk->sk_v6_daddr) check before calling ip6_datagram_dst_update() in patch 3 and 4. It is similar to how __ip6_datagram_connect handles it. ~ One fix in ip6_datagram_dst_update() in patch 2. It needs to check (np->flow_label & IPV6_FLOWLABEL_MASK) before doing fl6_sock_lookup. I was confused with the naming of IPV6_FLOWLABEL_MASK and IPV6_FLOWINFO_MASK. ~ Check dst->obsolete just on the safe side, although I think it should at least have DST_OBSOLETE_FORCE_CHK by now. ~ Add Fixes tag to patch 3 and 4 ~ Add some points from the previous discussion about holding sk lock to the commit message in patch 3. v1: There is a case in connected UDP socket such that getsockopt(IPV6_MTU) will return a stale MTU value. The reproducible sequence could be the following: 1. Create a connected UDP socket 2. Send some datagrams out 3. Receive a ICMPV6_PKT_TOOBIG 4. No new outgoing datagrams to trigger the sk_dst_check() logic to update the sk->sk_dst_cache. 5. getsockopt(IPV6_MTU) returns the mtu from the invalid sk->sk_dst_cache instead of the newly created RTF_CACHE clone. Patch 1 and 2 are the prep work. Patch 3 and 4 are the fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14ipv6: udp: Do a route lookup and update during release_cbMartin KaFai Lau3-0/+22
This patch adds a release_cb for UDPv6. It does a route lookup and updates sk->sk_dst_cache if it is needed. It picks up the left-over job from ip6_sk_update_pmtu() if the sk was owned by user during the pmtu update. It takes a rcu_read_lock to protect the __sk_dst_get() operations because another thread may do ip6_dst_store() without taking the sk lock (e.g. sendmsg). Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reported-by: Wei Wang <weiwan@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14ipv6: datagram: Update dst cache of a connected datagram sk during pmtu updateMartin KaFai Lau3-9/+24
There is a case in connected UDP socket such that getsockopt(IPV6_MTU) will return a stale MTU value. The reproducible sequence could be the following: 1. Create a connected UDP socket 2. Send some datagrams out 3. Receive a ICMPV6_PKT_TOOBIG 4. No new outgoing datagrams to trigger the sk_dst_check() logic to update the sk->sk_dst_cache. 5. getsockopt(IPV6_MTU) returns the mtu from the invalid sk->sk_dst_cache instead of the newly created RTF_CACHE clone. This patch updates the sk->sk_dst_cache for a connected datagram sk during pmtu-update code path. Note that the sk->sk_v6_daddr is used to do the route lookup instead of skb->data (i.e. iph). It is because a UDP socket can become connected after sending out some datagrams in un-connected state. or It can be connected multiple times to different destinations. Hence, iph may not be related to where sk is currently connected to. It is done under '!sock_owned_by_user(sk)' condition because the user may make another ip6_datagram_connect() (i.e changing the sk->sk_v6_daddr) while dst lookup is happening in the pmtu-update code path. For the sock_owned_by_user(sk) == true case, the next patch will introduce a release_cb() which will update the sk->sk_dst_cache. Test: Server (Connected UDP Socket): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Route Details: [root@arch-fb-vm1 ~]# ip -6 r show | egrep '2fac' 2fac::/64 dev eth0 proto kernel metric 256 pref medium 2fac:face::/64 via 2fac::face dev eth0 metric 1024 pref medium A simple python code to create a connected UDP socket: import socket import errno HOST = '2fac::1' PORT = 8080 s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) s.bind((HOST, PORT)) s.connect(('2fac:face::face', 53)) print("connected") while True: try: data = s.recv(1024) except socket.error as se: if se.errno == errno.EMSGSIZE: pmtu = s.getsockopt(41, 24) print("PMTU:%d" % pmtu) break s.close() Python program output after getting a ICMPV6_PKT_TOOBIG: [root@arch-fb-vm1 ~]# python2 ~/devshare/kernel/tasks/fib6/udp-connect-53-8080.py connected PMTU:1300 Cache routes after recieving TOOBIG: [root@arch-fb-vm1 ~]# ip -6 r show table cache 2fac:face::face via 2fac::face dev eth0 metric 0 cache expires 463sec mtu 1300 pref medium Client (Send the ICMPV6_PKT_TOOBIG): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ scapy is used to generate the TOOBIG message. Here is the scapy script I have used: >>> p=Ether(src='da:75:4d:36:ac:32', dst='52:54:00:12:34:66', type=0x86dd)/IPv6(src='2fac::face', dst='2fac::1')/ICMPv6PacketTooBig(mtu=1300)/IPv6(src='2fac:: 1',dst='2fac:face::face', nh='UDP')/UDP(sport=8080,dport=53) >>> sendp(p, iface='qemubr0') Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reported-by: Wei Wang <weiwan@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14ipv6: datagram: Refactor dst lookup and update codes to a new functionMartin KaFai Lau1-46/+57
This patch moves the route lookup and update codes for connected datagram sk to a newly created function ip6_datagram_dst_update() It will be reused during the pmtu update in the later patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14ipv6: datagram: Refactor flowi6 init codes to a new functionMartin KaFai Lau1-20/+30
Move flowi6 init codes for connected datagram sk to a newly created function ip6_datagram_flow_key_init(). Notes: 1. fl6_flowlabel is used instead of fl6.flowlabel in __ip6_datagram_connect 2. ipv6_addr_is_multicast(&fl6->daddr) is used instead of (addr_type & IPV6_ADDR_MULTICAST) in ip6_datagram_flow_key_init() This new function will be reused during pmtu update in the later patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14net: mediatek: update the IRQ part of the binding documentJohn Crispin1-2/+5
The current binding document only describes a single interrupt. Update the document by adding the 2 other interrupts. The driver currently only uses a single interrupt. The HW is however able to using IRQ grouping to split TX and RX onto separate GIC irqs. Signed-off-by: John Crispin <blogic@openwrt.org> Cc: devicetree@vger.kernel.org Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14Merge branch 'gro-fixed-id-gso-partial'David S. Miller13-54/+395
Alexander Duyck says: ==================== GRO Fixed IPv4 ID support and GSO partial support This patch series sets up a few different things. First it adds support for GRO of frames with a fixed IP ID value. This will allow us to perform GRO for frames that go through things like an IPv6 to IPv4 header translation. The second item we add is support for segmenting frames that are generated this way. Most devices only support an incrementing IP ID value, and in the case of TCP the IP ID can be ignored in many cases since the DF bit should be set. So we can technically segment these frames using existing TSO if we are willing to allow the IP ID to be mangled. As such I have added a matching feature for the new form of GRO/GSO called TCP IPv4 ID mangling. With this enabled we can assemble and disassemble a frame with the sequence number fixed and the only ill effect will be that the IPv4 ID will be altered which may or may not have any noticeable effect. As such I have defaulted the feature to disabled. The third item this patch series adds is support for partial GSO segmentation. Partial GSO segmentation allows us to split a large frame into two pieces. The first piece will have an even multiple of MSS worth of data and the headers before the one pointed to by csum_start will have been updated so that they are correct for if the data payload had already been segmented. By doing this we can do things such as precompute the outer header checksums for a frame to be segmented allowing us to perform TSO on devices that don't support tunneling, or tunneling with outer header checksums. This patch set is based on the net-next tree, but I included "net: remove netdevice gso_min_segs" in my tree as I assume it is likely to be applied before this patch set will and I wanted to avoid a merge conflict. v2: Fixed items reported by Jesse Gross fixed missing GSO flag in MPLS check adding DF check for MANGLEID Moved extra GSO feature checks into gso_features_check Rebased batches to account for "net: remove netdevice gso_min_segs" Driver patches from the first patch set should still be compatible. However I do have a few changes in them so I will submit a v2 of those to Jeff Kirsher once these patches are accepted into net-next. Example driver patches for i40e, ixgbe, and igb: https://patchwork.ozlabs.org/patch/608221/ https://patchwork.ozlabs.org/patch/608224/ https://patchwork.ozlabs.org/patch/608225/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14Documentation: Add documentation for TSO and GSO featuresAlexander Duyck1-0/+130
This document is a starting point for defining the TSO and GSO features. The whole thing is starting to get a bit messy so I wanted to make sure we have notes somwhere to start describing what does and doesn't work. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14GSO: Support partial segmentation offloadAlexander Duyck11-24/+151
This patch adds support for something I am referring to as GSO partial. The basic idea is that we can support a broader range of devices for segmentation if we use fixed outer headers and have the hardware only really deal with segmenting the inner header. The idea behind the naming is due to the fact that everything before csum_start will be fixed headers, and everything after will be the region that is handled by hardware. With the current implementation it allows us to add support for the following GSO types with an inner TSO_MANGLEID or TSO6 offload: NETIF_F_GSO_GRE NETIF_F_GSO_GRE_CSUM NETIF_F_GSO_IPIP NETIF_F_GSO_SIT NETIF_F_UDP_TUNNEL NETIF_F_UDP_TUNNEL_CSUM In the case of hardware that already supports tunneling we may be able to extend this further to support TSO_TCPV4 without TSO_MANGLEID if the hardware can support updating inner IPv4 headers. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14GRO: Add support for TCP with fixed IPv4 ID field, limit tunnel IP ID valuesAlexander Duyck5-11/+54
This patch does two things. First it allows TCP to aggregate TCP frames with a fixed IPv4 ID field. As a result we should now be able to aggregate flows that were converted from IPv6 to IPv4. In addition this allows us more flexibility for future implementations of segmentation as we may be able to use a fixed IP ID when segmenting the flow. The second thing this does is that it places limitations on the outer IPv4 ID header in the case of tunneled frames. Specifically it forces the IP ID to be incrementing by 1 unless the DF bit is set in the outer IPv4 header. This way we can avoid creating overlapping series of IP IDs that could possibly be fragmented if the frame goes through GRO and is then resegmented via GSO. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14GSO: Add GSO type for fixed IPv4 IDAlexander Duyck10-24/+63
This patch adds support for TSO using IPv4 headers with a fixed IP ID field. This is meant to allow us to do a lossless GRO in the case of TCP flows that use a fixed IP ID such as those that convert IPv6 header to IPv4 headers. In addition I am adding a feature that for now I am referring to TSO with IP ID mangling. Basically when this flag is enabled the device has the option to either output the flow with incrementing IP IDs or with a fixed IP ID regardless of what the original IP ID ordering was. This is useful in cases where the DF bit is set and we do not care if the original IP ID value is maintained. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14ethtool: Add support for toggling any of the GSO offloadsAlexander Duyck1-0/+2
The strings were missing for several of the GSO offloads that are available. This patch provides the missing strings so that we can toggle or query any of them via the ethtool command. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14Merge branch 'mlxsw-devlink-shared-buffers'David S. Miller10-418/+2683
Jiri Pirko says: ==================== devlink + mlxsw: add support for config and control of shared buffers ASICs implement shared buffer for packet forwarding purposes and enable flexible partitioning of the shared buffer for different flows and ports, enabling non-blocking progress of different flows as well as separation of lossy traffic from loss-less traffic when using Per-Priority Flow Control (PFC). The shared buffer optimizes the buffer utilization for better absorption of packet bursts. This patchset implements API which is based on the model SAI uses. That is aligned with multiple ASIC vendors so this API should be vendor neutral. Userspace counterpart patchset for devlink iproute2 tool can be found here: https://github.com/jpirko/iproute2_mlxsw/tree/devlink_sb Couple of examples of usage: switch$ devlink sb help Usage: devlink sb show [ DEV [ sb SB_INDEX ] ] devlink sb pool show [ DEV [ sb SB_INDEX ] pool POOL_INDEX ] devlink sb pool set DEV [ sb SB_INDEX ] pool POOL_INDEX size POOL_SIZE thtype { static | dynamic } devlink sb port pool show [ DEV/PORT_INDEX [ sb SB_INDEX ] pool POOL_INDEX ] devlink sb port pool set DEV/PORT_INDEX [ sb SB_INDEX ] pool POOL_INDEX th THRESHOLD devlink sb tc bind show [ DEV/PORT_INDEX [ sb SB_INDEX ] tc TC_INDEX ] devlink sb tc bind set DEV/PORT_INDEX [ sb SB_INDEX ] tc TC_INDEX type { ingress | egress } pool POOL_INDEX th THRESHOLD devlink sb occupancy show { DEV | DEV/PORT_INDEX } [ sb SB_INDEX ] devlink sb occupancy snapshot DEV [ sb SB_INDEX ] devlink sb occupancy clearmax DEV [ sb SB_INDEX ] switch$ devlink sb show pci/0000:03:00.0: sb 0 size 16777216 ing_pools 4 eg_pools 4 ing_tcs 8 eg_tcs 8 switch$ devlink sb pool show pci/0000:03:00.0: sb 0 pool 0 type ingress size 12400032 thtype dynamic pci/0000:03:00.0: sb 0 pool 1 type ingress size 0 thtype dynamic pci/0000:03:00.0: sb 0 pool 2 type ingress size 0 thtype dynamic pci/0000:03:00.0: sb 0 pool 3 type ingress size 200064 thtype dynamic pci/0000:03:00.0: sb 0 pool 4 type egress size 13220064 thtype dynamic pci/0000:03:00.0: sb 0 pool 5 type egress size 0 thtype dynamic pci/0000:03:00.0: sb 0 pool 6 type egress size 0 thtype dynamic pci/0000:03:00.0: sb 0 pool 7 type egress size 0 thtype dynamic switch$ devlink sb port pool show sw0p7 pool 0 sw0p7: sb 0 pool 0 threshold 16 switch$ sudo devlink sb port pool set sw0p7 pool 0 th 15 switch$ devlink sb port pool show sw0p7 pool 0 sw0p7: sb 0 pool 0 threshold 15 switch$ devlink sb tc bind show sw0p7 tc 0 type ingress sw0p7: sb 0 tc 0 type ingress pool 0 threshold 10 switch$ sudo devlink sb tc bind set sw0p7 tc 0 type ingress pool 0 th 9 switch$ devlink sb tc bind show sw0p7 tc 0 type ingress sw0p7: sb 0 tc 0 type ingress pool 0 threshold 9 switch$ sudo devlink sb occupancy snapshot pci/0000:03:00.0 switch$ devlink sb occupancy show sw0p7 sw0p7: pool: 0: 82944/3217344 1: 0/0 2: 0/0 3: 0/0 4: 0/384 5: 0/0 6: 0/0 7: 0/0 itc: 0(0): 96768/3217344 1(0): 0/0 2(0): 0/0 3(0): 0/0 4(0): 0/0 5(0): 0/0 6(0): 0/0 7(0): 0/0 etc: 0(4): 0/384 1(4): 0/0 2(4): 0/0 3(4): 0/0 4(4): 0/0 5(4): 0/0 6(4): 0/0 7(4): 0/0 switch$ sudo devlink sb occupancy clearmax pci/0000:03:00.0 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14mlxsw: spectrum_buffers: Implement occupancy monitoringJiri Pirko3-16/+293
Implement occupancy API introduced in devlink and mlxsw core. This is done by accessing SBPM register for Port-Pool and SBSR for Port-TC current and max occupancy values. Max clear is implemented using the same registers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14mlxsw: core: Introduce support for asynchronous EMAD register accessJiri Pirko2-173/+334
So far it was possible to have one EMAD register access at a time, locked by mutex. This patch extends this interface to allow multiple EMAD register accesses to be in fly at once. That allows faster processing on firmware side avoiding unused time in between EMADs. Measured speedup is ~30% for shared occupancy snapshot operation. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14mlxsw: core: Add mlxsw specific workqueue and use it for FDB notif. processingJiri Pirko3-4/+28
Follow-up patch is going to need to use delayed work as well and frequently. The FDB notification processing is already using that and also quite frequently. It makes sense to create separate workqueue just for mlxsw driver in this case and do not pollute system_wq. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14mlxsw: reg: Extend SBPM register for occupancy controlJiri Pirko2-2/+32
Since it is not possible to get and clear Port-Pool occupancy data using SBSR register, there's a need to implement that using SBPM. Extend pack helper and add unpack helper to get occupancy values. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14mlxsw: reg: Add Shared Buffer Status register definitionJiri Pirko1-0/+100
This register allows to query HW for current and maximal buffer usage. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>