kernel/linux.git/include/linux, branch v4.9.136

posix-timers: Sanitize overrun handling

2018-11-10T15:43:01+00:00

[ Upstream commit 78c9c4dfbf8c04883941445a195276bb4bb92c76 ] The posix timer overrun handling is broken because the forwarding functions can return a huge number of overruns which does not fit in an int. As a consequence timer_getoverrun(2) and siginfo::si_overrun can turn into random number generators. The k_clock::timer_forward() callbacks return a 64 bit value now. Make k_itimer::ti_overrun[_last] 64bit as well, so the kernel internal accounting is correct. 3Remove the temporary (int) casts. Add a helper function which clamps the overrun value returned to user space via timer_getoverrun(2) or siginfo::si_overrun limited to a positive value between 0 and INT_MAX. INT_MAX is an indicator for user space that the overrun value has been clamped. Reported-by: Team OWL337 Signed-off-by: Thomas Gleixner Acked-by: John Stultz Cc: Peter Zijlstra Cc: Michael Kerrisk Link: https://lkml.kernel.org/r/20180626132705.018623573@linutronix.de [florian: Make patch apply to v4.9.135] Signed-off-by: Florian Fainelli Reviewed-by: Thomas Gleixner Signed-off-by: Sasha Levin

iio: buffer: fix the function signature to match implementation

2018-11-10T15:42:55+00:00

[ Upstream commit 92397a6c38d139d50fabbe9e2dc09b61d53b2377 ] linux/iio/buffer-dma.h was not updated to when length was changed to unsigned int. Fixes: c043ec1ca5ba ("iio:buffer: make length types match kfifo types") Signed-off-by: Phil Reid Signed-off-by: Jonathan Cameron Signed-off-by: Sasha Levin

elevator: fix truncation of icq_cache_name

2018-11-10T15:42:48+00:00

[ Upstream commit 9bd2bbc01d17ddd567cc0f81f77fe1163e497462 ] gcc 7.1 reports the following warning: block/elevator.c: In function ‘elv_register’: block/elevator.c:898:5: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=] "%s_io_cq", e->elevator_name); ^~~~~~~~~~ block/elevator.c:897:3: note: ‘snprintf’ output between 7 and 22 bytes into a destination of size 21 snprintf(e->icq_cache_name, sizeof(e->icq_cache_name), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ "%s_io_cq", e->elevator_name); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The bug is that the name of the icq_cache is 6 characters longer than the elevator name, but only ELV_NAME_MAX + 5 characters were reserved for it --- so in the case of a maximum-length elevator name, the 'q' character in "_io_cq" would be truncated by snprintf(). Fix it by reserving ELV_NAME_MAX + 6 characters instead. Signed-off-by: Eric Biggers Reviewed-by: Bart Van Assche Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

mremap: properly flush TLB before releasing the page

2018-10-20T07:51:31+00:00

commit eb66ae030829605d61fbef1909ce310e29f78821 upstream. Jann Horn points out that our TLB flushing was subtly wrong for the mremap() case. What makes mremap() special is that we don't follow the usual "add page to list of pages to be freed, then flush tlb, and then free pages". No, mremap() obviously just _moves_ the page from one page table location to another. That matters, because mremap() thus doesn't directly control the lifetime of the moved page with a freelist: instead, the lifetime of the page is controlled by the page table locking, that serializes access to the entry. As a result, we need to flush the TLB not just before releasing the lock for the source location (to avoid any concurrent accesses to the entry), but also before we release the destination page table lock (to avoid the TLB being flushed after somebody else has already done something to that page). This also makes the whole "need_flush" logic unnecessary, since we now always end up flushing the TLB for every valid entry. Reported-and-tested-by: Jann Horn Acked-by: Will Deacon Tested-by: Ingo Molnar Acked-by: Peter Zijlstra (Intel) Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Greg Kroah-Hartman

ip: use rb trees for IP frag queue.

2018-10-18T07:13:26+00:00

(commit fa0f527358bd900ef92f925878ed6bfbd51305cc upstream) Similar to TCP OOO RX queue, it makes sense to use rb trees to store IP fragments, so that OOO fragments are inserted faster. Tested: - a follow-up patch contains a rather comprehensive ip defrag self-test (functional) - ran neper `udp_stream -c -H -F 100 -l 300 -T 20`: netstat --statistics Ip: 282078937 total packets received 0 forwarded 0 incoming packets discarded 946760 incoming packets delivered 18743456 requests sent out 101 fragments dropped after timeout 282077129 reassemblies required 944952 packets reassembled ok 262734239 packet reassembles failed (The numbers/stats above are somewhat better re: reassemblies vs a kernel without this patchset. More comprehensive performance testing TBD). Reported-by: Jann Horn Reported-by: Juha-Matti Tilli Suggested-by: Eric Dumazet Signed-off-by: Peter Oskolkov Signed-off-by: Eric Dumazet Cc: Florian Westphal Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman

net: add rb_to_skb() and other rb tree helpers

2018-10-18T07:13:26+00:00

Geeralize private netem_rb_to_skb() TCP rtx queue will soon be converted to rb-tree, so we will need skb_rbtree_walk() helpers. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 18a4c0eab2623cc95be98a1e6af1ad18e7695977) Signed-off-by: Greg Kroah-Hartman

net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends

2018-10-18T07:13:26+00:00

After working on IP defragmentation lately, I found that some large packets defeat CHECKSUM_COMPLETE optimization because of NIC adding zero paddings on the last (small) fragment. While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed to CHECKSUM_NONE, forcing a full csum validation, even if all prior fragments had CHECKSUM_COMPLETE set. We can instead compute the checksum of the part we are trimming, usually smaller than the part we keep. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 88078d98d1bb085d72af8437707279e203524fa5) Signed-off-by: Greg Kroah-Hartman

net: modify skb_rbtree_purge to return the truesize of all purged skbs.

2018-10-18T07:13:25+00:00

Tested: see the next patch is the series. Suggested-by: Eric Dumazet Signed-off-by: Peter Oskolkov Signed-off-by: Eric Dumazet Cc: Florian Westphal Signed-off-by: David S. Miller (cherry picked from commit 385114dec8a49b5e5945e77ba7de6356106713f4) Signed-off-by: Greg Kroah-Hartman

inet: frags: get rid of ipfrag_skb_cb/FRAG_CB

2018-10-18T07:13:25+00:00

ip_defrag uses skb->cb[] to store the fragment offset, and unfortunately this integer is currently in a different cache line than skb->next, meaning that we use two cache lines per skb when finding the insertion point. By aliasing skb->ip_defrag_offset and skb->dev, we pack all the fields in a single cache line and save precious memory bandwidth. Note that after the fast path added by Changli Gao in commit d6bebca92c66 ("fragment: add fast path for in-order fragments") this change wont help the fast path, since we still need to access prev->len (2nd cache line), but will show great benefits when slow path is entered, since we perform a linear scan of a potentially long list. Also, note that this potential long list is an attack vector, we might consider also using an rb-tree there eventually. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit bf66337140c64c27fa37222b7abca7e49d63fb57) Signed-off-by: Greg Kroah-Hartman

rhashtable: reorganize struct rhashtable layout

2018-10-18T07:13:25+00:00

While under frags DDOS I noticed unfortunate false sharing between @nelems and @params.automatic_shrinking Move @nelems at the end of struct rhashtable so that first cache line is shared between all cpus, because almost never dirtied. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit e5d672a0780d9e7118caad4c171ec88b8299398d) Signed-off-by: Greg Kroah-Hartman