summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-10-26tcp: make challenge acks less predictableEric Dumazet1-4/+9
commit 75ff39ccc1bd5d3c455b6822ab09e533c551f758 upstream. Yue Cao claims that current host rate limiting of challenge ACKS (RFC 5961) could leak enough information to allow a patient attacker to hijack TCP sessions. He will soon provide details in an academic paper. This patch increases the default limit from 100 to 1000, and adds some randomization so that the attacker can no longer hijack sessions without spending a considerable amount of probes. Based on initial analysis and patch from Linus. Note that we also have per socket rate limiting, so it is tempting to remove the host limit in the future. v2: randomize the count of challenge acks per second, not the period. Fixes: 282f23c6ee34 ("tcp: implement RFC 5961 3.2") Reported-by: Yue Cao <ycao009@ucr.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> [lizf: Backported to 3.4: - adjust context - use ACCESS_ONCE instead WRITE_ONCE/READ_ONCE - open-code prandom_u32_max()] Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26Revert "USB: Add OTG PET device to TPL"Zefan Li2-9/+0
This reverts commit 97fa724b23c3dd22e9c0979ad0e9d260cc6d545d. Conflicts: drivers/usb/core/quirks.c Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26Revert "USB: Add device quirk for ASUS T100 Base Station keyboard"Zefan Li3-11/+2
This reverts commit eea5a87d270e8d6925063019c3b0f3ff61fcb49a. Conflicts: drivers/usb/core/quirks.c include/linux/usb/quirks.h Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26Fix incomplete backport of commit 0f792cf949a0Zefan Li1-5/+9
Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26Fix incomplete backport of commit 423f04d63cf4Zefan Li1-3/+0
Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26ipv6: fix handling of blackhole and prohibit routesNicolas Dichtel2-4/+29
commit ef2c7d7b59708d54213c7556a82d14de9a7e4475 upstream. When adding a blackhole or a prohibit route, they were handling like classic routes. Moreover, it was only possible to add this kind of routes by specifying an interface. Bug already reported here: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498498 Before the patch: $ ip route add blackhole 2001::1/128 RTNETLINK answers: No such device $ ip route add blackhole 2001::1/128 dev eth0 $ ip -6 route | grep 2001 2001::1 dev eth0 metric 1024 After: $ ip route add blackhole 2001::1/128 $ ip -6 route | grep 2001 blackhole 2001::1 dev lo metric 1024 error -22 v2: wrong patch v3: add a field fc_type in struct fib6_config to store RTN_* type Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26ipv6: don't call fib6_run_gc() until routing is readyMichal Kubeček3-7/+19
commit 2c861cc65ef4604011a0082e4dcdba2819aa191a upstream. When loading the ipv6 module, ndisc_init() is called before ip6_route_init(). As the former registers a handler calling fib6_run_gc(), this opens a window to run the garbage collector before necessary data structures are initialized. If a network device is initialized in this window, adding MAC address to it triggers a NETDEV_CHANGEADDR event, leading to a crash in fib6_clean_all(). Take the event handler registration out of ndisc_init() into a separate function ndisc_late_init() and move it after ip6_route_init(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26ipv6: update ip6_rt_last_gc every time GC is runMichal Kubeček2-4/+6
commit 49a18d86f66d33a20144ecb5a34bba0d1856b260 upstream. As pointed out by Eric Dumazet, net->ipv6.ip6_rt_last_gc should hold the last time garbage collector was run so that we should update it whenever fib6_run_gc() calls fib6_clean_all(), not only if we got there from ip6_dst_gc(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26sctp: Prevent soft lockup when sctp_accept() is called during a timeout eventKarl Heiss1-15/+19
commit 635682a14427d241bab7bbdeebb48a7d7b91638e upstream. A case can occur when sctp_accept() is called by the user during a heartbeat timeout event after the 4-way handshake. Since sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the bh_sock_lock in sctp_generate_heartbeat_event() will be taken with the listening socket but released with the new association socket. The result is a deadlock on any future attempts to take the listening socket lock. Note that this race can occur with other SCTP timeouts that take the bh_lock_sock() in the event sctp_accept() is called. BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0] ... RIP: 0010:[<ffffffff8152d48e>] [<ffffffff8152d48e>] _spin_lock+0x1e/0x30 RSP: 0018:ffff880028323b20 EFLAGS: 00000206 RAX: 0000000000000002 RBX: ffff880028323b20 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff880028323be0 RDI: ffff8804632c4b48 RBP: ffffffff8100bb93 R08: 0000000000000000 R09: 0000000000000000 R10: ffff880610662280 R11: 0000000000000100 R12: ffff880028323aa0 R13: ffff8804383c3880 R14: ffff880028323a90 R15: ffffffff81534225 FS: 0000000000000000(0000) GS:ffff880028320000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000006df528 CR3: 0000000001a85000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffff880616b70000, task ffff880616b6cab0) Stack: ffff880028323c40 ffffffffa01c2582 ffff880614cfb020 0000000000000000 <d> 0100000000000000 00000014383a6c44 ffff8804383c3880 ffff880614e93c00 <d> ffff880614e93c00 0000000000000000 ffff8804632c4b00 ffff8804383c38b8 Call Trace: <IRQ> [<ffffffffa01c2582>] ? sctp_rcv+0x492/0xa10 [sctp] [<ffffffff8148c559>] ? nf_iterate+0x69/0xb0 [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff8148c716>] ? nf_hook_slow+0x76/0x120 [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff8149757d>] ? ip_local_deliver_finish+0xdd/0x2d0 [<ffffffff81497808>] ? ip_local_deliver+0x98/0xa0 [<ffffffff81496ccd>] ? ip_rcv_finish+0x12d/0x440 [<ffffffff81497255>] ? ip_rcv+0x275/0x350 [<ffffffff8145cfeb>] ? __netif_receive_skb+0x4ab/0x750 ... With lockdep debugging: ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- CslRx/12087 is trying to release lock (slock-AF_INET) at: [<ffffffffa01bcae0>] sctp_generate_timeout_event+0x40/0xe0 [sctp] but there are no more locks to release! other info that might help us debug this: 2 locks held by CslRx/12087: #0: (&asoc->timers[i]){+.-...}, at: [<ffffffff8108ce1f>] run_timer_softirq+0x16f/0x3e0 #1: (slock-AF_INET){+.-...}, at: [<ffffffffa01bcac3>] sctp_generate_timeout_event+0x23/0xe0 [sctp] Ensure the socket taken is also the same one that is released by saving a copy of the socket before entering the timeout event critical section. Signed-off-by: Karl Heiss <kheiss@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: - Net namespaces are not used - Keep using sctp_bh_{,un}lock_sock() - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26drm/radeon: fix hotplug race at startupDave Airlie1-0/+5
commit 7f98ca454ad373fc1b76be804fa7138ff68c1d27 upstream. We apparantly get a hotplug irq before we've initialised modesetting, [drm] Loading R100 Microcode BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<c125f56f>] __mutex_lock_slowpath+0x23/0x91 *pde = 00000000 Oops: 0002 [#1] Modules linked in: radeon(+) drm_kms_helper ttm drm i2c_algo_bit backlight pcspkr psmouse evdev sr_mod input_leds led_class cdrom sg parport_pc parport floppy intel_agp intel_gtt lpc_ich acpi_cpufreq processor button mfd_core agpgart uhci_hcd ehci_hcd rng_core snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm usbcore usb_common i2c_i801 i2c_core snd_timer snd soundcore thermal_sys CPU: 0 PID: 15 Comm: kworker/0:1 Not tainted 4.2.0-rc7-00015-gbf67402 #111 Hardware name: MicroLink /D850MV , BIOS MV85010A.86A.0067.P24.0304081124 04/08/2003 Workqueue: events radeon_hotplug_work_func [radeon] task: f6ca5900 ti: f6d3e000 task.ti: f6d3e000 EIP: 0060:[<c125f56f>] EFLAGS: 00010282 CPU: 0 EIP is at __mutex_lock_slowpath+0x23/0x91 EAX: 00000000 EBX: f5e900fc ECX: 00000000 EDX: fffffffe ESI: f6ca5900 EDI: f5e90100 EBP: f5e90000 ESP: f6d3ff0c DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 CR0: 8005003b CR2: 00000000 CR3: 36f61000 CR4: 000006d0 Stack: f5e90100 00000000 c103c4c1 f6d2a5a0 f5e900fc f6df394c c125f162 f8b0faca f6d2a5a0 c138ca00 f6df394c f7395600 c1034741 00d40000 00000000 f6d2a5a0 c138ca00 f6d2a5b8 c138ca10 c1034b58 00000001 f6d40000 f6ca5900 f6d0c940 Call Trace: [<c103c4c1>] ? dequeue_task_fair+0xa4/0xb7 [<c125f162>] ? mutex_lock+0x9/0xa [<f8b0faca>] ? radeon_hotplug_work_func+0x17/0x57 [radeon] [<c1034741>] ? process_one_work+0xfc/0x194 [<c1034b58>] ? worker_thread+0x18d/0x218 [<c10349cb>] ? rescuer_thread+0x1d5/0x1d5 [<c103742a>] ? kthread+0x7b/0x80 [<c12601c0>] ? ret_from_kernel_thread+0x20/0x30 [<c10373af>] ? init_completion+0x18/0x18 Code: 42 08 e8 8e a6 dd ff c3 57 56 53 83 ec 0c 8b 35 48 f7 37 c1 8b 10 4a 74 1a 89 c3 8d 78 04 8b 40 08 89 63 Reported-and-Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26udp: properly support MSG_PEEK with truncated buffersEric Dumazet2-4/+8
commit 197c949e7798fbf28cfadc69d9ca0c2abbf93191 upstream. Backport of this upstream commit into stable kernels : 89c22d8c3b27 ("net: Fix skb csum races when peeking") exposed a bug in udp stack vs MSG_PEEK support, when user provides a buffer smaller than skb payload. In this case, skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov); returns -EFAULT. This bug does not happen in upstream kernels since Al Viro did a great job to replace this into : skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg); This variant is safe vs short buffers. For the time being, instead reverting Herbert Xu patch and add back skb->ip_summed invalid changes, simply store the result of udp_lib_checksum_complete() so that we avoid computing the checksum a second time, and avoid the problematic skb_copy_and_csum_datagram_iovec() call. This patch can be applied on recent kernels as it avoids a double checksumming, then backported to stable kernels as a bug fix. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26net: Fix skb csum races when peekingHerbert Xu1-1/+2
[ Upstream commit 89c22d8c3b278212eef6a8cc66b570bc840a6f5a ] When we calculate the checksum on the recv path, we store the result in the skb as an optimisation in case we need the checksum again down the line. This is in fact bogus for the MSG_PEEK case as this is done without any locking. So multiple threads can peek and then store the result to the same skb, potentially resulting in bogus skb states. This patch fixes this by only storing the result if the skb is not shared. This preserves the optimisations for the few cases where it can be done safely due to locking or other reasons, e.g., SIOCINQ. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26USB: ti_usb_3410_502: Fix ID table sizeBen Hutchings1-2/+2
Commit 35a2fbc941ac ("USB: serial: ti_usb_3410_5052: new device id for Abbot strip port cable") failed to update the size of the ti_id_table_3410 array. This doesn't need to be fixed upstream following commit d7ece6515e12 ("USB: ti_usb_3410_5052: remove vendor/product module parameters") but should be fixed in stable branches older than 3.12. Backports of commit c9d09dc7ad10 ("USB: serial: ti_usb_3410_5052: add Abbott strip port ID to combined table as well.") similarly failed to update the size of the ti_id_table_combined array. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26af_unix: fix a fatal race with bit fieldsEric Dumazet2-8/+9
commit 60bc851ae59bfe99be6ee89d6bc50008c85ec75d upstream. Using bit fields is dangerous on ppc64/sparc64, as the compiler [1] uses 64bit instructions to manipulate them. If the 64bit word includes any atomic_t or spinlock_t, we can lose critical concurrent changes. This is happening in af_unix, where unix_sk(sk)->gc_candidate/ gc_maybe_cycle/lock share the same 64bit word. This leads to fatal deadlock, as one/several cpus spin forever on a spinlock that will never be available again. A safer way would be to use a long to store flags. This way we are sure compiler/arch wont do bad things. As we own unix_gc_lock spinlock when clearing or setting bits, we can use the non atomic __set_bit()/__clear_bit(). recursion_level can share the same 64bit location with the spinlock, as it is set only with this spinlock held. [1] bug fixed in gcc-4.8.0 : http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080 Reported-by: Ambrose Feinstein <ambrose@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: hejianet <hejianet@gmail.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26net: possible use after free in dst_releaseFrancesco Ruggeri1-1/+2
commit 07a5d38453599052aff0877b16bb9c1585f08609 upstream. dst_release should not access dst->flags after decrementing __refcnt to 0. The dst_entry may be in dst_busy_list and dst_gc_task may dst_destroy it before dst_release gets a chance to access dst->flags. Fixes: d69bbf88c8d0 ("net: fix a race in dst_release()") Fixes: 27b75c95f10d ("net: avoid RCU for NOCACHE dst") Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26ftrace/scripts: Fix incorrect use of sprintf in recordmcountColin Ian King1-1/+1
commit 713a3e4de707fab49d5aa4bceb77db1058572a7b upstream. Fix build warning: scripts/recordmcount.c:589:4: warning: format not a string literal and no format arguments [-Wformat-security] sprintf("%s: failed\n", file); Fixes: a50bd43935586 ("ftrace/scripts: Have recordmcount copy the object file") Link: http://lkml.kernel.org/r/1451516801-16951-1-git-send-email-colin.king@canonical.com Cc: Li Bin <huawei.libin@huawei.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone()Andrew Banman1-12/+19
commit 5f0f2887f4de9508dcf438deab28f1de8070c271 upstream. test_pages_in_a_zone() does not account for the possibility of missing sections in the given pfn range. pfn_valid_within always returns 1 when CONFIG_HOLES_IN_ZONE is not set, allowing invalid pfns from missing sections to pass the test, leading to a kernel oops. Wrap an additional pfn loop with PAGES_PER_SECTION granularity to check for missing sections before proceeding into the zone-check code. This also prevents a crash from offlining memory devices with missing sections. Despite this, it may be a good idea to keep the related patch '[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with missing sections' because missing sections in a memory block may lead to other problems not covered by the scope of this fix. Signed-off-by: Andrew Banman <abanman@sgi.com> Acked-by: Alex Thorlton <athorlton@sgi.com> Cc: Russ Anderson <rja@sgi.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Greg KH <greg@kroah.com> Cc: Seth Jennings <sjennings@variantweb.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26ocfs2: fix BUG when calculate new backup superJoseph Qi1-3/+12
commit 5c9ee4cbf2a945271f25b89b137f2c03bbc3be33 upstream. When resizing, it firstly extends the last gd. Once it should backup super in the gd, it calculates new backup super and update the corresponding value. But it currently doesn't consider the situation that the backup super is already done. And in this case, it still sets the bit in gd bitmap and then decrease from bg_free_bits_count, which leads to a corrupted gd and trigger the BUG in ocfs2_block_group_set_bits: BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits); So check whether the backup super is done and then do the updates. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26ipv6/addrlabel: fix ip6addrlbl_get()Andrey Ryabinin1-1/+1
commit e459dfeeb64008b2d23bdf600f03b3605dbb8152 upstream. ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded, ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed, ip6addrlbl_get() will use about to be free ip6addrlbl_entry pointer. Fix this by inverting ip6addrlbl_hold() check. Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Cong Wang <cwang@twopensource.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26parisc: Fix syscall restartsHelge Deller1-15/+52
commit 71a71fb5374a23be36a91981b5614590b9e722c3 upstream. On parisc syscalls which are interrupted by signals sometimes failed to restart and instead returned -ENOSYS which in the worst case lead to userspace crashes. A similiar problem existed on MIPS and was fixed by commit e967ef02 ("MIPS: Fix restart of indirect syscalls"). On parisc the current syscall restart code assumes that all syscall callers load the syscall number in the delay slot of the ble instruction. That's how it is e.g. done in the unistd.h header file: ble 0x100(%sr2, %r0) ldi #syscall_nr, %r20 Because of that assumption the current code never restored %r20 before returning to userspace. This assumption is at least not true for code which uses the glibc syscall() function, which instead uses this syntax: ble 0x100(%sr2, %r0) copy regX, %r20 where regX depend on how the compiler optimizes the code and register usage. This patch fixes this problem by adding code to analyze how the syscall number is loaded in the delay branch and - if needed - copy the syscall number to regX prior returning to userspace for the syscall restart. Signed-off-by: Helge Deller <deller@gmx.de> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26KEYS: Fix race between read and revokeDavid Howells1-9/+9
commit b4a1b4f5047e4f54e194681125c74c0aa64d637d upstream. This fixes CVE-2015-7550. There's a race between keyctl_read() and keyctl_revoke(). If the revoke happens between keyctl_read() checking the validity of a key and the key's semaphore being taken, then the key type read method will see a revoked key. This causes a problem for the user-defined key type because it assumes in its read method that there will always be a payload in a non-revoked key and doesn't check for a NULL pointer. Fix this by making keyctl_read() check the validity of a key after taking semaphore instead of before. I think the bug was introduced with the original keyrings code. This was discovered by a multithreaded test program generated by syzkaller (http://github.com/google/syzkaller). Here's a cleaned up version: #include <sys/types.h> #include <keyutils.h> #include <pthread.h> void *thr0(void *arg) { key_serial_t key = (unsigned long)arg; keyctl_revoke(key); return 0; } void *thr1(void *arg) { key_serial_t key = (unsigned long)arg; char buffer[16]; keyctl_read(key, buffer, 16); return 0; } int main() { key_serial_t key = add_key("user", "%", "foo", 3, KEY_SPEC_USER_KEYRING); pthread_t th[5]; pthread_create(&th[0], 0, thr0, (void *)(unsigned long)key); pthread_create(&th[1], 0, thr1, (void *)(unsigned long)key); pthread_create(&th[2], 0, thr0, (void *)(unsigned long)key); pthread_create(&th[3], 0, thr1, (void *)(unsigned long)key); pthread_join(th[0], 0); pthread_join(th[1], 0); pthread_join(th[2], 0); pthread_join(th[3], 0); return 0; } Build as: cc -o keyctl-race keyctl-race.c -lkeyutils -lpthread Run as: while keyctl-race; do :; done as it may need several iterations to crash the kernel. The crash can be summarised as: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff81279b08>] user_read+0x56/0xa3 ... Call Trace: [<ffffffff81276aa9>] keyctl_read_key+0xb6/0xd7 [<ffffffff81277815>] SyS_keyctl+0x83/0xe0 [<ffffffff815dbb97>] entry_SYSCALL_64_fastpath+0x12/0x6f Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26USB: fix invalid memory access in hub_activate()Alan Stern1-3/+20
commit e50293ef9775c5f1cf3fcc093037dd6a8c5684ea upstream. Commit 8520f38099cc ("USB: change hub initialization sleeps to delayed_work") changed the hub_activate() routine to make part of it run in a workqueue. However, the commit failed to take a reference to the usb_hub structure or to lock the hub interface while doing so. As a result, if a hub is plugged in and quickly unplugged before the work routine can run, the routine will try to access memory that has been deallocated. Or, if the hub is unplugged while the routine is running, the memory may be deallocated while it is in active use. This patch fixes the problem by taking a reference to the usb_hub at the start of hub_activate() and releasing it at the end (when the work is finished), and by locking the hub interface while the work routine is running. It also adds a check at the start of the routine to see if the hub has already been disconnected, in which nothing should be done. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Alexandru Cornea <alexandru.cornea@intel.com> Tested-by: Alexandru Cornea <alexandru.cornea@intel.com> Fixes: 8520f38099cc ("USB: change hub initialization sleeps to delayed_work") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [lizf: Backported to 3.4: add forward declaration of hub_release()] Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26USB: ipaq.c: fix a timeout loopDan Carpenter1-1/+2
commit abdc9a3b4bac97add99e1d77dc6d28623afe682b upstream. The code expects the loop to end with "retries" set to zero but, because it is a post-op, it will end set to -1. I have fixed this by moving the decrement inside the loop. Fixes: 014aa2a3c32e ('USB: ipaq: minor ipaq_open() cleanup.') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.Konrad Rzeszutek Wilk1-1/+7
commit 408fb0e5aa7fda0059db282ff58c3b2a4278baa0 upstream. commit f598282f51 ("PCI: Fix the NIU MSI-X problem in a better way") teaches us that dealing with MSI-X can be troublesome. Further checks in the MSI-X architecture shows that if the PCI_COMMAND_MEMORY bit is turned of in the PCI_COMMAND we may not be able to access the BAR (since they are memory regions). Since the MSI-X tables are located in there.. that can lead to us causing PCIe errors. Inhibit us performing any operation on the MSI-X unless the MEMORY bit is set. Note that Xen hypervisor with: "x86/MSI-X: access MSI-X table only after having enabled MSI-X" will return: xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3! When the generic MSI code tries to setup the PIRQ without MEMORY bit set. Which means with later versions of Xen (4.6) this patch is not neccessary. This is part of XSA-157 Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has ↵Konrad Rzeszutek Wilk1-13/+20
MSI(X) enabled. commit 7cfb905b9638982862f0331b36ccaaca5d383b49 upstream. Otherwise just continue on, returning the same values as previously (return of 0, and op->result has the PIRQ value). This does not change the behavior of XEN_PCI_OP_disable_msi[|x]. The pci_disable_msi or pci_disable_msix have the checks for msi_enabled or msix_enabled so they will error out immediately. However the guest can still call these operations and cause us to disable the 'ack_intr'. That means the backend IRQ handler for the legacy interrupt will not respond to interrupts anymore. This will lead to (if the device is causing an interrupt storm) for the Linux generic code to disable the interrupt line. Naturally this will only happen if the device in question is plugged in on the motherboard on shared level interrupt GSI. This is part of XSA-157 Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26xen/pciback: Do not install an IRQ handler for MSI interrupts.Konrad Rzeszutek Wilk1-0/+7
commit a396f3a210c3a61e94d6b87ec05a75d0be2a60d0 upstream. Otherwise an guest can subvert the generic MSI code to trigger an BUG_ON condition during MSI interrupt freeing: for (i = 0; i < entry->nvec_used; i++) BUG_ON(irq_has_action(entry->irq + i)); Xen PCI backed installs an IRQ handler (request_irq) for the dev->irq whenever the guest writes PCI_COMMAND_MEMORY (or PCI_COMMAND_IO) to the PCI_COMMAND register. This is done in case the device has legacy interrupts the GSI line is shared by the backend devices. To subvert the backend the guest needs to make the backend to change the dev->irq from the GSI to the MSI interrupt line, make the backend allocate an interrupt handler, and then command the backend to free the MSI interrupt and hit the BUG_ON. Since the backend only calls 'request_irq' when the guest writes to the PCI_COMMAND register the guest needs to call XEN_PCI_OP_enable_msi before any other operation. This will cause the generic MSI code to setup an MSI entry and populate dev->irq with the new PIRQ value. Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY and cause the backend to setup an IRQ handler for dev->irq (which instead of the GSI value has the MSI pirq). See 'xen_pcibk_control_isr'. Then the guest disables the MSI: XEN_PCI_OP_disable_msi which ends up triggering the BUG_ON condition in 'free_msi_irqs' as there is an IRQ handler for the entry->irq (dev->irq). Note that this cannot be done using MSI-X as the generic code does not over-write dev->irq with the MSI-X PIRQ values. The patch inhibits setting up the IRQ handler if MSI or MSI-X (for symmetry reasons) code had been called successfully. P.S. Xen PCIBack when it sets up the device for the guest consumption ends up writting 0 to the PCI_COMMAND (see xen_pcibk_reset_device). XSA-120 addendum patch removed that - however when upstreaming said addendum we found that it caused issues with qemu upstream. That has now been fixed in qemu upstream. This is part of XSA-157 Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or ↵Konrad Rzeszutek Wilk1-0/+7
MSI-X enabled commit 5e0ce1455c09dd61d029b8ad45d82e1ac0b6c4c9 upstream. The guest sequence of: a) XEN_PCI_OP_enable_msix b) XEN_PCI_OP_enable_msix results in hitting an NULL pointer due to using freed pointers. The device passed in the guest MUST have MSI-X capability. The a) constructs and SysFS representation of MSI and MSI groups. The b) adds a second set of them but adding in to SysFS fails (duplicate entry). 'populate_msi_sysfs' frees the newly allocated msi_irq_groups (note that in a) pdev->msi_irq_groups is still set) and also free's ALL of the MSI-X entries of the device (the ones allocated in step a) and b)). The unwind code: 'free_msi_irqs' deletes all the entries and tries to delete the pdev->msi_irq_groups (which hasn't been set to NULL). However the pointers in the SysFS are already freed and we hit an NULL pointer further on when 'strlen' is attempted on a freed pointer. The patch adds a simple check in the XEN_PCI_OP_enable_msix to guard against that. The check for msi_enabled is not stricly neccessary. This is part of XSA-157 Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or ↵Konrad Rzeszutek Wilk1-1/+6
MSI-X enabled commit 56441f3c8e5bd45aab10dd9f8c505dd4bec03b0d upstream. The guest sequence of: a) XEN_PCI_OP_enable_msi b) XEN_PCI_OP_enable_msi c) XEN_PCI_OP_disable_msi results in hitting an BUG_ON condition in the msi.c code. The MSI code uses an dev->msi_list to which it adds MSI entries. Under the above conditions an BUG_ON() can be hit. The device passed in the guest MUST have MSI capability. The a) adds the entry to the dev->msi_list and sets msi_enabled. The b) adds a second entry but adding in to SysFS fails (duplicate entry) and deletes all of the entries from msi_list and returns (with msi_enabled is still set). c) pci_disable_msi passes the msi_enabled checks and hits: BUG_ON(list_empty(dev_to_msi_list(&dev->dev))); and blows up. The patch adds a simple check in the XEN_PCI_OP_enable_msi to guard against that. The check for msix_enabled is not stricly neccessary. This is part of XSA-157. Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26xen/pciback: Save xen_pci_op commands before processing itKonrad Rzeszutek Wilk2-1/+15
commit 8135cf8b092723dbfcc611fe6fdcb3a36c9951c5 upstream. Double fetch vulnerabilities that happen when a variable is fetched twice from shared memory but a security check is only performed the first time. The xen_pcibk_do_op function performs a switch statements on the op->cmd value which is stored in shared memory. Interestingly this can result in a double fetch vulnerability depending on the performed compiler optimization. This patch fixes it by saving the xen_pci_op command before processing it. We also use 'barrier' to make sure that the compiler does not perform any optimization. This is part of XSA155. Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26xen-blkback: only read request operation from shared ring onceRoger Pau Monné1-4/+4
commit 1f13d75ccb806260079e0679d55d9253e370ec8a upstream. A compiler may load a switch statement value multiple times, which could be bad when the value is in memory shared with the frontend. When converting a non-native request to a native one, ensure that src->operation is only loaded once by using READ_ONCE(). This is part of XSA155. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [lizf: Backported to 3.4: - adjust context - call ACCESS_ONCE instead of READ_ONCE] Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26xen-netback: use RING_COPY_REQUEST() throughoutDavid Vrabel1-16/+14
commit 68a33bfd8403e4e22847165d149823a2e0e67c9c upstream. Instead of open-coding memcpy()s and directly accessing Tx and Rx requests, use the new RING_COPY_REQUEST() that ensures the local copy is correct. This is more than is strictly necessary for guest Rx requests since only the id and gref fields are used and it is harmless if the frontend modifies these. This is part of XSA155. Reviewed-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [lizf: Backported to 3.4: - adjust context - s/queue/vif/g] Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26xen-netback: don't use last request to determine minimum Tx creditDavid Vrabel1-3/+1
commit 0f589967a73f1f30ab4ac4dd9ce0bb399b4d6357 upstream. The last from guest transmitted request gives no indication about the minimum amount of credit that the guest might need to send a packet since the last packet might have been a small one. Instead allow for the worst case 128 KiB packet. This is part of XSA155. Reviewed-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [lizf: Backported to 3.4: s/queue/vif/g] Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26xen: Add RING_COPY_REQUEST()David Vrabel1-0/+14
commit 454d5d882c7e412b840e3c99010fe81a9862f6fb upstream. Using RING_GET_REQUEST() on a shared ring is easy to use incorrectly (i.e., by not considering that the other end may alter the data in the shared ring while it is being inspected). Safe usage of a request generally requires taking a local copy. Provide a RING_COPY_REQUEST() macro to use instead of RING_GET_REQUEST() and an open-coded memcpy(). This takes care of ensuring that the copy is done correctly regardless of any possible compiler optimizations. Use a volatile source to prevent the compiler from reordering or omitting the copy. This is part of XSA155. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26ftrace/scripts: Have recordmcount copy the object fileSteven Rostedt (Red Hat)1-35/+110
commit a50bd43935586420fb75f4558369eb08566fac5e upstream. Russell King found that he had weird side effects when compiling the kernel with hard linked ccache. The reason was that recordmcount modified the kernel in place via mmap, and when a file gets modified twice by recordmcount, it will complain about it. To fix this issue, Russell wrote a patch that checked if the file was hard linked more than once and would unlink it if it was. Linus Torvalds was not happy with the fact that recordmcount does this in place modification. Instead of doing the unlink only if the file has two or more hard links, it does the unlink all the time. In otherwords, it always does a copy if it changed something. That is, it does the write out if a change was made. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26scripts: recordmcount: break hardlinksRussell King1-0/+14
commit dd39a26538e37f6c6131e829a4a510787e43c783 upstream. recordmcount edits the file in-place, which can cause problems when using ccache in hardlink mode. Arrange for recordmcount to break a hardlinked object. Link: http://lkml.kernel.org/r/E1a7MVT-0000et-62@rmk-PC.arm.linux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26spi: fix parent-device reference leakJohan Hovold1-1/+1
commit 157f38f993919b648187ba341bfb05d0e91ad2f6 upstream. Fix parent-device reference leak due to SPI-core taking an unnecessary reference to the parent when allocating the master structure, a reference that was never released. Note that driver core takes its own reference to the parent when the master device is registered. Fixes: 49dce689ad4e ("spi doesn't need class_device") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26ser_gigaset: fix deallocation of platform device structureTilman Schmidt1-3/+7
commit 4c5e354a974214dfb44cd23fa0429327693bc3ea upstream. When shutting down the device, the struct ser_cardstate must not be kfree()d immediately after the call to platform_device_unregister() since the embedded struct platform_device is still in use. Move the kfree() call to the release method instead. Signed-off-by: Tilman Schmidt <tilman@imap.cc> Fixes: 2869b23e4b95 ("drivers/isdn/gigaset: new M101 driver (v2)") Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26mISDN: fix a loop countDan Carpenter1-4/+3
commit 40d24c4d8a7430aa4dfd7a665fa3faf3b05b673f upstream. There are two issue here. 1) cnt starts as maxloop + 1 so all these loops iterate one more time than intended. 2) At the end of the loop we test for "if (maxloop && !cnt)" but for the first two loops, we end with cnt equal to -1. Changing this to a pre-op means we end with cnt set to 0. Fixes: cae86d4a4e56 ('mISDN: Add driver for Infineon ISDN chipset family') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26ARM: 8471/1: need to save/restore arm register(r11) when it is corruptedAnson Huang1-2/+2
commit fa0708b320f6da4c1104fe56e01b7abf66fd16ad upstream. In cpu_v7_do_suspend routine, r11 is used while it is NOT saved/restored, different compiler may have different usage of ARM general registers, so it may cause issues during calling cpu_v7_do_suspend. We meet kernel fault occurs when using GCC 4.8.3, r11 contains valid value before calling into cpu_v7_do_suspend, but when returned from this routine, r11 is corrupted and lead to kernel fault. Doing save/restore for those corrupted registers is a must in assemble code. Signed-off-by: Anson Huang <Anson.Huang@freescale.com> Reviewed-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26sh_eth: fix TX buffer byte-swappingSergei Shtylyov1-2/+1
commit 3e2309937f1e5d538ff13da5fb8de41196927c61 upstream. For the little-endian SH771x kernels the driver has to byte-swap the RX/TX buffers, however yet unset physcial address from the TX descriptor is used to call sh_eth_soft_swap(). Use 'skb->data' instead... Fixes: 31fcb99d9958 ("net: sh_eth: remove __flush_purge_region") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26genirq: Prevent chip buslock deadlockThomas Gleixner1-3/+3
commit abc7e40c81d113ef4bacb556f0a77ca63ac81d85 upstream. If a interrupt chip utilizes chip->buslock then free_irq() can deadlock in the following way: CPU0 CPU1 interrupt(X) (Shared or spurious) free_irq(X) interrupt_thread(X) chip_bus_lock(X) irq_finalize_oneshot(X) chip_bus_lock(X) synchronize_irq(X) synchronize_irq() waits for the interrupt thread to complete, i.e. forever. Solution is simple: Drop chip_bus_lock() before calling synchronize_irq() as we do with the irq_desc lock. There is nothing to be protected after the point where irq_desc lock has been released. This adds chip_bus_lock/unlock() to the remove_irq() code path, but that's actually correct in the case where remove_irq() is called on such an interrupt. The current users of remove_irq() are not affected as none of those interrupts is on a chip which requires buslock. Reported-by: Fredrik Markström <fredrik.markstrom@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26tty: Fix GPF in flush_to_ldisc()Peter Hurley1-1/+2
commit 9ce119f318ba1a07c29149301f1544b6c4bea52a upstream. A line discipline which does not define a receive_buf() method can can cause a GPF if data is ever received [1]. Oddly, this was known to the author of n_tracesink in 2011, but never fixed. [1] GPF report BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) PGD 3752d067 PUD 37a7b067 PMD 0 Oops: 0010 [#1] SMP KASAN Modules linked in: CPU: 2 PID: 148 Comm: kworker/u10:2 Not tainted 4.4.0-rc2+ #51 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: events_unbound flush_to_ldisc task: ffff88006da94440 ti: ffff88006db60000 task.ti: ffff88006db60000 RIP: 0010:[<0000000000000000>] [< (null)>] (null) RSP: 0018:ffff88006db67b50 EFLAGS: 00010246 RAX: 0000000000000102 RBX: ffff88003ab32f88 RCX: 0000000000000102 RDX: 0000000000000000 RSI: ffff88003ab330a6 RDI: ffff88003aabd388 RBP: ffff88006db67c48 R08: ffff88003ab32f9c R09: ffff88003ab31fb0 R10: ffff88003ab32fa8 R11: 0000000000000000 R12: dffffc0000000000 R13: ffff88006db67c20 R14: ffffffff863df820 R15: ffff88003ab31fb8 FS: 0000000000000000(0000) GS:ffff88006dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 0000000037938000 CR4: 00000000000006e0 Stack: ffffffff829f46f1 ffff88006da94bf8 ffff88006da94bf8 0000000000000000 ffff88003ab31fb0 ffff88003aabd438 ffff88003ab31ff8 ffff88006430fd90 ffff88003ab32f9c ffffed0007557a87 1ffff1000db6cf78 ffff88003ab32078 Call Trace: [<ffffffff8127cf91>] process_one_work+0x8f1/0x17a0 kernel/workqueue.c:2030 [<ffffffff8127df14>] worker_thread+0xd4/0x1180 kernel/workqueue.c:2162 [<ffffffff8128faaf>] kthread+0x1cf/0x270 drivers/block/aoe/aoecmd.c:1302 [<ffffffff852a7c2f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:468 Code: Bad RIP value. RIP [< (null)>] (null) RSP <ffff88006db67b50> CR2: 0000000000000000 ---[ end trace a587f8947e54d6ea ]--- Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [lizf: Backportd to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26mm: hugetlb: call huge_pte_alloc() only if ptep is nullNaoya Horiguchi1-4/+4
commit 0d777df5d8953293be090d9ab5a355db893e8357 upstream. Currently at the beginning of hugetlb_fault(), we call huge_pte_offset() and check whether the obtained *ptep is a migration/hwpoison entry or not. And if not, then we get to call huge_pte_alloc(). This is racy because the *ptep could turn into migration/hwpoison entry after the huge_pte_offset() check. This race results in BUG_ON in huge_pte_alloc(). We don't have to call huge_pte_alloc() when the huge_pte_offset() returns non-NULL, so let's fix this bug with moving the code into else block. Note that the *ptep could turn into a migration/hwpoison entry after this block, but that's not a problem because we have another !pte_present check later (we never go into hugetlb_no_page() in that case.) Fixes: 290408d4a250 ("hugetlb: hugepage migration core") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any ↵Michal Hocko2-5/+20
progress commit 373ccbe5927034b55bdc80b0f8b54d6e13fe8d12 upstream. Tetsuo Handa has reported that the system might basically livelock in OOM condition without triggering the OOM killer. The issue is caused by internal dependency of the direct reclaim on vmstat counter updates (via zone_reclaimable) which are performed from the workqueue context. If all the current workers get assigned to an allocation request, though, they will be looping inside the allocator trying to reclaim memory but zone_reclaimable can see stalled numbers so it will consider a zone reclaimable even though it has been scanned way too much. WQ concurrency logic will not consider this situation as a congested workqueue because it relies that worker would have to sleep in such a situation. This also means that it doesn't try to spawn new workers or invoke the rescuer thread if the one is assigned to the queue. In order to fix this issue we need to do two things. First we have to let wq concurrency code know that we are in trouble so we have to do a short sleep. In order to prevent from issues handled by 0e093d99763e ("writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone") we limit the sleep only to worker threads which are the ones of the interest anyway. The second thing to do is to create a dedicated workqueue for vmstat and mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to have a spare worker thread for it. Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Tejun Heo <tj@kernel.org> Cc: Cristopher Lameter <clameter@sgi.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26parisc iommu: fix panic due to trying to allocate too large regionMikulas Patocka1-7/+8
commit e46e31a3696ae2d66f32c207df3969613726e636 upstream. When using the Promise TX2+ SATA controller on PA-RISC, the system often crashes with kernel panic, for example just writing data with the dd utility will make it crash. Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2 Backtrace: [<000000004021497c>] show_stack+0x14/0x20 [<0000000040410bf0>] dump_stack+0x88/0x100 [<000000004023978c>] panic+0x124/0x360 [<0000000040452c18>] sba_alloc_range+0x698/0x6a0 [<0000000040453150>] sba_map_sg+0x260/0x5b8 [<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata] [<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata] [<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata] [<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130 [<000000004049da34>] scsi_request_fn+0x6e4/0x970 [<00000000403e95a8>] __blk_run_queue+0x40/0x60 [<00000000403e9d8c>] blk_run_queue+0x3c/0x68 [<000000004049a534>] scsi_run_queue+0x2a4/0x360 [<000000004049be68>] scsi_end_request+0x1a8/0x238 [<000000004049de84>] scsi_io_completion+0xfc/0x688 [<0000000040493c74>] scsi_finish_command+0x17c/0x1d0 The cause of the crash is not exhaustion of the IOMMU space, there is plenty of free pages. The function sba_alloc_range is called with size 0x11000, thus the pages_needed variable is 0x11. The function sba_search_bitmap is called with bits_wanted 0x11 and boundary size is 0x10 (because dma_get_seg_boundary(dev) returns 0xffff). The function sba_search_bitmap attempts to allocate 17 pages that must not cross 16-page boundary - it can't satisfy this requirement (iommu_is_span_boundary always returns true) and fails even if there are many free entries in the IOMMU space. How did it happen that we try to allocate 17 pages that don't cross 16-page boundary? The cause is in the function iommu_coalesce_chunks. This function tries to coalesce adjacent entries in the scatterlist. The function does several checks if it may coalesce one entry with the next, one of those checks is this: if (startsg->length + dma_len > max_seg_size) break; When it finishes coalescing adjacent entries, it allocates the mapping: sg_dma_len(contig_sg) = dma_len; dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE); sg_dma_address(contig_sg) = PIDE_FLAG | (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT) | dma_offset; It is possible that (startsg->length + dma_len > max_seg_size) is false (we are just near the 0x10000 max_seg_size boundary), so the funcion decides to coalesce this entry with the next entry. When the coalescing succeeds, the function performs dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE); And now, because of non-zero dma_offset, dma_len is greater than 0x10000. iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts to allocate 17 pages for a device that must not cross 16-page boundary. To fix the bug, we must make sure that dma_len after addition of dma_offset and alignment doesn't cross the segment boundary. I.e. change if (startsg->length + dma_len > max_seg_size) break; to if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size) break; This patch makes this change (it precalculates max_seg_boundary at the beginning of the function iommu_coalesce_chunks). I also added a check that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is not needed for Promise TX2+ SATA, but it may be needed for other devices that have dma_get_seg_boundary lower than dma_get_max_seg_size). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26ses: fix additional element traversal bugJames Bottomley2-1/+13
commit 5e1033561da1152c57b97ee84371dba2b3d64c25 upstream. KASAN found that our additional element processing scripts drop off the end of the VPD page into unallocated space. The reason is that not every element has additional information but our traversal routines think they do, leading to them expecting far more additional information than is present. Fix this by adding a gate to the traversal routine so that it only processes elements that are expected to have additional information (list is in SES-2 section 6.1.13.1: Additional Element Status diagnostic page overview) Reported-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Tested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26vgaarb: fix signal handling in vga_get()Kirill A. Shutemov1-2/+4
commit 9f5bd30818c42c6c36a51f93b4df75a2ea2bd85e upstream. There are few defects in vga_get() related to signal hadning: - we shouldn't check for pending signals for TASK_UNINTERRUPTIBLE case; - if we found pending signal we must remove ourself from wait queue and change task state back to running; - -ERESTARTSYS is more appropriate, I guess. Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26ses: Fix problems with simple enclosuresJames Bottomley1-1/+19
commit 3417c1b5cb1fdc10261dbed42b05cc93166a78fd upstream. Simple enclosure implementations (mostly USB) are allowed to return only page 8 to every diagnostic query. That really confuses our implementation because we assume the return is the page we asked for and end up doing incorrect offsets based on bogus information leading to accesses outside of allocated ranges. Fix that by checking the page code of the return and giving an error if it isn't the one we asked for. This should fix reported bugs with USB storage by simply refusing to attach to enclosures that behave like this. It's also good defensive practise now that we're starting to see more USB enclosures. Reported-by: Andrea Gelmini <andrea.gelmini@gelma.net> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26dm btree: fix bufio buffer leaks in dm_btree_del() error pathJoe Thornber1-1/+15
commit ed8b45a3679eb49069b094c0711b30833f27c734 upstream. If dm_btree_del()'s call to push_frame() fails, e.g. due to btree_node_validator finding invalid metadata, the dm_btree_del() error path must unlock all frames (which have active dm-bufio buffers) that were pushed onto the del_stack. Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio buffers have leaked, e.g.: device-mapper: bufio: leaked buffer 3, hold count 1, list 0 Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26rfkill: copy the name into the rfkill structJohannes Berg1-3/+3
commit b7bb110008607a915298bf0f47d25886ecb94477 upstream. Some users of rfkill, like NFC and cfg80211, use a dynamic name when allocating rfkill, in those cases dev_name(). Therefore, the pointer passed to rfkill_alloc() might not be valid forever, I specifically found the case that the rfkill name was quite obviously an invalid pointer (or at least garbage) when the wiphy had been renamed. Fix this by making a copy of the rfkill name in rfkill_alloc(). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Zefan Li <lizefan@huawei.com>