summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-06-11x86/crash: Allocate enough low memory when crashkernel=highJoerg Roedel1-5/+7
When the crash kernel is loaded above 4GiB in memory, the first kernel allocates only 72MiB of low-memory for the DMA requirements of the second kernel. On systems with many devices this is not enough and causes device driver initialization errors and failed crash dumps. Testing by SUSE and Redhat has shown that 256MiB is a good default value for now and the discussion has lead to this value as well. So set this default value to 256MiB to make sure there is enough memory available for DMA. Signed-off-by: Joerg Roedel <jroedel@suse.de> [ Reflow comment. ] Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Baoquan He <bhe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1433500202-25531-4-git-send-email-joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-11x86/swiotlb: Try coherent allocations with __GFP_NOWARNJoerg Roedel1-0/+7
When we boot a kdump kernel in high memory, there is by default only 72MB of low memory available. The swiotlb code takes 64MB of it (by default) so that there are only 8MB left to allocate from. On systems with many devices this causes page allocator warnings from dma_generic_alloc_coherent(): systemd-udevd: page allocation failure: order:0, mode:0x280d4 CPU: 0 PID: 197 Comm: systemd-udevd Tainted: G W 3.12.28-4-default #1 Hardware name: HP ProLiant DL980 G7, BIOS P66 07/30/2012 ffff8800781335e0 ffffffff8150b1db 00000000000280d4 ffffffff8113af90 0000000000000000 0000000000000000 ffff88007efdbb00 0000000100000000 0000000000000000 0000000000000000 0000000000000000 0000000000000001 Call Trace: dump_trace+0x7d/0x2d0 show_stack_log_lvl+0x94/0x170 show_stack+0x21/0x50 dump_stack+0x41/0x51 warn_alloc_failed+0xf0/0x160 __alloc_pages_slowpath+0x72f/0x796 __alloc_pages_nodemask+0x1ea/0x210 dma_generic_alloc_coherent+0x96/0x140 x86_swiotlb_alloc_coherent+0x1c/0x50 ttm_dma_pool_alloc_new_pages+0xab/0x320 [ttm] ttm_dma_populate+0x3ce/0x640 [ttm] ttm_tt_bind+0x36/0x60 [ttm] ttm_bo_handle_move_mem+0x55f/0x5c0 [ttm] ttm_bo_move_buffer+0x105/0x130 [ttm] ttm_bo_validate+0xc1/0x130 [ttm] ttm_bo_init+0x24b/0x400 [ttm] radeon_bo_create+0x16c/0x200 [radeon] radeon_ring_init+0x11e/0x2b0 [radeon] r100_cp_init+0x123/0x5b0 [radeon] r100_startup+0x194/0x230 [radeon] r100_init+0x223/0x410 [radeon] radeon_device_init+0x6af/0x830 [radeon] radeon_driver_load_kms+0x89/0x180 [radeon] drm_get_pci_dev+0x121/0x2f0 [drm] local_pci_probe+0x39/0x60 pci_device_probe+0xa9/0x120 driver_probe_device+0x9d/0x3d0 __driver_attach+0x8b/0x90 bus_for_each_dev+0x5b/0x90 bus_add_driver+0x1f8/0x2c0 driver_register+0x5b/0xe0 do_one_initcall+0xf2/0x1a0 load_module+0x1207/0x1c70 SYSC_finit_module+0x75/0xa0 system_call_fastpath+0x16/0x1b 0x7fac533d2788 After these warnings the code enters a fall-back path and allocated directly from the swiotlb aperture in the end. So remove these warnings as this is not a fatal error. Signed-off-by: Joerg Roedel <jroedel@suse.de> [ Simplify, reflow comment. ] Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Baoquan He <bhe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1433500202-25531-3-git-send-email-joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-11swiotlb: Warn on allocation failure in swiotlb_alloc_coherent()Joerg Roedel1-2/+9
Print a warning when all allocation tries have been failed and the function is about to return NULL. This prepares for calling the function with __GFP_NOWARN to suppress allocation failure warnings before all fall-backs have failed - which we'll do to improve kdump behavior. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1433500202-25531-2-git-send-email-joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-11net, swap: Remove a warning and clarify why sk_mem_reclaim is required when ↵Mel Gorman1-8/+5
deactivating swap Jeff Layton reported the following; [ 74.232485] ------------[ cut here ]------------ [ 74.233354] WARNING: CPU: 2 PID: 754 at net/core/sock.c:364 sk_clear_memalloc+0x51/0x80() [ 74.234790] Modules linked in: cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xfs libcrc32c snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device nfsd snd_pcm snd_timer snd e1000 ppdev parport_pc joydev parport pvpanic soundcore floppy serio_raw i2c_piix4 pcspkr nfs_acl lockd virtio_balloon acpi_cpufreq auth_rpcgss grace sunrpc qxl drm_kms_helper ttm drm virtio_console virtio_blk virtio_pci ata_generic virtio_ring pata_acpi virtio [ 74.243599] CPU: 2 PID: 754 Comm: swapoff Not tainted 4.1.0-rc6+ #5 [ 74.244635] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 74.245546] 0000000000000000 0000000079e69e31 ffff8800d066bde8 ffffffff8179263d [ 74.246786] 0000000000000000 0000000000000000 ffff8800d066be28 ffffffff8109e6fa [ 74.248175] 0000000000000000 ffff880118d48000 ffff8800d58f5c08 ffff880036e380a8 [ 74.249483] Call Trace: [ 74.249872] [<ffffffff8179263d>] dump_stack+0x45/0x57 [ 74.250703] [<ffffffff8109e6fa>] warn_slowpath_common+0x8a/0xc0 [ 74.251655] [<ffffffff8109e82a>] warn_slowpath_null+0x1a/0x20 [ 74.252585] [<ffffffff81661241>] sk_clear_memalloc+0x51/0x80 [ 74.253519] [<ffffffffa0116c72>] xs_disable_swap+0x42/0x80 [sunrpc] [ 74.254537] [<ffffffffa01109de>] rpc_clnt_swap_deactivate+0x7e/0xc0 [sunrpc] [ 74.255610] [<ffffffffa03e4fd7>] nfs_swap_deactivate+0x27/0x30 [nfs] [ 74.256582] [<ffffffff811e99d4>] destroy_swap_extents+0x74/0x80 [ 74.257496] [<ffffffff811ecb52>] SyS_swapoff+0x222/0x5c0 [ 74.258318] [<ffffffff81023f27>] ? syscall_trace_leave+0xc7/0x140 [ 74.259253] [<ffffffff81798dae>] system_call_fastpath+0x12/0x71 [ 74.260158] ---[ end trace 2530722966429f10 ]--- The warning in question was unnecessary but with Jeff's series the rules are also clearer. This patch removes the warning and updates the comment to explain why sk_mem_reclaim() may still be called. [jlayton: remove if (sk->sk_forward_alloc) conditional. As Leon points out that it's not needed.] Cc: Leon Romanovsky <leon@leon.nu> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11bridge: fix multicast router rlist endless loopNikolay Aleksandrov1-4/+3
Since the addition of sysfs multicast router support if one set multicast_router to "2" more than once, then the port would be added to the hlist every time and could end up linking to itself and thus causing an endless loop for rlist walkers. So to reproduce just do: echo 2 > multicast_router; echo 2 > multicast_router; in a bridge port and let some igmp traffic flow, for me it hangs up in br_multicast_flood(). Fix this by adding a check in br_multicast_add_router() if the port is already linked. The reason this didn't happen before the addition of multicast_router sysfs entries is because there's a !hlist_unhashed check that prevents it. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries") Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11tipc: disconnect socket directly after probe failureErik Hugne1-5/+11
If the TIPC connection timer expires in a probing state, a self abort message is supposed to be generated and delivered to the local socket. This is currently broken, and the abort message is actually sent out to the peer node with invalid addressing information. This will cause the link to enter a constant retransmission state and eventually reset. We fix this by removing the self-abort message creation and tear down connection immediately instead. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11ALSA: hda - Continue probing even if i915 binding failsTakashi Iwai1-1/+2
Currently snd-hda-intel driver aborts the probing of Intel HD-audio controller with i915 power well management when binding with i915 driver via hda_i915_init() fails. This is no big problem for Haswell and Broadwell where the HD-audio controllers are dedicated to HDMI/DP, thus i915 link is mandatory. However, Skylake, Baytrail and Braswell have only one controller and both HDMI/DP and analog codecs share the same bus. Thus, even if HDMI/DP isn't usable, we should keep the controller working for other codecs. For fixing this, this patch simply allows continuing the probing even if hda_i915_init() call fails. This may leave stale sound components for HDMI/DP devices that are unbound with graphics. We could abort the probing selectively, but from the code simplicity POV, it's better to continue in all cases. Reported-by: Libin Yang <libin.yang@intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-06-11Merge tag 'misc-for-linus-4.1-rc8' of ↵Linus Torvalds2-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull misc fixes from Guenter Roeck: "There are two patches here. One fixes a build error affecting the blackfin architecture, the other fixes a build error affecting the score architecture. The score maintainer (Lennox Wu) has a hard time sending you the score patch, and the blackfin maintainer (Steven Miao) has been silent since -rc1. Since 4.1 is about to be released, I figured it would be useful to get the patches upstream to avoid the related build failures in the final release" * tag 'misc-for-linus-4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: score: Fix exception handler label blackfin: Fix build error
2015-06-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds7-15/+20
Merge misc fixes from Andrew Morton: "The gcc-4.4.4 workaround has actually been merged into a KVM tree by Paolo but it is stuck in linux-next and mainline needs it" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: arch/x86/kvm/mmu.c: work around gcc-4.4.4 bug sched, numa: do not hint for NUMA balancing on VM_MIXEDMAP mappings zsmalloc: fix a null pointer dereference in destroy_handle_cache() mm: memcontrol: fix false-positive VM_BUG_ON() on -rt checkpatch: fix "GLOBAL_INITIALISERS" test zram: clear disk io accounting when reset zram device memcg: do not call reclaim if !__GFP_WAIT mm/memory_hotplug.c: set zone->wait_table to null after freeing it
2015-06-11arch/x86/kvm/mmu.c: work around gcc-4.4.4 bugAndrew Morton1-7/+7
Fix this compile issue with gcc-4.4.4: arch/x86/kvm/mmu.c: In function 'kvm_mmu_pte_write': arch/x86/kvm/mmu.c:4256: error: unknown field 'cr0_wp' specified in initializer arch/x86/kvm/mmu.c:4257: error: unknown field 'cr4_pae' specified in initializer arch/x86/kvm/mmu.c:4257: warning: excess elements in union initializer ... gcc-4.4.4 (at least) has issues when using anonymous unions in initializers. Fixes: edc90b7dc4ceef6 ("KVM: MMU: fix SMAP virtualization") Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-11sched, numa: do not hint for NUMA balancing on VM_MIXEDMAP mappingsMel Gorman1-1/+1
Jovi Zhangwei reported the following problem Below kernel vm bug can be triggered by tcpdump which mmaped a lot of pages with GFP_COMP flag. [Mon May 25 05:29:33 2015] page:ffffea0015414000 count:66 mapcount:1 mapping: (null) index:0x0 [Mon May 25 05:29:33 2015] flags: 0x20047580004000(head) [Mon May 25 05:29:33 2015] page dumped because: VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page)) [Mon May 25 05:29:33 2015] ------------[ cut here ]------------ [Mon May 25 05:29:33 2015] kernel BUG at mm/migrate.c:1661! [Mon May 25 05:29:33 2015] invalid opcode: 0000 [#1] SMP In this case it was triggered by running tcpdump but it's not necessary reproducible on all systems. sudo tcpdump -i bond0.100 'tcp port 4242' -c 100000000000 -w 4242.pcap Compound pages cannot be migrated and it was not expected that such pages be marked for NUMA balancing. This did not take into account that drivers such as net/packet/af_packet.c may insert compound pages into userspace with vm_insert_page. This patch tells the NUMA balancing protection scanner to skip all VM_MIXEDMAP mappings which avoids the possibility that compound pages are marked for migration. Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: Jovi Zhangwei <jovi@cloudflare.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-11zsmalloc: fix a null pointer dereference in destroy_handle_cache()Sergey Senozhatsky1-1/+2
If zs_create_pool()->create_handle_cache()->kmem_cache_create() or pool->name allocation fails, zs_create_pool()->destroy_handle_cache() will dereference the NULL pool->handle_cachep. Modify destroy_handle_cache() to avoid this. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-11mm: memcontrol: fix false-positive VM_BUG_ON() on -rtJohannes Weiner1-3/+1
On -rt, the VM_BUG_ON(!irqs_disabled()) triggers inside the memcg swapout path because the spin_lock_irq(&mapping->tree_lock) in the caller doesn't actually disable the hardware interrupts - which is fine, because on -rt the tophalves run in process context and so we are still safe from preemption while updating the statistics. Remove the VM_BUG_ON() but keep the comment of what we rely on. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Clark Williams <williams@redhat.com> Cc: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-11checkpatch: fix "GLOBAL_INITIALISERS" testJoe Perches1-2/+2
Commit d5e616fc1c1d ("checkpatch: add a few more --fix corrections") broke the GLOBAL_INITIALISERS test with bad parentheses and optional leading spaces. Fix it. Signed-off-by: Joe Perches <joe@perches.com> Reported-by: Bandan Das <bsd@makefile.in> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-11zram: clear disk io accounting when reset zram deviceWeijie Yang1-0/+2
Clear zram disk io accounting when resetting the zram device. Otherwise the residual io accounting stat will affect the diskstat in the next zram active cycle. Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-11memcg: do not call reclaim if !__GFP_WAITVladimir Davydov1-0/+2
When trimming memcg consumption excess (see memory.high), we call try_to_free_mem_cgroup_pages without checking if we are allowed to sleep in the current context, which can result in a deadlock. Fix this. Fixes: 241994ed8649 ("mm: memcontrol: default hierarchy interface for memory") Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-11mm/memory_hotplug.c: set zone->wait_table to null after freeing itGu Zheng1-1/+3
Izumi found the following oops when hot re-adding a node: BUG: unable to handle kernel paging request at ffffc90008963690 IP: __wake_up_bit+0x20/0x70 Oops: 0000 [#1] SMP CPU: 68 PID: 1237 Comm: rs:main Q:Reg Not tainted 4.1.0-rc5 #80 Hardware name: FUJITSU PRIMEQUEST2800E/SB, BIOS PRIMEQUEST 2000 Series BIOS Version 1.87 04/28/2015 task: ffff880838df8000 ti: ffff880017b94000 task.ti: ffff880017b94000 RIP: 0010:[<ffffffff810dff80>] [<ffffffff810dff80>] __wake_up_bit+0x20/0x70 RSP: 0018:ffff880017b97be8 EFLAGS: 00010246 RAX: ffffc90008963690 RBX: 00000000003c0000 RCX: 000000000000a4c9 RDX: 0000000000000000 RSI: ffffea101bffd500 RDI: ffffc90008963648 RBP: ffff880017b97c08 R08: 0000000002000020 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a0797c73800 R13: ffffea101bffd500 R14: 0000000000000001 R15: 00000000003c0000 FS: 00007fcc7ffff700(0000) GS:ffff880874800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc90008963690 CR3: 0000000836761000 CR4: 00000000001407e0 Call Trace: unlock_page+0x6d/0x70 generic_write_end+0x53/0xb0 xfs_vm_write_end+0x29/0x80 [xfs] generic_perform_write+0x10a/0x1e0 xfs_file_buffered_aio_write+0x14d/0x3e0 [xfs] xfs_file_write_iter+0x79/0x120 [xfs] __vfs_write+0xd4/0x110 vfs_write+0xac/0x1c0 SyS_write+0x58/0xd0 system_call_fastpath+0x12/0x76 Code: 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 48 8d 47 48 <48> 39 47 48 48 c7 45 e8 00 00 00 00 48 c7 45 f0 00 00 00 00 48 RIP [<ffffffff810dff80>] __wake_up_bit+0x20/0x70 RSP <ffff880017b97be8> CR2: ffffc90008963690 Reproduce method (re-add a node):: Hot-add nodeA --> remove nodeA --> hot-add nodeA (panic) This seems an use-after-free problem, and the root cause is zone->wait_table was not set to *NULL* after free it in try_offline_node. When hot re-add a node, we will reuse the pgdat of it, so does the zone struct, and when add pages to the target zone, it will init the zone first (including the wait_table) if the zone is not initialized. The judgement of zone initialized is based on zone->wait_table: static inline bool zone_is_initialized(struct zone *zone) { return !!zone->wait_table; } so if we do not set the zone->wait_table to *NULL* after free it, the memory hotplug routine will skip the init of new zone when hot re-add the node, and the wait_table still points to the freed memory, then we will access the invalid address when trying to wake up the waiting people after the i/o operation with the page is done, such as mentioned above. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reported-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Reviewed by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-11Revert "ipv6: Fix protocol resubmission"David S. Miller1-5/+3
This reverts commit 0243508edd317ff1fa63b495643a7c192fbfcd92. It introduces new regressions. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-10score: Fix exception handler labelGuenter Roeck1-1/+1
The latest version of modinfo fails to compile score architecture targets with the following error. FATAL: The relocation at __ex_table+0x634 references section "__ex_table" which is not executable, IOW the kernel will fault if it ever tries to jump to it. Something is seriously wrong and should be fixed. The probem is caused by a bad label in an __ex_table entry. Acked-by: Lennox Wu <lennox.wu@gmail.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2015-06-10blackfin: Fix build errorGuenter Roeck1-0/+1
Fix include/asm-generic/io.h: In function 'readb': include/asm-generic/io.h:113:2: error: implicit declaration of function 'bfin_read8' include/asm-generic/io.h: In function 'readw': include/asm-generic/io.h:121:2: error: implicit declaration of function 'bfin_read16' include/asm-generic/io.h: In function 'readl': include/asm-generic/io.h:129:2: error: implicit declaration of function 'bfin_read32' include/asm-generic/io.h: In function 'writeb': include/asm-generic/io.h:147:2: error: implicit declaration of function 'bfin_write8' include/asm-generic/io.h: In function 'writew': include/asm-generic/io.h:155:2: error: implicit declaration of function 'bfin_write16' include/asm-generic/io.h: In function 'writel': include/asm-generic/io.h:163:2: error: implicit declaration of function 'bfin_write32' Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: 1a3372bc522ef ("blackfin: io: define __raw_readx/writex with bfin_readx/writex") Cc: Steven Miao <realmz6@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2015-06-10perf record: Amend option summariesPeter Zijlstra2-7/+10
Because there's too many options and I cannot read, I frequently get confused between -c and -P, and try to do things like: perf record -P 50000 -- foo Which does not work; try and make the option description slightly longer and hopefully less confusing. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150610144850.GP19282@twins.programming.kicks-ass.net [ Do those changes on the man page as well ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-10perf tools: Avoid possible race condition in copyfile()Milos Vyletel1-15/+31
Use unique temporary files when copying to buildid dir to prevent races in case multiple instances are trying to copy same file. This is done by - creating template in form <path>/.<filename>.XXXXXX where the suffix is used by mkstemp() to create unique file - change file mode - copy content - if successful link temp file to target file - unlink temp file At this point the only file left at target path should be the desired one either created by us or other instance if we raced. This should also prevent not yet fully copied files to be visible to to other perf instances that could try to parse them. On top of that slow_copyfile no longer needs to deal with file mode when creating file since temporary file is already created and mode is set. Succesfully tested by myself by running perf record, archive and reading the data on other system and by running perf buildid-cache on perf binary itself. I also did revert fix from 0635b0f that to exposes previously fixed race with EEXIST and recreator test passed sucessfully. Signed-off-by: Milos Vyletel <milos@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1433775018-19868-1-git-send-email-milos@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-10clocksource: Use current logging styleJoe Perches1-12/+12
clocksource messages aren't prefixed in dmesg so it's a bit unclear what subsystem emits the messages. Use pr_fmt and pr_<level> to auto-prefix the messages appropriately. Miscellanea: o Remove "Warning" from KERN_WARNING level messages o Align "timekeeping watchdog: " messages o Coalesce formats o Align multiline arguments Signed-off-by: Joe Perches <joe@perches.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1432579795.2846.75.camel@perches.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-10time: Allow gcc to fold usecs_to_jiffies(constant)Nicholas Mc Guire1-1/+29
To allow constant folding in usecs_to_jiffies() conditionally calls the HZ dependent _usecs_to_jiffies() helpers or, when gcc can not figure out constant folding, __usecs_to_jiffies, which is the renamed original usecs_to_jiffies() function. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Cc: Masahiro Yamada <yamada.m@jp.panasonic.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Joe Perches <joe@perches.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Andrew Hunter <ahh@google.com> Cc: Paul Turner <pjt@google.com> Cc: Michal Marek <mmarek@suse.cz> Link: http://lkml.kernel.org/r/1432832996-12129-2-git-send-email-hofrat@osadl.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-10time: Refactor usecs_to_jiffiesNicholas Mc Guire2-11/+29
Refactor the usecs_to_jiffies conditional code part in time.c and jiffies.h putting it into conditional functions rather than #ifdefs to improve readability. This is analogous to the msecs_to_jiffies() cleanup in commit ca42aaf0c861 ("time: Refactor msecs_to_jiffies") Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Cc: Masahiro Yamada <yamada.m@jp.panasonic.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Joe Perches <joe@perches.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Andrew Hunter <ahh@google.com> Cc: Paul Turner <pjt@google.com> Cc: Michal Marek <mmarek@suse.cz> Link: http://lkml.kernel.org/r/1432832996-12129-1-git-send-email-hofrat@osadl.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-10MIPS: MSA: bugfix - disable MSA correctly for new threads/processes.Ralf Baechle1-1/+1
Due to the slightly odd way that new threads and processes start execution when scheduled for the very first time they were bypassing the required disable_msa call. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-06-10MIPS: Loongson: Do not register 8250 platform device from module.Ralf Baechle1-3/+1
If CONFIG_SERIAL_8250 is set to m, the Loongson seria.ko module might get unloaded while the serial driver modules are still loaded resulting in stale references to the destroyed platform_device instance. Anyway, platform devices should always be registered indicated what devices are present, _not_ what drivers have been configured. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Patchwork: https://patchwork.linux-mips.org/patch/10538/
2015-06-10MIPS: Cobalt: Do not build MTD platform device registration code as module.Ralf Baechle1-2/+1
If CONFIG_MTD_PHYSMAP is set to m, the Cobalt mtd.ko module might get unloaded while the drivers/mtd modules are still loaded resulting in stale references to the destroyed platform_device instance. Anyway, platform devices should always be registered indicated what devices are present, _not_ what drivers have been configured. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-06-10ALSA: hda - Don't actually write registers for caps overwritesTakashi Iwai1-0/+3
Along with the transition to regmap for managing the cached parameter reads, the caps overwrite was also moved to regmap cache. The cache change itself works, but it still tries to write the non-existing verb (the HDA parameter is read-only) wrongly. It's harmless in most cases, but some chips are picky and may result in the codec communication stall. This patch avoids it just by adding the missing flag check in reg_write ops. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-06-10x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode codeAndy Lutomirski1-8/+31
The error_entry/error_exit code to handle gsbase and whether we return to user mdoe was a mess: - error_sti was misnamed. In particular, it did not enable interrupts. - Error handling for gs_change was hopelessly tangled the normal usermode path. Separate it out. This saves a branch in normal entries from kernel mode. - The comments were bad. Fix it up. As a nice side effect, there's now a code path that happens on error entries from user mode. We'll use it soon. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f1be898ab93360169fb845ab85185948832209ee.1433878454.git.luto@kernel.org [ Prettified it, clarified comments some more. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-10x86/asm/entry/32: Shorten __audit_syscall_entry() args preparationDenys Vlasenko1-9/+18
We use three MOVs to swap edx and ecx. We can use one XCHG instead. Expand the comments. It's difficult to keep track which arg# every register corresponds to, so spell it out. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1433876051-26604-3-git-send-email-dvlasenk@redhat.com [ Expanded the comments some more. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-10x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry()Denys Vlasenko1-8/+12
Here it is not obvious why we load pt_regs->cx to %esi etc. Lets improve comments. Explain that here we combine two things: first, we reload registers since some of them are clobbered by the C function we just called; and we also convert 32-bit syscall params to 64-bit C ABI, because we are going to jump back to syscall dispatch code. Move reloading of 6th argument into the macro instead of having it after each of two macro invocations. No actual code changes here. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1433876051-26604-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-10x86/asm/entry/32: Fix fallout from the R9 trick removal in the SYSCALL codeDenys Vlasenko1-1/+1
I put %ebp restoration code too late. Under strace, it is not reached and %ebp is not restored upon return to userspace. This is the fix. Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1433876051-26604-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-10Merge branch 'for-linus' of ↵Linus Torvalds1-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input layer fix from Dmitry Torokhov: "A small tweak for the Synaptics PS/2 touchpad driver" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: synaptics - add min/max quirk for Lenovo S540
2015-06-10blk-mq: free hctx->ctxs in queue's release handlerMing Lei1-2/+6
Now blk_cleanup_queue() can be called before calling del_gendisk()[1], inside which hctx->ctxs is touched from blk_mq_unregister_hctx(), but the variable has been freed by blk_cleanup_queue() at that time. So this patch moves freeing of hctx->ctxs into queue's release handler for fixing the oops reported by Stefan. [1], 6cd18e711dd8075 (block: destroy bdi before blockdev is unregistered) Reported-by: Stefan Seyfried <stefan.seyfried@googlemail.com> Cc: NeilBrown <neilb@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org (v4.0) Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-09cfg80211: wext: clear sinfo struct before calling driverJohannes Berg1-0/+2
Until recently, mac80211 overwrote all the statistics it could provide when getting called, but it now relies on the struct having been zeroed by the caller. This was always the case in nl80211, but wext used a static struct which could even cause values from one device leak to another. Using a static struct is OK (as even documented in a comment) since the whole usage of this function and its return value is always locked under RTNL. Not clearing the struct for calling the driver has always been wrong though, since drivers were free to only fill values they could report, so calling this for one device and then for another would always have leaked values from one to the other. Fix this by initializing the structure in question before the driver method call. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=99691 Cc: stable@vger.kernel.org Reported-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Reported-by: Alexander Kaltsas <alexkaltsas@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-09SSB: Fix handling of ssb_pmu_get_alp_clock()Hauke Mehrtens1-3/+3
Dan Carpenter reported missing brackets which resulted in reading a wrong crystalfreq value. I also noticed that the result of this function is ignored. Reported-By: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: Michael Buesch <m@bues.ch> Cc: davem@davemloft.net Cc: netdev@vger.kernel.org Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/10536/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-06-09iommu/vt-d: Change PASID support to bit 40 of Extended Capability RegisterDavid Woodhouse1-1/+2
The existing hardware implementations with PASID support advertised in bit 28? Forget them. They do not exist. Bit 28 means nothing. When we have something that works, it'll use bit 40. Do not attempt to infer anything meaningful from bit 28. This will be reflected in an updated VT-d spec in the extremely near future. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-06-09x86/mpx: Allow 32-bit binaries on 64-bit kernels againDave Hansen1-6/+0
Now that the bugs in mixed mode MPX handling are fixed, re-allow 32-bit binaries on 64-bit kernels again. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183706.70277DAD@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Do not count MPX VMAs as neighbors when unmappingDave Hansen1-5/+19
The comment pretty much says it all. I wrote a test program that does lots of random allocations and forces bounds tables to be created. It came up with a layout like this: .... | BOUNDS DIRECTORY ENTRY COVERS | .... | BOUNDS TABLE COVERS | | BOUNDS TABLE | REAL ALLOC | BOUNDS TABLE | Unmapping "REAL ALLOC" should have been able to free the bounds table "covering" the "REAL ALLOC" because it was the last real user. But, the neighboring VMA bounds tables were found, considered as real neighbors, and we declined to free the bounds table covering the area. Doing this over and over left a small but significant number of these orphans. Handling them is fairly straighforward. All we have to do is walk the VMAs and skip all of the MPX ones when looking for neighbors. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183706.A6BD90BF@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Rewrite the unmap codeDave Hansen1-243/+168
The MPX code needs to clear out bounds tables for memory which is no longer in use. We do this when a userspace mapping is torn down (unmapped). There are two modes: 1. An entire bounds table becomes unused, and can be freed and its pointer removed from the bounds directory. This happens either when a large mapping is torn down, or when a small mapping is torn down and it is the last mapping "covered" by a bounds table. 2. Only part of a bounds table becomes unused, in which case we free the backing memory as if MADV_DONTNEED was called. The old code was a spaghetti mess of "edge" bounds tables where the edges were handled specially, even if we were unmapping an entire one. Non-edge bounds tables are always fully unmapped, but share a different code path from the edge ones. The old code had a bug where it was unmapping too much memory. I worked on fixing it for two days and gave up. I didn't write the original code. I didn't particularly like it, but it worked, so I left it. After my debug session, I realized it was undebuggagle *and* buggy, so out it went. I also wrote a new unmapping test program which uncovers bugs pretty nicely. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183706.DCAEC67D@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Support 32-bit binaries on 64-bit kernelsDave Hansen2-53/+179
Right now, the kernel can only switch between 64-bit and 32-bit binaries at compile time. This patch adds support for 32-bit binaries on 64-bit kernels when we support ia32 emulation. We essentially choose which set of table sizes to use when doing arithmetic for the bounds table calculations. This also uses a different approach for calculating the table indexes than before. I think the new one makes it much more clear what is going on, and allows us to share more code between the 32-bit and 64-bit cases. Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183705.E01F21E2@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Use 32-bit-only cmpxchg() for 32-bit appsDave Hansen1-5/+36
user_atomic_cmpxchg_inatomic() actually looks at sizeof(*ptr) to figure out how many bytes to copy. If we run it on a 64-bit kernel with a 64-bit pointer, it will copy a 64-bit bounds directory entry. That's fine, except when we have 32-bit programs with 32-bit bounds directory entries and we only *want* 32-bits. This patch breaks the cmpxchg() operation out in to its own function and performs the 32-bit type swizzling in there. Note, the "64-bit" version of this code _would_ work on a 32-bit-only kernel. The issue this patch addresses is only for when the kernel's 'long' is mismatched from the size of the bounds directory entry of the process we are working on. The new helper modifies 'actual_old_val' or returns an error. But gcc doesn't know this, so it warns about 'actual_old_val' being unused. Shut it up with an uninitialized_var(). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183705.672B115E@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Introduce new 'directory entry' to 'addr' helper functionDave Hansen2-8/+34
Currently, to get from a bounds directory entry to the virtual address of a bounds table, we simply mask off a few low bits. However, the set of bits we mask off is different for 32-bit and 64-bit binaries. This breaks the operation out in to a helper function and also adds a temporary variable to store the result until we are sure we are returning one. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.007686CE@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Add temporary variable to reduce maskingDave Hansen1-3/+4
When we allocate a bounds table, we call mmap(), then add a "valid" bit to the value before storing it in to the bounds directory. If we fail along the way, we go and mask that valid bit _back_ out. That seems a little silly, and this makes it much more clear when we have a plain address versus an actual table _entry_. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.3D69D5F4@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86: Make is_64bit_mm() widely availableDave Hansen2-9/+14
The uprobes code has a nice helper, is_64bit_mm(), that consults both the runtime and compile-time flags for 32-bit support. Instead of reinventing the wheel, pull it in to an x86 header so we can use it for MPX. I prefer passing the 'mm' around to test_thread_flag(TIF_IA32) because it makes it explicit where the context is coming from. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.F0209999@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Trace allocation of new bounds tablesDave Hansen2-0/+17
Bounds tables are a significant consumer of memory. It is important to know when they are being allocated. Add a trace point to trace whenever an allocation occurs and also its virtual address. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.EC23A93E@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Trace the attempts to find bounds tablesDave Hansen2-0/+34
There are two different events being traced here. They are doing similar things so share a trace "EVENT_CLASS" and are presented together. 1. Trace when MPX is zapping pages "mpx_unmap_zap": When MPX can not free an entire bounds table, it will instead try to zap unused parts of a bounds table to free the backing memory. This decreases RSS (resident set size) without decreasing the virtual space allocated for bounds tables. 2. Trace attempts to find bounds tables "mpx_unmap_search": This event traces any time we go looking to unmap a bounds table for a given virtual address range. This is useful to ensure that the kernel actually "tried" to free a bounds table versus times it succeeded in finding one. It might try and fail if it realized that a table was shared with an adjacent VMA which is not being unmapped. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183703.B9D2468B@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Trace entry to bounds exception pathsDave Hansen2-0/+35
There are two basic things that can happen as the result of a bounds exception (#BR): 1. We allocate a new bounds table 2. We pass up a bounds exception to userspace. This patch adds a trace point for the case where we are passing the exception up to userspace with a signal. We are also explicit that we're printing out the inverse of the 'upper' that we encounter. If you want to filter, for instance, you need to ~ the value first. The reason we do this is because of how 'upper' is stored in the bounds table. If a pointer's range is: 0x1000 -> 0x2000 it is stored in the bounds table as (32-bits here for brevity): lower: 0x00001000 upper: 0xffffdfff That is so that an all 0's entry: lower: 0x00000000 upper: 0x00000000 corresponds to the "init" bounds which store a *range* of: 0x00000000 -> 0xffffffff That is, by far, the common case, and that lets us use the zero page, or deduplicate the memory, etc... The 'upper' stored in the table is gibberish to print by itself, so we print ~upper to get the *actual*, logical, human-readable value printed out. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183703.027BB9B0@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Trace #BR exceptionsDave Hansen3-0/+55
This is the first in a series of MPX tracing patches. I've found these extremely useful in the process of debugging applications and the kernel code itself. This exception hooks in to the bounds (#BR) exception very early and allows capturing the key registers which would influence how the exception is handled. Note that bndcfgu/bndstatus are technically still 64-bit registers even in 32-bit mode. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183703.5FE2619A@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>