summaryrefslogtreecommitdiff
path: root/drivers/base
AgeCommit message (Collapse)AuthorFilesLines
2020-01-29mm/memory_hotplug: fix try_offline_node()David Hildenbrand1-0/+36
commit 2c91f8fc6c999fe10185d8ad99fda1759f662f70 upstream. -- snip -- Only contextual issues: - Unrelated check_and_unmap_cpu_on_node() changes are missing. - Unrelated walk_memory_blocks() has not been moved/refactored yet. -- snip -- try_offline_node() is pretty much broken right now: - The node span is updated when onlining memory, not when adding it. We ignore memory that was mever onlined. Bad. - We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily trigger a kernel panic. Bad for memory that is offline but also bad for subsection hotadd with ZONE_DEVICE, whereby the memmap of the first PFN of a section might contain garbage. - Sections belonging to mixed nodes are not properly considered. As memory blocks might belong to multiple nodes, we would have to walk all pageblocks (or at least subsections) within present sections. However, we don't have a way to identify whether a memmap that is not online was initialized (relevant for ZONE_DEVICE). This makes things more complicated. Luckily, we can piggy pack on the node span and the nid stored in memory blocks. Currently, the node span is grown when calling move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when removing memory, before calling try_offline_node(). Sysfs links are created via link_mem_sections(), e.g., during boot or when adding memory. If the node still spans memory or if any memory block belongs to the nid, we don't set the node offline. As memory blocks that span multiple nodes cannot get offlined, the nid stored in memory blocks is reliable enough (for such online memory blocks, the node still spans the memory). Introduce for_each_memory_block() to efficiently walk all memory blocks. Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span when removing ZONE_DEVICE memory to fix similar issues (access of garbage memmaps) - until we have a reliable way to identify whether these memmaps were properly initialized. This implies later, that once a node had ZONE_DEVICE memory, we won't be able to set a node offline - which should be acceptable. Since commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") memory that is added is not assoziated with a zone/node (memmap not initialized). The introducing commit 60a5a19e7419 ("memory-hotplug: remove sysfs file of node") already missed that we could have multiple nodes for a section and that the zone/node span is updated when onlining pages, not when adding them. I tested this by hotplugging two DIMMs to a memory-less and cpu-less NUMA node. The node is properly onlined when adding the DIMMs. When removing the DIMMs, the node is properly offlined. Masayoshi Mizuma reported: : Without this patch, memory hotplug fails as panic: : : BUG: kernel NULL pointer dereference, address: 0000000000000000 : ... : Call Trace: : remove_memory_block_devices+0x81/0xc0 : try_remove_memory+0xb4/0x130 : __remove_memory+0xa/0x20 : acpi_memory_device_remove+0x84/0x100 : acpi_bus_trim+0x57/0x90 : acpi_bus_trim+0x2e/0x90 : acpi_device_hotplug+0x2b2/0x4d0 : acpi_hotplug_work_fn+0x1a/0x30 : process_one_work+0x171/0x380 : worker_thread+0x49/0x3f0 : kthread+0xf8/0x130 : ret_from_fork+0x35/0x40 [david@redhat.com: v3] Link: http://lkml.kernel.org/r/20191102120221.7553-1-david@redhat.com Link: http://lkml.kernel.org/r/20191028105458.28320-1-david@redhat.com Fixes: 60a5a19e7419 ("memory-hotplug: remove sysfs file of node") Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visiable after d0dc12e86b319 Signed-off-by: David Hildenbrand <david@redhat.com> Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Keith Busch <keith.busch@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Nayna Jain <nayna@linux.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29drivers/base/node.c: simplify unregister_memory_block_under_nodes()David Hildenbrand2-24/+16
commit d84f2f5a755208da3f93e17714631485cb3da11c upstream. We don't allow to offline memory block devices that belong to multiple numa nodes. Therefore, such devices can never get removed. It is sufficient to process a single node when removing the memory block. No need to iterate over each and every PFN. We already have the nid stored for each memory block. Make sure that the nid always has a sane value. Please note that checking for node_online(nid) is not required. If we would have a memory block belonging to a node that is no longer offline, then we would have a BUG in the node offlining code. Link: http://lkml.kernel.org/r/20190719135244.15242-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29mm/memory_hotplug: make unregister_memory_block_under_nodes() never failDavid Hildenbrand1-13/+5
commit a31b264c2b415b29660da0bc2ba291a98629ce51 upstream. We really don't want anything during memory hotunplug to fail. We always pass a valid memory block device, that check can go. Avoid allocating memory and eventually failing. As we are always called under lock, we can use a static piece of memory. This avoids having to put the structure onto the stack, having to guess about the stack size of callers. Patch inspired by a patch from Oscar Salvador. In the future, there might be no need to iterate over nodes at all. mem->nid should tell us exactly what to remove. Memory block devices with mixed nodes (added during boot) should properly fenced off and never removed. Link: http://lkml.kernel.org/r/20190527111152.16324-11-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Mark Brown <broonie@kernel.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arun KS <arunks@codeaurora.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Mathieu Malaterre <malat@debian.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qian Cai <cai@lca.pw> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29mm/memory_hotplug: remove memory block devices before arch_remove_memory()David Hildenbrand2-24/+24
commit 4c4b7f9ba9486c565aead99a198ceeef73ae81f6 upstream. Let's factor out removing of memory block devices, which is only necessary for memory added via add_memory() and friends that created memory block devices. Remove the devices before calling arch_remove_memory(). This finishes factoring out memory block device handling from arch_add_memory() and arch_remove_memory(). Link: http://lkml.kernel.org/r/20190527111152.16324-10-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Mark Brown <broonie@kernel.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Oscar Salvador <osalvador@suse.de> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Arun KS <arunks@codeaurora.org> Cc: Mathieu Malaterre <malat@debian.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qian Cai <cai@lca.pw> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29mm/memory_hotplug: create memory block devices after arch_add_memory()David Hildenbrand1-28/+54
commit db051a0dac13db24d58470d75cee0ce7c6b031a1 upstream. Only memory to be added to the buddy and to be onlined/offlined by user space using /sys/devices/system/memory/... needs (and should have!) memory block devices. Factor out creation of memory block devices. Create all devices after arch_add_memory() succeeded. We can later drop the want_memblock parameter, because it is now effectively stale. Only after memory block devices have been added, memory can be onlined by user space. This implies, that memory is not visible to user space at all before arch_add_memory() succeeded. While at it - use WARN_ON_ONCE instead of BUG_ON in moved unregister_memory() - introduce find_memory_block_by_id() to search via block id - Use find_memory_block_by_id() in init_memory_block() to catch duplicates Link: http://lkml.kernel.org/r/20190527111152.16324-8-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Qian Cai <cai@lca.pw> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Arun KS <arunks@codeaurora.org> Cc: Mathieu Malaterre <malat@debian.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Mark Brown <broonie@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29drivers/base/memory: pass a block_id to init_memory_block()David Hildenbrand1-16/+11
commit 1811582587c43bdf13d690d83345610d4df433bb upstream. We'll rework hotplug_memory_register() shortly, so it no longer consumes pass a section. [cai@lca.pw: fix a compilation warning] Link: http://lkml.kernel.org/r/1559320186-28337-1-git-send-email-cai@lca.pw Link: http://lkml.kernel.org/r/20190527111152.16324-6-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arun KS <arunks@codeaurora.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Mark Brown <broonie@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Mathieu Malaterre <malat@debian.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVEDavid Hildenbrand1-2/+0
commit 80ec922dbd87fd38d15719c86a94457204648aeb upstream. -- snip -- Missing arm64 memory hot(un)plug support. -- snip -- We want to improve error handling while adding memory by allowing to use arch_remove_memory() and __remove_pages() even if CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like: arch_add_memory() rc = do_something(); if (rc) { arch_remove_memory(); } We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require quite some dependencies for memory offlining. Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Mark Brown <broonie@kernel.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Rob Herring <robh@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Arun KS <arunks@codeaurora.org> Cc: Qian Cai <cai@lca.pw> Cc: Mathieu Malaterre <malat@debian.org> Cc: Baoquan He <bhe@redhat.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29mm/memory_hotplug: make unregister_memory_section() never failDavid Hildenbrand1-11/+5
commit cb7b3a3685b20d3b5900ff24b2cb96d002960189 upstream. Failing while removing memory is mostly ignored and cannot really be handled. Let's treat errors in unregister_memory_section() in a nice way, warning, but continuing. Link: http://lkml.kernel.org/r/20190409100148.24703-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Mike Travis <mike.travis@hpe.com> Cc: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Qian Cai <cai@lca.pw> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Arun KS <arunks@codeaurora.org> Cc: Mathieu Malaterre <malat@debian.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh@kernel.org> Cc: Stefan Agner <stefan@agner.ch> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29mm, memory_hotplug: update a comment in unregister_memory()Dan Carpenter1-1/+1
commit 16df1456aa858a86f398dbc7d27649eb6662b0cc upstream. The remove_memory_block() function was renamed to in commit cc292b0b4302 ("drivers/base/memory.c: rename remove_memory_block() to remove_memory_section()"). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29drivers/base/memory.c: clean up relics in function parametersBaoquan He1-6/+6
commit 063b8a4cee8088224bcdb79bcd08db98df16178e upstream. The input parameter 'phys_index' of memory_block_action() is actually the section number, but not the phys_index of memory_block. This is a relic from the past when one memory block could only contain one section. Rename it to start_section_nr. And also in remove_memory_section(), the 'node_id' and 'phys_device' arguments are not used by anyone. Remove them. Link: http://lkml.kernel.org/r/20190329144250.14315-2-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29drivers/base/memory.c: remove an unnecessary check on NR_MEM_SECTIONSWei Yang1-1/+1
commit 3b6fd6ffb27c2efa003c6d4d15ca72c054b71d7c upstream. In cb5e39b8038b ("drivers: base: refactor add_memory_section() to add_memory_block()"), add_memory_block() is introduced, which is only invoked in memory_dev_init(). When combining these two loops in memory_dev_init() and add_memory_block(), they looks like this: for (i = 0; i < NR_MEM_SECTIONS; i += sections_per_block) for (j = i; (j < i + sections_per_block) && j < NR_MEM_SECTIONS; j++) Since it is sure the (i < NR_MEM_SECTIONS) and j sits in its own memory block, the check of (j < NR_MEM_SECTIONS) is not necessary. This patch just removes this check. Link: http://lkml.kernel.org/r/20181123222811.18216-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Seth Jennings <sjenning@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-27PM: sleep: Fix possible overflow in pm_system_cancel_wakeup()Rafael J. Wysocki1-1/+1
[ Upstream commit 2933954b71f10d392764f95eec0f0aa2d103054b ] It is not actually guaranteed that pm_abort_suspend will be nonzero when pm_system_cancel_wakeup() is called which may lead to subtle issues, so make it use atomic_dec_if_positive() instead of atomic_dec() for the safety sake. Fixes: 33e4f80ee69b ("ACPI / PM: Ignore spurious SCI wakeups from suspend-to-idle") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27driver core: Fix PM-runtime for links added during consumer probeRafael J. Wysocki2-25/+8
[ Upstream commit 36003d4cf57ca431fb3f94d317bcca426a2394d6 ] Commit 4c06c4e6cf63 ("driver core: Fix possible supplier PM-usage counter imbalance") introduced a regression that causes suppliers to be suspended prematurely for device links added during consumer driver probe if the initial PM-runtime status of the consumer is "suspended" and the consumer is resumed after adding the link and before pm_runtime_put_suppliers() is called. In that case, pm_runtime_put_suppliers() will drop the rpm_active refcount for the link by one and (since rpm_active is equal to two after the preceding consumer resume) the supplier's PM-runtime usage counter will be decremented, which may cause the supplier to suspend even though the consumer's PM-runtime status is "active". For this reason, partially revert commit 4c06c4e6cf63 as the problem it tried to fix needs to be addressed somewhat differently, and change pm_runtime_get_suppliers() and pm_runtime_put_suppliers() so that the latter only drops rpm_active references acquired by the former. [This requires adding a new field to struct device_link, but I coulnd't find a cleaner way to address the issue that would work in all cases.] This causes pm_runtime_put_suppliers() to effectively ignore device links added during consumer probe, so device_link_add() doesn't need to worry about ensuring that suppliers will remain active after pm_runtime_put_suppliers() for links created with DL_FLAG_RPM_ACTIVE set and it only needs to bump up rpm_active by one for those links, so pm_runtime_active_link() is not necessary any more. Fixes: 4c06c4e6cf63 ("driver core: Fix possible supplier PM-usage counter imbalance") Reported-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27driver core: Fix possible supplier PM-usage counter imbalanceRafael J. Wysocki2-19/+29
[ Upstream commit 4c06c4e6cf63d7f3d5dfe62593a073253d750a59 ] If a stateless device link to a certain supplier with DL_FLAG_PM_RUNTIME set in the flags is added and then removed by the consumer driver's probe callback, the supplier's PM-runtime usage counter will be nonzero after that which effectively causes the supplier to remain "always on" going forward. Namely, device_link_add() called to add the link invokes device_link_rpm_prepare() which notices that the consumer driver is probing, so it increments the supplier's PM-runtime usage counter with the assumption that the link will stay around until pm_runtime_put_suppliers() is called by driver_probe_device(), but if the link goes away before that point, the supplier's PM-runtime usage counter will remain nonzero. To prevent that from happening, first rework pm_runtime_get_suppliers() and pm_runtime_put_suppliers() to use the rpm_active refounts of device links and make the latter only drop rpm_active and the supplier's PM-runtime usage counter for each link by one, unless rpm_active is one already for it. Next, modify device_link_add() to bump up the new link's rpm_active refcount and the suppliers PM-runtime usage counter by two, to prevent pm_runtime_put_suppliers(), if it is called subsequently, from suspending the supplier prematurely (in case its PM-runtime usage counter goes down to 0 in there). Due to the way rpm_put_suppliers() works, this change does not affect runtime suspend of the consumer ends of new device links (or, generally, device links for which DL_FLAG_PM_RUNTIME has just been set). Fixes: e2f3cd831a28 ("driver core: Fix handling of runtime PM flags in device_link_add()") Reported-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27driver core: Do not call rpm_put_suppliers() in pm_runtime_drop_link()Rafael J. Wysocki2-2/+3
[ Upstream commit a1fdbfbb1da2063ba98a12eb6f1bdd07451c7145 ] Calling rpm_put_suppliers() from pm_runtime_drop_link() is excessive as it affects all suppliers of the consumer device and not just the one pointed to by the device link being dropped. Worst case it may cause the consumer device to stop working unexpectedly. Moreover, in principle it is racy with respect to runtime PM of the consumer device. To avoid these problems drop runtime PM references on the particular supplier pointed to by the link in question only and do that after the link has been dropped from the consumer device's list of links to suppliers, which is in device_link_free(). Fixes: a0504aecba76 ("PM / runtime: Drop usage count for suppliers at device link removal") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27driver core: Fix handling of runtime PM flags in device_link_add()Rafael J. Wysocki2-30/+41
[ Upstream commit e2f3cd831a280fc226118d9369bf3f77aab58c56 ] After commit ead18c23c263 ("driver core: Introduce device links reference counting"), if there is a link between the given supplier and the given consumer already, device_link_add() will refcount it and return it unconditionally without updating its flags. It is possible, however, that the second (or any subsequent) caller of device_link_add() for the same consumer-supplier pair will pass DL_FLAG_PM_RUNTIME, possibly along with DL_FLAG_RPM_ACTIVE, in flags to it and the existing link may not behave as expected then. First, if DL_FLAG_PM_RUNTIME is not set in the existing link's flags at all, it needs to be set like during the original initialization of the link. Second, if DL_FLAG_RPM_ACTIVE is passed to device_link_add() in flags (in addition to DL_FLAG_PM_RUNTIME), the existing link should to be updated to reflect the "active" runtime PM configuration of the consumer-supplier pair and extra care must be taken here to avoid possible destructive races with runtime PM of the consumer. To that end, redefine the rpm_active field in struct device_link as a refcount, initialize it to 1 and make rpm_resume() (for the consumer) and device_link_add() increment it whenever they acquire a runtime PM reference on the supplier device. Accordingly, make rpm_suspend() (for the consumer) and pm_runtime_clean_up_links() decrement it and drop runtime PM references to the supplier device in a loop until rpm_active becones 1 again. Fixes: ead18c23c263 ("driver core: Introduce device links reference counting") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27driver core: Do not resume suppliers under device_links_write_lock()Rafael J. Wysocki1-6/+14
[ Upstream commit 5db25c9eb893df8f6b93c1d97b8006d768e1b6f5 ] It is incorrect to call pm_runtime_get_sync() under device_links_write_lock(), because it may end up trying to take device_links_read_lock() while resuming the target device and that will deadlock in the non-SRCU case, so avoid that by resuming the supplier device in device_link_add() before calling device_links_write_lock(). Fixes: 21d5c57b3726 ("PM / runtime: Use device links") Fixes: baa8809f6097 ("PM / runtime: Optimize the use of device links") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27driver core: Avoid careless re-use of existing device linksRafael J. Wysocki1-3/+20
[ Upstream commit f265df550a4350dce0a4d721a77c52e4b847ea40 ] After commit ead18c23c263 ("driver core: Introduce device links reference counting"), if there is a link between the given supplier and the given consumer already, device_link_add() will refcount it and return it unconditionally. However, if the flags passed to it on the second (or any subsequent) attempt to create a device link between the same consumer-supplier pair are not compatible with the existing link's flags, that is incorrect. First off, if the existing link is stateless and the next caller of device_link_add() for the same consumer-supplier pair wants a stateful one, or the other way around, the existing link cannot be returned, because it will not match the expected behavior, so make device_link_add() dump the stack and return NULL in that case. Moreover, if the DL_FLAG_AUTOREMOVE_CONSUMER flag is passed to device_link_add(), its caller will expect its reference to the link to be dropped automatically on consumer driver removal, which will not happen if that flag is not set in the link's flags (and analogously for DL_FLAG_AUTOREMOVE_SUPPLIER). For this reason, make device_link_add() update the existing link's flags accordingly before returning it to the caller. Fixes: ead18c23c263 ("driver core: Introduce device links reference counting") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27driver core: Fix DL_FLAG_AUTOREMOVE_SUPPLIER device link flag handlingRafael J. Wysocki1-8/+12
[ Upstream commit c8d50986da5d74ddfc233b13b91d0a13369fa164 ] Change the list walk in device_links_driver_cleanup() to a safe one to avoid use-after-free when dropping a link from the list during the walk. Also, while at it, fix device_link_add() to refuse to create stateless device links with DL_FLAG_AUTOREMOVE_SUPPLIER set, which is an invalid combination (setting that flag means that the driver core should manage the link, so it cannot be stateless), and extend the kerneldoc comment of device_link_add() to cover the DL_FLAG_AUTOREMOVE_SUPPLIER flag properly too. Fixes: 1689cac5b32a ("driver core: Add flag to autoremove device link on supplier unbind") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-05drivers/base/platform.c: kmemleak ignore a known leakQian Cai1-0/+3
[ Upstream commit 967d3010df8b6f6f9aa95c198edc5fe3646ebf36 ] unreferenced object 0xffff808ec6dc5a80 (size 128): comm "swapper/0", pid 1, jiffies 4294938063 (age 2560.530s) hex dump (first 32 bytes): ff ff ff ff 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b ........kkkkkkkk 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk backtrace: [<00000000476dcf8c>] kmem_cache_alloc_trace+0x430/0x500 [<000000004f708d37>] platform_device_register_full+0xbc/0x1e8 [<000000006c2a7ec7>] acpi_create_platform_device+0x370/0x450 [<00000000ef135642>] acpi_default_enumeration+0x34/0x78 [<000000003bd9a052>] acpi_bus_attach+0x2dc/0x3e0 [<000000003cf4f7f2>] acpi_bus_attach+0x108/0x3e0 [<000000003cf4f7f2>] acpi_bus_attach+0x108/0x3e0 [<000000002968643e>] acpi_bus_scan+0xb0/0x110 [<0000000010dd0bd7>] acpi_scan_init+0x1a8/0x410 [<00000000965b3c5a>] acpi_init+0x408/0x49c [<00000000ed4b9fe2>] do_one_initcall+0x178/0x7f4 [<00000000a5ac5a74>] kernel_init_freeable+0x9d4/0xa9c [<0000000070ea6c15>] kernel_init+0x18/0x138 [<00000000fb8fff06>] ret_from_fork+0x10/0x1c [<0000000041273a0d>] 0xffffffffffffffff Then, faddr2line pointed out this line, /* * This memory isn't freed when the device is put, * I don't have a nice idea for that though. Conceptually * dma_mask in struct device should not be a pointer. * See http://thread.gmane.org/gmane.linux.kernel.pci/9081 */ pdev->dev.dma_mask = kmalloc(sizeof(*pdev->dev.dma_mask), GFP_KERNEL); Since this leak has existed for more than 8 years and it does not reference other parts of the memory, let kmemleak ignore it, so users don't need to waste time reporting this in the future. Link: http://lkml.kernel.org/r/20181206160751.36211-1-cai@gmx.us Signed-off-by: Qian Cai <cai@gmx.us> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-01mm/memory_hotplug: Do not unlock when fails to take the device_hotplug_lockzhong jiang1-1/+1
[ Upstream commit d2ab99403ee00d8014e651728a4702ea1ae5e52c ] When adding the memory by probing memory block in sysfs interface, there is an obvious issue that we will unlock the device_hotplug_lock when fails to takes it. That issue was introduced in Commit 8df1d0e4a265 ("mm/memory_hotplug: make add_memory() take the device_hotplug_lock") We should drop out in time when fails to take the device_hotplug_lock. Fixes: 8df1d0e4a265 ("mm/memory_hotplug: make add_memory() take the device_hotplug_lock") Reported-by: Yang yingliang <yangyingliang@huawei.com> Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-01mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lockDavid Hildenbrand1-12/+1
[ Upstream commit 381eab4a6ee81266f8dddc62e57376c7e584e5b8 ] There seem to be some problems as result of 30467e0b3be ("mm, hotplug: fix concurrent memory hot-add deadlock"), which tried to fix a possible lock inversion reported and discussed in [1] due to the two locks a) device_lock() b) mem_hotplug_lock While add_memory() first takes b), followed by a) during bus_probe_device(), onlining of memory from user space first took a), followed by b), exposing a possible deadlock. In [1], and it was decided to not make use of device_hotplug_lock, but rather to enforce a locking order. The problems I spotted related to this: 1. Memory block device attributes: While .state first calls mem_hotplug_begin() and the calls device_online() - which takes device_lock() - .online does no longer call mem_hotplug_begin(), so effectively calls online_pages() without mem_hotplug_lock. 2. device_online() should be called under device_hotplug_lock, however onlining memory during add_memory() does not take care of that. In addition, I think there is also something wrong about the locking in 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages() without locks. This was introduced after 30467e0b3be. And skimming over the code, I assume it could need some more care in regards to locking (e.g. device_online() called without device_hotplug_lock. This will be addressed in the following patches. Now that we hold the device_hotplug_lock when - adding memory (e.g. via add_memory()/add_memory_resource()) - removing memory (e.g. via remove_memory()) - device_online()/device_offline() We can move mem_hotplug_lock usage back into online_pages()/offline_pages(). Why is mem_hotplug_lock still needed? Essentially to make get_online_mems()/put_online_mems() be very fast (relying on device_hotplug_lock would be very slow), and to serialize against addition of memory that does not create memory block devices (hmm). [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/ 2015-February/065324.html This patch is partly based on a patch by Vitaly Kuznetsov. Link: http://lkml.kernel.org/r/20180925091457.28651-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Rashmica Gupta <rashmica.g@gmail.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> Cc: Mathieu Malaterre <malat@debian.org> Cc: John Allen <jallen@linux.vnet.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-01mm/memory_hotplug: make add_memory() take the device_hotplug_lockDavid Hildenbrand1-2/+7
[ Upstream commit 8df1d0e4a265f25dc1e7e7624ccdbcb4a6630c89 ] add_memory() currently does not take the device_hotplug_lock, however is aleady called under the lock from arch/powerpc/platforms/pseries/hotplug-memory.c drivers/acpi/acpi_memhotplug.c to synchronize against CPU hot-remove and similar. In general, we should hold the device_hotplug_lock when adding memory to synchronize against online/offline request (e.g. from user space) - which already resulted in lock inversions due to device_lock() and mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory hot-add deadlock"). add_memory()/add_memory_resource() will create memory block devices, so this really feels like the right thing to do. Holding the device_hotplug_lock makes sure that a memory block device can really only be accessed (e.g. via .online/.state) from user space, once the memory has been fully added to the system. The lock is not held yet in drivers/xen/balloon.c arch/powerpc/platforms/powernv/memtrace.c drivers/s390/char/sclp_cmd.c drivers/hv/hv_balloon.c So, let's either use the locked variants or take the lock. Don't export add_memory_resource(), as it once was exported to be used by XEN, which is never built as a module. If somebody requires it, we also have to export a locked variant (as device_hotplug_lock is never exported). Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: John Allen <jallen@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mathieu Malaterre <malat@debian.org> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-01PM / Domains: Deal with multiple states but no governor in genpdUlf Hansson1-0/+6
[ Upstream commit 2c9b7f8772033cc8bafbd4eefe2ca605bf3eb094 ] A caller of pm_genpd_init() that provides some states for the genpd via the ->states pointer in the struct generic_pm_domain, should also provide a governor. This because it's the job of the governor to pick a state that satisfies the constraints. Therefore, let's print a warning to inform the user about such bogus configuration and avoid to bail out, by instead picking the shallowest state before genpd invokes the ->power_off() callback. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Lina Iyer <ilina@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20component: fix loop condition to call unbind() if bind() failsBanajit Goswami1-3/+3
[ Upstream commit bdae566d5d9733b6e32b378668b84eadf28a94d4 ] During component_bind_all(), if bind() fails for any particular component associated with a master, unbind() should be called for all previous components in that master's match array, whose bind() might have completed successfully. As per the current logic, if bind() fails for the component at position 'n' in the master's match array, it would start calling unbind() from component in 'n'th position itself and work backwards, and will always skip calling unbind() for component in 0th position in the master's match array. Fix this by updating the loop condition, and the logic to refer to the components in master's match array, so that unbind() is called for all components starting from 'n-1'st position in the array, until (and including) component in 0th position. Signed-off-by: Banajit Goswami <bgoswami@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-12x86/bugs: Add ITLB_MULTIHIT bug infrastructureVineela Tummalapalli1-0/+8
commit db4d30fbb71b47e4ecb11c4efa5d8aad4b03dfae upstream. Some processors may incur a machine check error possibly resulting in an unrecoverable CPU lockup when an instruction fetch encounters a TLB multi-hit in the instruction TLB. This can occur when the page size is changed along with either the physical address or cache type. The relevant erratum can be found here: https://bugzilla.kernel.org/show_bug.cgi?id=205195 There are other processors affected for which the erratum does not fully disclose the impact. This issue affects both bare-metal x86 page tables and EPT. It can be mitigated by either eliminating the use of large pages or by using careful TLB invalidations when changing the page size in the page tables. Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which are mitigated against this issue. Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com> Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-12x86/speculation/taa: Add sysfs reporting for TSX Async AbortPawan Gupta1-0/+9
commit 6608b45ac5ecb56f9e171252229c39580cc85f0f upstream. Add the sysfs reporting file for TSX Async Abort. It exposes the vulnerability and the mitigation state similar to the existing files for the other hardware vulnerabilities. Sysfs file path is: /sys/devices/system/cpu/vulnerabilities/tsx_async_abort Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Reviewed-by: Mark Gross <mgross@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29cpufreq: Avoid cpufreq_suspend() deadlock on system shutdownRafael J. Wysocki1-0/+3
commit 65650b35133ff20f0c9ef0abd5c3c66dbce3ae57 upstream. It is incorrect to set the cpufreq syscore shutdown callback pointer to cpufreq_suspend(), because that function cannot be run in the syscore stage of system shutdown for two reasons: (a) it may attempt to carry out actions depending on devices that have already been shut down at that point and (b) the RCU synchronization carried out by it may not be able to make progress then. The latter issue has been present since commit 45975c7d21a1 ("rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds"), but the former one has been there since commit 90de2a4aa9f3 ("cpufreq: suspend cpufreq governors on shutdown") regardless. Fix that by dropping cpufreq_syscore_ops altogether and making device_shutdown() call cpufreq_suspend() directly before shutting down devices, which is along the lines of what system-wide power management does. Fixes: 45975c7d21a1 ("rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds") Fixes: 90de2a4aa9f3 ("cpufreq: suspend cpufreq governors on shutdown") Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.0+ <stable@vger.kernel.org> # 4.0+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29drivers/base/memory.c: don't access uninitialized memmaps in ↵David Hildenbrand1-0/+3
soft_offline_page_store() commit 641fe2e9387a36f9ee01d7c69382d1fe147a5e98 upstream. Uninitialized memmaps contain garbage and in the worst case trigger kernel BUGs, especially with CONFIG_PAGE_POISONING. They should not get touched. Right now, when trying to soft-offline a PFN that resides on a memory block that was never onlined, one gets a misleading error with CONFIG_PAGE_POISONING: :/# echo 5637144576 > /sys/devices/system/memory/soft_offline_page [ 23.097167] soft offline: 0x150000 page already poisoned But the actual result depends on the garbage in the memmap. soft_offline_page() can only work with online pages, it returns -EIO in case of ZONE_DEVICE. Make sure to only forward pages that are online (iow, managed by the buddy) and, therefore, have an initialized memmap. Add a check against pfn_to_online_page() and similarly return -EIO. Link: http://lkml.kernel.org/r/20191010141200.8985-1-david@redhat.com Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: <stable@vger.kernel.org> [4.13+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-07soundwire: fix regmap dependencies and align with other serial linksPierre-Louis Bossart1-1/+1
[ Upstream commit 8676b3ca4673517650fd509d7fa586aff87b3c28 ] The existing code has a mixed select/depend usage which makes no sense. config SOUNDWIRE_BUS tristate select REGMAP_SOUNDWIRE config REGMAP_SOUNDWIRE tristate depends on SOUNDWIRE_BUS Let's remove one layer of Kconfig definitions and align with the solutions used by all other serial links. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20190718230215.18675-1-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05base: soc: Export soc_device_register/unregister APIsVinod Koul1-0/+2
[ Upstream commit f7ccc7a397cf2ef64aebb2f726970b93203858d2 ] Qcom Socinfo driver can be built as a module, so export these two APIs. Tested-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Vaishali Thakkar <vaishali.thakkar@linaro.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-19driver core: Fix use-after-free and double free on glue directoryMuchun Song1-1/+52
commit ac43432cb1f5c2950408534987e57c2071e24d8f upstream. There is a race condition between removing glue directory and adding a new device under the glue dir. It can be reproduced in following test: CPU1: CPU2: device_add() get_device_parent() class_dir_create_and_add() kobject_add_internal() create_dir() // create glue_dir device_add() get_device_parent() kobject_get() // get glue_dir device_del() cleanup_glue_dir() kobject_del(glue_dir) kobject_add() kobject_add_internal() create_dir() // in glue_dir sysfs_create_dir_ns() kernfs_create_dir_ns(sd) sysfs_remove_dir() // glue_dir->sd=NULL sysfs_put() // free glue_dir->sd // sd is freed kernfs_new_node(sd) kernfs_get(glue_dir) kernfs_add_one() kernfs_put() Before CPU1 remove last child device under glue dir, if CPU2 add a new device under glue dir, the glue_dir kobject reference count will be increase to 2 via kobject_get() in get_device_parent(). And CPU2 has been called kernfs_create_dir_ns(), but not call kernfs_new_node(). Meanwhile, CPU1 call sysfs_remove_dir() and sysfs_put(). This result in glue_dir->sd is freed and it's reference count will be 0. Then CPU2 call kernfs_get(glue_dir) will trigger a warning in kernfs_get() and increase it's reference count to 1. Because glue_dir->sd is freed by CPU1, the next call kernfs_add_one() by CPU2 will fail(This is also use-after-free) and call kernfs_put() to decrease reference count. Because the reference count is decremented to 0, it will also call kmem_cache_free() to free the glue_dir->sd again. This will result in double free. In order to avoid this happening, we also should make sure that kernfs_node for glue_dir is released in CPU1 only when refcount for glue_dir kobj is 1 to fix this race. The following calltrace is captured in kernel 4.14 with the following patch applied: commit 726e41097920 ("drivers: core: Remove glue dirs from sysfs earlier") -------------------------------------------------------------------------- [ 3.633703] WARNING: CPU: 4 PID: 513 at .../fs/kernfs/dir.c:494 Here is WARN_ON(!atomic_read(&kn->count) in kernfs_get(). .... [ 3.633986] Call trace: [ 3.633991] kernfs_create_dir_ns+0xa8/0xb0 [ 3.633994] sysfs_create_dir_ns+0x54/0xe8 [ 3.634001] kobject_add_internal+0x22c/0x3f0 [ 3.634005] kobject_add+0xe4/0x118 [ 3.634011] device_add+0x200/0x870 [ 3.634017] _request_firmware+0x958/0xc38 [ 3.634020] request_firmware_into_buf+0x4c/0x70 .... [ 3.634064] kernel BUG at .../mm/slub.c:294! Here is BUG_ON(object == fp) in set_freepointer(). .... [ 3.634346] Call trace: [ 3.634351] kmem_cache_free+0x504/0x6b8 [ 3.634355] kernfs_put+0x14c/0x1d8 [ 3.634359] kernfs_create_dir_ns+0x88/0xb0 [ 3.634362] sysfs_create_dir_ns+0x54/0xe8 [ 3.634366] kobject_add_internal+0x22c/0x3f0 [ 3.634370] kobject_add+0xe4/0x118 [ 3.634374] device_add+0x200/0x870 [ 3.634378] _request_firmware+0x958/0xc38 [ 3.634381] request_firmware_into_buf+0x4c/0x70 -------------------------------------------------------------------------- Fixes: 726e41097920 ("drivers: core: Remove glue dirs from sysfs earlier") Signed-off-by: Muchun Song <smuchun@gmail.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Prateek Sood <prsood@codeaurora.org> Link: https://lore.kernel.org/r/20190727032122.24639-1-smuchun@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-09drivers/base: Introduce kill_device()Dan Williams1-8/+19
commit 00289cd87676e14913d2d8492d1ce05c4baafdae upstream. The libnvdimm subsystem arranges for devices to be destroyed as a result of a sysfs operation. Since device_unregister() cannot be called from an actively running sysfs attribute of the same device libnvdimm arranges for device_unregister() to be performed in an out-of-line async context. The driver core maintains a 'dead' state for coordinating its own racing async registration / de-registration requests. Rather than add local 'dead' state tracking infrastructure to libnvdimm device objects, export the existing state tracking via a new kill_device() helper. The kill_device() helper simply marks the device as dead, i.e. that it is on its way to device_del(), or returns that the device was already dead. This can be used in advance of calling device_unregister() for subsystems like libnvdimm that might need to handle multiple user threads racing to delete a device. This refactoring does not change any behavior, but it is a pre-requisite for follow-on fixes and therefore marked for -stable. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...") Cc: <stable@vger.kernel.org> Tested-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/156341207332.292348.14959761496009347574.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-09driver core: Establish order of operations for device_add and device_del via ↵Alexander Duyck3-11/+26
bitflag commit 3451a495ef244a88ed6317a035299d835554d579 upstream. Add an additional bit flag to the device_private struct named "dead". This additional flag provides a guarantee that when a device_del is executed on a given interface an async worker will not attempt to attach the driver following the earlier device_del call. Previously this guarantee was not present and could result in the device_del call attempting to remove a driver from an interface only to have the async worker attempt to probe the driver later when it finally completes the asynchronous probe call. One additional change added was that I pulled the check for dev->driver out of the __device_attach_driver call and instead placed it in the __device_attach_async_helper call. This was motivated by the fact that the only other caller of this, __device_attach, had already taken the device_lock() and checked for dev->driver. Instead of testing for this twice in this path it makes more sense to just consolidate the dev->dead and dev->driver checks together into one set of checks. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26regmap: fix bulk writes on paged registersSrinivas Kandagatla1-0/+2
[ Upstream commit db057679de3e9e6a03c1bcd5aee09b0d25fd9f5b ] On buses like SlimBus and SoundWire which does not support gather_writes yet in regmap, A bulk write on paged register would be silently ignored after programming page. This is because local variable 'ret' value in regmap_raw_write_impl() gets reset to 0 once page register is written successfully and the code below checks for 'ret' value to be -ENOTSUPP before linearising the write buffer to send to bus->write(). Fix this by resetting the 'ret' value to -ENOTSUPP in cases where gather_writes() is not supported or single register write is not possible. Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26regmap: debugfs: Fix memory leak in regmap_debugfs_initDaniel Baluta1-0/+2
[ Upstream commit 2899872b627e99b7586fe3b6c9f861da1b4d5072 ] As detected by kmemleak running on i.MX6ULL board: nreferenced object 0xd8366600 (size 64): comm "swapper/0", pid 1, jiffies 4294937370 (age 933.220s) hex dump (first 32 bytes): 64 75 6d 6d 79 2d 69 6f 6d 75 78 63 2d 67 70 72 dummy-iomuxc-gpr 40 32 30 65 34 30 30 30 00 e3 f3 ab fe d1 1b dd @20e4000........ backtrace: [<b0402aec>] kasprintf+0x2c/0x54 [<a6fbad2c>] regmap_debugfs_init+0x7c/0x31c [<9c8d91fa>] __regmap_init+0xb5c/0xcf4 [<5b1c3d2a>] of_syscon_register+0x164/0x2c4 [<596a5d80>] syscon_node_to_regmap+0x64/0x90 [<49bd597b>] imx6ul_init_machine+0x34/0xa0 [<250a4dac>] customize_machine+0x1c/0x30 [<2d19fdaf>] do_one_initcall+0x7c/0x398 [<e6084469>] kernel_init_freeable+0x328/0x448 [<168c9101>] kernel_init+0x8/0x114 [<913268aa>] ret_from_fork+0x14/0x20 [<ce7b131a>] 0x0 Root cause is that map->debugfs_name is allocated using kasprintf and then the pointer is lost by assigning it other memory address. Reported-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-21regmap-irq: do not write mask register if mask_base is zeroMark Zhang1-0/+6
commit 7151449fe7fa5962c6153355f9779d6be99e8e97 upstream. If client have not provided the mask base register then do not write into the mask register. Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com> Signed-off-by: Jinyoung Park <jinyoungp@nvidia.com> Signed-off-by: Venkat Reddy Talla <vreddytalla@nvidia.com> Signed-off-by: Mark Zhang <markz@nvidia.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-21firmware: improve LSM/IMA security behaviourSven Van Asbroeck1-1/+1
commit 2472d64af2d3561954e2f05365a67692bb852f2a upstream. The firmware loader queries if LSM/IMA permits it to load firmware via the sysfs fallback. Unfortunately, the code does the opposite: it expressly permits sysfs fw loading if security_kernel_load_data( LOADING_FIRMWARE) returns -EACCES. This happens because a zero-on-success return value is cast to a bool that's true on success. Fix the return value handling so we get the correct behaviour. Fixes: 6e852651f28e ("firmware: add call to LSM hook before firmware sysfs fallback") Cc: Stable <stable@vger.kernel.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Kees Cook <keescook@chromium.org> To: Luis Chamberlain <mcgrof@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-21drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDTJames Morse1-1/+2
commit 83b44fe343b5abfcb1b2261289bd0cfcfcfd60a8 upstream. The cacheinfo structures are alloced/freed by cpu online/offline callbacks. Originally these were only used by sysfs to expose the cache topology to user space. Without any in-kernel dependencies CPUHP_AP_ONLINE_DYN was an appropriate choice. resctrl has started using these structures to identify CPUs that share a cache. It updates its 'domain' structures from cpu online/offline callbacks. These depend on the cacheinfo structures (resctrl_online_cpu()->domain_add_cpu()->get_cache_id()-> get_cpu_cacheinfo()). These also run as CPUHP_AP_ONLINE_DYN. Now that there is an in-kernel dependency, move the cacheinfo work earlier so we know its done before resctrl's CPUHP_AP_ONLINE_DYN work runs. Fixes: 2264d9c74dda1 ("x86/intel_rdt: Build structures for each resource based on cache topology") Cc: <stable@vger.kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20190624173656.202407-1-james.morse@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-31PM / core: Propagate dev->power.wakeup_path when no callbacksUlf Hansson1-0/+4
[ Upstream commit dc351d4c5f4fe4d0f274d6d660227be0c3a03317 ] The dev->power.direct_complete flag may become set in device_prepare() in case the device don't have any PM callbacks (dev->power.no_pm_callbacks is set). This leads to a broken behaviour, when there is child having wakeup enabled and relies on its parent to be used in the wakeup path. More precisely, when the direct complete path becomes selected for the child in __device_suspend(), the propagation of the dev->power.wakeup_path becomes skipped as well. Let's address this problem, by checking if the device is a part the wakeup path or has wakeup enabled, then prevent the direct complete path from being used. Reported-by: Loic Pallardy <loic.pallardy@st.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> [ rjw: Comment cleanup ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-25driver core: Postpone DMA tear-down until after devres release for probe failureJohn Garry1-3/+2
commit 0b777eee88d712256ba8232a9429edb17c4f9ceb upstream. In commit 376991db4b64 ("driver core: Postpone DMA tear-down until after devres release"), we changed the ordering of tearing down the device DMA ops and releasing all the device's resources; this was because the DMA ops should be maintained until we release the device's managed DMA memories. However, we have seen another crash on an arm64 system when a device driver probe fails: hisi_sas_v3_hw 0000:74:02.0: Adding to iommu group 2 scsi host1: hisi_sas_v3_hw BUG: Bad page state in process swapper/0 pfn:313f5 page:ffff7e0000c4fd40 count:1 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0xfffe00000001000(reserved) raw: 0fffe00000001000 ffff7e0000c4fd48 ffff7e0000c4fd48 0000000000000000 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set bad because of flags: 0x1000(reserved) Modules linked in: CPU: 49 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc1-43081-g22d97fd-dirty #1433 Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI RC0 - V1.12.01 01/29/2019 Call trace: dump_backtrace+0x0/0x118 show_stack+0x14/0x1c dump_stack+0xa4/0xc8 bad_page+0xe4/0x13c free_pages_check_bad+0x4c/0xc0 __free_pages_ok+0x30c/0x340 __free_pages+0x30/0x44 __dma_direct_free_pages+0x30/0x38 dma_direct_free+0x24/0x38 dma_free_attrs+0x9c/0xd8 dmam_release+0x20/0x28 release_nodes+0x17c/0x220 devres_release_all+0x34/0x54 really_probe+0xc4/0x2c8 driver_probe_device+0x58/0xfc device_driver_attach+0x68/0x70 __driver_attach+0x94/0xdc bus_for_each_dev+0x5c/0xb4 driver_attach+0x20/0x28 bus_add_driver+0x14c/0x200 driver_register+0x6c/0x124 __pci_register_driver+0x48/0x50 sas_v3_pci_driver_init+0x20/0x28 do_one_initcall+0x40/0x25c kernel_init_freeable+0x2b8/0x3c0 kernel_init+0x10/0x100 ret_from_fork+0x10/0x18 Disabling lock debugging due to kernel taint BUG: Bad page state in process swapper/0 pfn:313f6 page:ffff7e0000c4fd80 count:1 mapcount:0 mapping:0000000000000000 index:0x0 [ 89.322983] flags: 0xfffe00000001000(reserved) raw: 0fffe00000001000 ffff7e0000c4fd88 ffff7e0000c4fd88 0000000000000000 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 The crash occurs for the same reason. In this case, on the really_probe() failure path, we are still clearing the DMA ops prior to releasing the device's managed memories. This patch fixes this issue by reordering the DMA ops teardown and the call to devres_release_all() on the failure path. Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Tested-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> [jpg: backport to 4.19.x and earlier] Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-14x86/speculation/mds: Add sysfs reporting for MDSThomas Gleixner1-0/+8
commit 8a4b06d391b0a42a373808979b5028f5c84d9c6a upstream Add the sysfs reporting file for MDS. It exposes the vulnerability and mitigation state similar to the existing files for the other speculative hardware vulnerabilities. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jon Masters <jcm@redhat.com> Tested-by: Jon Masters <jcm@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-20mm: hide incomplete nr_indirectly_reclaimable in sysfsKonstantin Khlebnikov1-1/+6
In upstream branch this fixed by commit b29940c1abd7 ("mm: rename and change semantics of nr_indirectly_reclaimable_bytes"). This fixes /sys/devices/system/node/node*/vmstat format: ... nr_dirtied 6613155 nr_written 5796802 11089216 ... Cc: <stable@vger.kernel.org> # 4.19.y Fixes: 7aaf77272358 ("mm: don't show nr_indirectly_reclaimable in /proc/vmstat") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Roman Gushchin <guro@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-20PM / Domains: Avoid a potential deadlockJiada Wang1-7/+6
[ Upstream commit 2071ac985d37efe496782c34318dbead93beb02f ] Lockdep warns that prepare_lock and genpd->mlock can cause a deadlock the deadlock scenario is like following: First thread is probing cs2000 cs2000_probe() clk_register() __clk_core_init() clk_prepare_lock() ----> acquires prepare_lock cs2000_recalc_rate() i2c_smbus_read_byte_data() rcar_i2c_master_xfer() dma_request_chan() rcar_dmac_of_xlate() rcar_dmac_alloc_chan_resources() pm_runtime_get_sync() __pm_runtime_resume() rpm_resume() rpm_callback() genpd_runtime_resume() ----> acquires genpd->mlock Second thread is attaching any device to the same PM domain genpd_add_device() genpd_lock() ----> acquires genpd->mlock cpg_mssr_attach_dev() of_clk_get_from_provider() __of_clk_get_from_provider() __clk_create_clk() clk_prepare_lock() ----> acquires prepare_lock Since currently no PM provider access genpd's critical section in .attach_dev, and .detach_dev callbacks, so there is no need to protect these two callbacks with genpd->mlock. This patch avoids a potential deadlock by moving out .attach_dev and .detach_dev from genpd->mlock, so that genpd->mlock won't be held when prepare_lock is acquired in .attach_dev and .detach_dev Signed-off-by: Jiada Wang <jiada_wang@mentor.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-23PM / wakeup: Rework wakeup source timer cancellationViresh Kumar1-1/+7
commit 1fad17fb1bbcd73159c2b992668a6957ecc5af8a upstream. If wakeup_source_add() is called right after wakeup_source_remove() for the same wakeup source, timer_setup() may be called for a potentially scheduled timer which is incorrect. To avoid that, move the wakeup source timer cancellation from wakeup_source_drop() to wakeup_source_remove(). Moreover, make wakeup_source_remove() clear the timer function after canceling the timer to let wakeup_source_not_registered() treat unregistered wakeup sources in the same way as the ones that have never been registered. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.4+ <stable@vger.kernel.org> # 4.4+ [ rjw: Subject, changelog, merged two patches together ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-14driver core: Postpone DMA tear-down until after devres releaseGeert Uytterhoeven1-1/+1
commit 376991db4b6464e906d699ef07681e2ffa8ab08c upstream. When unbinding the (IOMMU-enabled) R-Car SATA device on Salvator-XS (R-Car H3 ES2.0), in preparation of rebinding against vfio-platform for device pass-through for virtualization:     echo ee300000.sata > /sys/bus/platform/drivers/sata_rcar/unbind the kernel crashes with:     Unable to handle kernel paging request at virtual address ffffffbf029ffffc     Mem abort info:       ESR = 0x96000006       Exception class = DABT (current EL), IL = 32 bits       SET = 0, FnV = 0       EA = 0, S1PTW = 0     Data abort info:       ISV = 0, ISS = 0x00000006       CM = 0, WnR = 0     swapper pgtable: 4k pages, 39-bit VAs, pgdp = 000000007e8c586c     [ffffffbf029ffffc] pgd=000000073bfc6003, pud=000000073bfc6003, pmd=0000000000000000     Internal error: Oops: 96000006 [#1] SMP     Modules linked in:     CPU: 0 PID: 1098 Comm: bash Not tainted 5.0.0-rc5-salvator-x-00452-g37596f884f4318ef #287     Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)     pstate: 60400005 (nZCv daif +PAN -UAO)     pc : __free_pages+0x8/0x58     lr : __dma_direct_free_pages+0x50/0x5c     sp : ffffff801268baa0     x29: ffffff801268baa0 x28: 0000000000000000     x27: ffffffc6f9c60bf0 x26: ffffffc6f9c60bf0     x25: ffffffc6f9c60810 x24: 0000000000000000     x23: 00000000fffff000 x22: ffffff8012145000     x21: 0000000000000800 x20: ffffffbf029fffc8     x19: 0000000000000000 x18: ffffffc6f86c42c8     x17: 0000000000000000 x16: 0000000000000070     x15: 0000000000000003 x14: 0000000000000000     x13: ffffff801103d7f8 x12: 0000000000000028     x11: ffffff8011117604 x10: 0000000000009ad8     x9 : ffffff80110126d0 x8 : ffffffc6f7563000     x7 : 6b6b6b6b6b6b6b6b x6 : 0000000000000018     x5 : ffffff8011cf3cc8 x4 : 0000000000004000     x3 : 0000000000080000 x2 : 0000000000000001     x1 : 0000000000000000 x0 : ffffffbf029fffc8     Process bash (pid: 1098, stack limit = 0x00000000c38e3e32)     Call trace:      __free_pages+0x8/0x58      __dma_direct_free_pages+0x50/0x5c      arch_dma_free+0x1c/0x98      dma_direct_free+0x14/0x24      dma_free_attrs+0x9c/0xdc      dmam_release+0x18/0x20      release_nodes+0x25c/0x28c      devres_release_all+0x48/0x4c      device_release_driver_internal+0x184/0x1f0      device_release_driver+0x14/0x1c      unbind_store+0x70/0xb8      drv_attr_store+0x24/0x34      sysfs_kf_write+0x4c/0x64      kernfs_fop_write+0x154/0x1c4      __vfs_write+0x34/0x164      vfs_write+0xb4/0x16c      ksys_write+0x5c/0xbc      __arm64_sys_write+0x14/0x1c      el0_svc_common+0x98/0x114      el0_svc_handler+0x1c/0x24      el0_svc+0x8/0xc     Code: d51b4234 17fffffa a9bf7bfd 910003fd (b9403404)     ---[ end trace 8c564cdd3a1a840f ]--- While I've bisected this to commit e8e683ae9a736407 ("iommu/of: Fix probe-deferral"), and reverting that commit on post-v5.0-rc4 kernels does fix the problem, this turned out to be a red herring. On arm64, arch_teardown_dma_ops() resets dev->dma_ops to NULL. Hence if a driver has used a managed DMA allocation API, the allocated DMA memory will be freed using the direct DMA ops, while it may have been allocated using a custom DMA ops (iommu_dma_ops in this case). Fix this by reversing the order of the calls to devres_release_all() and arch_teardown_dma_ops(). Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: stable <stable@vger.kernel.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> [rm: backport for 4.12-4.19 - kernels before 5.0 will not see the crash above, but may get silent memory corruption instead] Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12cacheinfo: Keep the old value if of_property_read_u32 failsHuacai Chen1-4/+2
commit 3a34c986324c07dde32903f7bb262e6138e77c2a upstream. Commit 448a5a552f336bd7b847b1951 ("drivers: base: cacheinfo: use OF property_read_u32 instead of get_property,read_number") makes cache size and number_of_sets be 0 if DT doesn't provide there values. I think this is unreasonable so make them keep the old values, which is the same as old kernels. Fixes: 448a5a552f33 ("drivers: base: cacheinfo: use OF property_read_u32 instead of get_property,read_number") Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhc@lemote.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12kobject: return error code if writing /sys/.../uevent failsPeter Rajnoha2-5/+15
[ Upstream commit df44b479654f62b478c18ee4d8bc4e9f897a9844 ] Propagate error code back to userspace if writing the /sys/.../uevent file fails. Before, the write operation always returned with success, even if we failed to recognize the input string or if we failed to generate the uevent itself. With the error codes properly propagated back to userspace, we are able to react in userspace accordingly by not assuming and awaiting a uevent that is not delivered. Signed-off-by: Peter Rajnoha <prajnoha@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12driver core: Move async_synchronize_full callAlexander Duyck1-3/+3
[ Upstream commit c37d721c68ad88925ba0e72f6e14acb829a8c6bb ] Move the async_synchronize_full call out of __device_release_driver and into driver_detach. The idea behind this is that the async_synchronize_full call will only guarantee that any existing async operations are flushed. This doesn't do anything to guarantee that a hotplug event that may occur while we are doing the release of the driver will not be asynchronously scheduled. By moving this into the driver_detach path we can avoid potential deadlocks as we aren't holding the device lock at this point and we should not have the driver we want to flush loaded so the flush will take care of any asynchronous events the driver we are detaching might have scheduled. Fixes: 765230b5f084 ("driver-core: add asynchronous probing support for drivers") Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12devres: Align data[] to ARCH_KMALLOC_MINALIGNAlexey Brodkin1-2/+8
commit a66d972465d15b1d89281258805eb8b47d66bd36 upstream. Initially we bumped into problem with 32-bit aligned atomic64_t on ARC, see [1]. And then during quite lengthly discussion Peter Z. mentioned ARCH_KMALLOC_MINALIGN which IMHO makes perfect sense. If allocation is done by plain kmalloc() obtained buffer will be ARCH_KMALLOC_MINALIGN aligned and then why buffer obtained via devm_kmalloc() should have any other alignment? This way we at least get the same behavior for both types of allocation. [1] http://lists.infradead.org/pipermail/linux-snps-arc/2018-July/004009.html [2] http://lists.infradead.org/pipermail/linux-snps-arc/2018-July/004036.html Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Greg KH <greg@kroah.com> Cc: <stable@vger.kernel.org> # 4.8+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>