From 8d9a784d1e2c75e0dcae06f77a02f5e7bb547f3a Mon Sep 17 00:00:00 2001
From: Stuart Menefy <stuart.menefy@st.com>
Date: Tue, 14 Feb 2012 11:29:11 +0000
Subject: sh: Fix error synchronising kernel page tables

The problem is caused by the interaction of two features in the Linux
memory management code.

A processes address space is described by a struct mm_struct, and
every thread has a pointer to the mm it should run in. The exception
to this are kernel threads, which don't have an mm, and so borrow
the mm from the last thread which ran. The system is bootstrapped
by the initial kernel thread using init's mm (even though init hasn't
been created yet, its mm is the static init_mm).

The other feature is how the kernel handles the page table which
describes the portion of the address space which is only visible when
executing inside the kernel, and which is shared by all threads. On
the SH4 the only portion of the kernel's address space which described
using the page table is called P3, from 0xc0000000 to 0xdfffffff. This
portion of the address space is divided into three:
  - mappings for dma_alloc_coherent()
  - mappings for vmalloc() and ioremap()
  - fixmap mappings, primarily used in copy_user_pages() to create
    kernel mappings of user pages with the correct cache colour.

To optimise the TLB miss handler we don't want to add an additional
condition which checks whether the faulting address is in the user or
the kernel portion of the address space, and so all page tables have a
common portion which describes the kernel part of the address
space. As the SH4 uses a two level page table, only the kernel portion
of first level page table (the pgd entries) is duplicated. These all
point to the same second level entries (the pte's), and so no memory
is wasted.

The reference page table for the kernel is called the swapper_pg_dir,
and when a new page table is created for a new process the kernel
portion of the page table is copied from swapper_pg_dir. This works
fine when changes only occur in the second level of the kernel's page
table, or the first level entries are created before any new user
processes. However if a change occurs to the first level of the page
table, and there are existing processes which don't have this entry in
their page table, this new entry needs to be added. This is done on
demand, when the kernel accesses a P3 address which isn't mapped using
the current page table, the code in vmalloc_fault() copies the entry
from the reference page table (swapper_pg_dir) into the current
processes page table.

The bug which this patch addresses is that the code in vmalloc_fault()
was not copying addresses which fell in the dma_alloc_coherent()
portion of the address space, and it should have been copying any P3
address.

Why we hadn't seen this before, and what made this hard to reproduce,
is that normally the kernel will have called dma_alloc_coherent(), and
accessed the memory mapping created, before any user process
runs. Typically drivers such as USB or SATA will have created and used
mappings of this type during the kernel initialisation, when probing
for the attached devices, before init runs. Ethernet is slightly
different, as it normally only creates and accesses
dma_alloc_coherent() mappings when the network is brought up, but if
kernel level IP configuration is used this will also occur before any
user space process runs. So the first reproduction of this problem
which we saw was occurred when USB and SATA were removed from the
kernel, and then bring up Ethernet from user space using ifconfig.
I'd like to thank Joseph Bormolini who did the hard work reducing the
problem to this simple to reproduce criteria.

In your case the situation is slightly different, and turns out to
depends on the exact kernel configuration (which we had) and your
ramdisk contents (which we didn't - hence the need for some assumptions).

In this case the problem is a side effect of kernel level module
loading. Kernel subsystems sometimes trigger the load of kernel
modules directly, for example the crypto subsystem tries to load the
cryptomgr and MTD tries to load modules for Flash partitioning if
these are not built into the kernel. This is done by the kernel
creating a user process which runs insmod to try and load the
appropriate module.

In order for this to cause problems the system must be running with a
initrd or initramfs, which contains an insmod executable - if the
kernel can't find an insmod to run, no user process is created, and
the problem doesn't occur.  If an insmod is found, a process is
created to run it, which will inherit the kernel portion of the
swapper_pg_dir first level page table. It doesn't matter whether the
inmod is successful or not, but when the the kernel scheduler context
switches back to the kernel initialisation thread, the insmod's mm is
'borrowed' by the kernel thread, as it doesn't have an address space
of its own. (Reference counting is used to ensure this mm is not
destroyed, even though the user process which caused its creation may no
longer exist.) If this address space doesn't have a first level page
table entry for the consistent mappings, and a driver tries to access
such a mapping, we are in the same situation as described above,
except this time in a kernel thread rather than a user thread
executing inside the kernel.

See bugzilla: 15425, 15836, 15862, 16106, 16793

Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
---
 arch/sh/mm/fault_32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'arch/sh')

diff --git a/arch/sh/mm/fault_32.c b/arch/sh/mm/fault_32.c
index 324eef93c900..e99b104d967a 100644
--- a/arch/sh/mm/fault_32.c
+++ b/arch/sh/mm/fault_32.c
@@ -86,7 +86,7 @@ static noinline int vmalloc_fault(unsigned long address)
 	pte_t *pte_k;
 
 	/* Make sure we are in vmalloc/module/P3 area: */
-	if (!(address >= VMALLOC_START && address < P3_ADDR_MAX))
+	if (!(address >= P3SEG && address < P3_ADDR_MAX))
 		return -1;
 
 	/*
-- 
cgit v1.2.3


From ec2ccd884ab1e190bc5ddb210c7d5f5ea2ddeba3 Mon Sep 17 00:00:00 2001
From: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Date: Fri, 27 Apr 2012 11:12:38 +0930
Subject: sh: Fix up tracepoint build fallout from static key introduction.

With the introduction of static keys, anything using tracepoints blows up
in the following manner:

include/trace/events/oom.h:8:13: error: initializer element is not constant
include/trace/events/oom.h:8:13: error: (near initialization for '__tracepoint_oom_score_adj_update')
include/trace/events/oom.h:8:13: error: initializer element is not constant
include/trace/events/oom.h:8:13: error: (near initialization for '__tracepoint_oom_score_adj_update.key')

This is a result of the STATIC_KEY_INIT_xxx defs wrapping ATOMIC_INIT()
which on sh includes an atomic_t typecast. Given that we don't really
need the typecast for anything anymore, the simplest solution is simply
to kill off the cast.

Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
---
 arch/sh/include/asm/atomic.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'arch/sh')

diff --git a/arch/sh/include/asm/atomic.h b/arch/sh/include/asm/atomic.h
index 37f2f4a55231..f4c1c20bcdf6 100644
--- a/arch/sh/include/asm/atomic.h
+++ b/arch/sh/include/asm/atomic.h
@@ -11,7 +11,7 @@
 #include <linux/types.h>
 #include <asm/cmpxchg.h>
 
-#define ATOMIC_INIT(i)	( (atomic_t) { (i) } )
+#define ATOMIC_INIT(i)	{ (i) }
 
 #define atomic_read(v)		(*(volatile int *)&(v)->counter)
 #define atomic_set(v,i)		((v)->counter = (i))
-- 
cgit v1.2.3


From 6c0a9fa62feb7e9fdefa9720bcc03040c9b0b311 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx@linutronix.de>
Date: Sat, 5 May 2012 15:05:40 +0000
Subject: fork: Remove the weak insanity

We error out when compiling with gcc4.1.[01] as it miscompiles
__weak. The workaround with magic defines is not longer
necessary. Make it __weak again.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120505150141.306358267@linutronix.de
---
 arch/sh/include/asm/thread_info.h  | 1 -
 arch/x86/include/asm/thread_info.h | 1 -
 kernel/fork.c                      | 8 +-------
 3 files changed, 1 insertion(+), 9 deletions(-)

(limited to 'arch/sh')

diff --git a/arch/sh/include/asm/thread_info.h b/arch/sh/include/asm/thread_info.h
index 20ee40af16e9..09963d4018cb 100644
--- a/arch/sh/include/asm/thread_info.h
+++ b/arch/sh/include/asm/thread_info.h
@@ -98,7 +98,6 @@ static inline struct thread_info *current_thread_info(void)
 extern struct thread_info *alloc_thread_info_node(struct task_struct *tsk, int node);
 extern void free_thread_info(struct thread_info *ti);
 extern void arch_task_cache_init(void);
-#define arch_task_cache_init arch_task_cache_init
 extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
 extern void init_thread_xstate(void);
 
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index ad6df8ccd715..8692a166dd4e 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -284,6 +284,5 @@ static inline bool is_ia32_task(void)
 extern void arch_task_cache_init(void);
 extern void free_thread_info(struct thread_info *ti);
 extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
-#define arch_task_cache_init arch_task_cache_init
 #endif
 #endif /* _ASM_X86_THREAD_INFO_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index b9372a0bff18..a79b36e2e912 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -203,13 +203,7 @@ void __put_task_struct(struct task_struct *tsk)
 }
 EXPORT_SYMBOL_GPL(__put_task_struct);
 
-/*
- * macro override instead of weak attribute alias, to workaround
- * gcc 4.1.0 and 4.1.1 bugs with weak attribute and empty functions.
- */
-#ifndef arch_task_cache_init
-#define arch_task_cache_init()
-#endif
+void __init __weak arch_task_cache_init(void) { }
 
 void __init fork_init(unsigned long mempages)
 {
-- 
cgit v1.2.3