path: root/kernel
Age | Commit message | Author | Files, Lines changed
2013-04-26 | sched: Fix init NOHZ_IDLE flag | Vincent Guittot | 2 files, -11/+16
On my SMP platform, which is made of 5 cores in 2 clusters, the nr_busy_cpus field of struct sched_group_power is not null when the platform is fully idle - which makes the scheduler unhappy.

The root cause is: during the boot sequence, some CPUs reach the idle loop and set their NOHZ_IDLE flag while waiting for other CPUs to boot. But the nr_busy_cpus field is initialized later, with the assumption that all CPUs are in the busy state, whereas some CPUs have already set their NOHZ_IDLE flag.

More generally, the NOHZ_IDLE flag must be initialized when new sched_domains are created in order to ensure that NOHZ_IDLE and nr_busy_cpus stay aligned. This condition could be ensured by adding a synchronize_rcu() between the destruction of old sched_domains and the creation of new ones, so the NOHZ_IDLE flag would not be updated against an old sched_domain once it has been initialized. But this solution introduces an additional latency in the rebuild sequence that is called during cpu hotplug.

As suggested by Frederic Weisbecker, another solution is to give NOHZ_IDLE and the sched_domain struct the same RCU lifecycle. A new nohz_idle field is added to sched_domain so that both the status and the sched_domain share the same RCU lifecycle and are always synchronized. In addition, there is no longer a need to protect nohz_idle against concurrent access, as it is only modified by 2 exclusive functions called by the local cpu. This solution was preferred over creating a new struct with an extra pointer indirection for sched_domain.

The synchronization is done at the cost of:
 - an additional indirection and an rcu_dereference for accessing nohz_idle;
 - using only the nohz_idle field of the top sched_domain.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linaro-kernel@lists.linaro.org
Cc: peterz@infradead.org
Cc: fweisbec@gmail.com
Cc: pjt@google.com
Cc: rostedt@goodmis.org
Cc: efault@gmx.de
Link: http://lkml.kernel.org/r/1366729142-14662-1-git-send-email-vincent.guittot@linaro.org
[ Fixed !NO_HZ build bug. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
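A minimal sketch of the approach described above (illustrative only, not the verbatim patch; the helper name is made up): nohz_idle is assumed to be a new int field of struct sched_domain, so the flag is allocated, published and freed under the very same RCU lifecycle as the domain itself, and rcu_dereference() of the CPU's top domain always yields a flag that is consistent with nr_busy_cpus.

    static void set_cpu_sd_state_idle_sketch(int cpu)
    {
            struct sched_domain *sd;

            rcu_read_lock();
            sd = rcu_dereference(cpu_rq(cpu)->sd);          /* top sched_domain of this CPU */

            if (sd && !sd->nohz_idle) {
                    sd->nohz_idle = 1;                      /* touched only by the local CPU */
                    atomic_dec(&sd->groups->sgp->nr_busy_cpus);
            }
            rcu_read_unlock();
    }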
2013-04-24 | sched: Prevent to re-select dst-cpu in load_balance() | Joonsoo Kim | 1 file, -18/+15
Commit 88b8dac0 makes load_balance() consider other cpus in its group. But there is no code to prevent re-selecting the dst-cpu, so the same dst-cpu can be selected over and over.

This patch adds functionality to load_balance() to exclude a cpu once it has been selected. We prevent re-selecting the dst_cpu via env's cpus, so env's cpus is now a candidate set not only for src_cpus but also for dst_cpus.

With this patch, we can remove lb_iterations and max_lb_iterations, because we decide whether we can go ahead or not via env's cpus.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Jason Low <jason.low2@hp.com>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366705662-3587-7-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
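A rough sketch of the retry path this describes (illustrative; the flag and label names are taken from the surrounding load_balance() code of that era, not quoted from the patch): once a dst_cpu turns out to be unusable, it is cleared from env.cpus so a later iteration cannot pick it again.

    if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) {
            /* never re-select this destination again */
            cpumask_clear_cpu(env.dst_cpu, env.cpus);

            env.dst_rq  = cpu_rq(env.new_dst_cpu);
            env.dst_cpu = env.new_dst_cpu;
            env.flags  &= ~LBF_SOME_PINNED;
            env.loop    = 0;
            goto more_balance;      /* retry with the cpus remaining in env.cpus */
    }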
2013-04-24 | sched: Rename load_balance_tmpmask to load_balance_mask | Joonsoo Kim | 2 files, -4/+4
The current name doesn't convey any specific meaning, so rename it to reflect its purpose.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Jason Low <jason.low2@hp.com>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366705662-3587-6-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-24 | sched: Move up affinity check to mitigate useless redoing overhead | Joonsoo Kim | 1 file, -9/+7
Currently, LBF_ALL_PINNED is cleared only after the affinity check has passed. So if a task migration is skipped because of a small load value or a small imbalance value in move_tasks(), we don't clear LBF_ALL_PINNED and end up triggering a 'redo' in load_balance(). The imbalance value is often so small that no task can be moved to other cpus, and of course this situation may continue after we change the target cpu. So this patch moves the affinity check up and clears LBF_ALL_PINNED before evaluating the load value, in order to mitigate the useless redo overhead.

In addition, re-order some comments correctly.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Jason Low <jason.low2@hp.com>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366705662-3587-5-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
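An illustrative ordering of the checks in can_migrate_task() after this change (a simplified sketch, not the full function): the affinity test runs first, so LBF_ALL_PINNED is cleared even when the migration is later skipped because the load or imbalance is too small.

    static int can_migrate_task_sketch(struct task_struct *p, struct lb_env *env)
    {
            /* 1) affinity: if p cannot run on dst_cpu it really is "pinned" */
            if (!cpumask_test_cpu(env->dst_cpu, tsk_cpus_allowed(p)))
                    return 0;
            env->flags &= ~LBF_ALL_PINNED;  /* at least one task could move */

            /* 2) the currently running task cannot be migrated */
            if (task_running(env->src_rq, p))
                    return 0;

            /* 3) only now look at cache hotness / load */
            return 1;
    }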
2013-04-24 | sched: Don't consider other cpus in our group in case of NEWLY_IDLE | Joonsoo Kim | 1 file, -1/+14
Commit 88b8dac0 makes load_balance() consider other cpus in its group, regardless of idle type. But when we do NEWLY_IDLE balancing we should not consider them, because the motivation of NEWLY_IDLE balancing is to turn this cpu into a non-idle state if needed; that is not the case for other cpus. So change the code not to consider other cpus for NEWLY_IDLE balancing.

With this patch, the assignment 'if (pulled_task) this_rq->idle_stamp = 0' in idle_balance() becomes correct again, because NEWLY_IDLE balancing no longer considers other cpus, so assigning to 'this_rq->idle_stamp' is valid.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: Jason Low <jason.low2@hp.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366705662-3587-4-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-24 | sched: Explicitly cpu_idle_type checking in rebalance_domains() | Joonsoo Kim | 1 file, -3/+4
After commit 88b8dac0, the dst-cpu can be changed in load_balance(), so we can't know the cpu_idle_type of the dst-cpu when load_balance() returns positive. So add an explicit cpu_idle_type check.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: Jason Low <jason.low2@hp.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366705662-3587-3-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
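The explicit check boils down to something like the sketch below (simplified; the variable names follow rebalance_domains() as it looked at the time): after a successful load_balance(), the idle type is re-derived from the CPU's actual state instead of being trusted.

    if (load_balance(cpu, rq, sd, idle, &balance)) {
            /*
             * load_balance() may have pulled tasks here or picked a different
             * dst-cpu, so re-evaluate whether this CPU is still idle.
             */
            idle = idle_cpu(cpu) ? CPU_IDLE : CPU_NOT_IDLE;
    }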
2013-04-24 | sched: Change position of resched_cpu() in load_balance() | Joonsoo Kim | 1 file, -5/+5
cur_ld_moved is reset if env.flags hits LBF_NEED_BREAK, so there is a possibility that we miss doing resched_cpu(). Correct this by moving resched_cpu() before the LBF_NEED_BREAK check.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: Jason Low <jason.low2@hp.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366705662-3587-2-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-21 | sched: Fix wrong rq's runnable_avg update with rt tasks | Vincent Guittot | 3 files, -2/+49
The current update of the rq's load can be erroneous when RT tasks are involved.

The update of the load of a rq that becomes idle is done only if the avg_idle is less than sysctl_sched_migration_cost. If RT tasks and short idle durations alternate, the runnable_avg will not be updated correctly and the time will be accounted as idle time when a CFS task wakes up.

A new idle_enter function is called when the next task is the idle function, so the elapsed time is accounted as run time in the load of the rq, whatever the average idle time is. The function update_rq_runnable_avg is removed from idle_balance.

When an RT task is scheduled on an idle CPU, the update of the rq's load is not done when the rq exits the idle state, because CFS's functions are not called. Then idle_balance, which is called just before entering the idle function, updates the rq's load and makes the assumption that the elapsed time since the last update was only running time. As a consequence, the rq's load of a CPU that only runs a periodic RT task is close to LOAD_AVG_MAX, whatever the running duration of the RT task is.

A new idle_exit function is called when the prev task is the idle function, so the elapsed time is accounted as idle time in the rq's load.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: linaro-kernel@lists.linaro.org
Cc: peterz@infradead.org
Cc: pjt@google.com
Cc: fweisbec@gmail.com
Cc: efault@gmx.de
Link: http://lkml.kernel.org/r/1366302867-5055-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
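A sketch of the two hooks this describes (the function names here are illustrative; the real patch wires equivalent helpers into the idle scheduling class): they feed the elapsed time into the rq's runnable average with the correct run/idle attribution.

    /* next task is the idle task: the time just elapsed was run time */
    static void idle_enter_sketch(struct rq *this_rq)
    {
            update_rq_runnable_avg(this_rq, 1);
    }

    /* prev task was the idle task: the time just elapsed was idle time */
    static void idle_exit_sketch(struct rq *this_rq)
    {
            update_rq_runnable_avg(this_rq, 0);
    }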
2013-04-10 | sched/cpuacct/UML: Fix header file dependency bug on the UML build | Ingo Molnar | 1 file, -0/+1
The cpuacct split caused this build failure on UML:

  kernel/sched/cpuacct.c:94:2: error: implicit declaration of function 'ERR_PTR'

Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-10 | cgroup: Kill subsys.active flag | Li Zefan | 1 file, -3/+0
The only user was cpuacct. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Li Zefan <lizefan@huawei.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5155385A.4040207@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-10 | sched/cpuacct: No need to check subsys active state | Li Zefan | 1 file, -6/+0
Now we're guaranteed when cpuacct_charge() and cpuacct_account_field() are called, cpuacct has already been properly initialized, so we no longer need those checks. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5155384C.7000508@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-10 | sched/cpuacct: Initialize cpuacct subsystem earlier | Li Zefan | 1 file, -5/+6
Initialize cpuacct before the scheduler is functioning, so when cpuacct_charge() and cpuacct_account_field() are called, task_ca() won't return NULL. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5155383F.8000005@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-10 | sched/cpuacct: Initialize root cpuacct earlier | Li Zefan | 3 files, -14/+4
Now we don't need cpuacct_init(), and instead we just initialize root_cpuacct when it's defined. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/51553834.9090701@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-10 | sched/cpuacct: Allocate per_cpu cpuusage for root cpuacct statically | Li Zefan | 1 file, -2/+2
This is a preparation, so later we can initialize cpuacct earlier. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/51553822.5000403@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-10 | sched/cpuacct: Clean up cpuacct.h | Li Zefan | 2 files, -47/+43
Now most of the code in cpuacct.h can be moved to cpuacct.c Signed-off-by: Li Zefan <lizefan@huawei.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/515536D5.2080401@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-10 | sched/cpuacct: Remove redundant NULL checks in cpuacct_acount_field() | Li Zefan | 2 files, -2/+7
This is a micro optimization for a hot path.

 - We don't need to check if @ca returned from task_ca() is NULL.
 - We don't need to check if @ca returned from parent_ca() is NULL.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/515536B7.6060602@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-10 | sched/cpuacct: Remove redundant NULL checks in cpuacct_charge() | Li Zefan | 2 files, -2/+6
This is a micro optimization for the hot path.

 - We don't need to check if @ca is NULL in parent_ca().
 - We don't need to check if @ca is NULL in the beginning of the for loop.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/515536A9.5000700@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
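With both checks gone, the hot-path walk can look roughly like the sketch below (simplified, not the exact resulting code): task_ca() is guaranteed non-NULL because cpuacct is now initialized before the scheduler runs, so the loop only has to stop once the root group has been charged.

    void cpuacct_charge_sketch(struct task_struct *tsk, u64 cputime)
    {
            struct cpuacct *ca;
            int cpu = task_cpu(tsk);

            rcu_read_lock();
            ca = task_ca(tsk);              /* never NULL: initialized early */
            do {
                    *per_cpu_ptr(ca->cpuusage, cpu) += cputime;
                    ca = parent_ca(ca);
            } while (ca);                   /* stop after charging the root */
            rcu_read_unlock();
    }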
2013-04-10 | sched/cpuacct: Add cpuacct_acount_field() | Li Zefan | 3 files, -17/+30
So we can remove open-coded cpuacct code in cputime.c. Signed-off-by: Li Zefan <lizefan@huawei.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/51553692.9060008@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-10 | sched/cpuacct: Add cpuacct_init() | Li Zefan | 3 files, -6/+14
So we don't open-code the initialization of cpuacct in core.c. Signed-off-by: Li Zefan <lizefan@huawei.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/51553687.1060906@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-10 | sched: Split cpuacct code out of sched.h | Li Zefan | 2 files, -47/+53
Add cpuacct.h and let sched.h include it. Signed-off-by: Li Zefan <lizefan@huawei.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5155367B.2060506@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-10 | sched: Split cpuacct code out of core.c | Li Zefan | 3 files, -220/+228
Signed-off-by: Li Zefan <lizefan@huawei.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5155366F.5060404@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-10 | sched: Fix comment in rebalance_domains() | Libin | 1 file, -1/+1
A comment in function rebalance_domains() mentions arch_init_sched_domains(), but that function does not exist anymore. The proper function is init_sched_domains(). Signed-off-by: Libin <huawei.libin@huawei.com> Cc: <peterz@infradead.org> Link: http://lkml.kernel.org/r/1364814841-49156-1-git-send-email-huawei.libin@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-10 | sched: Simplify can_migrate_task() | Zhang Hang | 1 file, -7/+4
At this point tsk_cache_hot is always true, so no need to check it. Signed-off-by: Zhang Hang <bob.zhanghang@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/51650107.9040606@huawei.com [ Also remove unnecessary schedstat #ifdefs. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-08 | sched: Fix typo inside comment | Viresh Kumar | 1 file, -1/+1
Fix typo: sched_domains_nume_distance -> sched_domains_numa_distance Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: patches@linaro.org Cc: robin.randhawa@arm.com Cc: Steve.Bannister@arm.com Cc: Liviu.Dudau@arm.com Cc: charles.garcia-tobin@arm.com Cc: arvind.chauhan@arm.com Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/cd8084746ac932106d6fa6be388b8f2d6aa9617c.1365159023.git.viresh.kumar@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-18 | sched/tracing: Allow tracing the preemption decision on wakeup | Peter Zijlstra | 1 file, -1/+1
Thomas noted that we do the wakeup preemption check after the wakeup trace point, this means the tracepoint cannot test/report this decision; which is rather important for latency sensitive workloads. Therefore move the tracepoint after doing the preemption check. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Paul Turner <pjt@google.com> Cc: Mike Galbraith <efault@gmx.de> Link: http://lkml.kernel.org/r/1363254519.26965.9.camel@laptop Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-18 | Merge branch 'sched/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/core | Ingo Molnar | 1 file, -12/+34
Pull CPU runtime stats/accounting fixes from Frederic Weisbecker:

 "Some users are complaining that their threadgroup's runtime accounting freezes after a week or so of intense cpu-bound workload. This set tries to fix the issue by reducing the risk of multiplication overflow in the cputime scaling code."

Stanislaw Gruszka further explained the historic context and impact of the bug:

 "Commit 0cf55e1ec08bb5a22e068309e2d8ba1180ab4239 started to use scaling for the whole thread group, which increases the chances of hitting multiplication overflow, depending on how many CPUs are on the system. We have had the multiplication utime * rtime for one thread since commit b27f03d4bdc145a09fb7b0c0e004b29f1ee555fa.

  Overflow will happen after:

    rtime * utime > 0xffffffffffffffff jiffies

  If a thread utilizes 100% of CPU time, that gives:

    rtime > sqrt(0xffffffffffffffff) jiffies
    rtime > sqrt(0xffffffffffffffff) / (24 * 60 * 60 * HZ) days

  For HZ 100 that is 497 days; for HZ 1000 it is 49 days.

  The bug affects only users who run a CPU-intensive application for that long a period, and only if they care about the utime/stime values, as the bug has no visible effect other than making those values incorrect."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
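The quoted overflow bound is easy to re-check with a small stand-alone program (assumption: utime == rtime, i.e. a 100% CPU-bound thread, so the product overflows 64 bits once rtime exceeds sqrt(2^64 - 1) jiffies):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
            /* sqrt(0xffffffffffffffff) ~= 2^32 jiffies */
            double limit = sqrt((double)0xffffffffffffffffULL);

            printf("HZ=100 : %.1f days\n", limit / (24.0 * 60 * 60 * 100));
            printf("HZ=1000: %.1f days\n", limit / (24.0 * 60 * 60 * 1000));
            return 0;
    }

Built with 'cc overflow.c -lm', it prints roughly 497.1 and 49.7 days, matching the figures above.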
2013-03-14 | sched: Fix variable name misnomer, add comments | Andrei Epure | 1 file, -4/+5
The min_vruntime variable actually stores the maximum value. The added comment was taken from place_entity function. Signed-off-by: Andrei Epure <epure.andrei@gmail.com> Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1363115544-1964-1-git-send-email-epure.andrei@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-13 | sched: Lower chances of cputime scaling overflow | Frederic Weisbecker | 1 file, -12/+34
Some users have reported that after running a process with hundreds of threads on intensive CPU-bound loads, the cputime of the group started to freeze after a few days.

This is due to how we scale the tick-based cputime against the scheduler's precise execution time value. We add up the values of all threads in the group and multiply that against the sum of the scheduler exec runtime of the whole group. This easily overflows after a few days/weeks of execution.

A proposed solution was to compute that multiplication on stime instead of utime: 62188451f0d63add7ad0cd2a1ae269d600c1663d ("cputime: Avoid multiplication overflow on utime scaling"). The rationale behind that was that it's easy for a thread to spend most of its time in userspace under an intensive CPU-bound workload, but it's much harder to do a CPU-bound intensive long run in the kernel.

This postulate got defeated when a user recently reported that he was still seeing cputime freezes after the above patch. The workload that triggers this issue relates to intensive networking workloads where most of the cputime is consumed in the kernel.

To further reduce the opportunities for multiplication overflow, let's reduce the multiplication factors to the remainders of the division between sched exec runtime and cputime. Assuming the difference between these shouldn't ever be that large, this should work in many situations.

This gets the same results as the upstream scaling code, except for a small difference: the upstream code always rounds the result to the nearest integer not greater than the precise result, while the new code rounds to the nearest integer either greater or not greater. In practice this difference probably shouldn't matter, but it's worth mentioning.

If this solution turns out not to be enough in the end, we'll need to partly revert back to the behaviour prior to commit 0cf55e1ec08bb5a22e068309e2d8ba1180ab4239 ("sched, cputime: Introduce thread_group_times()"). Back then, the scaling was done at exit() time before adding the cputime of an exiting thread to the signal struct, and then we'd need to scale the live threads' cputime one by one in thread_group_cputime(). The drawback may be slightly slower code at exit time.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
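The idea of multiplying by the remainder rather than by the full runtime can be modelled in user space like this (a simplified sketch of the principle, not the kernel implementation, which also has to handle the symmetric case where rtime is smaller than the total and the 64-bit division helpers):

    #include <stdint.h>
    #include <stdio.h>

    /* stime * rtime / total, computed as
     * stime * (rtime / total) + stime * (rtime % total) / total,
     * so the factor multiplied against stime stays below 'total'. */
    static uint64_t scale_stime_sketch(uint64_t stime, uint64_t rtime, uint64_t total)
    {
            uint64_t quot = rtime / total;
            uint64_t rem  = rtime % total;

            return stime * quot + stime * rem / total;
    }

    int main(void)
    {
            /* total = utime + stime of the group, rtime = precise exec runtime */
            printf("%llu\n",
                   (unsigned long long)scale_stime_sketch(1000, 3000000, 4000));
            return 0;
    }

For these sample values it prints 750000, the exact result of 1000 * 3000000 / 4000, while keeping each multiplication factor small.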
2013-03-11 | sched: Spelling fix | Andrei Epure | 1 file, -1/+1
Signed-off-by: Andrei Epure <epure.andrei@gmail.com> Cc: trivial@kernel.org Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1362996200-2674-1-git-send-email-epure.andrei@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-11 | sched: Fix update_group_power() prototype placement to fix build warning when !CONFIG_SMP | Li Zefan | 1 file, -1/+2
All warnings:

  In file included from kernel/sched/core.c:85:0:
  kernel/sched/sched.h:1036:39: warning: 'struct sched_domain' declared inside parameter list
  kernel/sched/sched.h:1036:39: warning: its scope is only this definition or declaration, which is probably not what you want

This is because struct sched_domain is defined inside #if CONFIG_SMP, while update_group_power() is declared unconditionally.

Fix this warning by declaring update_group_power() only when CONFIG_SMP is enabled.

Build tested with CONFIG_SMP enabled and then disabled.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5137F4BA.2060101@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-08 | Merge branch 'sched/cputime' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/core | Ingo Molnar | 3 files, -76/+86
Pull cputime changes from Frederic Weisbecker:

 * Generalize exception handling
 * Fix race in context tracking state restore on return from exception and irq exit kernel preemption
 * Fix cputime scaling in full dynticks accounting dynamic off-case
 * Fix default Kconfig value

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-07 | cputime: Dynamically scale cputime for full dynticks accounting | Frederic Weisbecker | 2 files, -75/+81
The full dynticks cputime accounting can account either via the tick or via the context tracking subsystem; this way the housekeeping CPU can keep the low-overhead tick-based solution. The tick-based mode has a coarse, jiffies-resolution granularity and needs to be scaled against CFS's precise runtime accounting to improve its result. We already do this for CONFIG_TICK_CPU_ACCOUNTING; now expand it to the full dynticks accounting dynamic off-case as well.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Mats Liljegren <mats.liljegren@enea.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-03-07 | context_tracking: Restore preempted context state after preempt_schedule_irq() | Frederic Weisbecker | 1 file, -1/+5
From the context tracking POV, preempt_schedule_irq() behaves pretty much like an exception: It can be called anytime and schedule another task. But currently it doesn't restore the context tracking state of the preempted code on preempt_schedule_irq() return. As a result, if preempt_schedule_irq() is called in the tiny frame between user_enter() and the actual return to userspace, we resume userspace with the wrong context tracking state. Fix this by using exception_enter/exit() which are a perfect fit for this kind of issue. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Mats Liljegren <mats.liljegren@enea.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
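In sketch form the fix looks like the following (heavily simplified; the exception_enter()/exception_exit() signatures are assumed from the context-tracking API this series describes, and the actual scheduling loop is elided):

    asmlinkage void __sched preempt_schedule_irq_sketch(void)
    {
            enum ctx_state prev_state;

            /* remember whether we preempted user or kernel tracking state */
            prev_state = exception_enter();

            /* ... enable IRQs, __schedule(), disable IRQs, repeat while needed ... */

            /* restore the preempted state, e.g. back to "in user" if the tiny
             * window between user_enter() and the return to userspace was hit */
            exception_exit(prev_state);
    }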
2013-03-06 | sched: Remove double declaration of root_task_group | Li Zefan | 2 files, -5/+4
It's already declared in include/linux/sched.h Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5135A7D8.7000107@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-06 | sched: Move group scheduling functions out of include/linux/sched.h | Li Zefan | 2 files, -5/+17
 - Make sched_group_{set_,}runtime(), sched_group_{set_,}period() and sched_rt_can_attach() static.
 - Move sched_{create,destroy,online,offline}_group() to kernel/sched/sched.h.
 - Remove declaration of sched_group_shares().

Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5135A7C5.3000708@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-06 | sched: Make default_scale_freq_power() static | Li Zefan | 1 file, -3/+3
As default_scale_{freq,smt}_power() and update_rt_power() are used in kernel/sched/fair.c only, annotate them as static functions. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5135A7AF.8010900@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-06 | sched: Move struct sched_class to kernel/sched/sched.h | Li Zefan | 1 file, -0/+55
It's used internally only. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5135A79F.8090502@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-06 | sched: Move wake flags to kernel/sched/sched.h | Li Zefan | 1 file, -0/+7
They are used internally only. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5135A78E.7040609@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-06 | sched: Move struct sched_group to kernel/sched/sched.h | Li Zefan | 1 file, -0/+56
Move struct sched_group_power and sched_group and related inline functions to kernel/sched/sched.h, as they are used internally only. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5135A77F.2010705@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-06 | sched: Move SCHED_LOAD_SHIFT macros to kernel/sched/sched.h | Li Zefan | 1 file, -1/+25
They are used internally only. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5135A771.4070104@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-04 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds | 1 file, -1/+1
Pull more VFS bits from Al Viro:
 "Unfortunately, it looks like xattr series will have to wait until the next cycle ;-/

  This pile contains 9p cleanups and fixes (races in v9fs_fid_add() etc), fixup for nommu breakage in shmem.c, several cleanups and a bit more file_inode() work"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  constify path_get/path_put and fs_struct.c stuff
  fix nommu breakage in shmem.c
  cache the value of file_inode() in struct file
  9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry
  9p: make sure ->lookup() adds fid to the right dentry
  9p: untangle ->lookup() a bit
  9p: double iput() in ->lookup() if d_materialise_unique() fails
  9p: v9fs_fid_add() can't fail now
  v9fs: get rid of v9fs_dentry
  9p: turn fid->dlist into hlist
  9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine
  more file_inode() open-coded instances
  selinux: opened file can't have NULL or negative ->f_path.dentry

(In the meantime, the hlist traversal macros have changed, so this required a semantic conflict fixup for the newly hlistified fid->dlist)
2013-03-04 | Merge tag 'metag-v3.9-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag | Linus Torvalds | 1 file, -2/+4
Pull new ImgTec Meta architecture from James Hogan:
 "This adds core architecture support for Imagination's Meta processor cores, followed by some later miscellaneous arch/metag cleanups and fixes which I kept separate to ease review:

  - Support for basic Meta 1 (ATP) and Meta 2 (HTP) core architecture
  - A few fixes all over, particularly for symbol prefixes
  - A few privilege protection fixes
  - Several cleanups (setup.c includes, split out a lot of metag_ksyms.c)
  - Fix some missing exports
  - Convert hugetlb to use vm_unmapped_area()
  - Copy device tree to non-init memory
  - Provide dma_get_sgtable()"

* tag 'metag-v3.9-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag: (61 commits)
  metag: Provide dma_get_sgtable()
  metag: prom.h: remove declaration of metag_dt_memblock_reserve()
  metag: copy devicetree to non-init memory
  metag: cleanup metag_ksyms.c includes
  metag: move mm/init.c exports out of metag_ksyms.c
  metag: move usercopy.c exports out of metag_ksyms.c
  metag: move setup.c exports out of metag_ksyms.c
  metag: move kick.c exports out of metag_ksyms.c
  metag: move traps.c exports out of metag_ksyms.c
  metag: move irq enable out of irqflags.h on SMP
  genksyms: fix metag symbol prefix on crc symbols
  metag: hugetlb: convert to vm_unmapped_area()
  metag: export clear_page and copy_page
  metag: export metag_code_cache_flush_all
  metag: protect more non-MMU memory regions
  metag: make TXPRIVEXT bits explicit
  metag: kernel/setup.c: sort includes
  perf: Enable building perf tools for Meta
  metag: add boot time LNKGET/LNKSET check
  metag: add __init to metag_cache_probe()
  ...
2013-03-03 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal | Linus Torvalds | 1 file, -1/+1
Pull sigprocmask compat fix from Al Viro:
 "generic compat_sys_rt_sigprocmask() had a very dumb braino; I'd spent quite a while staring at the offending commit before finally managing to spot the idiocy ;-/"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  fix compat_sys_rt_sigprocmask()
2013-03-03 | fix compat_sys_rt_sigprocmask() | Al Viro | 1 file, -1/+1
Converting bitmask to 32bit granularity is fine, but we'd better _do_ something with the result. Such as "copy it to userland"... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-03-03 | trace/ring_buffer: handle 64bit aligned structs | James Hogan | 1 file, -2/+4
Some 32 bit architectures require 64 bit values to be aligned (for example Meta which has 64 bit read/write instructions). These require 8 byte alignment of event data too, so use !CONFIG_HAVE_64BIT_ALIGNED_ACCESS instead of !CONFIG_64BIT || CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to decide alignment, and align buffer_data_page::data accordingly. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> (previous version subtly different)
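A sketch of the resulting alignment choice (macro name and field layout simplified; CONFIG_HAVE_64BIT_ALIGNED_ACCESS is the option named above, the rest is illustrative):

    /* pick event/data alignment from the architecture's requirements,
     * not merely from whether the kernel is 64-bit */
    #ifdef CONFIG_HAVE_64BIT_ALIGNED_ACCESS
    # define RB_ARCH_ALIGNMENT      8U      /* e.g. Meta: 64-bit accesses need 8-byte alignment */
    #else
    # define RB_ARCH_ALIGNMENT      4U
    #endif

    struct buffer_data_page {
            u64             time_stamp;     /* page time stamp */
            local_t         commit;         /* write committed index */
            unsigned char   data[] __aligned(RB_ARCH_ALIGNMENT);   /* data of buffer page */
    };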
2013-03-02 | Merge tag 'for_linux-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb | Linus Torvalds | 6 files, -127/+62
Pull KGDB/KDB fixes and cleanups from Jason Wessel:
 "For a change we removed more code than we added. If people aren't using it we shouldn't be carrying it. :-)

  Cleanups:
   - Remove kdb ssb command - there is no in kernel disassembler to support it
   - Remove kdb ll command - Always caused a kernel oops and there were no bug reports so no one was using this command
   - Use kernel ARRAY_SIZE macro instead of array computations

  Fixes:
   - Stop oops in kdb if user executes kdb_defcmd with args
   - kdb help command truncated text
   - ppc64 support for kgdbts
   - Add missing kconfig option from original kdb port for dealing with catastrophic kernel crashes such that you can reboot automatically on continue from kdb"

* tag 'for_linux-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
  kdb: Remove unhandled ssb command
  kdb: Prevent kernel oops with kdb_defcmd
  kdb: Remove the ll command
  kdb_main: fix help print
  kdb: Fix overlap in buffers with strcpy
  Fixed dead ifdef block by adding missing Kconfig option.
  kdb: Setup basic kdb state before invoking commands via kgdb
  kdb: use ARRAY_SIZE where possible
  kgdb/kgdbts: support ppc64
  kdb: A fix for kdb command table expansion
2013-03-02 | Merge tag 'arc-v3.9-rc1-late' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc | Linus Torvalds | 1 file, -0/+5
Pull new ARC architecture from Vineet Gupta:
 "Initial ARC Linux port with some fixes on top for 3.9-rc1:

  I would like to introduce the Linux port to ARC Processors (from Synopsys) for 3.9-rc1. The patch-set has been discussed on the public lists since Nov and has received a fair bit of review, specially from Arnd, tglx, Al and other subsystem maintainers for DeviceTree, kgdb...

  The arch bits are in arch/arc, some asm-generic changes (acked by Arnd), a minor change to PARISC (acked by Helge).

  The series is a touch bigger for a new port for 2 main reasons:

   1. It enables a basic kernel in first sub-series and adds ptrace/kgdb/.. later

   2. Some of the fallout of review (DeviceTree support, multi-platform-image support) were added on top of orig series, primarily to record the revision history.

  This updated pull request additionally contains

   - fixes due to our GNU tools catching up with the new syscall/ptrace ABI
   - some (minor) cross-arch Kconfig updates."

* tag 'arc-v3.9-rc1-late' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (82 commits)
  ARC: split elf.h into uapi and export it for userspace
  ARC: Fixup the current ABI version
  ARC: gdbserver using regset interface possibly broken
  ARC: Kconfig cleanup tracking cross-arch Kconfig pruning in merge window
  ARC: make a copy of flat DT
  ARC: [plat-arcfpga] DT arc-uart bindings change: "baud" => "current-speed"
  ARC: Ensure CONFIG_VIRT_TO_BUS is not enabled
  ARC: Fix pt_orig_r8 access
  ARC: [3.9] Fallout of hlist iterator update
  ARC: 64bit RTSC timestamp hardware issue
  ARC: Don't fiddle with non-existent caches
  ARC: Add self to MAINTAINERS
  ARC: Provide a default serial.h for uart drivers needing BASE_BAUD
  ARC: [plat-arcfpga] defconfig for fully loaded ARC Linux
  ARC: [Review] Multi-platform image #8: platform registers SMP callbacks
  ARC: [Review] Multi-platform image #7: SMP common code to use callbacks
  ARC: [Review] Multi-platform image #6: cpu-to-dma-addr optional
  ARC: [Review] Multi-platform image #5: NR_IRQS defined by ARC core
  ARC: [Review] Multi-platform image #4: Isolate platform headers
  ARC: [Review] Multi-platform image #3: switch to board callback
  ...
2013-03-02 | kdb: Remove unhandled ssb command | Vincent Stehlé | 4 files, -39/+2
The 'ssb' command can only be handled when we have a disassembler, to check for branches, so remove the 'ssb' command for now. Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2013-03-02 | kdb: Prevent kernel oops with kdb_defcmd | Jason Wessel | 1 file, -2/+6
The kdb_defcmd can only be used to display the available command aliases while using the kernel debug shell. If you try to define a new macro while the kernel debugger is active it will oops. The debug shell macros must use pre-allocated memory set aside at the time kdb_init() is run, and the kdb_defcmd is restricted to only working at the time that the kdb_init sequence is being run, which only occurs if you actually activate the kernel debugger. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2013-03-02 | kdb: Remove the ll command | Jason Wessel | 1 file, -65/+0
Recently, some code inspection was done after fixing a problem with kmalloc being used while in the kernel debugger context (which is not legal), and it turned up the fact that the kdb ll command will oops the kernel. Given that there have been zero bug reports on the command, combined with the fact that it will oops the kernel, it is clearly not being used. Instead of fixing it, it will be removed.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>