kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2017-03-29	cpufreq: intel_pstate: Add support for Gemini Lake	Box, David E	1	-0/+1
	Use same parameters as INTEL_FAM6_ATOM_GOLDMONT to enable Gemini Lake. Signed-off-by: Box, David E <david.e.box@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29	cpufreq: intel_pstate: Eliminate intel_pstate_get_min_max()	Rafael J. Wysocki	1	-26/+14
	Some computations in intel_pstate_get_min_max() are not necessary and one of its two callers doesn't even use the full result. First off, the fixed-point value of cpu->max_perf represents a non-negative number between 0 and 1 inclusive and cpu->min_perf cannot be greater than cpu->max_perf. It is not necessary to check those conditions every time the numbers in question are used. Moreover, since intel_pstate_max_within_limits() only needs the upper boundary, it doesn't make sense to compute the lower one in there and returning min and max from intel_pstate_get_min_max() via pointers doesn't look particularly nice. For the above reasons, drop intel_pstate_get_min_max(), add a helper to get the base P-state for min/max computations and carry out them directly in the previous callers of intel_pstate_get_min_max(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29	cpufreq: intel_pstate: Do not walk policy->cpus	Rafael J. Wysocki	1	-64/+60
	intel_pstate_hwp_set() is the only function walking policy->cpus in intel_pstate. The rest of the code simply assumes one CPU per policy, including the initialization code. Therefore it doesn't make sense for intel_pstate_hwp_set() to walk policy->cpus as it is guaranteed to have only one bit set for policy->cpu. For this reason, rearrange intel_pstate_hwp_set() to take the CPU number as the argument and drop the loop over policy->cpus from it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29	cpufreq: intel_pstate: Introduce pid_in_use()	Rafael J. Wysocki	1	-5/+11
	Add a new function pid_in_use() to return the information on whether or not the PID-based P-state selection algorithm is in use. That allows a couple of complicated conditions in the code to be reduced to simple checks against the new function's return value. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29	cpufreq: intel_pstate: Drop struct cpu_defaults	Rafael J. Wysocki	1	-87/+67
	The cpu_defaults structure is redundant, because it only contains one member of type struct pstate_funcs which can be used directly instead of struct cpu_defaults. For this reason, drop struct cpu_defaults, use struct pstate_funcs directly instead of it where applicable and rename all of the variables of that type accordingly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29	cpufreq: intel_pstate: Move cpu_defaults definitions	Rafael J. Wysocki	1	-67/+62
	Move the definitions of the cpu_defaults structures after the definitions of utilization update callback routines to avoid extra declarations of the latter. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29	cpufreq: intel_pstate: Add update_util callback to pstate_funcs	Rafael J. Wysocki	1	-38/+43
	Avoid using extra function pointers during P-state selection by dropping the get_target_pstate member from struct pstate_funcs, adding a new update_util callback to it (to be registered with the CPU scheduler as the utilization update callback in the active mode) and reworking the utilization update callback routines to invoke specific P-state selection functions directly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29	cpufreq: intel_pstate: Use different utilization update callbacks	Rafael J. Wysocki	1	-25/+54
	Notice that some overhead in the utilization update callbacks registered by intel_pstate in the active mode can be avoided if those callbacks are tailored to specific configurations of the driver. For example, the utilization update callback for the HWP enabled case only needs to update the average CPU performance periodically whereas the utilization update callback for the PID-based algorithm does not need to take IO-wait boosting into account and so on. With that in mind, define three utilization update callbacks for three different use cases: HWP enabled, the CPU load "powersave" P-state selection algorithm and the PID-based "powersave" P-state selection algorithm and modify the driver initialization to choose the callback matching its current configuration. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29	cpufreq: intel_pstate: Modify check in intel_pstate_update_status()	Rafael J. Wysocki	1	-1/+1
	One of the checks in intel_pstate_update_status() implicitly relies on the information that there are only two struct cpufreq_driver objects available, but it is better to do it directly against the value it really is about (to make the code easier to follow if nothing else). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29	cpufreq: intel_pstate: Drop driver_registered variable	Rafael J. Wysocki	1	-27/+19
	The driver_registered variable in intel_pstate is used for checking whether or not the driver has been registered, but intel_pstate_driver can be used for that too (with the rule that the driver is not registered as long as it is NULL). That is a bit more straightforward and the code may be simplified a bit this way, so modify the driver accordingly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29	cpufreq: intel_pstate: Skip unnecessary PID resets on init	Rafael J. Wysocki	1	-2/+2
	PID controller parameters only need to be initialized if the get_target_pstate_use_performance() P-state selection routine is going to be used. It is not necessary to initialize them otherwise, so don't do that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29	cpufreq: intel_pstate: Set HWP sampling interval once	Rafael J. Wysocki	1	-2/+1
	In the HWP enabled case pid_params.sample_rate_ns only needs to be updated once, because it is global, so do that when setting hwp_active instead of doing it during the initialization of every CPU. Moreover, pid_params.sample_rate_ms is never used if HWP is enabled, so do not update it at all then. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29	cpufreq: intel_pstate: Clean up intel_pstate_busy_pid_reset()	Rafael J. Wysocki	1	-30/+16
	intel_pstate_busy_pid_reset() is the only caller of pid_reset(), pid_p_gain_set(), pid_i_gain_set(), and pid_d_gain_set(). Moreover, it passes constants as two parameters of pid_reset() and all of the other routines above essentially contain the same code, so fold all of them into the caller and drop unnecessary computations. Introduce percent_fp() for converting integer values in percent to fixed-point fractions and use it in the above code cleanup. Finally, rename intel_pstate_busy_pid_reset() to intel_pstate_pid_reset() as it also is used for the initialization of PID parameters for every CPU and the meaning of the "busy" part of the name is not particularly clear. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29	cpufreq: intel_pstate: Fold intel_pstate_reset_all_pid() into the caller	Rafael J. Wysocki	1	-11/+6
	There is only one caller of intel_pstate_reset_all_pid(), which is pid_param_set() used in the debugfs interface only, and having that code split does not make it particularly convenient to follow. For this reason, move the body of intel_pstate_reset_all_pid() into its caller and drop that function. Also change the loop from for_each_online_cpu() (which is obviously racy with respect to CPU offline/online) to for_each_possible_cpu(), so that all PID parameters are reset for all CPUs regardless of their online/offline status (to prevent, for example, a previously offline CPU from going online with a stale set of PID parameters). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29	cpufreq: intel_pstate: Initialize pid_params statically	Rafael J. Wysocki	1	-32/+10
	Notice that both the existing struct cpu_defaults instances in which PID parameters are actually initialized use the same values of those parameters, so it is not really necessary to copy them over to pid_params dynamically. Instead, initialize pid_params statically with those values and drop the unused pid_policy member from struct cpu_defaults along with copy_pid_params() used for initializing it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29	cpufreq: intel_pstate: Drop pointless initialization of PID parameters	Rafael J. Wysocki	1	-26/+2
	The P-state selection algorithm used by intel_pstate for Atom processors is not based on the PID controller and the initialization of PID parametrs for those processors is pointless and confusing, so drop it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29	cpufreq: intel_pstate: Eliminate struct perf_limits	Rafael J. Wysocki	1	-36/+23
	After recent changes the purpose of struct perf_limits is not particularly clear any more and the code may be made somewhat easier to follow by eliminating it, so go for that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-24	cpufreq: intel_pstate: Avoid transient updates of cpuinfo.max_freq	Rafael J. Wysocki	1	-19/+28
	Both intel_pstate_verify_policy() and intel_cpufreq_verify_policy() set policy->cpuinfo.max_freq depending on the turbo status, but the updates made by them are discarded by the core, because the policy object passed to them by the core is temporary and cpuinfo.max_freq from that object is not copied to the final policy object in cpufreq_set_policy(). However, cpufreq_set_policy() passes the temporary policy object to the ->setpolicy callback of the driver, so intel_pstate_set_policy() actually sees the policy->cpuinfo.max_freq value updated by intel_pstate_verify_policy() and not the final one. It also updates policy->max sometimes which basically has no effect after it returns, because the core discards that update. To avoid confusion, eliminate policy->cpuinfo.max_freq updates from intel_pstate_verify_policy() and intel_cpufreq_verify_policy() entirely and check the maximum frequency explicitly in intel_pstate_update_perf_limits() instead of relying on the transiently updated policy->cpuinfo.max_freq value. Moreover, move the max->policy adjustment carried out in intel_pstate_set_policy() to a separate function and call that function from the ->verify driver callbacks to ensure that it will actually be effective. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-24	cpufreq: intel_pstate: Active mode P-state limits rework	Rafael J. Wysocki	1	-100/+85
	The coordination of P-state limits used by intel_pstate in the active mode (ie. by default) is problematic, because it synchronizes all of the limits (ie. the global ones and the per-policy ones) so as to use one common pair of P-state limits (min and max) across all CPUs in the system. The drawbacks of that are as follows: - If P-states are coordinated in hardware, it is not necessary to coordinate them in software on top of that, so in that case all of the above activity is in vain. - If P-states are not coordinated in hardware, then the processor is actually capable of setting different P-states for different CPUs and coordinating them at the software level simply doesn't allow that capability to be utilized. - The coordination works in such a way that setting a per-policy limit (eg. scaling_max_freq) for one CPU causes the common effective limit to change (and it will affect all of the other CPUs too), but subsequent reads from the corresponding sysfs attributes for the other CPUs will return stale values (which is confusing). - Reads from the global P-state limit attributes, min_perf_pct and max_perf_pct, return the effective common values and not the last values set through these attributes. However, the last values set through these attributes become hard limits that cannot be exceeded by writes to scaling_min_freq and scaling_max_freq, respectively, and they are not exposed, so essentially users have to remember what they are. All of that is painful enough to warrant a change of the management of P-state limits in the active mode. To that end, redesign the active mode P-state limits management in intel_pstate in accordance with the following rules: (1) All CPUs are affected by the global limits (that is, none of them can be requested to run faster than the global max and none of them can be requested to run slower than the global min). (2) Each individual CPU is affected by its own per-policy limits (that is, it cannot be requested to run faster than its own per-policy max and it cannot be requested to run slower than its own per-policy min). (3) The global and per-policy limits can be set independently. Also, the global maximum and minimum P-state limits will be always expressed as percentages of the maximum supported turbo P-state. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-24	cpufreq: intel_pstate: Use load-based P-state selection more widely	Rafael J. Wysocki	1	-1/+7
	Extend the set of systems for which intel_pstate will use the "powersave" P-state selection algorithm based on CPU load in the active mode by systems with ACPI preferred profile set to "tablet", "appliance PC", "desktop", or "workstation" (ie. everything with a specified preferred profile that is not a "server"). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-24	cpufreq: intel_pstate: Support HWP processors in all operation modes	Rafael J. Wysocki	1	-14/+19
	Currently, some processors supporting HWP are only supported by intel_pstate if HWP is actually going to be used and not supported otherwise which is confusing. Specifically, they are not supported if "intel_pstate=no_hwp" is passed to the kernel in the command line or if the driver is started in the passive mode ("intel_pstate=passive"). There is no real reason for that, because everything about those processor is known anyway and the driver can work with them in all modes, so make that happen, but use the load-based P-state selection algorithm for the active mode "powersave" policy with them. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-24	Merge back intel_pstate updates for 4.12.	Rafael J. Wysocki	1	-14/+4

2017-03-22	cpufreq: intel_pstate: Fix policy data management in passive mode	Rafael J. Wysocki	1	-23/+6
	The policy->cpuinfo.max_freq and policy->max updates in intel_cpufreq_turbo_update() are excessive as they are done for no good reason and may lead to problems in principle, so they should be dropped. However, after dropping them intel_cpufreq_turbo_update() becomes almost entirely pointless, because the check made by it is made again down the road in intel_pstate_prepare_request(). The only thing in it that still needs to be done is the call to update_turbo_state(), so drop intel_cpufreq_turbo_update() altogether and make its callers invoke update_turbo_state() directly instead of it. In addition to that, fix intel_cpufreq_verify_policy() so that it checks global.no_turbo in addition to global.turbo_disabled when updating policy->cpuinfo.max_freq to make it consistent with intel_pstate_verify_policy(). Fixes: 001c76f05b01 (cpufreq: intel_pstate: Generic governors support) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-18	cpufreq: intel_pstate: One set of global limits in active mode	Rafael J. Wysocki	1	-96/+46
	In the active mode intel_pstate currently uses two sets of global limits, each associated with one of the possible scaling_governor settings in that mode: "powersave" or "performance". The driver switches over from one of those sets to the other depending on the scaling_governor setting for the last CPU whose per-policy cpufreq interface in sysfs was last used to change parameters exposed in there. That obviously leads to no end of issues when the scaling_governor settings differ between CPUs. The most recent issue was introduced by commit a240c4aa5d0f (cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy) that eliminated the reinitialization of "performance" limits in intel_pstate_set_policy() preventing the max limit from being set to anything below 100, among other things. Namely, an undesirable side effect of commit a240c4aa5d0f is that now, after setting scaling_governor to "performance" in the active mode, the per-policy limits for the CPU in question go to the highest level and stay there even when it is switched back to "powersave" later. As it turns out, some distributions set scaling_governor to "performance" temporarily for all CPUs to speed-up system initialization, so that change causes them to misbehave later. To fix that, get rid of the performance/powersave global limits split and use just one set of global limits for everything. From the user's persepctive, after this modification, when scaling_governor is switched from "performance" to "powersave" or the other way around on one CPU, the limits settings (ie. the global max/min_perf_pct and per-policy scaling_max/min_freq for any CPUs) will not change. Still, switching from "performance" to "powersave" or the other way around changes the way in which P-states are selected and in particular "performance" causes the driver to always request the highest P-state it is allowed to ask for for the given CPU. Fixes: a240c4aa5d0f (cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-15	cpufreq: intel_pstate: Avoid percentages in limits-related computations	Rafael J. Wysocki	1	-28/+28
	Currently, intel_pstate_update_perf_limits() first converts the policy minimum and maximum limits into percentages of the maximum turbo frequency (rounding up to an integer) and then converts these percentages to fractions (by using fixed-point arithmetic to divide them by 100). That introduces a rounding error unnecessarily, because the fractions can be obtained by carrying out fixed-point divisions directly on the input numbers. Rework the computations in intel_pstate_hwp_set() to use fractions instead of percentages (and drop redundant local variables from there) and modify intel_pstate_update_perf_limits() to compute the fractions directly and percentages out of them. While at it, introduce percent_ext_fp() for converting percentages to fractions (with extended number of fraction bits) and use it in the computations. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-14	cpufreq: intel_pstate: Correct frequency setting in the HWP mode	Srinivas Pandruvada	1	-7/+4
	In the functions intel_pstate_hwp_set(), min/max range from HWP capability MSR along with max_perf_pct and min_perf_pct, is used to set the HWP request MSR. In some cases this doesn't result in the correct HWP max/min in HWP request. For example: In the following case: HWP capabilities from MSR 0x771 0x70a1220 Here cpufreq min/max frequencies from above MSR dump are 700MHz and 3.2GHz respectively. This will result in hwp_min = 0x07 hwp_max = 0x20 To limit max frequency to 2GHz: perf_limits->max_perf_pct = 63 (2GHz as a percent of 3.2GHz rounded up) With the current calculation: adj_range = max_perf_pct * range / 100; adj_range = 63 * (32 - 7) / 100 adj_range = 15 max = hw_min + adj_range; max = 7 + 15 = 22 This will result in HWP request of 0x160f, which will result in a frequency cap of 2.2GHz not 2GHz. The problem with the above calculation is that hwp_min of 7 is treated as 0% in the range. But max_perf_pct is calculated with respect to minimum as 0 and max as 3.2GHz or hwp_max, so adding hwp_min to it will result in more than the desired. Since the min_perf_pct and max_perf_pct is already a percent of max frequency or hwp_max, this min/max HWP request value can be calculated directly applying these percentage to hwp_max. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-14	cpufreq: intel_pstate: Update pid_params.sample_rate_ns in pid_param_set()	Rafael J. Wysocki	1	-0/+1
	Fix the debugfs interface for PID tuning to actually update pid_params.sample_rate_ns on PID parameters updates, as changing pid_params.sample_rate_ms via debugfs has no effect now. Fixes: a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2017-03-13	cpufreq: intel_pstate: Drop redundant wrapper function	Rafael J. Wysocki	1	-14/+4
	intel_pstate_hwp_set_policy() is a wrapper around intel_pstate_hwp_set(), but the only value it adds is to check hwp_active before calling the latter and one of its two callers has already checked hwp_active before that happens, so in that code path the additional check is redundant and using the wrapper is rather pointless. For this reason, drop intel_pstate_hwp_set_policy() and make its callers invoke intel_pstate_hwp_set() directly (after checking hwp_active). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2017-03-13	Linux 4.11-rc2v4.11-rc2	Linus Torvalds	1	-1/+1

2017-03-13	Merge branch 'for-linus' of ↵	Linus Torvalds	11	-29/+51
	git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: - four patches to get the new cputime code in shape for s390 - add the new statx system call - a few bug fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: wire up statx system call KVM: s390: Fix guest migration for huge guests resulting in panic s390/ipl: always use load normal for CCW-type re-IPL s390/timex: micro optimization for tod_to_ns s390/cputime: provide archicture specific cputime_to_nsecs s390/cputime: reset all accounting fields on fork s390/cputime: remove last traces of cputime_t s390: fix in-kernel program checks s390/crypt: fix missing unlock in ctr_paes_crypt on error path
2017-03-13	Merge branch 'x86-urgent-for-linus' of ↵	Linus Torvalds	12	-48/+80
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - a fix for the kexec/purgatory regression which was introduced in the merge window via an innocent sparse fix. We could have reverted that commit, but on deeper inspection it turned out that the whole machinery is neither documented nor robust. So a proper cleanup was done instead - the fix for the TLB flush issue which was discovered recently - a simple typo fix for a reboot quirk * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tlb: Fix tlb flushing when lguest clears PGE kexec, x86/purgatory: Unbreak it and clean it up x86/reboot/quirks: Fix typo in ASUS EeeBook X205TA reboot quirk
2017-03-13	Merge branch 'irq-urgent-for-linus' of ↵	Linus Torvalds	5	-4/+35
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: - a workaround for a GIC erratum - a missing stub function for CONFIG_IRQDOMAIN=n - fixes for a couple of type inconsistencies * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/crossbar: Fix incorrect type of register size irqchip/gicv3-its: Add workaround for QDF2400 ITS erratum 0065 irqdomain: Add empty irq_domain_check_msi_remap irqchip/crossbar: Fix incorrect type of local variables
2017-03-12	x86/tlb: Fix tlb flushing when lguest clears PGE	Daniel Borkmann	1	-1/+1
	Fengguang reported random corruptions from various locations on x86-32 after commits d2852a224050 ("arch: add ARCH_HAS_SET_MEMORY config") and 9d876e79df6a ("bpf: fix unlocking of jited image when module ronx not set") that uses the former. While x86-32 doesn't have a JIT like x86_64, the bpf_prog_lock_ro() and bpf_prog_unlock_ro() got enabled due to ARCH_HAS_SET_MEMORY, whereas Fengguang's test kernel doesn't have module support built in and therefore never had the DEBUG_SET_MODULE_RONX setting enabled. After investigating the crashes further, it turned out that using set_memory_ro() and set_memory_rw() didn't have the desired effect, for example, setting the pages as read-only on x86-32 would still let probe_kernel_write() succeed without error. This behavior would manifest itself in situations where the vmalloc'ed buffer was accessed prior to set_memory_*() such as in case of bpf_prog_alloc(). In cases where it wasn't, the page attribute changes seemed to have taken effect, leading to the conclusion that a TLB invalidate didn't happen. Moreover, it turned out that this issue reproduced with qemu in "-cpu kvm64" mode, but not for "-cpu host". When the issue occurs, change_page_attr_set_clr() did trigger a TLB flush as expected via __flush_tlb_all() through cpa_flush_range(), though. There are 3 variants for issuing a TLB flush: invpcid_flush_all() (depends on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE), cr4 based flush (depends on X86_FEATURE_PGE), and cr3 based flush. For "-cpu host" case in my setup, the flush used invpcid_flush_all() variant, whereas for "-cpu kvm64", the flush was cr4 based. Switching the kvm64 case to cr3 manually worked fine, and further investigating the cr4 one turned out that X86_CR4_PGE bit was not set in cr4 register, meaning the __native_flush_tlb_global_irq_disabled() wrote cr4 twice with the same value instead of clearing X86_CR4_PGE in the first write to trigger the flush. It turned out that X86_CR4_PGE was cleared from cr4 during init from lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE bit is also cleared from there due to concerns of using PGE in guest kernel that can lead to hard to trace bugs (see bff672e630a0 ("lguest: documentation V: Host") in init()). The CPU feature bits are cleared in dynamic boot_cpu_data, but they never propagated to __flush_tlb_all() as it uses static_cpu_has() instead of boot_cpu_has() for testing which variant of TLB flushing to use, meaning they still used the old setting of the host kernel. Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would propagate to static_cpu_has() checks is too late at this point as sections have been patched already, so for now, it seems reasonable to switch back to boot_cpu_has(X86_FEATURE_PGE) as it was prior to commit c109bf95992b ("x86/cpufeature: Remove cpu_has_pge"). This lets the TLB flush trigger via cr3 as originally intended, properly makes the new page attributes visible and thus fixes the crashes seen by Fengguang. Fixes: c109bf95992b ("x86/cpufeature: Remove cpu_has_pge") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: bp@suse.de Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: lkp@01.org Cc: Laura Abbott <labbott@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernrl.org/r/20170301125426.l4nf65rx4wahohyl@wfg-t540p.sh.intel.com Link: http://lkml.kernel.org/r/25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel@iogearbox.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-12	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	13	-101/+191
	Pull KVM fixes from Radim Krčmář: "ARM updates from Marc Zyngier: - vgic updates: - Honour disabling the ITS - Don't deadlock when deactivating own interrupts via MMIO - Correctly expose the lact of IRQ/FIQ bypass on GICv3 - I/O virtualization: - Make KVM_CAP_NR_MEMSLOTS big enough for large guests with many PCIe devices - General bug fixes: - Gracefully handle exception generated with syndroms that the host doesn't understand - Properly invalidate TLBs on VHE systems x86: - improvements in emulation of VMCLEAR, VMX MSR bitmaps, and VCPU reset * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nVMX: do not warn when MSR bitmap address is not backed KVM: arm64: Increase number of user memslots to 512 KVM: arm/arm64: Remove KVM_PRIVATE_MEM_SLOTS definition that are unused KVM: arm/arm64: Enable KVM_CAP_NR_MEMSLOTS on arm/arm64 KVM: Add documentation for KVM_CAP_NR_MEMSLOTS KVM: arm/arm64: VGIC: Fix command handling while ITS being disabled arm64: KVM: Survive unknown traps from guests arm: KVM: Survive unknown traps from guests KVM: arm/arm64: Let vcpu thread modify its own active state KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset kvm: nVMX: VMCLEAR should not cause the vCPU to shut down KVM: arm/arm64: vgic-v3: Don't pretend to support IRQ/FIQ bypass arm64: KVM: VHE: Clear HCR_TGE when invalidating guest TLBs
2017-03-12	Merge tag 'extable-fix' of ↵	Linus Torvalds	2	-0/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull extable.h fix from Paul Gortmaker: "Fixup for arch/score after extable.h introduction. It seems that Guenter is the only one on the planet doing builds for arch/score -- we don't have compile coverage for it in linux-next or in the kbuild-bot either. Guenter couldn't even recall where he got his toolchain, but was kind enough to share it with me so I could validate this change and also add arch/score to my build coverage. I sat on this a bit in case there was any other fallout in other arch dirs, but since this still seems to be the only one, I might as well send it on its way" * tag 'extable-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: score: Fix implicit includes now failing build after extable change
2017-03-11	Merge tag 'random_for_linus' of ↵	Linus Torvalds	3	-83/+65
	git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random Pull random updates from Ted Ts'o: "Change get_random_{int,log} to use the CRNG used by /dev/urandom and getrandom(2). It's faster and arguably more secure than cut-down MD5 that we had been using. Also do some code cleanup" * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: random: move random_min_urandom_seed into CONFIG_SYSCTL ifdef block random: convert get_random_int/long into get_random_u32/u64 random: use chacha20 for get_random_int/long random: fix comment for unused random_min_urandom_seed random: remove variable limit random: remove stale urandom_init_wait random: remove stale maybe_reseed_primary_crng
2017-03-11	score: Fix implicit includes now failing build after extable change	Guenter Roeck	2	-0/+3
	After changing from module.h to extable.h, score builds fail with: arch/score/kernel/traps.c: In function 'do_ri': arch/score/kernel/traps.c:248:4: error: implicit declaration of function 'user_disable_single_step' arch/score/mm/extable.c: In function 'fixup_exception': arch/score/mm/extable.c:32:38: error: dereferencing pointer to incomplete type arch/score/mm/extable.c:34:24: error: dereferencing pointer to incomplete type because extable.h doesn't drag in the same amount of headers as the module.h did. Add in the headers which were implicitly expected. Fixes: 90858794c960 ("module.h: remove extable.h include now users have migrated") Signed-off-by: Guenter Roeck <linux@roeck-us.net> [PG: tweak commit log; refresh for sched header refactoring.] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2017-03-11	Merge tag 'tty-4.11-rc2' of ↵	Linus Torvalds	2	-65/+73
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes frpm Greg KH: "Here are two bugfixes for tty stuff for 4.11-rc2. One of them resolves the pretty bad bug in the n_hdlc code that Alexander Popov found and fixed and has been reported everywhere. The other just fixes a samsung serial driver issue when DMA fails on some systems. Both have been in linux-next with no reported issues" * tag 'tty-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: samsung: Continue to work if DMA request fails tty: n_hdlc: get rid of racy n_hdlc.tbuf
2017-03-11	Merge tag 'staging-4.11-rc2' of ↵	Linus Torvalds	2	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fixes from Greg KH: "Here are two small build warning fixes for some staging drivers that Arnd has found on his valiant quest to get the kernel to build properly with no warnings. Both of these have been in linux-next this week and resolve the reported issues" * tag 'staging-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: octeon: remove unused variable staging/vc04_services: add CONFIG_OF dependency
2017-03-11	Merge tag 'usb-4.11-rc2' of ↵	Linus Torvalds	26	-144/+213
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here is a number of different USB fixes for 4.11-rc2. Seems like there were a lot of unresolved issues that people have been finding for this subsystem, and a bunch of good security auditing happening as well from Johan Hovold. There's the usual batch of gadget driver fixes and xhci issues resolved as well. All of these have been in linux-next with no reported issues" * tag 'usb-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (35 commits) usb: host: xhci-plat: Fix timeout on removal of hot pluggable xhci controllers usb: host: xhci-dbg: HCIVERSION should be a binary number usb: xhci: remove dummy extra_priv_size for size of xhci_hcd struct usb: xhci-mtk: check hcc_params after adding primary hcd USB: serial: digi_acceleport: fix OOB-event processing MAINTAINERS: usb251xb: remove reference inexistent file doc: dt-bindings: usb251xb: mark reg as required usb: usb251xb: dt: add unit suffix to oc-delay and power-on-time usb: usb251xb: remove max_{power,current}_{sp,bp} properties usb-storage: Add ignore-residue quirk for Initio INIC-3619 USB: iowarrior: fix NULL-deref in write USB: iowarrior: fix NULL-deref at probe usb: phy: isp1301: Add OF device ID table usb: ohci-at91: Do not drop unhandled USB suspend control requests USB: serial: safe_serial: fix information leak in completion handler USB: serial: io_ti: fix information leak in completion handler USB: serial: omninet: drop open callback USB: serial: omninet: fix reference leaks at open USB: serial: io_ti: fix NULL-deref in interrupt callback usb: dwc3: gadget: make to increment req->remaining in all cases ...
2017-03-11	Merge tag 'pinctrl-v4.11-2' of ↵	Linus Torvalds	2	-6/+21
	git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pinctrl fixes from Linus Walleij: "Two smaller pin control fixes for the v4.11 series: - Add a get_direction() function to the qcom driver - Fix two pin names in the uniphier driver" * tag 'pinctrl-v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: uniphier: change pin names of aio/xirq for LD11 pinctrl: qcom: add get_direction function
2017-03-10	kexec, x86/purgatory: Unbreak it and clean it up	Thomas Gleixner	10	-46/+78
	The purgatory code defines global variables which are referenced via a symbol lookup in the kexec code (core and arch). A recent commit addressing sparse warnings made these static and thereby broke kexec_file. Why did this happen? Simply because the whole machinery is undocumented and lacks any form of forward declarations. The variable names are unspecific and lack a prefix, so adding forward declarations creates shadow variables in the core code. Aside of that the code relies on magic constants and duplicate struct definitions with no way to ensure that these things stay in sync. The section placement of the purgatory variables happened by chance and not by design. Unbreak kexec and cleanup the mess: - Add proper forward declarations and document the usage - Use common struct definition - Use the proper common defines instead of magic constants - Add a purgatory_ prefix to have a proper name space - Use ARRAY_SIZE() instead of a homebrewn reimplementation - Add proper sections to the purgatory variables [ From Mike ] Fixes: 72042a8c7b01 ("x86/purgatory: Make functions and variables static") Reported-by: Mike Galbraith <<efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Nicholas Mc Guire <der.herr@hofr.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Tobin C. Harding" <me@tobin.cc> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1703101315140.3681@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-10	Merge tag 'ceph-for-4.11-rc2' of git://github.com/ceph/ceph-client	Linus Torvalds	6	-8/+66
	Pull ceph fixes from Ilya Dryomov: - a fix for the recently discovered misdirected requests bug present in jewel and later on the server side and all stable kernels - a fixup for -rc1 CRUSH changes - two usability enhancements: osd_request_timeout option and supported_features bus attribute. * tag 'ceph-for-4.11-rc2' of git://github.com/ceph/ceph-client: libceph: osd_request_timeout option rbd: supported_features bus attribute libceph: don't set weight to IN when OSD is destroyed libceph: fix crush_decode() for older maps
2017-03-10	Merge branch 'i2c/for-current' of ↵	Linus Torvalds	8	-22/+56
	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Here are some driver bugfixes from I2C. Unusual this time are the two reverts. One because I accidently picked a patch from the list which I should have pulled from my co-maintainer instead ("missing of_node_put"). And one which I wrongly assumed to be an easy fix but it turned out already that it needs more iterations ("copy device properties")" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: Revert "i2c: copy device properties when using i2c_register_board_info()" Revert "i2c: add missing of_node_put in i2c_mux_del_adapters" i2c: exynos5: Avoid transaction timeouts due TRANSFER_DONE_AUTO not set i2c: designware: add reset interface i2c: meson: fix wrong variable usage in meson_i2c_put_data i2c: copy device properties when using i2c_register_board_info() i2c: m65xx: drop superfluous quirk structure i2c: brcmstb: Fix START and STOP conditions i2c: add missing of_node_put in i2c_mux_del_adapters i2c: riic: fix restart condition i2c: add missing of_node_put in i2c_mux_del_adapters
2017-03-10	Merge tag 'drm-fixes-for-4.11-rc2' of ↵	Linus Torvalds	22	-264/+768
	git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Intel, amd and mxsfb fixes. These are the drm fixes I've collected for rc2. Mostly i915 GVT only fixes, along with a single EDID fix, some mxsfb fixes and a few minor amd fixes" * tag 'drm-fixes-for-4.11-rc2' of git://people.freedesktop.org/~airlied/linux: (38 commits) drm: mxsfb: Implement drm_panel handling drm: mxsfb_crtc: Fix the framebuffer misplacement drm: mxsfb: Fix crash when provided invalid DT bindings drm: mxsfb: fix pixel clock polarity drm: mxsfb: use bus_format to determine LCD bus width drm/amdgpu: bump driver version for some new features drm/amdgpu: validate paramaters in the gem ioctl drm/amd/amdgpu: fix console deadlock if late init failed drm/i915/gvt: change some gvt_err to gvt_dbg_cmd drm/i915/gvt: protect RO and Rsvd bits of virtual vgpu configuration space drm/i915/gvt: handle workload lifecycle properly drm/edid: Add EDID_QUIRK_FORCE_8BPC quirk for Rotel RSX-1058 drm/i915/gvt: fix an error for F_RO flag drm/i915/gvt: use pfn_valid for better checking drm/i915/gvt: set SFUSE_STRAP properly for vitual monitor detection drm/i915/gvt: fix an error for one register drm/i915/gvt: add more registers into handlers list drm/i915/gvt: have more registers with F_CMD_ACCESS flags set drm/i915/gvt: add some new MMIOs to cmd_access white list drm/i915/gvt: fix pcode mailbox write emulation of BDW ...
2017-03-10	Merge branch 'prep-for-5level'	Linus Torvalds	64	-147/+868
	Merge 5-level page table prep from Kirill Shutemov: "Here's relatively low-risk part of 5-level paging patchset. Merging it now will make x86 5-level paging enabling in v4.12 easier. The first patch is actually x86-specific: detect 5-level paging support. It boils down to single define. The rest of patchset converts Linux MMU abstraction from 4- to 5-level paging. Enabling of new abstraction in most cases requires adding single line of code in arch-specific code. The rest is taken care by asm-generic/. Changes to mm/ code are mostly mechanical: add support for new page table level -- p4d_t -- where we deal with pud_t now. v2: - fix build on microblaze (Michal); - comment for __ARCH_HAS_5LEVEL_HACK in kasan_populate_zero_shadow(); - acks from Michal" * emailed patches from Kirill A Shutemov <kirill.shutemov@linux.intel.com>: mm: introduce __p4d_alloc() mm: convert generic code to 5-level paging asm-generic: introduce <asm-generic/pgtable-nop4d.h> arch, mm: convert all architectures to use 5level-fixup.h asm-generic: introduce __ARCH_USE_5LEVEL_HACK asm-generic: introduce 5level-fixup.h x86/cpufeature: Add 5-level paging detection
2017-03-10	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	54	-185/+277
	Merge fixes from Andrew Morton: "26 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (26 commits) userfaultfd: remove wrong comment from userfaultfd_ctx_get() fat: fix using uninitialized fields of fat_inode/fsinfo_inode sh: cayman: IDE support fix kasan: fix races in quarantine_remove_cache() kasan: resched in quarantine_remove_cache() mm: do not call mem_cgroup_free() from within mem_cgroup_alloc() thp: fix another corner case of munlock() vs. THPs rmap: fix NULL-pointer dereference on THP munlocking mm/memblock.c: fix memblock_next_valid_pfn() userfaultfd: selftest: vm: allow to build in vm/ directory userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED userfaultfd: non-cooperative: fix fork fctx->new memleak mm/cgroup: avoid panic when init with low memory drivers/md/bcache/util.h: remove duplicate inclusion of blkdev.h mm/vmstats: add thp_split_pud event for clarity include/linux/fs.h: fix unsigned enum warning with gcc-4.2 userfaultfd: non-cooperative: release all ctx in dup_userfaultfd_complete userfaultfd: non-cooperative: robustness check userfaultfd: non-cooperative: rollback userfaultfd_exit x86, mm: unify exit paths in gup_pte_range() ...
2017-03-10	x86/reboot/quirks: Fix typo in ASUS EeeBook X205TA reboot quirk	Matjaz Hegedic	1	-1/+1
	The reboot quirk for ASUS EeeBook X205TA contains a typo in DMI_PRODUCT_NAME, improperly referring to X205TAW instead of X205TA, which prevents the quirk from being triggered. The model X205TAW already has a reboot quirk of its own. This fix simply removes the inappropriate final letter W. Fixes: 90b28ded88dd ("x86/reboot/quirks: Add ASUS EeeBook X205TA reboot quirk") Signed-off-by: Matjaz Hegedic <matjaz.hegedic@gmail.com> Link: http://lkml.kernel.org/r/1489064417-7445-1-git-send-email-matjaz.hegedic@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-10	Merge tag 'xfs-4.11-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux	Linus Torvalds	14	-100/+103
	Pull xfs fixes from Darrick Wong: "Here are some bug fixes for -rc2 to clean up the copy on write handling and to remove a cause of hangs. - Fix various iomap bugs - Fix overly aggressive CoW preallocation garbage collection - Fixes to CoW endio error handling - Fix some incorrect geometry calculations - Remove a potential system hang in bulkstat - Try to allocate blocks more aggressively to reduce ENOSPC errors" * tag 'xfs-4.11-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: try any AG when allocating the first btree block when reflinking xfs: use iomap new flag for newly allocated delalloc blocks xfs: remove kmem_zalloc_greedy xfs: Use xfs_icluster_size_fsb() to calculate inode alignment mask xfs: fix and streamline error handling in xfs_end_io xfs: only reclaim unwritten COW extents periodically iomap: invalidate page caches should be after iomap_dio_complete() in direct write
2017-03-10	Merge tag 'gcc-plugins-v4.11-rc2' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull gcc-plugins fix from Kees Cook: "Fixes a typo in sancov plugin, exposed in earlier compiler versions" * tag 'gcc-plugins-v4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: gcc-plugins: fix sancov_plugin for gcc-5