summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-02-21xen: Enable interrupts when calling _cond_resched()Thomas Gleixner1-1/+3
xen_maybe_preempt_hcall() is called from the exception entry point xen_do_hypervisor_callback with interrupts disabled. _cond_resched() evades the might_sleep() check in cond_resched() which would have caught that and schedule_debug() unfortunately lacks a check for irqs_disabled(). Enable interrupts around the call and use cond_resched() to catch future issues. Fixes: fdfd811ddde3 ("x86/xen: allow privcmd hypercalls to be preempted") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/878skypjrh.fsf@nanos.tec.linutronix.de Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2020-02-21tracing: Fix number printing bug in print_synth_event()Tom Zanussi1-3/+29
Fix a varargs-related bug in print_synth_event() which resulted in strange output and oopses on 32-bit x86 systems. The problem is that trace_seq_printf() expects the varargs to match the format string, but print_synth_event() was always passing u64 values regardless. This results in unspecified behavior when unpacking with va_arg() in trace_seq_printf(). Add a function that takes the size into account when calling trace_seq_printf(). Before: modprobe-1731 [003] .... 919.039758: gen_synth_test: next_pid_field=777(null)next_comm_field=hula hoops ts_ns=1000000 ts_ms=1000 cpu=3(null)my_string_field=thneed my_int_field=598(null) After: insmod-1136 [001] .... 36.634590: gen_synth_test: next_pid_field=777 next_comm_field=hula hoops ts_ns=1000000 ts_ms=1000 cpu=1 my_string_field=thneed my_int_field=598 Link: http://lkml.kernel.org/r/a9b59eb515dbbd7d4abe53b347dccf7a8e285657.1581720155.git.zanussi@kernel.org Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-21tracing: Check that number of vals matches number of synth event fieldsTom Zanussi1-2/+12
Commit 7276531d4036('tracing: Consolidate trace() functions') inadvertently dropped the synth_event_trace() and synth_event_trace_array() checks that verify the number of values passed in matches the number of fields in the synthetic event being traced, so add them back. Link: http://lkml.kernel.org/r/32819cac708714693669e0dfe10fe9d935e94a16.1581720155.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-21tracing: Make synth_event trace functions endian-correctTom Zanussi1-4/+58
synth_event_trace(), synth_event_trace_array() and __synth_event_add_val() write directly into the trace buffer and need to take endianness into account, like trace_event_raw_event_synth() does. Link: http://lkml.kernel.org/r/2011354355e405af9c9d28abba430d1f5ff7771a.1581720155.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-21tracing: Make sure synth_event_trace() example always uses u64Tom Zanussi1-17/+17
synth_event_trace() is the varargs version of synth_event_trace_array(), which takes an array of u64, as do synth_event_add_val() et al. To not only be consistent with those, but also to address the fact that synth_event_trace() expects every arg to be of the same type since it doesn't also pass in e.g. a format string, the caller needs to make sure all args are of the same type, u64. u64 is used because it needs to accomodate the largest type available in synthetic events, which is u64. This fixes the bug reported by the kernel test robot/Rong Chen. Link: https://lore.kernel.org/lkml/20200212113444.GS12867@shao2-debian/ Link: http://lkml.kernel.org/r/894c4e955558b521210ee0642ba194a9e603354c.1581720155.git.zanussi@kernel.org Fixes: 9fe41efaca084 ("tracing: Add synth event generation test module") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-20sched: Provide cant_migrate()Thomas Gleixner1-0/+7
Some code pathes rely on preempt_disable() to prevent migration on a non RT enabled kernel. These preempt_disable/enable() pairs are substituted by migrate_disable/enable() pairs or other forms of RT specific protection. On RT these protections prevent migration but not preemption. Obviously a cant_sleep() check in such a section will trigger on RT because preemption is not disabled. Provide a cant_migrate() macro which maps to cant_sleep() on a non RT kernel and an empty placeholder for RT for now. The placeholder will be changed to a proper debug check along with the RT specific migration protection mechanism. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200214161503.070487511@linutronix.de
2020-02-20sched/rt: Provide migrate_disable/enable() inlinesThomas Gleixner1-0/+30
Code which solely needs to prevent migration of a task uses preempt_disable()/enable() pairs. This is the only reliable way to do so as setting the task affinity to a single CPU can be undone by a setaffinity operation from a different task/process. RT provides a seperate migrate_disable/enable() mechanism which does not disable preemption to achieve the semantic requirements of a (almost) fully preemptible kernel. As it is unclear from looking at a given code path whether the intention is to disable preemption or migration, introduce migrate_disable/enable() inline functions which can be used to annotate code which merely needs to disable migration. Map them to preempt_disable/enable() for now. The RT substitution will be provided later. Code which is annotated that way documents that it has no requirement to protect against reentrancy of a preempting task. Either this is not required at all or the call sites are already serialized by other means. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/878slclv1u.fsf@nanos.tec.linutronix.de
2020-02-20libbpf: Relax check whether BTF is mandatoryAndrii Nakryiko1-3/+1
If BPF program is using BTF-defined maps, BTF is required only for libbpf itself to process map definitions. If after that BTF fails to be loaded into kernel (e.g., if it doesn't support BTF at all), this shouldn't prevent valid BPF program from loading. Existing retry-without-BTF logic for creating maps will succeed to create such maps without any problems. So, presence of .maps section shouldn't make BTF required for kernel. Update the check accordingly. Validated by ensuring simple BPF program with BTF-defined maps is still loaded on old kernel without BTF support and map is correctly parsed and created. Fixes: abd29c931459 ("libbpf: allow specifying map definitions using BTF") Reported-by: Julia Kartseva <hex@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200220062635.1497872-1-andriin@fb.com
2020-02-20Merge branch 's390-fixes'David S. Miller2-18/+14
Julian Wiedmann says: ==================== s390/qeth: fixes 2020-02-20 please apply the following patch series for qeth to netdev's net tree. This corrects three minor issues: 1) return a more fitting errno when VNICC cmds are not supported, 2) remove a bogus WARN in the NAPI code, and 3) be _very_ pedantic about the RX copybreak. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20s390/qeth: fix off-by-one in RX copybreak checkJulian Wiedmann1-1/+1
The RX copybreak is intended as the _max_ value where the frame's data should be copied. So for frame_len == copybreak, don't build an SG skb. Fixes: 4a71df50047f ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20s390/qeth: don't warn for napi with 0 budgetJulian Wiedmann1-1/+0
Calling napi->poll() with 0 budget is a legitimate use by netpoll. Fixes: a1c3ed4c9ca0 ("qeth: NAPI support for l2 and l3 discipline") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20s390/qeth: vnicc Fix EOPNOTSUPP precedenceAlexandra Winter1-16/+13
When getting or setting VNICC parameters, the error code EOPNOTSUPP should have precedence over EBUSY. EBUSY is used because vnicc feature and bridgeport feature are mutually exclusive, which is a temporary condition. Whereas EOPNOTSUPP indicates that the HW does not support all or parts of the vnicc feature. This issue causes the vnicc sysfs params to show 'blocked by bridgeport' for HW that does not support VNICC at all. Fixes: caa1f0b10d18 ("s390/qeth: add VNICC enable/disable support") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: use netif_is_bridge_port() to check for IFF_BRIDGE_PORTJulian Wiedmann5-10/+10
Trivial cleanup, so that all bridge port-specific code can be found in one go. CC: Johannes Berg <johannes@sipsolutions.net> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: page_pool: API cleanup and commentsIlias Apalodimas6-80/+74
Functions starting with __ usually indicate those which are exported, but should not be called directly. Update some of those declared in the API and make it more readable. page_pool_unmap_page() and page_pool_release_page() were doing exactly the same thing calling __page_pool_clean_page(). Let's rename __page_pool_clean_page() to page_pool_release_page() and export it in order to show up on perf logs and get rid of page_pool_unmap_page(). Finally rename __page_pool_put_page() to page_pool_put_page() since we can now directly call it from drivers and rename the existing page_pool_put_page() to page_pool_put_full_page() since they do the same thing but the latter is trying to sync the full DMA area. This patch also updates netsec, mvneta and stmmac drivers which use those functions. Suggested-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20Merge branch 'mlxsw-Preparation-for-RTNL-removal'David S. Miller8-123/+199
Ido Schimmel says: ==================== mlxsw: Preparation for RTNL removal The driver currently acquires RTNL in its route insertion path, which contributes to very large control plane latencies. This patch set prepares mlxsw for RTNL removal from its route insertion path in a follow-up patch set. Patches #1-#2 protect shared resources - KVDL and counter pool - with their own locks. All allocations of these resources are currently performed under RTNL, so no locks were required. Patches #3-#7 ensure that updates to mirroring sessions only take place in case there are active mirroring sessions. This allows us to avoid taking RTNL when it is unnecessary, as updating of the mirroring sessions must be performed under RTNL for the time being. Patches #8-#10 replace the use of APIs that assume that RTNL is taken with their RCU counterparts. Specifically, patches #8 and #9 replace __in_dev_get_rtnl() with __in_dev_get_rcu() under RCU read-side critical section. Patch #10 replaces __dev_get_by_index() with dev_get_by_index_rcu(). Patches #11-#15 perform small adjustments in the code to make it easier to later introduce a router lock instead of relying on RTNL. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum_nve: Make tunnel initialization symmetricIdo Schimmel1-2/+3
The device supports a single VTEP whose configuration is shared between all VXLAN tunnels. While the shared configuration is cleared upon the destruction of the last tunnel - in mlxsw_sp_nve_tunnel_fini() - it is set in mlxsw_sp_nve_fid_enable(), after calling mlxsw_sp_nve_tunnel_init(). Make tunnel initialization and destruction symmetric and set the configuration in mlxsw_sp_nve_tunnel_init(). This will later allow us to protect the shared configuration with a lock. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum: Export function to check if RIF existsIdo Schimmel3-7/+16
After the previous patch, all the callers of mlxsw_sp_rif_find_by_dev() outside of the routing code use it to understand if a RIF exists for the passed netdev. Therefore, export a function to check if a RIF exists and make mlxsw_sp_rif_find_by_dev() internal to the routing code. This will later allow us to more easily introduce the router lock which will also protect the RIFs. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum: Prevent RIF access outside of routing codeIdo Schimmel3-12/+24
There are currently 5 users of mlxsw_sp_rif_find_by_dev() outside of the routing code. Only one call site actually needs to dereference the router interface (RIF). The rest merely need to know if a RIF exists for the provided netdev. Convert this call site to query the needed information directly from the routing code instead of dereferencing the RIF. This will later allow us to replace mlxsw_sp_rif_find_by_dev() with a function that checks if a RIF exist. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum_router: Prepare function for router lock introductionIdo Schimmel1-3/+9
The function de-associates the port-vlan from its router interface (RIF). It is called both from the netdev notifier block and the inetaddr notifier block that will soon hold the router lock. Make sure that router code calls the internal version, as it will already have the router lock held when the function is called. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum_router: Prepare function for router lock introductionIdo Schimmel1-3/+9
The function removes the FDB entry that directs the macvlan's MAC to the router port. It is called from both the netdev notifier block and the inetaddr notifier block that will soon hold the router lock. Make sure that only the netdev notifier calls the exported version, so that is will take the router lock, which will already be held by the inetaddr notifier. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum_router: Do not assume RTNL is taken when resolving underlay ↵Ido Schimmel1-10/+29
device The function that resolves the underlay device of the IPIP tunnel assumes that RTNL is taken, but this will not be correct when RTNL is removed from the route insertion path. Convert the function to use dev_get_by_index_rcu() instead of __dev_get_by_index() and make sure it is always called from an RCU read-side critical section. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum_router: Do not assume RTNL is taken during RIF teardownIdo Schimmel1-1/+3
IPv6 addresses are deleted in an atomic context, so the driver defers the potential teardown of the associated router interface (RIF) to a work item that takes RTNL. The RIF is only destroyed if the associated netdev does not have any IP addresses (both IPv4 and IPv6). The IPv4 device ('struct in_device') is currently fetched via __in_dev_get_rtnl() which assumes RTNL is taken. Since RTNL is going to be removed, convert it to use __in_dev_get_rcu() from an RCU read-side critical section. Note that the IPv6 device ('struct inet6_dev') is fetched via __in6_dev_get(), which does not require RTNL. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum_router: Do not assume RTNL is taken during nexthop initIdo Schimmel1-2/+6
RTNL is going to be removed from route insertion path, so use __in_dev_get_rcu() from an RCU read-side critical section instead of __in_dev_get_rtnl() which assumes RTNL is taken. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum_span: Only update mirroring agents if presentIdo Schimmel1-10/+10
In order not to needlessly schedule the work item that updates the mirroring agents, only schedule it if there are any mirroring agents present. This is done by adding an atomic counter that counts the active mirroring agents. It is incremented / decremented whenever a mirroring agent is created / destroyed. It is read before scheduling the work item and in the devlink-resource occupancy callback. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum: Convert callers to use new mirroring APIIdo Schimmel2-56/+7
Previous patch added a work item in the mirroring code that will take care of updating the active mirroring agents in response to different events. Change the mirroring agents update function - mlxsw_sp_span_respin() - to invoke this work item when called. Therefore there is no need for callers to schedule a work item themselves. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum_span: Prepare work item to update mirroring agentsIdo Schimmel1-0/+38
The driver updates its mirroring agents whenever it receives a notification about an event that can affect these. For example, the addition of a route might require the driver to change the egress port of an ERSPAN session. Currently, RTNL needs to be held when these agents are updates, so the driver either: 1. Calls directly into the mirroring code, in case RTNL is held 2. Schedules a work item that will take RTNL and call into the mirroring code Simplify this by having the mirroring code schedule the work item for the update instead of requiring callers to schedule a work item themselves. The conversion of the callers will be done in the next patch to make review easier. This will later allow us to remove RTNL from different parts of the driver. It will also allow us to only schedule the work item in case there are active mirroring agents, which is information private to the mirroring code. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum_span: Use struct_size() to simplify allocationIdo Schimmel1-18/+5
Allocate the main mirroring struct and the individual structs for the different mirroring agents in a single allocation. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum_span: Do no expose mirroring agents to entire driverIdo Schimmel2-32/+46
The struct holding the different mirroring agents is currently allocated as part of the main driver struct. This is unlike other driver modules. Allocate the memory required to store the different mirroring agents as part of the initialization of the mirroring module. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum: Protect counter pool with a lockIdo Schimmel1-5/+20
The counter pool is a shared resource. It is used by both the ACL code to allocate counters for actions and by the routing code to allocate counters for adjacency entries (for example). Currently, all allocations are protected by RTNL, but this is going to change with the removal of RTNL from the routing code. Therefore, protect counter allocations with a spin lock. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum_kvdl: Protect allocations with a lockIdo Schimmel1-2/+14
The KVDL is used to store objects allocated throughout various places in the driver. For example, both nexthops (adjacency entries) and ACL actions are stored in the KVDL. Currently, all allocations are protected by RTNL, but this is going to change with the removal of RTNL from the routing code. Therefore, protect KVDL allocations with a lock. A mutex is used since the free operation can block in Spectrum-2. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: remove unused macro from fib_trie.cLi RongQing1-5/+0
TNODE_KMALLOC_MAX and VERSION are not used, so remove them Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: neigh: remove unused NEIGH_SYSCTL_MS_JIFFIES_ENTRYLi RongQing1-3/+0
this macro is never used, so remove it Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20openvswitch: Distribute switch variables for initializationKees Cook1-8/+10
Variables declared in a switch statement before any case statements cannot be automatically initialized with compiler instrumentation (as they are not part of any execution flow). With GCC's proposed automatic stack variable initialization feature, this triggers a warning (and they don't get initialized). Clang's automatic stack variable initialization (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also doesn't initialize such variables[1]. Note that these warnings (or silent skipping) happen before the dead-store elimination optimization phase, so even when the automatic initializations are later elided in favor of direct initializations, the warnings remain. To avoid these problems, move such variables into the "case" where they're used or lift them up into the main function body. net/openvswitch/flow_netlink.c: In function ‘validate_set’: net/openvswitch/flow_netlink.c:2711:29: warning: statement will never be executed [-Wswitch-unreachable] 2711 | const struct ovs_key_ipv4 *ipv4_key; | ^~~~~~~~ [1] https://bugs.llvm.org/show_bug.cgi?id=44916 Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: ip6_gre: Distribute switch variables for initializationKees Cook2-7/+14
Variables declared in a switch statement before any case statements cannot be automatically initialized with compiler instrumentation (as they are not part of any execution flow). With GCC's proposed automatic stack variable initialization feature, this triggers a warning (and they don't get initialized). Clang's automatic stack variable initialization (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also doesn't initialize such variables[1]. Note that these warnings (or silent skipping) happen before the dead-store elimination optimization phase, so even when the automatic initializations are later elided in favor of direct initializations, the warnings remain. To avoid these problems, move such variables into the "case" where they're used or lift them up into the main function body. net/ipv6/ip6_gre.c: In function ‘ip6gre_err’: net/ipv6/ip6_gre.c:440:32: warning: statement will never be executed [-Wswitch-unreachable] 440 | struct ipv6_tlv_tnl_enc_lim *tel; | ^~~ net/ipv6/ip6_tunnel.c: In function ‘ip6_tnl_err’: net/ipv6/ip6_tunnel.c:520:32: warning: statement will never be executed [-Wswitch-unreachable] 520 | struct ipv6_tlv_tnl_enc_lim *tel; | ^~~ [1] https://bugs.llvm.org/show_bug.cgi?id=44916 Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: core: Distribute switch variables for initializationKees Cook1-2/+2
Variables declared in a switch statement before any case statements cannot be automatically initialized with compiler instrumentation (as they are not part of any execution flow). With GCC's proposed automatic stack variable initialization feature, this triggers a warning (and they don't get initialized). Clang's automatic stack variable initialization (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also doesn't initialize such variables[1]. Note that these warnings (or silent skipping) happen before the dead-store elimination optimization phase, so even when the automatic initializations are later elided in favor of direct initializations, the warnings remain. To avoid these problems, move such variables into the "case" where they're used or lift them up into the main function body. net/core/skbuff.c: In function ‘skb_checksum_setup_ip’: net/core/skbuff.c:4809:7: warning: statement will never be executed [-Wswitch-unreachable] 4809 | int err; | ^~~ [1] https://bugs.llvm.org/show_bug.cgi?id=44916 Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20kvm/emulate: fix a -Werror=cast-function-typeQian Cai2-23/+26
arch/x86/kvm/emulate.c: In function 'x86_emulate_insn': arch/x86/kvm/emulate.c:5686:22: error: cast between incompatible function types from 'int (*)(struct x86_emulate_ctxt *)' to 'void (*)(struct fastop *)' [-Werror=cast-function-type] rc = fastop(ctxt, (fastop_t)ctxt->execute); Fix it by using an unnamed union of a (*execute) function pointer and a (*fastop) function pointer. Fixes: 3009afc6e39e ("KVM: x86: Use a typedef for fastop functions") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-20KVM: x86: fix incorrect comparison in trace eventPaolo Bonzini1-1/+1
The "u" field in the event has three states, -1/0/1. Using u8 however means that comparison with -1 will always fail, so change to signed char. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-20x86/xen: Distribute switch variables for initializationKees Cook1-3/+4
Variables declared in a switch statement before any case statements cannot be automatically initialized with compiler instrumentation (as they are not part of any execution flow). With GCC's proposed automatic stack variable initialization feature, this triggers a warning (and they don't get initialized). Clang's automatic stack variable initialization (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also doesn't initialize such variables[1]. Note that these warnings (or silent skipping) happen before the dead-store elimination optimization phase, so even when the automatic initializations are later elided in favor of direct initializations, the warnings remain. To avoid these problems, move such variables into the "case" where they're used or lift them up into the main function body. arch/x86/xen/enlighten_pv.c: In function ‘xen_write_msr_safe’: arch/x86/xen/enlighten_pv.c:904:12: warning: statement will never be executed [-Wswitch-unreachable] 904 | unsigned which; | ^~~~~ [1] https://bugs.llvm.org/show_bug.cgi?id=44916 Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200220062318.69299-1-keescook@chromium.org Reviewed-by: Juergen Gross <jgross@suse.com> [boris: made @which an 'unsigned int'] Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2020-02-20selftests/rseq: Fix out-of-tree compilationMichael Ellerman1-1/+1
Currently if you build with O=... the rseq tests don't build: $ make O=$PWD/output -C tools/testing/selftests/ TARGETS=rseq make: Entering directory '/linux/tools/testing/selftests' ... make[1]: Entering directory '/linux/tools/testing/selftests/rseq' gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ -shared -fPIC rseq.c -lpthread -o /linux/output/rseq/librseq.so gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ basic_test.c -lpthread -lrseq -o /linux/output/rseq/basic_test /usr/bin/ld: cannot find -lrseq collect2: error: ld returned 1 exit status This is because the library search path points to the source directory, not the output. We can fix it by changing the library search path to $(OUTPUT). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-02-20selftests: Install settings files to fix TIMEOUT failuresMichael Ellerman5-1/+9
Commit 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per test") added a 45 second timeout for tests, and also added a way for tests to customise the timeout via a settings file. For example the ftrace tests take multiple minutes to run, so they were given longer in commit b43e78f65b1d ("tracing/selftests: Turn off timeout setting"). This works when the tests are run from the source tree. However if the tests are installed with "make -C tools/testing/selftests install", the settings files are not copied into the install directory. When the tests are then run from the install directory the longer timeouts are not applied and the tests timeout incorrectly. So add the settings files to TEST_FILES of the appropriate Makefiles to cause the settings files to be installed using the existing install logic. Fixes: 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per test") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-02-20selftest/lkdtm: Don't pollute 'git status'Christophe Leroy1-0/+4
Commit 46d1a0f03d66 ("selftests/lkdtm: Add tests for LKDTM targets") added generation of lkdtm test scripts. Ignore those generated scripts when performing 'git status' Fixes: 46d1a0f03d66 ("selftests/lkdtm: Add tests for LKDTM targets") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-02-20mm: Avoid creating virtual address aliases in brk()/mmap()/mremap()Catalin Marinas3-7/+9
Currently the arm64 kernel ignores the top address byte passed to brk(), mmap() and mremap(). When the user is not aware of the 56-bit address limit or relies on the kernel to return an error, untagging such pointers has the potential to create address aliases in user-space. Passing a tagged address to munmap(), madvise() is permitted since the tagged pointer is expected to be inside an existing mapping. The current behaviour breaks the existing glibc malloc() implementation which relies on brk() with an address beyond 56-bit to be rejected by the kernel. Remove untagging in the above functions by partially reverting commit ce18d171cb73 ("mm: untag user pointers in mmap/munmap/mremap/brk"). In addition, update the arm64 tagged-address-abi.rst document accordingly. Link: https://bugzilla.redhat.com/1797052 Fixes: ce18d171cb73 ("mm: untag user pointers in mmap/munmap/mremap/brk") Cc: <stable@vger.kernel.org> # 5.4.x- Cc: Florian Weimer <fweimer@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reported-by: Victor Stinner <vstinner@redhat.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-02-20docs: arm64: fix trivial spelling enought to enough in memory.rstScott Branden1-1/+1
Fix trivial spelling error enought to enough in memory.rst. Cc: trivial@kernel.org Signed-off-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-02-20ext4: add cond_resched() to __ext4_find_entry()Shijie Luo1-0/+1
We tested a soft lockup problem in linux 4.19 which could also be found in linux 5.x. When dir inode takes up a large number of blocks, and if the directory is growing when we are searching, it's possible the restart branch could be called many times, and the do while loop could hold cpu a long time. Here is the call trace in linux 4.19. [ 473.756186] Call trace: [ 473.756196] dump_backtrace+0x0/0x198 [ 473.756199] show_stack+0x24/0x30 [ 473.756205] dump_stack+0xa4/0xcc [ 473.756210] watchdog_timer_fn+0x300/0x3e8 [ 473.756215] __hrtimer_run_queues+0x114/0x358 [ 473.756217] hrtimer_interrupt+0x104/0x2d8 [ 473.756222] arch_timer_handler_virt+0x38/0x58 [ 473.756226] handle_percpu_devid_irq+0x90/0x248 [ 473.756231] generic_handle_irq+0x34/0x50 [ 473.756234] __handle_domain_irq+0x68/0xc0 [ 473.756236] gic_handle_irq+0x6c/0x150 [ 473.756238] el1_irq+0xb8/0x140 [ 473.756286] ext4_es_lookup_extent+0xdc/0x258 [ext4] [ 473.756310] ext4_map_blocks+0x64/0x5c0 [ext4] [ 473.756333] ext4_getblk+0x6c/0x1d0 [ext4] [ 473.756356] ext4_bread_batch+0x7c/0x1f8 [ext4] [ 473.756379] ext4_find_entry+0x124/0x3f8 [ext4] [ 473.756402] ext4_lookup+0x8c/0x258 [ext4] [ 473.756407] __lookup_hash+0x8c/0xe8 [ 473.756411] filename_create+0xa0/0x170 [ 473.756413] do_mkdirat+0x6c/0x140 [ 473.756415] __arm64_sys_mkdirat+0x28/0x38 [ 473.756419] el0_svc_common+0x78/0x130 [ 473.756421] el0_svc_handler+0x38/0x78 [ 473.756423] el0_svc+0x8/0xc [ 485.755156] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [tmp:5149] Add cond_resched() to avoid soft lockup and to provide a better system responding. Link: https://lore.kernel.org/r/20200215080206.13293-1-luoshijie1@huawei.com Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org
2020-02-20ext4: fix a data race in EXT4_I(inode)->i_disksizeQian Cai2-2/+2
EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by KCSAN, BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4] write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127: ext4_write_end+0x4e3/0x750 [ext4] ext4_update_i_disksize at fs/ext4/ext4.h:3032 (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046 (inlined by) ext4_write_end at fs/ext4/inode.c:1287 generic_perform_write+0x208/0x2a0 ext4_buffered_write_iter+0x11f/0x210 [ext4] ext4_file_write_iter+0xce/0x9e0 [ext4] new_sync_write+0x29c/0x3b0 __vfs_write+0x92/0xa0 vfs_write+0x103/0x260 ksys_write+0x9d/0x130 __x64_sys_write+0x4c/0x60 do_syscall_64+0x91/0xb47 entry_SYSCALL_64_after_hwframe+0x49/0xbe read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37: ext4_writepages+0x10ac/0x1d00 [ext4] mpage_map_and_submit_extent at fs/ext4/inode.c:2468 (inlined by) ext4_writepages at fs/ext4/inode.c:2772 do_writepages+0x5e/0x130 __writeback_single_inode+0xeb/0xb20 writeback_sb_inodes+0x429/0x900 __writeback_inodes_wb+0xc4/0x150 wb_writeback+0x4bd/0x870 wb_workfn+0x6b4/0x960 process_one_work+0x54c/0xbe0 worker_thread+0x80/0x650 kthread+0x1e0/0x200 ret_from_fork+0x27/0x50 Reported by Kernel Concurrency Sanitizer on: CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G W O L 5.5.0-next-20200204+ #5 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 Workqueue: writeback wb_workfn (flush-7:0) Since only the read is operating as lockless (outside of the "i_data_sem"), load tearing could introduce a logic bug. Fix it by adding READ_ONCE() for the read and WRITE_ONCE() for the write. Signed-off-by: Qian Cai <cai@lca.pw> Link: https://lore.kernel.org/r/1581085751-31793-1-git-send-email-cai@lca.pw Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2020-02-20Merge branch 'linux-5.6' of git://github.com/skeggsb/linux into drm-fixesDave Airlie5-0/+48
Nothing major here, another TU1xx modesetting fix, and hooking up ACR/GR support on TU11x now that NVIDIA have made the firmware available. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Ben Skeggs <skeggsb@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/ <CACAvsv64yBq4KHJ8D-5HQ5eeotApJSMiD+V2ut4f3BonUggf0Q@mail.gmail.com
2020-02-20hwmon: (acpi_power_meter) Fix lockdep splatGuenter Roeck1-8/+8
Damien Le Moal reports a lockdep splat with the acpi_power_meter, observed with Linux v5.5 and later. ====================================================== WARNING: possible circular locking dependency detected 5.6.0-rc2+ #629 Not tainted ------------------------------------------------------ python/1397 is trying to acquire lock: ffff888619080070 (&resource->lock){+.+.}, at: show_power+0x3c/0xa0 [acpi_power_meter] but task is already holding lock: ffff88881643f188 (kn->count#119){++++}, at: kernfs_seq_start+0x6a/0x160 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (kn->count#119){++++}: __kernfs_remove+0x626/0x7e0 kernfs_remove_by_name_ns+0x41/0x80 remove_attrs+0xcb/0x3c0 [acpi_power_meter] acpi_power_meter_notify+0x1f7/0x310 [acpi_power_meter] acpi_ev_notify_dispatch+0x198/0x1f3 acpi_os_execute_deferred+0x4d/0x70 process_one_work+0x7c8/0x1340 worker_thread+0x94/0xc70 kthread+0x2ed/0x3f0 ret_from_fork+0x24/0x30 -> #0 (&resource->lock){+.+.}: __lock_acquire+0x20be/0x49b0 lock_acquire+0x127/0x340 __mutex_lock+0x15b/0x1350 show_power+0x3c/0xa0 [acpi_power_meter] dev_attr_show+0x3f/0x80 sysfs_kf_seq_show+0x216/0x410 seq_read+0x407/0xf90 vfs_read+0x152/0x2c0 ksys_read+0xf3/0x1d0 do_syscall_64+0x95/0x1010 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(kn->count#119); lock(&resource->lock); lock(kn->count#119); lock(&resource->lock); *** DEADLOCK *** 4 locks held by python/1397: #0: ffff8890242d64e0 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0x9b/0xb0 #1: ffff889040be74e0 (&p->lock){+.+.}, at: seq_read+0x6b/0xf90 #2: ffff8890448eb880 (&of->mutex){+.+.}, at: kernfs_seq_start+0x47/0x160 #3: ffff88881643f188 (kn->count#119){++++}, at: kernfs_seq_start+0x6a/0x160 stack backtrace: CPU: 10 PID: 1397 Comm: python Not tainted 5.6.0-rc2+ #629 Hardware name: Supermicro Super Server/X11DPL-i, BIOS 3.1 05/21/2019 Call Trace: dump_stack+0x97/0xe0 check_noncircular+0x32e/0x3e0 ? print_circular_bug.isra.0+0x1e0/0x1e0 ? unwind_next_frame+0xb9a/0x1890 ? entry_SYSCALL_64_after_hwframe+0x49/0xbe ? graph_lock+0x79/0x170 ? __lockdep_reset_lock+0x3c0/0x3c0 ? mark_lock+0xbc/0x1150 __lock_acquire+0x20be/0x49b0 ? mark_held_locks+0xe0/0xe0 ? stack_trace_save+0x91/0xc0 lock_acquire+0x127/0x340 ? show_power+0x3c/0xa0 [acpi_power_meter] ? device_remove_bin_file+0x10/0x10 ? device_remove_bin_file+0x10/0x10 __mutex_lock+0x15b/0x1350 ? show_power+0x3c/0xa0 [acpi_power_meter] ? show_power+0x3c/0xa0 [acpi_power_meter] ? mutex_lock_io_nested+0x11f0/0x11f0 ? lock_downgrade+0x6a0/0x6a0 ? kernfs_seq_start+0x47/0x160 ? lock_acquire+0x127/0x340 ? kernfs_seq_start+0x6a/0x160 ? device_remove_bin_file+0x10/0x10 ? show_power+0x3c/0xa0 [acpi_power_meter] show_power+0x3c/0xa0 [acpi_power_meter] dev_attr_show+0x3f/0x80 ? memset+0x20/0x40 sysfs_kf_seq_show+0x216/0x410 seq_read+0x407/0xf90 ? security_file_permission+0x16f/0x2c0 vfs_read+0x152/0x2c0 Problem is that reading an attribute takes the kernfs lock in the kernfs code, then resource->lock in the driver. During an ACPI notification, the opposite happens: The resource lock is taken first, followed by the kernfs lock when sysfs attributes are removed and re-created. Presumably this is now seen due to some locking related changes in kernfs after v5.4, but it was likely always a problem. Fix the problem by not blindly acquiring the lock in the notification function. It is only needed to protect the various update functions. However, those update functions are called anyway when sysfs attributes are read. This means that we can just stop calling those functions from the notifier, and the resource lock in the notifier function is no longer needed. That leaves two situations: First, METER_NOTIFY_CONFIG removes and re-allocates capability strings. While it did so under the resource lock, _displaying_ those strings was not protected, creating a race condition. To solve this problem, selectively protect both removal/creation and reporting of capability attributes with the resource lock. Second, removing and re-creating the attribute files is no longer protected by the resource lock. That doesn't matter since access to each individual attribute is protected by the kernfs lock. Userspace may get messed up if attributes disappear and reappear under its nose, but that is not different than today, and there is nothing we can do about it without major driver restructuring. Last but not least, when removing the driver, remove attribute functions first, then release capability strings. This avoids yet another race condition. Reported-by: Damien Le Moal <Damien.LeMoal@wdc.com> Cc: Damien Le Moal <Damien.LeMoal@wdc.com> Cc: stable@vger.kernel.org # v5.5+ Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2020-02-20Merge tag 'linux-kselftest-5.6-rc3' of ↵Linus Torvalds11-29/+50
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: "Fixes to build failures and other test bugs" * tag 'linux-kselftest-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: openat2: fix build error on newer glibc selftests: use LDLIBS for libraries instead of LDFLAGS selftests: fix too long argument selftests: allow detection of build failures Kernel selftests: tpm2: check for tpm support selftests/ftrace: Have pid filter test use instance flag selftests: fix spelling mistaked "chaigned" -> "chained"
2020-02-20dt-bindings: media: csi: Fix clocks descriptionMaxime Ripard1-12/+18
Commit 1de243b07666 ("media: dt-bindings: media: sun4i-csi: Add compatible for CSI1 on A10/A20") introduced support for the CSI1 controller on A10 and A20 that unlike CSI0 doesn't have an ISP and therefore only have two clocks, the bus and module clocks. The clocks and clock-names properties have thus been modified to allow either two or tree clocks. However, the current list has the ISP clock at the second position, which means the bindings expects a list of either bus and isp, or bus, isp and mod. The initial intent of the patch was obviously to have bus and mod in the former case. Let's fix the binding so that it validates properly. Fixes: 1de243b07666 ("media: dt-bindings: media: sun4i-csi: Add compatible for CSI1 on A10/A20") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Signed-off-by: Rob Herring <robh@kernel.org>
2020-02-20dt-bindings: media: csi: Add interconnects propertiesMaxime Ripard1-0/+10
The Allwinner CSI controller is sitting beside the MBUS that is represented as an interconnect. Make sure that the interconnect properties are valid in the binding. Fixes: 7866d6903ce8 ("media: dt-bindings: media: sun4i-csi: Add compatible for CSI0 on R40") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Signed-off-by: Rob Herring <robh@kernel.org>