Age | Commit message (Collapse) | Author | Files | Lines |
|
This commit converts several CONFIG_RCU_NOCB_CPU_ALL #ifdefs to
instead use IS_ENABLED(). This change should help avoid hiding
code from compiler diagnostics.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Our current documentation claims that, when followed by an ACQUIRE,
smp_mb__before_spinlock() orders prior loads against subsequent loads
and stores, which isn't the intent. This commit therefore fixes the
documentation to state that this sequence orders only prior stores
against subsequent loads and stores.
In addition, the original intent of smp_mb__before_spinlock() was to only
order prior loads against subsequent stores, however, people have started
using it as if it ordered prior loads against subsequent loads and stores.
This commit therefore also updates smp_mb__before_spinlock()'s header
comment to reflect this new reality.
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Now that rcu_access_index() and rcu_dereference_index_check() are no
longer used, the commit removes them from the RCU API. This means that
RCU's data dependencies now involve only pointers, give or take the
occasional cast to and then back from an integer type to do pointer
arithmetic. This in turn eliminates the need for a number of operations
on values carrying RCU data dependencies.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: linux-edac@vger.kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
|
|
This commit moves from the old ACCESS_ONCE() API to the new READ_ONCE()
and WRITE_ONCE() APIs.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Updated to include kernel/torture.c as suggested by Jason Low. ]
|
|
sctp_v4_map_v6 was subtly writing and reading from members
of a union in a way the clobbered data it needed to read before
it read it.
Zeroing the v6 flowinfo overwrites the v4 sin_addr with 0, meaning
that every place that calls sctp_v4_map_v6 gets ::ffff:0.0.0.0 as the
result.
Reorder things to guarantee correct behaviour no matter what the
union layout is.
This impacts user space clients that open an IPv6 SCTP socket and
receive IPv4 connections. Prior to 299ee user space would see a
sockaddr with AF_INET and a correct address, after 299ee the sockaddr
is AF_INET6, but the address is wrong.
Fixes: 299ee123e198 (sctp: Fixup v4mapped behaviour to comply with Sock API)
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rename topology_thread_cpumask() to topology_sibling_cpumask()
for more consistency with scheduler code.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Benoit Cousson <bcousson@baylibre.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/1432645896-12588-2-git-send-email-bgolaszewski@baylibre.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
'int' is really not a proper data type for an MSR. Use u32 to make it
clear that we are dealing with a 32-bit unsigned hardware value.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235149.919350144@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit 43b4578071c0 ("perf/x86: Reduce stack usage of
x86_schedule_events()") violated the rule that 'fake' scheduling; as
used for event/group validation; should not change the event state.
This went mostly un-noticed because repeated calls of
x86_pmu::get_event_constraints() would give the same result. And
x86_pmu::put_event_constraints() would mostly not do anything.
Commit e979121b1b15 ("perf/x86/intel: Implement cross-HT corruption
bug workaround") made the situation much worse by actually setting the
event->hw.constraint value to NULL, so when validation and actual
scheduling interact we get NULL ptr derefs.
Fix it by removing the constraint pointer from the event and move it
back to an array, this time in cpuc instead of on the stack.
validate_group()
x86_schedule_events()
event->hw.constraint = c; # store
<context switch>
perf_task_event_sched_in()
...
x86_schedule_events();
event->hw.constraint = c2; # store
...
put_event_constraints(event); # assume failure to schedule
intel_put_event_constraints()
event->hw.constraint = NULL;
<context switch end>
c = event->hw.constraint; # read -> NULL
if (!test_bit(hwc->idx, c->idxmsk)) # <- *BOOM* NULL deref
This in particular is possible when the event in question is a
cpu-wide event and group-leader, where the validate_group() tries to
add an event to the group.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 43b4578071c0 ("perf/x86: Reduce stack usage of x86_schedule_events()")
Fixes: e979121b1b15 ("perf/x86/intel: Implement cross-HT corruption bug workaround")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit 5590f3196b29 ("drivers/core/of: Add symlink to device-tree from
devices with an OF node") adds the symlink `of_node` for each device
pointing to it's device tree node while creating/initialising it.
However the devicetree sysfs is created and setup in of_init which is
executed at core_initcall level. For all the devices created before
of_init, the following error is thrown:
"Error -2(-ENOENT) creating of_node link"
Like many other components in driver model, initialize the sysfs support
for OF/devicetree from driver_init so that it's ready before any devices
are created.
Fixes: 5590f3196b29 ("drivers/core/of: Add symlink to device-tree from
devices with an OF node")
Suggested-by: Rob Herring <robh+dt@kernel.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Robert Schwebel <r.schwebel@pengutronix.de>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
While adding support for 7425 PHY in the 7xxx PHY driver, the ID that
was used was actually coming from an external PHY: a BCM5461x. Fix this
by using the proper ID for the internal 7425 PHY and set the
PHY_IS_INTERNAL flag, otherwise consumers of this PHY driver would not
be able to properly identify it as such.
Fixes: d068b02cfdfc2 ("net: phy: add BCM7425 and BCM7429 PHYs")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Petri Gynther <pgynther@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"One more fix from the timer departement:
- Handle division of negative nanosecond values proper on 32bit.
A recent cleanup wrecked the sign handling of the dividend and
dropped the check for negative divisors"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ktime: Fix ktime_divns to do signed division
|
|
Pull networking fixes from David Miller:
1) Don't leak ipvs->sysctl_tbl, from Tommi Rentala.
2) Fix neighbour table entry leak in rocker driver, from Ying Xue.
3) Do not emit bonding notifications for unregistered interfaces, from
Nicolas Dichtel.
4) Set ipv6 flow label properly when in TIME_WAIT state, from Florent
Fourcot.
5) Fix regression in ipv6 multicast filter test, from Henning Rogge.
6) do_replace() in various footables netfilter modules is missing a
check for 0 counters in the datastructure provided by the user. Fix
from Dave Jones, and found with trinity.
7) Fix RCU bug in packet scheduler classifier module unloads, from
Daniel Borkmann.
8) Avoid deadlock in tcp_get_info() by using u64_sync. From Eric
Dumzaet.
9) Input packet processing can race with inetdev_destroy() teardown,
fix potential OOPS in ip_error() by explicitly testing whether the
inetdev is still attached. From Eric W Biederman.
10) MLDv2 parser in bridge multicast code breaks too early while
parsing. Fix from Thadeu Lima de Souza Cascardo.
11) Asking for settings on non-zero PHYID doesn't work because we do not
import the command structure from the user and use the PHYID
provided there. Fix from Arun Parameswaran.
12) Fix UDP checksums with IPV6 RAW sockets, from Vlad Yasevich.
13) Missing NF_TABLES depends for TPROXY etc can cause build failures,
fix from Florian Westphal.
14) Fix netfilter conntrack to handle RFC5961 challenge ACKs properly,
from Jesper Dangaard Brouer.
15) If netlink autobind retry fails, we have to reset the sockets portid
back to zero. From Herbert Xu.
16) VXLAN netns exit code unregisters using wrong device, from John W
Linville.
17) Add some USB device IDs to ath3k and btusb bluetooth drivers, from
Dmitry Tunin and Wen-chien Jesse Sung.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
bridge: fix lockdep splat
net: core: 'ethtool' issue with querying phy settings
bridge: fix parsing of MLDv2 reports
ARM: zynq: DT: Use the zynq binding with macb
net: macb: Disable half duplex gigabit on Zynq
net: macb: Document zynq gem dt binding
ipv4: fill in table id when replacing a route
cdc_ncm: Fix tx_bytes statistics
ipv4: Avoid crashing in ip_error
tcp: fix a potential deadlock in tcp_get_info()
net: sched: fix call_rcu() race on classifier module unloads
net: phy: Make sure phy_start() always re-enables the phy interrupts
ipv6: fix ECMP route replacement
ipv6: do not delete previously existing ECMP routes if add fails
Revert "netfilter: bridge: query conntrack about skb dnat"
netfilter: ensure number of counters is >0 in do_replace()
netfilter: nfnetlink_{log,queue}: Register pernet in first place
tcp: don't over-send F-RTO probes
tcp: only undo on partial ACKs in CA_Loss
net/ipv6/udp: Fix ipv6 multicast socket filter regression
...
|
|
Pull block fixes from Jens Axboe:
"Three small fixes that have been picked up the last few weeks.
Specifically:
- Fix a memory corruption issue in NVMe with malignant user
constructed request. From Christoph.
- Kill (now) unused blk_queue_bio(), dm was changed to not need this
anymore. From Mike Snitzer.
- Always use blk_schedule_flush_plug() from the io_schedule() path
when flushing a plug, fixing a !TASK_RUNNING warning with md. From
Shaohua"
* 'for-linus' of git://git.kernel.dk/linux-block:
sched: always use blk_schedule_flush_plug in io_schedule_out
nvme: fix kernel memory corruption with short INQUIRY buffers
block: remove export for blk_queue_bio
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contain Netfilter fixes for your net tree, they are:
1) Fix a race in nfnetlink_log and nfnetlink_queue that can lead to a crash.
This problem is due to wrong order in the per-net registration and netlink
socket events. Patch from Francesco Ruggeri.
2) Make sure that counters that userspace pass us are higher than 0 in all the
x_tables frontends. Discovered via Trinity, patch from Dave Jones.
3) Revert a patch for br_netfilter to rely on the conntrack status bits. This
breaks stateless IPv6 NAT transformations. Patch from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Taking socket spinlock in tcp_get_info() can deadlock, as
inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i],
while packet processing can use the reverse locking order.
We could avoid this locking for TCP_LISTEN states, but lockdep would
certainly get confused as all TCP sockets share same lockdep classes.
[ 523.722504] ======================================================
[ 523.728706] [ INFO: possible circular locking dependency detected ]
[ 523.734990] 4.1.0-dbg-DEV #1676 Not tainted
[ 523.739202] -------------------------------------------------------
[ 523.745474] ss/18032 is trying to acquire lock:
[ 523.750002] (slock-AF_INET){+.-...}, at: [<ffffffff81669d44>] tcp_get_info+0x2c4/0x360
[ 523.758129]
[ 523.758129] but task is already holding lock:
[ 523.763968] (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0
[ 523.774661]
[ 523.774661] which lock already depends on the new lock.
[ 523.774661]
[ 523.782850]
[ 523.782850] the existing dependency chain (in reverse order) is:
[ 523.790326]
-> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}:
[ 523.796599] [<ffffffff811126bb>] lock_acquire+0xbb/0x270
[ 523.802565] [<ffffffff816f5868>] _raw_spin_lock+0x38/0x50
[ 523.808628] [<ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110
[ 523.815273] [<ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350
[ 523.822067] [<ffffffff81684d41>] tcp_check_req+0x3c1/0x500
[ 523.828199] [<ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0
[ 523.834331] [<ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10
[ 523.840202] [<ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0
[ 523.847214] [<ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0
[ 523.853440] [<ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0
[ 523.859624] [<ffffffff81659db7>] ip_rcv+0x307/0x420
Lets use u64_sync infrastructure instead. As a bonus, 64bit
arches get optimized, as these are nop for them.
Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull two xen bugfixes from David Vrabel:
- fix ARM build regression.
- fix VIRQ_CONSOLE related oops.
* tag 'for-linus-4.1b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: don't bind non-percpu VIRQs with percpu chip
xen/arm: Define xen_arch_suspend()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
"Bugfixes for HID subsystem that should go in 4.1. Important
highlights:
- the patch that extended support for HID++ protocol for TK820
touchpad turns out to be causing regressions due to firmware
issues; patch reverting back to basic support from Benjamin
Tissoires
- Wacom driver can oops for devices that report non-touch data on
touch interfaces. Fix from Ping Cheng
- gpiolib is not mandatory for i2c-hid, so the driver shouldn't fail
if gpiolib is not enabled. Fix from Mika Westerberg"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: wacom: fix an Oops caused by wacom_wac_finger_count_touches
HID: usbhid: Add HID_QUIRK_NOGET for Aten DVI KVM switch
HID: hid-sensor-hub: Fix debug lock warning
Revert "HID: logitech-hidpp: support combo keyboard touchpad TK820"
HID: i2c-hid: Do not fail probing if gpiolib is not enabled
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Michael Turquette:
"The first set of clk fixes for 4.1 are all driver bugs, with the
exception of a single locking fix in the core code.
All driver fixes are for code that was merged recently. The Samsung
stuff is mostly fixes around suspend/resume, the Qualcomm fixes are
for invalid hardware configuration data and the Silicon Labs patches
are fixes following their move away from platform_data to Device Tree"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: si5351: Do not pass struct clk in platform_data
clk: si5351: Mention clock-names in the binding documentation
clk: add missing lock when call clk_core_enable in clk_set_parent
clk: exynos5420: Restore GATE_BUS_TOP on suspend
clk: qcom: Fix MSM8916 gfx3d_clk_src configuration
clk: qcom: Fix MSM8916 venus divider value
clk: exynos5433: Fix wrong PMS value of exynos5433_pll_rates
clk: exynos5433: Fix wrong parent clock of sclk_apollo clock
clk: exynos5433: Fix CLK_PCLK_MONOTONIC_CNT clk register assignment
clk: exynos5433: Fix wrong offset of PCLK_MSCL_SECURE_SMMU_JPEG
clk: Use CONFIG_ARCH_EXYNOS instead of CONFIG_ARCH_EXYNOS5433
|
|
This reverts commit c055d5b03bb4cb69d349d787c9787c0383abd8b2.
There are two issues:
'dnat_took_place' made me think that this is related to
-j DNAT/MASQUERADE.
But thats only one part of the story. This is also relevant for SNAT
when we undo snat translation in reverse/reply direction.
Furthermore, I originally wanted to do this mainly to avoid
storing ipv6 addresses once we make DNAT/REDIRECT work
for ipv6 on bridges.
However, I forgot about SNPT/DNPT which is stateless.
So we can't escape storing address for ipv6 anyway. Might as
well do it for ipv4 too.
Reported-and-tested-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
A non-percpu VIRQ (e.g., VIRQ_CONSOLE) may be freed on a different
VCPU than it is bound to. This can result in a race between
handle_percpu_irq() and removing the action in __free_irq() because
handle_percpu_irq() does not take desc->lock. The interrupt handler
sees a NULL action and oopses.
Only use the percpu chip/handler for per-CPU VIRQs (like VIRQ_TIMER).
# cat /proc/interrupts | grep virq
40: 87246 0 xen-percpu-virq timer0
44: 0 0 xen-percpu-virq debug0
47: 0 20995 xen-percpu-virq timer1
51: 0 0 xen-percpu-virq debug1
69: 0 0 xen-dyn-virq xen-pcpu
74: 0 0 xen-dyn-virq mce
75: 29 0 xen-dyn-virq hvc_console
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: <stable@vger.kernel.org>
|
|
tcp_illinois and upcoming tcp_cdg require 64bit alignment of
icsk_ca_priv
x86 does not care, but other architectures might.
Fixes: 05cbc0db03e82 ("ipv4: Create probe timer for tcp PMTU as per RFC4821")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Fan Du <fan.du@intel.com>
Acked-by: Fan Du <fan.du@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently people use TASK_INTERRUPTIBLE to idle kthreads and wait for
'work' because TASK_UNINTERRUPTIBLE contributes to the loadavg. Having
all idle kthreads contribute to the loadavg is somewhat silly.
Now mostly this works OK, because kthreads have all their signals
masked. However there's a few sites where this is causing problems and
TASK_UNINTERRUPTIBLE should be used, except for that loadavg issue.
This patch adds TASK_NOLOAD which, when combined with
TASK_UNINTERRUPTIBLE avoids the loadavg accounting.
As most of imagined usage sites are loops where a thread wants to
idle, waiting for work, a helper TASK_IDLE is introduced.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
As the fault handlers now all rely on the pagefault_disabled() checks
and implicit preempt_disable() calls by pagefault_disable() have been
made explicit, we can completely rely on the pagefault_disableD counter.
So let's no longer touch the preempt count when disabling/enabling
pagefaults. After a call to pagefault_disable(), pagefault_disabled()
will return true, but in_atomic() won't.
Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-16-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
futex_atomic_cmpxchg_inatomic() explicitly
Let's explicitly disable/enable preemption in the !CONFIG_SMP version
of futex_atomic_cmpxchg_inatomic(), to prepare for pagefault_disable() not
touching preemption anymore. This is needed for this function to be
callable from both, atomic and non-atomic context.
Otherwise we might break mutual exclusion when relying on a get_user()/
put_user() implementation.
Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-10-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
explicitly
Let's explicitly disable/enable preemption in the !CONFIG_SMP version
of futex_atomic_op_inuser, to prepare for pagefault_disable() not
touching preemption anymore.
Otherwise we might break mutual exclusion when relying on a get_user()/
put_user() implementation.
Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-9-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
the handler
Introduce faulthandler_disabled() and use it to check for irq context and
disabled pagefaults (via pagefault_disable()) in the pagefault handlers.
Please note that we keep the in_atomic() checks in place - to detect
whether in irq context (in which case preemption is always properly
disabled).
In contrast, preempt_disable() should never be used to disable pagefaults.
With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
counter, and therefore the result of in_atomic() differs.
We validate that condition by using might_fault() checks when calling
might_sleep().
Therefore, add a comment to faulthandler_disabled(), describing why this
is needed.
faulthandler_disabled() and pagefault_disable() are defined in
linux/uaccess.h, so let's properly add that include to all relevant files.
This patch is based on a patch from Thomas Gleixner.
Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The existing code relies on pagefault_disable() implicitly disabling
preemption, so that no schedule will happen between kmap_atomic() and
kunmap_atomic().
Let's make this explicit, to prepare for pagefault_disable() not
touching preemption anymore.
Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-5-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
disabled pagefaults
Commit 662bbcb2747c ("mm, sched: Allow uaccess in atomic with
pagefault_disable()") removed might_sleep() checks for all user access
code (that uses might_fault()).
The reason was to disable wrong "sleep in atomic" warnings in the
following scenario:
pagefault_disable()
rc = copy_to_user(...)
pagefault_enable()
Which is valid, as pagefault_disable() increments the preempt counter
and therefore disables the pagefault handler. copy_to_user() will not
sleep and return an error code if a page is not available.
However, as all might_sleep() checks are removed,
CONFIG_DEBUG_ATOMIC_SLEEP would no longer detect the following scenario:
spin_lock(&lock);
rc = copy_to_user(...)
spin_unlock(&lock)
If the kernel is compiled with preemption turned on, preempt_disable()
will make in_atomic() detect disabled preemption. The fault handler would
correctly never sleep on user access.
However, with preemption turned off, preempt_disable() is usually a NOP
(with !CONFIG_PREEMPT_COUNT), therefore in_atomic() will not be able to
detect disabled preemption nor disabled pagefaults. The fault handler
could sleep.
We really want to enable CONFIG_DEBUG_ATOMIC_SLEEP checks for user access
functions again, otherwise we can end up with horrible deadlocks.
Root of all evil is that pagefault_disable() acts almost as
preempt_disable(), depending on preemption being turned on/off.
As we now have pagefault_disabled(), we can use it to distinguish
whether user acces functions might sleep.
Convert might_fault() into a makro that calls __might_fault(), to
allow proper file + line messages in case of a might_sleep() warning.
Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-3-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Until now, pagefault_disable()/pagefault_enabled() used the preempt
count to track whether in an environment with pagefaults disabled (can
be queried via in_atomic()).
This patch introduces a separate counter in task_struct to count the
level of pagefault_disable() calls. We'll keep manipulating the preempt
count to retain compatibility to existing pagefault handlers.
It is now possible to verify whether in a pagefault_disable() envionment
by calling pagefault_disabled(). In contrast to in_atomic() it will not
be influenced by preempt_enable()/preempt_disable().
This patch is based on a patch from Ingo Molnar.
Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-2-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Now that PREEMPT_ACTIVE implies PREEMPT_DISABLE_OFFSET, ignoring
PREEMPT_ACTIVE from in_atomic() check isn't useful anymore.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431441711-29753-7-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431441711-29753-6-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
__schedule() disables preemption and some of its callers
(the preempt_schedule*() family) also set PREEMPT_ACTIVE.
So we have two preempt_count() modifications that could be performed
at once.
Lets remove the preemption disablement from __schedule() and pull
this responsibility to its callers in order to optimize preempt_count()
operations in a single place.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431441711-29753-5-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
"CHECK" suggests it's only used as a comparison mask. But now it's used
further as a config-conditional preempt disabler offset. Lets
disambiguate this name.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431441711-29753-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Adjust a few comments, and further integrate a few definitions after
the dumb headers copy.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431441711-29753-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
preempt_mask.h defines all the preempt_count semantics and related
symbols: preempt, softirq, hardirq, nmi, preempt active, need resched,
etc...
preempt.h defines the accessors and mutators of preempt_count.
But there is a messy dependency game around those two header files:
* preempt_mask.h includes preempt.h in order to access preempt_count()
* preempt_mask.h defines all preempt_count semantic and symbols
except PREEMPT_NEED_RESCHED that is needed by asm/preempt.h
Thus we need to define it from preempt.h, right before including
asm/preempt.h, instead of defining it to preempt_mask.h with the
other preempt_count symbols. Therefore the preempt_count semantics
happen to be spread out.
* We plan to introduce preempt_active_[enter,exit]() to consolidate
preempt_schedule*() code. But we'll need to access both preempt_count
mutators (preempt_count_add()) and preempt_count symbols
(PREEMPT_ACTIVE, PREEMPT_OFFSET). The usual place to define preempt
operations is in preempt.h but then we'll need symbols in
preempt_mask.h which already includes preempt.h. So we end up with
a ressource circle dependency.
Lets merge preempt_mask.h into preempt.h to solve these dependency issues.
This way we gather semantic symbols and operation definition of
preempt_count in a single file.
This is a dumb copy-paste merge. Further merge re-arrangments are
performed in a subsequent patch to ease review.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431441711-29753-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Since set_mb() is really about an smp_mb() -- not a IO/DMA barrier
like mb() rename it to match the recent smp_load_acquire() and
smp_store_release().
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Since we assume set_mb() to result in a single store followed by a
full memory barrier, employ WRITE_ONCE().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here's some TTY and serial driver fixes for reported issues.
All of these have been in linux-next successfully"
* tag 'tty-4.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
pty: Fix input race when closing
tty/n_gsm.c: fix a memory leak when gsmtty is removed
Revert "serial/amba-pl011: Leave the TX IRQ alone when the UART is not open"
serial: omap: Fix error handling in probe
earlycon: Revert log warnings
|
|
We currently have no limit on the number of elements in a hash table.
This is a problem because some users (tipc) set a ceiling on the
maximum table size and when that is reached the hash table may
degenerate. Others may encounter OOM when growing and if we allow
insertions when that happens the hash table perofrmance may also
suffer.
This patch adds a new paramater insecure_max_entries which becomes
the cap on the table. If unset it defaults to max_size * 2. If
it is also zero it means that there is no cap on the number of
elements in the table. However, the table will grow whenever the
utilisation hits 100% and if that growth fails, you will get ENOMEM
on insertion.
As allowing oversubscription is potentially dangerous, the name
contains the word insecure.
Note that the cap is not a hard limit. This is done for performance
reasons as enforcing a hard limit will result in use of atomic ops
that are heavier than the ones we currently use.
The reasoning is that we're only guarding against a gross over-
subscription of the table, rather than a small breach of the limit.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter fixes for your net tree, they are:
1) Fix a leak in IPVS, the sysctl table is not released accordingly when
destroying a netns, patch from Tommi Rantala.
2) Fix a build error when TPROXY and socket are built-in but IPv6 defrag is
compiled as module, from Florian Westphal.
3) Fix TCP tracket wrt. RFC5961 challenge ACK when in LAST_ACK state, patch
from Jesper Dangaard Brouer.
4) Fix a bogus WARN_ON() in nf_tables when deleting a set element that stores
a map, from Mirek Kratochvil.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Two fixes: a suspend/resume related regression fix, and an RT priority
boosting fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix regression in cpuset_cpu_inactive() for suspend
sched: Handle priority boosted tasks proper in setscheduler()
|
|
In compliance with RFC5961, the network stack send challenge ACK in
response to spurious SYN packets, since commit 0c228e833c88 ("tcp:
Restore RFC5961-compliant behavior for SYN packets").
This pose a problem for netfilter conntrack in state LAST_ACK, because
this challenge ACK is (falsely) seen as ACKing last FIN, causing a
false state transition (into TIME_WAIT).
The challenge ACK is hard to distinguish from real last ACK. Thus,
solution introduce a flag that tracks the potential for seeing a
challenge ACK, in case a SYN packet is let through and current state
is LAST_ACK.
When conntrack transition LAST_ACK to TIME_WAIT happens, this flag is
used for determining if we are expecting a challenge ACK.
Scapy based reproducer script avail here:
https://github.com/netoptimizer/network-testing/blob/master/scapy/tcp_hacks_3WHS_LAST_ACK.py
Fixes: 0c228e833c88 ("tcp: Restore RFC5961-compliant behavior for SYN packets")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Pull drm fixes from Dave Airlie:
"Radeon:
one oops fix, one bug fix, one pci id addition patch
i915:
one suspend/resume regression fix.
All seems quiet enough."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: don't do mst probing if MST isn't enabled.
drm/radeon: add new bonaire pci id
drm/radeon: fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling
drm/i915: Avoid GPU hang when coming out of s3 or s4
|
|
into drm-fixes
radeon minor fixes, and pci id addition.
* 'drm-fixes-4.1' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: don't do mst probing if MST isn't enabled.
drm/radeon: add new bonaire pci id
drm/radeon: fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
pointless forward declarations, stale comments
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
touch_atime is not RCU-safe, and so cannot be called on an RCU walk.
However, in situations where RCU-walk makes a difference, the symlink
will likely to accessed much more often than it is useful to update
the atime.
So split out the test of "Does the atime actually need to be updated"
into atime_needs_update(), and have get_link() unlazy if it finds that
it will need to do that update.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
RTNH_F_EXTERNAL today is printed as "offload" in iproute2 output.
This patch renames the flag to be consistent with what the user sees.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|