Age | Commit message (Collapse) | Author | Files | Lines |
|
As it is, default ->i_fop has NULL ->open() (along with all other methods).
The only case where it matters is reopening (via procfs symlink) a file that
didn't get its ->f_op from ->i_fop - anything else will have ->i_fop assigned
to something sane (default would fail on read/write/ioctl/etc.).
Unfortunately, such case exists - alloc_file() users, especially
anon_get_file() ones. There we have tons of opened files of very different
kinds sharing the same inode. As the result, attempt to reopen those via
procfs succeeds and you get a descriptor you can't do anything with.
Moreover, in case of sockets we set ->i_fop that will only be used
on such reopen attempts - and put a failing ->open() into it to make sure
those do not succeed.
It would be simpler to put such ->open() into default ->i_fop and leave
it unchanged both for anon inode (as we do anyway) and for socket ones. Result:
* everything going through do_dentry_open() works as it used to
* sock_no_open() kludge is gone
* attempts to reopen anon-inode files fail as they really ought to
* ditto for aio_private_file()
* ditto for perfmon - this one actually tried to imitate sock_no_open()
trick, but failed to set ->i_fop, so in the current tree reopens succeed and
yield completely useless descriptor. Intent clearly had been to fail with
-ENXIO on such reopens; now it actually does.
* everything else that used alloc_file() keeps working - it has ->i_fop
set for its inodes anyway
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now.
It's not mountable (not even registered, so it's not in /proc/filesystems,
etc.). Files on it *are* bindable - we explicitly permit that in do_loopback().
This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
get_proc_ns() is a macro now (it's simply returning ->i_private; would
have been an inline, if not for header ordering headache).
proc_ns_inode() is an ex-parrot. The interface used in procfs is
ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).
Dentries and inodes are never hashed; a non-counting reference to dentry
is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
of that mechanism.
As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets
from ns_get_path().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
BTW, do we want memcpy_nocache()?
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
initialization of kvec-backed iov_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... without bothering with copy_..._user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
a) make get_proc_ns() return a pointer to struct ns_common
b) mirror ns_ops in dentry->d_fsdata of ns dentries, so that
is_mnt_ns_file() could get away with fewer dereferences.
That way struct proc_ns becomes invisible outside of fs/proc/*.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
take struct ns_common *, for now simply wrappers around proc_{alloc,free}_inum()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We can do that now. And kill ->inum(), while we are at it - all instances
are identical.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
for now - just move corresponding ->proc_inum instances over there
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu fix from Tejun Heo:
"This contains one patch to fix a race condition which can lead to
percpu_ref using a percpu pointer which is corrupted with a set DEAD
bit. The bug was introduced while separating out the ATOMIC mode flag
from the DEAD flag. The fix is pretty straight forward.
I just committed the patch to the percpu tree but am sending out the
pull request early as I'll be on vacation for a week. The patch
should be fairly safe and while the latency will be higher I'll be
checking emails"
* 'for-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu-ref: fix DEAD flag contamination of percpu pointer
|
|
While decoupling ATOMIC and DEAD flags, f47ad4578461 ("percpu_ref:
decouple switching to percpu mode and reinit") updated
__ref_is_percpu() so that it only tests ATOMIC flag to determine
whether the ref is in percpu mode or not; however, while DEAD implies
ATOMIC, the two flags are set separately during percpu_ref_kill() and
if __ref_is_percpu() races percpu_ref_kill(), it may see DEAD w/o
ATOMIC. Because __ref_is_percpu() returns @ref->percpu_count_ptr
value verbatim as the percpu pointer after testing ATOMIC, the pointer
may now be contaminated with the DEAD flag.
This can be fixed by clearing the flag bits before returning the
pointer which was the fix proposed by Shaohua; however, as DEAD
implies ATOMIC, we can just test for both flags at once and avoid the
explicit masking.
Update __ref_is_percpu() so that it tests that both ATOMIC and DEAD
are clear before returning @ref->percpu_count_ptr as the percpu
pointer.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-Reviewed-by: Shaohua Li <shli@kernel.org>
Link: http://lkml.kernel.org/r/995deb699f5b873c45d667df4add3b06f73c2c25.1416638887.git.shli@kernel.org
Fixes: f47ad4578461 ("percpu_ref: decouple switching to percpu mode and reinit")
|
|
Pull networking fixes from David Miller:
1) Fix BUG when decrypting empty packets in mac80211, from Ronald Wahl.
2) nf_nat_range is not fully initialized and this is copied back to
userspace, from Daniel Borkmann.
3) Fix read past end of b uffer in netfilter ipset, also from Dan
Carpenter.
4) Signed integer overflow in ipv4 address mask creation helper
inet_make_mask(), from Vincent BENAYOUN.
5) VXLAN, be2net, mlx4_en, and qlcnic need ->ndo_gso_check() methods to
properly describe the device's capabilities, from Joe Stringer.
6) Fix memory leaks and checksum miscalculations in openvswitch, from
Pravin B SHelar and Jesse Gross.
7) FIB rules passes back ambiguous error code for unreachable routes,
making behavior confusing for userspace. Fix from Panu Matilainen.
8) ieee802154fake_probe() doesn't release resources properly on error,
from Alexey Khoroshilov.
9) Fix skb_over_panic in add_grhead(), from Daniel Borkmann.
10) Fix access of stale slave pointers in bonding code, from Nikolay
Aleksandrov.
11) Fix stack info leak in PPP pptp code, from Mathias Krause.
12) Cure locking bug in IPX stack, from Jiri Bohac.
13) Revert SKB fclone memory freeing optimization that is racey and can
allow accesses to freed up memory, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (71 commits)
tcp: Restore RFC5961-compliant behavior for SYN packets
net: Revert "net: avoid one atomic operation in skb_clone()"
virtio-net: validate features during probe
cxgb4 : Fix DCB priority groups being returned in wrong order
ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg
openvswitch: Don't validate IPv6 label masks.
pptp: fix stack info leak in pptp_getname()
brcmfmac: don't include linux/unaligned/access_ok.h
cxgb4i : Don't block unload/cxgb4 unload when remote closes TCP connection
ipv6: delete protocol and unregister rtnetlink when cleanup
net/mlx4_en: Add VXLAN ndo calls to the PF net device ops too
bonding: fix curr_active_slave/carrier with loadbalance arp monitoring
mac80211: minstrel_ht: fix a crash in rate sorting
vxlan: Inline vxlan_gso_check().
can: m_can: update to support CAN FD features
can: m_can: fix incorrect error messages
can: m_can: add missing delay after setting CCCR_INIT bit
can: m_can: fix not set can_dlc for remote frame
can: m_can: fix possible sleep in napi poll
can: m_can: add missing message RAM initialization
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Misc fixes: two NUMA fixes, two cputime fixes and an RCU/lockdep fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency
sched/cputime: Fix cpu_timer_sample_group() double accounting
sched/numa: Avoid selecting oneself as swap target
sched/numa: Fix out of bounds read in sched_init_numa()
sched: Remove lockdep check in sched_move_task()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fix from Ingo Molnar:
"Fix GENMASK macro shift overflow"
Nobody seems to currently use GENMASK() to fill every single last bit
(which is what overflows) in-tree, and gcc would warn about it, so we
have that going for us. But apparently there are pending changes that
want this.
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
bitops: Fix shift overflow in GENMASK macros
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into for-next
Pull the beginning of seq_file cleanup from Steven:
"I'm looking to clean up the seq_file code and to eventually merge the
trace_seq code with seq_file as well, since they basically do the same thing.
Part of this process is to remove the return code of seq_printf() and friends
as they are rather inconsistent. It is better to use the new function
seq_has_overflowed() if you want to stop processing when the buffer
is full. Note, if the buffer is full, the seq_file code will throw away
the contents, allocate a bigger buffer, and then call your code again
to fill in the data. The only thing that breaking out of the function
early does is to save a little time which is probably never noticed.
I started with patches from Joe Perches and modified them as well.
There's many more places that need to be updated before we can convert
seq_printf() and friends to return void. But this patch set introduces
the seq_has_overflowed() and does some initial updates."
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... for situations when we don't have any candidate in pathnames - basically,
in descriptor-based syscalls.
[Folded the build fix for !CONFIG_AUDITSYSCALL configs from Chen Gang]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The CAN device drivers can use can_is_canfd_skb() to check if the frame to send
is on CAN FD mode or normal CAN mode.
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Dong Aisheng <b29396@freescale.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
While looking over the cpu-timer code I found that we appear to add
the delta for the calling task twice, through:
cpu_timer_sample_group()
thread_group_cputimer()
thread_group_cputime()
times->sum_exec_runtime += task_sched_runtime();
*sample = cputime.sum_exec_runtime + task_delta_exec();
Which would make the sample run ahead, making the sleep short.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20141112113737.GI10476@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
On some 32 bits architectures, including x86, GENMASK(31, 0) returns 0
instead of the expected ~0UL.
This is the same on some 64 bits architectures with GENMASK_ULL(63, 0).
This is due to an overflow in the shift operand, 1 << 32 for GENMASK,
1 << 64 for GENMASK_ULL.
Reported-by: Eric Paire <eric.paire@st.com>
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Maxime Coquelin <maxime.coquelin@st.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v3.13+
Cc: linux@rasmusvillemoes.dk
Cc: gong.chen@linux.intel.com
Cc: John Sullivan <jsrhbz@kanargh.force9.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Fixes: 10ef6b0dffe4 ("bitops: Introduce a more generic BITMASK macro")
Link: http://lkml.kernel.org/r/1415267659-10563-1-git-send-email-maxime.coquelin@st.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pull power supply updates from Sebastian Reichel:
"Power supply and reset changes for the v3.18-rc:
- misc. charger-manager fixes
- year 2038 fix in ab8500_fg
- fix error handling of bq2415x_charger"
* tag 'for-v3.18-rc' of git://git.infradead.org/battery-2.6:
power: charger-manager: Fix accessing invalidated power supply after charger unbind
power: charger-manager: Fix accessing invalidated power supply after fuel gauge unbind
power: charger-manager: Avoid recursive thermal get_temp call
power_supply: Add no_thermal property to prevent recursive get_temp calls
power: bq2415x_charger: Fix memory leak on DTS parsing error
power: bq2415x_charger: Properly handle ENODEV from power_supply_get_by_phandle
power: ab8500_fg.c: use 64-bit time types
|
|
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
- stable patches to fix NFSv4.x delegation reclaim error paths
- fix a bug whereby we were advertising NFSv4.1 but using NFSv4.2
features
- fix a use-after-free problem with pNFS block layouts
- fix a memory leak in the pNFS files O_DIRECT code
- replace an intrusive and Oops-prone performance fix in the NFSv4
atomic open code with a safer one-line version and revert the two
original patches"
* tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor
NFS: Don't try to reclaim delegation open state if recovery failed
NFSv4: Ensure that we call FREE_STATEID when NFSv4.x stateids are revoked
NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return
NFSv4.1: nfs41_clear_delegation_stateid shouldn't trust NFS_DELEGATED_STATE
NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired
NFS: SEEK is an NFS v4.2 feature
nfs: Fix use of uninitialized variable in nfs_getattr()
nfs: Remove bogus assignment
nfs: remove spurious WARN_ON_ONCE in write path
pnfs/blocklayout: serialize GETDEVICEINFO calls
nfs: fix pnfs direct write memory leak
Revert "NFS: nfs4_do_open should add negative results to the dcache."
Revert "NFS: remove BUG possibility in nfs4_open_and_get_state"
NFSv4: Ensure nfs_atomic_open set the dentry verifier on ENOENT
|
|
There could be a signed overflow in the following code.
The expression, (32-logmask) is comprised between 0 and 31 included.
It may be equal to 31.
In such a case the left shift will produce a signed integer overflow.
According to the C99 Standard, this is an undefined behavior.
A simple fix is to replace the signed int 1 with the unsigned int 1U.
Signed-off-by: Vincent BENAYOUN <vincent.benayoun@trust-in-soft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These are three regression fixes, two recent (generic power domains,
suspend-to-idle) and one older (cpufreq), an ACPI blacklist entry for
one more machine having problems with Windows 8 compatibility, a minor
cpufreq driver fix (cpufreq-dt) and a fixup for new callback
definitions (generic power domains).
Specifics:
- Fix a crash in the suspend-to-idle code path introduced by a recent
commit that forgot to check a pointer against NULL before
dereferencing it (Dmitry Eremin-Solenikov).
- Fix a boot crash on Exynos5 introduced by a recent commit making
that platform use generic Device Tree bindings for power domains
which exposed a weakness in the generic power domains framework
leading to that crash (Ulf Hansson).
- Fix a crash during system resume on systems where cpufreq depends
on Operation Performance Points (OPP) for functionality, but
CONFIG_OPP is not set. This leads the cpufreq driver registration
to fail, but the resume code attempts to restore the pre-suspend
cpufreq configuration (which does not exist) nevertheless and
crashes. From Geert Uytterhoeven.
- Add a new ACPI blacklist entry for Dell Vostro 3546 that has
problems if it is reported as Windows 8 compatible to the BIOS
(Adam Lee).
- Fix swapped arguments in an error message in the cpufreq-dt driver
(Abhilash Kesavan).
- Fix up the prototypes of new callbacks in struct generic_pm_domain
to make them more useful. Users of those callbacks will be added
in 3.19 and it's better for them to be based on the correct struct
definition in mainline from the start. From Ulf Hansson and Kevin
Hilman"
* tag 'pm+acpi-3.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / Domains: Fix initial default state of the need_restore flag
PM / sleep: Fix entering suspend-to-IDLE if no freeze_oops is set
PM / Domains: Change prototype for the attach and detach callbacks
cpufreq: Avoid crash in resume on SMP without OPP
cpufreq: cpufreq-dt: Fix arguments in clock failure error message
ACPI / blacklist: blacklist Win8 OSI for Dell Vostro 3546
|
|
* pm-domains:
PM / Domains: Fix initial default state of the need_restore flag
PM / Domains: Change prototype for the attach and detach callbacks
* pm-sleep:
PM / sleep: Fix entering suspend-to-IDLE if no freeze_oops is set
* pm-cpufreq:
cpufreq: Avoid crash in resume on SMP without OPP
cpufreq: cpufreq-dt: Fix arguments in clock failure error message
|
|
Pull networking fixes from David Miller:
1) sunhme driver lacks DMA mapping error checks, based upon a report by
Meelis Roos.
2) Fix memory leak in mvpp2 driver, from Sudip Mukherjee.
3) DMA memory allocation sizes are wrong in systemport ethernet driver,
fix from Florian Fainelli.
4) Fix use after free in mac80211 defragmentation code, from Johannes
Berg.
5) Some networking uapi headers missing from Kbuild file, from Stephen
Hemminger.
6) TUN driver gets csum_start offset wrong when VLAN accel is enabled,
and macvtap has a similar bug, from Herbert Xu.
7) Adjust several tunneling drivers to set dev->iflink after registry,
because registry sets that to -1 overwriting whatever we did. From
Steffen Klassert.
8) Geneve forgets to set inner tunneling type, causing GSO segmentation
to fail on some NICs. From Jesse Gross.
9) Fix several locking bugs in stmmac driver, from Fabrice Gasnier and
Giuseppe CAVALLARO.
10) Fix spurious timeouts with NewReno on low traffic connections, from
Marcelo Leitner.
11) Fix descriptor updates in enic driver, from Govindarajulu
Varadarajan.
12) PPP calls bpf_prog_create() with locks held, which isn't kosher.
Fix from Takashi Iwai.
13) Fix NULL deref in SCTP with malformed INIT packets, from Daniel
Borkmann.
14) psock_fanout selftest accesses past the end of the mmap ring, fix
from Shuah Khan.
15) Fix PTP timestamping for VLAN packets, from Richard Cochran.
16) netlink_unbind() calls in netlink pass wrong initial argument, from
Hiroaki SHIMODA.
17) vxlan socket reuse accidently reuses a socket when the address
family is different, so we have to explicitly check this, from
Marcelo Lietner.
18) Fix missing include in nft_reject_bridge.c breaking the build on ppc
and other architectures, from Guenter Roeck.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits)
vxlan: Do not reuse sockets for a different address family
smsc911x: power-up phydev before doing a software reset.
lib: rhashtable - Remove weird non-ASCII characters from comments
net/smsc911x: Fix delays in the PHY enable/disable routines
net/smsc911x: Fix rare soft reset timeout issue due to PHY power-down mode
netlink: Properly unbind in error conditions.
net: ptp: fix time stamp matching logic for VLAN packets.
cxgb4 : dcb open-lldp interop fixes
selftests/net: psock_fanout seg faults in sock_fanout_read_ring()
net: bcmgenet: apply MII configuration in bcmgenet_open()
net: bcmgenet: connect and disconnect from the PHY state machine
net: qualcomm: Fix dependency
ixgbe: phy: fix uninitialized status in ixgbe_setup_phy_link_tnx
net: phy: Correctly handle MII ioctl which changes autonegotiation.
ipv6: fix IPV6_PKTINFO with v4 mapped
net: sctp: fix memory leak in auth key management
net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
net: ppp: Don't call bpf_prog_create() in ppp_lock
net/mlx4_en: Advertize encapsulation offloads features only when VXLAN tunnel is set
cxgb4 : Fix bug in DCB app deletion
...
|
|
In free_area_init_core(), zone->managed_pages is set to an approximate
value for lowmem, and will be adjusted when the bootmem allocator frees
pages into the buddy system.
But free_area_init_core() is also called by hotadd_new_pgdat() when
hot-adding memory. As a result, zone->managed_pages of the newly added
node's pgdat is set to an approximate value in the very beginning.
Even if the memory on that node has node been onlined,
/sys/device/system/node/nodeXXX/meminfo has wrong value:
hot-add node2 (memory not onlined)
cat /sys/device/system/node/node2/meminfo
Node 2 MemTotal: 33554432 kB
Node 2 MemFree: 0 kB
Node 2 MemUsed: 33554432 kB
Node 2 Active: 0 kB
This patch fixes this problem by reset node managed pages to 0 after
hot-adding a new node.
1. Move reset_managed_pages_done from reset_node_managed_pages() to
reset_all_zones_managed_pages()
2. Make reset_node_managed_pages() non-static
3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat
is initialized
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: <stable@vger.kernel.org> [3.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Before describing bugs itself, I first explain definition of freepage.
1. pages on buddy list are counted as freepage.
2. pages on isolate migratetype buddy list are *not* counted as freepage.
3. pages on cma buddy list are counted as CMA freepage, too.
Now, I describe problems and related patch.
Patch 1: There is race conditions on getting pageblock migratetype that
it results in misplacement of freepages on buddy list, incorrect
freepage count and un-availability of freepage.
Patch 2: Freepages on pcp list could have stale cached information to
determine migratetype of buddy list to go. This causes misplacement of
freepages on buddy list and incorrect freepage count.
Patch 4: Merging between freepages on different migratetype of
pageblocks will cause freepages accouting problem. This patch fixes it.
Without patchset [3], above problem doesn't happens on my CMA allocation
test, because CMA reserved pages aren't used at all. So there is no
chance for above race.
With patchset [3], I did simple CMA allocation test and get below
result:
- Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
- run kernel build (make -j16) on background
- 30 times CMA allocation(8MB * 30 = 240MB) attempts in 5 sec interval
- Result: more than 5000 freepage count are missed
With patchset [3] and this patchset, I found that no freepage count are
missed so that I conclude that problems are solved.
On my simple memory offlining test, these problems also occur on that
environment, too.
This patch (of 4):
There are two paths to reach core free function of buddy allocator,
__free_one_page(), one is free_one_page()->__free_one_page() and the
other is free_hot_cold_page()->free_pcppages_bulk()->__free_one_page().
Each paths has race condition causing serious problems. At first, this
patch is focused on first type of freepath. And then, following patch
will solve the problem in second type of freepath.
In the first type of freepath, we got migratetype of freeing page
without holding the zone lock, so it could be racy. There are two cases
of this race.
1. pages are added to isolate buddy list after restoring orignal
migratetype
CPU1 CPU2
get migratetype => return MIGRATE_ISOLATE
call free_one_page() with MIGRATE_ISOLATE
grab the zone lock
unisolate pageblock
release the zone lock
grab the zone lock
call __free_one_page() with MIGRATE_ISOLATE
freepage go into isolate buddy list,
although pageblock is already unisolated
This may cause two problems. One is that we can't use this page anymore
until next isolation attempt of this pageblock, because freepage is on
isolate buddy list. The other is that freepage accouting could be wrong
due to merging between different buddy list. Freepages on isolate buddy
list aren't counted as freepage, but ones on normal buddy list are
counted as freepage. If merge happens, buddy freepage on normal buddy
list is inevitably moved to isolate buddy list without any consideration
of freepage accouting so it could be incorrect.
2. pages are added to normal buddy list while pageblock is isolated.
It is similar with above case.
This also may cause two problems. One is that we can't keep these
freepages from being allocated. Although this pageblock is isolated,
freepage would be added to normal buddy list so that it could be
allocated without any restriction. And the other problem is same as
case 1, that it, incorrect freepage accouting.
This race condition would be prevented by checking migratetype again
with holding the zone lock. Because it is somewhat heavy operation and
it isn't needed in common case, we want to avoid rechecking as much as
possible. So this patch introduce new variable, nr_isolate_pageblock in
struct zone to check if there is isolated pageblock. With this, we can
avoid to re-check migratetype in common case and do it only if there is
isolated pageblock or migratetype is MIGRATE_ISOLATE. This solve above
mentioned problems.
Changes from v3:
Add one more check in free_one_page() that checks whether migratetype is
MIGRATE_ISOLATE or not. Without this, abovementioned case 1 could happens.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Heesub Shin <heesub.shin@samsung.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Ritesh Harjani <ritesh.list@gmail.com>
Cc: Gioh Kim <gioh.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Rabin Vincent found a way that tracing could cause an infinite loop in
the kernel. The splice logic wants a full page from the ring buffer
but the ring_buffer_wait() returns when there's any data in the ring
buffer. The splice code would then continue the loop waiting for a
full page. But if a full page never happens, the splice code will
never sleep and just continue to loop.
There's another case that Rabin fixed that could loop if there's no
memory and kmalloc() constantly returns NULL"
* tag 'trace-fixes-v3.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Do not risk busy looping in buffer splice
tracing: Do not busy wait in buffer splice
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD fixes from Lee Jones:
- register offset fix for stmpe
- eradicate build warning when !PM in rtsx_pcr
- fix device ID collision when multiple boards are connected in
viperboard
- use correct Regmap handle - fixing unhanded IRQs in max77693
- unmask MUIC IRQs in max77693
- clear VBUS & CHG bits so board doesn't reboot instead of poweroff in
twl4030
* tag 'mfd-fixes-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: twl4030-power: Fix poweroff with PM configuration enabled
mfd: max77693: Fix always masked MUIC interrupts
mfd: max77693: Use proper regmap for handling MUIC interrupts
mfd: viperboard: Fix platform-device id collision
mfd: rtsx: Fix build warnings for !PM
mfd: stmpe: Fix STMPE24xx GPMR LSB
|
|
For pNFS direct writes, layout driver may dynamically allocate ds_cinfo.buckets.
So we need to take care to free them when freeing dreq.
Ideally this needs to be done inside layout driver where ds_cinfo.buckets
are allocated. But buckets are attached to dreq and reused across LD IO iterations.
So I feel it's OK to free them in the generic layer.
Cc: stable@vger.kernel.org [v3.4+]
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The initial state of the device's need_restore flag should'nt depend on
the current state of the PM domain. For example it should be perfectly
valid to attach an inactive device to a powered PM domain.
The pm_genpd_dev_need_restore() API allow us to update the need_restore
flag to somewhat cope with such scenarios. Typically that should have
been done from drivers/buses ->probe() since it's those that put the
requirements on the value of the need_restore flag.
Until recently, the Exynos SOCs were the only user of the
pm_genpd_dev_need_restore() API, though invoking it from a centralized
location while adding devices to their PM domains.
Due to that Exynos now have swithed to the generic OF-based PM domain
look-up, it's no longer possible to invoke the API from a centralized
location. The reason is because devices are now added to their PM
domains during the probe sequence.
Commit "ARM: exynos: Move to generic PM domain DT bindings"
did the switch for Exynos to the generic OF-based PM domain look-up,
but it also removed the call to pm_genpd_dev_need_restore(). This
caused a regression for some of the Exynos drivers.
To handle things more properly in the generic PM domain, let's change
the default initial value of the need_restore flag to reflect that the
state is unknown. As soon as some of the runtime PM callbacks gets
invoked, update the initial value accordingly.
Moreover, since the generic PM domain is verifying that all devices
are both runtime PM enabled and suspended, using pm_runtime_suspended()
while pm_genpd_poweroff() is invoked from the scheduled work, we can be
sure of that the PM domain won't be powering off while having active
devices.
Do note that, the generic PM domain can still only know about active
devices which has been activated through invoking its runtime PM resume
callback. In other words, buses/drivers using pm_runtime_set_active()
during ->probe() will still suffer from a race condition, potentially
probing a device without having its PM domain being powered. That issue
will have to be solved using a different approach.
This a log from the boot regression for Exynos5, which is being fixed in
this patch.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 308 at ../drivers/clk/clk.c:851 clk_disable+0x24/0x30()
Modules linked in:
CPU: 0 PID: 308 Comm: kworker/0:1 Not tainted 3.18.0-rc3-00569-gbd9449f-dirty #10
Workqueue: pm pm_runtime_work
[<c0013c64>] (unwind_backtrace) from [<c0010dec>] (show_stack+0x10/0x14)
[<c0010dec>] (show_stack) from [<c03ee4cc>] (dump_stack+0x70/0xbc)
[<c03ee4cc>] (dump_stack) from [<c0020d34>] (warn_slowpath_common+0x64/0x88)
[<c0020d34>] (warn_slowpath_common) from [<c0020d74>] (warn_slowpath_null+0x1c/0x24)
[<c0020d74>] (warn_slowpath_null) from [<c03107b0>] (clk_disable+0x24/0x30)
[<c03107b0>] (clk_disable) from [<c02cc834>] (gsc_runtime_suspend+0x128/0x160)
[<c02cc834>] (gsc_runtime_suspend) from [<c0249024>] (pm_generic_runtime_suspend+0x2c/0x38)
[<c0249024>] (pm_generic_runtime_suspend) from [<c024f44c>] (pm_genpd_default_save_state+0x2c/0x8c)
[<c024f44c>] (pm_genpd_default_save_state) from [<c024ff2c>] (pm_genpd_poweroff+0x224/0x3ec)
[<c024ff2c>] (pm_genpd_poweroff) from [<c02501b4>] (pm_genpd_runtime_suspend+0x9c/0xcc)
[<c02501b4>] (pm_genpd_runtime_suspend) from [<c024a4f8>] (__rpm_callback+0x2c/0x60)
[<c024a4f8>] (__rpm_callback) from [<c024a54c>] (rpm_callback+0x20/0x74)
[<c024a54c>] (rpm_callback) from [<c024a930>] (rpm_suspend+0xd4/0x43c)
[<c024a930>] (rpm_suspend) from [<c024bbcc>] (pm_runtime_work+0x80/0x90)
[<c024bbcc>] (pm_runtime_work) from [<c0032a9c>] (process_one_work+0x12c/0x314)
[<c0032a9c>] (process_one_work) from [<c0032cf4>] (worker_thread+0x3c/0x4b0)
[<c0032cf4>] (worker_thread) from [<c003747c>] (kthread+0xcc/0xe8)
[<c003747c>] (kthread) from [<c000e738>] (ret_from_fork+0x14/0x3c)
---[ end trace 40cd58bcd6988f12 ]---
Fixes: a4a8c2c4962bb655 (ARM: exynos: Move to generic PM domain DT bindings)
Reported-and-tested0by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
On a !PREEMPT kernel, attempting to use trace-cmd results in a soft
lockup:
# trace-cmd record -e raw_syscalls:* -F false
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61]
...
Call Trace:
[<ffffffff8105b580>] ? __wake_up_common+0x90/0x90
[<ffffffff81092e25>] wait_on_pipe+0x35/0x40
[<ffffffff810936e3>] tracing_buffers_splice_read+0x2e3/0x3c0
[<ffffffff81093300>] ? tracing_stats_read+0x2a0/0x2a0
[<ffffffff812d10ab>] ? _raw_spin_unlock+0x2b/0x40
[<ffffffff810dc87b>] ? do_read_fault+0x21b/0x290
[<ffffffff810de56a>] ? handle_mm_fault+0x2ba/0xbd0
[<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
[<ffffffff810951e2>] ? trace_buffer_lock_reserve+0x22/0x60
[<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
[<ffffffff8112415d>] do_splice_to+0x6d/0x90
[<ffffffff81126971>] SyS_splice+0x7c1/0x800
[<ffffffff812d1edd>] tracesys_phase2+0xd3/0xd8
The problem is this: tracing_buffers_splice_read() calls
ring_buffer_wait() to wait for data in the ring buffers. The buffers
are not empty so ring_buffer_wait() returns immediately. But
tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1,
meaning it only wants to read a full page. When the full page is not
available, tracing_buffers_splice_read() tries to wait again with
ring_buffer_wait(), which again returns immediately, and so on.
Fix this by adding a "full" argument to ring_buffer_wait() which will
make ring_buffer_wait() wait until the writer has left the reader's
page, i.e. until full-page reads will succeed.
Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in
Cc: stable@vger.kernel.org # 3.16+
Fixes: b1169cc69ba9 ("tracing: Remove mock up poll wait function")
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
All interrupts coming from MUIC were ignored because interrupt source
register was masked.
The Maxim 77693 has a "interrupt source" - a separate register and interrupts
which give information about PMIC block triggering the individual
interrupt (charger, topsys, MUIC, flash LED).
By default bootloader could initialize this register to "mask all"
value. In such case (observed on Trats2 board) MUIC interrupts won't be
generated regardless of their mask status. Regmap irq chip was unmasking
individual MUIC interrupts but the source was masked
Before introducing regmap irq chip this interrupt source was unmasked,
read and acked. Reading and acking is not necessary but unmasking is.
Fixes: 342d669c1ee4 ("mfd: max77693: Handle IRQs using regmap")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux
Pull devicetree bugfix from Grant Likely:
"One buffer overflow bug that shouldn't be left around"
* 'devicetree/merge' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux:
of: Fix overflow bug in string property parsing functions
|
|
Convert the prototypes to return an int in order to support error
handling in these callbacks.
Also, as suggested by Dmitry Torokhov, pass the domain pointer for use
inside the callbacks, and so that they match the existing
power_on/power_off callbacks which currently take the domain pointer.
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[ khilman: added domain as parameter to callbacks, as suggested by Dmitry ]
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fix from Bjorn Helgaas:
"This fixes an oops when enabling SR-IOV VF devices. The oops is a
regression I added by configuring all devices during enumeration.
- Don't oops on virtual buses in acpi_pci_get_bridge_handle() (Yinghai Lu)"
* tag 'pci-v3.18-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Don't oops on virtual buses in acpi_pci_get_bridge_handle()
|
|
File descriptors are always closed on exit :-)
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
acpi_pci_get_bridge_handle() returns the ACPI handle for the bridge device
(either a host bridge or a PCI-to-PCI bridge) leading to a PCI bus. But
SR-IOV virtual functions can be on a virtual bus with no bridge leading to
it. Return a NULL acpi_handle in this case instead of trying to
dereference the NULL pointer to the bridge.
This fixes a NULL pointer dereference oops in pci_get_hp_params() when
adding SR-IOV VF devices on virtual buses.
[bhelgaas: changelog, add comment in code]
Fixes: 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=87591
Reported-by: Chao Zhou <chao.zhou@intel.com>
Reported-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The seq_printf() will soon just return void, and seq_has_overflowed()
should be used instead to see if the seq can no longer accept input.
As the return value of debugfs_print_regs32() has no users and
the seq_file descriptor should be checked with seq_has_overflowed()
instead of return values of functions, it is better to just have
debugfs_print_regs32() also return void.
Link: http://lkml.kernel.org/p/2634b19eb1c04a9d31148c1fe6f1f3819be95349.1412031505.git.joe@perches.com
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joe Perches <joe@perches.com>
[ original change only updated seq_printf() return, added return of
void to debugfs_print_regs32() as well ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
seq_printf functions shouldn't really check the return value.
Checking seq_has_overflowed() occasionally is used instead.
Update vfs documentation.
Link: http://lkml.kernel.org/p/e37e6e7b76acbdcc3bb4ab2a57c8f8ca1ae11b9a.1412031505.git.joe@perches.com
Cc: David S. Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Joe Perches <joe@perches.com>
[ did a few clean ups ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The string property read helpers will run off the end of the buffer if
it is handed a malformed string property. Rework the parsers to make
sure that doesn't happen. At the same time add new test cases to make
sure the functions behave themselves.
The original implementations of of_property_read_string_index() and
of_property_count_strings() both open-coded the same block of parsing
code, each with it's own subtly different bugs. The fix here merges
functions into a single helper and makes the original functions static
inline wrappers around the helper.
One non-bugfix aspect of this patch is the addition of a new wrapper,
of_property_read_string_array(). The new wrapper is needed by the
device_properties feature that Rafael is working on and planning to
merge for v3.19. The implementation is identical both with and without
the new static inline wrapper, so it just got left in to reduce the
churn on the header file.
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Darren Hart <darren.hart@intel.com>
Cc: <stable@vger.kernel.org> # v3.3+: Drop selftest hunks that don't apply
|