Age | Commit message (Collapse) | Author | Files | Lines |
|
The output of /proc/$pid/numa_maps is in terms of number of pages like
anon=22 or dirty=54. Here's some output:
7f4680000000 default file=/hugetlb/bigfile anon=50 dirty=50 N0=50
7f7659600000 default file=/anon_hugepage\040(deleted) anon=50 dirty=50 N0=50
7fff8d425000 default stack anon=50 dirty=50 N0=50
Looks like we have a stack and a couple of anonymous hugetlbfs
areas page which both use the same amount of memory. They don't.
The 'bigfile' uses 1GB pages and takes up ~50GB of space. The
anon_hugepage uses 2MB pages and takes up ~100MB of space while the stack
uses normal 4k pages. You can go over to smaps to figure out what the
page size _really_ is with KernelPageSize or MMUPageSize. But, I think
this is a pretty nasty and counterintuitive interface as it stands.
This patch introduces 'kernelpagesize_kB' line element to
/proc/<pid>/numa_maps report file in order to help identifying the size of
pages that are backing memory areas mapped by a given task. This is
specially useful to help differentiating between HUGE and GIGANTIC page
backed VMAs.
This patch is based on Dave Hansen's proposal and reviewer's follow-ups
taken from the following dicussion threads:
* https://lkml.org/lkml/2011/9/21/454
* https://lkml.org/lkml/2014/12/20/66
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
explanation snippet
Add a small section to proc.txt doc in order to document its
/proc/pid/numa_maps interface. It does not introduce any functional
changes, just documentation.
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
(peak RSS)
Peak resident size of a process can be reset back to the process's
current rss value by writing "5" to /proc/pid/clear_refs. The driving
use-case for this would be getting the peak RSS value, which can be
retrieved from the VmHWM field in /proc/pid/status, per benchmark
iteration or test scenario.
[akpm@linux-foundation.org: clarify behaviour in documentation]
Signed-off-by: Petr Cermak <petrcermak@chromium.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Primiano Tucci <primiano@chromium.org>
Cc: Petr Cermak <petrcermak@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull nfsd updates from Bruce Fields:
"The main change is the pNFS block server support from Christoph, which
allows an NFS client connected to shared disk to do block IO to the
shared disk in place of NFS reads and writes. This also requires xfs
patches, which should arrive soon through the xfs tree, barring
unexpected problems. Support for other filesystems is also possible
if there's interest.
Thanks also to Chuck Lever for continuing work to get NFS/RDMA into
shape"
* 'for-3.20' of git://linux-nfs.org/~bfields/linux: (32 commits)
nfsd: default NFSv4.2 to on
nfsd: pNFS block layout driver
exportfs: add methods for block layout exports
nfsd: add trace events
nfsd: update documentation for pNFS support
nfsd: implement pNFS layout recalls
nfsd: implement pNFS operations
nfsd: make find_any_file available outside nfs4state.c
nfsd: make find/get/put file available outside nfs4state.c
nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c
nfsd: add fh_fsid_match helper
nfsd: move nfsd_fh_match to nfsfh.h
fs: add FL_LAYOUT lease type
fs: track fl_owner for leases
nfs: add LAYOUT_TYPE_MAX enum value
nfsd: factor out a helper to decode nfstime4 values
sunrpc/lockd: fix references to the BKL
nfsd: fix year-2038 nfs4 state problem
svcrdma: Handle additional inline content
svcrdma: Move read list XDR round-up logic
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"This time with:
- Generic page-table framework for ARM IOMMUs using the LPAE
page-table format, ARM-SMMU and Renesas IPMMU make use of it
already.
- Break out the IO virtual address allocator from the Intel IOMMU so
that it can be used by other DMA-API implementations too. The
first user will be the ARM64 common DMA-API implementation for
IOMMUs
- Device tree support for Renesas IPMMU
- Various fixes and cleanups all over the place"
* tag 'iommu-updates-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (36 commits)
iommu/amd: Convert non-returned local variable to boolean when relevant
iommu: Update my email address
iommu/amd: Use wait_event in put_pasid_state_wait
iommu/amd: Fix amd_iommu_free_device()
iommu/arm-smmu: Avoid build warning
iommu/fsl: Various cleanups
iommu/fsl: Use %pa to print phys_addr_t
iommu/omap: Print phys_addr_t using %pa
iommu: Make more drivers depend on COMPILE_TEST
iommu/ipmmu-vmsa: Fix IOMMU lookup when multiple IOMMUs are registered
iommu: Disable on !MMU builds
iommu/fsl: Remove unused fsl_of_pamu_ids[]
iommu/fsl: Fix section mismatch
iommu/ipmmu-vmsa: Use the ARM LPAE page table allocator
iommu: Fix trace_map() to report original iova and original size
iommu/arm-smmu: add support for iova_to_phys through ATS1PR
iopoll: Introduce memory-mapped IO polling macros
iommu/arm-smmu: don't touch the secure STLBIALL register
iommu/arm-smmu: make use of generic LPAE allocator
iommu: io-pgtable-arm: add non-secure quirk
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree changes from Rob Herring:
- DT unittests for I2C probing and overlays from Pantelis Antoniou
- Remove DT unittest dependency on OF_DYNAMIC from Gaurav Minocha
- Add Tegra compatible strings missing for newer parts from Paul
Walmsley
- Various vendor prefix additions
* tag 'devicetree-for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: Add vendor prefix for OmniVision Technologies
of: Use ovti for Omnivision
of: Add vendor prefix for Truly Semiconductors Limited
of: Add vendor prefix for Himax Technologies Inc.
of/fdt: fix sparse warning
of: unitest: Add I2C overlay unit tests.
Documentation: DT: document compatible string existence requirement
Documentation: DT bindings: add nvidia, tegra132-denver compatible string
Documentation: DT bindings: add more Tegra chip compatible strings
of: EXPORT_SYMBOL_GPL of_property_read_u64_array
of: Fix brace position for struct of_device_id definition
of/unittest: Remove obsolete code
dt-bindings: use isil prefix for Intersil in vendor-prefixes.txt
Add AD Holdings Plc. to vendor-prefixes.
dt-bindings: Add Silicon Mitus vendor prefix
Removes OF_UNITTEST dependency on OF_DYNAMIC config symbol
pinctrl: fix up device tree bindings
DT: Vendors: Add Everspin
doc: add bindings document for altera fpga manager
drivers: of: Export of_reserved_mem_device_{init,release}
|
|
Pull ARM updates from Russell King:
- clang assembly fixes from Ard
- optimisations and cleanups for Aurora L2 cache support
- efficient L2 cache support for secure monitor API on Exynos SoCs
- debug menu cleanup from Daniel Thompson to allow better behaviour for
multiplatform kernels
- StrongARM SA11x0 conversion to irq domains, and pxa_timer
- kprobes updates for older ARM CPUs
- move probes support out of arch/arm/kernel to arch/arm/probes
- add inline asm support for the rbit (reverse bits) instruction
- provide an ARM mode secondary CPU entry point (for Qualcomm CPUs)
- remove the unused ARMv3 user access code
- add driver_override support to AMBA Primecell bus
* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (55 commits)
ARM: 8256/1: driver coamba: add device binding path 'driver_override'
ARM: 8301/1: qcom: Use secondary_startup_arm()
ARM: 8302/1: Add a secondary_startup that assumes ARM mode
ARM: 8300/1: teach __asmeq that r11 == fp and r12 == ip
ARM: kprobes: Fix compilation error caused by superfluous '*'
ARM: 8297/1: cache-l2x0: optimize aurora range operations
ARM: 8296/1: cache-l2x0: clean up aurora cache handling
ARM: 8284/1: sa1100: clear RCSR_SMR on resume
ARM: 8283/1: sa1100: collie: clear PWER register on machine init
ARM: 8282/1: sa1100: use handle_domain_irq
ARM: 8281/1: sa1100: move GPIO-related IRQ code to gpio driver
ARM: 8280/1: sa1100: switch to irq_domain_add_simple()
ARM: 8279/1: sa1100: merge both GPIO irqdomains
ARM: 8278/1: sa1100: split irq handling for low GPIOs
ARM: 8291/1: replace magic number with PAGE_SHIFT macro in fixup_pv code
ARM: 8290/1: decompressor: fix a wrong comment
ARM: 8286/1: mm: Fix dma_contiguous_reserve comment
ARM: 8248/1: pm: remove outdated comment
ARM: 8274/1: Fix DEBUG_LL for multi-platform kernels (without PL01X)
ARM: 8273/1: Seperate DEBUG_UART_PHYS from DEBUG_LL on EP93XX
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security layer updates from James Morris:
"Highlights:
- Smack adds secmark support for Netfilter
- /proc/keys is now mandatory if CONFIG_KEYS=y
- TPM gets its own device class
- Added TPM 2.0 support
- Smack file hook rework (all Smack users should review this!)"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (64 commits)
cipso: don't use IPCB() to locate the CIPSO IP option
SELinux: fix error code in policydb_init()
selinux: add security in-core xattr support for pstore and debugfs
selinux: quiet the filesystem labeling behavior message
selinux: Remove unused function avc_sidcmp()
ima: /proc/keys is now mandatory
Smack: Repair netfilter dependency
X.509: silence asn1 compiler debug output
X.509: shut up about included cert for silent build
KEYS: Make /proc/keys unconditional if CONFIG_KEYS=y
MAINTAINERS: email update
tpm/tpm_tis: Add missing ifdef CONFIG_ACPI for pnp_acpi_device
smack: fix possible use after frees in task_security() callers
smack: Add missing logging in bidirectional UDS connect check
Smack: secmark support for netfilter
Smack: Rework file hooks
tpm: fix format string error in tpm-chip.c
char/tpm/tpm_crb: fix build error
smack: Fix a bidirectional UDS connect check typo
smack: introduce a special case for tmpfs in smack_d_instantiate()
...
|
|
Merge second set of updates from Andrew Morton:
"More of MM"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits)
mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
vmstat: Reduce time interval to stat update on idle cpu
mm/page_owner.c: remove unnecessary stack_trace field
Documentation/filesystems/proc.txt: describe /proc/<pid>/map_files
mm: incorporate read-only pages into transparent huge pages
vmstat: do not use deferrable delayed work for vmstat_update
mm: more aggressive page stealing for UNMOVABLE allocations
mm: always steal split buddies in fallback allocations
mm: when stealing freepages, also take pages created by splitting buddy page
mincore: apply page table walker on do_mincore()
mm: /proc/pid/clear_refs: avoid split_huge_page()
mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)
mempolicy: apply page table walker on queue_pages_range()
arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma()
memcg: cleanup preparation for page table walk
numa_maps: remove numa_maps->vma
numa_maps: fix typo in gather_hugetbl_stats
pagemap: use walk->vma instead of calling find_vma()
clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux
Pull powerpc updates from Michael Ellerman:
- Update of all defconfigs
- Addition of a bunch of config options to modernise our defconfigs
- Some PS3 updates from Geoff
- Optimised memcmp for 64 bit from Anton
- Fix for kprobes that allows 'perf probe' to work from Naveen
- Several cxl updates from Ian & Ryan
- Expanded support for the '24x7' PMU from Cody & Sukadev
- Freescale updates from Scott:
"Highlights include 8xx optimizations, some more work on datapath
device tree content, e300 machine check support, t1040 corenet
error reporting, and various cleanups and fixes"
* tag 'powerpc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (102 commits)
cxl: Add missing return statement after handling AFU errror
cxl: Fail AFU initialisation if an invalid configuration record is found
cxl: Export optional AFU configuration record in sysfs
powerpc/mm: Warn on flushing tlb page in kernel context
powerpc/powernv: Add OPAL soft-poweroff routine
powerpc/perf/hv-24x7: Document sysfs event description entries
powerpc/perf/hv-gpci: add the remaining gpci requests
powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated
powerpc/perf/hv-24x7: parse catalog and populate sysfs with events
perf: define EVENT_DEFINE_RANGE_FORMAT_LITE helper
perf: add PMU_EVENT_ATTR_STRING() helper
perf: provide sysfs_show for struct perf_pmu_events_attr
powerpc/kernel: Avoid initializing device-tree pointer twice
powerpc: Remove old compile time disabled syscall tracing code
powerpc/kernel: Make syscall_exit a local label
cxl: Fix device_node reference counting
powerpc/mm: bail out early when flushing TLB page
powerpc: defconfigs: add MTD_SPI_NOR (new dependency for M25P80)
perf/powerpc: reset event hw state when adding it to the PMU
powerpc/qe: Use strlcpy()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"arm64 updates for 3.20:
- reimplementation of the virtual remapping of UEFI Runtime Services
in a way that is stable across kexec
- emulation of the "setend" instruction for 32-bit tasks (user
endianness switching trapped in the kernel, SCTLR_EL1.E0E bit set
accordingly)
- compat_sys_call_table implemented in C (from asm) and made it a
constant array together with sys_call_table
- export CPU cache information via /sys (like other architectures)
- DMA API implementation clean-up in preparation for IOMMU support
- macros clean-up for KVM
- dropped some unnecessary cache+tlb maintenance
- CONFIG_ARM64_CPU_SUSPEND clean-up
- defconfig update (CPU_IDLE)
The EFI changes going via the arm64 tree have been acked by Matt
Fleming. There is also a patch adding sys_*stat64 prototypes to
include/linux/syscalls.h, acked by Andrew Morton"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (47 commits)
arm64: compat: Remove incorrect comment in compat_siginfo
arm64: Fix section mismatch on alloc_init_p[mu]d()
arm64: Avoid breakage caused by .altmacro in fpsimd save/restore macros
arm64: mm: use *_sect to check for section maps
arm64: drop unnecessary cache+tlb maintenance
arm64:mm: free the useless initial page table
arm64: Enable CPU_IDLE in defconfig
arm64: kernel: remove ARM64_CPU_SUSPEND config option
arm64: make sys_call_table const
arm64: Remove asm/syscalls.h
arm64: Implement the compat_sys_call_table in C
syscalls: Declare sys_*stat64 prototypes if __ARCH_WANT_(COMPAT_)STAT64
compat: Declare compat_sys_sigpending and compat_sys_sigprocmask prototypes
arm64: uapi: expose our struct ucontext to the uapi headers
smp, ARM64: Kill SMP single function call interrupt
arm64: Emulate SETEND for AArch32 tasks
arm64: Consolidate hotplug notifier for instruction emulation
arm64: Track system support for mixed endian EL0
arm64: implement generic IOMMU configuration
arm64: Combine coherent and non-coherent swiotlb dma_ops
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
- The remaining patches for the z13 machine support: kernel build
option for z13, the cache synonym avoidance, SMT support,
compare-and-delay for spinloops and the CES5S crypto adapater.
- The ftrace support for function tracing with the gcc hotpatch option.
This touches common code Makefiles, Steven is ok with the changes.
- The hypfs file system gets an extension to access diagnose 0x0c data
in user space for performance analysis for Linux running under z/VM.
- The iucv hvc console gets wildcard spport for the user id filtering.
- The cacheinfo code is converted to use the generic infrastructure.
- Cleanup and bug fixes.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (42 commits)
s390/process: free vx save area when releasing tasks
s390/hypfs: Eliminate hypfs interval
s390/hypfs: Add diagnose 0c support
s390/cacheinfo: don't use smp_processor_id() in preemptible context
s390/zcrypt: fixed domain scanning problem (again)
s390/smp: increase maximum value of NR_CPUS to 512
s390/jump label: use different nop instruction
s390/jump label: add sanity checks
s390/mm: correct missing space when reporting user process faults
s390/dasd: cleanup profiling
s390/dasd: add locking for global_profile access
s390/ftrace: hotpatch support for function tracing
ftrace: let notrace function attribute disable hotpatching if necessary
ftrace: allow architectures to specify ftrace compile options
s390: reintroduce diag 44 calls for cpu_relax()
s390/zcrypt: Add support for new crypto express (CEX5S) adapter.
s390/zcrypt: Number of supported ap domains is not retrievable.
s390/spinlock: add compare-and-delay to lock wait loops
s390/tape: remove redundant if statement
s390/hvc_iucv: add simple wildcard matches to the iucv allow filter
...
|
|
Pull NFS client updates from Trond Myklebust:
"Highlights incluse:
Features:
- Removing the forced serialisation of open()/close() calls in
NFSv4.x (x>0) makes for a significant performance improvement in
metadata intensive workloads.
- Full support for the pNFS "flexible files" layout type
- Further RPC/RDMA client improvements from Chuck
Bugfixes:
- Stable fix: NFSv4.1 backchannel calls blocking operations with !TASK_RUNNING
- Stable fix: pnfs_generic_pg_init_read/write can be called with lseg == NULL
- Stable fix: Fix an Oopsable condition when nsm_mon_unmon is called
as part of the namespace cleanup,
- Stable fix: Ensure we reference the inode for return-on-close in
delegreturn
- Use SO_REUSEPORT to ensure that NFSv3 TCP connections can rebind to
the same source address/port combination during a disconnect/
reconnect event. This is a requirement imposed by most NFSv3
server duplicate reply cache implementations.
Optimisations:
- Ask for no NFSv4.1 delegations on OPEN if using O_DIRECT
Other:
- Add Anna Schumaker as co-maintainer for the NFS client"
* tag 'nfs-for-3.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (119 commits)
SUNRPC: Cleanup to remove xs_tcp_close()
pnfs: delete an unintended goto
pnfs/flexfiles: Do not dprintk after the free
SUNRPC: Fix stupid typo in xs_sock_set_reuseport
SUNRPC: Define xs_tcp_fin_timeout only if CONFIG_SUNRPC_DEBUG
SUNRPC: Handle connection reset more efficiently.
SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flag
SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release
SUNRPC: Ensure xs_tcp_shutdown() requests a full close of the connection
SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORT
SUNRPC: Remove TCP socket linger code
SUNRPC: Remove TCP client connection reset hack
SUNRPC: TCP/UDP always close the old socket before reconnecting
SUNRPC: Add helpers to prevent socket create from racing
SUNRPC: Ensure xs_reset_transport() resets the close connection flags
SUNRPC: Do not clear the source port in xs_reset_transport
SUNRPC: Handle EADDRINUSE on connect
SUNRPC: Set SO_REUSEPORT socket option for TCP connections
NFSv4.1: Fix pnfs_put_lseg races
NFSv4.1: pnfs_send_layoutreturn should use GFP_NOFS
...
|
|
[akpm@linux-foundation.org: tweaks]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Calvin Owens <calvinowens@fb.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Dave noticed that unprivileged process can allocate significant amount of
memory -- >500 MiB on x86_64 -- and stay unnoticed by oom-killer and
memory cgroup. The trick is to allocate a lot of PMD page tables. Linux
kernel doesn't account PMD tables to the process, only PTE.
The use-cases below use few tricks to allocate a lot of PMD page tables
while keeping VmRSS and VmPTE low. oom_score for the process will be 0.
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#define PUD_SIZE (1UL << 30)
#define PMD_SIZE (1UL << 21)
#define NR_PUD 130000
int main(void)
{
char *addr = NULL;
unsigned long i;
prctl(PR_SET_THP_DISABLE);
for (i = 0; i < NR_PUD ; i++) {
addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ,
MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
if (addr == MAP_FAILED) {
perror("mmap");
break;
}
*addr = 'x';
munmap(addr, PMD_SIZE);
mmap(addr, PMD_SIZE, PROT_WRITE|PROT_READ,
MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0);
if (addr == MAP_FAILED)
perror("re-mmap"), exit(1);
}
printf("PID %d consumed %lu KiB in PMD page tables\n",
getpid(), i * 4096 >> 10);
return pause();
}
The patch addresses the issue by account PMD tables to the process the
same way we account PTE.
The main place where PMD tables is accounted is __pmd_alloc() and
free_pmd_range(). But there're few corner cases:
- HugeTLB can share PMD page tables. The patch handles by accounting
the table to all processes who share it.
- x86 PAE pre-allocates few PMD tables on fork.
- Architectures with FIRST_USER_ADDRESS > 0. We need to adjust sanity
check on exit(2).
Accounting only happens on configuration where PMD page table's level is
present (PMD is not folded). As with nr_ptes we use per-mm counter. The
counter value is used to calculate baseline for badness score by
oom-killer.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: David Rientjes <rientjes@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Introduce the basic control files to account, partition, and limit
memory using cgroups in default hierarchy mode.
This interface versioning allows us to address fundamental design
issues in the existing memory cgroup interface, further explained
below. The old interface will be maintained indefinitely, but a
clearer model and improved workload performance should encourage
existing users to switch over to the new one eventually.
The control files are thus:
- memory.current shows the current consumption of the cgroup and its
descendants, in bytes.
- memory.low configures the lower end of the cgroup's expected
memory consumption range. The kernel considers memory below that
boundary to be a reserve - the minimum that the workload needs in
order to make forward progress - and generally avoids reclaiming
it, unless there is an imminent risk of entering an OOM situation.
- memory.high configures the upper end of the cgroup's expected
memory consumption range. A cgroup whose consumption grows beyond
this threshold is forced into direct reclaim, to work off the
excess and to throttle new allocations heavily, but is generally
allowed to continue and the OOM killer is not invoked.
- memory.max configures the hard maximum amount of memory that the
cgroup is allowed to consume before the OOM killer is invoked.
- memory.events shows event counters that indicate how often the
cgroup was reclaimed while below memory.low, how often it was
forced to reclaim excess beyond memory.high, how often it hit
memory.max, and how often it entered OOM due to memory.max. This
allows users to identify configuration problems when observing a
degradation in workload performance. An overcommitted system will
have an increased rate of low boundary breaches, whereas increased
rates of high limit breaches, maximum hits, or even OOM situations
will indicate internally overcommitted cgroups.
For existing users of memory cgroups, the following deviations from
the current interface are worth pointing out and explaining:
- The original lower boundary, the soft limit, is defined as a limit
that is per default unset. As a result, the set of cgroups that
global reclaim prefers is opt-in, rather than opt-out. The costs
for optimizing these mostly negative lookups are so high that the
implementation, despite its enormous size, does not even provide
the basic desirable behavior. First off, the soft limit has no
hierarchical meaning. All configured groups are organized in a
global rbtree and treated like equal peers, regardless where they
are located in the hierarchy. This makes subtree delegation
impossible. Second, the soft limit reclaim pass is so aggressive
that it not just introduces high allocation latencies into the
system, but also impacts system performance due to overreclaim, to
the point where the feature becomes self-defeating.
The memory.low boundary on the other hand is a top-down allocated
reserve. A cgroup enjoys reclaim protection when it and all its
ancestors are below their low boundaries, which makes delegation
of subtrees possible. Secondly, new cgroups have no reserve per
default and in the common case most cgroups are eligible for the
preferred reclaim pass. This allows the new low boundary to be
efficiently implemented with just a minor addition to the generic
reclaim code, without the need for out-of-band data structures and
reclaim passes. Because the generic reclaim code considers all
cgroups except for the ones running low in the preferred first
reclaim pass, overreclaim of individual groups is eliminated as
well, resulting in much better overall workload performance.
- The original high boundary, the hard limit, is defined as a strict
limit that can not budge, even if the OOM killer has to be called.
But this generally goes against the goal of making the most out of
the available memory. The memory consumption of workloads varies
during runtime, and that requires users to overcommit. But doing
that with a strict upper limit requires either a fairly accurate
prediction of the working set size or adding slack to the limit.
Since working set size estimation is hard and error prone, and
getting it wrong results in OOM kills, most users tend to err on
the side of a looser limit and end up wasting precious resources.
The memory.high boundary on the other hand can be set much more
conservatively. When hit, it throttles allocations by forcing
them into direct reclaim to work off the excess, but it never
invokes the OOM killer. As a result, a high boundary that is
chosen too aggressively will not terminate the processes, but
instead it will lead to gradual performance degradation. The user
can monitor this and make corrections until the minimal memory
footprint that still gives acceptable performance is found.
In extreme cases, with many concurrent allocations and a complete
breakdown of reclaim progress within the group, the high boundary
can be exceeded. But even then it's mostly better to satisfy the
allocation from the slack available in other groups or the rest of
the system than killing the group. Otherwise, memory.max is there
to limit this type of spillover and ultimately contain buggy or
even malicious applications.
- The original control file names are unwieldy and inconsistent in
many different ways. For example, the upper boundary hit count is
exported in the memory.failcnt file, but an OOM event count has to
be manually counted by listening to memory.oom_control events, and
lower boundary / soft limit events have to be counted by first
setting a threshold for that value and then counting those events.
Also, usage and limit files encode their units in the filename.
That makes the filenames very long, even though this is not
information that a user needs to be reminded of every time they
type out those names.
To address these naming issues, as well as to signal clearly that
the new interface carries a new configuration model, the naming
conventions in it necessarily differ from the old interface.
- The original limit files indicate the state of an unset limit with
a very high number, and a configured limit can be unset by echoing
-1 into those files. But that very high number is implementation
and architecture dependent and not very descriptive. And while -1
can be understood as an underflow into the highest possible value,
-2 or -10M etc. do not work, so it's not inconsistent.
memory.low, memory.high, and memory.max will use the string
"infinity" to indicate and set the highest possible value.
[akpm@linux-foundation.org: use seq_puts() for basic strings]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add KPF_ZERO_PAGE flag for zero_page, so that userspace processes can
detect zero_page in /proc/kpageflags, and then do memory analysis more
accurately.
Signed-off-by: Yalin Wang <yalin.wang@sonymobile.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull documentation updates from Jonathan Corbet:
"Highlights this time around include:
- A thrashing of SubmittingPatches to bring it out of the "send
everything to Linus" era of kernel development.
- A new document on completions from Nicholas McGuire
- Lots of typo fixes, formatting improvements, corrections, build
fixes, and more"
* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (35 commits)
Documentation: Fix the wrong command `echo -1 > set_ftrace_pid` for cleaning the filter.
can-doc: Fixed a wrong filepath in can.txt
Documentation: Fix trivial typo in comment.
kgdb,docs: Fix typo and minor style issues
Documentation: add description for FTRACE probe status
doc: brief user documentation for completion
Documentation/misc-devices/mei: Fix indentation of embedded code.
Documentation/misc-devices/mei: Fix indentation of enumeration.
Documentation/misc-devices/mei: Fix spacing around parentheses.
Documentation/misc-devices/mei: Fix formatting of headings.
Documentation: devicetree: Fix double words in Doumentation/devicetree
Documentation: mm: Fix typo in vm.txt
lockstat: Add documentation on contention and contenting points
Documentation: fix blackfin gptimers-example build errors
Fixes column alignment in table of contents entry 1.9 in Documentation/filesystems/proc.txt
CodingStyle: enable emacs display of trailing whitespace
DocBook: Do not exceed argument list limit
gpio: board.txt: Fix the gpio name example
Documentation/SubmittingPatches: unify whitespace/tabs for the DCO
MAINTAINERS: Add the docs-next git tree to the maintainer entry
...
|
|
git://git.linaro.org/landing-teams/working/fujitsu/integration
Pull mailbox framework updates from Jassi Brar.
* 'mailbox-devel' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: Add Altera mailbox driver
mailbox: check for bit set before polling
Mailbox: Fix return value check in pcc_init()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pincontrol updates from Linus Walleij:
:This is the bulk of pin control changes for the v3.20 cycle:
Framework changes and enhancements:
- Passing -DDEBUG recursively to subdir drivers so we get debug
messages properly turned on.
- Infer map type from DT property in the groups parsing code in the
generic pinconfig code.
- Support for custom parameter passing in generic pin config. This
is used when you are using the generic pin config, but want to add
a few custom properties that no other driver will use.
New drivers:
- Driver for the Xilinx Zynq
- Driver for the AmLogic Meson SoCs
New features in drivers:
- Sleep support (suspend/resume) for the Cherryview driver
- mvebeu a38x can now mux a UART on pins MPP19 and MPP20
- Migrated the qualcomm driver to generic pin config handling of
extended config options in the core code.
- Support BUS1 and AUDIO in the Exynos pin controller.
- Add some missing functions in the sun6i driver.
- Add support for the A31S variant in the sun6i driver.
- EMEv2 support in the Renesas PFC driver.
- Add support for Qualcomm MSM8916 in the qcom driver.
Deleted features
- Drop support for the SiRF Marco that was never released to the
market.
- Drop SH7372 support as the support for this platform is removed
from the kernel"
* tag 'pinctrl-v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (40 commits)
sh-pfc: emev2 - Fix mangled author name
pinctrl: cherryview: Configure HiZ pins to be input when requested as GPIOs
pinctrl: imx25: fix numbering for pins
pinctrl: pinctrl-imx: don't use invalid value of conf_reg
pinctrl: qcom: delete pin_config_get/set pinconf operations
pinctrl: qcom: Add msm8916 pinctrl driver
DT: pinctrl: Document Qualcomm MSM8916 pinctrl binding
pinctrl: qcom: increase variable size for register offsets
pinctrl: hide PCONFDUMP in #ifdef
pinctrl: rockchip: Only mask interrupts; never disable
pinctrl: zynq: Fix usb0 pins
pinctrl: sh-pfc: sh7372: Remove DT binding documentation
pinctrl: sh-pfc: sh7372: Remove PFC support
sh-pfc: Add emev2 pinmux support
sh-pfc: add macro to define pinmux without function
pinctrl: add driver for Amlogic Meson SoCs
staging: drivers: pinctrl: Fixed checkpatch.pl warnings
pinctrl: exynos: Add AUDIO pin controller for exynos7
sh-pfc: r8a7790: add MLB+ pin group
sh-pfc: r8a7791: add MLB+ pin group
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO changes from Linus Walleij:
"This is the GPIO bulk changes for the v3.20 series:
GPIOLIB core changes:
- Create and use of_mm_gpiochip_remove() for removing memory-mapped
OF GPIO chips
- GPIO MMIO library suppports bgpio_set_multiple for switching
several lines at once, a feature merged in the last cycle.
New drivers:
- New driver for the APM X-gene standby GPIO controller
- New driver for the Fujitsu MB86S7x GPIO controller
Cleanups:
- Moved rcar driver to use gpiolib irqchip
- Moxart converted to the GPIO MMIO library
- GE driver converted to GPIO MMIO library
- Move sx150x to irqdomain
- Move max732x to irqdomain
- Move vx855 to use managed resources
- Move dwapb to use managed resources
- Clean tc3589x from platform data
- Clean stmpe driver to use device tree only probe
New subtypes:
- sx1506 support in the sx150x driver
- Quark 1000 SoC support in the SCH driver
- Support X86 in the Xilinx driver
- Support PXA1928 in the PXA driver
Extended drivers:
- max732x supports device tree probe
- sx150x supports device tree probe
Various minor cleanups and bug fixes"
* tag 'gpio-v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (61 commits)
gpio: kconfig: replace PPC_OF with PPC
gpio: pxa: add PXA1928 gpio type support
dt/bindings: gpio: add compatible string for marvell,pxa1928-gpio
gpio: pxa: remove mach IRQ includes
gpio: max732x: use an inline function for container cast
gpio: use sizeof() instead of hardcoded values
gpio: max732x: add set_multiple function
gpio: sch: Consolidate similar algorithms
gpio: tz1090-pdc: Use resource_size to fix off-by-one resource size calculation
gpio: ge: Convert to use devm_kstrdup
gpio: correctly use const char * const
gpio: sx150x: fixup OF support
gpio: mpc8xxx: Use of_mm_gpiochip_remove
gpio: Add Fujitsu MB86S7x GPIO driver
gpio: mpc8xxx: Convert to platform device interface.
gpio: zevio: Use of_mm_gpiochip_remove
gpio: gpio-mm-lantiq: Use of_mm_gpiochip_remove
gpio: gpio-mm-lantiq: Use of_property_read_u32
gpio: gpio-mm-lantiq: Do not replicate code
gpio :gpio-mm-lantiq: Use devm_kzalloc
...
|
|
Pull MMC updates from Ulf Hansson:
"MMC core:
- Support for MMC power sequences.
- SDIO function devicetree subnode parsing.
- Refactor the hardware reset routines and enable it for SD cards.
- Various code quality improvements, especially for slot-gpio.
MMC host:
- dw_mmc: Various fixes and cleanups.
- dw_mmc: Convert to mmc_send_tuning().
- moxart: Fix probe logic.
- sdhci: Various fixes and cleanups
- sdhci: Asynchronous request handling support.
- sdhci-pxav3: Various fixes and cleanups.
- sdhci-tegra: Fixes for T114, T124 and T132.
- rtsx: Various fixes and cleanups.
- rtsx: Support for SDIO.
- sdhi/tmio: Refactor and cleanup of header files.
- omap_hsmmc: Use slot-gpio and common MMC DT parser.
- Make all hosts to deal with errors from mmc_of_parse().
- sunxi: Various fixes and cleanups.
- sdhci: Support for Fujitsu SDHCI controller f_sdh30"
* tag 'mmc-v3.20-1' of git://git.linaro.org/people/ulf.hansson/mmc: (117 commits)
mmc: sdhci-s3c: solve problem with sleeping in atomic context
mmc: pwrseq: add driver for emmc hardware reset
mmc: moxart: fix probe logic
mmc: core: Invoke mmc_pwrseq_post_power_on() prior MMC_POWER_ON state
mmc: pwrseq_simple: Add optional reference clock support
mmc: pwrseq: Document optional clock for the simple power sequence
mmc: pwrseq_simple: Extend to support more pins
mmc: pwrseq: Document that simple sequence support more than one GPIO
mmc: Add hardware dependencies for sdhci-pxav3 and sdhci-pxav2
mmc: sdhci-pxav3: Modify clock settings for the SDR50 and DDR50 modes
mmc: sdhci-pxav3: Extend binding with SDIO3 conf reg for the Armada 38x
mmc: sdhci-pxav3: Fix Armada 38x controller's caps according to erratum ERR-7878951
mmc: sdhci-pxav3: Fix SDR50 and DDR50 capabilities for the Armada 38x flavor
mmc: sdhci: switch voltage before sdhci_set_ios in runtime resume
mmc: tegra: Write xfer_mode, CMD regs in together
mmc: Resolve BKOPS compatability issue
mmc: sdhci-pxav3: fix setting of pdata->clk_delay_cycles
mmc: dw_mmc: rockchip: remove incorrect __exit_p()
mmc: dw_mmc: exynos: remove incorrect __exit_p()
mmc: Fix menuconfig alignment of MMC_SDHCI_* options
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
"The first round of updates for the input subsystem.
A few new drivers (power button handler for AXP20x PMIC, tps65218
power button driver, sun4i keys driver, regulator haptic driver, NI
Ettus Research USRP E3x0 button, Alwinner A10/A20 PS/2 controller).
Updates to Synaptics and ALPS touchpad drivers (with more to come
later), brand new Focaltech PS/2 support, update to Cypress driver to
handle Gen5 (in addition to Gen3) devices, and number of other fixups
to various drivers as well as input core"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (54 commits)
Input: elan_i2c - fix wrong %p extension
Input: evdev - do not queue SYN_DROPPED if queue is empty
Input: gscps2 - fix MODULE_DEVICE_TABLE invocation
Input: synaptics - use dmax in input_mt_assign_slots
Input: pxa27x_keypad - remove unnecessary ARM includes
Input: ti_am335x_tsc - replace delta filtering with median filtering
ARM: dts: AM335x: Make charge delay a DT parameter for TSC
Input: ti_am335x_tsc - read charge delay from DT
Input: ti_am335x_tsc - remove udelay in interrupt handler
Input: ti_am335x_tsc - interchange touchscreen and ADC steps
Input: MT - add support for balanced slot assignment
Input: drv2667 - remove wrong and unneeded drv2667-haptics modalias
Input: drv260x - remove wrong and unneeded drv260x-haptics modalias
Input: cap11xx - remove wrong and unneeded cap11xx modalias
Input: sun4i-ts - add support for touchpanel controller on A31
Input: serio - add support for Alwinner A10/A20 PS/2 controller
Input: gtco - use sign_extend32() for sign extension
Input: elan_i2c - verify firmware signature applying it
Input: elantech - remove stale comment from Kconfig
Input: cyapa - off by one in cyapa_update_fw_store()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux
Pull fbdev changes from Tomi Valkeinen:
- omapdss: add DRA7xxx SoC support
- fbdev: support DMT (Display Monitor Timing) calculation
* tag 'fbdev-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: (40 commits)
omapfb: Return error code when applying overlay settings fails
OMAPDSS: DPI: DRA7xx support
OMAPDSS: HDMI: Add DRA7xx support
OMAPDSS: DISPC: program dispc polarities to control module
OMAPDSS: DISPC: Add DRA7xx support
OMAPDSS: Add Video PLLs for DRA7xx
OMAPDSS: Add functions for external control of PLL
OMAPDSS: DSS: Add DRA7xx base support
Doc/DT: Add DT binding doc for DRA7xx DSS
OMAPDSS: add define for DRA7xx HW version
OMAPDSS: encoder-tpd12s015: Fix race issue with LS_OE
OMAPDSS: OMAP5: fix digit output's allowed mgrs
OMAPDSS: constify port arrays
OMAPDSS: PLL: add dss_pll_wait_reset_done()
OMAPDSS: Add enum dss_pll_id
video: fbdev: fix sys_copyarea
video/mmpfb: allow modular build
fb: via: turn gpiolib and i2c selects into dependencies
fbdev: ssd1307fb: return proper error code if write command fails
fbdev: fix CVT vertical front and back porch values
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"In this batch, you can find lots of cleanups through the whole
subsystem, as our good New Year's resolution. Lots of LOCs and
commits are about LINE6 driver that was promoted finally from staging
tree, and as usual, there've been widely spread ASoC changes.
Here some highlights:
ALSA core changes
- Embedding struct device into ALSA core structures
- sequencer core cleanups / fixes
- PCM msbits constraints cleanups / fixes
- New SNDRV_PCM_TRIGGER_DRAIN command
- PCM kerneldoc fixes, header cleanups
- PCM code cleanups using more standard codes
- Control notification ID fixes
Driver cleanups
- Cleanups of PCI PM callbacks
- Timer helper usages cleanups
- Simplification (e.g. argument reduction) of many driver codes
HD-audio
- Hotkey and LED support on HP laptops with Realtek codecs
- Dock station support on HP laptops
- Toshiba Satellite S50D fixup
- Enhanced wallclock timestamp handling for HD-audio
- Componentization to simplify the linkage between i915 and hd-audio
drivers for Intel HDMI/DP
USB-audio
- Akai MPC Element support
- Enhanced timestamp handling
ASoC
- Lots of refactoringin ASoC core, moving drivers to more data driven
initialization and rationalizing a lot of DAPM usage
- Much improved handling of CDCLK clocks on Samsung I2S controllers
- Lots of driver specific cleanups and feature improvements
- CODEC support for TI PCM514x and TLV320AIC3104 devices
- Board support for Tegra systems with Realtek RT5677
- New driver for Maxim max98357a
- More enhancements / fixes for Intel SST driver
Others
- Promotion of LINE6 driver from staging along with lots of rewrites
and cleanups
- DT support for old non-ASoC atmel driver
- oxygen cleanups, XIO2001 init, Studio Evolution SE6x support
- Emu8000 DRAM size detection fix on ISA(!!) AWE64 boards
- A few more ak411x fixes for ice1724 boards"
* tag 'sound-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (542 commits)
ALSA: line6: toneport: Use explicit type for firmware version
ALSA: line6: Use explicit type for serial number
ALSA: line6: Return EIO if read/write not successful
ALSA: line6: Return error if device not responding
ALSA: line6: Add delay before reading status
ASoC: Intel: Clean data after SST fw fetch
ALSA: hda - Add docking station support for another HP machine
ALSA: control: fix failure to return new numerical ID in 'replace' event data
ALSA: usb: update trigger timestamp on first non-zero URB submitted
ALSA: hda: read trigger_timestamp immediately after starting DMA
ALSA: pcm: allow for trigger_tstamp snapshot in .trigger
ALSA: pcm: don't override timestamp unconditionally
ALSA: off by one bug in snd_riptide_joystick_probe()
ASoC: rt5670: Set use_single_rw flag for regmap
ASoC: rt286: Add rt288 codec support
ASoC: max98357a: Fix build in !CONFIG_OF case
ASoC: Intel: fix platform_no_drv_owner.cocci warnings
ARM: dts: Switch Odroid X2/U2 to simple-audio-card
ARM: dts: Exynos4 and Odroid X2/U3 sound device nodes update
ALSA: control: fix failure to return numerical ID in 'add' event
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- Some documentation updates and a few new pixel formats
- Stop btcx-risc abuse by cx88 and move it to bt8xx driver
- New platform driver: am437x
- New webcam driver: toptek
- New remote controller hardware protocols added to img-ir driver
- Removal of a few very old drivers that relies on old kABIs and are
for very hard to find hardware: parallel port webcam drivers
(bw-qcam, c-cam, pms and w9966), tlg2300, Video In/Out for SGI (vino)
- Removal of the USB Telegent driver (tlg2300). The company that
developed this driver has long gone and the hardware is hard to find.
As it relies on a legacy set of kABI symbols and nobody seems to care
about it, remove it.
- several improvements at rtl2832 driver
- conversion on cx28521 and au0828 to use videobuf2 (VB2)
- several improvements, fixups and board additions
* tag 'media/v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (321 commits)
[media] dvb_net: Convert local hex dump to print_hex_dump_debug
[media] dvb_net: Use standard debugging facilities
[media] dvb_net: Use vsprintf %pM extension to print Ethernet addresses
[media] staging: lirc_serial: adjust boolean assignments
[media] stb0899: use sign_extend32() for sign extension
[media] si2168: add support for 1.7MHz bandwidth
[media] si2168: return error if set_frontend is called with invalid parameters
[media] lirc_dev: avoid potential null-dereference
[media] mn88472: simplify bandwidth registers setting code
[media] dvb: tc90522: re-add symbol-rate report
[media] lmedm04: add read snr, signal strength and ber call backs
[media] lmedm04: Create frontend call back for read status
[media] lmedm04: create frontend callbacks for signal/snr/ber/ucblocks
[media] lmedm04: Fix usb_submit_urb BOGUS urb xfer, pipe 1 != type 3 in interrupt urb
[media] lmedm04: Increase Interupt due time to 200 msec
[media] cx88-dvb: whitespace cleanup
[media] rtl28xxu: properly initialize pdata
[media] rtl2832: declare functions as static
[media] rtl2830: declare functions as static
[media] rtl2832_sdr: add kernel-doc comments for platform_data
...
|
|
Pull power supply and reset changes from Sebastian Reichel:
"New drivers:
- charger driver for Maxim 77693
- battery gauge driver for LTC 2941/2943
- battery gauge driver for RT5033
- reset driver for R-Mobile platforms
Convert drivers to restart handler framework:
- arm-versatile
- at91
- st-poweroff
Misc:
- remove deprecated sun6i reboot driver
- use alarmtimer instead of rtc in charger-manager
- misc fixes"
* tag 'for-v3.20' of git://git.infradead.org/battery-2.6: (48 commits)
power_supply: 88pm860x: Fix leaked power supply on probe fail
power/reset: restart-poweroff: Remove arm dependencies
power/reset: st-poweroff: Fix misleading Kconfig description
power/reset: st-poweroff: Register with kernel restart handler
power/reset: Remove sun6i reboot driver
power/reset: at91: Register with kernel restart handler
power/reset: arm-versatile: Register with kernel restart handler
power: test_power: Use enum as index for array of supplies
Add devicetree binding documentation for the LTC2941/LTC2943 driver
Add LTC2941/LTC2943 Battery Gauge Driver
power/reset: brcmstb: Add support for old 65nm chips
power/reset: brcmstb: Use the DT "compatible" string to indicate bit positions
power/reset: brcmstb: Make the driver buildable on MIPS
power: charger-manager: Use alarmtimer for battery monitoring in suspend.
power/reset: at91-poweroff: Fix error handling and other compiler warnings
bq27x00_battery: Call power_supply_changed only when capacity changed
bq27x00_battery: fix register offset for bq27425
power: max14577: Remove SYSFS dependency from Kconfig
power: bq24190_charger: suppress build warning
power: reset: Add reset driver for R-Mobile platforms
...
|
|
Pull networking updates from David Miller:
1) More iov_iter conversion work from Al Viro.
[ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
wrong, and this pull actually adds an extra commit on top of the
branch I'm pulling to fix that up, so that the pre-merge state is
ok. - Linus ]
2) Various optimizations to the ipv4 forwarding information base trie
lookup implementation. From Alexander Duyck.
3) Remove sock_iocb altogether, from CHristoph Hellwig.
4) Allow congestion control algorithm selection via routing metrics.
From Daniel Borkmann.
5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.
6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.
7) Add xmit_more support to r8169, e1000, and e1000e drivers. From
Florian Westphal.
8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.
9) Add BPF packet actions to packet scheduler, from Jiri Pirko.
10) Add support for uniqu flow IDs to openvswitch, from Joe Stringer.
11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
Kwok.
12) More sanely handle out-of-window dupacks, which can result in
serious ACK storms. From Neal Cardwell.
13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
Patrick McHardy, and Thomas Graf.
14) Support xmit_more in be2net, from Sathya Perla.
15) Group Policy extensions for vxlan, from Thomas Graf.
16) Remove Checksum Offload support for vxlan, from Tom Herbert.
17) Like ipv4, support lockless transmit over ipv6 UDP sockets. From
Vlad Yasevich.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
crypto: fix af_alg_make_sg() conversion to iov_iter
ipv4: Namespecify TCP PMTU mechanism
i40e: Fix for stats init function call in Rx setup
tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
ipv6: Make __ipv6_select_ident static
ipv6: Fix fragment id assignment on LE arches.
bridge: Fix inability to add non-vlan fdb entry
net: Mellanox: Delete unnecessary checks before the function call "vunmap"
cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
net: dsa: Remove redundant phy_attach()
IB/mlx4: Reset flow support for IB kernel ULPs
IB/mlx4: Always use the correct port for mirrored multicast attachments
net/bonding: Fix potential bad memory access during bonding events
tipc: remove tipc_snprintf
tipc: nl compat add noop and remove legacy nl framework
tipc: convert legacy nl stats show to nl compat
tipc: convert legacy nl net id get to nl compat
tipc: convert legacy nl net id set to nl compat
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree changes from Jiri Kosina:
"Patches from trivial.git that keep the world turning around.
Mostly documentation and comment fixes, and a two corner-case code
fixes from Alan Cox"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
kexec, Kconfig: spell "architecture" properly
mm: fix cleancache debugfs directory path
blackfin: mach-common: ints-priority: remove unused function
doubletalk: probe failure causes OOPS
ARM: cache-l2x0.c: Make it clear that cache-l2x0 handles L310 cache controller
msdos_fs.h: fix 'fields' in comment
scsi: aic7xxx: fix comment
ARM: l2c: fix comment
ibmraid: fix writeable attribute with no store method
dynamic_debug: fix comment
doc: usbmon: fix spelling s/unpriviledged/unprivileged/
x86: init_mem_mapping(): use capital BIOS in comment
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull live patching infrastructure from Jiri Kosina:
"Let me provide a bit of history first, before describing what is in
this pile.
Originally, there was kSplice as a standalone project that implemented
stop_machine()-based patching for the linux kernel. This project got
later acquired, and the current owner is providing live patching as a
proprietary service, without any intentions to have their
implementation merged.
Then, due to rising user/customer demand, both Red Hat and SUSE
started working on their own implementation (not knowing about each
other), and announced first versions roughly at the same time [1] [2].
The principle difference between the two solutions is how they are
making sure that the patching is performed in a consistent way when it
comes to different execution threads with respect to the semantic
nature of the change that is being introduced.
In a nutshell, kPatch is issuing stop_machine(), then looking at
stacks of all existing processess, and if it decides that the system
is in a state that can be patched safely, it proceeds insterting code
redirection machinery to the patched functions.
On the other hand, kGraft provides a per-thread consistency during one
single pass of a process through the kernel and performs a lazy
contignuous migration of threads from "unpatched" universe to the
"patched" one at safe checkpoints.
If interested in a more detailed discussion about the consistency
models and its possible combinations, please see the thread that
evolved around [3].
It pretty quickly became obvious to the interested parties that it's
absolutely impractical in this case to have several isolated solutions
for one task to co-exist in the kernel. During a dedicated Live
Kernel Patching track at LPC in Dusseldorf, all the interested parties
sat together and came up with a joint aproach that would work for both
distro vendors. Steven Rostedt took notes [4] from this meeting.
And the foundation for that aproach is what's present in this pull
request.
It provides a basic infrastructure for function "live patching" (i.e.
code redirection), including API for kernel modules containing the
actual patches, and API/ABI for userspace to be able to operate on the
patches (look up what patches are applied, enable/disable them, etc).
It's relatively simple and minimalistic, as it's making use of
existing kernel infrastructure (namely ftrace) as much as possible.
It's also self-contained, in a sense that it doesn't hook itself in
any other kernel subsystem (it doesn't even touch any other code).
It's now implemented for x86 only as a reference architecture, but
support for powerpc, s390 and arm is already in the works (adding
arch-specific support basically boils down to teaching ftrace about
regs-saving).
Once this common infrastructure gets merged, both Red Hat and SUSE
have agreed to immediately start porting their current solutions on
top of this, abandoning their out-of-tree code. The plan basically is
that each patch will be marked by flag(s) that would indicate which
consistency model it is willing to use (again, the details have been
sketched out already in the thread at [3]).
Before this happens, the current codebase can be used to patch a large
group of secruity/stability problems the patches for which are not too
complex (in a sense that they don't introduce non-trivial change of
function's return value semantics, they don't change layout of data
structures, etc) -- this corresponds to LEAVE_FUNCTION &&
SWITCH_FUNCTION semantics described at [3].
This tree has been in linux-next since December.
[1] https://lkml.org/lkml/2014/4/30/477
[2] https://lkml.org/lkml/2014/7/14/857
[3] https://lkml.org/lkml/2014/11/7/354
[4] http://linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_LivePatching.txt
[ The core code is introduced by the three commits authored by Seth
Jennings, which got a lot of changes incorporated during numerous
respins and reviews of the initial implementation. All the followup
commits have materialized only after public tree has been created,
so they were not folded into initial three commits so that the
public tree doesn't get rebased ]"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: add missing newline to error message
livepatch: rename config to CONFIG_LIVEPATCH
livepatch: fix uninitialized return value
livepatch: support for repatching a function
livepatch: enforce patch stacking semantics
livepatch: change ARCH_HAVE_LIVE_PATCHING to HAVE_LIVE_PATCHING
livepatch: fix deferred module patching order
livepatch: handle ancient compilers with more grace
livepatch: kconfig: use bool instead of boolean
livepatch: samples: fix usage example comments
livepatch: MAINTAINERS: add git tree location
livepatch: use FTRACE_OPS_FL_IPMODIFY
livepatch: move x86 specific ftrace handler code to arch/x86
livepatch: samples: add sample live patching module
livepatch: kernel: add support for live patching
livepatch: kernel: add TAINT_LIVEPATCH
|
|
Merge misc updates from Andrew Morton:
"Bite-sized chunks this time, to avoid the MTA ratelimiting woes.
- fs/notify updates
- ocfs2
- some of MM"
That laconic "some MM" is mainly the removal of remap_file_pages(),
which is a big simplification of the VM, and which gets rid of a *lot*
of random cruft and special cases because we no longer support the
non-linear mappings that it used.
From a user interface perspective, nothing has changed, because the
remap_file_pages() syscall still exists, it's just done by emulating the
old behavior by creating a lot of individual small mappings instead of
one non-linear one.
The emulation is slower than the old "native" non-linear mappings, but
nobody really uses or cares about remap_file_pages(), and simplifying
the VM is a big advantage.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (78 commits)
memcg: zap memcg_slab_caches and memcg_slab_mutex
memcg: zap memcg_name argument of memcg_create_kmem_cache
memcg: zap __memcg_{charge,uncharge}_slab
mm/page_alloc.c: place zone_id check before VM_BUG_ON_PAGE check
mm: hugetlb: fix type of hugetlb_treat_as_movable variable
mm, hugetlb: remove unnecessary lower bound on sysctl handlers"?
mm: memory: merge shared-writable dirtying branches in do_wp_page()
mm: memory: remove ->vm_file check on shared writable vmas
xtensa: drop _PAGE_FILE and pte_file()-related helpers
x86: drop _PAGE_FILE and pte_file()-related helpers
unicore32: drop pte_file()-related helpers
um: drop _PAGE_FILE and pte_file()-related helpers
tile: drop pte_file()-related helpers
sparc: drop pte_file()-related helpers
sh: drop _PAGE_FILE and pte_file()-related helpers
score: drop _PAGE_FILE and pte_file()-related helpers
s390: drop pte_file()-related helpers
parisc: drop _PAGE_FILE and pte_file()-related helpers
openrisc: drop _PAGE_FILE and pte_file()-related helpers
nios2: drop _PAGE_FILE and pte_file()-related helpers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs update from Dave Chinner:
"This update contains:
- RENAME_EXCHANGE support
- Rework of the superblock logging infrastructure
- Rework of the XFS_IOCTL_SETXATTR implementation
* enables use inside user namespaces
* fixes inconsistencies setting extent size hints
- fixes for missing buffer type annotations used in log recovery
- more consolidation of libxfs headers
- preparation patches for block based PNFS support
- miscellaneous bug fixes and cleanups"
* tag 'xfs-for-linus-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (37 commits)
xfs: only trace buffer items if they exist
xfs: report proper f_files in statfs if we overshoot imaxpct
xfs: fix panic_mask documentation
xfs: xfs_ioctl_setattr_check_projid can be static
xfs: growfs should use synchronous transactions
xfs: fix behaviour of XFS_IOC_FSSETXATTR on directories
xfs: factor projid hint checking out of xfs_ioctl_setattr
xfs: factor extsize hint checking out of xfs_ioctl_setattr
xfs: XFS_IOCTL_SETXATTR can run in user namespaces
xfs: kill xfs_ioctl_setattr behaviour mask
xfs: disaggregate xfs_ioctl_setattr
xfs: factor out xfs_ioctl_setattr transaciton preamble
xfs: separate xflags from xfs_ioctl_setattr
xfs: FSX_NONBLOCK is not used
xfs: don't allocate an ioend for direct I/O completions
xfs: change kmem_free to use generic kvfree()
xfs: factor out a xfs_update_prealloc_flags() helper
xfs: remove incorrect error negation in attr_multi ioctl
xfs: set superblock buffer type correctly
xfs: set buf types when converting extent formats
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael Wysocki:
"We have a few new features this time, including a new SFI-based
cpufreq driver, a new devfreq driver for Tegra Activity Monitor, a new
devfreq class for providing its governors with raw utilization data
and a new ACPI driver for AMD SoCs.
Still, the majority of changes here are reworks of existing code to
make it more straightforward or to prepare it for implementing new
features on top of it. The primary example is the rework of ACPI
resources handling from Jiang Liu, Thomas Gleixner and Lv Zheng with
support for IOAPIC hotplug implemented on top of it, but there is
quite a number of changes of this kind in the cpufreq core, ACPICA,
ACPI EC driver, ACPI processor driver and the generic power domains
core code too.
The most active developer is Viresh Kumar with his cpufreq changes.
Specifics:
- Rework of the core ACPI resources parsing code to fix issues in it
and make using resource offsets more convenient and consolidation
of some resource-handing code in a couple of places that have grown
analagous data structures and code to cover the the same gap in the
core (Jiang Liu, Thomas Gleixner, Lv Zheng).
- ACPI-based IOAPIC hotplug support on top of the resources handling
rework (Jiang Liu, Yinghai Lu).
- ACPICA update to upstream release 20150204 including an interrupt
handling rework that allows drivers to install raw handlers for
ACPI GPEs which then become entirely responsible for the given GPE
and the ACPICA core code won't touch it (Lv Zheng, David E Box,
Octavian Purdila).
- ACPI EC driver rework to fix several concurrency issues and other
problems related to events handling on top of the ACPICA's new
support for raw GPE handlers (Lv Zheng).
- New ACPI driver for AMD SoCs analogous to the LPSS (Low-Power
Subsystem) driver for Intel chips (Ken Xue).
- Two minor fixes of the ACPI LPSS driver (Heikki Krogerus, Jarkko
Nikula).
- Two new blacklist entries for machines (Samsung 730U3E/740U3E and
510R) where the native backlight interface doesn't work correctly
while the ACPI one does (Hans de Goede).
- Rework of the ACPI processor driver's handling of idle states to
make the code more straightforward and less bloated overall (Rafael
J Wysocki).
- Assorted minor fixes related to ACPI and SFI (Andreas Ruprecht,
Andy Shevchenko, Hanjun Guo, Jan Beulich, Rafael J Wysocki, Yaowei
Bai).
- PCI core power management modification to avoid resuming (some)
runtime-suspended devices during system suspend if they are in the
right states already (Rafael J Wysocki).
- New SFI-based cpufreq driver for Intel platforms using SFI
(Srinidhi Kasagar).
- cpufreq core fixes, cleanups and simplifications (Viresh Kumar,
Doug Anderson, Wolfram Sang).
- SkyLake CPU support and other updates for the intel_pstate driver
(Kristen Carlson Accardi, Srinivas Pandruvada).
- cpufreq-dt driver cleanup (Markus Elfring).
- Init fix for the ARM big.LITTLE cpuidle driver (Sudeep Holla).
- Generic power domains core code fixes and cleanups (Ulf Hansson).
- Operating Performance Points (OPP) core code cleanups and kernel
documentation update (Nishanth Menon).
- New dabugfs interface to make the list of PM QoS constraints
available to user space (Nishanth Menon).
- New devfreq driver for Tegra Activity Monitor (Tomeu Vizoso).
- New devfreq class (devfreq_event) to provide raw utilization data
to devfreq governors (Chanwoo Choi).
- Assorted minor fixes and cleanups related to power management
(Andreas Ruprecht, Krzysztof Kozlowski, Rickard Strandqvist, Pavel
Machek, Todd E Brandt, Wonhong Kwon).
- turbostat updates (Len Brown) and cpupower Makefile improvement
(Sriram Raghunathan)"
* tag 'pm+acpi-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (151 commits)
tools/power turbostat: relax dependency on APERF_MSR
tools/power turbostat: relax dependency on invariant TSC
Merge branch 'pci/host-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into acpi-resources
tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS
tools/power turbostat: relax dependency on root permission
ACPI / video: Add disable_native_backlight quirk for Samsung 510R
ACPI / PM: Remove unneeded nested #ifdef
USB / PM: Remove unneeded #ifdef and associated dead code
intel_pstate: provide option to only use intel_pstate with HWP
ACPI / EC: Add GPE reference counting debugging messages
ACPI / EC: Add query flushing support
ACPI / EC: Refine command storm prevention support
ACPI / EC: Add command flushing support.
ACPI / EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag
ACPI: add AMD ACPI2Platform device support for x86 system
ACPI / table: remove duplicate NULL check for the handler of acpi_table_parse()
ACPI / EC: Update revision due to raw handler mode.
ACPI / EC: Reduce ec_poll() by referencing the last register access timestamp.
ACPI / EC: Fix several GPE handling issues by deploying ACPI_GPE_DISPATCH_RAW_HANDLER mode.
ACPICA: Events: Enable APIs to allow interrupt/polling adaptive request based GPE handling model
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI changes from Bjorn Helgaas:
"Enumeration
- Move domain assignment from arm64 to generic code (Lorenzo Pieralisi)
- ARM: Remove artificial dependency on pci_sys_data domain (Lorenzo Pieralisi)
- ARM: Move to generic PCI domains (Lorenzo Pieralisi)
- Generate uppercase hex for modalias var in uevent (Ricardo Ribalda Delgado)
- Add and use generic config accessors on ARM, PowerPC (Rob Herring)
Resource management
- Free resources on failure in of_pci_get_host_bridge_resources() (Lorenzo Pieralisi)
- Fix infinite loop with ROM image of size 0 (Michel Dänzer)
PCI device hotplug
- Handle surprise add even if surprise removal isn't supported (Bjorn Helgaas)
Virtualization
- Mark AMD/ATI VGA devices that don't reset on D3hot->D0 transition (Alex Williamson)
- Add DMA alias quirk for Adaptec 3405 (Alex Williamson)
- Add Wellsburg (X99) to Intel PCH root port ACS quirk (Alex Williamson)
- Add ACS quirk for Emulex NICs (Vasundhara Volam)
MSI
- Fail MSI-X mappings if there's no space assigned to MSI-X BAR (Yijing Wang)
Freescale Layerscape host bridge driver
- Fix platform_no_drv_owner.cocci warnings (Julia Lawall)
NVIDIA Tegra host bridge driver
- Remove unnecessary tegra_pcie_fixup_bridge() (Lucas Stach)
Renesas R-Car host bridge driver
- Fix error handling of irq_of_parse_and_map() (Dmitry Torokhov)
TI Keystone host bridge driver
- Fix error handling of irq_of_parse_and_map() (Dmitry Torokhov)
- Fix misspelling of current function in debug output (Julia Lawall)
Xilinx AXI host bridge driver
- Fix harmless format string warning (Arnd Bergmann)
Miscellaneous
- Use standard parsing functions for ASPM sysfs setters (Chris J Arges)
- Add pci_device_to_OF_node() stub for !CONFIG_OF (Kevin Hao)
- Delete unnecessary NULL pointer checks (Markus Elfring)
- Add and use defines for PCIe Max_Read_Request_Size (Rafał Miłecki)
- Include clk.h instead of clk-private.h (Stephen Boyd)"
* tag 'pci-v3.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (48 commits)
PCI: Add pci_device_to_OF_node() stub for !CONFIG_OF
PCI: xilinx: Convert to use generic config accessors
PCI: xgene: Convert to use generic config accessors
PCI: tegra: Convert to use generic config accessors
PCI: rcar: Convert to use generic config accessors
PCI: generic: Convert to use generic config accessors
powerpc/powermac: Convert PCI to use generic config accessors
powerpc/fsl_pci: Convert PCI to use generic config accessors
ARM: ks8695: Convert PCI to use generic config accessors
ARM: sa1100: Convert PCI to use generic config accessors
ARM: integrator: Convert PCI to use generic config accessors
PCI: versatile: Add DT-based ARM Versatile PB PCIe host driver
ARM: dts: versatile: add PCI controller binding
of/pci: Free resources on failure in of_pci_get_host_bridge_resources()
PCI: versatile: Add DT docs for ARM Versatile PB PCIe driver
PCI: Fail MSI-X mappings if there's no space assigned to MSI-X BAR
r8169: use PCI define for Max_Read_Request_Size
[SCSI] esas2r: use PCI define for Max_Read_Request_Size
tile: use PCI define for Max_Read_Request_Size
rapidio/tsi721: use PCI define for Max_Read_Request_Size
...
|
|
We don't create non-linear mappings anymore. Let's drop code which
handles them in rmap.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
remap_file_pages(2) was invented to be able efficiently map parts of
huge file into limited 32-bit virtual address space such as in database
workloads.
Nonlinear mappings are pain to support and it seems there's no
legitimate use-cases nowadays since 64-bit systems are widely available.
Let's drop it and get rid of all these special-cased code.
The patch replaces the syscall with emulation which creates new VMA on
each remap_file_pages(), unless they it can be merged with an adjacent
one.
I didn't find *any* real code that uses remap_file_pages(2) to test
emulation impact on. I've checked Debian code search and source of all
packages in ALT Linux. No real users: libc wrappers, mentions in
strace, gdb, valgrind and this kind of stuff.
There are few basic tests in LTP for the syscall. They work just fine
with emulation.
To test performance impact, I've written small test case which
demonstrate pretty much worst case scenario: map 4G shmfs file, write to
begin of every page pgoff of the page, remap pages in reverse order,
read every page.
The test creates 1 million of VMAs if emulation is in use, so I had to
set vm.max_map_count to 1100000 to avoid -ENOMEM.
Before: 23.3 ( +- 4.31% ) seconds
After: 43.9 ( +- 0.85% ) seconds
Slowdown: 1.88x
I believe we can live with that.
Test case:
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#define MB (1024UL * 1024)
#define SIZE (4096 * MB)
int main(int argc, char **argv)
{
unsigned long *p;
long i, pass;
for (pass = 0; pass < 10; pass++) {
p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (p == MAP_FAILED) {
perror("mmap");
return -1;
}
for (i = 0; i < SIZE / 4096; i++)
p[i * 4096 / sizeof(*p)] = i;
for (i = 0; i < SIZE / 4096; i++) {
if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096,
0, (SIZE - 4096 * (i + 1)) >> 12, 0)) {
perror("remap_file_pages");
return -1;
}
}
for (i = SIZE / 4096 - 1; i >= 0; i--)
assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1);
munmap(p, SIZE);
}
return 0;
}
[akpm@linux-foundation.org: fix spello]
[sasha.levin@oracle.com: initialize populate before usage]
[sasha.levin@oracle.com: grab file ref to prevent race while mmaping]
Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Armin Rigo <arigo@tunes.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
__generic_block_fiemap may spin very long time for large sparse files.
Without this patch an unprivileged user may abuse system resources simply
by spawning a vast number of unkilable busyloops (works on ext2/ext3):
truncate --size 1T test
for ((i=0;i<1024;i++))
do
filefrag test > /dev/null &
done
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add a mount option to support JBD2 feature:
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT. When this feature is opened, journal
commit block can be written to disk without waiting for descriptor blocks,
which can improve journal commit performance. This option will enable
'journal_checksum' internally.
Using the fs_mark benchmark, using journal_async_commit shows a 50%
improvement, the files per second go up from 215.2 to 317.5.
test script:
fs_mark -d /mnt/ocfs2/ -s 10240 -n 1000
default:
FSUse% Count Size Files/sec App Overhead
0 1000 10240 215.2 17878
with journal_async_commit option:
FSUse% Count Size Files/sec App Overhead
0 1000 10240 317.5 17881
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Signed-off-by: Weiwei Wang <wangww631@huawei.comm>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The inotify interface has changed a lot. The user interface was too
old, and the kernel interface was removed by Eric Paris in commit:
2dfc1ca inotify: remove inotify in kernel interface.
Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Wang Kai <morgan.wang@huawei.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Robert Love <robert.w.love@intel.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Prepare first round of input updates for 3.20.
|
|
* pm-cpufreq: (46 commits)
intel_pstate: provide option to only use intel_pstate with HWP
cpufreq-dt: Drop unnecessary check before cpufreq_cooling_unregister() invocation
cpufreq: Create for_each_governor()
cpufreq: Create for_each_policy()
cpufreq: Drop cpufreq_disabled() check from cpufreq_cpu_{get|put}()
cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
intel_pstate: honor user space min_perf_pct override on resume
intel_pstate: respect cpufreq policy request
intel_pstate: Add num_pstates to sysfs
intel_pstate: expose turbo range to sysfs
intel_pstate: Add support for SkyLake
cpufreq: stats: drop unnecessary locking
cpufreq: stats: don't update stats on false notifiers
cpufreq: stats: don't update stats from show_trans_table()
cpufreq: stats: time_in_state can't be NULL in cpufreq_stats_update()
cpufreq: stats: create sysfs group once we are ready
cpufreq: remove CPUFREQ_UPDATE_POLICY_CPU notifications
cpufreq: stats: drop 'cpu' field of struct cpufreq_stats
cpufreq: Remove (now) unused 'last_cpu' from struct cpufreq_policy
cpufreq: stats: rename 'struct cpufreq_stats' objects as 'stats'
...
|
|
* pm-sleep:
PM / hibernate: exclude freed pages from allocated pages printout
PM / sleep: export suspend_resume trace event
PM / sleep: Mention async suspend in PM_TRACE documentation
PM / hibernate: Remove unused function
* pm-runtime:
ACPI / PM: Remove unneeded nested #ifdef
USB / PM: Remove unneeded #ifdef and associated dead code
|
|
* pm-qos:
PM / QoS: Use lockdep asserts to find missing hold of power.lock
PM / QoS: Add debugfs support to view the list of constraints
* pm-opp:
PM / OPP: Assert RCU lock in exported functions
PM / OPP: Update kernel documentation
PM / OPP: Ensure consistent naming of static functions
PM / OPP: export dev_pm_opp_get_notifier
* pm-devfreq:
PM / devfreq: event: Add documentation for exynos-ppmu devfreq-event driver
devfreq: Fix build break of devfreq-event class
PM / devfreq: event: Add devfreq_event class
PM / devfreq: tegra: add devfreq driver for Tegra Activity Monitor
|
|
* acpi-doc:
MAINTAINERS / ACPI: add the necessary '/' according to entry rules
ACPI / Documentation: add a missing '='
* acpi-pm:
ACPI / sleep: mark acpi_sleep_dmi_check() __init
* acpi-pcc:
ACPI / PCC: Use pr_debug() for debug messages in pcc_init()
* acpi-tables:
ACPI / table: remove duplicate NULL check for the handler of acpi_table_parse()
|
|
|
|
into for-next
|
|
As already demonstrated with PCI [1] and the platform bus [2], a
driver_override property in sysfs can be used to bypass the id
matching of a device to a AMBA driver. This can be used by VFIO to
bind to any AMBA device requested by the user.
[1] http://lists-archives.com/linux-kernel/28030441-pci-introduce-new-device-binding-path-using-pci_dev-driver_override.html
[2] https://www.redhat.com/archives/libvir-list/2014-April/msg00382.html
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Reviewed-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata changes from Tejun Heo:
"Mostly driver-specific changes. Nothing too noteworthy.
This pull request contains three merges from for-3.19-fixes. The
first two are to pull ahci_xgene and sata_dwc_460ex fix commits which
are depended upon by later changes. The last one is to pull in a fix
patch which missed the v3.19-rc window"
* 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (24 commits)
ahci_xgene: Fix the dma state machine lockup for the ATA_CMD_SMART PIO mode command.
ata: libahci: Use of_platform_device_create only if supported
sata_mv: Delete unnecessary checks before the function call "phy_power_off"
ata: Delete unnecessary checks before the function call "pci_dev_put"
ata: pata_platform: fix owner module reference mismatch for scsi host
ata: ahci_platform: fix owner module reference mismatch for scsi host
pata_pdc2027x: Use 64-bit timekeeping
ata: libahci: Fix devres cleanup on failure
ata: libahci: Allow using multiple regulators
Documentation: bindings: Add the regulator property to the sub-nodes AHCI bindings
ata: libahci: Clean-up the ahci_platform_en/disable_phys functions
sata_rcar: extend PM methods
sata_dwc_460ex: disable compilation on ARM and ARM64
ata: libata-core: Remove unused function
sata_dwc_460ex: convert to devm_kzalloc in ->probe()
sata_dwc_460ex: remove extra message
sata_dwc_460ex: use np local variable in ->probe()
sata_dwc_460ex: fix most of the sparse warnings
sata_dwc_460ex: enable COMPILE_TEST for the driver
sata_dwc_460ex: remove redundant dev_set_drvdata
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
"Three mostly trivial patches. The biggest change is that blkio is now
initialized before memcg which will be needed to make memcg and blkcg
cooperate on writeback IOs"
* 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: add dummy css_put() for !CONFIG_CGROUPS
cgroup: reorder SUBSYS(blkio) in cgroup_subsys.h
Update of Documentation/cgroups/00-INDEX
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm changes from Ingo Molnar:
"The main changes in this cycle were the x86/entry and sysret
enhancements from Andy Lutomirski, see merge commits 772a9aca125 and
b57c0b5175dd for details"
[ Exectutive summary: IST exceptions that interrupt user space will run
on the regular kernel stack instead of the IST stack. Which
simplifies things particularly on return to user space.
The sysret cleanup ends up simplifying the logic on when we can use
sysret vs when we have to use iret. - Linus ]
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86_64, entry: Remove the syscall exit audit and schedule optimizations
x86_64, entry: Use sysret to return to userspace when possible
x86, traps: Fix ist_enter from userspace
x86, vdso: teach 'make clean' remove vdso64 binaries
x86_64 entry: Fix RCX for ptraced syscalls
x86: entry_64.S: fold SAVE_ARGS_IRQ macro into its sole user
x86: ia32entry.S: fix wrong symbolic constant usage: R11->ARGOFFSET
x86: entry_64.S: delete unused code
x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks
x86, traps: Add ist_begin_non_atomic and ist_end_non_atomic
x86: Clean up current_stack_pointer
x86, traps: Track entry into and exit from IST context
x86, entry: Switch stacks on a paranoid entry from userspace
|