Age | Commit message (Collapse) | Author | Files | Lines |
|
An upcoming patch changes and complicates the refcounting and especially
the "put page" aspects of it. In order to keep everything clean,
refactor the devmap page release routines:
* Rename put_devmap_managed_page() to page_is_devmap_managed(), and
limit the functionality to "read only": return a bool, with no side
effects.
* Add a new routine, put_devmap_managed_page(), to handle decrementing
the refcount for ZONE_DEVICE pages.
* Change callers (just release_pages() and put_page()) to check
page_is_devmap_managed() before calling the new
put_devmap_managed_page() routine. This is a performance point:
put_page() is a hot path, so we need to avoid non- inline function calls
where possible.
* Rename __put_devmap_managed_page() to free_devmap_managed_page(), and
limit the functionality to unconditionally freeing a devmap page.
This is originally based on a separate patch by Ira Weiny, which applied
to an early version of the put_user_page() experiments. Since then,
Jérôme Glisse suggested the refactoring described above.
Link: http://lkml.kernel.org/r/20200107224558.2362728-5-jhubbard@nvidia.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Suggested-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
At some point filemap_write_and_wait() and
filemap_write_and_wait_range() got the exact same implementation with
the exception of the range being specified in *_range()
Similar to other functions in fs.h which call *_range(..., 0,
LLONG_MAX), change filemap_write_and_wait() to be a static inline which
calls filemap_write_and_wait_range()
Link: http://lkml.kernel.org/r/20191129160713.30892-1-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There are users already and will be more of BITS_TO_BYTES() macro. Move
it to bitops.h for wider use.
In the case of ocfs2 the replacement is identical.
As for bnx2x, there are two places where floor version is used. In the
first case to calculate the amount of structures that can fit one memory
page. In this case obviously the ceiling variant is correct and
original code might have a potential bug, if amount of bits % 8 is not
0. In the second case the macro is used to calculate bytes transmitted
in one microsecond. This will work for all speeds which is multiply of
1Gbps without any change, for the rest new code will give ceiling value,
for instance 100Mbps will give 13 bytes, while old code gives 12 bytes
and the arithmetically correct one is 12.5 bytes. Further the value is
used to setup timer threshold which in any case has its own margins due
to certain resolution. I don't see here an issue with slightly shifting
thresholds for low speed connections, the card is supposed to utilize
highest available rate, which is usually 10Gbps.
Link: http://lkml.kernel.org/r/20200108121316.22411-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Without memcg, there is a one-to-one mapping between the bdi and
bdi_writeback structures. In this world, things are fairly
straightforward; the first thing bdi_unregister() does is to shutdown
the bdi_writeback structure (or wb), and part of that writeback ensures
that no other work queued against the wb, and that the wb is fully
drained.
With memcg, however, there is a one-to-many relationship between the bdi
and bdi_writeback structures; that is, there are multiple wb objects
which can all point to a single bdi. There is a refcount which prevents
the bdi object from being released (and hence, unregistered). So in
theory, the bdi_unregister() *should* only get called once its refcount
goes to zero (bdi_put will drop the refcount, and when it is zero,
release_bdi gets called, which calls bdi_unregister).
Unfortunately, del_gendisk() in block/gen_hd.c never got the memo about
the Brave New memcg World, and calls bdi_unregister directly. It does
this without informing the file system, or the memcg code, or anything
else. This causes the root wb associated with the bdi to be
unregistered, but none of the memcg-specific wb's are shutdown. So when
one of these wb's are woken up to do delayed work, they try to
dereference their wb->bdi->dev to fetch the device name, but
unfortunately bdi->dev is now NULL, thanks to the bdi_unregister()
called by del_gendisk(). As a result, *boom*.
Fortunately, it looks like the rest of the writeback path is perfectly
happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to
create a bdi_dev_name() function which can handle bdi->dev being NULL.
This also allows us to bulletproof the writeback tracepoints to prevent
them from dereferencing a NULL pointer and crashing the kernel if one is
tracing with memcg's enabled, and an iSCSI device dies or a USB storage
stick is pulled.
The most common way of triggering this will be hotremoval of a device
while writeback with memcg enabled is going on. It was triggering
several times a day in a heavily loaded production environment.
Google Bug Id: 145475544
Link: https://lore.kernel.org/r/20191227194829.150110-1-tytso@mit.edu
Link: http://lkml.kernel.org/r/20191228005211.163952-1-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull mmu_notifier updates from Jason Gunthorpe:
"This small series revises the names in mmu_notifier to make the code
clearer and more readable"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
mm/mmu_notifiers: Use 'interval_sub' as the variable for mmu_interval_notifier
mm/mmu_notifiers: Use 'subscription' as the variable name for mmu_notifier
mm/mmu_notifier: Rename struct mmu_notifier_mm to mmu_notifier_subscriptions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread management updates from Christian Brauner:
"Sargun Dhillon over the last cycle has worked on the pidfd_getfd()
syscall.
This syscall allows for the retrieval of file descriptors of a process
based on its pidfd. A task needs to have ptrace_may_access()
permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and
Andy) on the target.
One of the main use-cases is in combination with seccomp's user
notification feature. As a reminder, seccomp's user notification
feature was made available in v5.0. It allows a task to retrieve a
file descriptor for its seccomp filter. The file descriptor is usually
handed of to a more privileged supervising process. The supervisor can
then listen for syscall events caught by the seccomp filter of the
supervisee and perform actions in lieu of the supervisee, usually
emulating syscalls. pidfd_getfd() is needed to expand its uses.
There are currently two major users that wait on pidfd_getfd() and one
future user:
- Netflix, Sargun said, is working on a service mesh where users
should be able to connect to a dns-based VIP. When a user connects
to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be
redirected to an envoy process. This service mesh uses seccomp user
notifications and pidfd to intercept all connect calls and instead
of connecting them to 1.2.3.4:80 connects them to e.g.
127.0.0.1:8080.
- LXD uses the seccomp notifier heavily to intercept and emulate
mknod() and mount() syscalls for unprivileged containers/processes.
With pidfd_getfd() more uses-cases e.g. bridging socket connections
will be possible.
- The patchset has also seen some interest from the browser corner.
Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a
broker process. In the future glibc will start blocking all signals
during dlopen() rendering this type of sandbox impossible. Hence,
in the future Firefox will switch to a seccomp-user-nofication
based sandbox which also makes use of file descriptor retrieval.
The thread for this can be found at
https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html
With pidfd_getfd() it is e.g. possible to bridge socket connections
for the supervisee (binding to a privileged port) and taking actions
on file descriptors on behalf of the supervisee in general.
Sargun's first version was using an ioctl on pidfds but various people
pushed for it to be a proper syscall which he duely implemented as
well over various review cycles. Selftests are of course included.
I've also added instructions how to deal with merge conflicts below.
There's also a small fix coming from the kernel mentee project to
correctly annotate struct sighand_struct with __rcu to fix various
sparse warnings. We've received a few more such fixes and even though
they are mostly trivial I've decided to postpone them until after -rc1
since they came in rather late and I don't want to risk introducing
build warnings.
Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is
needed to avoid allocation recursions triggerable by storage drivers
that have userspace parts that run in the IO path (e.g. dm-multipath,
iscsi, etc). These allocation recursions deadlock the device.
The new prctl() allows such privileged userspace components to avoid
allocation recursions by setting the PF_MEMALLOC_NOIO and
PF_LESS_THROTTLE flags. The patch carries the necessary acks from the
relevant maintainers and is routed here as part of prctl()
thread-management."
* tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
sched.h: Annotate sighand_struct with __rcu
test: Add test for pidfd getfd
arch: wire up pidfd_getfd syscall
pid: Implement pidfd_getfd syscall
vfs, fdtable: Add fget_task helper
|
|
Pull io_uring updates from Jens Axboe:
- Support for various new opcodes (fallocate, openat, close, statx,
fadvise, madvise, openat2, non-vectored read/write, send/recv, and
epoll_ctl)
- Faster ring quiesce for fileset updates
- Optimizations for overflow condition checking
- Support for max-sized clamping
- Support for probing what opcodes are supported
- Support for io-wq backend sharing between "sibling" rings
- Support for registering personalities
- Lots of little fixes and improvements
* tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-block: (64 commits)
io_uring: add support for epoll_ctl(2)
eventpoll: support non-blocking do_epoll_ctl() calls
eventpoll: abstract out epoll_ctl() handler
io_uring: fix linked command file table usage
io_uring: support using a registered personality for commands
io_uring: allow registering credentials
io_uring: add io-wq workqueue sharing
io-wq: allow grabbing existing io-wq
io_uring/io-wq: don't use static creds/mm assignments
io-wq: make the io_wq ref counted
io_uring: fix refcounting with batched allocations at OOM
io_uring: add comment for drain_next
io_uring: don't attempt to copy iovec for READ/WRITE
io_uring: honor IOSQE_ASYNC for linked reqs
io_uring: prep req when do IOSQE_ASYNC
io_uring: use labeled array init in io_op_defs
io_uring: optimise sqe-to-req flags translation
io_uring: remove REQ_F_IO_DRAINED
io_uring: file switch work needs to get flushed on exit
io_uring: hide uring_fd in ctx
...
|
|
Pull SCSI updates from James Bottomley:
"This series is slightly unusual because it includes Arnd's compat
ioctl tree here:
1c46a2cf2dbd Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue
Excluding Arnd's changes, this is mostly an update of the usual
drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas.
There are a couple of core and base updates around error propagation
and atomicity in the attribute container base we use for the SCSI
transport classes.
The rest is minor changes and updates"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (149 commits)
scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask
scsi: hisi_sas: Add prints for v3 hw interrupt converge and automatic affinity
scsi: hisi_sas: Modify the file permissions of trigger_dump to write only
scsi: hisi_sas: Replace magic number when handle channel interrupt
scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with spin_lock/spin_unlock
scsi: hisi_sas: use threaded irq to process CQ interrupts
scsi: ufs: Use UFS device indicated maximum LU number
scsi: ufs: Add max_lu_supported in struct ufs_dev_info
scsi: ufs: Delete is_init_prefetch from struct ufs_hba
scsi: ufs: Inline two functions into their callers
scsi: ufs: Move ufshcd_get_max_pwr_mode() to ufshcd_device_params_init()
scsi: ufs: Split ufshcd_probe_hba() based on its called flow
scsi: ufs: Delete struct ufs_dev_desc
scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails
scsi: ufs-mediatek: enable low-power mode for hibern8 state
scsi: ufs: export some functions for vendor usage
scsi: ufs-mediatek: add dbg_register_dump implementation
scsi: qla2xxx: Fix a NULL pointer dereference in an error path
scsi: qla1280: Make checking for 64bit support consistent
scsi: megaraid_sas: Update driver version to 07.713.01.00-rc1
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest kunit updates from Shuah Khan:
"This kunit update consists of:
- Support for building kunit as a module from Alan Maguire
- AppArmor KUnit tests for policy unpack from Mike Salvatore"
* tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: building kunit as a module breaks allmodconfig
kunit: update documentation to describe module-based build
kunit: allow kunit to be loaded as a module
kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds
kunit: allow kunit tests to be loaded as a module
kunit: hide unexported try-catch interface in try-catch-impl.h
kunit: move string-stream.h to lib/kunit
apparmor: add AppArmor KUnit tests for policy unpack
|
|
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground
Pull y2038 updates from Arnd Bergmann:
"Core, driver and file system changes
These are updates to device drivers and file systems that for some
reason or another were not included in the kernel in the previous
y2038 series.
I've gone through all users of time_t again to make sure the kernel is
in a long-term maintainable state, replacing all remaining references
to time_t with safe alternatives.
Some related parts of the series were picked up into the nfsd, xfs,
alsa and v4l2 trees. A final set of patches in linux-mm removes the
now unused time_t/timeval/timespec types and helper functions after
all five branches are merged for linux-5.6, ensuring that no new users
get merged.
As a result, linux-5.6, or my backport of the patches to 5.4 [1],
should be the first release that can serve as a base for a 32-bit
system designed to run beyond year 2038, with a few remaining caveats:
- All user space must be compiled with a 64-bit time_t, which will be
supported in the coming musl-1.2 and glibc-2.32 releases, along
with installed kernel headers from linux-5.6 or higher.
- Applications that use the system call interfaces directly need to
be ported to use the time64 syscalls added in linux-5.1 in place of
the existing system calls. This impacts most users of futex() and
seccomp() as well as programming languages that have their own
runtime environment not based on libc.
- Applications that use a private copy of kernel uapi header files or
their contents may need to update to the linux-5.6 version, in
particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h,
linux/elfcore.h, linux/sockios.h, linux/timex.h and
linux/can/bcm.h.
- A few remaining interfaces cannot be changed to pass a 64-bit
time_t in a compatible way, so they must be configured to use
CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit
timestamps. Most importantly this impacts all users of 'struct
input_event'.
- All y2038 problems that are present on 64-bit machines also apply
to 32-bit machines. In particular this affects file systems with
on-disk timestamps using signed 32-bit seconds: ext4 with
ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs"
[1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame
* tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (21 commits)
Revert "drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC"
y2038: sh: remove timeval/timespec usage from headers
y2038: sparc: remove use of struct timex
y2038: rename itimerval to __kernel_old_itimerval
y2038: remove obsolete jiffies conversion functions
nfs: fscache: use timespec64 in inode auxdata
nfs: fix timstamp debug prints
nfs: use time64_t internally
sunrpc: convert to time64_t for expiry
drm/etnaviv: avoid deprecated timespec
drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC
drm/msm: avoid using 'timespec'
hfs/hfsplus: use 64-bit inode timestamps
hostfs: pass 64-bit timestamps to/from user space
packet: clarify timestamp overflow
tsacct: add 64-bit btime field
acct: stop using get_seconds()
um: ubd: use 64-bit time_t where possible
xtensa: ISS: avoid struct timeval
dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD
...
|
|
This adds IORING_OP_EPOLL_CTL, which can perform the same work as the
epoll_ctl(2) system call.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Also make it available outside of epoll, along with the helper that
decides if we need to copy the passed in epoll_event.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull openat2 support from Al Viro:
"This is the openat2() series from Aleksa Sarai.
I'm afraid that the rest of namei stuff will have to wait - it got
zero review the last time I'd posted #work.namei, and there had been a
leak in the posted series I'd caught only last weekend. I was going to
repost it on Monday, but the window opened and the odds of getting any
review during that... Oh, well.
Anyway, openat2 part should be ready; that _did_ get sane amount of
review and public testing, so here it comes"
From Aleksa's description of the series:
"For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown
flags are present[1].
This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road
to being added to openat(2).
Furthermore, the need for some sort of control over VFS's path
resolution (to avoid malicious paths resulting in inadvertent
breakouts) has been a very long-standing desire of many userspace
applications.
This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset
(which was a variant of David Drysdale's O_BENEATH patchset[4] which
was a spin-off of the Capsicum project[5]) with a few additions and
changes made based on the previous discussion within [6] as well as
others I felt were useful.
In line with the conclusions of the original discussion of
AT_NO_JUMPS, the flag has been split up into separate flags. However,
instead of being an openat(2) flag it is provided through a new
syscall openat2(2) which provides several other improvements to the
openat(2) interface (see the patch description for more details). The
following new LOOKUP_* flags are added:
LOOKUP_NO_XDEV:
Blocks all mountpoint crossings (upwards, downwards, or through
absolute links). Absolute pathnames alone in openat(2) do not
trigger this. Magic-link traversal which implies a vfsmount jump is
also blocked (though magic-link jumps on the same vfsmount are
permitted).
LOOKUP_NO_MAGICLINKS:
Blocks resolution through /proc/$pid/fd-style links. This is done
by blocking the usage of nd_jump_link() during resolution in a
filesystem. The term "magic-links" is used to match with the only
reference to these links in Documentation/, but I'm happy to change
the name.
It should be noted that this is different to the scope of
~LOOKUP_FOLLOW in that it applies to all path components. However,
you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
will *not* fail (assuming that no parent component was a
magic-link), and you will have an fd for the magic-link.
In order to correctly detect magic-links, the introduction of a new
LOOKUP_MAGICLINK_JUMPED state flag was required.
LOOKUP_BENEATH:
Disallows escapes to outside the starting dirfd's
tree, using techniques such as ".." or absolute links. Absolute
paths in openat(2) are also disallowed.
Conceptually this flag is to ensure you "stay below" a certain
point in the filesystem tree -- but this requires some additional
to protect against various races that would allow escape using
"..".
Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
can trivially beam you around the filesystem (breaking the
protection). In future, there might be similar safety checks done
as in LOOKUP_IN_ROOT, but that requires more discussion.
In addition, two new flags are added that expand on the above ideas:
LOOKUP_NO_SYMLINKS:
Does what it says on the tin. No symlink resolution is allowed at
all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this
can still be used with NOFOLLOW to open an fd for the symlink as
long as no parent path had a symlink component.
LOOKUP_IN_ROOT:
This is an extension of LOOKUP_BENEATH that, rather than blocking
attempts to move past the root, forces all such movements to be
scoped to the starting point. This provides chroot(2)-like
protection but without the cost of a chroot(2) for each filesystem
operation, as well as being safe against race attacks that
chroot(2) is not.
If a race is detected (as with LOOKUP_BENEATH) then an error is
generated, and similar to LOOKUP_BENEATH it is not permitted to
cross magic-links with LOOKUP_IN_ROOT.
The primary need for this is from container runtimes, which
currently need to do symlink scoping in userspace[7] when opening
paths in a potentially malicious container.
There is a long list of CVEs that could have bene mitigated by
having RESOLVE_THIS_ROOT (such as CVE-2017-1002101,
CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a
few).
In order to make all of the above more usable, I'm working on
libpathrs[8] which is a C-friendly library for safe path resolution.
It features a userspace-emulated backend if the kernel doesn't support
openat2(2). Hopefully we can get userspace to switch to using it, and
thus get openat2(2) support for free once it's ready.
Future work would include implementing things like
RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow
programs to be sure they don't hit DoSes though stale NFS handles)"
* 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Documentation: path-lookup: include new LOOKUP flags
selftests: add openat2(2) selftests
open: introduce openat2(2) syscall
namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
namei: LOOKUP_NO_XDEV: block mountpoint crossing
namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
namei: LOOKUP_NO_SYMLINKS: block symlink resolution
namei: allow set_root() to produce errors
namei: allow nd_jump_link() to produce errors
nsfs: clean-up ns_get_path() signature to return int
namei: only return -ECHILD from follow_dotdot_rcu()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big char/misc/whatever driver changes for 5.6-rc1
Included in here are loads of things from a variety of different
driver subsystems:
- soundwire updates
- binder updates
- nvmem updates
- firmware drivers updates
- extcon driver updates
- various misc driver updates
- fpga driver updates
- interconnect subsystem and driver updates
- bus driver updates
- uio driver updates
- mei driver updates
- w1 driver cleanups
- various other small driver updates
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (86 commits)
mei: me: add jasper point DID
char: hpet: Use flexible-array member
binder: fix log spam for existing debugfs file creation.
mei: me: add comet point (lake) H device ids
nvmem: add QTI SDAM driver
dt-bindings: nvmem: add binding for QTI SPMI SDAM
dt-bindings: imx-ocotp: Add i.MX8MP compatible
dt-bindings: soundwire: fix example
soundwire: cadence: fix kernel-doc parameter descriptions
soundwire: intel: report slave_ids for each link to SOF driver
siox: Use the correct style for SPDX License Identifier
w1: omap-hdq: Simplify driver with PM runtime autosuspend
firmware: stratix10-svc: Remove unneeded semicolon
firmware: google: Probe for a GSMI handler in firmware
firmware: google: Unregister driver_info on failure and exit in gsmi
firmware: google: Release devices before unregistering the bus
slimbus: qcom: add missed clk_disable_unprepare in remove
slimbus: Use the correct style for SPDX License Identifier
slimbus: qcom-ngd-ctrl: Use dma_request_chan() instead dma_request_slave_channel()
dt-bindings: SLIMBus: add slim devices optional properties
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is a small set of changes for 5.6-rc1 for the driver core and
some firmware subsystem changes.
Included in here are:
- device.h splitup like you asked for months ago
- devtmpfs minor cleanups
- firmware core minor changes
- debugfs fix for lockdown mode
- kernfs cleanup fix
- cpu topology minor fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (22 commits)
firmware: Rename FW_OPT_NOFALLBACK to FW_OPT_NOFALLBACK_SYSFS
devtmpfs: factor out common tail of devtmpfs_{create,delete}_node
devtmpfs: initify a bit
devtmpfs: simplify initialization of mount_dev
devtmpfs: factor out setup part of devtmpfsd()
devtmpfs: fix theoretical stale pointer deref in devtmpfsd()
driver core: platform: fix u32 greater or equal to zero comparison
cpu-topology: Don't error on more than CONFIG_NR_CPUS CPUs in device tree
debugfs: Return -EPERM when locked down
driver core: Print device when resources present in really_probe()
driver core: Fix test_async_driver_probe if NUMA is disabled
driver core: platform: Prevent resouce overflow from causing infinite loops
fs/kernfs/dir.c: Clean code by removing always true condition
component: do not dereference opaque pointer in debugfs
drivers/component: remove modular code
debugfs: Fix warnings when building documentation
device.h: move 'struct driver' stuff out to device/driver.h
device.h: move 'struct class' stuff out to device/class.h
device.h: move 'struct bus' stuff out to device/bus.h
device.h: move dev_printk()-like functions to dev_printk.h
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO updates from Greg KH:
"Here is the big staging/iio driver patches for 5.6-rc1
Included in here are:
- lots of new IIO drivers and updates for that subsystem
- the usual huge quantity of minor cleanups for staging drivers
- removal of the following staging drivers:
- isdn/avm
- isdn/gigaset
- isdn/hysdn
- octeon-usb
- octeon ethernet
Overall we deleted far more lines than we added, removing over 40k of
old and obsolete driver code.
All of these changes have been in linux-next for a while with no
reported issues"
* tag 'staging-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (353 commits)
staging: most: usb: check for NULL device
staging: next: configfs: fix release link
staging: most: core: fix logging messages
staging: most: core: remove container struct
staging: most: remove struct device core driver
staging: most: core: drop device reference
staging: most: remove device from interface structure
staging: comedi: drivers: fix spelling mistake "to" -> "too"
staging: exfat: remove fs_func struct.
staging: wilc1000: avoid mutex unlock without lock in wilc_wlan_handle_txq()
staging: wilc1000: return zero on success and non-zero on function failure
staging: axis-fifo: replace spinlock with mutex
staging: wilc1000: remove unused code prior to throughput enhancement in SPI
staging: wilc1000: added 'wilc_' prefix for 'struct assoc_resp' name
staging: wilc1000: move firmware API struct's to separate header file
staging: wilc1000: remove use of infinite loop conditions
staging: kpc2000: rename variables with kpc namespace
staging: vt6656: Remove memory buffer from vnt_download_firmware.
staging: vt6656: Just check NEWRSR_DECRYPTOK for RX_FLAG_DECRYPTED.
staging: vt6656: Use vnt_rx_tail struct for tail variables.
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here are the big set of tty and serial driver updates for 5.6-rc1
Included in here are:
- dummy_con cleanups (touches lots of arch code)
- sysrq logic cleanups (touches lots of serial drivers)
- samsung driver fixes (wasn't really being built)
- conmakeshash move to tty subdir out of scripts
- lots of small tty/serial driver updates
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (140 commits)
tty: n_hdlc: Use flexible-array member and struct_size() helper
tty: baudrate: SPARC supports few more baud rates
tty: baudrate: Synchronise baud_table[] and baud_bits[]
tty: serial: meson_uart: Add support for kernel debugger
serial: imx: fix a race condition in receive path
serial: 8250_bcm2835aux: Document struct bcm2835aux_data
serial: 8250_bcm2835aux: Use generic remapping code
serial: 8250_bcm2835aux: Allocate uart_8250_port on stack
serial: 8250_bcm2835aux: Suppress register_port error on -EPROBE_DEFER
serial: 8250_bcm2835aux: Suppress clk_get error on -EPROBE_DEFER
serial: 8250_bcm2835aux: Fix line mismatch on driver unbind
serial_core: Remove unused member in uart_port
vt: Correct comment documenting do_take_over_console()
vt: Delete comment referencing non-existent unbind_con_driver()
arch/xtensa/setup: Drop dummy_con initialization
arch/x86/setup: Drop dummy_con initialization
arch/unicore32/setup: Drop dummy_con initialization
arch/sparc/setup: Drop dummy_con initialization
arch/sh/setup: Drop dummy_con initialization
arch/s390/setup: Drop dummy_con initialization
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/Thunderbolt/PHY driver updates from Greg KH:
"Here is the big USB and Thunderbolt and PHY driver updates for
5.6-rc1.
With the advent of USB4, "Thunderbolt" has really become USB4, so the
renaming of the Kconfig option and starting to share subsystem code
has begun, hence both subsystems coming in through the same tree here.
PHY driver updates also touched USB drivers, so that is coming in
through here as well.
Major stuff included in here are:
- USB 4 initial support added (i.e. Thunderbolt)
- musb driver updates
- USB gadget driver updates
- PHY driver updates
- USB PHY driver updates
- lots of USB serial stuff fixed up
- USB typec updates
- USB-IP fixes
- lots of other smaller USB driver updates
All of these have been in linux-next for a while now (the usb-serial
tree is already tested in linux-next on its own before merged into
here), with no reported issues"
[ Removed an incorrect compile test enablement for PHY_EXYNOS5250_SATA
that causes configuration warnings - Linus ]
* tag 'usb-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (207 commits)
Doc: ABI: add usb charger uevent
usb: phy: show USB charger type for user
usb: cdns3: fix spelling mistake and rework grammar in text
usb: phy: phy-gpio-vbus-usb: Convert to GPIO descriptors
USB: serial: cyberjack: fix spelling mistake "To" -> "Too"
USB: serial: ir-usb: simplify endpoint check
USB: serial: ir-usb: make set_termios synchronous
USB: serial: ir-usb: fix IrLAP framing
USB: serial: ir-usb: fix link-speed handling
USB: serial: ir-usb: add missing endpoint sanity check
usb: typec: fusb302: fix "op-sink-microwatt" default that was in mW
usb: typec: wcove: fix "op-sink-microwatt" default that was in mW
usb: dwc3: pci: add ID for the Intel Comet Lake -V variant
usb: typec: tcpci: mask event interrupts when remove driver
usb: host: xhci-tegra: set MODULE_FIRMWARE for tegra186
usb: chipidea: add inline for ci_hdrc_host_driver_init if host is not defined
usb: chipidea: handle single role for usb role class
usb: musb: fix spelling mistake: "periperal" -> "peripheral"
phy: ti: j721e-wiz: Fix build error without CONFIG_OF_ADDRESS
USB: usbfs: Always unlink URBs in reverse order
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control updates from Linus Walleij:
"This is the bulk of pin control changes, nothing too exciting about
this.
Some changes hit arch/sh and arch/arm but are well isolated and
acknowledged by the respective arch maintainers.
Core changes:
- Dropped the chained IRQ setup callback into GPIOLIB as we got rid
of the last users of that in this changeset.
New drivers:
- New driver for Ingenic X1830.
- New driver for Freescale i.MX8MP.
Driver enhancements:
- Fix all remaining Intel drivers to pass their IRQ chips along with
the GPIO chips.
- Intel Baytrail allocates its irqchip dynamically.
- Intel Lynxpoint is thoroughly rewritten and modernized.
- Aspeed AST2600 pin muxing and configuration is much improved.
- Qualcomm SC7180 functions are updated and wakeup interrupt map is
provided.
- A whole slew of Renesas SH-PFC cleanups and improvements.
- Fix up the Intel DT bindings to use the generic YAML DT bindings
schema (a first user of this)"
* tag 'pinctrl-v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (99 commits)
pinctrl: madera: Remove extra blank line
pinctrl: qcom: Don't lock around irq_set_irq_wake()
pinctrl: mvebu: armada-37xx: use use platform api
gpio: Drop the chained IRQ handler assign function
pinctrl: freescale: Add i.MX8MP pinctrl driver support
dt-bindings: imx: Add pinctrl binding doc for i.MX8MP
pinctrl: tigerlake: Tiger Lake uses _HID enumeration
pinctrl: sunrisepoint: Add Coffee Lake-S ACPI ID
pinctrl: iproc: Use platform_get_irq_optional() to avoid error message
pinctrl: dt-bindings: Fix some errors in the lgm and pinmux schema
pinctrl: intel: Pass irqchip when adding gpiochip
pinctrl: intel: Add GPIO <-> pin mapping ranges via callback
pinctrl: baytrail: Replace WARN with dev_info_once when setting direct-irq pin to output
pinctrl: baytrail: Do not clear IRQ flags on direct-irq enabled pins
pinctrl: sunrisepoint: Add missing Interrupt Status register offset
pinctrl: sh-pfc: Split R-Car H3 support in two independent drivers
pinctrl: artpec6: fix __iomem on reg in set
pinctrl: ingenic: Use devm_platform_ioremap_resource()
pinctrl: ingenic: Factorize irq_set_type function
pinctrl: ingenic: Remove duplicated ingenic_chip_info structures
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for the v5.6 kernel cycle.
This is a pretty calm cycle so far, nothing special going on really.
Some more changes will come in from the irqchip and pin control trees.
I also deleted an orphan include file for FMC that was dangling since
subsystem was removed.
Core changes:
- Document the usecases for the kernelspace vs userspace handling of
GPIOs.
- Handle MSI (message signalled interrupts) properly in the core
hierarchical irqdomain code.
- Fix a rare race condition while initializing the descriptor array.
New drivers:
- Xylon LogiCVC GPIO driver.
- WDC934x GPIO controller driver.
Driver improvements:
- Implemented suspend/resume in the Tegra driver.
- MPC8xx edge detection fixup.
- Properly convert ThunderX to use hierarchical irqdomain with
GPIOLIB_IRQCHIP on top of the revert of the previous buggy
switchover. This time it works (hopefully).
Misc:
- Drop a FMC remnant file <linux/ipmi-fru.h>
- A slew of fixes"
* tag 'gpio-v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (48 commits)
MAINTAINERS: Replace Tien Hock Loh as Altera PIO maintainer
gpiolib: hold gpio devices lock until ->descs array is initialised
gpio: aspeed-sgpio: fixed typos
gpio: mvebu: clear irq in edge cause register before unmask edge irq
gpiolib: Lower verbosity when allocating hierarchy irq
gpiolib: Remove duplicated function gpio_do_set_config()
gpio: Fix the no return statement warning
gpio: wcd934x: Add support to wcd934x gpio controller
gpiolib: remove set but not used variable 'config'
gpio: vx855: fixed a typo
gpio: mockup: sort headers alphabetically
gpio: mockup: update the license tag
gpio: Remove the unused flags
gpiolib: Set lockdep class for hierarchical irq domains
gpio: thunderx: Switch to GPIOLIB_IRQCHIP
gpiolib: Add the support for the msi parent domain
gpiolib: Add support for the irqdomain which doesn't use irq_fwspec as arg
gpio: Add use guidance documentation
dt-bindings: gpio: wcd934x: Add bindings for gpio
gpio: altera: change to platform_get_irq_optional to avoid false-positive error
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull IMA updates from Mimi Zohar:
"Two new features - measuring certificates and querying IMA for a file
hash - and three bug fixes:
- Measuring certificates is like the rest of IMA, based on policy,
but requires loading a custom policy. Certificates loaded onto a
keyring, for example during early boot, before a custom policy has
been loaded, are queued and only processed after loading the custom
policy.
- IMA calculates and caches files hashes. Other kernel subsystems,
and possibly kernel modules, are interested in accessing these
cached file hashes.
The bug fixes prevent classifying a file short read (e.g. shutdown) as
an invalid file signature, add a missing blank when displaying the
securityfs policy rules containing LSM labels, and, lastly, fix the
handling of the IMA policy information for unknown LSM labels"
* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
IMA: Defined delayed workqueue to free the queued keys
IMA: Call workqueue functions to measure queued keys
IMA: Define workqueue for early boot key measurements
IMA: pre-allocate buffer to hold keyrings string
ima: ima/lsm policy rule loading logic bug fixes
ima: add the ability to query the cached hash of a given file
ima: Add a space after printing LSM rules for readability
IMA: fix measuring asymmetric keys Kconfig
IMA: Read keyrings= option from the IMA policy
IMA: Add support to limit measuring keys
KEYS: Call the IMA hook to measure keys
IMA: Define an IMA hook to measure keys
IMA: Add KEY_CHECK func to measure keys
IMA: Check IMA policy flag
ima: avoid appraise error for hash calc interrupt
|
|
For personalities previously registered via IORING_REGISTER_PERSONALITY,
allow any command to select them. This is done through setting
sqe->personality to the id returned from registration, and then flagging
sqe->flags with IOSQE_PERSONALITY.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If an application wants to use a ring with different kinds of
credentials, it can register them upfront. We don't lookup credentials,
the credentials of the task calling IORING_REGISTER_PERSONALITY is used.
An 'id' is returned for the application to use in subsequent personality
support.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If IORING_SETUP_ATTACH_WQ is set, it expects wq_fd in io_uring_params to
be a valid io_uring fd io-wq of which will be shared with the newly
created io_uring instance. If the flag is set but it can't share io-wq,
it fails.
This allows creation of "sibling" io_urings, where we prefer to keep the
SQ/CQ private, but want to share the async backend to minimize the amount
of overhead associated with having multiple rings that belong to the same
backend.
Reported-by: Jens Axboe <axboe@kernel.dk>
Reported-by: Daurnimator <quae@daurnimator.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We currently setup the io_wq with a static set of mm and creds. Even for
a single-use io-wq per io_uring, this is suboptimal as we have may have
multiple enters of the ring. For sharing the io-wq backend, it doesn't
work at all.
Switch to passing in the creds and mm when the work item is setup. This
means that async work is no longer deferred to the io_uring mm and creds,
it is done with the current mm and creds.
Flag this behavior with IORING_FEAT_CUR_PERSONALITY, so applications know
they can rely on the current personality (mm and creds) being the same
for direct issue and async issue.
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal updates from Daniel Lezcano:
- Depromote debug print on the db8500 platform (Linus Walleij)
- Fix compilation warning when compiling with make W=1 (Amit Kucheria)
- Code cleanup and refactoring, regmap conversion and add hwmon support
on Qoriq (Andrey Smirnov)
- Add an idle injection cpu cooling device and its documentation,
rename the cpu_cooling device to cpufreq_cooling device (Daniel
Lezcano)
- Convert unexported functions to static, add the __init annotation in
the thermal-of code and remove the pointless wrapper functions
(Daniel Lezcano)
- Fix register offset for Armada XP and register reset bit
initialization (Zak Hays)
- Enable hwmon on the rockchip (Stefan Schaeckeler)
- Add the thermal sensor for the H6/H5/H3/A64/A83T/R40 sun8i platform
and their device tree bindings, followed by a fix for the ths number
and the sparse warnings (Yangtao Li)
- Code cleansup for the sun8i and hwmon support (Yangtao Li)
- Silent some messages which are misleading given the changes made in
the previous version on generic-adc (Martin Blumenstingl)
- Rename exynos to Exynos (Krzysztof Kozlowski)
- Add the bcm2711 thermal driver with the device tree bindings (Stefan
Wahren)
- Use usleep_range() instead of udelay() as the call is always done in
a sleep-able context (Geert Uytterhoeven)
- Do code cleanup and re-organization to set the scene for a new
process for the brcmstb (Florian Fainelli)
- Fix bindings check issues on brcm (Stefan Wahren)
- Add Jasper Lake support on int340x (Nivedita Swaminathan)
- Add Comet Lake support on intel pch (Gayatri Kammela)
- Fix unmatched pci_release_region() on x86 (Chuhong Yuan)
- Remove temperature boundaries for rcar and rcar3 (Niklas Söderlund)
- Fix return value to -ENODEV when thermal_zone_of_sensor_register() is
called with the of-node is missing (Peter Mamonov)
- Code cleanup, interrupt bouncing, and better support on stm32 (Pascal
Paillet)
* tag 'thermal-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (66 commits)
thermal: stm32: Fix low threshold interrupt flood
thermal: stm32: Improve temperature computing
thermal: stm32: Handle multiple trip points
thermal: stm32: Disable interrupts at probe
thermal: stm32: Rework sensor mode management
thermal: stm32: Fix icifr register name
thermal: of: Make thermal_zone_of_sensor_register return -ENODEV if a sensor OF node is missing
thermal: rcar_gen3_thermal: Remove temperature bound
thermal: rcar_thermal: Remove temperature bound
thermal: intel: intel_pch_thermal: Add Comet Lake (CML) platform support
thermal: intel: Fix unmatched pci_release_region
thermal: int340x: processor_thermal: Add Jasper Lake support
dt-bindings: brcm,avs-ro-thermal: Fix binding check issues
thermal: brcmstb_thermal: Register different ops per process
thermal: brcmstb_thermal: Restructure interrupt registration
thermal: brcmstb_thermal: Add 16nm process thermal parameters
dt-bindings: thermal: Define BCM7216 thermal sensor compatible
thermal: brcmstb_thermal: Prepare to support a different process
thermal: brcmstb_thermal: Do not use DT coefficients
thermal: rcar_thermal: Use usleep_range() instead of udelay()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"As the diffstat shows we've had again a lot of works done for this
cycle: the majority of changes are the continued componentization and
code refactoring in ASoC, the tree-wide PCM API updates and cleanups
and SOF updates while a few ASoC driver updates are seen, too.
Here we go, some highlights:
Core:
- Finally y2038 support landed to ALSA ABI; some ioctls have been
extended and lots of tricks were applied
- Applying the new managed PCM buffer API to all drivers; the API
itself was already merged in 5.5
- The already deprecated dimension support in ALSA control API is
dropped completely now
- Verification of ALSA control elements to catch API misuses
ASoC:
- Further code refactorings and moving things to the component level
- Lots of updates and improvements on SOF / Intel drivers; now
including common HDMI driver and SoundWire support
- New driver support for Ingenic JZ4770, Mediatek MT6660, Qualcomm
WCD934x and WSA881x, and Realtek RT700, RT711, RT715, RT1011,
RT1015 and RT1308
HD-audio:
- Improved ring-buffer communications using waitqueue
- Drop the superfluous buffer preallocation on x86
Others:
- Many code cleanups, mostly constifications over the whole tree
- USB-audio: quirks for MOTU, Corsair Virtuoso, Line6 Helix
- FireWire: code refactoring for oxfw and dice drivers"
* tag 'sound-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (638 commits)
ALSA: usb-audio: add quirks for Line6 Helix devices fw>=2.82
ALSA: hda: Add Clevo W65_67SB the power_save blacklist
ASoC: soc-core: remove null_snd_soc_ops
ASoC: soc-pcm: add soc_rtd_trigger()
ASoC: soc-pcm: add soc_rtd_hw_free()
ASoC: soc-pcm: add soc_rtd_hw_params()
ASoC: soc-pcm: add soc_rtd_prepare()
ASoC: soc-pcm: add soc_rtd_shutdown()
ASoC: soc-pcm: add soc_rtd_startup()
ASoC: rt1015: add rt1015 amplifier driver
ASoC: madera: Correct some kernel doc
ASoC: topology: fix soc_tplg_fe_link_create() - link->dobj initialization order
ASoC: Intel: skl_hda_dsp_common: Fix global-out-of-bounds bug
ASoC: madera: Correct DMIC only input hook ups
ALSA: cs46xx: fix spelling mistake "to" -> "too"
ALSA: hda - Add docking station support for Lenovo Thinkpad T420s
ASoC: Add MediaTek MT6660 Speaker Amp Driver
ASoC: dt-bindings: rt5645: add suppliers
ASoC: max98090: fix deadlock in max98090_dapm_put_enum_double()
ASoC: dapm: add snd_soc_dapm_put_enum_double_locked
...
|
|
Pull networking updates from David Miller:
1) Add WireGuard
2) Add HE and TWT support to ath11k driver, from John Crispin.
3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.
4) Add variable window congestion control to TIPC, from Jon Maloy.
5) Add BCM84881 PHY driver, from Russell King.
6) Start adding netlink support for ethtool operations, from Michal
Kubecek.
7) Add XDP drop and TX action support to ena driver, from Sameeh
Jubran.
8) Add new ipv4 route notifications so that mlxsw driver does not have
to handle identical routes itself. From Ido Schimmel.
9) Add BPF dynamic program extensions, from Alexei Starovoitov.
10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.
11) Add support for macsec HW offloading, from Antoine Tenart.
12) Add initial support for MPTCP protocol, from Christoph Paasch,
Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.
13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
Cherian, and others.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
udp: segment looped gso packets correctly
netem: change mailing list
qed: FW 8.42.2.0 debug features
qed: rt init valid initialization changed
qed: Debug feature: ilt and mdump
qed: FW 8.42.2.0 Add fw overlay feature
qed: FW 8.42.2.0 HSI changes
qed: FW 8.42.2.0 iscsi/fcoe changes
qed: Add abstraction for different hsi values per chip
qed: FW 8.42.2.0 Additional ll2 type
qed: Use dmae to write to widebus registers in fw_funcs
qed: FW 8.42.2.0 Parser offsets modified
qed: FW 8.42.2.0 Queue Manager changes
qed: FW 8.42.2.0 Expose new registers and change windows
qed: FW 8.42.2.0 Internal ram offsets modifications
MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
Documentation: net: octeontx2: Add RVU HW and drivers overview
octeontx2-pf: ethtool RSS config support
octeontx2-pf: Add basic ethtool support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Removed CRYPTO_TFM_RES flags
- Extended spawn grabbing to all algorithm types
- Moved hash descsize verification into API code
Algorithms:
- Fixed recursive pcrypt dead-lock
- Added new 32 and 64-bit generic versions of poly1305
- Added cryptogams implementation of x86/poly1305
Drivers:
- Added support for i.MX8M Mini in caam
- Added support for i.MX8M Nano in caam
- Added support for i.MX8M Plus in caam
- Added support for A33 variant of SS in sun4i-ss
- Added TEE support for Raven Ridge in ccp
- Added in-kernel API to submit TEE commands in ccp
- Added AMD-TEE driver
- Added support for BCM2711 in iproc-rng200
- Added support for AES256-GCM based ciphers for chtls
- Added aead support on SEC2 in hisilicon"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (244 commits)
crypto: arm/chacha - fix build failured when kernel mode NEON is disabled
crypto: caam - add support for i.MX8M Plus
crypto: x86/poly1305 - emit does base conversion itself
crypto: hisilicon - fix spelling mistake "disgest" -> "digest"
crypto: chacha20poly1305 - add back missing test vectors and test chunking
crypto: x86/poly1305 - fix .gitignore typo
tee: fix memory allocation failure checks on drv_data and amdtee
crypto: ccree - erase unneeded inline funcs
crypto: ccree - make cc_pm_put_suspend() void
crypto: ccree - split overloaded usage of irq field
crypto: ccree - fix PM race condition
crypto: ccree - fix FDE descriptor sequence
crypto: ccree - cc_do_send_request() is void func
crypto: ccree - fix pm wrongful error reporting
crypto: ccree - turn errors to debug msgs
crypto: ccree - fix AEAD decrypt auth fail
crypto: ccree - fix typo in comment
crypto: ccree - fix typos in error msgs
crypto: atmel-{aes,sha,tdes} - Retire crypto_platform_data
crypto: x86/sha - Eliminate casts on asm implementations
...
|
|
git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fsverity updates from Eric Biggers:
- Optimize fs-verity sequential read performance by implementing
readahead of Merkle tree pages. This allows the Merkle tree to be
read in larger chunks.
- Optimize FS_IOC_ENABLE_VERITY performance in the uncached case by
implementing readahead of data pages.
- Allocate the hash requests from a mempool in order to eliminate the
possibility of allocation failures during I/O.
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fs-verity: use u64_to_user_ptr()
fs-verity: use mempool for hash requests
fs-verity: implement readahead of Merkle tree pages
fs-verity: implement readahead for FS_IOC_ENABLE_VERITY
|
|
Pull fscrypt updates from Eric Biggers:
- Extend the FS_IOC_ADD_ENCRYPTION_KEY ioctl to allow the raw key to be
provided via a keyring key.
- Prepare for the new dirhash method (SipHash of plaintext name) that
will be used by directories that are both encrypted and casefolded.
- Switch to a new format for "no-key names" that prepares for the new
dirhash method, and also fixes a longstanding bug where multiple
filenames could map to the same no-key name.
- Allow the crypto algorithms used by fscrypt to be built as loadable
modules when the fscrypt-capable filesystems are.
- Optimize fscrypt_zeroout_range().
- Various cleanups.
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: (26 commits)
fscrypt: improve format of no-key names
ubifs: allow both hash and disk name to be provided in no-key names
ubifs: don't trigger assertion on invalid no-key filename
fscrypt: clarify what is meant by a per-file key
fscrypt: derive dirhash key for casefolded directories
fscrypt: don't allow v1 policies with casefolding
fscrypt: add "fscrypt_" prefix to fname_encrypt()
fscrypt: don't print name of busy file when removing key
ubifs: use IS_ENCRYPTED() instead of ubifs_crypt_is_encrypted()
fscrypt: document gfp_flags for bounce page allocation
fscrypt: optimize fscrypt_zeroout_range()
fscrypt: remove redundant bi_status check
fscrypt: Allow modular crypto algorithms
fscrypt: include <linux/ioctl.h> in UAPI header
fscrypt: don't check for ENOKEY from fscrypt_get_encryption_info()
fscrypt: remove fscrypt_is_direct_key_policy()
fscrypt: move fscrypt_valid_enc_modes() to policy.c
fscrypt: check for appropriate use of DIRECT_KEY flag earlier
fscrypt: split up fscrypt_supported_policy() by policy version
fscrypt: introduce fscrypt_needs_contents_encryption()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Features, highlights:
- async discard
- "mount -o discard=async" to enable it
- freed extents are not discarded immediatelly, but grouped
together and trimmed later, with IO rate limiting
- the "sync" mode submits short extents that could have been
ignored completely by the device, for SATA prior to 3.1 the
requests are unqueued and have a big impact on performance
- the actual discard IO requests have been moved out of
transaction commit to a worker thread, improving commit latency
- IO rate and request size can be tuned by sysfs files, for now
enabled only with CONFIG_BTRFS_DEBUG as we might need to
add/delete the files and don't have a stable-ish ABI for
general use, defaults are conservative
- export device state info in sysfs, eg. missing, writeable
- no discard of extents known to be untouched on disk (eg. after
reservation)
- device stats reset is logged with process name and PID that called
the ioctl
Fixes:
- fix missing hole after hole punching and fsync when using NO_HOLES
- writeback: range cyclic mode could miss some dirty pages and lead
to OOM
- two more corner cases for metadata_uuid change after power loss
during the change
- fix infinite loop during fsync after mix of rename operations
Core changes:
- qgroup assign returns ENOTCONN when quotas not enabled, used to
return EINVAL that was confusing
- device closing does not need to allocate memory anymore
- snapshot aware code got removed, disabled for years due to
performance problems, reimplmentation will allow to select wheter
defrag breaks or does not break COW on shared extents
- tree-checker:
- check leaf chunk item size, cross check against number of
stripes
- verify location keys for DIR_ITEM, DIR_INDEX and XATTR items
- new self test for physical -> logical mapping code, used for super
block range exclusion
- assertion helpers/macros updated to avoid objtool "unreachable
code" reports on older compilers or config option combinations"
* tag 'for-5.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (84 commits)
btrfs: free block groups after free'ing fs trees
btrfs: Fix split-brain handling when changing FSID to metadata uuid
btrfs: Handle another split brain scenario with metadata uuid feature
btrfs: Factor out metadata_uuid code from find_fsid.
btrfs: Call find_fsid from find_fsid_inprogress
Btrfs: fix infinite loop during fsync after rename operations
btrfs: set trans->drity in btrfs_commit_transaction
btrfs: drop log root for dropped roots
btrfs: sysfs, add devid/dev_state kobject and device attributes
btrfs: Refactor btrfs_rmap_block to improve readability
btrfs: Add self-tests for btrfs_rmap_block
btrfs: selftests: Add support for dummy devices
btrfs: Move and unexport btrfs_rmap_block
btrfs: separate definition of assertion failure handlers
btrfs: device stats, log when stats are zeroed
btrfs: fix improper setting of scanned for range cyclic write cache pages
btrfs: safely advance counter when looking up bio csums
btrfs: remove unused member btrfs_device::work
btrfs: remove unnecessary wrapper get_alloc_profile
btrfs: add correction to handle -1 edge case in async discard
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Ingo Molnar:
"Misc changes:
- Enhance #GP fault printouts by distinguishing between canonical and
non-canonical address faults, and also add KASAN fault decoding.
- Fix/enhance the x86 NMI handler by putting the duration check into
a direct function call instead of an irq_work which we know to be
broken in some cases.
- Clean up do_general_protection() a bit"
* 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/nmi: Remove irq_work from the long duration NMI handler
x86/traps: Cleanup do_general_protection()
x86/kasan: Print original address on #GP
x86/dumpstack: Introduce die_addr() for die() with #GP fault address
x86/traps: Print address on #GP
x86/insn-eval: Add support for 64-bit kernel mode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Ingo Molnar:
"The main change in this tree is the extension of the resctrl procfs
ABI with a new file that helps tooling to navigate from tasks back to
resctrl groups: /proc/{pid}/cpu_resctrl_groups.
Also fix static key usage for certain feature combinations and
simplify the task exit resctrl case"
* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Add task resctrl information display
x86/resctrl: Check monitoring static key in the MBM overflow handler
x86/resctrl: Do not reconfigure exiting tasks
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"These were the main changes in this cycle:
- More -rt motivated separation of CONFIG_PREEMPT and
CONFIG_PREEMPTION.
- Add more low level scheduling topology sanity checks and warnings
to filter out nonsensical topologies that break scheduling.
- Extend uclamp constraints to influence wakeup CPU placement
- Make the RT scheduler more aware of asymmetric topologies and CPU
capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y
- Make idle CPU selection more consistent
- Various fixes, smaller cleanups, updates and enhancements - please
see the git log for details"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
sched/fair: Define sched_idle_cpu() only for SMP configurations
sched/topology: Assert non-NUMA topology masks don't (partially) overlap
idle: fix spelling mistake "iterrupts" -> "interrupts"
sched/fair: Remove redundant call to cpufreq_update_util()
sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled
sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP
sched/fair: calculate delta runnable load only when it's needed
sched/cputime: move rq parameter in irqtime_account_process_tick
stop_machine: Make stop_cpus() static
sched/debug: Reset watchdog on all CPUs while processing sysrq-t
sched/core: Fix size of rq::uclamp initialization
sched/uclamp: Fix a bug in propagating uclamp value in new cgroups
sched/fair: Load balance aggressively for SCHED_IDLE CPUs
sched/fair : Improve update_sd_pick_busiest for spare capacity case
watchdog: Remove soft_lockup_hrtimer_cnt and related code
sched/rt: Make RT capacity-aware
sched/fair: Make EAS wakeup placement consider uclamp restrictions
sched/fair: Make task_fits_capacity() consider uclamp restrictions
sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()
sched/uclamp: Make uclamp util helpers use and return UL values
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Ftrace is one of the last W^X violators (after this only KLP is
left). These patches move it over to the generic text_poke()
interface and thereby get rid of this oddity. This requires a
surprising amount of surgery, by Peter Zijlstra.
- x86/AMD PMUs: add support for 'Large Increment per Cycle Events' to
count certain types of events that have a special, quirky hw ABI
(by Kim Phillips)
- kprobes fixes by Masami Hiramatsu
Lots of tooling updates as well, the following subcommands were
updated: annotate/report/top, c2c, clang, record, report/top TUI,
sched timehist, tests; plus updates were done to the gtk ui, libperf,
headers and the parser"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
perf/x86/amd: Add support for Large Increment per Cycle Events
perf/x86/amd: Constrain Large Increment per Cycle events
perf/x86/intel/rapl: Add Comet Lake support
tracing: Initialize ret in syscall_enter_define_fields()
perf header: Use last modification time for timestamp
perf c2c: Fix return type for histogram sorting comparision functions
perf beauty sockaddr: Fix augmented syscall format warning
perf/ui/gtk: Fix gtk2 build
perf ui gtk: Add missing zalloc object
perf tools: Use %define api.pure full instead of %pure-parser
libperf: Setup initial evlist::all_cpus value
perf report: Fix no libunwind compiled warning break s390 issue
perf tools: Support --prefix/--prefix-strip
perf report: Clarify in help that --children is default
tools build: Fix test-clang.cpp with Clang 8+
perf clang: Fix build with Clang 9
kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic
tools lib: Fix builds when glibc contains strlcpy()
perf report/top: Make 'e' visible in the help and make it toggle showing callchains
perf report/top: Do not offer annotation for symbols without samples
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
"The main changes in this cycle were:
- Cleanup of the GOP [graphics output] handling code in the EFI stub
- Complete refactoring of the mixed mode handling in the x86 EFI stub
- Overhaul of the x86 EFI boot/runtime code
- Increase robustness for mixed mode code
- Add the ability to disable DMA at the root port level in the EFI
stub
- Get rid of RWX mappings in the EFI memory map and page tables,
where possible
- Move the support code for the old EFI memory mapping style into its
only user, the SGI UV1+ support code.
- plus misc fixes, updates, smaller cleanups.
... and due to interactions with the RWX changes, another round of PAT
cleanups make a guest appearance via the EFI tree - with no side
effects intended"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
efi/x86: Disable instrumentation in the EFI runtime handling code
efi/libstub/x86: Fix EFI server boot failure
efi/x86: Disallow efi=old_map in mixed mode
x86/boot/compressed: Relax sed symbol type regex for LLVM ld.lld
efi/x86: avoid KASAN false positives when accessing the 1: 1 mapping
efi: Fix handling of multiple efi_fake_mem= entries
efi: Fix efi_memmap_alloc() leaks
efi: Add tracking for dynamically allocated memmaps
efi: Add a flags parameter to efi_memory_map
efi: Fix comment for efi_mem_type() wrt absent physical addresses
efi/arm: Defer probe of PCIe backed efifb on DT systems
efi/x86: Limit EFI old memory map to SGI UV machines
efi/x86: Avoid RWX mappings for all of DRAM
efi/x86: Don't map the entire kernel text RW for mixed mode
x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd
efi/libstub/x86: Fix unused-variable warning
efi/libstub/x86: Use mandatory 16-byte stack alignment in mixed mode
efi/libstub/x86: Use const attribute for efi_is_64bit()
efi: Allow disabling PCI busmastering on bridges during boot
efi/x86: Allow translating 64-bit arguments for mixed mode calls
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"The RCU changes in this cycle were:
- Expedited grace-period updates
- kfree_rcu() updates
- RCU list updates
- Preemptible RCU updates
- Torture-test updates
- Miscellaneous fixes
- Documentation updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
rcu: Remove unused stop-machine #include
powerpc: Remove comment about read_barrier_depends()
.mailmap: Add entries for old paulmck@kernel.org addresses
srcu: Apply *_ONCE() to ->srcu_last_gp_end
rcu: Switch force_qs_rnp() to for_each_leaf_node_cpu_mask()
rcu: Move rcu_{expedited,normal} definitions into rcupdate.h
rcu: Move gp_state_names[] and gp_state_getname() to tree_stall.h
rcu: Remove the declaration of call_rcu() in tree.h
rcu: Fix tracepoint tracking RCU CPU kthread utilization
rcu: Fix harmless omission of "CONFIG_" from #if condition
rcu: Avoid tick_dep_set_cpu() misordering
rcu: Provide wrappers for uses of ->rcu_read_lock_nesting
rcu: Use READ_ONCE() for ->expmask in rcu_read_unlock_special()
rcu: Clear ->rcu_read_unlock_special only once
rcu: Clear .exp_hint only when deferred quiescent state has been reported
rcu: Rename some instance of CONFIG_PREEMPTION to CONFIG_PREEMPT_RCU
rcu: Remove kfree_call_rcu_nobatch()
rcu: Remove kfree_rcu() special casing and lazy-callback handling
rcu: Add support for debug_objects debugging for kfree_rcu()
rcu: Add multiple in-flight batches of kfree_rcu() work
...
|
|
Multicast and broadcast packets can be looped from egress to ingress
pre segmentation with dev_loopback_xmit. That function unconditionally
sets ip_summed to CHECKSUM_UNNECESSARY.
udp_rcv_segment segments gso packets in the udp rx path. Segmentation
usually executes on egress, and does not expect packets of this type.
__udp_gso_segment interprets !CHECKSUM_PARTIAL as CHECKSUM_NONE. But
the offsets are not correct for gso_make_checksum.
UDP GSO packets are of type CHECKSUM_PARTIAL, with their uh->check set
to the correct pseudo header checksum. Reset ip_summed to this type.
(CHECKSUM_PARTIAL is allowed on ingress, see comments in skbuff.h)
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: cf329aa42b66 ("udp: cope with UDP GRO packet misdirection")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are several storage drivers like dm-multipath, iscsi, tcmu-runner,
amd nbd that have userspace components that can run in the IO path. For
example, iscsi and nbd's userspace deamons may need to recreate a socket
and/or send IO on it, and dm-multipath's daemon multipathd may need to
send SG IO or read/write IO to figure out the state of paths and re-set
them up.
In the kernel these drivers have access to GFP_NOIO/GFP_NOFS and the
memalloc_*_save/restore functions to control the allocation behavior,
but for userspace we would end up hitting an allocation that ended up
writing data back to the same device we are trying to allocate for.
The device is then in a state of deadlock, because to execute IO the
device needs to allocate memory, but to allocate memory the memory
layers want execute IO to the device.
Here is an example with nbd using a local userspace daemon that performs
network IO to a remote server. We are using XFS on top of the nbd device,
but it can happen with any FS or other modules layered on top of the nbd
device that can write out data to free memory. Here a nbd daemon helper
thread, msgr-worker-1, is performing a write/sendmsg on a socket to execute
a request. This kicks off a reclaim operation which results in a WRITE to
the nbd device and the nbd thread calling back into the mm layer.
[ 1626.609191] msgr-worker-1 D 0 1026 1 0x00004000
[ 1626.609193] Call Trace:
[ 1626.609195] ? __schedule+0x29b/0x630
[ 1626.609197] ? wait_for_completion+0xe0/0x170
[ 1626.609198] schedule+0x30/0xb0
[ 1626.609200] schedule_timeout+0x1f6/0x2f0
[ 1626.609202] ? blk_finish_plug+0x21/0x2e
[ 1626.609204] ? _xfs_buf_ioapply+0x2e6/0x410
[ 1626.609206] ? wait_for_completion+0xe0/0x170
[ 1626.609208] wait_for_completion+0x108/0x170
[ 1626.609210] ? wake_up_q+0x70/0x70
[ 1626.609212] ? __xfs_buf_submit+0x12e/0x250
[ 1626.609214] ? xfs_bwrite+0x25/0x60
[ 1626.609215] xfs_buf_iowait+0x22/0xf0
[ 1626.609218] __xfs_buf_submit+0x12e/0x250
[ 1626.609220] xfs_bwrite+0x25/0x60
[ 1626.609222] xfs_reclaim_inode+0x2e8/0x310
[ 1626.609224] xfs_reclaim_inodes_ag+0x1b6/0x300
[ 1626.609227] xfs_reclaim_inodes_nr+0x31/0x40
[ 1626.609228] super_cache_scan+0x152/0x1a0
[ 1626.609231] do_shrink_slab+0x12c/0x2d0
[ 1626.609233] shrink_slab+0x9c/0x2a0
[ 1626.609235] shrink_node+0xd7/0x470
[ 1626.609237] do_try_to_free_pages+0xbf/0x380
[ 1626.609240] try_to_free_pages+0xd9/0x1f0
[ 1626.609245] __alloc_pages_slowpath+0x3a4/0xd30
[ 1626.609251] ? ___slab_alloc+0x238/0x560
[ 1626.609254] __alloc_pages_nodemask+0x30c/0x350
[ 1626.609259] skb_page_frag_refill+0x97/0xd0
[ 1626.609274] sk_page_frag_refill+0x1d/0x80
[ 1626.609279] tcp_sendmsg_locked+0x2bb/0xdd0
[ 1626.609304] tcp_sendmsg+0x27/0x40
[ 1626.609307] sock_sendmsg+0x54/0x60
[ 1626.609308] ___sys_sendmsg+0x29f/0x320
[ 1626.609313] ? sock_poll+0x66/0xb0
[ 1626.609318] ? ep_item_poll.isra.15+0x40/0xc0
[ 1626.609320] ? ep_send_events_proc+0xe6/0x230
[ 1626.609322] ? hrtimer_try_to_cancel+0x54/0xf0
[ 1626.609324] ? ep_read_events_proc+0xc0/0xc0
[ 1626.609326] ? _raw_write_unlock_irq+0xa/0x20
[ 1626.609327] ? ep_scan_ready_list.constprop.19+0x218/0x230
[ 1626.609329] ? __hrtimer_init+0xb0/0xb0
[ 1626.609331] ? _raw_spin_unlock_irq+0xa/0x20
[ 1626.609334] ? ep_poll+0x26c/0x4a0
[ 1626.609337] ? tcp_tsq_write.part.54+0xa0/0xa0
[ 1626.609339] ? release_sock+0x43/0x90
[ 1626.609341] ? _raw_spin_unlock_bh+0xa/0x20
[ 1626.609342] __sys_sendmsg+0x47/0x80
[ 1626.609347] do_syscall_64+0x5f/0x1c0
[ 1626.609349] ? prepare_exit_to_usermode+0x75/0xa0
[ 1626.609351] entry_SYSCALL_64_after_hwframe+0x44/0xa9
This patch adds a new prctl command that daemons can use after they have
done their initial setup, and before they start to do allocations that
are in the IO path. It sets the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE
flags so both userspace block and FS threads can use it to avoid the
allocation recursion and try to prevent from being throttled while
writing out data to free up memory.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Masato Suzuki <masato.suzuki@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20191112001900.9206-1-mchristi@redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The interrupt departement provides:
- A mechanism to shield isolated tasks from managed interrupts:
The affinity of managed interrupts is completely controlled by the
kernel and user space has no influence on them. The reason is that
the automatically assigned affinity correlates to the multi-queue
CPU handling of block devices.
If the generated affinity mask spaws both housekeeping and isolated
CPUs the interrupt could be routed to an isolated CPU which would
then be disturbed by I/O submitted by a housekeeping CPU.
The new mechamism ensures that as long as one housekeeping CPU is
online in the assigned affinity mask the interrupt is routed to a
housekeeping CPU.
If there is no online housekeeping CPU in the affinity mask, then
the interrupt is routed to an isolated CPU to keep the device queue
intact, but unless the isolated CPU submits I/O by itself these
interrupts are not raised.
- A small addon to the device tree irqdomain core code to avoid
duplication in irq chip drivers
- Conversion of the SiFive PLIC to hierarchical domains
- The usual pile of new irq chip drivers: SiFive GPIO, Aspeed SCI,
NXP INTMUX, Meson A1 GPIO
- The first cut of support for the new ARM GICv4.1
- The usual pile of fixes and improvements in core and driver code"
* tag 'irq-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
genirq, sched/isolation: Isolate from handling managed interrupts
irqchip/gic-v4.1: Allow direct invalidation of VLPIs
irqchip/gic-v4.1: Suppress per-VLPI doorbell
irqchip/gic-v4.1: Add VPE INVALL callback
irqchip/gic-v4.1: Add VPE eviction callback
irqchip/gic-v4.1: Add VPE residency callback
irqchip/gic-v4.1: Add mask/unmask doorbell callbacks
irqchip/gic-v4.1: Plumb skeletal VPE irqchip
irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP
irqchip/gic-v4.1: Don't use the VPE proxy if RVPEID is set
irqchip/gic-v4.1: Implement the v4.1 flavour of VMAPP
irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation
irqchip/gic-v3: Add GICv4.1 VPEID size discovery
irqchip/gic-v3: Detect GICv4.1 supporting RVPEID
irqchip/gic-v3-its: Fix get_vlpi_map() breakage with doorbells
irqdomain: Fix a memory leak in irq_domain_push_irq()
irqchip: Add NXP INTMUX interrupt multiplexer support
dt-bindings: interrupt-controller: Add binding for NXP INTMUX interrupt multiplexer
irqchip: Define EXYNOS_IRQ_COMBINER
irqchip/meson-gpio: Add support for meson a1 SoCs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core SMP updates from Thomas Gleixner:
"A small set of SMP core code changes:
- Rework the smp function call core code to avoid the allocation of
an additional cpumask
- Remove the not longer required GFP argument from on_each_cpu_cond()
and on_each_cpu_cond_mask() and fixup the callers"
* tag 'smp-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp: Remove allocation mask from on_each_cpu_cond.*()
smp: Add a smp_cond_func_t argument to smp_call_function_many()
smp: Use smp_cond_func_t as type for the conditional function
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"The timekeeping and timers departement provides:
- Time namespace support:
If a container migrates from one host to another then it expects
that clocks based on MONOTONIC and BOOTTIME are not subject to
disruption. Due to different boot time and non-suspended runtime
these clocks can differ significantly on two hosts, in the worst
case time goes backwards which is a violation of the POSIX
requirements.
The time namespace addresses this problem. It allows to set offsets
for clock MONOTONIC and BOOTTIME once after creation and before
tasks are associated with the namespace. These offsets are taken
into account by timers and timekeeping including the VDSO.
Offsets for wall clock based clocks (REALTIME/TAI) are not provided
by this mechanism. While in theory possible, the overhead and code
complexity would be immense and not justified by the esoteric
potential use cases which were discussed at Plumbers '18.
The overhead for tasks in the root namespace (ie where host time
offsets = 0) is in the noise and great effort was made to ensure
that especially in the VDSO. If time namespace is disabled in the
kernel configuration the code is compiled out.
Kudos to Andrei Vagin and Dmitry Sofanov who implemented this
feature and kept on for more than a year addressing review
comments, finding better solutions. A pleasant experience.
- Overhaul of the alarmtimer device dependency handling to ensure
that the init/suspend/resume ordering is correct.
- A new clocksource/event driver for Microchip PIT64
- Suspend/resume support for the Hyper-V clocksource
- The usual pile of fixes, updates and improvements mostly in the
driver code"
* tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n
alarmtimer: Use wakeup source from alarmtimer platform device
alarmtimer: Make alarmtimer platform device child of RTC device
alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality
hrtimer: Add missing sparse annotation for __run_timer()
lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres()
MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel
clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC
clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources
clocksource/drivers/timer-microchip-pit64b: Fix sparse warning
clocksource/drivers/exynos_mct: Rename Exynos to lowercase
clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access
clocksource/drivers/timer-ti-dm: Switch to platform_get_irq
clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource
clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe
clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource
clocksource/drivers/bcm2835_timer: Fix memory leak of timer
clocksource/drivers/cadence-ttc: Use ttc driver as platform driver
clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support
clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two fixes for the generic VDSO code which missed 5.5:
- Make the update to the coarse timekeeper unconditional.
This is required because the coarse timekeeper interfaces in the
VDSO do not depend on a VDSO capable clocksource. If the system
does not have a VDSO capable clocksource and the update is
depending on the VDSO capable clocksource, the coarse VDSO
interfaces would operate on stale data forever.
- Invert the logic of __arch_update_vdso_data() to avoid further head
scratching.
Tripped over this several times while analyzing the update problem
above"
* tag 'timers-urgent-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/vdso: Update coarse timekeeper unconditionally
lib/vdso: Make __arch_update_vdso_data() logic understandable
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux update from Paul Moore:
"This is one of the bigger SELinux pull requests in recent years with
28 patches. Everything is passing our test suite and the highlights
are below:
- Mark CONFIG_SECURITY_SELINUX_DISABLE as deprecated. We're some time
away from actually attempting to remove this in the kernel, but the
only distro we know that still uses it (Fedora) is working on
moving away from this so we want to at least let people know we are
planning to remove it.
- Reorder the SELinux hooks to help prevent bad things when SELinux
is disabled at runtime. The proper fix is to remove the
CONFIG_SECURITY_SELINUX_DISABLE functionality (see above) and just
take care of it at boot time (e.g. "selinux=0").
- Add SELinux controls for the kernel lockdown functionality,
introducing a new SELinux class/permissions: "lockdown { integrity
confidentiality }".
- Add a SELinux control for move_mount(2) that reuses the "file {
mounton }" permission.
- Improvements to the SELinux security label data store lookup
functions to speed up translations between our internal label
representations and the visible string labels (both directions).
- Revisit a previous fix related to SELinux inode auditing and
permission caching and do it correctly this time.
- Fix the SELinux access decision cache to cleanup properly on error.
In some extreme cases this could limit the cache size and result in
a decrease in performance.
- Enable SELinux per-file labeling for binderfs.
- The SELinux initialized and disabled flags were wrapped with
accessors to ensure they are accessed correctly.
- Mark several key SELinux structures with __randomize_layout.
- Changes to the LSM build configuration to only build
security/lsm_audit.c when needed.
- Changes to the SELinux build configuration to only build the IB
object cache when CONFIG_SECURITY_INFINIBAND is enabled.
- Move a number of single-caller functions into their callers.
- Documentation fixes (/selinux -> /sys/fs/selinux).
- A handful of cleanup patches that aren't worth mentioning on their
own, the individual descriptions have plenty of detail"
* tag 'selinux-pr-20200127' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (28 commits)
selinux: fix regression introduced by move_mount(2) syscall
selinux: do not allocate ancillary buffer on first load
selinux: remove redundant allocation and helper functions
selinux: remove redundant selinux_nlmsg_perm
selinux: fix wrong buffer types in policydb.c
selinux: reorder hooks to make runtime disable less broken
selinux: treat atomic flags more carefully
selinux: make default_noexec read-only after init
selinux: move ibpkeys code under CONFIG_SECURITY_INFINIBAND.
selinux: remove redundant msg_msg_alloc_security
Documentation,selinux: fix references to old selinuxfs mount point
selinux: deprecate disabling SELinux and runtime
selinux: allow per-file labelling for binderfs
selinuxfs: use scnprintf to get real length for inode
selinux: remove set but not used variable 'sidtab'
selinux: ensure the policy has been loaded before reading the sidtab stats
selinux: ensure we cleanup the internal AVC counters on error in avc_update()
selinux: randomize layout of key structures
selinux: clean up selinux_enabled/disabled/enforcing_boot
selinux: remove unnecessary selinux cred request
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- cgroup2 interface for hugetlb controller. I think this was the last
remaining bit which was missing from cgroup2
- fixes for race and a spurious warning in threaded cgroup handling
- other minor changes
* 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
iocost: Fix iocost_monitor.py due to helper type mismatch
cgroup: Prevent double killing of css when enabling threaded cgroup
cgroup: fix function name in comment
mm: hugetlb controller for cgroups v2
|
|
Pull workqueue updates from Tejun Heo:
"Just a couple tracepoint patches"
* 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: remove workqueue_work event class
workqueue: add worker function to workqueue_execute_end tracepoint
|
|
Pull ioremap updates from Christoph Hellwig:
"Remove the ioremap_nocache API (plus wrappers) that are always
identical to ioremap"
* tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap:
remove ioremap_nocache and devm_ioremap_nocache
MIPS: define ioremap_nocache to ioremap
|
|
Pull block driver updates from Jens Axboe:
"Like the core side, not a lot of changes here, just two main items:
- Series of patches (via Coly) with fixes for bcache (Coly,
Christoph)
- MD pull request from Song"
* tag 'for-5.6/drivers-2020-01-27' of git://git.kernel.dk/linux-block: (31 commits)
bcache: reap from tail of c->btree_cache in bch_mca_scan()
bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan()
bcache: remove member accessed from struct btree
bcache: print written and keys in trace_bcache_btree_write
bcache: avoid unnecessary btree nodes flushing in btree_flush_write()
bcache: add code comments for state->pool in __btree_sort()
lib: crc64: include <linux/crc64.h> for 'crc64_be'
bcache: use read_cache_page_gfp to read the superblock
bcache: store a pointer to the on-disk sb in the cache and cached_dev structures
bcache: return a pointer to the on-disk sb from read_super
bcache: transfer the sb_page reference to register_{bdev,cache}
bcache: fix use-after-free in register_bcache()
bcache: properly initialize 'path' and 'err' in register_bcache()
bcache: rework error unwinding in register_bcache
bcache: use a separate data structure for the on-disk super block
bcache: cached_dev_free needs to put the sb page
md/raid1: introduce wait_for_serialization
md/raid1: use bucket based mechanism for IO serialization
md: introduce a new struct for IO serialization
md: don't destroy serial_info_pool if serialize_policy is true
...
|
|
Pull core block updates from Jens Axboe:
"This may be the most quiet round we've had in years. I'm not
complaining. Really not a lot to detail here, outside of spelling and
documentation improvements/fixes, we have:
- Allow t10-pi to be modular (Herbert)
- Remove dead code in bfq (Alex)
- Mark zone management requests with REQ_SYNC (Chaitanya)
- BFQ division improvement (Wen)
- Small series improving plugging (Pavel)"
* tag 'for-5.6/block-2020-01-27' of git://git.kernel.dk/linux-block:
partitions/ldm: fix spelling mistake "to" -> "too"
block, bfq: improve arithmetic division in bfq_delta()
block/bfq: remove unused bfq_class_rt which never used
block: mark zone-mgmt bios with REQ_SYNC
blk-mq: Document functions for sending request
block: Allow t10-pi to be modular
blk-mq: optimise blk_mq_flush_plug_list()
list: introduce list_for_each_continue()
blk-mq: optimise rq sort function
|