Age | Commit message (Collapse) | Author | Files | Lines |
|
Add support for a polled io_uring instance. When a read or write is
submitted to a polled io_uring, the application must poll for
completions on the CQ ring through io_uring_enter(2). Polled IO may not
generate IRQ completions, hence they need to be actively found by the
application itself.
To use polling, io_uring_setup() must be used with the
IORING_SETUP_IOPOLL flag being set. It is illegal to mix and match
polled and non-polled IO on an io_uring.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a new fsync opcode, which either syncs a range if one is passed,
or the whole file if the offset and length fields are both cleared
to zero. A flag is provided to use fdatasync semantics, that is only
force out metadata which is required to retrieve the file data, but
not others like metadata.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The submission queue (SQ) and completion queue (CQ) rings are shared
between the application and the kernel. This eliminates the need to
copy data back and forth to submit and complete IO.
IO submissions use the io_uring_sqe data structure, and completions
are generated in the form of io_uring_cqe data structures. The SQ
ring is an index into the io_uring_sqe array, which makes it possible
to submit a batch of IOs without them being contiguous in the ring.
The CQ ring is always contiguous, as completion events are inherently
unordered, and hence any io_uring_cqe entry can point back to an
arbitrary submission.
Two new system calls are added for this:
io_uring_setup(entries, params)
Sets up an io_uring instance for doing async IO. On success,
returns a file descriptor that the application can mmap to
gain access to the SQ ring, CQ ring, and io_uring_sqes.
io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
Initiates IO against the rings mapped to this fd, or waits for
them to complete, or both. The behavior is controlled by the
parameters passed in. If 'to_submit' is non-zero, then we'll
try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
kernel will wait for 'min_complete' events, if they aren't
already available. It's valid to set IORING_ENTER_GETEVENTS
and 'min_complete' == 0 at the same time, this allows the
kernel to return already completed events without waiting
for them. This is useful only for polling, as for IRQ
driven IO, the application can just check the CQ ring
without entering the kernel.
With this setup, it's possible to do async IO with a single system
call. Future developments will enable polled IO with this interface,
and polled submission as well. The latter will enable an application
to do IO without doing ANY system calls at all.
For IRQ driven IO, an application only needs to enter the kernel for
completions if it wants to wait for them to occur.
Each io_uring is backed by a workqueue, to support buffered async IO
as well. We will only punt to an async context if the command would
need to wait for IO on the device side. Any data that can be accessed
directly in the page cache is done inline. This avoids the slowness
issue of usual threadpools, since cached data is accessed as quickly
as a sync interface.
Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
mp_bvec_for_each_segment() is a bit big for the iteration, so introduce
a light-weight helper for iterating over pages, then 32bytes stack
space can be saved.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Single-page bvec can often be seen in small BS workloads, so
introduce bvec_nth_page() for avoiding to call nth_page() unnecessarily,
which looks not cheap.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Store the request queue the last bio was submitted to in the iocb
private data in addition to the cookie so that we find the right block
device. Also refactor the common direct I/O bio submission code into a
nice little helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Modified to use bio_set_polled().
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
For the upcoming async polled IO, we can't sleep allocating requests.
If we do, then we introduce a deadlock where the submitter already
has async polled IO in-flight, but can't wait for them to complete
since polled requests must be active found and reaped.
Utilize the helper in the blockdev DIRECT_IO code.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This new methods is used to explicitly poll for I/O completion for an
iocb. It must be called for any iocb submitted asynchronously (that
is with a non-null ki_complete) which has the IOCB_HIPRI flag set.
The method is assisted by a new ki_cookie field in struct iocb to store
the polling cookie.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Update license to use SPDX-License-Identifier instead of verbose license
text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
|
|
Update license to use SPDX-License-Identifier instead of verbose license
text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
|
|
For .h files we need to use /* */ style comments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
|
|
We already have a ЅPDX header, so no need to duplicate the information.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
|
|
Pull in 5.0-rc6 to avoid a dumb merge conflict with fs/iomap.c.
This is needed since io_uring is now based on the block branch,
to avoid a conflict between the multi-page bvecs and the bits
of io_uring that touch the core block parts.
* tag 'v5.0-rc6': (525 commits)
Linux 5.0-rc6
x86/mm: Make set_pmd_at() paravirt aware
MAINTAINERS: Update the ocores i2c bus driver maintainer, etc
blk-mq: remove duplicated definition of blk_mq_freeze_queue
Blk-iolatency: warn on negative inflight IO counter
blk-iolatency: fix IO hang due to negative inflight counter
MAINTAINERS: unify reference to xen-devel list
x86/mm/cpa: Fix set_mce_nospec()
futex: Handle early deadlock return correctly
futex: Fix barrier comment
net: dsa: b53: Fix for failure when irq is not defined in dt
blktrace: Show requests without sector
mips: cm: reprime error cause
mips: loongson64: remove unreachable(), fix loongson_poweroff().
sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
geneve: should not call rt6_lookup() when ipv6 was disabled
KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221)
KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222)
kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974)
signal: Better detection of synchronous signals
...
|
|
QUEUE_FLAG_NO_SG_MERGE has been killed, so kill BLK_MQ_F_SG_MERGE too.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting"),
physical segment number is mainly figured out in blk_queue_split() for
fast path, and the flag of BIO_SEG_VALID is set there too.
Now only blk_recount_segments() and blk_recalc_rq_segments() use this
flag.
Basically blk_recount_segments() is bypassed in fast path given BIO_SEG_VALID
is set in blk_queue_split().
For another user of blk_recalc_rq_segments():
- run in partial completion branch of blk_update_request, which is an unusual case
- run in blk_cloned_rq_check_limits(), still not a big problem if the flag is killed
since dm-rq is the only user.
Multi-page bvec is enabled now, not doing S/G merging is rather pointless with the
current setup of the I/O path, as it isn't going to save you a significant amount
of cycles.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now multi-page bvec can cover CONFIG_THP_SWAP, so we don't need to
increase BIO_MAX_PAGES for it.
CONFIG_THP_SWAP needs to split one THP into normal pages and adds
them all to one bio. With multipage-bvec, it just takes one bvec to
hold them all.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch pulls the trigger for multi-page bvecs.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch introduces one extra iterator variable to bio_for_each_segment_all(),
then we can allow bio_for_each_segment_all() to iterate over multi-page bvec.
Given it is just one mechannical & simple change on all bio_for_each_segment_all()
users, this patch does tree-wide change in one single patch, so that we can
avoid to use a temporary helper for this conversion.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
BTRFS and guard_bio_eod() need to get the last singlepage segment
from one multipage bvec, so introduce this helper to make them happy.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
bio_for_each_bvec() is used for iterating over multi-page bvec for bio
split & merge code.
rq_for_each_bvec() can be used for drivers which may handle the
multi-page bvec directly, so far loop is one perfect use case.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch introduces helpers of 'mp_bvec_iter_*' for multi-page bvec
support.
The introduced helpers treate one bvec as real multi-page segment,
which may include more than one pages.
The existed helpers of bvec_iter_* are interfaces for supporting current
bvec iterator which is thought as single-page by drivers, fs, dm and
etc. These introduced helpers will build single-page bvec in flight, so
this way won't break current bio/bvec users, which needn't any change.
Follows some multi-page bvec background:
- bvecs stored in bio->bi_io_vec is always multi-page style
- bvec(struct bio_vec) represents one physically contiguous I/O
buffer, now the buffer may include more than one page after
multi-page bvec is supported, and all these pages represented
by one bvec is physically contiguous. Before multi-page bvec
support, at most one page is included in one bvec, we call it
single-page bvec.
- .bv_page of the bvec points to the 1st page in the multi-page bvec
- .bv_offset of the bvec is the offset of the buffer in the bvec
The effect on the current drivers/filesystem/dm/bcache/...:
- almost everyone supposes that one bvec only includes one single
page, so we keep the sp interface not changed, for example,
bio_for_each_segment() still returns single-page bvec
- bio_for_each_segment_all() will return single-page bvec too
- during iterating, iterator variable(struct bvec_iter) is always
updated in multi-page bvec style, and bvec_iter_advance() is kept
not changed
- returned(copied) single-page bvec is built in flight by bvec
helpers from the stored multi-page bvec
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit 7759eb23fd980 ("block: remove bio_rewind_iter()") removes
bio_rewind_iter(), then no one uses bvec_iter_rewind() any more,
so remove it.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
bio_readpage_error currently uses bi_vcnt to decide if it is worth
retrying an I/O. But the vector count is mostly an implementation
artifact - it really should figure out if there is more than a
single sector worth retrying. Use bi_size for that and shift by
PAGE_SHIFT. This really should be blocks/sectors, but given that
btrfs doesn't support a sector size different from the PAGE_SIZE
using the page size keeps the changes to a minimum.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"irqchip driver fixes: most of them are race fixes for ARM GIC (General
Interrupt Controller) variants, but also a fix for the ARM MMP
(Marvell PXA168 et al) irqchip affecting OLPC keyboards"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Fix ITT_entry_size accessor
irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable
irqchip/gic-v3-its: Gracefully fail on LPI exhaustion
irqchip/gic-v3-its: Plug allocation race for devices sharing a DevID
irqchip/gic-v4: Fix occasional VLPI drop
|
|
We have QUEUE_FLAG_DEFAULT defined, but it's not used anymore since
the legacy IO stack is gone. Kill it.
Sanitize the queue flags in general, they use spaces (for some
reason), and the space is pretty sparse. With the flags renumbered,
we can more clearly see how many we have available.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We have various helpers for setting/clearing this flag, and also
a helper to check if the queue supports queueable flushes or not.
But nobody uses them anymore, kill it with fire.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request from Christoph, fixing namespace locking when
dealing with the effects log, and a rapid add/remove issue (Keith)
- blktrace tweak, ensuring requests with -1 sectors are shown (Jan)
- link power management quirk for a Smasung SSD (Hans)
- m68k nfblock dynamic major number fix (Chengguang)
- series fixing blk-iolatency inflight counter issue (Liu)
- ensure that we clear ->private when setting up the aio kiocb (Mike)
- __find_get_block_slow() rate limit print (Tetsuo)
* tag 'for-linus-20190209' of git://git.kernel.dk/linux-block:
blk-mq: remove duplicated definition of blk_mq_freeze_queue
Blk-iolatency: warn on negative inflight IO counter
blk-iolatency: fix IO hang due to negative inflight counter
blktrace: Show requests without sector
fs: ratelimit __find_get_block_slow() failure message.
m68k: set proper major_num when specifying module param major_num
libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD
nvme-pci: fix rapid add remove sequence
nvme: lock NS list changes while handling command effects
aio: initialize kiocb private in case any filesystems expect it.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"This is a bit larger than normal, as we had not managed to send out a
pull request before traveling for a week without my signing key.
There are multiple code fixes for older bugs, all of which should get
backported into stable kernels:
- tango: one fix for multiplatform configurations broken on other
platforms when tango is enabled
- arm_scmi: device unregistration fix
- iop32x: fix kernel oops from extraneous __init annotation
- pxa: remove a double kfree
- fsl qbman: close an interrupt clearing race
The rest is the usual collection of smaller fixes for device tree
files, on the renesas, allwinner, meson, omap, davinci, qualcomm and
imx platforms.
Some of these are for compile-time warnings, most are for board
specific functionality that fails to work because of incorrect
settings"
* tag 'armsoc-fixes-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (30 commits)
ARM: tango: Improve ARCH_MULTIPLATFORM compatibility
firmware: arm_scmi: provide the mandatory device release callback
ARM: iop32x/n2100: fix PCI IRQ mapping
arm64: dts: add msm8996 compatible to gicv3
ARM: dts: am335x-shc.dts: fix wrong cd pin level
ARM: dts: n900: fix mmc1 card detect gpio polarity
ARM: dts: omap3-gta04: Fix graph_port warning
ARM: pxa: ssp: unneeded to free devm_ allocated data
ARM: dts: r8a7743: Convert to new LVDS DT bindings
soc: fsl: qbman: avoid race in clearing QMan interrupt
arm64: dts: renesas: r8a77965: Enable DMA for SCIF2
arm64: dts: renesas: r8a7796: Enable DMA for SCIF2
arm64: dts: renesas: r8a774a1: Enable DMA for SCIF2
ARM: dts: da850: fix interrupt numbers for clocksource
dt-bindings: imx8mq: Number clocks consecutively
arm64: dts: meson: Fix mmc cd-gpios polarity
ARM: dts: imx6sx: correct backward compatible of gpt
ARM: dts: imx: replace gpio-key,wakeup with wakeup-source property
ARM: dts: vf610-bk4: fix incorrect #address-cells for dspi3
ARM: dts: meson8m2: mxiii-plus: mark the SD card detection GPIO active-low
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull signal fixes from Eric Biederman:
"This contains four small fixes for signal handling. A missing range
check, a regression fix, prioritizing signals we have already started
a signal group exit for, and better detection of synchronous signals.
The confused decision of which signals to handle failed spectacularly
when a timer was pointed at SIGBUS and the stack overflowed. Resulting
in an unkillable process in an infinite loop instead of a SIGSEGV and
core dump"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal: Better detection of synchronous signals
signal: Always notice exiting tasks
signal: Always attempt to allocate siginfo for SIGSTOP
signal: Make siginmask safe when passed a signal of 0
|
|
Pull networking fixes from David Miller:
"This pull request is dedicated to the upcoming snowpocalypse parts 2
and 3 in the Pacific Northwest:
1) Drop profiles are broken because some drivers use dev_kfree_skb*
instead of dev_consume_skb*, from Yang Wei.
2) Fix IWLWIFI kconfig deps, from Luca Coelho.
3) Fix percpu maps updating in bpftool, from Paolo Abeni.
4) Missing station release in batman-adv, from Felix Fietkau.
5) Fix some networking compat ioctl bugs, from Johannes Berg.
6) ucc_geth must reset the BQL queue state when stopping the device,
from Mathias Thore.
7) Several XDP bug fixes in virtio_net from Toshiaki Makita.
8) TSO packets must be sent always on queue 0 in stmmac, from Jose
Abreu.
9) Fix socket refcounting bug in RDS, from Eric Dumazet.
10) Handle sparse cpu allocations in bpf selftests, from Martynas
Pumputis.
11) Make sure mgmt frames have enough tailroom in mac80211, from Felix
Feitkau.
12) Use safe list walking in sctp_sendmsg() asoc list traversal, from
Greg Kroah-Hartman.
13) Make DCCP's ccid_hc_[rt]x_parse_options always check for NULL
ccid, from Eric Dumazet.
14) Need to reload WoL password into bcmsysport device after deep
sleeps, from Florian Fainelli.
15) Remove filter from mask before freeing in cls_flower, from Petr
Machata.
16) Missing release and use after free in error paths of s390 qeth
code, from Julian Wiedmann.
17) Fix lockdep false positive in dsa code, from Marc Zyngier.
18) Fix counting of ATU violations in mv88e6xxx, from Andrew Lunn.
19) Fix EQ firmware assert in qed driver, from Manish Chopra.
20) Don't default Caivum PTP to Y in kconfig, from Bjorn Helgaas"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net: dsa: b53: Fix for failure when irq is not defined in dt
sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
geneve: should not call rt6_lookup() when ipv6 was disabled
net: Don't default Cavium PTP driver to 'y'
net: broadcom: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: via-velocity: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: tehuti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: sun: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: fsl_ucc_hdlc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: fec_mpc52xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: smsc: epic100: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: dscc4: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: tulip: de2104x: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: defxx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net/mlx5e: Don't overwrite pedit action when multiple pedit used
net/mlx5e: Update hw flows when encap source mac changed
qed*: Advance drivers version to 8.37.0.20
qed: Change verbosity for coalescing message.
qede: Fix system crash on configuring channels.
qed: Consider TX tcs while deriving the max num_queues for PF.
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip updates from Marc Zyngier:
- Another GICv3 ITS fix for devices sharing the same DevID
- Don't return invalid data on exhaustion of the GICv3 LPI pool
- Fix a GICv3 field decoding bug leading to memory over-allocation
- Init GICv4 at boot time instead of lazy init
- Fix interrupt masking on PJ4
|
|
Currently, blktrace will not show requests that don't have any data as
rq->__sector is initialized to -1 which is out of device range and thus
discarded by act_log_check(). This is most notably the case for cache
flush requests sent to the device. Fix the problem by making
blk_rq_trace_sector() return 0 for requests without initialized sector.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fix from Jiri Kosina:
"A fix for a bug in hid-debug that can lock up the kernel in infinite
loop (CVE-2019-3819), from Vladis Dronov"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: debug: fix the ring buffer implementation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of a few small fixes.
The most significant one is the fix for the possible race at loading
HD-audio drivers. This has been present for long time and surfaced
only in a rare occasion, but finally spotted out"
* tag 'sound-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/ca0132 - Fix build error without CONFIG_PCI
ALSA: compress: Fix stop handling on compressed capture streams
ALSA: usb-audio: Add support for new T+A USB DAC
ALSA: hda - Serialize codec registrations
ALSA: hda/realtek - Use a common helper for hp pin reference
ALSA: hda/realtek - Fix lose hp_pins for disable auto mute
ALSA: hda/realtek - Headset microphone support for System76 darp5
|
|
Pull virtio fixes from Michael Tsirkin:
"A small fix for a uapi header, and a fix for VDPA for non-x86 guests"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio: drop internal struct from UAPI
virtio: support VIRTIO_F_ORDER_PLATFORM
|
|
It is normal user behaviour to start, stop, then start a stream
again without closing it. Currently this works for compressed
playback streams but not capture ones.
The states on a compressed capture stream go directly from OPEN to
PREPARED, unlike a playback stream which moves to SETUP and waits
for a write of data before moving to PREPARED. Currently however,
when a stop is sent the state is set to SETUP for both types of
streams. This leaves a capture stream in the situation where a new
start can't be sent as that requires the state to be PREPARED and
a new set_params can't be sent as that requires the state to be
OPEN. The only option being to close the stream, and then reopen.
Correct this issues by allowing snd_compr_drain_notify to set the
state depending on the stream direction, as we already do in
set_params.
Fixes: 49bb6402f1aa ("ALSA: compress_core: Add support for capture streams")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
There's no reason to expose struct vring_packed in UAPI - if we do we
won't be able to change or drop it, and it's not part of any interface.
Let's move it to virtio_ring.c
Cc: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Anonymous sets that are bound to rules from the same transaction trigger
a kernel splat from the abort path due to double set list removal and
double free.
This patch updates the logic to search for the transaction that is
responsible for creating the set and disable the set list removal and
release, given the rule is now responsible for this. Lookup is reverse
since the transaction that adds the set is likely to be at the tail of
the list.
Moreover, this patch adds the unbind step to deliver the event from the
commit path. This should not be done from the worker thread, since we
have no guarantees of in-order delivery to the listener.
This patch removes the assumption that both activate and deactivate
callbacks need to be provided.
Fixes: cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase")
Reported-by: Mikhail Morfikov <mmorfikov@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A few updates for x86:
- Fix an unintended sign extension issue in the fault handling code
- Rename the new resource control config switch so it's less
confusing
- Avoid setting up EFI info in kexec when the EFI runtime is
disabled.
- Fix the microcode version check in the AMD microcode loader so it
only loads higher version numbers and never downgrades
- Set EFER.LME in the 32bit trampoline before returning to long mode
to handle older AMD/KVM behaviour properly.
- Add Darren and Andy as x86/platform reviewers"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Avoid confusion over the new X86_RESCTRL config
x86/kexec: Don't setup EFI info if EFI runtime is not enabled
x86/microcode/amd: Don't falsely trick the late loading mechanism
MAINTAINERS: Add Andy and Darren as arch/x86/platform/ reviewers
x86/fault: Fix sign-extend unintended sign extension
x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode
x86/cpu: Add Atom Tremont (Jacobsville)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu hotplug fixes from Thomas Gleixner:
"Two fixes for the cpu hotplug machinery:
- Replace the overly clever 'SMT disabled by BIOS' detection logic as
it breaks KVM scenarios and prevents speculation control updates
when the Hyperthreads are brought online late after boot.
- Remove a redundant invocation of the speculation control update
function"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
x86/speculation: Remove redundant arch_smt_update() invocation
|
|
Pull block fixes from Jens Axboe:
"A few fixes that should go into this release. This contains:
- MD pull request from Song, fixing a recovery OOM issue (Alexei)
- Fix for a sync related stall (Jianchao)
- Dummy callback for timeouts (Tetsuo)
- IDE atapi sense ordering fix (me)"
* tag 'for-linus-20190202' of git://git.kernel.dk/linux-block:
ide: ensure atapi sense request aren't preempted
blk-mq: fix a hung issue when fsync
block: pass no-op callback to INIT_WORK().
md/raid5: fix 'out of memory' during raid cache recovery
|
|
"Resource Control" is a very broad term for this CPU feature, and a term
that is also associated with containers, cgroups etc. This can easily
cause confusion.
Make the user prompt more specific. Match the config symbol name.
[ bp: In the future, the corresponding ARM arch-specific code will be
under ARM_CPU_RESCTRL and the arch-agnostic bits will be carved out
under the CPU_RESCTRL umbrella symbol. ]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Babu Moger <Babu.Moger@amd.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linux-doc@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190130195621.GA30653@cmpxchg.org
|
|
On an arm64 ThunderX2 server, the first kmemleak scan would crash [1]
with CONFIG_DEBUG_VM_PGFLAGS=y due to page_to_nid() found a pfn that is
not directly mapped (MEMBLOCK_NOMAP). Hence, the page->flags is
uninitialized.
This is due to the commit 9f1eb38e0e11 ("mm, kmemleak: little
optimization while scanning") starts to use pfn_to_online_page() instead
of pfn_valid(). However, in the CONFIG_MEMORY_HOTPLUG=y case,
pfn_to_online_page() does not call memblock_is_map_memory() while
pfn_valid() does.
Historically, the commit 68709f45385a ("arm64: only consider memblocks
with NOMAP cleared for linear mapping") causes pages marked as nomap
being no long reassigned to the new zone in memmap_init_zone() by
calling __init_single_page().
Since the commit 2d070eab2e82 ("mm: consider zone which is not fully
populated to have holes") introduced pfn_to_online_page() and was
designed to return a valid pfn only, but it is clearly broken on arm64.
Therefore, let pfn_to_online_page() call pfn_valid_within(), so it can
handle nomap thanks to the commit f52bb98f5ade ("arm64: mm: always
enable CONFIG_HOLES_IN_ZONE"), while it will be optimized away on
architectures where have no HOLES_IN_ZONE.
[1]
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000006
Mem abort info:
ESR = 0x96000005
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000005
CM = 0, WnR = 0
Internal error: Oops: 96000005 [#1] SMP
CPU: 60 PID: 1408 Comm: kmemleak Not tainted 5.0.0-rc2+ #8
pstate: 60400009 (nZCv daif +PAN -UAO)
pc : page_mapping+0x24/0x144
lr : __dump_page+0x34/0x3dc
sp : ffff00003a5cfd10
x29: ffff00003a5cfd10 x28: 000000000000802f
x27: 0000000000000000 x26: 0000000000277d00
x25: ffff000010791f56 x24: ffff7fe000000000
x23: ffff000010772f8b x22: ffff00001125f670
x21: ffff000011311000 x20: ffff000010772f8b
x19: fffffffffffffffe x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: ffff802698b19600
x13: ffff802698b1a200 x12: ffff802698b16f00
x11: ffff802698b1a400 x10: 0000000000001400
x9 : 0000000000000001 x8 : ffff00001121a000
x7 : 0000000000000000 x6 : ffff0000102c53b8
x5 : 0000000000000000 x4 : 0000000000000003
x3 : 0000000000000100 x2 : 0000000000000000
x1 : ffff000010772f8b x0 : ffffffffffffffff
Process kmemleak (pid: 1408, stack limit = 0x(____ptrval____))
Call trace:
page_mapping+0x24/0x144
__dump_page+0x34/0x3dc
dump_page+0x28/0x4c
kmemleak_scan+0x4ac/0x680
kmemleak_scan_thread+0xb4/0xdc
kthread+0x12c/0x13c
ret_from_fork+0x10/0x18
Code: d503201f f9400660 36000040 d1000413 (f9400661)
---[ end trace 4d4bd7f573490c8e ]---
Kernel panic - not syncing: Fatal exception
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x002,20000c38
Memory Limit: none
---[ end Kernel panic - not syncing: Fatal exception ]---
Link: http://lkml.kernel.org/r/20190122132916.28360-1-cai@lca.pw
Fixes: 9f1eb38e0e11 ("mm, kmemleak: little optimization while scanning")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Arkadiusz reported that enabling memcg's group oom killing causes
strange memcg statistics where there is no task in a memcg despite the
number of tasks in that memcg is not 0. It turned out that there is a
bug in wake_oom_reaper() which allows enqueuing same task twice which
makes impossible to decrease the number of tasks in that memcg due to a
refcount leak.
This bug existed since the OOM reaper became invokable from
task_will_free_mem(current) path in out_of_memory() in Linux 4.7,
T1@P1 |T2@P1 |T3@P1 |OOM reaper
----------+----------+----------+------------
# Processing an OOM victim in a different memcg domain.
try_charge()
mem_cgroup_out_of_memory()
mutex_lock(&oom_lock)
try_charge()
mem_cgroup_out_of_memory()
mutex_lock(&oom_lock)
try_charge()
mem_cgroup_out_of_memory()
mutex_lock(&oom_lock)
out_of_memory()
oom_kill_process(P1)
do_send_sig_info(SIGKILL, @P1)
mark_oom_victim(T1@P1)
wake_oom_reaper(T1@P1) # T1@P1 is enqueued.
mutex_unlock(&oom_lock)
out_of_memory()
mark_oom_victim(T2@P1)
wake_oom_reaper(T2@P1) # T2@P1 is enqueued.
mutex_unlock(&oom_lock)
out_of_memory()
mark_oom_victim(T1@P1)
wake_oom_reaper(T1@P1) # T1@P1 is enqueued again due to oom_reaper_list == T2@P1 && T1@P1->oom_reaper_list == NULL.
mutex_unlock(&oom_lock)
# Completed processing an OOM victim in a different memcg domain.
spin_lock(&oom_reaper_lock)
# T1P1 is dequeued.
spin_unlock(&oom_reaper_lock)
but memcg's group oom killing made it easier to trigger this bug by
calling wake_oom_reaper() on the same task from one out_of_memory()
request.
Fix this bug using an approach used by commit 855b018325737f76 ("oom,
oom_reaper: disable oom_reaper for oom_kill_allocating_task"). As a
side effect of this patch, this patch also avoids enqueuing multiple
threads sharing memory via task_will_free_mem(current) path.
Link: http://lkml.kernel.org/r/e865a044-2c10-9858-f4ef-254bc71d6cc2@i-love.sakura.ne.jp
Link: http://lkml.kernel.org/r/5ee34fc6-1485-34f8-8790-903ddabaa809@i-love.sakura.ne.jp
Fixes: af8e15cc85a25315 ("oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Aleksa Sarai <asarai@suse.de>
Cc: Jay Kamat <jgkamat@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2019-01-31
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) disable preemption in sender side of socket filters, from Alexei.
2) fix two potential deadlocks in syscall bpf lookup and prog_register,
from Martin and Alexei.
3) fix BTF to allow typedef on func_proto, from Yonghong.
4) two bpftool fixes, from Jiri and Paolo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull rdma fixes from Jason Gunthorpe:
"Still not much going on, the usual set of oops and driver fixes this
time:
- Fix two uapi breakage regressions in mlx5 drivers
- Various oops fixes in hfi1, mlx4, umem, uverbs, and ipoib
- A protocol bug fix for hfi1 preventing it from implementing the
verbs API properly, and a compatability fix for EXEC STACK user
programs
- Fix missed refcounting in the 'advise_mr' patches merged this
cycle.
- Fix wrong use of the uABI in the hns SRQ patches merged this cycle"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/uverbs: Fix OOPs in uverbs_user_mmap_disassociate
IB/ipoib: Fix for use-after-free in ipoib_cm_tx_start
IB/uverbs: Fix ioctl query port to consider device disassociation
RDMA/mlx5: Fix flow creation on representors
IB/uverbs: Fix OOPs upon device disassociation
RDMA/umem: Add missing initialization of owning_mm
RDMA/hns: Update the kernel header file of hns
IB/mlx5: Fix how advise_mr() launches async work
RDMA/device: Expose ib_device_try_get(()
IB/hfi1: Add limit test for RC/UC send via loopback
IB/hfi1: Remove overly conservative VM_EXEC flag check
IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM
IB/mlx4: Fix using wrong function to destroy sqp AHs under SRIOV
RDMA/mlx5: Fix check for supported user flags when creating a QP
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a PM-runtime framework regression introduced by the recent
switch-over of device autosuspend to hrtimers and a mistake in the
"poll idle state" code introduced by a recent change in it.
Specifics:
- Since ktime_get() turns out to be problematic for device
autosuspend in the PM-runtime framework, make it use
ktime_get_mono_fast_ns() instead (Vincent Guittot).
- Fix an initial value of a local variable in the "poll idle state"
code that makes it behave not exactly as expected when all idle
states except for the "polling" one are disabled (Doug Smythies)"
* tag 'pm-5.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: poll_state: Fix default time limit
PM-runtime: Fix deadlock with ktime_get()
|
|
In the current code, the codec registration may happen both at the
codec bind time and the end of the controller probe time. In a rare
occasion, they race with each other, leading to Oops due to the still
uninitialized card device.
This patch introduces a simple flag to prevent the codec registration
at the codec bind time as long as the controller probe is going on.
The controller probe invokes snd_card_register() that does the whole
registration task, and we don't need to register each piece
beforehand.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Mostly driver fixes, but there's a core framework fix in here too:
- Revert the commits that introduce clk management for the SP clk on
MMP2 SoCs (used for OLPC). Turns out it wasn't a good idea and
there isn't any need to manage this clk, it just causes more
headaches.
- A performance regression that went unnoticed for many years where
we would traverse the entire clk tree looking for a clk by name
when we already have the pointer to said clk that we're looking for
- A parent linkage fix for the qcom SDM845 clk driver
- An i.MX clk driver rate miscalculation fix where order of
operations were messed up
- One error handling fix from the static checkers"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: qcom: gcc: Use active only source for CPUSS clocks
clk: ti: Fix error handling in ti_clk_parse_divider_data()
clk: imx: Fix fractional clock set rate computation
clk: Remove global clk traversal on fetch parent index
Revert "dt-bindings: marvell,mmp2: Add clock id for the SP clock"
Revert "clk: mmp2: add SP clock"
Revert "Input: olpc_apsp - enable the SP clock"
|
|
Disabled preemption is necessary for proper access to per-cpu maps
from BPF programs.
But the sender side of socket filters didn't have preemption disabled:
unix_dgram_sendmsg->sk_filter->sk_filter_trim_cap->bpf_prog_run_save_cb->BPF_PROG_RUN
and a combination of af_packet with tun device didn't disable either:
tpacket_snd->packet_direct_xmit->packet_pick_tx_queue->ndo_select_queue->
tun_select_queue->tun_ebpf_select_queue->bpf_prog_run_clear_cb->BPF_PROG_RUN
Disable preemption before executing BPF programs (both classic and extended).
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|