Age | Commit message (Collapse) | Author | Files | Lines |
|
Andrii Nakryiko says:
====================
This patch set implements initial version (as discussed at LSF/MM2019
conference) of a new way to specify BPF maps, relying on BTF type information,
which allows for easy extensibility, preserving forward and backward
compatibility. See details and examples in description for patch #6.
[0] contains an outline of follow up extensions to be added after this basic
set of features lands. They are useful by itself, but also allows to bring
libbpf to feature-parity with iproute2 BPF loader. That should open a path
forward for BPF loaders unification.
Patch #1 centralizes commonly used min/max macro in libbpf_internal.h.
Patch #2 extracts .BTF and .BTF.ext loading loging from elf_collect().
Patch #3 simplifies elf_collect() error-handling logic.
Patch #4 refactors map initialization logic into user-provided maps and global
data maps, in preparation to adding another way (BTF-defined maps).
Patch #5 adds support for map definitions in multiple ELF sections and
deprecates bpf_object__find_map_by_offset() API which doesn't appear to be
used anymore and makes assumption that all map definitions reside in single
ELF section.
Patch #6 splits BTF intialization from sanitization/loading into kernel to
preserve original BTF at the time of map initialization.
Patch #7 adds support for BTF-defined maps.
Patch #8 adds new test for BTF-defined map definition.
Patches #9-11 convert test BPF map definitions to use BTF way.
[0] https://lore.kernel.org/bpf/CAEf4BzbfdG2ub7gCi0OYqBrUoChVHWsmOntWAkJt47=FE+km+A@mail.gmail.com/
v1->v2:
- more BTF-sanity checks in parsing map definitions (Song);
- removed confusing usage of "attribute", switched to "field;
- split off elf_collect() refactor from btf loading refactor (Song);
- split selftests conversion into 3 patches (Stanislav):
1. test already relying on BTF;
2. tests w/ custom types as key/value (so benefiting from BTF);
3. all the rest tests (integers as key/value, special maps w/o BTF support).
- smaller code improvements (Song);
rfc->v1:
- error out on unknown field by default (Stanislav, Jakub, Lorenz);
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Convert a bulk of selftests that have maps with custom (not integer) key
and/or value.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Switch tests that already rely on BTF to BTF-defined map definitions.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Add file test for BTF-defined map definition.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Libbpf does sanitization of BTF before loading it into kernel, if kernel
doesn't support some of newer BTF features. This removes some of the
important information from BTF (e.g., DATASEC and VAR description),
which will be used for map construction. This patch splits BTF
processing into initialization step, in which BTF is initialized from
ELF and all the original data is still preserved; and
sanitization/loading step, which ensures that BTF is safe to load into
kernel. This allows to use full BTF information to construct maps, while
still loading valid BTF into older kernels.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
To support maps to be defined in multiple sections, it's important to
identify map not just by offset within its section, but section index as
well. This patch adds tracking of section index.
For global data, we record section index of corresponding
.data/.bss/.rodata ELF section for uniformity, and thus don't need
a special value of offset for those maps.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
User and global data maps initialization has gotten pretty complicated
and unnecessarily convoluted. This patch splits out the logic for global
data map and user-defined map initialization. It also removes the
restriction of pre-calculating how many maps will be initialized,
instead allowing to keep adding new maps as they are discovered, which
will be used later for BTF-defined map definitions.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Simplify ELF parsing logic by exiting early, as there is no common clean
up path to execute. That makes it unnecessary to track when err was set
and when it was cleared. It also reduces nesting in some places.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
As a preparation for adding BTF-based BPF map loading, extract .BTF and
.BTF.ext loading logic.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Multiple files in libbpf redefine their own definitions for min/max.
Let's define them in libbpf_internal.h and use those everywhere.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
When propagating mounts across mount namespaces owned by different user
namespaces it is not possible anymore to move or umount the mount in the
less privileged mount namespace.
Here is a reproducer:
sudo mount -t tmpfs tmpfs /mnt
sudo --make-rshared /mnt
# create unprivileged user + mount namespace and preserve propagation
unshare -U -m --map-root --propagation=unchanged
# now change back to the original mount namespace in another terminal:
sudo mkdir /mnt/aaa
sudo mount -t tmpfs tmpfs /mnt/aaa
# now in the unprivileged user + mount namespace
mount --move /mnt/aaa /opt
Unfortunately, this is a pretty big deal for userspace since this is
e.g. used to inject mounts into running unprivileged containers.
So this regression really needs to go away rather quickly.
The problem is that a recent change falsely locked the root of the newly
added mounts by setting MNT_LOCKED. Fix this by only locking the mounts
on copy_mnt_ns() and not when adding a new mount.
Fixes: 3bd045cc9c4b ("separate copying and locking mount tree on cross-userns copies")
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Tested-by: Christian Brauner <christian@brauner.io>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
sys_fsmount() needs to take a reference to the new mount when adding it
to the anonymous mount namespace. Otherwise the filesystem can be
unmounted while it's still in use, as found by syzkaller.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: syzbot+99de05d099a170867f22@syzkaller.appspotmail.com
Reported-by: syzbot+7008b8b8ba7df475fdc8@syzkaller.appspotmail.com
Fixes: 93766fbd2696 ("vfs: syscall: Add fsmount() to create a mount for a superblock")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We can not hold the GlobalMid_Lock spinlock during the
dfs processing in cifs_reconnect since it invokes things that may sleep
and thus trigger :
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:23
Thus we need to drop the spinlock during this code block.
RHBZ: 1716743
Cc: stable@vger.kernel.org
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Some servers such as Windows 10 will return STATUS_INSUFFICIENT_RESOURCES
as the number of simultaneous SMB3 requests grows (even though the client
has sufficient credits). Return EAGAIN on STATUS_INSUFFICIENT_RESOURCES
so that we can retry writes which fail with this status code.
This (for example) fixes large file copies to Windows 10 on fast networks.
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
Currently user is unable to delete the filter. See following example:
$ tc filter add dev ens16np1 ingress pref 1 handle 1 matchall action drop
$ tc filter show dev ens16np1 ingress
filter protocol all pref 1 matchall chain 0
filter protocol all pref 1 matchall chain 0 handle 0x1
in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1
$ tc filter del dev ens16np1 ingress pref 1 handle 1 matchall action drop
RTNETLINK answers: Operation not supported
Implement tcf_proto_ops->delete() op and allow user to delete the filter.
Reported-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pointer ae_dev is null checked however, prior to that it is dereferenced
when assigned pointer ops. Fix this by assigning pointer ops after ae_dev
has been null checked.
Addresses-Coverity: ("Dereference before null check")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Kevin Darbyshire-Bryant says:
====================
net: sched: act_ctinfo: fixes
This is first attempt at sending a small series. Order is important
because one bug (policy validation) prevents us from encountering the
more important 'OOPS' generating bug in action creation. Fix the OOPS
first.
Confession time: Until very recently, development of this module has
been done on 'net-next' tree to 'clean compile' level with run-time
testing on backports to 4.14 & 4.19 kernels under openwrt. It turns out
that sched: action: based code has been under more active change than I
realised.
During the back & forward porting during development & testing, the
critical ACT_P_CREATED return code got missed despite being in the 4.14
& 4.19 backports. I have now gone through the init functions, using
act_csum as reference with a fine toothed comb and am happy they do the
same things.
This issue hadn't been caught till now due to another issue caused by
new strict nla_parse_nested function failing parsing validation before
action creation.
Thanks to Marcelo Leitner <marcelo.leitner@gmail.com> for flagging
extack deficiency (fixed in 733f0766c3de sched: act_ctinfo: use extack
error reporting) which led to b424e432e770 ("netlink: add validation of
NLA_F_NESTED flag") and 8cb081746c03 ("netlink: make validation more
configurable for future strictness”) which led to the policy validation
fix, which then led to the action creation fix both contained in this
series.
If I ever get to a developer conference please feel free to
tar/feather/apply cone of shame.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix nla_policy definition by specifying an exact length type attribute
to CTINFO action paraneter block structure. Without this change,
netlink parsing will fail validation and the action will not be
instantiated.
8cb081746c03 ("netlink: make validation more configurable for future")
introduced much stricter checking to attributes being passed via
netlink. Existing actions were updated to use less restrictive
deprecated versions of nla_parse_nested.
As a new module, act_ctinfo should be designed to use the strict
checking model otherwise, well, what was the point of implementing it.
Confession time: Until very recently, development of this module has
been done on 'net-next' tree to 'clean compile' level with run-time
testing on backports to 4.14 & 4.19 kernels under openwrt. This is how
I managed to miss the run-time impacts of the new strict
nla_parse_nested function. I hopefully have learned something from this
(glances toward laptop running a net-next kernel)
There is however a still outstanding implication on iproute2 user space
in that it needs to be told to pass nested netlink messages with the
nested attribute actually set. So even with this kernel fix to do
things correctly you still cannot instantiate a new 'strict'
nla_parse_nested based action such as act_ctinfo with iproute2's tc.
Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use correct return value on action creation: ACT_P_CREATED.
The use of incorrect return value could result in a situation where the
system thought a ctinfo module was listening but actually wasn't
instantiated correctly leading to an OOPS in tcf_generic_walker().
Confession time: Until very recently, development of this module has
been done on 'net-next' tree to 'clean compile' level with run-time
testing on backports to 4.14 & 4.19 kernels under openwrt. During the
back & forward porting during development & testing, the critical
ACT_P_CREATED return code got missed despite being in the 4.14 & 4.19
backports. I have now gone through the init functions, using act_csum
as reference with a fine toothed comb. Bonus, no more OOPSes. I
managed to also miss this issue till now due to the new strict
nla_parse_nested function failing validation before action creation.
As an inexperienced developer I've learned that
copy/pasting/backporting/forward porting code correctly is hard. If I
ever get to a developer conference I shall don the cone of shame.
Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A branch is needed here to get the fix into staging-next as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There are some backward incompatible features pending
for months, mainly due to on-disk format expensions.
However, we should ensure that it cannot be mounted with
old kernels. Otherwise, it will causes unexpected behaviors.
Fixes: ba2b77a82022 ("staging: erofs: add super block operations")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Vhost_net was known to suffer from HOL[1] issues which is not easy to
fix. Several downstream disable the feature by default. What's more,
the datapath was split and datacopy path got the support of batching
and XDP support recently which makes it faster than zerocopy part for
small packets transmission.
It looks to me that disable zerocopy by default is more
appropriate. It cold be enabled by default again in the future if we
fix the above issues.
[1] https://patchwork.kernel.org/patch/3787671/
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Using a bare block cipher in non-crypto code is almost always a bad idea,
not only for security reasons (and we've seen some examples of this in
the kernel in the past), but also for performance reasons.
In the TCP fastopen case, we call into the bare AES block cipher one or
two times (depending on whether the connection is IPv4 or IPv6). On most
systems, this results in a call chain such as
crypto_cipher_encrypt_one(ctx, dst, src)
crypto_cipher_crt(tfm)->cit_encrypt_one(crypto_cipher_tfm(tfm), ...);
aesni_encrypt
kernel_fpu_begin();
aesni_enc(ctx, dst, src); // asm routine
kernel_fpu_end();
It is highly unlikely that the use of special AES instructions has a
benefit in this case, especially since we are doing the above twice
for IPv6 connections, instead of using a transform which can process
the entire input in one go.
We could switch to the cbcmac(aes) shash, which would at least get
rid of the duplicated overhead in *some* cases (i.e., today, only
arm64 has an accelerated implementation of cbcmac(aes), while x86 will
end up using the generic cbcmac template wrapping the AES-NI cipher,
which basically ends up doing exactly the above). However, in the given
context, it makes more sense to use a light-weight MAC algorithm that
is more suitable for the purpose at hand, such as SipHash.
Since the output size of SipHash already matches our chosen value for
TCP_FASTOPEN_COOKIE_SIZE, and given that it accepts arbitrary input
sizes, this greatly simplifies the code as well.
NOTE: Server farms backing a single server IP for load balancing purposes
and sharing a single fastopen key will be adversely affected by
this change unless all systems in the pool receive their kernel
upgrades at the same time.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
Second set of IIO fixes for the 5.2 cycle.
* ad7150
- sense of bit for controlling adaptive vs fixed threshold was flipped.
* adt7316
- Fix a build issue due to wrong headers for gpio usage.
* lsm6dsx
- correctly suspend / resume i2c slaves when the host goes to sleep.
* mlx90632
- relax a compatability check to allow for newer devices.
Also one counters fix
* counter/ftm-quaddec
- missing dependencies in Kconfig.
* tag 'iio-fixes-for-5.2b' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
counter/ftm-quaddec: Add missing dependencies in Kconfig
staging: iio: adt7316: Fix build errors when GPIOLIB is not set
iio: temperature: mlx90632 Relax the compatibility check
iio: imu: st_lsm6dsx: fix PM support for st_lsm6dsx i2c controller
staging:iio:ad7150: fix threshold mode config bit
|
|
In patch series, commit 9195948fbf34 ("tipc: improve TIPC throughput by
Gap ACK blocks"), as for simplicity, the repeated retransmit failures'
detection in the function - "tipc_link_retrans()" was kept there for
broadcast retransmissions only.
This commit now reapplies this feature for link unicast retransmissions
that has been done via the function - "tipc_link_advance_transmq()".
Also, the "tipc_link_retrans()" is renamed to "tipc_link_bc_retrans()"
as it is used only for broadcast.
Acked-by: Jon Maloy <jon.maloy@ericsson.se>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Like bond, add ethtool get_link_ksettings to show the total speed.
v2: no update, just repost.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Eric Dumazet says:
====================
tcp: make sack processing more robust
Jonathan Looney brought to our attention multiple problems
in TCP stack at the sender side.
SACK processing can be abused by malicious peers to either
cause overflows, or increase of memory usage.
First two patches fix the immediate problems.
Since the malicious peers abuse senders by advertizing a very
small MSS in their SYN or SYNACK packet, the last two
patches add a new sysctl so that admins can chose a higher
limit for MSS clamping.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix ssbd.c which depends implicitly on asm/ptrace.h including
linux/prctl.h (through for example linux/compat.h, then linux/time.h,
linux/seqlock.h, linux/spinlock.h and linux/irqflags.h), and uses
PR_SPEC* defines.
This is an issue since we'll soon be removing the include from
asm/ptrace.h.
Fixes: 9cdc0108baa8 ("arm64: ssbd: Add prctl interface for per-thread mitigation")
Cc: stable@vger.kernel.org
Signed-off-by: Anisse Astier <aastier@freebox.fr>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
"This contains fixes, defconfig, and DT data changes for the v5.2-rc
series.
The fixes are relatively straightforward:
- Addition of a TLB fence in the vmalloc_fault path, so the CPU
doesn't enter an infinite page fault loop
- Readdition of the pm_power_off export, so device drivers that
reassign it can now be built as modules
- A udelay() fix for RV32, fixing a miscomputation of the delay time
- Removal of deprecated smp_mb__*() barriers
This also adds initial DT data infrastructure for arch/riscv, along
with initial data for the SiFive FU540-C000 SoC and the corresponding
HiFive Unleashed board.
We also update the RV64 defconfig to include some core drivers for the
FU540 in the build"
* tag 'riscv-for-v5.2/fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: remove unused barrier defines
riscv: mm: synchronize MMU after pte change
riscv: dts: add initial board data for the SiFive HiFive Unleashed
riscv: dts: add initial support for the SiFive FU540-C000 SoC
dt-bindings: riscv: convert cpu binding to json-schema
dt-bindings: riscv: sifive: add YAML documentation for the SiFive FU540
arch: riscv: add support for building DTB files from DT source data
riscv: Fix udelay in RV32.
riscv: export pm_power_off again
RISC-V: defconfig: enable clocks, serial console
|
|
When multiple iovecs reference the same page, each get_user_page call
will add a reference to the page. But once we've created the bio that
information gets lost and only a single reference will be dropped after
I/O completion. Use the same_page information returned from
__bio_try_merge_page to drop additional references to pages that were
already present in the bio.
Based on a patch from Ming Lei.
Link: https://lkml.org/lkml/2019/4/23/64
Fixes: 576ed913 ("block: use bio_add_page in bio_iov_iter_get_pages")
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We currently have an input same_page parameter to __bio_try_merge_page
to prohibit merging in the same page. The rationale for that is that
some callers need to account for every page added to a bio. Instead of
letting these callers call twice into the merge code to account for the
new vs existing page cases, just turn the paramter into an output one that
returns if a merge in the same page occured and let them act accordingly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
After the recent series of cleanups in the properties and xattrs modules
that landed in the 5.2 merge window, we ended up with a regression where
after deleting the compression xattr property through the setflags ioctl,
we don't set the BTRFS_INODE_COPY_EVERYTHING flag in the inode anymore.
As a consequence, if the inode was fsync'ed when it had the compression
property set, after deleting the compression property through the setflags
ioctl and fsync'ing again the inode, the log will still contain the
compression xattr, because the inode did not had that bit set, which
made the fsync not delete all xattrs from the log and copy all xattrs
from the subvolume tree to the log tree.
This regression happens due to the fact that that series of cleanups
made btrfs_set_prop() call the old function do_setxattr() (which is now
named btrfs_setxattr()), and not the old version of btrfs_setxattr(),
which is now called btrfs_setxattr_trans().
Fix this by setting the BTRFS_INODE_COPY_EVERYTHING bit in the current
btrfs_setxattr() function and remove it from everywhere else, including
its setup at btrfs_ioctl_setflags(). This is cleaner, avoids similar
regressions in the future, and centralizes the setup of the bit. After
all, the need to setup this bit should only be in the xattrs module,
since it is an implementation of xattrs.
Fixes: 04e6863b19c722 ("btrfs: split btrfs_setxattr calls regarding transaction")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
They were introduced in commit fab957c11efe ("RISC-V: Atomic and
Locking Code") long after commit 2e39465abc4b ("locking: Remove
deprecated smp_mb__() barriers") removed the remnants of all previous
instances from the tree.
Signed-off-by: Rolf Eike Beer <eb@emlix.com>
[paul.walmsley@sifive.com: stripped spurious mbox header from patch
description; fixed commit references in patch header]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
|
|
An endpoint conflict occurs when the USB is working in device mode
during an isochronous communication. When the endpointA IN direction
is an isochronous IN endpoint, and the host sends an IN token to
endpointA on another device, then the OUT transaction may be missed
regardless the OUT endpoint number. Generally, this occurs when the
device is connected to the host through a hub and other devices are
connected to the same hub.
The affected OUT endpoint can be either control, bulk, isochronous, or
an interrupt endpoint. After the OUT endpoint is primed, if an IN token
to the same endpoint number on another device is received, then the OUT
endpoint may be unprimed (cannot be detected by software), which causes
this endpoint to no longer respond to the host OUT token, and thus, no
corresponding interrupt occurs.
There is no good workaround for this issue, the only thing the software
could do is numbering isochronous IN from the highest endpoint since we
have observed most of device number endpoint from the lowest.
Cc: <stable@vger.kernel.org> #v3.14+
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: Jun Li <jun.li@nxp.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This patch updates the Qualcomm SoC repo to a new location.
Signed-off-by: Andy Gross <agross@kernel.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
If cmd19 timeout or response crcerr occurs during execute_tuning(),
it need invoke msdc_reset_hw(). Otherwise SDIO IRQ can't be detected.
Signed-off-by: jjian zhou <jjian.zhou@mediatek.com>
Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
Signed-off-by: Yong Mao <yong.mao@mediatek.com>
Fixes: 5215b2e952f3 ("mmc: mediatek: Add MMC_CAP_SDIO_IRQ support")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
SDIO IRQ is triggered by low level. It need disable SDIO IRQ
detected function. Otherwise the interrupt register can't be cleared.
It will process the interrupt more.
Signed-off-by: Jjian Zhou <jjian.zhou@mediatek.com>
Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
Signed-off-by: Yong Mao <yong.mao@mediatek.com>
Fixes: 5215b2e952f3 ("mmc: mediatek: Add MMC_CAP_SDIO_IRQ support")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
We don't have a reproducible error case, yet our BSP team suggested that
the mmc_switch_status() command in mmc_select_hs400() should come after
the callback into the driver completing HS400 setup. It makes sense to
me because we want the status of a fully setup HS400, so it will
increase the reliability of the mmc_switch_status() command.
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Fixes: ba6c7ac3a2f4 ("mmc: core: more fine-grained hooks for HS400 tuning")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Because RISC-V compliant implementations can cache invalid entries
in TLB, an SFENCE.VMA is necessary after changes to the page table.
This patch adds an SFENCE.vma for the vmalloc_fault path.
Signed-off-by: ShihPo Hung <shihpo.hung@sifive.com>
[paul.walmsley@sifive.com: reversed tab->whitespace conversion,
wrapped comment lines]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: linux-riscv@lists.infradead.org
Cc: stable@vger.kernel.org
|
|
My @arm.com address will stop working at the end of August, so update to
my @kernel.org address where you'll still be able to reach me.
When I say "stop working" I really mean "will go to my line manager", so
send patches there at your peril because they may reply with roadmaps
and spreadsheets. You have been warned.
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: arm-soc <arm@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Add initial board data for the SiFive HiFive Unleashed A00.
Currently the data populated in this DT file describes the board
DRAM configuration and the external clock sources that supply the
PRCI.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Tested-by: Loys Ollivier <lollivier@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Antony Pavlov <antonynpavlov@gmail.com>
Cc: devicetree@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
|
|
Add initial support for the SiFive FU540-C000 SoC. This is a 28nm SoC
based around the SiFive U54-MC core complex and a TileLink
interconnect.
This file is expected to grow as more device drivers are added to the
kernel.
This patch includes a fix to the QSPI memory map due to a
documentation bug, found by ShihPo Hung <shihpo.hung@sifive.com>, adds
entries for the I2C controller, and merges all DT changes that
formerly were made dynamically by the riscv-pk BBL proxy kernel.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Tested-by: Loys Ollivier <lollivier@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: ShihPo Hung <shihpo.hung@sifive.com>
Cc: devicetree@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
|
|
At Rob's request, we're starting to migrate our DT binding
documentation to json-schema YAML format. Start by converting our cpu
binding documentation. While doing so, document more properties and
nodes. This includes adding binding documentation support for the E51
and U54 CPU cores ("harts") that are present on this SoC. These cores
are described in:
https://static.dev.sifive.com/FU540-C000-v1.0.pdf
This cpus.yaml file is intended to be a starting point and to
evolve over time. It passes dt-doc-validate as of the yaml-bindings
commit 4c79d42e9216.
This patch was originally based on the ARM json-schema binding
documentation as added by commit 672951cbd1b7 ("dt-bindings: arm: Convert
cpu binding to json-schema").
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
|
|
Add YAML DT binding documentation for the SiFive FU540 SoC. This
SoC is documented at:
https://static.dev.sifive.com/FU540-C000-v1.0.pdf
Passes dt-doc-validate, as of yaml-bindings commit 4c79d42e9216.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: devicetree@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
|
|
Similar to ARM64, add support for building DTB files from DT source
data for RISC-V boards.
This patch starts with the infrastructure needed for SiFive boards.
Boards from other vendors would add support here in a similar form.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Tested-by: Loys Ollivier <lollivier@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
|
|
There is pvinfo writing come from vgpu might be unexpected, like
writing to one unknown address, GVT-g should do as reserved register
to discard any invalid write. Now GVT-g lets it write to the vreg
without prompt error message, should ignore the unexpected pvinfo
write access and leave the vreg as the default value.
For possible guest query GVT-g host feature, this returned proper
value instead of wrong guest setting.
v2: ignore unexpected pvinfo write instead of return predefined value
Fixes: e39c5add3221 ("drm/i915/gvt: vGPU MMIO virtualization")
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
lapb_register calls lapb_create_cb, which initializes the control-
block's ref-count to one, and __lapb_insert_cb, which increments it when
adding the new block to the list of blocks.
lapb_unregister calls __lapb_remove_cb, which decrements the ref-count
when removing control-block from the list of blocks, and calls lapb_put
itself to decrement the ref-count before returning.
However, lapb_unregister also calls __lapb_devtostruct to look up the
right control-block for the given net_device, and __lapb_devtostruct
also bumps the ref-count, which means that when lapb_unregister returns
the ref-count is still 1 and the control-block is leaked.
Call lapb_put after __lapb_devtostruct to fix leak.
Reported-by: syzbot+afb980676c836b4a0afa@syzkaller.appspotmail.com
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Syzbot reported a memleak caused by grp members' deferredq list not
purged when the grp is be deleted.
The issue occurs when more(msg_grp_bc_seqno(hdr), m->bc_rcv_nxt) in
tipc_group_filter_msg() and the skb will stay in deferredq.
So fix it by calling __skb_queue_purge for each member's deferredq
in tipc_group_delete() when a tipc sk leaves the grp.
Fixes: b87a5ea31c93 ("tipc: guarantee group unicast doesn't bypass group broadcast")
Reported-by: syzbot+78fbe679c8ca8d264a8d@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
One warning each on signedness, unused variable and return type.
Fixes: 10fbcdd12aa2 ("selftests/net: add TFO key rotation selftest")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|