Age | Commit message (Collapse) | Author | Files | Lines |
|
Add code to do cleanup for signals and application completion in a unified
fashion. The signal handler sets benckmark_done flag terminating the
threads. The cleanup is called before returning from main() function.
Signed-off-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191220085530.4980-3-jay.jayatheerthan@intel.com
|
|
The application now supports '-d' or '--duration' option to specify number of
seconds to run. This is used in tx, rx and l2fwd features. If this option is
not provided, the application runs forever.
Signed-off-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191220085530.4980-2-jay.jayatheerthan@intel.com
|
|
It's follow-up for discussion [1]
CHECK and CHECK_FAIL macros in test_progs.h can affect errno in some
circumstances, e.g. if some code accidentally closes stdout. It makes
checking errno in patterns like this unreliable:
if (CHECK(!bpf_prog_attach_xattr(...), "tag", "msg"))
goto err;
CHECK_FAIL(errno != ENOENT);
, since by CHECK_FAIL time errno could be affected not only by
bpf_prog_attach_xattr but by CHECK as well.
Fix it by saving and restoring errno in the macros. There is no "Fixes"
tag since no problems were discovered yet and it's rather precaution.
test_progs was run with this change and no difference was identified.
[1] https://lore.kernel.org/bpf/20191219210907.GD16266@rdna-mbp.dhcp.thefacebook.com/
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191220000511.1684853-1-rdna@fb.com
|
|
Magnus Karlsson says:
====================
This patch set cleans up the ring access functions of AF_XDP in hope
that it will now be easier to understand and maintain. I used to get a
headache every time I looked at this code in order to really understand it,
but now I do think it is a lot less painful.
The code has been simplified a lot and as a bonus we get better
performance in nearly all cases. On my new 2.1 GHz Cascade Lake
machine with a standard default config plus AF_XDP support and
CONFIG_PREEMPT on I get the following results in percent performance
increases with this patch set compared to without it:
Zero-copy (-N):
rxdrop txpush l2fwd
1 core: -2% 0% 3%
2 cores: 4% 0% 3%
Zero-copy with poll() (-N -p):
rxdrop txpush l2fwd
1 core: 3% 0% 1%
2 cores: 21% 0% 9%
Skb mode (-S):
Shows a 0% to 5% performance improvement over the same benchmarks as
above.
Here 1 core means that we are running the driver processing and the
application on the same core, while 2 cores means that they execute on
separate cores. The applications are from the xdpsock sample app.
On my older 2.0 Ghz Broadwell machine that I used for the v1, I get
the following results:
Zero-copy (-N):
rxdrop txpush l2fwd
1 core: 4% 5% 4%
2 cores: 1% 0% 2%
Zero-copy with poll() (-N -p):
rxdrop txpush l2fwd
1 core: 1% 3% 3%
2 cores: 22% 0% 5%
Skb mode (-S):
Shows a 0% to 1% performance improvement over the same benchmarks as
above.
When a results says 21 or 22% better, as in the case of poll mode with
2 cores and rxdrop, my first reaction is that it must be a
bug. Everything else shows between 0% and 5% performance
improvement. What is giving rise to 22%? A quick bisect indicates that
it is patches 2, 3, 4, 5, and 6 that are giving rise to most of this
improvement. So not one patch in particular, but something around 4%
improvement from each one of them. Note that exactly this benchmark
has previously had an extraordinary slow down compared to when running
without poll syscalls. For all the other poll tests above, the
slowdown has always been around 4% for using poll syscalls. But with
the bad performing test in question, it was above 25%. Interestingly,
after this clean up, the slow down is 4%, just like all the other poll
tests. Please take an extra peek at this so I have not messed up
something.
The 0% for several txpush results are due to the test bottlenecking on
a non-CPU HW resource. If I eliminated that bottleneck on my system, I
would expect to see an increase there too.
Changes v1 -> v2:
* Corrected textual errors in the commit logs (Sergei and Martin)
* Fixed the functions that detect empty and full rings so that they
now operate on the global ring state (Maxim)
This patch has been applied against commit a352a82496d1 ("Merge branch 'libbpf-extern-followups'")
Structure of the patch set:
Patch 1: Eliminate the lazy update threshold used when preallocating
entries in the completion ring
Patch 2: Simplify the detection of empty and full rings
Patch 3: Consolidate the two local producer pointers into one
Patch 4: Standardize the naming of the producer ring access functions
Patch 5: Eliminate the Rx batch size used for the fill ring
Patch 6: Simplify the functions xskq_nb_avail and xskq_nb_free
Patch 7: Simplify and standardize the naming of the consumer ring
access functions
Patch 8: Change the names of the validation functions to improve
readability and also the return value of these functions
Patch 9: Change the name of xsk_umem_discard_addr() to
xsk_umem_release_addr() to better reflect the new
names. Requires a name change in the drivers that support AF_XDP
zero-copy.
Patch 10: Remove unnecessary READ_ONCE of data in the ring
Patch 11: Add overall function naming comment and reorder the functions
for easier reference
Patch 12: Use the struct_size helper function when allocating rings
====================
Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Improve readability and maintainability by using the struct_size()
helper when allocating the AF_XDP rings.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1576759171-28550-13-git-send-email-magnus.karlsson@intel.com
|
|
Add comments on how the ring access functions are named and how they
are supposed to be used for producers and consumers. The functions are
also reordered so that the consumer functions are in the beginning and
the producer functions in the end, for easier reference. Put this in a
separate patch as the diff might look a little odd, but no
functionality has changed in this patch.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1576759171-28550-12-git-send-email-magnus.karlsson@intel.com
|
|
There are two unnecessary READ_ONCE of descriptor data. These are not
needed since the data is written by the producer before it signals
that the data is available by incrementing the producer pointer. As the
access to this producer pointer is serialized and the consumer always
reads the descriptor after it has read and synchronized with the
producer counter, the write of the descriptor will have fully
completed and it does not matter if the consumer has any read tearing.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1576759171-28550-11-git-send-email-magnus.karlsson@intel.com
|
|
Change the name of xsk_umem_discard_addr to xsk_umem_release_addr to
better reflect the new naming of the AF_XDP queue manipulation
functions. As this functions is used by drivers implementing support
for AF_XDP zero-copy, it requires a name change to these drivers. The
function xsk_umem_release_addr_rq has also changed name in the same
fashion.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1576759171-28550-10-git-send-email-magnus.karlsson@intel.com
|
|
Change the names of the validation functions to better reflect what
they are doing. The uppermost ones are reading entries from the rings
and only the bottom ones validate entries. So xskq_cons_read_ is a
better prefix name.
Also change the xskq_cons_read_ functions to return a bool
as the the descriptor or address is already returned by reference
in the parameters. Everyone is using the return value as a bool
anyway.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1576759171-28550-9-git-send-email-magnus.karlsson@intel.com
|
|
Simplify and refactor consumer ring functions. The consumer first
"peeks" to find descriptors or addresses that are available to
read from the ring, then reads them and finally "releases" these
descriptors once it is done. The two local variables cons_tail
and cons_head are turned into one single variable called
cached_cons. cached_tail referred to the cached value of the
global consumer pointer and will be stored in cached_cons. For
cached_head, we just use cached_prod instead as it was not used
for a consumer queue before. It also better reflects what it
really is now: a cached copy of the producer pointer.
The names of the functions are also renamed in the same manner as
the producer functions. The new functions are called xskq_cons_
followed by what it does.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1576759171-28550-8-git-send-email-magnus.karlsson@intel.com
|
|
At this point, there are no users of the functions xskq_nb_avail and
xskq_nb_free that take any other number of entries argument than 1, so
let us get rid of the second argument that takes the number of
entries.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1576759171-28550-7-git-send-email-magnus.karlsson@intel.com
|
|
In the xsk consumer ring code there is a variable called RX_BATCH_SIZE
that dictates the minimum number of entries that we try to grab from
the fill and Tx rings. In fact, the code always try to grab the
maximum amount of entries from these rings. The only thing this
variable does is to throw an error if there is less than 16 (as it is
defined) entries on the ring. There is no reason to do this and it
will just lead to weird behavior from user space's point of view. So
eliminate this variable.
With this change, we will be able to simplify the xskq_nb_free and
xskq_nb_avail code in the next commit.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1576759171-28550-6-git-send-email-magnus.karlsson@intel.com
|
|
Adopt the naming of the producer ring access functions to have a
similar naming convention as the functions in libbpf, but adapted to
the kernel. You first reserve a number of entries that you later
submit to the global state of the ring. This is much clearer, IMO,
than the one that was in the kernel part. Once renamed, we also
discover that two functions are actually the same, so remove one of
them. Some of the primitive ring submission operations are also the
same so break these out into __xskq_prod_submit that the upper level
ring access functions can use.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1576759171-28550-5-git-send-email-magnus.karlsson@intel.com
|
|
Currently, the xsk ring code has two cached producer pointers:
prod_head and prod_tail. This patch consolidates these two into a
single one called cached_prod to make the code simpler and easier to
maintain. This will be in line with the user space part of the the
code found in libbpf, that only uses a single cached pointer.
The Rx path only uses the two top level functions
xskq_produce_batch_desc and xskq_produce_flush_desc and they both use
prod_head and never prod_tail. So just move them over to
cached_prod.
The Tx XDP_DRV path uses xskq_produce_addr_lazy and
xskq_produce_flush_addr_n and unnecessarily operates on both prod_tail
and prod_head, so move them over to just use cached_prod by skipping
the intermediate step of updating prod_tail.
The Tx path in XDP_SKB mode uses xskq_reserve_addr and
xskq_produce_addr. They currently use both cached pointers, but we can
operate on the global producer pointer in xskq_produce_addr since it
has to be updated anyway, thus eliminating the use of both cached
pointers. We can also remove the xskq_nb_free in xskq_produce_addr
since it is already called in xskq_reserve_addr. No need to do it
twice.
When there is only one cached producer pointer, we can also simplify
xskq_nb_free by removing one argument.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1576759171-28550-4-git-send-email-magnus.karlsson@intel.com
|
|
In order to set the correct return flags for poll, the xsk code has to
check if the Rx queue is empty and if the Tx queue is full. This code
was unnecessarily large and complex as it used the functions that are
used to update the local state from the global state (xskq_nb_free and
xskq_nb_avail). Since we are not doing this nor updating any data
dependent on this state, we can simplify the functions. Another
benefit from this is that we can also simplify the xskq_nb_free and
xskq_nb_avail functions in a later commit.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1576759171-28550-3-git-send-email-magnus.karlsson@intel.com
|
|
The lazy update threshold was introduced to keep the producer and
consumer some distance apart in the completion ring. This was
important in the beginning of the development of AF_XDP as the ring
format as that point in time was very sensitive to the producer and
consumer being on the same cache line. This is not the case
anymore as the current ring format does not degrade in any noticeable
way when this happens. Moreover, this threshold makes it impossible
to run the system with rings that have less than 128 entries.
So let us remove this threshold and just get one entry from the ring
as in all other functions. This will enable us to remove this function
in a later commit. Note that xskq_produce_addr_lazy followed by
xskq_produce_flush_addr_n are still not the same function as
xskq_produce_addr() as it operates on another cached pointer.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1576759171-28550-2-git-send-email-magnus.karlsson@intel.com
|
|
Under heavy loads where the kyber I/O scheduler hits the token limits for
its scheduling domains, kyber can become stuck. When active requests
complete, kyber may not be woken up leaving the I/O requests in kyber
stuck.
This stuck state is due to a race condition with kyber and the sbitmap
functions it uses to run a callback when enough requests have completed.
The running of a sbt_wait callback can race with the attempt to insert the
sbt_wait. Since sbitmap_del_wait_queue removes the sbt_wait from the list
first then sets the sbq field to NULL, kyber can see the item as not on a
list but the call to sbitmap_add_wait_queue will see sbq as non-NULL. This
results in the sbt_wait being inserted onto the wait list but ws_active
doesn't get incremented. So the sbitmap queue does not know there is a
waiter on a wait list.
Since sbitmap doesn't think there is a waiter, kyber may never be
informed that there are domain tokens available and the I/O never advances.
With the sbt_wait on a wait list, kyber believes it has an active waiter
so cannot insert a new waiter when reaching the domain's full state.
This race can be fixed by only adding the sbt_wait to the queue if the
sbq field is NULL. If sbq is not NULL, there is already an action active
which will trigger the re-running of kyber. Let it run and add the
sbt_wait to the wait list if still needing to wait.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Reported-by: John Pittman <jpittman@redhat.com>
Tested-by: John Pittman <jpittman@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Properly initialize refcount to 1 when hardware queue arrays for
TC-MQPRIO offload have been freshly allocated. Otherwise, following
warning is observed. Also fix up error path to only free hardware
queue arrays when refcount reaches 0.
[ 130.075342] ------------[ cut here ]------------
[ 130.075343] refcount_t: addition on 0; use-after-free.
[ 130.075355] WARNING: CPU: 0 PID: 10870 at lib/refcount.c:25
refcount_warn_saturate+0xe1/0x100
[ 130.075356] Modules linked in: sch_mqprio iptable_nat ib_iser
libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_umad iw_cxgb4 libcxgb
ib_uverbs x86_pkg_temp_thermal cxgb4 igb
[ 130.075361] CPU: 0 PID: 10870 Comm: tc Kdump: loaded Not tainted
5.5.0-rc1+ #11
[ 130.075362] Hardware name: Supermicro
X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.2
01/16/2015
[ 130.075363] RIP: 0010:refcount_warn_saturate+0xe1/0x100
[ 130.075364] Code: e8 14 41 c1 ff 0f 0b c3 80 3d 44 f4 10 01 00 0f 85
63 ff ff ff 48 c7 c7 38 9f 83 8c 31 c0 c6 05 2e f4 10 01 01 e8 ef 40 c1
ff <0f> 0b c3 48 c7 c7 10 9f 83 8c 31 c0 c6 05 17 f4 10 01 01 e8 d7 40
[ 130.075365] RSP: 0018:ffffa48d00c0b768 EFLAGS: 00010286
[ 130.075366] RAX: 0000000000000000 RBX: 0000000000000008 RCX:
0000000000000001
[ 130.075366] RDX: 0000000000000001 RSI: 0000000000000096 RDI:
ffff8a2e9fa187d0
[ 130.075367] RBP: ffff8a2e93890000 R08: 0000000000000398 R09:
000000000000003c
[ 130.075367] R10: 00000000000142a0 R11: 0000000000000397 R12:
ffffa48d00c0b848
[ 130.075368] R13: ffff8a2e94746498 R14: ffff8a2e966f7000 R15:
0000000000000031
[ 130.075368] FS: 00007f689015f840(0000) GS:ffff8a2e9fa00000(0000)
knlGS:0000000000000000
[ 130.075369] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 130.075369] CR2: 00000000006762a0 CR3: 00000007cf164005 CR4:
00000000001606f0
[ 130.075370] Call Trace:
[ 130.075377] cxgb4_setup_tc_mqprio+0xbee/0xc30 [cxgb4]
[ 130.075382] ? cxgb4_ethofld_restart+0x50/0x50 [cxgb4]
[ 130.075384] ? pfifo_fast_init+0x7e/0xf0
[ 130.075386] mqprio_init+0x5f4/0x630 [sch_mqprio]
[ 130.075389] qdisc_create+0x1bf/0x4a0
[ 130.075390] tc_modify_qdisc+0x1ff/0x770
[ 130.075392] rtnetlink_rcv_msg+0x28b/0x350
[ 130.075394] ? rtnl_calcit.isra.32+0x110/0x110
[ 130.075395] netlink_rcv_skb+0xc6/0x100
[ 130.075396] netlink_unicast+0x1db/0x330
[ 130.075397] netlink_sendmsg+0x2f5/0x460
[ 130.075399] ? _copy_from_user+0x2e/0x60
[ 130.075400] sock_sendmsg+0x59/0x70
[ 130.075401] ____sys_sendmsg+0x1f0/0x230
[ 130.075402] ? copy_msghdr_from_user+0xd7/0x140
[ 130.075403] ___sys_sendmsg+0x77/0xb0
[ 130.075404] ? ___sys_recvmsg+0x84/0xb0
[ 130.075406] ? __handle_mm_fault+0x377/0xaf0
[ 130.075407] __sys_sendmsg+0x53/0xa0
[ 130.075409] do_syscall_64+0x44/0x130
[ 130.075412] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 130.075413] RIP: 0033:0x7f688f13af10
[ 130.075414] Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff
eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
[ 130.075414] RSP: 002b:00007ffe6c7d9988 EFLAGS: 00000246 ORIG_RAX:
000000000000002e
[ 130.075415] RAX: ffffffffffffffda RBX: 00000000006703a0 RCX:
00007f688f13af10
[ 130.075415] RDX: 0000000000000000 RSI: 00007ffe6c7d99f0 RDI:
0000000000000003
[ 130.075416] RBP: 000000005df38312 R08: 0000000000000002 R09:
0000000000008000
[ 130.075416] R10: 00007ffe6c7d93e0 R11: 0000000000000246 R12:
0000000000000000
[ 130.075417] R13: 00007ffe6c7e9c50 R14: 0000000000000001 R15:
000000000067c600
[ 130.075418] ---[ end trace 8fbb3bf36a8671db ]---
v2:
- Move the refcount_set() closer to where the hardware queue arrays
are being allocated.
- Fix up error path to only free hardware queue arrays when refcount
reaches 0.
Fixes: 2d0cb84dd973 ("cxgb4: add ETHOFLD hardware queue support")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull Devicetree fix from Rob Herring:
"Add missing 'properties' keyword enclosing 'snps,tso' in snps,dwmac
binding"
* tag 'devicetree-fixes-for-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: Add missing 'properties' keyword enclosing 'snps,tso'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Leftover put_cpu() in the perf/smmuv3 error path.
- Add Hisilicon TSV110 to spectre-v2 safe list
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: cpu_errata: Add Hisilicon TSV110 to spectre-v2 safe list
perf/smmuv3: Remove the leftover put_cpu() in error path
|
|
Pull drm fixes from Dave Airlie:
"Probably the last one before Christmas, I'll see if there is much
demand over next few weeks for more fixes, I expect it'll be quiet
enough.
This has one exynos fix, and a bunch of i915 core and i915 GVT fixes.
Summary:
exynos:
- component delete fix
i915:
- Fix to drop an unused and harmful display W/A
- Fix to define EHL power wells independent of ICL
- Fix for priority inversion on bonded requests
- Fix in mmio offset calculation of DSB instance
- Fix memory leak from get_task_pid when banning clients
- Fixes to avoid dereference of uninitialized ops in dma_fence
tracing and keep reference to execbuf object until submitted.
- vGPU state setting locking fix (Zhenyu)
- Fix vGPU display dmabuf as read-only (Zhenyu)
- Properly handle vGPU display dmabuf page pin when rendering (Tina)
- Fix one guest boot warning to handle guc reset state (Fred)"
* tag 'drm-fixes-2019-12-21' of git://anongit.freedesktop.org/drm/drm:
drm/exynos: gsc: add missed component_del
drm/i915: Fix pid leak with banned clients
drm/i915/gem: Keep request alive while attaching fences
drm/i915: Fix WARN_ON condition for cursor plane ddb allocation
drm/i915/gvt: Fix guest boot warning
drm/i915/tgl: Drop Wa#1178
drm/i915/ehl: Define EHL powerwells independently of ICL
drm/i915: Set fence_work.ops before dma_fence_init
drm/i915: Copy across scheduler behaviour flags across submit fences
drm/i915/dsb: Fix in mmio offset calculation of DSB instance
drm/i915/gvt: Pin vgpu dma address before using
drm/i915/gvt: set guest display buffer as readonly
drm/i915/gvt: use vgpu lock for active state setting
|
|
Pull io_uring fixes from Jens Axboe:
"Here's a set of fixes that should go into 5.5-rc3 for io_uring.
This is bigger than I'd like it to be, mainly because we're fixing the
case where an application reuses sqe data right after issue. This
really must work, or it's confusing. With 5.5 we're flagging us as
submit stable for the actual data, this must also be the case for
SQEs.
Honestly, I'd really like to add another series on top of this, since
it cleans it up considerable and prevents any SQE reuse by design. I
posted that here:
https://lore.kernel.org/io-uring/20191220174742.7449-1-axboe@kernel.dk/T/#u
and may still send it your way early next week once it's been looked
at and had some more soak time (does pass all regression tests). With
that series, we've unified the prep+issue handling, and only the prep
phase even has access to the SQE.
Anyway, outside of that, fixes in here for a few other issues that
have been hit in testing or production"
* tag 'io_uring-5.5-20191220' of git://git.kernel.dk/linux-block:
io_uring: io_wq_submit_work() should not touch req->rw
io_uring: don't wait when under-submitting
io_uring: warn about unhandled opcode
io_uring: read opcode and user_data from SQE exactly once
io_uring: make IORING_OP_TIMEOUT_REMOVE deferrable
io_uring: make IORING_OP_CANCEL_ASYNC deferrable
io_uring: make IORING_POLL_ADD and IORING_POLL_REMOVE deferrable
io_uring: make HARDLINK imply LINK
io_uring: any deferred command must have stable sqe data
io_uring: remove 'sqe' parameter to the OP helpers that take it
io_uring: fix pre-prepped issue with force_nonblock == true
io-wq: re-add io_wq_current_is_worker()
io_uring: fix sporadic -EFAULT from IORING_OP_RECVMSG
io_uring: fix stale comment and a few typos
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix to drop an unused and harmful display W/A
- Fix to define EHL power wells independent of ICL
- Fix for priority inversion on bonded requests
- Fix in mmio offset calculation of DSB instance
- Fix memory leak from get_task_pid when banning clients
- Fixes to avoid dereference of uninitialized ops in dma_fence tracing
and keep reference to execbuf object until submitted.
- Includes gvt-fixes-2019-12-18
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191219124635.GA16068@jlahtine-desk.ger.corp.intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes
Just one bug fixup
. Make sure to unregister a component for Exynos gscaler driver
when the driver is removed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Inki Dae <inki.dae@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1576714013-3788-1-git-send-email-inki.dae@samsung.com
|
|
Fix this compiler warning:
kernel/debug/debug_core.c: In function ‘kgdb_cpu_enter’:
arch/parisc/include/asm/cmpxchg.h:48:3: warning: value computed is not used [-Wunused-value]
48 | ((__typeof__(*(ptr)))__xchg((unsigned long)(x), (ptr), sizeof(*(ptr))))
arch/parisc/include/asm/atomic.h:78:30: note: in expansion of macro ‘xchg’
78 | #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
| ^~~~
kernel/debug/debug_core.c:596:4: note: in expansion of macro ‘atomic_xchg’
596 | atomic_xchg(&kgdb_active, cpu);
| ^~~~~~~~~~~
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
When I doing fuzzy test, get the memleak report:
BUG: memory leak
unreferenced object 0xffff88837af80000 (size 4096):
comm "memleak", pid 3557, jiffies 4294817681 (age 112.499s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
20 00 00 00 10 01 00 00 00 00 00 00 01 00 00 00 ...............
backtrace:
[<000000001c894df8>] bio_alloc_bioset+0x393/0x590
[<000000008b139a3c>] bio_copy_user_iov+0x300/0xcd0
[<00000000a998bd8c>] blk_rq_map_user_iov+0x2f1/0x5f0
[<000000005ceb7f05>] blk_rq_map_user+0xf2/0x160
[<000000006454da92>] sg_common_write.isra.21+0x1094/0x1870
[<00000000064bb208>] sg_write.part.25+0x5d9/0x950
[<000000004fc670f6>] sg_write+0x5f/0x8c
[<00000000b0d05c7b>] __vfs_write+0x7c/0x100
[<000000008e177714>] vfs_write+0x1c3/0x500
[<0000000087d23f34>] ksys_write+0xf9/0x200
[<000000002c8dbc9d>] do_syscall_64+0x9f/0x4f0
[<00000000678d8e9a>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
If __blk_rq_map_user_iov() is failed in blk_rq_map_user_iov(),
the bio(s) which is allocated before this failing will leak. The
refcount of the bio(s) is init to 1 and increased to 2 by calling
bio_get(), but __blk_rq_unmap_user() only decrease it to 1, so
the bio cannot be freed. Fix it by calling blk_rq_unmap_user().
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
coypright -> copyright
Reported-by: Kate Stewart <kstewart@linuxfoundation.org>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If for whatever reason the dasd_eckd_check_characteristics() function
exits after at least some paths have their configuration data
allocated those data is never freed again. In the error case the
device->private pointer is set to NULL and dasd_eckd_uncheck_device()
will exit without freeing the path data because of this NULL pointer.
Fix by calling dasd_eckd_clear_conf_data() for error cases.
Also use dasd_eckd_clear_conf_data() in dasd_eckd_uncheck_device()
to avoid code duplication.
Reported-by: Qian Cai <cai@lca.pw>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The max data count (mdc) is an unsigned 16-bit integer value as per AR
documentation and is received via ccw_device_get_mdc() for a specific
path mask from the CIO layer. The function itself also always returns a
positive mdc value or 0 in case mdc isn't supported or couldn't be
determined.
Though, the comment for this function describes a negative return value
to indicate failures.
As a result, the DASD device driver interprets the return value of
ccw_device_get_mdc() incorrectly. The error case is essentially a dead
code path.
To fix this behaviour, check explicitly for a return value of 0 and
change the comment for ccw_device_get_mdc() accordingly.
This fix merely enables the error code path in the DASD functions
get_fcx_max_data() and verify_fcx_max_data(). The actual functionality
stays the same and is still correct.
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Avoid that running test nvme/012 from the blktests suite triggers the
following false positive lockdep complaint:
============================================
WARNING: possible recursive locking detected
5.0.0-rc3-xfstests-00015-g1236f7d60242 #841 Not tainted
--------------------------------------------
ksoftirqd/1/16 is trying to acquire lock:
000000000282032e (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0
but task is already holding lock:
00000000cbadcbc2 (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&fq->mq_flush_lock)->rlock);
lock(&(&fq->mq_flush_lock)->rlock);
*** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by ksoftirqd/1/16:
#0: 00000000cbadcbc2 (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0
stack backtrace:
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3-xfstests-00015-g1236f7d60242 #841
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
dump_stack+0x67/0x90
__lock_acquire.cold.45+0x2b4/0x313
lock_acquire+0x98/0x160
_raw_spin_lock_irqsave+0x3b/0x80
flush_end_io+0x4e/0x1d0
blk_mq_complete_request+0x76/0x110
nvmet_req_complete+0x15/0x110 [nvmet]
nvmet_bio_done+0x27/0x50 [nvmet]
blk_update_request+0xd7/0x2d0
blk_mq_end_request+0x1a/0x100
blk_flush_complete_seq+0xe5/0x350
flush_end_io+0x12f/0x1d0
blk_done_softirq+0x9f/0xd0
__do_softirq+0xca/0x440
run_ksoftirqd+0x24/0x50
smpboot_thread_fn+0x113/0x1e0
kthread+0x121/0x140
ret_from_fork+0x3a/0x50
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch fixes the following sparse warnings:
block/bsg-lib.c:269:19: warning: incorrect type in initializer (different base types)
block/bsg-lib.c:269:19: expected int sts
block/bsg-lib.c:269:19: got restricted blk_status_t [usertype]
block/bsg-lib.c:286:16: warning: incorrect type in return expression (different base types)
block/bsg-lib.c:286:16: expected restricted blk_status_t
block/bsg-lib.c:286:16: got int [assigned] sts
Cc: Martin Wilck <mwilck@suse.com>
Fixes: d46fe2cb2dce ("block: drop device references in bsg_queue_rq()")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Switch page deallocation table (pdt) driver to use pfn instead of a page
pointer in soft_offline_page().
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn")
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Fix kmemleak warning in IOVA code
- Fix compile warnings on ARM32/64 in dma-iommu code due to dma_mask
type mismatches
- Make ISA reserved regions relaxable, so that VFIO can assign devices
which have such regions defined
- Fix mapping errors resulting in IO page-faults in the VT-d driver
- Make sure direct mappings for a domain are created after the default
domain is updated
- Map ISA reserved regions in the VT-d driver with correct permissions
- Remove unneeded check for PSI capability in the IOTLB flush code of
the VT-d driver
- Lockdep fix iommu_dma_prepare_msi()
* tag 'iommu-fixes-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/dma: Relax locking in iommu_dma_prepare_msi()
iommu/vt-d: Remove incorrect PSI capability check
iommu/vt-d: Allocate reserved region for ISA with correct permission
iommu: set group default domain before creating direct mappings
iommu/vt-d: Fix dmar pte read access not set error
iommu/vt-d: Set ISA bridge reserved region as relaxable
iommu/dma: Rationalise types for DMA masks
iommu/iova: Init the struct iova to fix the possible memleak
|
|
git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Andy Shevchenko:
"Bucket of fixes for PDx86. Note, that there is no ABI breakage in
Mellanox driver because it has been introduced in v5.5-rc1, so we can
change it.
Summary:
- Add support of APUv4 and fix an assignment of simswap GPIO
- Add Siemens CONNECT X300 to DMI table to avoid stuck during boot
- Correct arguments of WMI call on HP Envy x360 15-cp0xxx model
- Fix the mlx-bootctl sysfs attributes to be device related"
* tag 'platform-drivers-x86-v5.5-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: pcengines-apuv2: Spelling fixes in the driver
platform/x86: pcengines-apuv2: detect apuv4 board
platform/x86: pcengines-apuv2: fix simswap GPIO assignment
platform/x86: pmc_atom: Add Siemens CONNECT X300 to critclk_systems DMI table
platform/x86: hp-wmi: Make buffer for HPWMI_FEATURE2_QUERY 128 bytes
platform/mellanox: fix the mlx-bootctl sysfs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:
- mtk-sd: Fix tuning for MT8173 HS200/HS400 mode
- sdhci: Revert a fix for incorrect switch to HS mode
- sdhci-msm: Fixup accesses to the DDR_CONFIG register
- sdhci-of-esdhc: Revert a bad fix for erratum A-009204
- sdhci-of-esdhc: Re-implement fix for erratum A-009204
- sdhci-of-esdhc: Fixup P2020 errata handling
- sdhci-pci: Disable broken CMDQ on Intel GLK based Lenovo systems
* tag 'mmc-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-of-esdhc: re-implement erratum A-009204 workaround
mmc: sdhci: Add a quirk for broken command queuing
mmc: sdhci: Workaround broken command queuing on Intel GLK
mmc: sdhci-of-esdhc: fix P2020 errata handling
mmc: sdhci: Update the tuning failed messages to pr_debug level
mmc: sdhci-of-esdhc: Revert "mmc: sdhci-of-esdhc: add erratum A-009204 support"
mmc: mediatek: fix CMD_TA to 2 for MT8173 HS200/HS400 mode
mmc: sdhci-msm: Correct the offset and value for DDR_CONFIG register
Revert "mmc: sdhci: Fix incorrect switch to HS mode"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char and other driver fixes for 5.5-rc3.
The most noticable one is a much-reported fix for a random driver
issue that came up from 5.5-rc1 compat_ioctl cleanups. The others are
a chunk of habanalab driver fixes and intel_th driver fixes and new
device ids.
All have been in linux-next with no reported issues"
* tag 'char-misc-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
random: don't forget compat_ioctl on urandom
intel_th: msu: Fix window switching without windows
intel_th: Fix freeing IRQs
intel_th: pci: Add Elkhart Lake SOC support
intel_th: pci: Add Comet Lake PCH-V support
habanalabs: remove variable 'val' set but not used
habanalabs: rate limit error msg on waiting for CS
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for a number of reported
issues.
The majority here are some fixes for the wfx driver, but also in here
is a comedi driver fix found during some code review, and an axis-fifo
build dependancy issue to resolve some reported testing problems.
All of these have been in linux-next with no reported issues"
* tag 'staging-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: wfx: fix wrong error message
staging: wfx: fix hif_set_mfp() with big endian hosts
staging: wfx: detect race condition in WEP authentication
staging: wfx: ensure that retry policy always fallbacks to MCS0 / 1Mbps
staging: wfx: fix rate control handling
staging: wfx: firmware does not support more than 32 total retries
staging: wfx: use boolean appropriately
staging: wfx: fix counter overflow
staging: wfx: fix case of lack of tx_retry_policies
staging: wfx: fix the cache of rate policies on interface reset
staging: axis-fifo: add unspecified HAS_IOMEM dependency
staging: comedi: gsc_hpdi: check dma_alloc_coherent() return value
|
|
HiSilicon Taishan v110 CPUs didn't implement CSV2 field of the
ID_AA64PFR0_EL1, but spectre-v2 is mitigated by hardware, so
whitelist the MIDR in the safe list.
Signed-off-by: Wei Li <liwei391@huawei.com>
[hanjun: re-write the commit log]
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty and serial driver fixes for 5.5-rc3.
Only four small patches here:
- atmel serial driver fix
- msm_serial driver fix
- sprd serial driver fix
- tty core port fix
The last tty core fix should resolve a long-standing bug with a race
at port creation time that some people would see, and Sudip finally
tracked down.
All of these have been in linux-next with no reported issues"
* tag 'tty-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty/serial: atmel: fix out of range clock divider handling
tty: link tty and port before configuring it as console
serial: sprd: Add clearing break interrupt operation
tty: serial: msm_serial: Fix lockup for sysrq and oops
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for some reported issues.
Included in here are:
- xhci build warning fix
- ehci disconnect warning fix
- usbip lockup fix and error cleanup fix
- typec build fix
All of these have been in linux-next with no reported issues"
* tag 'usb-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: xhci: Fix build warning seen with CONFIG_PM=n
usbip: Fix error path of vhci_recv_ret_submit()
usbip: Fix receive error in vhci-hcd when using scatter-gather
USB: EHCI: Do not return -EPIPE when hub is disconnected
usb: typec: fusb302: Fix an undefined reference to 'extcon_get_state'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Sorry that this fixes pull request took a while. Too much christmas
business going on.
This contains a few really important Intel fixes and some odd fixes:
- A host of fixes for the Intel baytrail and cherryview: properly
serialize all register accesses and add the irqchip with the
gpiochip as we need to, fix some pin lists and initialize the
hardware in the right order.
- Fix the Aspeed G6 LPC configuration.
- Handle a possible NULL pointer exception in the core.
- Fix the Kconfig dependencies for the Equilibrium driver"
* tag 'pinctrl-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: ingenic: Fixup PIN_CONFIG_OUTPUT config
pinctrl: Modify Kconfig to fix linker error
pinctrl: pinmux: fix a possible null pointer in pinmux_can_be_used_for_gpio
pinctrl: aspeed-g6: Fix LPC/eSPI mux configuration
pinctrl: cherryview: Pass irqchip when adding gpiochip
pinctrl: cherryview: Add GPIO <-> pin mapping ranges via callback
pinctrl: cherryview: Split out irq hw-init into a separate helper function
pinctrl: baytrail: Pass irqchip when adding gpiochip
pinctrl: baytrail: Add GPIO <-> pin mapping ranges via callback
pinctrl: baytrail: Update North Community pin list
pinctrl: baytrail: Really serialize all register accesses
|
|
This moves the prep handlers outside of the opcode handlers, and allows
us to pass in the sqe directly. If the sqe is non-NULL, it means that
the request should be prepared for the first time.
With the opcode handlers not having access to the sqe at all, we are
guaranteed that the prep handler has setup the request fully by the
time we get there. As before, for opcodes that need to copy in more
data then the io_kiocb allows for, the io_async_ctx holds that info. If
a prep handler is invoked with req->io set, it must use that to retain
information for later.
Finally, we can remove io_kiocb->sqe as well.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We currently have a mix of use cases. Most of the newer ones are pretty
uniform, but we have some older ones that use different calling
calling conventions. This is confusing.
For the opcodes that currently rely on the req->io->sqe copy saving
them from reuse, add a request type struct in the io_kiocb command
union to store the data they need.
Prepare for all opcodes having a standard prep method, so we can call
it in a uniform fashion and outside of the opcode handler. This is in
preparation for passing in the 'sqe' pointer, rather than storing it
in the io_kiocb. Once we have uniform prep handlers, we can leave all
the prep work to that part, and not even pass in the sqe to the opcode
handler. This ensures that we don't reuse sqe data inadvertently.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Mainly does:
- capitalize gpio and bios to GPIO and BIOS
- capitalize beginning of comments
- add periods in multi-line comments
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
|
|
GPIO stuff on APUv4 seems to be the same as on APUv2, so we just
need to match on DMI data.
Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
|
|
The mapping entry has to hold the GPIO line index instead of
controller's register number.
Fixes: 5037d4ddda31 ("platform/x86: pcengines-apuv2: wire up simswitch gpio as led")
Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
|
|
The CONNECT X300 uses the PMC clock for on-board components and gets
stuck during boot if the clock is disabled. Therefore, add this
device to the critical systems list.
Tested on CONNECT X300.
Fixes: 648e921888ad ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL")
Signed-off-by: Michael Haener <michael.haener@siemens.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
|
|
At least on the HP Envy x360 15-cp0xxx model the WMI interface
for HPWMI_FEATURE2_QUERY requires an outsize of at least 128 bytes,
otherwise it fails with an error code 5 (HPWMI_RET_INVALID_PARAMETERS):
Dec 06 00:59:38 kernel: hp_wmi: query 0xd returned error 0x5
We do not care about the contents of the buffer, we just want to know
if the HPWMI_FEATURE2_QUERY command is supported.
This commits bumps the buffer size, fixing the error.
Fixes: 8a1513b4932 ("hp-wmi: limit hotkey enable")
Cc: stable@vger.kernel.org
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1520703
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
|
|
This is a follow-up commit for the sysfs attributes to change
from DRIVER_ATTR to DEVICE_ATTR according to some initial comments.
In such case, it's better to point the sysfs path to the device
itself instead of the driver. The ABI document is also updated.
Fixes: 79e29cb8fbc5 ("platform/mellanox: Add bootctl driver for Mellanox BlueField Soc")
Signed-off-by: Liming Sun <lsun@mellanox.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
|
|
Add the count field to struct io_timeout, and ensure the prep handler
has read it. Timeout also needs an async context always, set it up
in the prep handler if we don't have one.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|