Age | Commit message (Collapse) | Author | Files | Lines |
|
try_cmpxchg implicitly assigns old head->first value to "first" when
cmpxchg fails. There is no need to re-read the value in the loop.
Link: https://lkml.kernel.org/r/20221017145226.4044-1-ubizjak@gmail.com
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The variable num is being assigned a value that is never read, it is being
re-assigned a new value in both paths if an if-statement. The assignment
is redundant and can be removed.
Cleans up clang scan build warning:
lib/oid_registry.c:149:3: warning: Value stored to 'num' is
never read [deadcode.DeadStores]
Link: https://lkml.kernel.org/r/20221017214556.863357-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
These cases were done with this Coccinelle:
@@
expression H;
expression L;
@@
- (get_random_u32_below(H) + L)
+ get_random_u32_inclusive(L, H + L - 1)
@@
expression H;
expression L;
expression E;
@@
get_random_u32_inclusive(L,
H
- + E
- - E
)
@@
expression H;
expression L;
expression E;
@@
get_random_u32_inclusive(L,
H
- - E
- + E
)
@@
expression H;
expression L;
expression E;
expression F;
@@
get_random_u32_inclusive(L,
H
- - E
+ F
- + E
)
@@
expression H;
expression L;
expression E;
expression F;
@@
get_random_u32_inclusive(L,
H
- + E
+ F
- - E
)
And then subsequently cleaned up by hand, with several automatic cases
rejected if it didn't make sense contextually.
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
These cases were done with this Coccinelle:
@@
expression E;
identifier I;
@@
- do {
... when != I
- I = get_random_u32();
... when != I
- } while (I > E);
+ I = get_random_u32_below(E + 1);
@@
expression E;
identifier I;
@@
- do {
... when != I
- I = get_random_u32();
... when != I
- } while (I >= E);
+ I = get_random_u32_below(E);
@@
expression E;
identifier I;
@@
- do {
... when != I
- I = get_random_u32();
... when != I
- } while (I < E);
+ I = get_random_u32_above(E - 1);
@@
expression E;
identifier I;
@@
- do {
... when != I
- I = get_random_u32();
... when != I
- } while (I <= E);
+ I = get_random_u32_above(E);
@@
identifier I;
@@
- do {
... when != I
- I = get_random_u32();
... when != I
- } while (!I);
+ I = get_random_u32_above(0);
@@
identifier I;
@@
- do {
... when != I
- I = get_random_u32();
... when != I
- } while (I == 0);
+ I = get_random_u32_above(0);
@@
expression E;
@@
- E + 1 + get_random_u32_below(U32_MAX - E)
+ get_random_u32_above(E)
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This is a simple mechanical transformation done by:
@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
(E)
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Reviewed-by: SeongJae Park <sj@kernel.org> # for damon
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Jan reported the new algorithm as merged might be problematic if the
queue being awaken becomes empty between the waitqueue_active inside
sbq_wake_ptr check and the wake up. If that happens, wake_up_nr will
not wake up any waiter and we loose too many wake ups. In order to
guarantee progress, we need to wake up at least one waiter here, if
there are any. This now requires trying to wake up from every queue.
Instead of walking through all the queues with sbq_wake_ptr, this call
moves the wake up inside that function. In a previous version of the
patch, I found that updating wake_index several times when walking
through queues had a measurable overhead. This ensures we only update
it once, at the end.
Fixes: 4f8126bb2308 ("sbitmap: Use single per-bitmap counting to wake up queued tags")
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221115224553.23594-4-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When a queue is awaken, the wake_index written by sbq_wake_ptr currently
keeps pointing to the same queue. On the next wake up, it will thus
retry the same queue, which is unfair to other queues, and can lead to
starvation. This patch, moves the index update to happen before the
queue is returned, such that it will now try a different queue first on
the next wake up, improving fairness.
Fixes: 4f8126bb2308 ("sbitmap: Use single per-bitmap counting to wake up queued tags")
Reported-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20221115224553.23594-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
New initialization macro for linear ranges was added. Slightly simplify
the test code by using this macro - and at the same time also verify the
macro is working as intended.
Use the newly added LINEAR_RANGE() initialization macro for linear range
test.
Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
Link: https://lore.kernel.org/r/Y3R13IRrs+x5PcZ4@dc75zzyyyyyyyyyyyyydt-3.rev.dnainternet.fi
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
1. Var debug_objects_allocated tracks valid kmem_cache_alloc calls, so
track it in debug_objects_replace_static_objects. Do similar things in
object_cpu_offline.
2. In debug_objects_mem_init, there is no need to call function
cpuhp_setup_state_nocalls when debug_objects_enabled = 0 (out of
memory).
Link: https://lkml.kernel.org/r/20220611130634.99741-1-wuchi.zero@gmail.com
Fixes: 634d61f45d6f ("debugobjects: Percpu pool lookahead freeing/allocation")
Fixes: c4b73aabd098 ("debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done")
Signed-off-by: wuchi <wuchi.zero@gmail.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
With CXL security features, and CXL dynamic provisioning, global CPU
cache flushing nvdimm requirements are no longer specific to that
subsystem, even beyond the scope of security_ops. CXL will need such
semantics for features not necessarily limited to persistent memory.
The functionality this is enabling is to be able to instantaneously
secure erase potentially terabytes of memory at once and the kernel
needs to be sure that none of the data from before the erase is still
present in the cache. It is also used when unlocking a memory device
where speculative reads and firmware accesses could have cached poison
from before the device was unlocked. Lastly this facility is used when
mapping new devices, or new capacity into an established physical
address range. I.e. when the driver switches DeviceA mapping AddressX to
DeviceB mapping AddressX then any cached data from DeviceA:AddressX
needs to be invalidated.
This capability is typically only used once per-boot (for unlock), or
once per bare metal provisioning event (secure erase), like when handing
off the system to another tenant or decommissioning a device. It may
also be used for dynamic CXL region provisioning.
Users must first call cpu_cache_has_invalidate_memregion() to know
whether this functionality is available on the architecture. On x86 this
respects the constraints of when wbinvd() is tolerable. It is already
the case that wbinvd() is problematic to allow in VMs due its global
performance impact and KVM, for example, has been known to just trap and
ignore the call. With confidential computing guest execution of wbinvd()
may even trigger an exception. Given guests should not be messing with
the bare metal address map via CXL configuration changes
cpu_cache_has_invalidate_memregion() returns false in VMs.
While this global cache invalidation facility, is exported to modules,
since NVDIMM and CXL support can be built as a module, it is not for
general use. The intent is that this facility is not available outside
of specific "device-memory" use cases. To make that expectation as clear
as possible the API is scoped to a new "DEVMEM" module namespace that
only the NVDIMM and CXL subsystems are expected to import.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
RAID6_USE_EMPTY_ZERO_PAGE is unused and hardcoded to 0, so let's drop it.
Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
|
|
sbitmap suffers from code complexity, as demonstrated by recent fixes,
and eventual lost wake ups on nested I/O completion. The later happens,
from what I understand, due to the non-atomic nature of the updates to
wait_cnt, which needs to be subtracted and eventually reset when equal
to zero. This two step process can eventually miss an update when a
nested completion happens to interrupt the CPU in between the wait_cnt
updates. This is very hard to fix, as shown by the recent changes to
this code.
The code complexity arises mostly from the corner cases to avoid missed
wakes in this scenario. In addition, the handling of wake_batch
recalculation plus the synchronization with sbq_queue_wake_up is
non-trivial.
This patchset implements the idea originally proposed by Jan [1], which
removes the need for the two-step updates of wait_cnt. This is done by
tracking the number of completions and wakeups in always increasing,
per-bitmap counters. Instead of having to reset the wait_cnt when it
reaches zero, we simply keep counting, and attempt to wake up N threads
in a single wait queue whenever there is enough space for a batch.
Waking up less than batch_wake shouldn't be a problem, because we
haven't changed the conditions for wake up, and the existing batch
calculation guarantees at least enough remaining completions to wake up
a batch for each queue at any time.
Performance-wise, one should expect very similar performance to the
original algorithm for the case where there is no queueing. In both the
old algorithm and this implementation, the first thing is to check
ws_active, which bails out if there is no queueing to be managed. In the
new code, we took care to avoid accounting completions and wakeups when
there is no queueing, to not pay the cost of atomic operations
unnecessarily, since it doesn't skew the numbers.
For more interesting cases, where there is queueing, we need to take
into account the cross-communication of the atomic operations. I've
been benchmarking by running parallel fio jobs against a single hctx
nullb in different hardware queue depth scenarios, and verifying both
IOPS and queueing.
Each experiment was repeated 5 times on a 20-CPU box, with 20 parallel
jobs. fio was issuing fixed-size randwrites with qd=64 against nullb,
varying only the hardware queue length per test.
queue size 2 4 8 16 32 64
6.1-rc2 1681.1K (1.6K) 2633.0K (12.7K) 6940.8K (16.3K) 8172.3K (617.5K) 8391.7K (367.1K) 8606.1K (351.2K)
patched 1721.8K (15.1K) 3016.7K (3.8K) 7543.0K (89.4K) 8132.5K (303.4K) 8324.2K (230.6K) 8401.8K (284.7K)
The following is a similar experiment, ran against a nullb with a single
bitmap shared by 20 hctx spread across 2 NUMA nodes. This has 40
parallel fio jobs operating on the same device
queue size 2 4 8 16 32 64
6.1-rc2 1081.0K (2.3K) 957.2K (1.5K) 1699.1K (5.7K) 6178.2K (124.6K) 12227.9K (37.7K) 13286.6K (92.9K)
patched 1081.8K (2.8K) 1316.5K (5.4K) 2364.4K (1.8K) 6151.4K (20.0K) 11893.6K (17.5K) 12385.6K (18.4K)
It has also survived blktests and a 12h-stress run against nullb. I also
ran the code against nvme and a scsi SSD, and I didn't observe
performance regression in those. If there are other tests you think I
should run, please let me know and I will follow up with results.
[1] https://lore.kernel.org/all/aef9de29-e9f5-259a-f8be-12d1b734e72@google.com/
Cc: Hugh Dickins <hughd@google.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Liu Song <liusong@linux.alibaba.com>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20221105231055.25953-1-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Consecutive zone device pages should not be merged into the same sgl
or bvec segment with other types of pages or if they belong to different
pgmaps. Otherwise getting the pgmap of a given segment is not possible
without scanning the entire segment. This helper returns true either if
both pages are not zone device pages or both pages are zone device
pages with the same pgmap.
Factor out the check for page mergability into a pages_are_mergable()
helper and add a check with zone_device_pages_are_mergeable().
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20221021174116.7200-6-logang@deltatee.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add iov_iter_get_pages_flags() and iov_iter_get_pages_alloc_flags()
which take a flags argument that is passed to get_user_pages_fast().
This is so that FOLL_PCI_P2PDMA can be passed when appropriate.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221021174116.7200-4-logang@deltatee.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In RCU mode, the node limits were being updated to the last pivot which
may not be correct and would cause the metadata to be set when it
shouldn't. Fix this by not setting a new limit in this case.
Link: https://lkml.kernel.org/r/20221107163857.867377-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
It is possible to confuse the depth tracking in the maple state by
searching the same node for values. Fix the depth tracking by moving
where the depth is incremented closer to where the node changes level.
Also change the initial depth setting when using the root node.
Link: https://lkml.kernel.org/r/20221107163814.866612-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
As pointed out by Peter Zijlstra, __msan_poison_alloca() does not play
well with IRQ code when PREEMPT_RT is on, because in that mode even
GFP_ATOMIC allocations cannot be performed.
Fixing this would require making stackdepot completely lockless, which is
quite challenging and may be excessive for the time being.
Instead, make sure KMSAN is incompatible with PREEMPT_RT, like other debug
configs are.
Link: https://lkml.kernel.org/r/20221102110611.1085175-4-glider@google.com
Link: https://lore.kernel.org/lkml/20221025221755.3810809-1-glider@google.com/
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
As pointed out by Masahiro Yamada, Kconfig picks up the first default
entry which has true 'if' condition. Hence, the previously added check
for KMSAN was never used, because it followed the checks for 64BIT and
!64BIT.
Put KMSAN check before others to ensure it is always applied.
Link: https://lkml.kernel.org/r/20221102110611.1085175-3-glider@google.com
Link: https://github.com/google/kmsan/issues/89
Link: https://lore.kernel.org/linux-mm/20221024212144.2852069-3-glider@google.com/
Fixes: 921757bc9b61 ("Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default")
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Along the development cycle, the testing code support for module/in-kernel
compiles was removed. Restore this functionality by moving any internal
API tests to the userspace side, as well as threading tests. Fix the
lockdep issues and add a way to reduce memory usage so the tests can
complete with KASAN + memleak detection. Make the tests work on 32 bit
hosts where possible and detect 32 bit hosts in the radix test suite.
[akpm@linux-foundation.org: fix module export]
[akpm@linux-foundation.org: fix it some more]
[liam.howlett@oracle.com: fix compile warnings on 32bit build in check_find()]
Link: https://lkml.kernel.org/r/20221107203816.1260327-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20221028180415.3074673-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
clang-analyzer reported some Dead Stores in mas_anode_descend(). Upon
inspection, there were a few clean ups that would make the code cleaner:
The count variable was set from the mt_slots array and then updated but
never used again. Just use the array reference directly.
Also stop updating the type since it isn't used after the update.
Stop setting the gaps pointer to NULL at the start since it is always
set before the loop begins.
Link: https://lkml.kernel.org/r/20221026151413.4032730-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There is a more direct and cleaner way of implementing the same functional
code. Remove the confusing and unnecessary use of pointers here.
Link: https://lkml.kernel.org/r/20221026151241.4031117-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth and netfilter.
Current release - regressions:
- net: several zerocopy flags fixes
- netfilter: fix possible memory leak in nf_nat_init()
- openvswitch: add missing .resv_start_op
Previous releases - regressions:
- neigh: fix null-ptr-deref in neigh_table_clear()
- sched: fix use after free in red_enqueue()
- dsa: fall back to default tagger if we can't load the one from DT
- bluetooth: fix use-after-free in l2cap_conn_del()
Previous releases - always broken:
- netfilter: netlink notifier might race to release objects
- nfc: fix potential memory leak of skb
- bluetooth: fix use-after-free caused by l2cap_reassemble_sdu
- bluetooth: use skb_put to set length
- eth: tun: fix bugs for oversize packet when napi frags enabled
- eth: lan966x: fixes for when MTU is changed
- eth: dwmac-loongson: fix invalid mdio_node"
* tag 'net-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
vsock: fix possible infinite sleep in vsock_connectible_wait_data()
vsock: remove the unused 'wait' in vsock_connectible_recvmsg()
ipv6: fix WARNING in ip6_route_net_exit_late()
bridge: Fix flushing of dynamic FDB entries
net, neigh: Fix null-ptr-deref in neigh_table_clear()
net/smc: Fix possible leaked pernet namespace in smc_init()
stmmac: dwmac-loongson: fix invalid mdio_node
ibmvnic: Free rwi on reset success
net: mdio: fix undefined behavior in bit shift for __mdiobus_register
Bluetooth: L2CAP: Fix attempting to access uninitialized memory
Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm
Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM
Bluetooth: hci_conn: Fix not restoring ISO buffer count on disconnect
Bluetooth: L2CAP: Fix memory leak in vhci_write
Bluetooth: L2CAP: fix use-after-free in l2cap_conn_del()
Bluetooth: virtio_bt: Use skb_put to set length
Bluetooth: hci_conn: Fix CIS connection dst_type handling
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_reassemble_sdu
netfilter: ipset: enforce documented limit to prevent allocating huge memory
isdn: mISDN: netjet: fix wrong check of device registration
...
|
|
Jakub reported that the addition of the "network_byte_order"
member in struct nla_policy increases size of 32bit platforms.
Instead of scraping the bit from elsewhere Johannes suggested
to add explicit NLA_BE types instead, so do this here.
NLA_POLICY_MAX_BE() macro is removed again, there is no need
for it: NLA_POLICY_MAX(NLA_BE.., ..) will do the right thing.
NLA_BE64 can be added later.
Fixes: 08724ef69907 ("netlink: introduce NLA_POLICY_MAX_BE")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20221031123407.9158-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Converting fwnode_pointer() to use better swnode API allows to
make code more readable.
While at it, rename full_name to full_name_third to show exact
relation in the hierarchy.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220824170542.18263-1-andriy.shevchenko@linux.intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"Eight fix pre-6.0 bugs and the remainder address issues which were
introduced in the 6.1-rc merge cycle, or address issues which aren't
considered sufficiently serious to warrant a -stable backport"
* tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
mm: multi-gen LRU: move lru_gen_add_mm() out of IRQ-off region
lib: maple_tree: remove unneeded initialization in mtree_range_walk()
mmap: fix remap_file_pages() regression
mm/shmem: ensure proper fallback if page faults
mm/userfaultfd: replace kmap/kmap_atomic() with kmap_local_page()
x86: fortify: kmsan: fix KMSAN fortify builds
x86: asm: make sure __put_user_size() evaluates pointer once
Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default
x86/purgatory: disable KMSAN instrumentation
mm: kmsan: export kmsan_copy_page_meta()
mm: migrate: fix return value if all subpages of THPs are migrated successfully
mm/uffd: fix vma check on userfault for wp
mm: prep_compound_tail() clear page->private
mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs
mm/page_isolation: fix clang deadcode warning
fs/ext4/super.c: remove unused `deprecated_msg'
ipc/msg.c: fix percpu_counter use after free
memory tier, sysfs: rename attribute "nodes" to "nodelist"
MAINTAINERS: git://github.com -> https://github.com for nilfs2
mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops
...
|
|
It's really difficult to debug when cgroup or css refs leak. Let's add a
debug option to force the refcnt function to not be inlined so that they can
be kprobed for debugging.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Before the do-while loop in mtree_range_walk(), the variables next, min,
max need to be initialized. The variables last, prev_min and prev_max are
set within the loop body before they are eventually used after exiting the
loop body.
As it is a do-while loop, the loop body is executed at least once, so the
variables last, prev_min and prev_max do not need to be initialized before
the loop body.
Remove unneeded initialization of last and prev_min.
The needless initialization was reported by clang-analyzer as Dead Stores.
As the compiler already identifies these assignments as unneeded, it
optimizes the assignments away. Hence:
No functional change. No change in object code.
Link: https://lkml.kernel.org/r/20221026120029.12555-2-lukas.bulwahn@gmail.com
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
KMSAN adds a lot of instrumentation to the code, which results in
increased stack usage (up to 2048 bytes and more in some cases). It's
hard to predict how big the stack frames can be, so we disable the
warnings for KMSAN instead.
Link: https://lkml.kernel.org/r/20221024212144.2852069-3-glider@google.com
Link: https://github.com/google/kmsan/issues/89
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from 802.15.4 (Zigbee et al).
Current release - regressions:
- ipa: fix bugs in the register conversion for IPA v3.1 and v3.5.1
Current release - new code bugs:
- mptcp: fix abba deadlock on fastopen
- eth: stmmac: rk3588: allow multiple gmac controllers in one system
Previous releases - regressions:
- ip: rework the fix for dflt addr selection for connected nexthop
- net: couple more fixes for misinterpreting bits in struct page
after the signature was added
Previous releases - always broken:
- ipv6: ensure sane device mtu in tunnels
- openvswitch: switch from WARN to pr_warn on a user-triggerable path
- ethtool: eeprom: fix null-deref on genl_info in dump
- ieee802154: more return code fixes for corner cases in
dgram_sendmsg
- mac802154: fix link-quality-indicator recording
- eth: mlx5: fixes for IPsec, PTP timestamps, OvS and conntrack
offload
- eth: fec: limit register access on i.MX6UL
- eth: bcm4908_enet: update TX stats after actual transmission
- can: rcar_canfd: improve IRQ handling for RZ/G2L
Misc:
- genetlink: piggy back on the newly added resv_op_start to enforce
more sanity checks on new commands"
* tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
net: enetc: survive memory pressure without crashing
kcm: do not sense pfmemalloc status in kcm_sendpage()
net: do not sense pfmemalloc status in skb_append_pagefrags()
net/mlx5e: Fix macsec sci endianness at rx sa update
net/mlx5e: Fix wrong bitwise comparison usage in macsec_fs_rx_add_rule function
net/mlx5e: Fix macsec rx security association (SA) update/delete
net/mlx5e: Fix macsec coverity issue at rx sa update
net/mlx5: Fix crash during sync firmware reset
net/mlx5: Update fw fatal reporter state on PCI handlers successful recover
net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed
net/mlx5e: TC, Reject forwarding from internal port to internal port
net/mlx5: Fix possible use-after-free in async command interface
net/mlx5: ASO, Create the ASO SQ with the correct timestamp format
net/mlx5e: Update restore chain id for slow path packets
net/mlx5e: Extend SKB room check to include PTP-SQ
net/mlx5: DR, Fix matcher disconnect error flow
net/mlx5: Wait for firmware to enable CRS before pci_restore_state
net/mlx5e: Do not increment ESN when updating IPsec ESN state
netdevsim: remove dir in nsim_dev_debugfs_init() when creating ports dir failed
netdevsim: fix memory leak in nsim_drv_probe() when nsim_dev_resources_register() failed
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- Fix older Clang vs recent overflow KUnit test additions (Nick
Desaulniers, Kees Cook)
- Fix kern-doc visibility for overflow helpers (Kees Cook)
* tag 'hardening-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
overflow: Refactor test skips for Clang-specific issues
overflow: disable failing tests for older clang versions
overflow: Fix kern-doc markup for functions
|
|
remove unused string_stream_alloc_context structure definition.
Signed-off-by: YoungJun.park <her0gyugyu@gmail.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Augment the example_all_expect_macros_test with the KUNIT_EXPECT_MEMEQ
and KUNIT_EXPECT_MEMNEQ macros by creating a test with memory block
assertions.
Signed-off-by: Maíra Canal <mairacanal@riseup.net>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Currently, in order to compare memory blocks in KUnit, the KUNIT_EXPECT_EQ
or KUNIT_EXPECT_FALSE macros are used in conjunction with the memcmp
function, such as:
KUNIT_EXPECT_EQ(test, memcmp(foo, bar, size), 0);
Although this usage produces correct results for the test cases, when
the expectation fails, the error message is not very helpful,
indicating only the return of the memcmp function.
Therefore, create a new set of macros KUNIT_EXPECT_MEMEQ and
KUNIT_EXPECT_MEMNEQ that compare memory blocks until a specified size.
In case of expectation failure, those macros print the hex dump of the
memory blocks, making it easier to debug test failures for memory blocks.
That said, the expectation
KUNIT_EXPECT_EQ(test, memcmp(foo, bar, size), 0);
would translate to the expectation
KUNIT_EXPECT_MEMEQ(test, foo, bar, size);
Signed-off-by: Maíra Canal <mairacanal@riseup.net>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
When KUNIT_EXPECT_EQ() or KUNIT_ASSERT_EQ() log a failure, they log the
two values being compared, with numerical values logged in decimal.
In some cases, decimal output is painful to consume, and hexadecimal
output would be more helpful. For example, this is the case for tests
I'm currently developing for the arm64 insn encoding/decoding code,
where comparing two 32-bit instruction opcodes results in output such
as:
| # test_insn_add_shifted_reg: EXPECTATION FAILED at arch/arm64/lib/test_insn.c:2791
| Expected obj_insn == gen_insn, but
| obj_insn == 2332164128
| gen_insn == 1258422304
To make this easier to consume, this patch logs the values in both
decimal and hexadecimal:
| # test_insn_add_shifted_reg: EXPECTATION FAILED at arch/arm64/lib/test_insn.c:2791
| Expected obj_insn == gen_insn, but
| obj_insn == 2332164128 (0x8b020020)
| gen_insn == 1258422304 (0x4b020020)
As can be seen from the example, having hexadecimal makes it
significantly easier for a human to spot which specific bits are
incorrect.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: linux-kselftest@vger.kernel.org
Cc: kunit-dev@googlegroups.com
Acked-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
The "random rhlist add/delete operations" actually wasn't very random, as all
cases tested the same bit. Since the later parts of this loop depend on the
first case execute this unconditionally, and then test on different bits for the
remaining tests. While at it only request as much random bits as are actually
used.
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert test exclusion into test skipping. This brings the logic for
why a test is being skipped into the test itself, instead of having to
spread ifdefs around the code. This will make cleanup easier as minimum
tests get raised. Drop __maybe_unused so missed tests will be noticed
again and clean up whitespace.
For example, clang-11 on i386:
[15:52:32] ================== overflow (18 subtests) ==================
[15:52:32] [PASSED] u8_u8__u8_overflow_test
[15:52:32] [PASSED] s8_s8__s8_overflow_test
[15:52:32] [PASSED] u16_u16__u16_overflow_test
[15:52:32] [PASSED] s16_s16__s16_overflow_test
[15:52:32] [PASSED] u32_u32__u32_overflow_test
[15:52:32] [PASSED] s32_s32__s32_overflow_test
[15:52:32] [SKIPPED] u64_u64__u64_overflow_test
[15:52:32] [SKIPPED] s64_s64__s64_overflow_test
[15:52:32] [SKIPPED] u32_u32__int_overflow_test
[15:52:32] [PASSED] u32_u32__u8_overflow_test
[15:52:32] [PASSED] u8_u8__int_overflow_test
[15:52:32] [PASSED] int_int__u8_overflow_test
[15:52:32] [PASSED] shift_sane_test
[15:52:32] [PASSED] shift_overflow_test
[15:52:32] [PASSED] shift_truncate_test
[15:52:32] [PASSED] shift_nonsense_test
[15:52:32] [PASSED] overflow_allocation_test
[15:52:32] [PASSED] overflow_size_helpers_test
[15:52:32] ==================== [PASSED] overflow =====================
[15:52:32] ============================================================
[15:52:32] Testing complete. Ran 18 tests: passed: 15, skipped: 3
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Tom Rix <trix@redhat.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: llvm@lists.linux.dev
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20221006230017.1833458-1-keescook@chromium.org
|
|
Building the overflow kunit tests with clang-11 fails with:
$ ./tools/testing/kunit/kunit.py run --arch=arm --make_options LLVM=1 \
overflow
...
ld.lld: error: undefined symbol: __mulodi4
...
Clang 11 and earlier generate unwanted libcalls for signed output,
unsigned input.
Disable these tests for now, but should these become used in the kernel
we might consider that as justification for dropping clang-11 support.
Keep the clang-11 build alive a little bit longer.
Avoid -Wunused-function warnings via __maybe_unused. To test W=1:
$ make LLVM=1 -j128 defconfig
$ ./scripts/config -e KUNIT -e KUNIT_ALL
$ make LLVM=1 -j128 olddefconfig lib/overflow_kunit.o W=1
Link: https://github.com/ClangBuiltLinux/linux/issues/1711
Link: https://github.com/llvm/llvm-project/commit/3203143f1356a4e4e3ada231156fc6da6e1a9f9d
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221006171751.3444575-1-ndesaulniers@google.com
|
|
Fix trivial typo in Kconfig help comments in KCSAN_SKIP_WATCH and
KCSAN_SKIP_WATCH_RANDOMIZE
Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
Reviewed-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The alloc_string_stream() functions were changed from returning NULL on
error to returning error pointers so these caller needs to be updated
as well.
Fixes: 78b1c6584fce ("kunit: string-stream: Simplify resource use")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Replace the obsolete and ambiguos macro in_irq() with new
macro in_hardirq().
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221011024831.322799-1-ye.xingchen@zte.com.cn
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull more random number generator updates from Jason Donenfeld:
"This time with some large scale treewide cleanups.
The intent of this pull is to clean up the way callers fetch random
integers. The current rules for doing this right are:
- If you want a secure or an insecure random u64, use get_random_u64()
- If you want a secure or an insecure random u32, use get_random_u32()
The old function prandom_u32() has been deprecated for a while
now and is just a wrapper around get_random_u32(). Same for
get_random_int().
- If you want a secure or an insecure random u16, use get_random_u16()
- If you want a secure or an insecure random u8, use get_random_u8()
- If you want secure or insecure random bytes, use get_random_bytes().
The old function prandom_bytes() has been deprecated for a while
now and has long been a wrapper around get_random_bytes()
- If you want a non-uniform random u32, u16, or u8 bounded by a
certain open interval maximum, use prandom_u32_max()
I say "non-uniform", because it doesn't do any rejection sampling
or divisions. Hence, it stays within the prandom_*() namespace, not
the get_random_*() namespace.
I'm currently investigating a "uniform" function for 6.2. We'll see
what comes of that.
By applying these rules uniformly, we get several benefits:
- By using prandom_u32_max() with an upper-bound that the compiler
can prove at compile-time is ≤65536 or ≤256, internally
get_random_u16() or get_random_u8() is used, which wastes fewer
batched random bytes, and hence has higher throughput.
- By using prandom_u32_max() instead of %, when the upper-bound is
not a constant, division is still avoided, because
prandom_u32_max() uses a faster multiplication-based trick instead.
- By using get_random_u16() or get_random_u8() in cases where the
return value is intended to indeed be a u16 or a u8, we waste fewer
batched random bytes, and hence have higher throughput.
This series was originally done by hand while I was on an airplane
without Internet. Later, Kees and I worked on retroactively figuring
out what could be done with Coccinelle and what had to be done
manually, and then we split things up based on that.
So while this touches a lot of files, the actual amount of code that's
hand fiddled is comfortably small"
* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
prandom: remove unused functions
treewide: use get_random_bytes() when possible
treewide: use get_random_u32() when possible
treewide: use get_random_{u8,u16}() when possible, part 2
treewide: use get_random_{u8,u16}() when possible, part 1
treewide: use prandom_u32_max() when possible, part 2
treewide: use prandom_u32_max() when possible, part 1
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y compile error for the
combination of Clang >= 14 and GAS <= 2.35.
- Drop vmlinux.bz2 from the rpm package as it just annoyingly increased
the package size.
- Fix modpost error under build environments using musl.
- Make *.ll files keep value names for easier debugging
- Fix single directory build
- Prevent RISC-V from selecting the broken DWARF5 support when Clang
and GAS are used together.
* tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5
kbuild: fix single directory build
kbuild: add -fno-discard-value-names to cmd_cc_ll_c
scripts/clang-tools: Convert clang-tidy args to list
modpost: put modpost options before argument
kbuild: Stop including vmlinux.bz2 in the rpm's
Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5
|
|
When building with a RISC-V kernel with DWARF5 debug info using clang
and the GNU assembler, several instances of the following error appear:
/tmp/vgettimeofday-48aa35.s:2963: Error: non-constant .uleb128 is not supported
Dumping the .s file reveals these .uleb128 directives come from
.debug_loc and .debug_ranges:
.Ldebug_loc0:
.byte 4 # DW_LLE_offset_pair
.uleb128 .Lfunc_begin0-.Lfunc_begin0 # starting offset
.uleb128 .Ltmp1-.Lfunc_begin0 # ending offset
.byte 1 # Loc expr size
.byte 90 # DW_OP_reg10
.byte 0 # DW_LLE_end_of_list
.Ldebug_ranges0:
.byte 4 # DW_RLE_offset_pair
.uleb128 .Ltmp6-.Lfunc_begin0 # starting offset
.uleb128 .Ltmp27-.Lfunc_begin0 # ending offset
.byte 4 # DW_RLE_offset_pair
.uleb128 .Ltmp28-.Lfunc_begin0 # starting offset
.uleb128 .Ltmp30-.Lfunc_begin0 # ending offset
.byte 0 # DW_RLE_end_of_list
There is an outstanding binutils issue to support a non-constant operand
to .sleb128 and .uleb128 in GAS for RISC-V but there does not appear to
be any movement on it, due to concerns over how it would work with
linker relaxation.
To avoid these build errors, prevent DWARF5 from being selected when
using clang and an assembler that does not have support for these symbol
deltas, which can be easily checked in Kconfig with as-instr plus the
small test program from the dwz test suite from the binutils issue.
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27215
Link: https://github.com/ClangBuiltLinux/linux/issues/1719
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- fix a race which causes page refcounting errors in ZONE_DEVICE pages
(Alistair Popple)
- fix userfaultfd test harness instability (Peter Xu)
- various other patches in MM, mainly fixes
* tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (29 commits)
highmem: fix kmap_to_page() for kmap_local_page() addresses
mm/page_alloc: fix incorrect PGFREE and PGALLOC for high-order page
mm/selftest: uffd: explain the write missing fault check
mm/hugetlb: use hugetlb_pte_stable in migration race check
mm/hugetlb: fix race condition of uffd missing/minor handling
zram: always expose rw_page
LoongArch: update local TLB if PTE entry exists
mm: use update_mmu_tlb() on the second thread
kasan: fix array-bounds warnings in tests
hmm-tests: add test for migrate_device_range()
nouveau/dmem: evict device private memory during release
nouveau/dmem: refactor nouveau_dmem_fault_copy_one()
mm/migrate_device.c: add migrate_device_range()
mm/migrate_device.c: refactor migrate_vma and migrate_deivce_coherent_page()
mm/memremap.c: take a pgmap reference on page allocation
mm: free device private pages have zero refcount
mm/memory.c: fix race when faulting a device private page
mm/damon: use damon_sz_region() in appropriate place
mm/damon: move sz_damon_region to damon_sz_region
lib/test_meminit: add checks for the allocation functions
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
"Fixes:
- When we added basic vDSO support in kernel 5.18 we introduced a bug
which prevented a mmap() of graphic card memory. This is because we
used the DMB (data memory break trap bit) page flag as special-bit,
but missed to clear that bit when loading the TLB.
- Graphics card memory size was not correctly aligned
- Spelling fixes (from Colin Ian King)
Enhancements:
- PDC console (which uses firmware calls) now rewritten as early
console
- Reduced size of alternative tables"
* tag 'parisc-for-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix spelling mistake "mis-match" -> "mismatch" in eisa driver
parisc: Fix userspace graphics card breakage due to pgtable special bit
parisc: fbdev/stifb: Align graphics memory size to 4MB
parisc: Convert PDC console to an early console
parisc: Reduce kernel size by packing alternative tables
|
|
Link: https://lkml.kernel.org/r/a73cf109de0224cfd118d22be58ddebac3ae2897.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
refcount") device private pages have no longer had an extra reference
count when the page is in use. However before handing them back to the
owning device driver we add an extra reference count such that free pages
have a reference count of one.
This makes it difficult to tell if a page is free or not because both free
and in use pages will have a non-zero refcount. Instead we should return
pages to the drivers page allocator with a zero reference count. Kernel
code can then safely use kernel functions such as get_page_unless_zero().
Link: https://lkml.kernel.org/r/cf70cf6f8c0bdb8aaebdbfb0d790aea4c683c3c6.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Fix several device private page reference counting issues",
v2
This series aims to fix a number of page reference counting issues in
drivers dealing with device private ZONE_DEVICE pages. These result in
use-after-free type bugs, either from accessing a struct page which no
longer exists because it has been removed or accessing fields within the
struct page which are no longer valid because the page has been freed.
During normal usage it is unlikely these will cause any problems. However
without these fixes it is possible to crash the kernel from userspace.
These crashes can be triggered either by unloading the kernel module or
unbinding the device from the driver prior to a userspace task exiting.
In modules such as Nouveau it is also possible to trigger some of these
issues by explicitly closing the device file-descriptor prior to the task
exiting and then accessing device private memory.
This involves some minor changes to both PowerPC and AMD GPU code.
Unfortunately I lack hardware to test either of those so any help there
would be appreciated. The changes mimic what is done in for both Nouveau
and hmm-tests though so I doubt they will cause problems.
This patch (of 8):
When the CPU tries to access a device private page the migrate_to_ram()
callback associated with the pgmap for the page is called. However no
reference is taken on the faulting page. Therefore a concurrent migration
of the device private page can free the page and possibly the underlying
pgmap. This results in a race which can crash the kernel due to the
migrate_to_ram() function pointer becoming invalid. It also means drivers
can't reliably read the zone_device_data field because the page may have
been freed with memunmap_pages().
Close the race by getting a reference on the page while holding the ptl to
ensure it has not been freed. Unfortunately the elevated reference count
will cause the migration required to handle the fault to fail. To avoid
this failure pass the faulting page into the migrate_vma functions so that
if an elevated reference count is found it can be checked to see if it's
expected or not.
[mpe@ellerman.id.au: fix build]
Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au
Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com
Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
alloc_pages(), kmalloc() and vmalloc() are all memory allocation functions
which can return NULL when some internal memory failures happen. So it is
better to check the return of them to catch the failure in time for better
test them.
Link: https://lkml.kernel.org/r/tencent_D44A49FFB420EDCCBFB9221C8D14DFE12908@qq.com
Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull more KUnit updates from Shuah Khan:
"Features and fixes:
- simplify resource use
- make kunit_malloc() and kunit_free() allocations and frees
consistent. kunit_free() frees only the memory allocated by
kunit_malloc()
- stop downloading risc-v opensbi binaries using wget
- other fixes and improvements to tool and KUnit framework"
* tag 'linux-kselftest-kunit-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
Documentation: kunit: Update description of --alltests option
kunit: declare kunit_assert structs as const
kunit: rename base KUNIT_ASSERTION macro to _KUNIT_FAILED
kunit: remove format func from struct kunit_assert, get it to 0 bytes
kunit: tool: Don't download risc-v opensbi firmware with wget
kunit: make kunit_kfree(NULL) a no-op to match kfree()
kunit: make kunit_kfree() not segfault on invalid inputs
kunit: make kunit_kfree() only work on pointers from kunit_malloc() and friends
kunit: drop test pointer in string_stream_fragment
kunit: string-stream: Simplify resource use
|