Age | Commit message (Collapse) | Author | Files | Lines |
|
On some machines, the normal zone can have a large memory hole like below
memory layout, and we can see the range from 0x100000000 to 0x1800000000
is a hole. So when isolating some migratable pages, the scanner can meet
the hole and it will take more time to skip the large hole. From my
measurement, I can see the isolation scanner will take 80us ~ 100us to
skip the large hole [0x100000000 - 0x1800000000].
So adding a new helper to fast search next online memory section to skip
the large hole can help to find next suitable pageblock efficiently. With
this patch, I can see the large hole scanning only takes < 1us.
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000040000000-0x00000000ffffffff]
[ 0.000000] DMA32 empty
[ 0.000000] Normal [mem 0x0000000100000000-0x0000001fa7ffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x0000000fffffffff]
[ 0.000000] node 0: [mem 0x0000001800000000-0x0000001fa3c7ffff]
[ 0.000000] node 0: [mem 0x0000001fa3c80000-0x0000001fa3ffffff]
[ 0.000000] node 0: [mem 0x0000001fa4000000-0x0000001fa402ffff]
[ 0.000000] node 0: [mem 0x0000001fa4030000-0x0000001fa40effff]
[ 0.000000] node 0: [mem 0x0000001fa40f0000-0x0000001fa73cffff]
[ 0.000000] node 0: [mem 0x0000001fa73d0000-0x0000001fa745ffff]
[ 0.000000] node 0: [mem 0x0000001fa7460000-0x0000001fa746ffff]
[ 0.000000] node 0: [mem 0x0000001fa7470000-0x0000001fa758ffff]
[ 0.000000] node 0: [mem 0x0000001fa7590000-0x0000001fa7ffffff]
[baolin.wang@linux.alibaba.com: limit next_ptn to not exceed cc->free_pfn]
Link: https://lkml.kernel.org/r/a1d859c28af0c7e85e91795e7473f553eb180a9d.1686813379.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/75b4c8ca36bf44ad8c42bf0685ac19d272e426ec.1686705221.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Note the behaviour of kasan.fault=panic_on_write for async modes, since
all asynchronous faults will result in panic (even if they are reads).
Link: https://lkml.kernel.org/r/ZJHfL6vavKUZ3Yd8@elver.google.com
Fixes: 452c03fdbed0 ("kasan: add support for kasan.fault=panic_on_write")
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Aleksandr Nogikh <nogikh@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Taras Madan <tarasmadan@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
|
|
Pull drm fixes from Dave Airlie:
"Very quiet last week, just two misc fixes, one dp-mst and one qaic:
qaic:
- dma-buf import fix
dp-mst:
- fix NULL ptr deref"
[ It turns out it was a quiet week because Alex Deucher hadn't sent in
his pending AMD changes. So they are coming next - Linus ]
* tag 'drm-fixes-2023-06-23' of git://anongit.freedesktop.org/drm/drm:
drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2
accel/qaic: Call DRM helper function to destroy prime GEM
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"The final bug fixes for Qualcomm and Rockchips came in, all of them
for devicetree files:
- Devices on Qualcomm SC7180/SC7280 that are cache coherent are now
marked so correctly to fix a regression after a change in kernel
behavior
- Rockchips has a few minor changes for correctness of regulator and
cache properties, as well as fixes for incorrect behavior of the
RK3568 PCI controller and reset pins on two boards"
* tag 'arm-fixes-6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: dts: qcom: sc7280: Mark SCM as dma-coherent for chrome devices
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for trogdor
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for IDP
dt-bindings: firmware: qcom,scm: Document that SCM can be dma-coherent
arm64: dts: rockchip: Fix rk356x PCIe register and range mappings
arm64: dts: rockchip: fix button reset pin for nanopi r5c
arm64: dts: rockchip: fix nEXTRST on SOQuartz
arm64: dts: rockchip: add missing cache properties
arm64: dts: rockchip: fix USB regulator on ROCK64
|
|
lru_gen_rotate_memcg() can happen in softirq if memory.soft_limit_in_bytes
is set. This requires memcg_lru->lock to be irq safe. Lockdep warns on
this.
This problem only affects memcg v1.
Link: https://lkml.kernel.org/r/20230619193821.2710944-1-yuzhao@google.com
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: syzbot+87c490fd2be656269b6a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=87c490fd2be656269b6a
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"Unfortunately the recent u32 overflow fix was not complete, there was
one conversion left, assertion not triggered by my tests but caught by
Qu's fstests case.
The "cleanup for later" has been promoted to a proper fix and wraps
all uses of the stripe left shift so the diffstat has grown but leaves
no potentially problematic uses.
We should have done it that way before, sorry"
* tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix remaining u32 overflows when left shifting stripe_nr
|
|
Pull block fix from Jens Axboe:
"It's apparently the week of 'fixup something from last week', because
the same is true for this block pull request.
Fix up a lock grab that needs to be IRQ saving, rather than just IRQ
disabling, in the block cgroup code"
* tag 'block-6.4-2023-06-23' of git://git.kernel.dk/linux:
block: make sure local irq is disabled when calling __blkcg_rstat_flush
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fix from Joerg Roedel:
- Fix potential memory leak in AMD IOMMU domain allocation path
* tag 'iommu-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix possible memory leak of 'domain'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Three oneliner fixes: one for a thinko in SOF SoundWire code and two
HD-audio quirks for ASUS laptops. All device-specific and should be
safe to apply"
* tag 'sound-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Add quirk for ASUS ROG GV601V
ALSA: hda/realtek: Add quirk for ASUS ROG G634Z
ASoC: intel: sof_sdw: Fixup typo in device link checking
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix IRQ initialization in gpiochip_irqchip_add_domain()
- add a missing return value check for platform_get_irq() in
gpio-sifive
- don't free irq_domains which GPIOLIB does not manage
* tag 'gpio-fixes-for-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: Fix irq_domain resource tracking for gpiochip_irqchip_add_domain()
gpio: sifive: add missing check for platform_get_irq
gpiolib: Fix GPIO chip IRQ initialization restriction
|
|
Commit 2736e8eeb0cc ("block: use the holder as indication for exclusive
opens") introduced a change that blkdev_put() has to get exclusive
holder of the bdev as an argument. However it overlooked that
register_bdev() and register_cache() overwrite the bdev->bd_holder field
in the block device to point to the real owning object which was not
available at the time we called blkdev_get_by_path(). Messing with bdev
internals like this is a layering violation and it also causes
blkdev_put() to issue warning about mismatching holders.
Fix bcache to reopen the block device with appropriate holder once it is
available which also restores the behavior that multiple bcache caches
cannot claim the same device which was broken by commit 29499ab060fe
("bcache: don't pass a stack address to blkdev_get_by_path").
Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
Acked-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20230622164658.12861-2-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Allocate holder object (cache or cached_dev) before offloading the
rest of the startup to async work. This will allow us to open the block
block device with proper holder.
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Coly Li <colyli@suse.de>
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/r/20230622164658.12861-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add sam9x7 compatible to DT bindings documentation.
Signed-off-by: Varshini Rajendran <varshini.rajendran@microchip.com>
Link: https://lore.kernel.org/r/Message-Id: <20230623203056.689705-33-varshini.rajendran@microchip.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes
One last Qualcomm ARM64 DeviceTree fix for v6.4
Changes related to cache management for DMA memory caused WiFi to stop
work on SC7180 and SC7280 based products, using TF-A. These changes
marks the relevant device dma-coherent to correct the behavior.
* tag 'qcom-arm64-fixes-for-6.4-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: dts: qcom: sc7280: Mark SCM as dma-coherent for chrome devices
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for trogdor
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for IDP
dt-bindings: firmware: qcom,scm: Document that SCM can be dma-coherent
Link: https://lore.kernel.org/r/20230622203248.106422-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Dave Airlie reports that gcc-13.1.1 has started complaining about some
of the workqueue code in 32-bit arm builds:
kernel/workqueue.c: In function ‘get_work_pwq’:
kernel/workqueue.c:713:24: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
713 | return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
| ^
[ ... a couple of other cases ... ]
and while it's not immediately clear exactly why gcc started complaining
about it now, I suspect it's some C23-induced enum type handlign fixup in
gcc-13 is the cause.
Whatever the reason for starting to complain, the code and data types
are indeed disgusting enough that the complaint is warranted.
The wq code ends up creating various "helper constants" (like that
WORK_STRUCT_WQ_DATA_MASK) using an enum type, which is all kinds of
confused. The mask needs to be 'unsigned long', not some unspecified
enum type.
To make matters worse, the actual "mask and cast to a pointer" is
repeated a couple of times, and the cast isn't even always done to the
right pointer, but - as the error case above - to a 'void *' with then
the compiler finishing the job.
That's now how we roll in the kernel.
So create the masks using the proper types rather than some ambiguous
enumeration, and use a nice helper that actually does the type
conversion in one well-defined place.
Incidentally, this magically makes clang generate better code. That,
admittedly, is really just a sign of clang having been seriously
confused before, and cleaning up the typing unconfuses the compiler too.
Reported-by: Dave Airlie <airlied@gmail.com>
Link: https://lore.kernel.org/lkml/CAPM=9twNnV4zMCvrPkw3H-ajZOH-01JVh_kDrxdPYQErz8ZTdA@mail.gmail.com/
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.5/block-late
Pull MD fixes from Song.
* tag 'md-next-20230623' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
raid10: avoid spin_lock from fastpath from raid10_unplug()
md: fix 'delete_mutex' deadlock
md: use mddev->external to select holder in export_rdev()
md/raid1-10: fix casting from randomized structure in raid1_submit_write()
md/raid10: fix the condition to call bio_end_io_acct()
|
|
* for-next/feat_s1pie:
: Support for the Armv8.9 Permission Indirection Extensions (stage 1 only)
KVM: selftests: get-reg-list: add Permission Indirection registers
KVM: selftests: get-reg-list: support ID register features
arm64: Document boot requirements for PIE
arm64: transfer permission indirection settings to EL2
arm64: enable Permission Indirection Extension (PIE)
arm64: add encodings of PIRx_ELx registers
arm64: disable EL2 traps for PIE
arm64: reorganise PAGE_/PROT_ macros
arm64: add PTE_WRITE to PROT_SECT_NORMAL
arm64: add PTE_UXN/PTE_WRITE to SWAPPER_*_FLAGS
KVM: arm64: expose ID_AA64MMFR3_EL1 to guests
KVM: arm64: Save/restore PIE registers
KVM: arm64: Save/restore TCR2_EL1
arm64: cpufeature: add Permission Indirection Extension cpucap
arm64: cpufeature: add TCR2 cpucap
arm64: cpufeature: add system register ID_AA64MMFR3
arm64/sysreg: add PIR*_ELx registers
arm64/sysreg: update HCRX_EL2 register
arm64/sysreg: add system registers TCR2_ELx
arm64/sysreg: Add ID register ID_AA64MMFR3
|
|
'for-next/iss2-decode', 'for-next/kselftest', 'for-next/misc', 'for-next/feat_mops', 'for-next/module-alloc', 'for-next/sysreg', 'for-next/cpucap', 'for-next/acpi', 'for-next/kdump', 'for-next/acpi-doc', 'for-next/doc' and 'for-next/tpidr2-fix', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst
docs: perf: Add new description for HiSilicon UC PMU
drivers/perf: hisi: Add support for HiSilicon UC PMU driver
drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver
perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE
perf/arm-cmn: Add sysfs identifier
perf/arm-cmn: Revamp model detection
perf/arm_dmc620: Add cpumask
dt-bindings: perf: fsl-imx-ddr: Add i.MX93 compatible
drivers/perf: imx_ddr: Add support for NXP i.MX9 SoC DDRC PMU driver
perf/arm_cspmu: Decouple APMT dependency
perf/arm_cspmu: Clean up ACPI dependency
ACPI/APMT: Don't register invalid resource
perf/arm_cspmu: Fix event attribute type
perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used
drivers/perf: hisi: Don't migrate perf to the CPU going to teardown
drivers/perf: apple_m1: Force 63bit counters for M2 CPUs
perf/arm-cmn: Fix DTC reset
perf: qcom_l2_pmu: Make l2_cache_pmu_probe_cluster() more robust
perf/arm-cci: Slightly optimize cci_pmu_sync_counters()
* for-next/kpti:
: Simplify KPTI trampoline exit code
arm64: entry: Simplify tramp_alias macro and tramp_exit routine
arm64: entry: Preserve/restore X29 even for compat tasks
* for-next/missing-proto-warn:
: Address -Wmissing-prototype warnings
arm64: add alt_cb_patch_nops prototype
arm64: move early_brk64 prototype to header
arm64: signal: include asm/exception.h
arm64: kaslr: add kaslr_early_init() declaration
arm64: flush: include linux/libnvdimm.h
arm64: module-plts: inline linux/moduleloader.h
arm64: hide unused is_valid_bugaddr()
arm64: efi: add efi_handle_corrupted_x18 prototype
arm64: cpuidle: fix #ifdef for acpi functions
arm64: kvm: add prototypes for functions called in asm
arm64: spectre: provide prototypes for internal functions
arm64: move cpu_suspend_set_dbg_restorer() prototype to header
arm64: avoid prototype warnings for syscalls
arm64: add scs_patch_vmlinux prototype
arm64: xor-neon: mark xor_arm64_neon_*() static
* for-next/iss2-decode:
: Add decode of ISS2 to data abort reports
arm64/esr: Add decode of ISS2 to data abort reporting
arm64/esr: Use GENMASK() for the ISS mask
* for-next/kselftest:
: Various arm64 kselftest improvements
kselftest/arm64: Log signal code and address for unexpected signals
kselftest/arm64: Add a smoke test for ptracing hardware break/watch points
* for-next/misc:
: Miscellaneous patches
arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe
arm64: hibernate: remove WARN_ON in save_processor_state
arm64/fpsimd: Exit streaming mode when flushing tasks
arm64: mm: fix VA-range sanity check
arm64/mm: remove now-superfluous ISBs from TTBR writes
arm64: consolidate rox page protection logic
arm64: set __exception_irq_entry with __irq_entry as a default
arm64: syscall: unmask DAIF for tracing status
arm64: lockdep: enable checks for held locks when returning to userspace
arm64/cpucaps: increase string width to properly format cpucaps.h
arm64/cpufeature: Use helper for ECV CNTPOFF cpufeature
* for-next/feat_mops:
: Support for ARMv8.8 memcpy instructions in userspace
kselftest/arm64: add MOPS to hwcap test
arm64: mops: allow disabling MOPS from the kernel command line
arm64: mops: detect and enable FEAT_MOPS
arm64: mops: handle single stepping after MOPS exception
arm64: mops: handle MOPS exceptions
KVM: arm64: hide MOPS from guests
arm64: mops: don't disable host MOPS instructions from EL2
arm64: mops: document boot requirements for MOPS
KVM: arm64: switch HCRX_EL2 between host and guest
arm64: cpufeature: detect FEAT_HCX
KVM: arm64: initialize HCRX_EL2
* for-next/module-alloc:
: Make the arm64 module allocation code more robust (clean-up, VA range expansion)
arm64: module: rework module VA range selection
arm64: module: mandate MODULE_PLTS
arm64: module: move module randomization to module.c
arm64: kaslr: split kaslr/module initialization
arm64: kasan: remove !KASAN_VMALLOC remnants
arm64: module: remove old !KASAN_VMALLOC logic
* for-next/sysreg: (21 commits)
: More sysreg conversions to automatic generation
arm64/sysreg: Convert TRBIDR_EL1 register to automatic generation
arm64/sysreg: Convert TRBTRG_EL1 register to automatic generation
arm64/sysreg: Convert TRBMAR_EL1 register to automatic generation
arm64/sysreg: Convert TRBSR_EL1 register to automatic generation
arm64/sysreg: Convert TRBBASER_EL1 register to automatic generation
arm64/sysreg: Convert TRBPTR_EL1 register to automatic generation
arm64/sysreg: Convert TRBLIMITR_EL1 register to automatic generation
arm64/sysreg: Rename TRBIDR_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBTRG_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBMAR_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBSR_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBBASER_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBPTR_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBLIMITR_EL1 fields per auto-gen tools format
arm64/sysreg: Convert OSECCR_EL1 to automatic generation
arm64/sysreg: Convert OSDTRTX_EL1 to automatic generation
arm64/sysreg: Convert OSDTRRX_EL1 to automatic generation
arm64/sysreg: Convert OSLAR_EL1 to automatic generation
arm64/sysreg: Standardise naming of bitfield constants in OSL[AS]R_EL1
arm64/sysreg: Convert MDSCR_EL1 to automatic register generation
...
* for-next/cpucap:
: arm64 cpucap clean-up
arm64: cpufeature: fold cpus_set_cap() into update_cpu_capabilities()
arm64: cpufeature: use cpucap naming
arm64: alternatives: use cpucap naming
arm64: standardise cpucap bitmap names
* for-next/acpi:
: Various arm64-related ACPI patches
ACPI: bus: Consolidate all arm specific initialisation into acpi_arm_init()
* for-next/kdump:
: Simplify the crashkernel reservation behaviour of crashkernel=X,high on arm64
arm64: add kdump.rst into index.rst
Documentation: add kdump.rst to present crashkernel reservation on arm64
arm64: kdump: simplify the reservation behaviour of crashkernel=,high
* for-next/acpi-doc:
: Update ACPI documentation for Arm systems
Documentation/arm64: Update ACPI tables from BBR
Documentation/arm64: Update references in arm-acpi
Documentation/arm64: Update ARM and arch reference
* for-next/doc:
: arm64 documentation updates
Documentation/arm64: Add ptdump documentation
* for-next/tpidr2-fix:
: Fix the TPIDR2_EL0 register restoring on sigreturn
kselftest/arm64: Add a test case for TPIDR2 restore
arm64/signal: Restore TPIDR2 register rather than memory state
|
|
Due to the fact that TPIDR2 is intended to be managed by libc we don't
currently test modifying it via the signal context since that might
disrupt libc's usage of it and cause instability. We can however test the
opposite case with less risk, modifying TPIDR2 in a signal handler and
making sure that the original value is restored after returning from the
signal handler. Add a test which does this.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230621-arm64-fix-tpidr2-signal-restore-v2-2-c8e8fcc10302@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently when restoring the TPIDR2 signal context we set the new value
from the signal frame in the thread data structure but not the register,
following the pattern for the rest of the data we are restoring. This does
not work in the case of TPIDR2, the register always has the value for the
current task. This means that either we return to userspace and ignore the
new value or we context switch and save the register value on top of the
newly restored value.
Load the value from the signal context into the register instead.
Fixes: 39e54499280f ("arm64/signal: Include TPIDR2 in the signal context")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org> # 6.3.x
Link: https://lore.kernel.org/r/20230621-arm64-fix-tpidr2-signal-restore-v2-1-c8e8fcc10302@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
|
|
Commit 0c0be98bbe67 ("md/raid10: prevent unnecessary calls to wake_up()
in fast path") missed one place, for example, with:
fio -direct=1 -rw=write/randwrite -iodepth=1 ...
Plug and unplug are called for each io, then wake_up() from raid10_unplug()
will cause lock contention as well.
Avoid this contention by using wake_up_barrier() instead of wake_up(),
where spin_lock is not held if waitqueue is empty.
Fio test script:
[global]
name=random reads and writes
ioengine=libaio
direct=1
readwrite=randrw
rwmixread=70
iodepth=64
buffered=0
filename=/dev/md0
size=1G
runtime=30
time_based
randrepeat=0
norandommap
refill_buffers
ramp_time=10
bs=4k
numjobs=400
group_reporting=1
[job1]
Test result with ramdisk raid10(By Ali):
Before this patch With this patch
READ IOPS=2033k IOPS=3642k
WRITE IOPS=871k IOPS=1561K
By the way, in this scenario, blk_plug_cb() will be allocated and freed
for each io, this seems need to be optimized as well.
Reported-and-tested-by: Ali Gholami Rudi <aligrudi@gmail.com>
Closes: https://lore.kernel.org/all/20231606122233@laper.mirepesht/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230621105728.1268542-1-yukuai1@huaweicloud.com
|
|
Commit 3ce94ce5d05a ("md: fix duplicate filename for rdev") introduce a
new lock 'delete_mutex', and trigger a new deadlock:
t1: remove rdev t2: sysfs writer
rdev_attr_store rdev_attr_store
mddev_lock
state_store
md_kick_rdev_from_array
lock delete_mutex
list_add mddev->deleting
unlock delete_mutex
mddev_unlock
mddev_lock
...
lock delete_mutex
kobject_del
// wait for sysfs writers to be done
mddev_unlock
lock delete_mutex
// wait for delete_mutex, deadlock
'delete_mutex' is used to protect the list 'mddev->deleting', turns out
that this list can be protected by 'reconfig_mutex' directly, and this
lock can be removed.
Fix this problem by removing the lock, and use 'reconfig_mutex' to
protect the list. mddev_unlock() will move this list to a local list to
be handled after 'reconfig_mutex' is dropped.
Fixes: 3ce94ce5d05a ("md: fix duplicate filename for rdev")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230621142933.1395629-1-yukuai1@huaweicloud.com
|
|
mdadm test "10ddf-create-fail-rebuild" triggers warnings like the following
[ 215.526357] ------------[ cut here ]------------
[ 215.527243] WARNING: CPU: 18 PID: 1264 at block/bdev.c:617 blkdev_put+0x269/0x350
[ 215.528334] Modules linked in:
[ 215.528806] CPU: 18 PID: 1264 Comm: mdmon Not tainted 6.4.0-rc2+ #768
[ 215.529863] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
[ 215.531464] RIP: 0010:blkdev_put+0x269/0x350
[ 215.532167] Code: ff ff 49 8d 7d 10 e8 56 bf b8 ff 4d 8b 65 10 49 8d bc
24 58 05 00 00 e8 05 be b8 ff 41 83 ac 24 58 05 00 00 01 e9 44 ff ff ff
<0f> 0b e9 52 fe ff ff 0f 0b e9 6b fe ff ff1
[ 215.534780] RSP: 0018:ffffc900040bfbf0 EFLAGS: 00010283
[ 215.535635] RAX: ffff888174001000 RBX: ffff88810b1c3b00 RCX: ffffffff819a4061
[ 215.536645] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88810b1c3ba0
[ 215.537657] RBP: ffff88810dbde800 R08: fffffbfff0fca983 R09: fffffbfff0fca983
[ 215.538674] R10: ffffc900040bfbf0 R11: fffffbfff0fca982 R12: ffff88810b1c3b38
[ 215.539687] R13: ffff88810b1c3b10 R14: ffff88810dbdecb8 R15: ffff88810b1c3b00
[ 215.540833] FS: 00007f2aabdff700(0000) GS:ffff888dfb400000(0000) knlGS:0000000000000000
[ 215.541961] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 215.542775] CR2: 00007fa19a85d934 CR3: 000000010c076006 CR4: 0000000000370ee0
[ 215.543814] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 215.544840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 215.545885] Call Trace:
[ 215.546257] <TASK>
[ 215.546608] export_rdev.isra.63+0x71/0xe0
[ 215.547338] mddev_unlock+0x1b1/0x2d0
[ 215.547898] array_state_store+0x28d/0x450
[ 215.548519] md_attr_store+0xd7/0x150
[ 215.549059] ? __pfx_sysfs_kf_write+0x10/0x10
[ 215.549702] kernfs_fop_write_iter+0x1b9/0x260
[ 215.550351] vfs_write+0x491/0x760
[ 215.550863] ? __pfx_vfs_write+0x10/0x10
[ 215.551445] ? __fget_files+0x156/0x230
[ 215.552053] ksys_write+0xc0/0x160
[ 215.552570] ? __pfx_ksys_write+0x10/0x10
[ 215.553141] ? ktime_get_coarse_real_ts64+0xec/0x100
[ 215.553878] do_syscall_64+0x3a/0x90
[ 215.554403] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 215.555125] RIP: 0033:0x7f2aade11847
[ 215.555696] Code: c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89 fb 48 83 ec
10 e8 1b fd ff ff 4c 89 e2 48 89 ee 89 df 41 89 c0 b8 01 00 00 00 0f 05
<48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 448
[ 215.558398] RSP: 002b:00007f2aabdfeba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[ 215.559516] RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007f2aade11847
[ 215.560515] RDX: 0000000000000005 RSI: 0000000000438b8b RDI: 0000000000000010
[ 215.561512] RBP: 0000000000438b8b R08: 0000000000000000 R09: 00007f2aaecf0060
[ 215.562511] R10: 000000000e3ba40b R11: 0000000000000293 R12: 0000000000000005
[ 215.563647] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000c70750
[ 215.564693] </TASK>
[ 215.565029] irq event stamp: 15979
[ 215.565584] hardirqs last enabled at (15991): [<ffffffff811a7432>] __up_console_sem+0x52/0x60
[ 215.566806] hardirqs last disabled at (16000): [<ffffffff811a7417>] __up_console_sem+0x37/0x60
[ 215.568022] softirqs last enabled at (15716): [<ffffffff8277a2db>] __do_softirq+0x3eb/0x531
[ 215.569239] softirqs last disabled at (15711): [<ffffffff810d8f45>] irq_exit_rcu+0x115/0x160
[ 215.570434] ---[ end trace 0000000000000000 ]---
This means export_rdev() calls blkdev_put with a different holder than the
one used by blkdev_get_by_dev(). This is because mddev->major_version == -2
is not a good check for external metadata. Fix this by using
mddev->external instead.
Also, do not clear mddev->external in md_clean(), as the flag might be used
later in export_rdev().
Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230617052405.305871-1-song@kernel.org
|
|
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/0b2967c4a4141875c493e835d5a6f8f2d19ae2d6.1687499804.git.baruch@tkos.co.il
|
|
NT_PRFPREG note is named "CORE". Correct the comment accordingly.
Fixes: 00e19ceec80b ("ELF: Add ELF program property parsing support")
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/455b22b986de4d3bc6d9bfd522378e442943de5f.1687499411.git.baruch@tkos.co.il
|
|
Following build error triggered while build with clang version 17.0.0
with W=1(this can't be reporduced with gcc 13.1.0):
drivers/md/raid1-10.c:117:25: error: casting from randomized structure
pointer type 'struct block_device *' to 'struct md_rdev *'
117 | struct md_rdev *rdev = (struct md_rdev *)bio->bi_bdev;
| ^
Fix this by casting 'bio->bi_bdev' to 'void *', as it used to be.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306142042.fmjfmTF8-lkp@intel.com/
Fixes: 8295efbe68c0 ("md/raid1-10: factor out a helper to submit normal write")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230616012136.3047071-1-yukuai1@huaweicloud.com
|
|
/sys/block/[device]/queue/iostats is used to control whether to count io
stat. Write 0 to it will clear queue_flags QUEUE_FLAG_IO_STAT which means
iostats is disabled. If we disable iostats and later endable it, the io
issued during this period will be counted incorrectly, inflight will be
decreased to -1.
//T1 set iostats
echo 0 > /sys/block/md0/queue/iostats
clear QUEUE_FLAG_IO_STAT
//T2 issue io
if (QUEUE_FLAG_IO_STAT) -> false
bio_start_io_acct
inflight++
echo 1 > /sys/block/md0/queue/iostats
set QUEUE_FLAG_IO_STAT
//T3 io end
if (QUEUE_FLAG_IO_STAT) -> true
bio_end_io_acct
inflight-- -> -1
Also, if iostats is enabled while issuing io but disabled while io end,
inflight will never be decreased.
Fix it by checking start_time when io end. If start_time is not 0, call
bio_end_io_acct().
Fixes: 528bc2cf2fcc ("md/raid10: enable io accounting")
Signed-off-by: Li Nan <linan122@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230609094320.2397604-1-linan666@huaweicloud.com
|
|
The RAA215300 is a 9-channel PMIC that consists of
* Internally compensated regulators
* built-in Real Time Clock (RTC)
* 32kHz crystal oscillator
* coin cell battery charger
The RTC on RAA215300 is similar to the IP found in the ISL1208.
The existing driver for the ISL1208 works for this PMIC too,
however the RAA215300 exposes two devices via I2C, one for the RTC
IP, and one for everything else. The RTC IP has to be enabled
by the other I2C device, therefore this driver is necessary to get
the RTC to work.
The external oscillator bit is inverted on PMIC version 0x11.
Add PMIC RAA215300 driver for enabling RTC block and instantiating
RTC device based on PMIC version.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://lore.kernel.org/r/Message-Id: <20230623140948.384762-3-biju.das.jz@bp.renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Document Renesas RAA215300 PMIC bindings.
The RAA215300 is a high Performance 9-Channel PMIC supporting DDR
Memory, with Built-In Charger and RTC.
It supports DDR3, DDR3L, DDR4, and LPDDR4 memory power requirements.
The internally compensated regulators, built-in Real-Time Clock (RTC),
32kHz crystal oscillator, and coin cell battery charger provide a
highly integrated, small footprint power solution ideal for
System-On-Module (SOM) applications. A spread spectrum feature
provides an ease-of-use solution for noise-sensitive audio or RF
applications.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/Message-Id: <20230623140948.384762-2-biju.das.jz@bp.renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
In order to prevent request_queue to be freed before cleaning up
blktrace debugfs entries, commit db59133e9279 ("scsi: sg: fix blktrace
debugfs entries leakage") use scsi_device_get(), however,
scsi_device_get() will also grab scsi module reference and scsi module
can't be removed.
It's reported that blktests can't unload scsi_debug after block/001:
blktests (master) # ./check block
block/001 (stress device hotplugging) [failed]
+++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19
Running block/001
Stressing sd
+modprobe: FATAL: Module scsi_debug is in use.
Fix this problem by grabbing request_queue reference directly, so that
scsi host module can still be unloaded while request_queue will be
pinged by sg device.
Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com>
Link: https://lore.kernel.org/all/1760da91-876d-fc9c-ab51-999a6f66ad50@nvidia.com/
Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230621160111.1433521-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is no reason not to use __io_cq_unlock_post_flush for intermediate
aux CQE flushing, all ->task_complete should apply there, i.e. if set it
should be the submitter task. Combine them, get rid of of
__io_cq_unlock_post() and rename the left function.
This place was also taking a couple percents of CPU according to
profiles for max throughput net benchmarks due to multishot recv
flooding it with completions.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bbed60734cbec2e833d9c7bdcf9741aada5d8aab.1687518903.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_cq_unlock_post() is exclusively used in io_uring/io_uring.c, mark it
static and don't expose to other files.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3dc8127dda4514e1dd24bb32035faac887c5fa37.1687518903.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
__io_cq_unlock is not very helpful, and users should be calling flush
variants anyway. Open code the function.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d875c4cfb69f38ccecb58a57111446c77a614caa.1687518903.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We do conditional locking, so __io_cq_lock() and friends not always
actually grab/release the lock, so kill misleading annotations.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2a098f9144c24cab622f8bf90b39f44da5d0401e.1687518903.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We're abusing ->completion_lock helpers. io_cq_unlock() neither
locking conditionally nor doing CQE flushing, which means that callers
must have some side reason of taking the lock and should do it directly.
Open code io_cq_unlock() into io_cqring_overflow_kill() and clean it up.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7dabb36856db2b562e78780480396c52c29b2bf4.1687518903.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Extract a function for non-local task_work_add, and use it directly from
io_move_task_work_from_local(). Now we don't use IOU_F_TWQ_FORCE_NORMAL
and it can be killed.
As a small positive side effect we don't grab task->io_uring in
io_req_normal_work_add anymore, which is not needed for
io_req_local_work_add().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2e55571e8ff2927ae3cc12da606d204e2485525b.1687518903.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We're trying to batch io_put_task() in io_free_batch_list(), but
considering that the hot path is a simple inc, it's most cerainly and
probably faster to just do io_put_task() instead of task tracking.
We don't care about io_put_task_remote() as it's only for IOPOLL
where polling/waiting is done by not the submitter task.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4a7ef7dce845fe2bd35507bf389d6bd2d5c1edf0.1687518903.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move io_clean_op() up in the source file and remove the forward
declaration, as the function doesn't have tricky dependencies
anymore.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1b7163b2ba7c3a8322d972c79c1b0a9301b3057e.1687518903.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_dismantle_req() is only used in __io_req_complete_post(), open code
it there.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ba8f20cb2c914eefa2e7d120a104a198552050db.1687518903.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Request completion is a very hot path in general, but there are 3 places
that can be doing it: io_free_batch_list(), io_req_complete_post() and
io_free_req_tw().
io_free_req_tw() is used rather marginally and we don't care about it.
Killing it can help to clean up and optimise the left two, do that by
replacing it with io_req_task_complete().
There are two things to consider:
1) io_free_req() is called when all refs are put, so we need to reinit
references. The easiest way to do that is to clear REQ_F_REFCOUNT.
2) We also don't need a cqe from it, so silence it with REQ_F_CQE_SKIP.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/434a2be8f33d474ad888ce1c17fe5ea7bbcb2a55.1687518903.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is only one user of io_put_req_find_next() and it doesn't make
much sense to have it. Open code the function.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/38b5c5e48e4adc8e6a0cd16fdd5c1531d7ff81a9.1687518903.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
ext4_blkdev_remove() passes a wrong holder pointer to blkdev_put() which
triggers a warning there. Fix it.
Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230622165107.13687-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Claim clkhi and clklo as integer type to avoid possible calculation
errors caused by data overflow.
Fixes: a55fa9d0e42e ("i2c: imx-lpi2c: add low power i2c bus driver")
Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Smatch Warns:
drivers/i2c/busses/i2c-qup.c:1784 qup_i2c_probe()
warn: missing unwind goto?
The goto label "fail_runtime" and "fail" will disable qup->pclk,
but here qup->pclk failed to obtain, in order to be consistent,
change the direct return to goto label "fail_dma".
Fixes: 9cedf3b2f099 ("i2c: qup: Add bam dma capabilities")
Signed-off-by: Shuai Jiang <d202180596@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Cc: <stable@vger.kernel.org> # v4.6+
|
|
"regstep" may be deprecated, but it still needs a type.
Fixes: 8ad69f490516 ("dt-bindings: i2c: convert ocores binding to yaml")
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Peter Korsgaard <peter@korsgaard.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
The RZ/V2M SoC comes with the Clocked Serial Interface (CSI)
IP, which is a master/slave SPI controller.
This commit adds a driver to support CSI master mode.
Signed-off-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Link: https://lore.kernel.org/r/Message-Id: <20230622113341.657842-4-fabrizio.castro.jz@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add dt-bindings for the CSI IP found inside the RZ/V2M SoC.
Signed-off-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/Message-Id: <20230622113341.657842-2-fabrizio.castro.jz@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Smatch reports:
drivers/clocksource/timer-cadence-ttc.c:529 ttc_timer_probe()
warn: 'timer_baseaddr' from of_iomap() not released on lines: 498,508,516.
timer_baseaddr may have the problem of not being released after use,
I replaced it with the devm_of_iomap() function and added the clk_put()
function to cleanup the "clk_ce" and "clk_cs".
Fixes: e932900a3279 ("arm: zynq: Use standard timer binding")
Fixes: 70504f311d4b ("clocksource/drivers/cadence_ttc: Convert init function to return error")
Signed-off-by: Feng Mingxi <m202271825@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Acked-by: Michal Simek <michal.simek@amd.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230425065611.702917-1-m202271825@hust.edu.cn
|