Age | Commit message (Collapse) | Author | Files | Lines |
|
A couple whitespace fixups, no functional changes.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
A pure move, no other changes.
Note that ceph_tcp_recv{msg,page}() and ceph_tcp_send{msg,page}()
helpers are also moved. msgr2 will bring its own, more efficient,
variants based on iov_iter. Switching msgr1 to them was considered
but decided against to avoid subtle regressions.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In preparation for msgr2, define internal messenger <-> protocol
interface (as opposed to external messenger <-> client interface, which
is struct ceph_connection_operations) consisting of try_read(),
try_write(), revoke(), revoke_incoming(), opened(), reset_session() and
reset_protocol() ops. The semantics are exactly the same as they are
now.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In preparation for msgr2, make all protocol independent functions
in messenger.c global.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In preparation for msgr2, make zero_page global.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In preparation for msgr2, move the defines to the header file.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In preparation for msgr2, rename msgr1 specific states and move the
defines to the header file.
Also drop state transition comments. They don't cover all possible
transitions (e.g. NEGOTIATING -> STANDBY, etc) and currently do more
harm than good.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
unsigned long is a leftover from when con->state used to be a set of
bits managed with set_bit(), clear_bit(), etc. Save a bit of memory.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
It is set in process_ack() but never used.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The current setting allows the backoff to climb up to 5 minutes. This
is too high -- it becomes hard to tell whether the client is stuck on
something or just in backoff.
In userspace, ms_max_backoff is defaulted to 15 seconds. Let's do the
same.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
When we added the btime feature in mainline ceph, we had to extend
struct ceph_mds_request_args so that it could be set. Implement the same
in the kernel client.
Rename ceph_mds_request_head with a _old extension, and a union
ceph_mds_request_args_ext to allow for the extended size of the new
header format.
Add the appropriate code to handle both formats in struct
create_request_message and key the behavior on whether the peer supports
CEPH_FEATURE_FS_BTIME.
The gid_list field in the payload is now populated from the saved
credential. For now, we don't add any support for setting the btime via
setattr, but this does enable us to add that in the future.
[ idryomov: break unnecessarily long lines ]
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
When setting the file/dir layout, it may need data pool info. So
in mds server, it needs to check the osdmap. At present, if mds
doesn't find the data pool specified, it will try to get the latest
osdmap. Now if pass the osd epoch for setxattr, the mds server can
only check this epoch of osdmap.
URL: https://tracker.ceph.com/issues/48504
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
1. monitor's default port is defined by CEPH_MON_PORT
2. CEPH_PORT_START and CEPH_PORT_LAST are not needed.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
When recovering a session (a'la recover_session=clean), we want to do
all of the operations that we do on a forced umount, but changing the
mount state to SHUTDOWN is can cause queued MDS requests to fail when
the session comes back. Most of those can idle until the session is
recovered in this situation.
Reserve SHUTDOWN state for forced umount, and make a new RECOVER state
for the forced reconnect situation. Change several tests for equality with
SHUTDOWN to test for that or RECOVER.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
kernel/elfcore.c only contains weak symbols, which triggers a bug with
clang in combination with recordmcount:
Cannot find symbol for section 2: .text.
kernel/elfcore.o: failed
Move the empty stubs into linux/elfcore.h as inline functions. As only
two architectures use these, just use the architecture specific Kconfig
symbols to key off the declaration.
Link: https://lkml.kernel.org/r/20201204165742.3815221-2-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Barret Rhoden <brho@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
genksyms does not know or care about the _Static_assert() built-in, and
sometimes falls back to ignoring the later symbols, which causes
undefined behavior such as
WARNING: modpost: EXPORT symbol "ethtool_set_ethtool_phy_ops" [vmlinux] version generation failed, symbol will not be versioned.
ld: net/ethtool/common.o: relocation R_AARCH64_ABS32 against `__crc_ethtool_set_ethtool_phy_ops' can not be used when making a shared object
net/ethtool/common.o:(_ftrace_annotated_branch+0x0): dangerous relocation: unsupported relocation
Redefine static_assert for genksyms to avoid that.
Link: https://lkml.kernel.org/r/20201203230955.1482058-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull namespaced fscaps fix from James Morris:
"Fix namespaced fscaps when !CONFIG_SECURITY (Serge Hallyn)"
* tag 'fixes-v5.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
[SECURITY] fix namespaced fscaps when !CONFIG_SECURITY
|
|
Pull NFS client fixes from Anna Schumaker:
"Here are a handful more bugfixes for 5.10.
Unfortunately, we found some problems with the new READ_PLUS operation
that aren't easy to fix. We've decided to disable this codepath
through a Kconfig option for now, but a series of patches going into
5.11 will clean up the code and fix the issues at the same time. This
seemed like the best way to go about it.
Summary:
- Fix array overflow when flexfiles mirroring is enabled
- Fix rpcrdma_inline_fixup() crash with new LISTXATTRS
- Fix 5 second delay when doing inter-server copy
- Disable READ_PLUS by default"
* tag 'nfs-for-5.10-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Disable READ_PLUS by default
NFSv4.2: Fix 5 seconds delay when doing inter server copy
NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation
pNFS/flexfiles: Fix array overflow when flexfiles mirroring is enabled
|
|
Pull networking fixes from David Miller:
1) IPsec compat fixes, from Dmitry Safonov.
2) Fix memory leak in xfrm_user_policy(). Fix from Yu Kuai.
3) Fix polling in xsk sockets by using sk_poll_wait() instead of
datagram_poll() which keys off of sk_wmem_alloc and such which xsk
sockets do not update. From Xuan Zhuo.
4) Missing init of rekey_data in cfgh80211, from Sara Sharon.
5) Fix destroy of timer before init, from Davide Caratti.
6) Missing CRYPTO_CRC32 selects in ethernet driver Kconfigs, from Arnd
Bergmann.
7) Missing error return in rtm_to_fib_config() switch case, from Zhang
Changzhong.
8) Fix some src/dest address handling in vrf and add a testcase. From
Stephen Suryaputra.
9) Fix multicast handling in Seville switches driven by mscc-ocelot
driver. From Vladimir Oltean.
10) Fix proto value passed to skb delivery demux in udp, from Xin Long.
11) HW pkt counters not reported correctly in enetc driver, from Claudiu
Manoil.
12) Fix deadlock in bridge, from Joseph Huang.
13) Missing of_node_pur() in dpaa2 driver, fromn Christophe JAILLET.
14) Fix pid fetching in bpftool when there are a lot of results, from
Andrii Nakryiko.
15) Fix long timeouts in nft_dynset, from Pablo Neira Ayuso.
16) Various stymmac fixes, from Fugang Duan.
17) Fix null deref in tipc, from Cengiz Can.
18) When mss is biog, coose more resonable rcvq_space in tcp, fromn Eric
Dumazet.
19) Revert a geneve change that likely isnt necessary, from Jakub
Kicinski.
20) Avoid premature rx buffer reuse in various Intel driversm from Björn
Töpel.
21) retain EcT bits during TIS reflection in tcp, from Wei Wang.
22) Fix Tso deferral wrt. cwnd limiting in tcp, from Neal Cardwell.
23) MPLS_OPT_LSE_LABEL attribute is 342 ot 8 bits, from Guillaume Nault
24) Fix propagation of 32-bit signed bounds in bpf verifier and add test
cases, from Alexei Starovoitov.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
selftests: fix poll error in udpgro.sh
selftests/bpf: Fix "dubious pointer arithmetic" test
selftests/bpf: Fix array access with signed variable test
selftests/bpf: Add test for signed 32-bit bound check bug
bpf: Fix propagation of 32-bit signed bounds from 64-bit bounds.
MAINTAINERS: Add entry for Marvell Prestera Ethernet Switch driver
net: sched: Fix dump of MPLS_OPT_LSE_LABEL attribute in cls_flower
net/mlx4_en: Handle TX error CQE
net/mlx4_en: Avoid scheduling restart task if it is already running
tcp: fix cwnd-limited bug for TSO deferral where we send nothing
net: flow_offload: Fix memory leak for indirect flow block
tcp: Retain ECT bits for tos reflection
ethtool: fix stack overflow in ethnl_parse_bitset()
e1000e: fix S0ix flow to allow S0i3.2 subset entry
ice: avoid premature Rx buffer reuse
ixgbe: avoid premature Rx buffer reuse
i40e: avoid premature Rx buffer reuse
igb: avoid transmit queue timeout in xdp path
igb: use xdp_do_flush
igb: skb add metasize for xdp
...
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Switch to RCU in x_tables to fix possible NULL pointer dereference,
from Subash Abhinov Kasiviswanathan.
2) Fix netlink dump of dynset timeouts later than 23 days.
3) Add comment for the indirect serialization of the nft commit mutex
with rtnl_mutex.
4) Remove bogus check for confirmed conntrack when matching on the
conntrack ID, from Brett Mastbergen.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current IP register MAC_HW_Feature1[ADDR64] only defines
32/40/64 bit width, but some SOCs support others like i.MX8MP
support 34 bits but it maps to 40 bits width in MAC_HW_Feature1[ADDR64].
So overwrite dma_cap.addr64 according to HW real design.
Fixes: 94abdad6974a ("net: ethernet: dwmac: add ethernet glue logic for NXP imx8 chip")
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When running concurrent iptables rules replacement with data, the per CPU
sequence count is checked after the assignment of the new information.
The sequence count is used to synchronize with the packet path without the
use of any explicit locking. If there are any packets in the packet path using
the table information, the sequence count is incremented to an odd value and
is incremented to an even after the packet process completion.
The new table value assignment is followed by a write memory barrier so every
CPU should see the latest value. If the packet path has started with the old
table information, the sequence counter will be odd and the iptables
replacement will wait till the sequence count is even prior to freeing the
old table info.
However, this assumes that the new table information assignment and the memory
barrier is actually executed prior to the counter check in the replacement
thread. If CPU decides to execute the assignment later as there is no user of
the table information prior to the sequence check, the packet path in another
CPU may use the old table information. The replacement thread would then free
the table information under it leading to a use after free in the packet
processing context-
Unable to handle kernel NULL pointer dereference at virtual
address 000000000000008e
pc : ip6t_do_table+0x5d0/0x89c
lr : ip6t_do_table+0x5b8/0x89c
ip6t_do_table+0x5d0/0x89c
ip6table_filter_hook+0x24/0x30
nf_hook_slow+0x84/0x120
ip6_input+0x74/0xe0
ip6_rcv_finish+0x7c/0x128
ipv6_rcv+0xac/0xe4
__netif_receive_skb+0x84/0x17c
process_backlog+0x15c/0x1b8
napi_poll+0x88/0x284
net_rx_action+0xbc/0x23c
__do_softirq+0x20c/0x48c
This could be fixed by forcing instruction order after the new table
information assignment or by switching to RCU for the synchronization.
Fixes: 80055dab5de0 ("netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore")
Reported-by: Sean Tranchetti <stranche@codeaurora.org>
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty fixes from Greg KH:
"Here are two tty core fixes for 5.10-rc7.
They resolve some reported locking issues in the tty core. While they
have not been in a released linux-next yet, they have passed all of
the 0-day bot testing as well as the submitter's testing"
* tag 'tty-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: Fix ->session locking
tty: Fix ->pgrp locking in tiocspgrp()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of updates for the interrupt subsystem:
- Make multiqueue devices which use the managed interrupt affinity
infrastructure work on PowerPC/Pseries. PowerPC does not use the
generic infrastructure for setting up PCI/MSI interrupts and the
multiqueue changes failed to update the legacy PCI/MSI
infrastructure. Make this work by passing the affinity setup
information down to the mapping and allocation functions.
- Move Jason Cooper from MAINTAINERS to CREDITS as his mail is
bouncing and he's not reachable. We hope all is well with him and
say thanks for his work over the years"
* tag 'irq-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
powerpc/pseries: Pass MSI affinity to irq_create_mapping()
genirq/irqdomain: Add an irq_create_mapping_affinity() function
MAINTAINERS: Move Jason Cooper to CREDITS
|
|
While I was doing zram testing, I found sometimes decompression failed
since the compression buffer was corrupted. With investigation, I found
below commit calls cond_resched unconditionally so it could make a
problem in atomic context if the task is reschedule.
BUG: sleeping function called from invalid context at mm/vmalloc.c:108
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 946, name: memhog
3 locks held by memhog/946:
#0: ffff9d01d4b193e8 (&mm->mmap_lock#2){++++}-{4:4}, at: __mm_populate+0x103/0x160
#1: ffffffffa3d53de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0xa98/0x1160
#2: ffff9d01d56b8110 (&zspage->lock){.+.+}-{3:3}, at: zs_map_object+0x8e/0x1f0
CPU: 0 PID: 946 Comm: memhog Not tainted 5.9.3-00011-gc5bfc0287345-dirty #316
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
Call Trace:
unmap_kernel_range_noflush+0x2eb/0x350
unmap_kernel_range+0x14/0x30
zs_unmap_object+0xd5/0xe0
zram_bvec_rw.isra.0+0x38c/0x8e0
zram_rw_page+0x90/0x101
bdev_write_page+0x92/0xe0
__swap_writepage+0x94/0x4a0
pageout+0xe3/0x3a0
shrink_page_list+0xb94/0xd60
shrink_inactive_list+0x158/0x460
We can fix this by removing the ZSMALLOC_PGTABLE_MAPPING feature (which
contains the offending calling code) from zsmalloc.
Even though this option showed some amount improvement(e.g., 30%) in
some arm32 platforms, it has been headache to maintain since it have
abused APIs[1](e.g., unmap_kernel_range in atomic context).
Since we are approaching to deprecate 32bit machines and already made
the config option available for only builtin build since v5.8, lastly it
has been not default option in zsmalloc, it's time to drop the option
for better maintenance.
[1] http://lore.kernel.org/linux-mm/20201105170249.387069-1-minchan@kernel.org
Fixes: e47110e90584 ("mm/vunmap: add cond_resched() in vunmap_pmd_range")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Harish Sriram <harish@linux.ibm.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201117202916.GA3856507@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull fix for device mapper fixes from Mike Snitzer:
"Apologies for the glaring bug I introduced with my previous pull
request!
Fix incorrect branching at top of blk_max_size_offset()"
* tag 'for-5.10/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
block: fix incorrect branching in blk_max_size_offset()
|
|
Namespaced file capabilities were introduced in 8db6c34f1dbc .
When userspace reads an xattr for a namespaced capability, a
virtualized representation of it is returned if the caller is
in a user namespace owned by the capability's owning rootid.
The function which performs this virtualization was not hooked
up if CONFIG_SECURITY=n. Therefore in that case the original
xattr was shown instead of the virtualized one.
To test this using libcap-bin (*1),
$ v=$(mktemp)
$ unshare -Ur setcap cap_sys_admin-eip $v
$ unshare -Ur setcap -v cap_sys_admin-eip $v
/tmp/tmp.lSiIFRvt8Y: OK
"setcap -v" verifies the values instead of setting them, and
will check whether the rootid value is set. Therefore, with
this bug un-fixed, and with CONFIG_SECURITY=n, setcap -v will
fail:
$ v=$(mktemp)
$ unshare -Ur setcap cap_sys_admin=eip $v
$ unshare -Ur setcap -v cap_sys_admin=eip $v
nsowner[got=1000, want=0],/tmp/tmp.HHDiOOl9fY differs in []
Fix this bug by calling cap_inode_getsecurity() in
security_inode_getsecurity() instead of returning
-EOPNOTSUPP, when CONFIG_SECURITY=n.
*1 - note, if libcap is too old for getcap to have the '-n'
option, then use verify-caps instead.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209689
Cc: Hervé Guillemet <herve@guillemet.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Serge Hallyn <shallyn@cisco.com>
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
|
|
If non-zero 'chunk_sectors' is passed in to blk_max_size_offset() that
override will be incorrectly ignored.
Old blk_max_size_offset() branching, prior to commit 3ee16db390b4,
must be used only if passed 'chunk_sectors' override is zero.
Fixes: 3ee16db390b4 ("dm: fix IO splitting")
Cc: stable@vger.kernel.org # 5.9
Reported-by: John Dorminy <jdorminy@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM's bio splitting changes that were made during v5.9. This
restores splitting in terms of varied per-target ti->max_io_len
rather than use block core's single stacked 'chunk_sectors' limit.
- Like DM crypt, update DM integrity to not use crypto drivers that
have CRYPTO_ALG_ALLOCATES_MEMORY set.
- Fix DM writecache target's argument parsing and status display.
- Remove needless BUG() from dm writecache's persistent_memory_claim()
- Remove old gcc workaround in DM cache target's block_div() for ARM
link errors now that gcc >= 4.9 is required.
- Fix RCU locking in dm_blk_report_zones and dm_dax_zero_page_range.
- Remove old, and now frowned upon, BUG_ON(in_interrupt()) in
dm_table_event().
- Remove invalid sparse annotations from dm_prepare_ioctl() and
dm_unprepare_ioctl().
* tag 'for-5.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: remove invalid sparse __acquires and __releases annotations
dm: fix double RCU unlock in dm_dax_zero_page_range() error path
dm: fix IO splitting
dm writecache: remove BUG() and fail gracefully instead
dm table: Remove BUG_ON(in_interrupt())
dm: fix bug with RCU locking in dm_blk_report_zones
Revert "dm cache: fix arm link errors with inline"
dm writecache: fix the maximum number of arguments
dm writecache: advance the number of arguments when reporting max_age
dm integrity: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
|
|
Commit 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account
for target-specific splitting") caused a couple regressions:
1) Using lcm_not_zero() when stacking chunk_sectors was a bug because
chunk_sectors must reflect the most limited of all devices in the
IO stack.
2) DM targets that set max_io_len but that do _not_ provide an
.iterate_devices method no longer had there IO split properly.
And commit 5091cdec56fa ("dm: change max_io_len() to use
blk_max_size_offset()") also caused a regression where DM no longer
supported varied (per target) IO splitting. The implication being the
potential for severely reduced performance for IO stacks that use a DM
target like dm-cache to hide performance limitations of a slower
device (e.g. one that requires 4K IO splitting).
Coming full circle: Fix all these issues by discontinuing stacking
chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(),
add optional chunk_sectors override argument to blk_max_size_offset()
and update DM's max_io_len() to pass ti->max_io_len to its
blk_max_size_offset() call.
Passing in an optional chunk_sectors override to blk_max_size_offset()
allows for code reuse of block's centralized calculation for max IO
size based on provided offset and split boundary.
Fixes: 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
Fixes: 5091cdec56fa ("dm: change max_io_len() to use blk_max_size_offset()")
Cc: stable@vger.kernel.org
Reported-by: John Dorminy <jdorminy@redhat.com>
Reported-by: Bruce Johnston <bjohnsto@redhat.com>
Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: John Dorminy <jdorminy@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently, locking of ->session is very inconsistent; most places
protect it using the legacy tty mutex, but disassociate_ctty(),
__do_SAK(), tiocspgrp() and tiocgsid() don't.
Two of the writers hold the ctrl_lock (because they already need it for
->pgrp), but __proc_set_tty() doesn't do that yet.
On a PREEMPT=y system, an unprivileged user can theoretically abuse
this broken locking to read 4 bytes of freed memory via TIOCGSID if
tiocgsid() is preempted long enough at the right point. (Other things
might also go wrong, especially if root-only ioctls are involved; I'm
not sure about that.)
Change the locking on ->session such that:
- tty_lock() is held by all writers: By making disassociate_ctty()
hold it. This should be fine because the same lock can already be
taken through the call to tty_vhangup_session().
The tricky part is that we need to shorten the area covered by
siglock to be able to take tty_lock() without ugly retry logic; as
far as I can tell, this should be fine, since nothing in the
signal_struct is touched in the `if (tty)` branch.
- ctrl_lock is held by all writers: By changing __proc_set_tty() to
hold the lock a little longer.
- All readers that aren't holding tty_lock() hold ctrl_lock: By
adding locking to tiocgsid() and __do_SAK(), and expanding the area
covered by ctrl_lock in tiocspgrp().
Cc: stable@kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.10-rc7, including fixes from bpf, netfilter,
wireless drivers, wireless mesh and can.
Current release - regressions:
- mt76: usb: fix crash on device removal
Current release - always broken:
- xsk: Fix umem cleanup from wrong context in socket destruct
Previous release - regressions:
- net: ip6_gre: set dev->hard_header_len when using header_ops
- ipv4: Fix TOS mask in inet_rtm_getroute()
- net, xsk: Avoid taking multiple skbuff references
Previous release - always broken:
- net/x25: prevent a couple of overflows
- netfilter: ipset: prevent uninit-value in hash_ip6_add
- geneve: pull IP header before ECN decapsulation
- mpls: ensure LSE is pullable in TC and openvswitch paths
- vxlan: respect needed_headroom of lower device
- batman-adv: Consider fragmentation for needed packet headroom
- can: drivers: don't count arbitration loss as an error
- netfilter: bridge: reset skb->pkt_type after POST_ROUTING traversal
- inet_ecn: Fix endianness of checksum update when setting ECT(1)
- ibmvnic: fix various corner cases around reset handling
- net/mlx5: fix rejecting unsupported Connect-X6DX SW steering
- net/mlx5: Enforce HW TX csum offload with kTLS"
* tag 'net-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
net/mlx5: DR, Proper handling of unsupported Connect-X6DX SW steering
net/mlx5e: kTLS, Enforce HW TX csum offload with kTLS
net: mlx5e: fix fs_tcp.c build when IPV6 is not enabled
net/mlx5: Fix wrong address reclaim when command interface is down
net/sched: act_mpls: ensure LSE is pullable before reading it
net: openvswitch: ensure LSE is pullable before reading it
net: skbuff: ensure LSE is pullable before decrementing the MPLS ttl
net: mvpp2: Fix error return code in mvpp2_open()
chelsio/chtls: fix a double free in chtls_setkey()
rtw88: debug: Fix uninitialized memory in debugfs code
vxlan: fix error return code in __vxlan_dev_create()
net: pasemi: fix error return code in pasemi_mac_open()
cxgb3: fix error return code in t3_sge_alloc_qset()
net/x25: prevent a couple of overflows
dpaa_eth: copy timestamp fields to new skb in A-050385 workaround
net: ip6_gre: set dev->hard_header_len when using header_ops
mt76: usb: fix crash on device removal
iwlwifi: pcie: add some missing entries for AX210
iwlwifi: pcie: invert values of NO_160 device config entries
iwlwifi: pcie: add one missing entry for AX210
...
|
|
STEs format for Connect-X5 and Connect-X6DX different. Currently, on
Connext-X6DX the SW steering would break at some point when building STEs
w/o giving a proper error message. Fix this by checking the STE format of
the current device when initializing domain: add mlx5_ifc definitions for
Connect-X6DX SW steering, read FW capability to get the current format
version, and check this version when domain is being created.
Fixes: 26d688e33f88 ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Use correct timestamp variable for ring buffer write stamp update
- Fix up before stamp and write stamp when crossing ring buffer sub
buffers
- Keep a zero delta in ring buffer in slow path if cmpxchg fails
- Fix trace_printk static buffer for archs that care
- Fix ftrace record accounting for ftrace ops with trampolines
- Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependency
- Remove WARN_ON in hwlat tracer that triggers on something that is OK
- Make "my_tramp" trampoline in ftrace direct sample code global
- Fixes in the bootconfig tool for better alignment management
* tag 'trace-v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Always check to put back before stamp when crossing pages
ftrace: Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependency
ftrace: Fix updating FTRACE_FL_TRAMP
tracing: Fix alignment of static buffer
tracing: Remove WARN_ON in start_thread()
samples/ftrace: Mark my_tramp[12]? global
ring-buffer: Set the right timestamp in the slow path of __rb_reserve_next()
ring-buffer: Update write stamp with the correct ts
docs: bootconfig: Update file format on initrd image
tools/bootconfig: Align the bootconfig applied initrd image size to 4
tools/bootconfig: Fix to check the write failure correctly
tools/bootconfig: Fix errno reference after printf()
|
|
If the flexfiles mirroring is enabled, then the read code expects to be
able to set pgio->pg_mirror_idx to point to the data server that is
being used for this particular read. However it does not change the
pg_mirror_count because we only need to send a single read.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
There is currently no way to convey the affinity of an interrupt
via irq_create_mapping(), which creates issues for devices that
expect that affinity to be managed by the kernel.
In order to sort this out, rename irq_create_mapping() to
irq_create_mapping_affinity() with an additional affinity parameter that
can be passed down to irq_domain_alloc_descs().
irq_create_mapping() is re-implemented as a wrapper around
irq_create_mapping_affinity().
No functional change.
Fixes: e75eafb9b039 ("genirq/msi: Switch to new irq spreading infrastructure")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kurz <groug@kaod.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201126082852.1178497-2-lvivier@redhat.com
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2020-11-28
1) Do not reference the skb for xsk's generic TX side since when looped
back into RX it might crash in generic XDP, from Björn Töpel.
2) Fix umem cleanup on a partially set up xsk socket when being destroyed,
from Magnus Karlsson.
3) Fix an incorrect netdev reference count when failing xsk_bind() operation,
from Marek Majtyka.
4) Fix bpftool to set an error code on failed calloc() in build_btf_type_table(),
from Zhen Lei.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Add MAINTAINERS entry for BPF LSM
bpftool: Fix error return value in build_btf_type_table
net, xsk: Avoid taking multiple skbuff references
xsk: Fix incorrect netdev reference count
xsk: Fix umem cleanup bug at socket destruct
MAINTAINERS: Update XDP and AF_XDP entries
====================
Link: https://lore.kernel.org/r/20201128005104.1205-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic fix from Arnd Bergmann:
"Add correct MAX_POSSIBLE_PHYSMEM_BITS setting to asm-generic.
This is a single bugfix for a bug that Stefan Agner found on 32-bit
Arm, but that exists on several other architectures"
* tag 'asm-generic-fixes-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"Another set of patches for devicetree files and Arm SoC specific
drivers:
- A fix for OP-TEE shared memory on non-SMP systems
- multiple code fixes for the OMAP platform, including one regression
for the CPSW network driver and a few runtime warning fixes
- Some DT patches for the Rockchip RK3399 platform, in particular
fixing the MMC device ordering that recently became
nondeterministic with async probe.
- Multiple DT fixes for the Tegra platform, including a regression
fix for suspend/resume on TX2
- A regression fix for a user-triggered fault in the NXP dpio driver
- A regression fix for a bug caused by an earlier bug fix in the
xilinx firmware driver
- Two more DTC warning fixes
- Sylvain Lemieux steps down as maintainer for the NXP LPC32xx
platform"
* tag 'arm-soc-fixes-v5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (24 commits)
arm64: tegra: Fix Tegra234 VDK node names
arm64: tegra: Wrong AON HSP reg property size
arm64: tegra: Fix USB_VBUS_EN0 regulator on Jetson TX1
arm64: tegra: Correct the UART for Jetson Xavier NX
arm64: tegra: Disable the ACONNECT for Jetson TX2
optee: add writeback to valid memory type
firmware: xilinx: Use hash-table for api feature check
firmware: xilinx: Fix SD DLL node reset issue
soc: fsl: dpio: Get the cpumask through cpumask_of(cpu)
ARM: dts: dra76x: m_can: fix order of clocks
bus: ti-sysc: suppress err msg for timers used as clockevent/source
MAINTAINERS: Remove myself as LPC32xx maintainers
arm64: dts: qcom: clear the warnings caused by empty dma-ranges
arm64: dts: broadcom: clear the warnings caused by empty dma-ranges
ARM: dts: am437x-l4: fix compatible for cpsw switch dt node
arm64: dts: rockchip: Reorder LED triggers from mmc devices on rk3399-roc-pc.
arm64: dts: rockchip: Assign a fixed index to mmc devices on rk3399 boards.
arm64: dts: rockchip: Remove system-power-controller from pmic on Odroid Go Advance
arm64: dts: rockchip: fix NanoPi R2S GMAC clock name
ARM: OMAP2+: Manage MPU state properly for omap_enter_idle_coupled()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.10-rc6, including fixes from the WiFi driver,
and CAN subtrees.
Current release - regressions:
- gro_cells: reduce number of synchronize_net() calls
- ch_ktls: release a lock before jumping to an error path
Current release - always broken:
- tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header
Previous release - regressions:
- net/tls: fix missing received data after fast remote close
- vsock/virtio: discard packets only when socket is really closed
- sock: set sk_err to ee_errno on dequeue from errq
- cxgb4: fix the panic caused by non smac rewrite
Previous release - always broken:
- tcp: fix corner cases around setting ECN with BPF selection of
congestion control
- tcp: fix race condition when creating child sockets from syncookies
on loopback interface
- usbnet: ipheth: fix connectivity with iOS 14
- tun: honor IOCB_NOWAIT flag
- net/packet: fix packet receive on L3 devices without visible hard
header
- devlink: Make sure devlink instance and port are in same net
namespace
- net: openvswitch: fix TTL decrement action netlink message format
- bonding: wait for sysfs kobject destruction before freeing struct
slave
- net: stmmac: fix upstream patch applied to the wrong context
- bnxt_en: fix return value and unwind in probe error paths
Misc:
- devlink: add extra layer of categorization to the reload stats uAPI
before it's released"
* tag 'net-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits)
sock: set sk_err to ee_errno on dequeue from errq
mptcp: fix NULL ptr dereference on bad MPJ
net: openvswitch: fix TTL decrement action netlink message format
can: af_can: can_rx_unregister(): remove WARN() statement from list operation sanity check
can: m_can: m_can_dev_setup(): add support for bosch mcan version 3.3.0
can: m_can: fix nominal bitiming tseg2 min for version >= 3.1
can: m_can: m_can_open(): remove IRQF_TRIGGER_FALLING from request_threaded_irq()'s flags
can: mcp251xfd: mcp251xfd_probe(): bail out if no IRQ was given
can: gs_usb: fix endianess problem with candleLight firmware
ch_ktls: lock is not freed
net/tls: Protect from calling tls_dev_del for TLS RX twice
devlink: Make sure devlink instance and port are in same net namespace
devlink: Hold rtnl lock while reading netdev attributes
ptp: clockmatrix: bug fix for idtcm_strverscmp
enetc: Let the hardware auto-advance the taprio base-time of 0
gro_cells: reduce number of synchronize_net() calls
net: stmmac: fix incorrect merge of patch upstream
ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_init
Documentation: netdev-FAQ: suggest how to post co-dependent series
ibmvnic: enhance resetting status check during module exit
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes
Fixes for omaps for various issues noticed during the -rc cycle:
- Earlier omap4 cpuidle fix was incomplete and needs to use a
configured idle state instead
- Fix am4 cpsw driver compatible to avoid invalid resource error
for the legacy driver
- Two kconfig fixes for genpd support that we added for for v5.10
for proper location of the option and adding missing option
- Fix ti-sysc reset status checking on enabling modules to ignore
quirky modules with reset status only usable when the quirk is
activated during reset. Also fix bogus resetdone warning for
cpsw and modules with no sysst register reset status bit
- Suppress a ti-sysc warning for timers reserved as system timers
- Fix the ordering of clocks for dra7 m_can
* tag 'omap-for-v5.10/fixes-rc5-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: dra76x: m_can: fix order of clocks
bus: ti-sysc: suppress err msg for timers used as clockevent/source
ARM: dts: am437x-l4: fix compatible for cpsw switch dt node
ARM: OMAP2+: Manage MPU state properly for omap_enter_idle_coupled()
bus: ti-sysc: Fix bogus resetdone warning on enable for cpsw
bus: ti-sysc: Fix reset status check for modules with quirks
ARM: OMAP2+: Fix missing select PM_GENERIC_DOMAINS_OF
ARM: OMAP2+: Fix location for select PM_GENERIC_DOMAINS
Link: https://lore.kernel.org/r/pull-1606460270-864284@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
0day reported one -22.7% regression for will-it-scale page_fault2
case [1] on a 4 sockets 144 CPU platform, and bisected to it to be
caused by Waiman's optimization (commit bd0b230fe1) of saving one
'struct page_counter' space for 'struct mem_cgroup'.
Initially we thought it was due to the cache alignment change introduced
by the patch, but further debug shows that it is due to some hot data
members ('vmstats_local', 'vmstats_percpu', 'vmstats') sit in 2 adjacent
cacheline (2N and 2N+1 cacheline), and when adjacent cache line prefetch
is enabled, it triggers an "extended level" of cache false sharing for
2 adjacent cache lines.
So exchange the 2 member blocks, while keeping mostly the original
cache alignment, which can restore and even enhance the performance,
and save 64 bytes of space for 'struct mem_cgroup' (from 2880 to 2816,
with 0day's default RHEL-8.3 kernel config)
[1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/
Fixes: bd0b230fe145 ("mm/memcg: unify swap and memsw page counters")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 642e450b6b59 ("xsk: Do not discard packet when NETDEV_TX_BUSY")
addressed the problem that packets were discarded from the Tx AF_XDP
ring, when the driver returned NETDEV_TX_BUSY. Part of the fix was
bumping the skbuff reference count, so that the buffer would not be
freed by dev_direct_xmit(). A reference count larger than one means
that the skbuff is "shared", which is not the case.
If the "shared" skbuff is sent to the generic XDP receive path,
netif_receive_generic_xdp(), and pskb_expand_head() is entered the
BUG_ON(skb_shared(skb)) will trigger.
This patch adds a variant to dev_direct_xmit(), __dev_direct_xmit(),
where a user can select the skbuff free policy. This allows AF_XDP to
avoid bumping the reference count, but still keep the NETDEV_TX_BUSY
behavior.
Fixes: 642e450b6b59 ("xsk: Do not discard packet when NETDEV_TX_BUSY")
Reported-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20201123175600.146255-1-bjorn.topel@gmail.com
|
|
Currently array of fix length PM_API_MAX is used to cache
the pm_api version (valid or invalid). However ATF based
PM APIs values are much higher then PM_API_MAX.
So to include ATF based PM APIs also, use hash-table to
store the pm_api version status.
Signed-off-by: Amit Sunil Dhamne <amit.sunil.dhamne@xilinx.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ravi Patel <ravi.patel@xilinx.com>
Signed-off-by: Rajan Vaja <rajan.vaja@xilinx.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Michal Simek <michal.simek@xilinx.com>
Fixes: f3217d6f2f7a ("firmware: xilinx: fix out-of-bounds access")
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/1606197161-25976-1-git-send-email-rajan.vaja@xilinx.com
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
|
|
In the patchset merged by commit b9fcf0a0d826
("Merge branch 'support-AF_PACKET-for-layer-3-devices'") L3 devices which
did not have header_ops were given one for the purpose of protocol parsing
on af_packet transmit path.
That change made af_packet receive path regard these devices as having a
visible L3 header and therefore aligned incoming skb->data to point to the
skb's mac_header. Some devices, such as ipip, xfrmi, and others, do not
reset their mac_header prior to ingress and therefore their incoming
packets became malformed.
Ideally these devices would reset their mac headers, or af_packet would be
able to rely on dev->hard_header_len being 0 for such cases, but it seems
this is not the case.
Fix by changing af_packet RX ll visibility criteria to include the
existence of a '.create()' header operation, which is used when creating
a device hard header - via dev_hard_header() - by upper layers, and does
not exist in these L3 devices.
As this predicate may be useful in other situations, add it as a common
dev_has_header() helper in netdevice.h.
Fixes: b9fcf0a0d826 ("Merge branch 'support-AF_PACKET-for-layer-3-devices'")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20201121062817.3178900-1-eyal.birger@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"A couple of scheduler fixes:
- Make the conditional update of the overutilized state work
correctly by caching the relevant flags state before overwriting
them and checking them afterwards.
- Fix a data race in the wakeup path which caused loadavg on ARM64
platforms to become a random number generator.
- Fix the ordering of the iowaiter accounting operations so it can't
be decremented before it is incremented.
- Fix a bug in the deadline scheduler vs. priority inheritance when a
non-deadline task A has inherited the parameters of a deadline task
B and then blocks on a non-deadline task C.
The second inheritance step used the static deadline parameters of
task A, which are usually 0, instead of further propagating task
B's parameters. The zero initialized parameters trigger a bug in
the deadline scheduler"
* tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Fix priority inheritance with multiple scheduling classes
sched: Fix rq->nr_iowait ordering
sched: Fix data-race in wakeup
sched/fair: Fix overutilized update in enqueue_task_fair()
|
|
Merge misc fixes from Andrew Morton:
"8 patches.
Subsystems affected by this patch series: mm (madvise, pagemap,
readahead, memcg, userfaultfd), kbuild, and vfs"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: fix madvise WILLNEED performance problem
libfs: fix error cast of negative value in simple_attr_write()
mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
mm: memcg/slab: fix root memcg vmstats
mm: fix readahead_page_batch for retry entries
mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
compiler-clang: remove version check for BPF Tracing
mm/madvise: fix memory leak from process_madvise
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"A final set of miscellaneous bug fixes for ext4"
* tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix bogus warning in ext4_update_dx_flag()
jbd2: fix kernel-doc markups
ext4: drop fast_commit from /proc/mounts
|
|
Both btrfs and fuse have reported faults caused by seeing a retry entry
instead of the page they were looking for. This was caused by a missing
check in the iterator.
As can be seen in the below panic log, the accessing 0x402 causes a
panic. In the xarray.h, 0x402 means RETRY_ENTRY.
BUG: kernel NULL pointer dereference, address: 0000000000000402
CPU: 14 PID: 306003 Comm: as Not tainted 5.9.0-1-amd64 #1 Debian 5.9.1-1
Hardware name: Lenovo ThinkSystem SR665/7D2VCTO1WW, BIOS D8E106Q-1.01 05/30/2020
RIP: 0010:fuse_readahead+0x152/0x470 [fuse]
Code: 41 8b 57 18 4c 8d 54 10 ff 4c 89 d6 48 8d 7c 24 10 e8 d2 e3 28 f9 48 85 c0 0f 84 fe 00 00 00 44 89 f2 49 89 04 d4 44 8d 72 01 <48> 8b 10 41 8b 4f 1c 48 c1 ea 10 83 e2 01 80 fa 01 19 d2 81 e2 01
RSP: 0018:ffffad99ceaebc50 EFLAGS: 00010246
RAX: 0000000000000402 RBX: 0000000000000001 RCX: 0000000000000002
RDX: 0000000000000000 RSI: ffff94c5af90bd98 RDI: ffffad99ceaebc60
RBP: ffff94ddc1749a00 R08: 0000000000000402 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000100 R12: ffff94de6c429ce0
R13: ffff94de6c4d3700 R14: 0000000000000001 R15: ffffad99ceaebd68
FS: 00007f228c5c7040(0000) GS:ffff94de8ed80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000402 CR3: 0000001dbd9b4000 CR4: 0000000000350ee0
Call Trace:
read_pages+0x83/0x270
page_cache_readahead_unbounded+0x197/0x230
generic_file_buffered_read+0x57a/0xa20
new_sync_read+0x112/0x1a0
vfs_read+0xf8/0x180
ksys_read+0x5f/0xe0
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 042124cc64c3 ("mm: add new readahead_control API")
Reported-by: David Sterba <dsterba@suse.com>
Reported-by: Wonhyuk Yang <vvghjk1234@gmail.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201103142852.8543-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20201103124349.16722-1-vvghjk1234@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The core-mm has a default __weak implementation of phys_to_target_node()
to mirror the weak definition of memory_add_physaddr_to_nid(). That
symbol is exported for modules. However, while the export in
mm/memory_hotplug.c exported the symbol in the configuration cases of:
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_HOTPLUG=y
...and:
CONFIG_NUMA_KEEP_MEMINFO=n
CONFIG_MEMORY_HOTPLUG=y
...it failed to export the symbol in the case of:
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_HOTPLUG=n
Not only is that broken, but Christoph points out that the kernel should
not be exporting any __weak symbol, which means that
memory_add_physaddr_to_nid() example that phys_to_target_node() copied
is broken too.
Rework the definition of phys_to_target_node() and
memory_add_physaddr_to_nid() to not require weak symbols. Move to the
common arch override design-pattern of an asm header defining a symbol
to replace the default implementation.
The only common header that all memory_add_physaddr_to_nid() producing
architectures implement is asm/sparsemem.h. In fact, powerpc already
defines its memory_add_physaddr_to_nid() helper in sparsemem.h.
Double-down on that observation and define phys_to_target_node() where
necessary in asm/sparsemem.h. An alternate consideration that was
discarded was to put this override in asm/numa.h, but that entangles
with the definition of MAX_NUMNODES relative to the inclusion of
linux/nodemask.h, and requires powerpc to grow a new header.
The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid
now that the symbol is properly exported / stubbed in all combinations
of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG.
[dan.j.williams@intel.com: v4]
Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com
[dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning]
Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|