[ Upstream commit 0d979509539ed1df883a30d442177ca7be609565 ]
The huge page functionality in TTM does not work safely because PUD and
PMD entries do not have a special bit.
get_user_pages_fast() considers any page that passed pmd_huge() as
usable:
if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd) ||
pmd_devmap(pmd))) {
And vmf_insert_pfn_pmd_prot() unconditionally sets
entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
e.g. on x86 the page will be _PAGE_PRESENT | _PAGE_PSE.
As such gup_huge_pmd() will try to deref a struct page:
head = try_grab_compound_head(pmd_page(orig), refs, flags);
and thus crash.
Thomas further notices that the drivers are not expecting the struct page
to be used by anything - in particular the refcount incr above will cause
them to malfunction.
Thus none of this can work correctly in the presence of GUP_fast.
Delete it entirely. It can return someday along with a proper
PMD/PUD_SPECIAL bit in the page table itself to gate GUP_fast.
Fixes: 314b6580adc5 ("drm/ttm, drm/vmwgfx: Support huge TTM pagefaults")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Thomas Hellström <thomas.helllstrom@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
[danvet: Update subject per Thomas' & Christian's review]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/0-v2-a44694790652+4ac-ttm_pmd_jgg@nvidia.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 92f62485b3715882cd397b0cbd80a96d179b86d6 ]
Normally it is expected that the dsa_device_ops :: rcv() method finishes
parsing the DSA tag and consumes it, then never looks at it again.
But commit c0bcf537667c ("net: dsa: ocelot: add hardware timestamping
support for Felix") added support for RX timestamping in a very
unconventional way. On this switch, a partial timestamp is available in
the DSA header, but the driver got away with not parsing that timestamp
right away, but instead delayed that parsing for a little longer:
dsa_switch_rcv():
	nskb = cpu_dp->rcv(skb, dev); <------------- not here
	-> ocelot_rcv()
	...
	skb = nskb;
	skb_push(skb, ETH_HLEN);
	skb->pkt_type = PACKET_HOST;
	skb->protocol = eth_type_trans(skb, skb->dev);
	...
	if (dsa_skb_defer_rx_timestamp(p, skb)) <--- but here
	-> felix_rxtstamp()
		return 0;
When in felix_rxtstamp(), this driver accounted for the fact that
eth_type_trans() happened in the meanwhile, so it got a hold of the
extraction header again by subtracting (ETH_HLEN + OCELOT_TAG_LEN) bytes
from the current skb->data.
This worked for quite some time but was quite fragile from the very
beginning. Not to mention that having DSA tag parsing split in two
different files, under different folders (net/dsa/tag_ocelot.c vs
drivers/net/dsa/ocelot/felix.c) made it quite non-obvious for patches to
come that they might break this.
Finally, the blamed commit does the following: at the end of
ocelot_rcv(), it checks whether the skb payload contains a VLAN header.
If it does, and this port is under a VLAN-aware bridge, that VLAN ID
might not be correct in the sense that the packet might have suffered
VLAN rewriting due to TCAM rules (VCAP IS1). So we consume the VLAN ID
from the skb payload using __skb_vlan_pop(), and take the classified
VLAN ID from the DSA tag, and construct a hwaccel VLAN tag with the
classified VLAN, and the skb payload is VLAN-untagged.
The big problem is that __skb_vlan_pop() does:
memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN);
__skb_pull(skb, VLAN_HLEN);
aka it moves the Ethernet header 4 bytes to the right, and pulls 4 bytes
from the skb headroom (effectively also moving skb->data, by definition).
So for felix_rxtstamp()'s fragile logic, all bets are off now.
Instead of having the "extraction" pointer point to the DSA header,
it actually points to 4 bytes _inside_ the extraction header.
Corollary, the last 4 bytes of the "extraction" header are in fact 4
stale bytes of the destination MAC address from the Ethernet header,
from prior to the __skb_vlan_pop() movement.
So of course, RX timestamps are completely bogus when the system is
configured in this way.
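The pointer arithmetic described above can be replayed in userspace. The
following is a minimal sketch, not driver code: struct toy_skb, the TOY_*
constants and the helper names are made up for illustration, and the
16-byte tag length is an assumption about the size of the extraction
header.

```c
#include <stddef.h>
#include <string.h>

enum {
	TOY_ETH_ALEN = 6,
	TOY_ETH_HLEN = 14,
	TOY_VLAN_HLEN = 4,
	TOY_TAG_LEN = 16	/* assumed extraction header size */
};

/* Toy stand-in for an skb: only the data pointer matters here. */
struct toy_skb {
	unsigned char *data;
};

/* Mirrors the two statements quoted from __skb_vlan_pop(). */
static void toy_vlan_pop(struct toy_skb *skb)
{
	memmove(skb->data + TOY_VLAN_HLEN, skb->data, 2 * TOY_ETH_ALEN);
	skb->data += TOY_VLAN_HLEN;
}

/* The fragile guess: walk back from skb->data to where the tag should be. */
static unsigned char *toy_guess_tag(const struct toy_skb *skb)
{
	return skb->data - TOY_ETH_HLEN - TOY_TAG_LEN;
}

/* Returns how many bytes the guess is off from the real start of the tag. */
static ptrdiff_t toy_tag_guess_error(int vlan_tagged)
{
	unsigned char frame[TOY_TAG_LEN + TOY_ETH_HLEN + TOY_VLAN_HLEN + 32] = {0};
	struct toy_skb skb = { .data = frame + TOY_TAG_LEN };	/* at the Ethernet header */

	if (vlan_tagged)
		toy_vlan_pop(&skb);
	skb.data += TOY_ETH_HLEN;	/* where eth_type_trans() leaves skb->data */

	return toy_guess_tag(&skb) - frame;
}
```

Without the VLAN pop the guess lands on the tag; with it, the guess is
VLAN_HLEN bytes into the tag, which matches the description above.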
The fix is actually very simple: just don't structure the code like that.
For better or worse, the DSA PTP timestamping API does not offer a
straightforward way for drivers to present their RX timestamps, but
other drivers (sja1105) have established a simple mechanism to carry
their RX timestamp from dsa_device_ops :: rcv() all the way to
dsa_switch_ops :: port_rxtstamp() and even later. That mechanism is to
simply save the partial timestamp to the skb->cb, and complete it later.
Question: why don't we simply populate the skb's struct
skb_shared_hwtstamps from ocelot_rcv(), and bother with this
complication of propagating the timestamp to felix_rxtstamp()?
Answer: dsa_switch_ops :: port_rxtstamp() answers the question whether
PTP packets need sleepable context to retrieve the full RX timestamp.
Currently felix_rxtstamp() answers "no, thanks" to that question, and
calls ocelot_ptp_gettime64() from softirq atomic context. This is
understandable, since Felix VSC9959 is a PCIe memory-mapped switch, so
hardware access does not require sleeping. But the felix driver is
preparing for the introduction of other switches where hardware access
is over a slow bus like SPI or MDIO:
https://lore.kernel.org/lkml/20210814025003.2449143-1-colin.foster@in-advantage.com/
So I would like to keep this code structure, so that the rework needed
when that driver needs PTP support will be minimal (answer "yes, I need
deferred context for this skb's RX timestamp", then the partial
timestamp will still be found in the skb->cb).
Fixes: ea440cd2d9b2 ("net: dsa: tag_ocelot: use VLAN information from tagging header when available")
Reported-by: Po Liu <po.liu@nxp.com>
Cc: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1aabe578dd86e9f2867c4db4fba9a15f4ba1825d ]
ETHTOOL_A_PAUSE_STAT_MAX is the MAX attribute id,
so we need to subtract non-stats and add one to
get a count (IOW -2+1 == -1).
Otherwise we'll see:
ethnl cmd 21: calculated reply length 40, but consumed 52
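The counting argument can be checked with a small stand-alone enum. This is
a sketch only: the TOY_* names are hypothetical and merely imitate the usual
layout of an attribute enum with two non-statistic entries (UNSPEC and PAD)
ahead of the real counters.

```c
/* Hypothetical attribute ids laid out like a typical netlink stat enum. */
enum toy_pause_stat {
	TOY_PAUSE_STAT_UNSPEC,		/* not a statistic */
	TOY_PAUSE_STAT_PAD,		/* not a statistic */
	TOY_PAUSE_STAT_TX_FRAMES,
	TOY_PAUSE_STAT_RX_FRAMES,
	TOY_PAUSE_STAT_CNT_,
	TOY_PAUSE_STAT_MAX = TOY_PAUSE_STAT_CNT_ - 1
};

/* MAX is the highest id, so the number of ids is MAX + 1; two are not stats. */
static int toy_pause_stat_count(void)
{
	return TOY_PAUSE_STAT_MAX - 2 + 1;
}

/* The undercounting variant: forgets that MAX is an id, not a count. */
static int toy_pause_stat_count_buggy(void)
{
	return TOY_PAUSE_STAT_MAX - 2;
}
```

With two counters in the enum, the buggy variant reserves room for only one,
which is the kind of shortfall the reply-length warning reports.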
Fixes: 9a27a33027f2 ("ethtool: add standard pause stats")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 133a48abf6ecc535d7eddc6da1c3e4c972445882 ]
If O_DIRECT bumps the commit_info rpcs_out field, then that could lead
to fsync() hangs. The fix is to ensure that O_DIRECT calls
nfs_commit_end().
Fixes: 723c921e7dfc ("sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 537d3af1bee8ad1415fda9b622d1ea6d1ae76dfa ]
According to the description of rpmsg_create_ept() in rpmsg_core.c,
the function should return NULL on error.
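Why the return convention matters can be shown with a userspace sketch. It
assumes, for illustration only, a stub that returns an error code encoded in
a pointer; toy_err_ptr() imitates the kernel's ERR_PTR() and all toy_* names
are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct toy_ept {
	int unused;
};

/* Userspace imitation of ERR_PTR(): an error code stored in a pointer. */
static struct toy_ept *toy_err_ptr(intptr_t err)
{
	return (struct toy_ept *)err;
}

/* Assumed shape of a stub that does not follow the documented contract. */
static struct toy_ept *toy_stub_err_ptr(void)
{
	return toy_err_ptr(-ENXIO);
}

/* Stub that follows the documented contract. */
static struct toy_ept *toy_stub_null(void)
{
	return NULL;
}

/* A caller written against the documentation: NULL means failure. */
static int toy_caller_sees_valid_ept(const struct toy_ept *ept)
{
	return ept != NULL;
}
```

A caller that only tests for NULL would treat the error-encoded pointer as a
usable endpoint, so the stub has to agree with the documented behaviour.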
Fixes: 2c8a57088045 ("rpmsg: Provide function stubs for API")
Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@foss.st.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20210712123912.10672-1-arnaud.pouliquen@foss.st.com
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1198ff12cbdd5f42c032cba1d96ebc7af8024cf9 ]
When removing the index argument from snd_soc_topology_component_remove()
commit a5b8f71c5477f (ASoC: topology: Remove multistep topology loading)
forgot to update the stub for !SND_SOC_TOPOLOGY use, causing build failures
for anything that tries to make use of it.
Fixes: a5b8f71c5477f (ASoC: topology: Remove multistep topology loading)
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211025154844.2342120-1-broonie@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ac0fffa0859b8e1e991939663b3ebdd80bf979e6 ]
ib_dma_map_sgtable_attrs() should be mapping the sgls and setting nents
but the ib_uses_virt_dma() path falls back to ib_dma_virt_map_sg() which
will not set the nents in the sgtable.
Check the return value (per the map_sg calling convention) and set
sgt->nents appropriately on success.
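A minimal sketch of that pattern, using toy types rather than the real RDMA
structures. The helper names are hypothetical, and returning -EIO on a failed
mapping is an assumption made for the example.

```c
#include <errno.h>

struct toy_sg_table {
	unsigned int orig_nents;	/* entries handed in */
	unsigned int nents;		/* entries actually mapped */
};

/* Stand-in for a map_sg style helper: number of entries mapped, 0 on failure. */
static int toy_virt_map_sg(unsigned int orig_nents, int fail)
{
	return fail ? 0 : (int)orig_nents;
}

/* sgtable style wrapper: 0 on success with sgt->nents set, negative errno otherwise. */
static int toy_map_sgtable(struct toy_sg_table *sgt, int fail)
{
	int nents = toy_virt_map_sg(sgt->orig_nents, fail);

	if (nents <= 0)
		return -EIO;	/* assumed error code */

	sgt->nents = (unsigned int)nents;
	return 0;
}
```

The point is only the translation between the two conventions: a count from
the map_sg side becomes "0 plus sgt->nents" on the sgtable side.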
Fixes: 79fbd3e1241c ("RDMA: Use the sg_table directly and remove the opencoded version from umem")
Link: https://lore.kernel.org/r/20211013165942.89806-1-logang@deltatee.com
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4a08e3271c55f8b5d56906a8aa5bd041911cf897 ]
Pass cpu to parse_perf_domain() instead of pcpu.
Fixes: 8486a32dd484 ("cpufreq: Add of_perf_domain_get_sharing_cpumask")
Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com>
[ Viresh: Massaged changelog ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7303524e04af49a47991e19f895c3b8cdc3796c7 ]
If sockmap enables strparser, the offset information is lost in
sk_psock_skb_ingress(). If the length determined by the parse_msg function
is not skb->len, the skb will be converted to sk_msg multiple times, and the
userspace app will get the data multiple times.
Fix this by getting the offset and length from strp_msg. And as Cong
suggested, add one bit in skb->_sk_redir to distinguish whether strparser is
enabled or disabled.
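The duplication can be modelled with plain buffers. This is a sketch under
stated assumptions: struct toy_strp_msg only imitates the offset and length
that a stream parser records per message, and the deliver helpers are
hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Per-message position inside the buffer, as a stream parser would record it. */
struct toy_strp_msg {
	size_t offset;
	size_t full_len;
};

/* Buggy variant: ignores the parser's result and delivers the whole buffer. */
static size_t toy_deliver_whole(char *out, const char *skb, size_t skb_len)
{
	memcpy(out, skb, skb_len);
	return skb_len;
}

/* Fixed variant: delivers exactly the bytes that belong to this message. */
static size_t toy_deliver_msg(char *out, const char *skb,
			      const struct toy_strp_msg *msg)
{
	memcpy(out, skb + msg->offset, msg->full_len);
	return msg->full_len;
}

/* One 8-byte buffer carrying two 4-byte messages; returns bytes handed to the app. */
static size_t toy_total_delivered(int use_offsets)
{
	static const char skb[] = "AAAABBBB";
	const size_t skb_len = sizeof(skb) - 1;
	const struct toy_strp_msg msgs[] = { { 0, 4 }, { 4, 4 } };
	char out[32];
	size_t total = 0;

	for (size_t i = 0; i < 2; i++) {
		if (use_offsets)
			total += toy_deliver_msg(out + total, skb, &msgs[i]);
		else
			total += toy_deliver_whole(out + total, skb, skb_len);
	}
	return total;
}
```

With two messages in one buffer, the buggy path hands over 16 bytes for 8
bytes of input, while the offset-aware path hands over each byte once.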
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20211029141216.211899-1-liujian56@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cc4665ca646c96181a7c00198aa72c59e0c576e8 ]
sctp_transport_pl_hlen() is called to calculate the outer header length
for PL. However, as the figure in rfc8899#section-4.4 shows:
    Any additional
      headers       .--- MPS -----.
           |        |             |
           v        v             v
    +------------------------------+
    | IP | ** | PL | protocol data |
    +------------------------------+
               <----- PLPMTU ----->
    <---------- PMTU -------------->
the outer header is IP + any additional headers, which doesn't include
the Packetization Layer's own header, namely sctphdr, whereas sctphdr
is counted by __sctp_mtu_payload().
The incorrect calculation caused the link pathmtu to be set larger
than expected by t->pl.pmtu + sctp_transport_pl_hlen(). This patch
fixes it by subtracting the sctphdr length in sctp_transport_pl_hlen().
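The arithmetic can be worked through with concrete numbers. The sketch below
assumes an IPv4 header without options (20 bytes) and no additional headers;
the toy_* helpers are hypothetical and only mirror the sums described above.

```c
enum {
	TOY_IP_HLEN = 20,	/* IPv4 header without options (assumption) */
	TOY_SCTP_HLEN = 12	/* SCTP common header */
};

/* Overhead as computed by a helper that also counts the SCTP header. */
static unsigned int toy_overhead_with_sctphdr(void)
{
	return TOY_IP_HLEN + TOY_SCTP_HLEN;
}

/* Buggy outer header length: still contains the PL's own header. */
static unsigned int toy_pl_hlen_buggy(void)
{
	return toy_overhead_with_sctphdr();
}

/* Fixed outer header length: IP plus additional headers only. */
static unsigned int toy_pl_hlen_fixed(void)
{
	return toy_overhead_with_sctphdr() - TOY_SCTP_HLEN;
}

/* PMTU = PLPMTU + outer header length, as in the figure. */
static unsigned int toy_link_pathmtu(unsigned int plpmtu, unsigned int hlen)
{
	return plpmtu + hlen;
}
```

For a PLPMTU of 1480 the fixed helper gives a link pathmtu of 1500, while the
buggy one overshoots by the 12 bytes of the SCTP header.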
Fixes: d9e2e410ae30 ("sctp: add the constants/variables and states and some APIs for transport")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c6ea04ea692fa0d8e7faeb133fcd28e3acf470a0 ]
sctp_transport_pl_update() is called when the transport updates its dst
and pathmtu. Instead of stopping the PLPMTUD probe timer, PLPMTUD should
start over and reset the probe timer. Otherwise, the PLPMTUD service
would stop.
Fixes: 92548ec2f1f9 ("sctp: add the probe timer in transport for PLPMTUD")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f941eadd8d6d4ee2f8c9aeab8e1da5e647533a7d ]
__bpf_prog_run() can run from non-IRQ contexts, meaning
it could be re-entered if interrupted.
This calls for the irq-safe variant of u64_stats_update_{begin|end},
or we risk a deadlock.
This patch is a no-op on 64-bit arches, fortunately.
syzbot report:
WARNING: inconsistent lock state
5.12.0-rc3-syzkaller #0 Not tainted
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
udevd/4013 [HC0[0]:SC0[0]:HE1:SE1] takes:
ff7c9dec (&(&pstats->syncp)->seq){+.?.}-{0:0}, at: sk_filter include/linux/filter.h:867 [inline]
ff7c9dec (&(&pstats->syncp)->seq){+.?.}-{0:0}, at: do_one_broadcast net/netlink/af_netlink.c:1468 [inline]
ff7c9dec (&(&pstats->syncp)->seq){+.?.}-{0:0}, at: netlink_broadcast_filtered+0x27c/0x4fc net/netlink/af_netlink.c:1520
{IN-SOFTIRQ-W} state was registered at:
lock_acquire.part.0+0xf0/0x41c kernel/locking/lockdep.c:5510
lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5483
do_write_seqcount_begin_nested include/linux/seqlock.h:520 [inline]
do_write_seqcount_begin include/linux/seqlock.h:545 [inline]
u64_stats_update_begin include/linux/u64_stats_sync.h:129 [inline]
bpf_prog_run_pin_on_cpu include/linux/filter.h:624 [inline]
bpf_prog_run_clear_cb+0x1bc/0x270 include/linux/filter.h:755
run_filter+0xa0/0x17c net/packet/af_packet.c:2031
packet_rcv+0xc0/0x3e0 net/packet/af_packet.c:2104
dev_queue_xmit_nit+0x2bc/0x39c net/core/dev.c:2387
xmit_one net/core/dev.c:3588 [inline]
dev_hard_start_xmit+0x94/0x518 net/core/dev.c:3609
sch_direct_xmit+0x11c/0x1f0 net/sched/sch_generic.c:313
qdisc_restart net/sched/sch_generic.c:376 [inline]
__qdisc_run+0x194/0x7f8 net/sched/sch_generic.c:384
qdisc_run include/net/pkt_sched.h:136 [inline]
qdisc_run include/net/pkt_sched.h:128 [inline]
__dev_xmit_skb net/core/dev.c:3795 [inline]
__dev_queue_xmit+0x65c/0xf84 net/core/dev.c:4150
dev_queue_xmit+0x14/0x18 net/core/dev.c:4215
neigh_resolve_output net/core/neighbour.c:1491 [inline]
neigh_resolve_output+0x170/0x228 net/core/neighbour.c:1471
neigh_output include/net/neighbour.h:510 [inline]
ip6_finish_output2+0x2e4/0x9fc net/ipv6/ip6_output.c:117
__ip6_finish_output net/ipv6/ip6_output.c:182 [inline]
__ip6_finish_output+0x164/0x3f8 net/ipv6/ip6_output.c:161
ip6_finish_output+0x2c/0xb0 net/ipv6/ip6_output.c:192
NF_HOOK_COND include/linux/netfilter.h:290 [inline]
ip6_output+0x74/0x294 net/ipv6/ip6_output.c:215
dst_output include/net/dst.h:448 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
NF_HOOK include/linux/netfilter.h:295 [inline]
mld_sendpack+0x2a8/0x7e4 net/ipv6/mcast.c:1679
mld_send_cr net/ipv6/mcast.c:1975 [inline]
mld_ifc_timer_expire+0x1e8/0x494 net/ipv6/mcast.c:2474
call_timer_fn+0xd0/0x570 kernel/time/timer.c:1431
expire_timers kernel/time/timer.c:1476 [inline]
__run_timers kernel/time/timer.c:1745 [inline]
run_timer_softirq+0x2e4/0x384 kernel/time/timer.c:1758
__do_softirq+0x204/0x7ac kernel/softirq.c:345
do_softirq_own_stack include/asm-generic/softirq_stack.h:10 [inline]
invoke_softirq kernel/softirq.c:228 [inline]
__irq_exit_rcu+0x1d8/0x200 kernel/softirq.c:422
irq_exit+0x10/0x3c kernel/softirq.c:446
__handle_domain_irq+0xb4/0x120 kernel/irq/irqdesc.c:692
handle_domain_irq include/linux/irqdesc.h:176 [inline]
gic_handle_irq+0x84/0xac drivers/irqchip/irq-gic.c:370
__irq_svc+0x5c/0x94 arch/arm/kernel/entry-armv.S:205
debug_smp_processor_id+0x0/0x24 lib/smp_processor_id.c:53
rcu_read_lock_held_common kernel/rcu/update.c:108 [inline]
rcu_read_lock_sched_held+0x24/0x7c kernel/rcu/update.c:123
trace_lock_acquire+0x24c/0x278 include/trace/events/lock.h:13
lock_acquire+0x3c/0x74 kernel/locking/lockdep.c:5481
rcu_lock_acquire include/linux/rcupdate.h:267 [inline]
rcu_read_lock include/linux/rcupdate.h:656 [inline]
avc_has_perm_noaudit+0x6c/0x260 security/selinux/avc.c:1150
selinux_inode_permission+0x140/0x220 security/selinux/hooks.c:3141
security_inode_permission+0x44/0x60 security/security.c:1268
inode_permission.part.0+0x5c/0x13c fs/namei.c:521
inode_permission fs/namei.c:494 [inline]
may_lookup fs/namei.c:1652 [inline]
link_path_walk.part.0+0xd4/0x38c fs/namei.c:2208
link_path_walk fs/namei.c:2189 [inline]
path_lookupat+0x3c/0x1b8 fs/namei.c:2419
filename_lookup+0xa8/0x1a4 fs/namei.c:2453
user_path_at_empty+0x74/0x90 fs/namei.c:2733
do_readlinkat+0x5c/0x12c fs/stat.c:417
__do_sys_readlink fs/stat.c:450 [inline]
sys_readlink+0x24/0x28 fs/stat.c:447
ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64
0x7eaa4974
irq event stamp: 298277
hardirqs last enabled at (298277): [<802000d0>] no_work_pending+0x4/0x34
hardirqs last disabled at (298276): [<8020c9b8>] do_work_pending+0x9c/0x648 arch/arm/kernel/signal.c:676
softirqs last enabled at (298216): [<8020167c>] __do_softirq+0x584/0x7ac kernel/softirq.c:372
softirqs last disabled at (298201): [<8024dff4>] do_softirq_own_stack include/asm-generic/softirq_stack.h:10 [inline]
softirqs last disabled at (298201): [<8024dff4>] invoke_softirq kernel/softirq.c:228 [inline]
softirqs last disabled at (298201): [<8024dff4>] __irq_exit_rcu+0x1d8/0x200 kernel/softirq.c:422
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&pstats->syncp)->seq);
<Interrupt>
lock(&(&pstats->syncp)->seq);
*** DEADLOCK ***
1 lock held by udevd/4013:
#0: 82b09c5c (rcu_read_lock){....}-{1:2}, at: sk_filter_trim_cap+0x54/0x434 net/core/filter.c:139
stack backtrace:
CPU: 1 PID: 4013 Comm: udevd Not tainted 5.12.0-rc3-syzkaller #0
Hardware name: ARM-Versatile Express
Backtrace:
[<81802550>] (dump_backtrace) from [<818027c4>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
r7:00000080 r6:600d0093 r5:00000000 r4:82b58344
[<818027ac>] (show_stack) from [<81809e98>] (__dump_stack lib/dump_stack.c:79 [inline])
[<818027ac>] (show_stack) from [<81809e98>] (dump_stack+0xb8/0xe8 lib/dump_stack.c:120)
[<81809de0>] (dump_stack) from [<81804a00>] (print_usage_bug.part.0+0x228/0x230 kernel/locking/lockdep.c:3806)
r7:86bcb768 r6:81a0326c r5:830f96a8 r4:86bcb0c0
[<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (print_usage_bug kernel/locking/lockdep.c:3776 [inline])
[<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (valid_state kernel/locking/lockdep.c:3818 [inline])
[<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (mark_lock_irq kernel/locking/lockdep.c:4021 [inline])
[<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (mark_lock.part.0+0xc34/0x136c kernel/locking/lockdep.c:4478)
r10:83278fe8 r9:82c6d748 r8:00000000 r7:82c6d2d4 r6:00000004 r5:86bcb768
r4:00000006
[<802ba584>] (mark_lock.part.0) from [<802bc644>] (mark_lock kernel/locking/lockdep.c:4442 [inline])
[<802ba584>] (mark_lock.part.0) from [<802bc644>] (mark_usage kernel/locking/lockdep.c:4391 [inline])
[<802ba584>] (mark_lock.part.0) from [<802bc644>] (__lock_acquire+0x9bc/0x3318 kernel/locking/lockdep.c:4854)
r10:86bcb768 r9:86bcb0c0 r8:00000001 r7:00040000 r6:0000075a r5:830f96a8
r4:00000000
[<802bbc88>] (__lock_acquire) from [<802bfb90>] (lock_acquire.part.0+0xf0/0x41c kernel/locking/lockdep.c:5510)
r10:00000000 r9:600d0013 r8:00000000 r7:00000000 r6:828a2680 r5:828a2680
r4:861e5bc8
[<802bfaa0>] (lock_acquire.part.0) from [<802bff28>] (lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5483)
r10:8146137c r9:00000000 r8:00000001 r7:00000000 r6:00000000 r5:00000000
r4:ff7c9dec
[<802bfebc>] (lock_acquire) from [<81381eb4>] (do_write_seqcount_begin_nested include/linux/seqlock.h:520 [inline])
[<802bfebc>] (lock_acquire) from [<81381eb4>] (do_write_seqcount_begin include/linux/seqlock.h:545 [inline])
[<802bfebc>] (lock_acquire) from [<81381eb4>] (u64_stats_update_begin include/linux/u64_stats_sync.h:129 [inline])
[<802bfebc>] (lock_acquire) from [<81381eb4>] (__bpf_prog_run_save_cb include/linux/filter.h:727 [inline])
[<802bfebc>] (lock_acquire) from [<81381eb4>] (bpf_prog_run_save_cb include/linux/filter.h:741 [inline])
[<802bfebc>] (lock_acquire) from [<81381eb4>] (sk_filter_trim_cap+0x26c/0x434 net/core/filter.c:149)
r10:a4095dd0 r9:ff7c9dd0 r8:e44be000 r7:8146137c r6:00000001 r5:8611ba80
r4:00000000
[<81381c48>] (sk_filter_trim_cap) from [<8146137c>] (sk_filter include/linux/filter.h:867 [inline])
[<81381c48>] (sk_filter_trim_cap) from [<8146137c>] (do_one_broadcast net/netlink/af_netlink.c:1468 [inline])
[<81381c48>] (sk_filter_trim_cap) from [<8146137c>] (netlink_broadcast_filtered+0x27c/0x4fc net/netlink/af_netlink.c:1520)
r10:00000001 r9:833d6b1c r8:00000000 r7:8572f864 r6:8611ba80 r5:8698d800
r4:8572f800
[<81461100>] (netlink_broadcast_filtered) from [<81463e60>] (netlink_broadcast net/netlink/af_netlink.c:1544 [inline])
[<81461100>] (netlink_broadcast_filtered) from [<81463e60>] (netlink_sendmsg+0x3d0/0x478 net/netlink/af_netlink.c:1925)
r10:00000000 r9:00000002 r8:8698d800 r7:000000b7 r6:8611b900 r5:861e5f50
r4:86aa3000
[<81463a90>] (netlink_sendmsg) from [<81321f54>] (sock_sendmsg_nosec net/socket.c:654 [inline])
[<81463a90>] (netlink_sendmsg) from [<81321f54>] (sock_sendmsg+0x3c/0x4c net/socket.c:674)
r10:00000000 r9:861e5dd4 r8:00000000 r7:86570000 r6:00000000 r5:86570000
r4:861e5f50
[<81321f18>] (sock_sendmsg) from [<813234d0>] (____sys_sendmsg+0x230/0x29c net/socket.c:2350)
r5:00000040 r4:861e5f50
[<813232a0>] (____sys_sendmsg) from [<8132549c>] (___sys_sendmsg+0xac/0xe4 net/socket.c:2404)
r10:00000128 r9:861e4000 r8:00000000 r7:00000000 r6:86570000 r5:861e5f50
r4:00000000
[<813253f0>] (___sys_sendmsg) from [<81325684>] (__sys_sendmsg net/socket.c:2433 [inline])
[<813253f0>] (___sys_sendmsg) from [<81325684>] (__do_sys_sendmsg net/socket.c:2442 [inline])
[<813253f0>] (___sys_sendmsg) from [<81325684>] (sys_sendmsg+0x58/0xa0 net/socket.c:2440)
r8:80200224 r7:00000128 r6:00000000 r5:7eaa541c r4:86570000
[<8132562c>] (sys_sendmsg) from [<80200060>] (ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64)
Exception stack(0x861e5fa8 to 0x861e5ff0)
5fa0: 00000000 00000000 0000000c 7eaa541c 00000000 00000000
5fc0: 00000000 00000000 76fbf840 00000128 00000000 0000008f 7eaa541c 000563f8
5fe0: 00056110 7eaa53e0 00036cec 76c9bf44
r6:76fbf840 r5:00000000 r4:00000000
Fixes: 492ecee892c2 ("bpf: enable program stats")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211026214133.3114279-2-eric.dumazet@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 79ca6f74dae067681a779fd573c2eb59649989bc ]
The Atmel TPM 1.2 chips crash with error
`tpm_try_transmit: send(): error -62` since kernel 4.14.
It is observed from the kernel log after running `tpm_sealdata -z`.
The error thrown from the command is as follows
```
$ tpm_sealdata -z
Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
code=0087 (135), I/O error
```
The issue was reproduced with the following Atmel TPM chip:
```
$ tpm_version
T0 TPM 1.2 Version Info:
Chip Version: 1.2.66.1
Spec Level: 2
Errata Revision: 3
TPM Vendor ID: ATML
TPM Version: 01010000
Manufacturer Info: 41544d4c
```
The root cause of the issue is that the TPM calls to msleep()
were replaced with usleep_range() [1], which reduces
the actual timeout. Experiments show that
the original msleep(5) actually sleeps for 15ms.
Because of a known timeout issue in the Atmel TPM 1.2 chip,
a timeout shorter than 15ms can cause the error described above.
A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
reduced the timeout to less than 1ms. With experiments,
the problematic timeout in the latest kernel is the one
for `wait_for_tpm_stat`.
To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
to 15ms for all Atmel TPM 1.2 chips, but leaves it untouched
for Atmel TPM 2.0 chips and chips from other vendors.
As explained above, the chosen 15ms timeout is
the actual timeout before this issue was introduced,
thus the old value is used here.
In particular, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us and
TPM_ATML_TIMEOUT_WAIT_STAT_MAX is set to 15000us, according to
the existing TPM_TIMEOUT_RANGE_US (300us).
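How the two bounds relate can be sketched as follows. The TOY_* constants and
toy_wait_stat_range() are hypothetical; only the 15000us upper bound and the
300us range come from the text above.

```c
enum {
	TOY_TIMEOUT_RANGE_US = 300,
	TOY_ATML_WAIT_STAT_MAX_US = 15000,
	/* Lower bound is derived from the upper bound and the existing range. */
	TOY_ATML_WAIT_STAT_MIN_US = TOY_ATML_WAIT_STAT_MAX_US - TOY_TIMEOUT_RANGE_US
};

/* A sleep window in the style of usleep_range(min, max). */
struct toy_sleep_range {
	unsigned int min_us;
	unsigned int max_us;
};

/* Atmel TPM 1.2 chips get the long window; everything else keeps its default. */
static struct toy_sleep_range toy_wait_stat_range(int is_atmel_tpm12,
						   struct toy_sleep_range dflt)
{
	struct toy_sleep_range atml = {
		TOY_ATML_WAIT_STAT_MIN_US,
		TOY_ATML_WAIT_STAT_MAX_US
	};

	return is_atmel_tpm12 ? atml : dflt;
}
```

Keeping the override behind a single predicate leaves the shorter polling
interval in place for every chip that tolerates it.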
The fix has been tested on a system with the affected Atmel chip,
with no issues observed after boot up.
References:
[1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
1.2/2.0 generic drivers
[2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
[3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
[4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
granularity
Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
Signed-off-by: Hao Wu <hao.wu@rubrik.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 19757cebf0c5016a1f36f7fe9810a9f0b33c0832 ]
Use of percpu_counter structure to track count of orphaned
sockets is causing problems on modern hosts with 256 cpus
or more.
Stefan Bach reported a serious spinlock contention in real workloads,
that I was able to reproduce with a netfilter rule dropping
incoming FIN packets.
    53.56%  server  [kernel.kallsyms]      [k] queued_spin_lock_slowpath
            |
            ---queued_spin_lock_slowpath
               |
                --53.51%--_raw_spin_lock_irqsave
                          |
                           --53.51%--__percpu_counter_sum
                                     tcp_check_oom
                                     |
                                     |--39.03%--__tcp_close
                                     |          tcp_close
                                     |          inet_release
                                     |          inet6_release
                                     |          sock_close
                                     |          __fput
                                     |          ____fput
                                     |          task_work_run
                                     |          exit_to_usermode_loop
                                     |          do_syscall_64
                                     |          entry_SYSCALL_64_after_hwframe
                                     |          __GI___libc_close
                                     |
                                      --14.48%--tcp_out_of_resources
                                                tcp_write_timeout
                                                tcp_retransmit_timer
                                                tcp_write_timer_handler
                                                tcp_write_timer
                                                call_timer_fn
                                                expire_timers
                                                __run_timers
                                                run_timer_softirq
                                                __softirqentry_text_start
As explained in commit cf86a086a180 ("net/dst: use a smaller percpu_counter
batch for dst entries accounting"), default batch size is too big
for the default value of tcp_max_orphans (262144).
But even if we reduce batch sizes, there would still be cases
where the estimated count of orphans is beyond the limit,
and where tcp_too_many_orphans() has to call the expensive
percpu_counter_sum_positive().
One solution is to use plain per-cpu counters, and have
a timer to periodically refresh this cache.
Updating this cache every 100ms seems about right, tcp pressure
state is not radically changing over shorter periods.
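The idea of plain per-cpu counters plus a periodically refreshed cache can be
sketched in userspace. Everything here is a toy: the fixed CPU count, the
struct and the function names are assumptions, and the "timer" is simply a
function the caller invokes.

```c
enum { TOY_NR_CPUS = 4 };

struct toy_orphans {
	int per_cpu[TOY_NR_CPUS];	/* updated locally, no shared lock */
	int cached_total;		/* refreshed periodically, read cheaply */
};

static void toy_orphan_inc(struct toy_orphans *o, int cpu)
{
	o->per_cpu[cpu]++;
}

static void toy_orphan_dec(struct toy_orphans *o, int cpu)
{
	o->per_cpu[cpu]--;
}

/* What the periodic timer would do: fold all per-cpu values into the cache. */
static void toy_orphan_refresh(struct toy_orphans *o)
{
	int sum = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		sum += o->per_cpu[cpu];
	o->cached_total = sum > 0 ? sum : 0;
}

/* Hot path check: one read of the cache, never a walk over all CPUs. */
static int toy_too_many_orphans(const struct toy_orphans *o, int limit)
{
	return o->cached_total > limit;
}
```

The trade-off is staleness: between refreshes the check works on the last
folded value, which is acceptable when the state changes slowly.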
percpu_counter was nice 15 years ago while hosts had less
than 16 cpus, not anymore by current standards.
v2: Fix the build issue for CONFIG_CRYPTO_DEV_CHELSIO_TLS=m,
reported by kernel test robot <lkp@intel.com>
Remove unused socket argument from tcp_too_many_orphans()
Fixes: dd24c00191d5 ("net: Use a percpu_counter for orphan_count")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Stefan Bach <sfb@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4ef0c5c6b5ba1f38f0ea1cedad0cad722f00c14a ]
There is a small race between copy_process() and sched_fork()
where child->sched_task_group points to an already freed pointer.
    parent doing fork()            | someone moving the parent
                                   | to another cgroup
    -------------------------------+-------------------------------
    copy_process()
      + dup_task_struct()<1>
                                     parent move to another cgroup,
                                     and free the old cgroup. <2>
      + sched_fork()
        + __set_task_cpu()<3>
        + task_fork_fair()
          + sched_slice()<4>
In the worst case, this bug can lead to a use-after-free and
cause a panic, as shown below:
(1) the parent copies its sched_task_group to the child at <1>;
(2) someone moves the parent to another cgroup and frees the old
cgroup at <2>;
(3) the sched_task_group and cfs_rq that belong to the old cgroup
are accessed at <3> and <4>, which causes a panic:
[] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[] PGD 8000001fa0a86067 P4D 8000001fa0a86067 PUD 2029955067 PMD 0
[] Oops: 0000 [#1] SMP PTI
[] CPU: 7 PID: 648398 Comm: ebizzy Kdump: loaded Tainted: G OE --------- - - 4.18.0.x86_64+ #1
[] RIP: 0010:sched_slice+0x84/0xc0
[] Call Trace:
[] task_fork_fair+0x81/0x120
[] sched_fork+0x132/0x240
[] copy_process.part.5+0x675/0x20e0
[] ? __handle_mm_fault+0x63f/0x690
[] _do_fork+0xcd/0x3b0
[] do_syscall_64+0x5d/0x1d0
[] entry_SYSCALL_64_after_hwframe+0x65/0xca
[] RIP: 0033:0x7f04418cd7e1
Between cgroup_can_fork() and cgroup_post_fork(), the cgroup
membership and thus sched_task_group can't change. So update the child's
sched_task_group at sched_post_fork() and move task_fork() and
__set_task_cpu() (which access the sched_task_group) from sched_fork()
to sched_post_fork().
Fixes: 8323f26ce342 ("sched: Fix race in task_group")
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lkml.kernel.org/r/20210915064030.2231-1-zhangqiao22@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 072af0c638dc8a5c7db2edc4dddbd6d44bee3bdb ]
The implementation for intra-object overflow in str*-family functions
accidentally dropped compile-time write overflow checking in strcpy(),
leaving it entirely to run-time. Add back the intended check.
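What a compile-time write overflow check buys can be illustrated with a much
simpler mechanism than the kernel's fortified strcpy(). The sketch below swaps
in C11 _Static_assert on array sizes; it is not the kernel's technique, and
TOY_STRCPY_LITERAL is a hypothetical macro that only works for a destination
array and a string literal.

```c
#include <stddef.h>
#include <string.h>

/*
 * Refuses to compile when the literal (including its NUL terminator)
 * does not fit in the destination array.
 */
#define TOY_STRCPY_LITERAL(dst, lit)					\
	do {								\
		_Static_assert(sizeof(lit) <= sizeof(dst),		\
			       "strcpy would overflow dst");		\
		strcpy((dst), (lit));					\
	} while (0)

/* Fits: 6 bytes including the NUL go into an 8-byte buffer. */
static size_t toy_copy_name(void)
{
	char buf[8];

	TOY_STRCPY_LITERAL(buf, "felix");
	return strlen(buf);
}
```

Replacing the literal with one longer than seven characters turns the overflow
into a build failure instead of something left for run-time detection.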
Fixes: 6a39e62abbaf ("lib: string.h: detect intra-object overflow in fortified string functions")
Cc: Daniel Axtens <dja@axtens.net>
Cc: Francis Laniel <laniel_francis@privacyrequired.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9cc2fa4f4a92ccc6760d764e7341be46ee8aaaa1 ]
The function end_of_stack() returns a pointer to the last entry of a
stack. For architectures like parisc, where the stack grows upwards,
return the pointer to the highest address in the stack.
Without this change I faced a crash on parisc, because the stackleak
functionality wrote STACKLEAK_POISON to the lowest address and thus
overwrote the first 4 bytes of the task_struct which included the
TIF_FLAGS.
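The direction-dependent choice can be sketched with a plain array standing in
for the stack area. The toy_* names, the 8-word stack size and the run-time
grows_up flag are assumptions for illustration; the kernel makes this choice
at build time.

```c
enum { TOY_THREAD_WORDS = 8 };

/* Last usable entry: lowest word if the stack grows down, highest if it grows up. */
static unsigned long *toy_end_of_stack(unsigned long *stack, int grows_up)
{
	return grows_up ? stack + TOY_THREAD_WORDS - 1 : stack;
}

/* Writes a poison value at the end of the stack and reports which word was hit. */
static int toy_poison_index(int grows_up)
{
	unsigned long stack[TOY_THREAD_WORDS] = {0};
	unsigned long *end = toy_end_of_stack(stack, grows_up);

	*end = 0xdeadUL;	/* arbitrary marker value for the sketch */
	return (int)(end - stack);
}
```

On an upward-growing stack the lowest word is where the stack starts, so
poisoning it clobbers whatever lives there rather than the far end.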
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9dfc685e0262d4c5e44e13302f89841fa75173ca ]
syzbot reported data-races in inet_getname() multiple times,
it is time we fix this instead of pretending applications
should not trigger them.
getsockname() and getpeername() are not really considered fast path.
v2: added the missing BPF_CGROUP_RUN_SA_PROG() declaration
needed when CONFIG_CGROUP_BPF=n, as reported by
kernel test robot <lkp@intel.com>
syzbot typical report:
BUG: KCSAN: data-race in __inet_hash_connect / inet_getname
write to 0xffff888136d66cf8 of 2 bytes by task 14374 on cpu 1:
__inet_hash_connect+0x7ec/0x950 net/ipv4/inet_hashtables.c:831
inet_hash_connect+0x85/0x90 net/ipv4/inet_hashtables.c:853
tcp_v4_connect+0x782/0xbb0 net/ipv4/tcp_ipv4.c:275
__inet_stream_connect+0x156/0x6e0 net/ipv4/af_inet.c:664
inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:728
__sys_connect_file net/socket.c:1896 [inline]
__sys_connect+0x254/0x290 net/socket.c:1913
__do_sys_connect net/socket.c:1923 [inline]
__se_sys_connect net/socket.c:1920 [inline]
__x64_sys_connect+0x3d/0x50 net/socket.c:1920
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xa0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff888136d66cf8 of 2 bytes by task 14408 on cpu 0:
inet_getname+0x11f/0x170 net/ipv4/af_inet.c:790
__sys_getsockname+0x11d/0x1b0 net/socket.c:1946
__do_sys_getsockname net/socket.c:1961 [inline]
__se_sys_getsockname net/socket.c:1958 [inline]
__x64_sys_getsockname+0x3e/0x50 net/socket.c:1958
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xa0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000 -> 0xdee0
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 14408 Comm: syz-executor.3 Not tainted 5.15.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20211026213014.3026708-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d18785e213866935b4c3dc0c33c3e18801ce0ce8 ]
neigh_output() reads n->nud_state and hh->hh_len locklessly.
This is fine, but we need to add annotations and document this.
We evaluate skip_cache first to avoid reading these fields
if the cache has to be bypassed.
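The ordering described above can be sketched in userspace C. This is only an illustration under stated assumptions: the struct, the helper names and the SKETCH_NUD_CONNECTED value are placeholders rather than the kernel's definitions, and the volatile loads merely stand in for READ_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mask of "connected" neighbour states (placeholder value). */
#define SKETCH_NUD_CONNECTED 0xC2u

struct sketch_neigh {
	uint8_t  nud_state; /* may be rewritten by another CPU at any time */
	uint32_t hh_len;    /* likewise */
};

/* Userspace stand-ins for READ_ONCE(): one load through a volatile lvalue. */
static uint8_t sketch_read_once_u8(const uint8_t *p)
{
	return *(const volatile uint8_t *)p;
}

static uint32_t sketch_read_once_u32(const uint32_t *p)
{
	return *(const volatile uint32_t *)p;
}

/* Returns true when the cached hardware header may be used. */
bool sketch_can_use_cache(const struct sketch_neigh *n, bool skip_cache)
{
	assert(n != NULL);
	if (skip_cache)
		return false; /* evaluated first, so neither field is read */
	if (!(sketch_read_once_u8(&n->nud_state) & SKETCH_NUD_CONNECTED))
		return false;
	return sketch_read_once_u32(&n->hh_len) != 0;
}
```

Checking skip_cache before anything else means the racy fields are not touched at all on the bypass path, which is the point the paragraph above makes.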
syzbot report:
BUG: KCSAN: data-race in __neigh_event_send / ip_finish_output2
write to 0xffff88810798a885 of 1 bytes by interrupt on cpu 1:
__neigh_event_send+0x40d/0xac0 net/core/neighbour.c:1128
neigh_event_send include/net/neighbour.h:444 [inline]
neigh_resolve_output+0x104/0x410 net/core/neighbour.c:1476
neigh_output include/net/neighbour.h:510 [inline]
ip_finish_output2+0x80a/0xaa0 net/ipv4/ip_output.c:221
ip_finish_output+0x3b5/0x510 net/ipv4/ip_output.c:309
NF_HOOK_COND include/linux/netfilter.h:296 [inline]
ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:423
dst_output include/net/dst.h:450 [inline]
ip_local_out+0x164/0x220 net/ipv4/ip_output.c:126
__ip_queue_xmit+0x9d3/0xa20 net/ipv4/ip_output.c:525
ip_queue_xmit+0x34/0x40 net/ipv4/ip_output.c:539
__tcp_transmit_skb+0x142a/0x1a00 net/ipv4/tcp_output.c:1405
tcp_transmit_skb net/ipv4/tcp_output.c:1423 [inline]
tcp_xmit_probe_skb net/ipv4/tcp_output.c:4011 [inline]
tcp_write_wakeup+0x4a9/0x810 net/ipv4/tcp_output.c:4064
tcp_send_probe0+0x2c/0x2b0 net/ipv4/tcp_output.c:4079
tcp_probe_timer net/ipv4/tcp_timer.c:398 [inline]
tcp_write_timer_handler+0x394/0x520 net/ipv4/tcp_timer.c:626
tcp_write_timer+0xb9/0x180 net/ipv4/tcp_timer.c:642
call_timer_fn+0x2e/0x1d0 kernel/time/timer.c:1421
expire_timers+0x135/0x240 kernel/time/timer.c:1466
__run_timers+0x368/0x430 kernel/time/timer.c:1734
run_timer_softirq+0x19/0x30 kernel/time/timer.c:1747
__do_softirq+0x12c/0x26e kernel/softirq.c:558
invoke_softirq kernel/softirq.c:432 [inline]
__irq_exit_rcu kernel/softirq.c:636 [inline]
irq_exit_rcu+0x4e/0xa0 kernel/softirq.c:648
sysvec_apic_timer_interrupt+0x69/0x80 arch/x86/kernel/apic/apic.c:1097
asm_sysvec_apic_timer_interrupt+0x12/0x20
native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline]
arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline]
acpi_safe_halt drivers/acpi/processor_idle.c:109 [inline]
acpi_idle_do_entry drivers/acpi/processor_idle.c:553 [inline]
acpi_idle_enter+0x258/0x2e0 drivers/acpi/processor_idle.c:688
cpuidle_enter_state+0x2b4/0x760 drivers/cpuidle/cpuidle.c:237
cpuidle_enter+0x3c/0x60 drivers/cpuidle/cpuidle.c:351
call_cpuidle kernel/sched/idle.c:158 [inline]
cpuidle_idle_call kernel/sched/idle.c:239 [inline]
do_idle+0x1a3/0x250 kernel/sched/idle.c:306
cpu_startup_entry+0x15/0x20 kernel/sched/idle.c:403
secondary_startup_64_no_verify+0xb1/0xbb
read to 0xffff88810798a885 of 1 bytes by interrupt on cpu 0:
neigh_output include/net/neighbour.h:507 [inline]
ip_finish_output2+0x79a/0xaa0 net/ipv4/ip_output.c:221
ip_finish_output+0x3b5/0x510 net/ipv4/ip_output.c:309
NF_HOOK_COND include/linux/netfilter.h:296 [inline]
ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:423
dst_output include/net/dst.h:450 [inline]
ip_local_out+0x164/0x220 net/ipv4/ip_output.c:126
__ip_queue_xmit+0x9d3/0xa20 net/ipv4/ip_output.c:525
ip_queue_xmit+0x34/0x40 net/ipv4/ip_output.c:539
__tcp_transmit_skb+0x142a/0x1a00 net/ipv4/tcp_output.c:1405
tcp_transmit_skb net/ipv4/tcp_output.c:1423 [inline]
tcp_xmit_probe_skb net/ipv4/tcp_output.c:4011 [inline]
tcp_write_wakeup+0x4a9/0x810 net/ipv4/tcp_output.c:4064
tcp_send_probe0+0x2c/0x2b0 net/ipv4/tcp_output.c:4079
tcp_probe_timer net/ipv4/tcp_timer.c:398 [inline]
tcp_write_timer_handler+0x394/0x520 net/ipv4/tcp_timer.c:626
tcp_write_timer+0xb9/0x180 net/ipv4/tcp_timer.c:642
call_timer_fn+0x2e/0x1d0 kernel/time/timer.c:1421
expire_timers+0x135/0x240 kernel/time/timer.c:1466
__run_timers+0x368/0x430 kernel/time/timer.c:1734
run_timer_softirq+0x19/0x30 kernel/time/timer.c:1747
__do_softirq+0x12c/0x26e kernel/softirq.c:558
invoke_softirq kernel/softirq.c:432 [inline]
__irq_exit_rcu kernel/softirq.c:636 [inline]
irq_exit_rcu+0x4e/0xa0 kernel/softirq.c:648
sysvec_apic_timer_interrupt+0x69/0x80 arch/x86/kernel/apic/apic.c:1097
asm_sysvec_apic_timer_interrupt+0x12/0x20
native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline]
arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline]
acpi_safe_halt drivers/acpi/processor_idle.c:109 [inline]
acpi_idle_do_entry drivers/acpi/processor_idle.c:553 [inline]
acpi_idle_enter+0x258/0x2e0 drivers/acpi/processor_idle.c:688
cpuidle_enter_state+0x2b4/0x760 drivers/cpuidle/cpuidle.c:237
cpuidle_enter+0x3c/0x60 drivers/cpuidle/cpuidle.c:351
call_cpuidle kernel/sched/idle.c:158 [inline]
cpuidle_idle_call kernel/sched/idle.c:239 [inline]
do_idle+0x1a3/0x250 kernel/sched/idle.c:306
cpu_startup_entry+0x15/0x20 kernel/sched/idle.c:403
rest_init+0xee/0x100 init/main.c:734
arch_call_rest_init+0xa/0xb
start_kernel+0x5e4/0x669 init/main.c:1142
secondary_startup_64_no_verify+0xb1/0xbb
value changed: 0x20 -> 0x01
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ba0ffdd8ce48ad7f7e85191cd29f9674caca3745 ]
Particularly for NVMe with efficient deferred submission for many
requests, there are nice benefits to be seen by bumping the default max
plug count from 16 to 32. This is especially true for virtualized setups,
where the submit part is more expensive, but it can be noticed even on
native hardware.
Reduce the multiple queue factor from 4 to 2, since we're changing the
default size.
While changing it, move the defines into the block layer private header.
These aren't values that anyone outside of the block layer uses, or
should use.
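A rough sketch of how the two numbers interact, assuming the multiple queue factor simply multiplies the plug count (the names below are hypothetical and not the block layer's own defines). Under that assumption the multi-queue limit is unchanged: 16 * 4 and 32 * 2 both give 64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PLUG_COUNT     32u /* default raised from 16 */
#define SKETCH_MULTI_QUEUE_FACTOR  2u /* factor reduced from 4 */

struct sketch_plug {
	unsigned int rq_count; /* requests currently held in the plug */
	bool multiple_queues;  /* requests target more than one queue */
};

/* Number of plugged requests at which the plug would be flushed. */
unsigned int sketch_plug_limit(const struct sketch_plug *plug)
{
	assert(plug != NULL);
	return plug->multiple_queues
		? SKETCH_MAX_PLUG_COUNT * SKETCH_MULTI_QUEUE_FACTOR
		: SKETCH_MAX_PLUG_COUNT;
}

bool sketch_plug_should_flush(const struct sketch_plug *plug)
{
	return plug->rq_count >= sketch_plug_limit(plug);
}
```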
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a130e8fbc7de796eb6e680724d87f4737a26d0ac ]
/proc/uptime reports idle time by reading the CPUTIME_IDLE field from
the per-cpu kcpustats. However, on NO_HZ systems, idle time is not
continually updated on idle cpus, leading this value to appear
incorrectly small.
/proc/stat performs an accounting update when reading idle time; we
can use the same approach for uptime.
With this patch, /proc/stat and /proc/uptime now agree on idle time.
Additionally, the following shows idle time tick up consistently on an
idle machine:
(while true; do cat /proc/uptime; sleep 1; done) | awk '{print $2-prev; prev=$2}'
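The same per-sample delta can be computed in C. The helpers below are a hypothetical sketch that assumes each /proc/uptime line carries two decimal fields, uptime seconds followed by idle seconds; they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Parse "<uptime seconds> <idle seconds>"; returns 0 on success, -1 otherwise. */
int sketch_parse_uptime(const char *line, double *uptime, double *idle)
{
	assert(line != NULL && uptime != NULL && idle != NULL);
	return sscanf(line, "%lf %lf", uptime, idle) == 2 ? 0 : -1;
}

/* Idle seconds accumulated between two consecutive samples. */
double sketch_idle_delta(double prev_idle, double cur_idle)
{
	return cur_idle - prev_idle;
}
```

On an idle machine sampled once per second, the delta would be expected to stay close to the number of idle CPUs rather than stall.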
Reported-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lkml.kernel.org/r/20210827165438.3280779-1-joshdon@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a4b83deb3e76fb9385ca58e2c072a145b3a320d6 ]
With the new DMA API we need an extension of the videobuf2 API.
Previously, videobuf2 core would set the non-coherent DMA bit
in the vb2_queue dma_attr field (if user-space would pass a
corresponding memory hint); the vb2 core then would pass the
vb2_queue dma_attrs to the vb2 allocators. The vb2 allocator
would use the queue's dma_attr and the DMA API would allocate
either coherent or non-coherent memory.
But we cannot do this anymore, since there is no corresponding DMA
attr flag and, hence, there is no way for the allocator to become
aware of what type of allocation user-space has requested. So we
need to pass more context from videobuf2 core to the allocators.
Fix this by changing the call_ptr_memop() macro to pass the
vb2 pointer to the corresponding op callbacks.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1e080f17750d1083e8a32f7b350584ae1cd7ff20 ]
mq / mqprio make the default child qdiscs visible. They only do
so for the qdiscs which are within real_num_tx_queues when the
device is registered. Depending on order of calls in the driver,
or if user space changes config via ethtool -L the number of
qdiscs visible under tc qdisc show will differ from the number
of queues. This is confusing to users and potentially to system
configuration scripts which try to make sure qdiscs have the
right parameters.
Add a new Qdisc_ops callback and make relevant qdiscs TTRT.
Note that this uncovers the "shortcut" created by
commit 1f27cde313d7 ("net: sched: use pfifo_fast for non real queues")
The default child qdiscs beyond initial real_num_tx are always
pfifo_fast, no matter what the sysfs setting is. Fixing this
gets a little tricky because we'd need to keep a reference
on whatever the default qdisc was at the time of creation.
In practice this is likely a non-issue; the qdiscs likely have
to be configured to non-default settings, so whatever user space
is doing such configuration can replace the pfifos... now that
it will see them.
Reported-by: Matthew Massey <matthewmassey@fb.com>
Reviewed-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 027b57170bf8bb6999a28e4a5f3d78bf1db0f90c upstream.
Since commit edc6afc54968 ("tty: switch to ktermios and new framework")
termios speed is no longer stored only in c_cflag member but also in new
additional c_ispeed and c_ospeed members. If BOTHER flag is set in c_cflag
then termios speed is stored only in these new members.
Therefore to correctly restore termios speed it is required to store also
ispeed and ospeed members, not only cflag member.
If only the cflag member with the BOTHER flag is restored, then the functions
tty_termios_baud_rate() and tty_termios_input_baud_rate() return the baudrate
stored in the c_ospeed / c_ispeed member, which is zero because it was not
restored. If the reported baudrate is invalid (e.g. zero), then serial core
functions report the fallback baudrate value 9600. In this case the original
baudrate is lost and the kernel changes it to 9600.
Simple reproducer of this issue is to boot kernel with following command
line argument: "console=ttyXXX,86400" (where ttyXXX is the device name).
For speed 86400 there is no Bnnn constant and therefore kernel has to
represent this speed via BOTHER c_cflag. Which means that speed is stored
only in c_ospeed and c_ispeed members, not in c_cflag anymore.
If bootloader correctly configures serial device to speed 86400 then kernel
prints boot log to early console at speed 86400 without any issue.
But after kernel starts initializing real console device ttyXXX then speed
is changed to fallback value 9600 because information about speed was lost.
This patch fixes above issue by storing and restoring also ispeed and
ospeed members, which are required for BOTHER flag.
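A simplified model of the save/restore problem, covering only the BOTHER case. The struct names and the SKETCH_BOTHER value are placeholders, and the baud helper only imitates the zero-to-9600 fallback described above; none of this is the tty layer's real API.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BOTHER        0x1000u /* placeholder for the BOTHER bit */
#define SKETCH_FALLBACK_BAUD 9600u   /* fallback named in the text above */

struct sketch_termios {
	unsigned int c_cflag;
	unsigned int c_ispeed;
	unsigned int c_ospeed;
};

/* Saved console settings: cflag alone is not enough once BOTHER is set. */
struct sketch_saved {
	unsigned int cflag;
	unsigned int ispeed;
	unsigned int ospeed;
};

void sketch_save(const struct sketch_termios *t, struct sketch_saved *s)
{
	assert(t != NULL && s != NULL);
	s->cflag = t->c_cflag;
	s->ispeed = t->c_ispeed;
	s->ospeed = t->c_ospeed;
}

void sketch_restore(struct sketch_termios *t, const struct sketch_saved *s)
{
	assert(t != NULL && s != NULL);
	t->c_cflag = s->cflag;
	t->c_ispeed = s->ispeed;
	t->c_ospeed = s->ospeed;
}

/* Output speed in the BOTHER case: zero (invalid) becomes the fallback. */
unsigned int sketch_output_baud(const struct sketch_termios *t)
{
	assert(t != NULL);
	return t->c_ospeed ? t->c_ospeed : SKETCH_FALLBACK_BAUD;
}
```

Restoring only cflag leaves c_ospeed at zero, so the helper reports 9600; restoring all three members brings 86400 back.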
Fixes: edc6afc54968 ("[PATCH] tty: switch to ktermios and new framework")
Cc: stable@vger.kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Link: https://lore.kernel.org/r/20211002130900.9518-1-pali@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 00b06da29cf9dc633cdba87acd3f57f4df3fd5c7 upstream.
As Andy pointed out, there are races between
force_sig_info_to_task and sigaction[1] when force_sig_info_to_task is used. As
Kees discovered[2], ptrace is also able to change these signals.
In the case of seccomp killing a process with a signal it is a
security violation to allow the signal to be caught or manipulated.
Solve this problem by introducing a new flag SA_IMMUTABLE that
prevents sigaction and ptrace from modifying these forced signals.
This flag is carefully made kernel internal so that no new ABI is
introduced.
Longer term I think this can be solved by guaranteeing short circuit
delivery of signals in this case. Unfortunately reliable and
guaranteed short circuit delivery of these signals is still a ways off
from being implemented, tested, and merged. So I have implemented a much
simpler alternative for now.
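The idea of an internal "immutable" bit can be sketched as follows. Everything here is hypothetical: the struct, the function names, the flag value and the choice of -EINVAL are assumptions made for illustration, not the kernel's signal code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_SA_IMMUTABLE 0x00800000u /* placeholder for the internal flag */

struct sketch_sigaction {
	void (*handler)(int);
	unsigned int flags;
};

/* Caller-facing update path (stands in for sigaction or ptrace). */
int sketch_change_action(struct sketch_sigaction *slot,
			 const struct sketch_sigaction *wanted)
{
	assert(slot != NULL && wanted != NULL);
	if (slot->flags & SKETCH_SA_IMMUTABLE)
		return -EINVAL; /* forced signal: refuse any modification */
	slot->handler = wanted->handler;
	/* The flag is internal, so callers can never set it themselves. */
	slot->flags = wanted->flags & ~SKETCH_SA_IMMUTABLE;
	return 0;
}

/* Internal path that forces a fatal signal: default action, locked. */
void sketch_force_signal(struct sketch_sigaction *slot)
{
	assert(slot != NULL);
	slot->handler = NULL; /* NULL stands in for the default action */
	slot->flags = SKETCH_SA_IMMUTABLE;
}
```

Because the bit is stripped from caller-supplied flags, it never becomes something user space can request, which is one way to keep it out of the ABI.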
[1] https://lkml.kernel.org/r/b5d52d25-7bde-4030-a7b1-7c6f8ab90660@www.fastmail.com
[2] https://lkml.kernel.org/r/202110281136.5CE65399A7@keescook
Cc: stable@vger.kernel.org
Fixes: 307d522f5eb8 ("signal/seccomp: Refactor seccomp signal and coredump generation")
Tested-by: Andrea Righi <andrea.righi@canonical.com>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fff53a551db50f5edecaa0b29a64056ab8d2bbca upstream.
This patch fixes 2 problems:
[1] Warning logs and data loss when mounting, unmounting and then
remounting a device with jffs2 format.
[2] The access width of SMWDR[0:1]/SMRDR[0:1] register is wrong.
This is the sample warning logs when performing mount/umount then
remount the device with jffs2 format:
jffs2: jffs2_scan_inode_node(): CRC failed on node at 0x031c51d4:
Read 0x00034e00, calculated 0xadb272a7
The reason for issue [1] is that the writing data seems to
get messed up.
Data is only completed when the number of bytes is divisible by 4.
If you only have 3 bytes of data left to write, 1 garbage byte
is inserted after the end of the write stream.
If you only have 2 bytes of data left to write, 2 bytes of '00'
are added into the write stream.
If you only have 1 byte of data left to write, 2 bytes of '00'
are added into the write stream. 1 garbage byte is inserted after
the end of the write stream.
To solve problem [1], data must be written continuously in serial
and the write stream ends when data is out.
Following HW manual 62.2.15, access to SMWDR0 register should be
in the same size as the transfer size specified in the SPIDE[3:0]
bits in the manual mode enable setting register (SMENR).
Be sure to access from address 0.
So, in 16-bit transfer (SPIDE[3:0]=b'1100), SMWDR0 should be
accessed by 16-bit width.
The same applies to the SMWDR1 and SMRDR0/1 registers.
In the current code, the SMWDR0 register is accessed by regmap_write(),
which is only set up for 32-bit accesses.
To solve problem [2], data must be written with 8-bit or 16-bit width when
transferring 1 byte or 2 bytes.
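One way to picture the fix is a planner that picks the access width from the number of bytes still to be written, so that no padding or garbage bytes are appended. The greedy 4/2/1 split and the function names below are assumptions for illustration and may not match how the driver actually sequences its accesses.

```c
#include <assert.h>
#include <stddef.h>

/* Width in bytes of the next data-register access: 4, 2, 1, or 0 when done. */
size_t sketch_next_access_width(size_t bytes_left)
{
	if (bytes_left >= 4)
		return 4;
	if (bytes_left >= 2)
		return 2;
	return bytes_left;
}

/* Fill widths[] with the access sequence for len bytes; returns the count. */
size_t sketch_plan_accesses(size_t len, size_t *widths, size_t max)
{
	size_t n = 0;

	assert(widths != NULL);
	while (len > 0 && n < max) {
		size_t w = sketch_next_access_width(len);

		widths[n++] = w;
		len -= w;
	}
	return n;
}
```

For a 7-byte write this yields accesses of 4, 2 and 1 bytes, so the stream ends exactly when the data runs out.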
Fixes: ca7d8b980b67 ("memory: add Renesas RPC-IF driver")
Cc: <stable@vger.kernel.org>
Signed-off-by: Duc Nguyen <duc.nguyen.ub@renesas.com>
[wsa: refactored to use regmap only via reg_read/reg_write]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/r/20210922091007.5516-1-wsa+renesas@sang-engineering.com
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6b51b02a3a0ac49dfe302818d0746a799545e4e9 upstream.
Daniel pointed me towards this function and there are multiple obvious problems
in the implementation.
First of all, the retry loop is not working as intended. In general, the retry
only makes sense if you grab the reference first and then check the sequence
values.
Then we should always also wait for the exclusive fence.
It's also good practice to keep the reference around when installing callbacks
to fences you don't own.
And last, the whole implementation was unnecessarily complex and rather hard to
understand, which could lead to unexpected behavior of the IOCTL.
Fix all this by reworking the implementation from scratch. Dropping the
whole RCU approach and taking the lock instead.
Only mildly tested and needs a thoughtful review of the code.
Pushing through drm-misc-next to avoid merge conflicts and give the code
another round of testing.
v2: fix the reference counting as well
v3: keep the excl fence handling as is for stable
v4: back to testing all fences, drop RCU
v5: handle in and out separately
v6: add missing clear of events
v7: change coding style as suggested by Michel, drop unused variables
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Michel Dänzer <mdaenzer@redhat.com>
CC: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20210720131110.88512-1-christian.koenig@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dc0fd0acb6e0e8025a0a43ada54513b216254fac upstream.
Until now, we have only ever seen the REG-category registry being used
on devices addressed with target ID 2. In fact, we have only ever seen
Surface Aggregator Module (SAM) HID devices with target ID 2. For those
devices, the registry also has to be addressed with target ID 2.
Some devices, like the new Surface Laptop Studio, however, address their
HID devices on target ID 1. As a result of this, any target ID 2
commands time out. This includes event management commands addressed to
the target ID 2 REG-category registry. For these devices, the registry
has to be addressed via target ID 1 instead.
We therefore assume that the target ID of the registry to be used
depends on the target ID of the respective device. Implement this
accordingly.
Note that we currently allow the surface HID driver to only load against
devices with target ID 2, so these timeouts are not happening (yet).
This is just a preparation step before we allow the driver to load
against all target IDs.
Cc: stable@vger.kernel.org # 5.14+
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20211021130904.862610-3-luzmaximilian@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5ae17501bc62a49b0b193dcce003f16375f16654 upstream.
The changes to issue the abort from the scmd->abort_work instead of the EH
thread introduced a problem if eh_deadline is used. If aborting the
command(s) is successful, and there are never any scmds added to the
shost->eh_cmd_q, there is no code path which will reset the ->last_reset
value back to zero.
The effect of this is that after a successful abort with no EH thread
activity, a subsequent timeout, perhaps a long time later, might
immediately be considered past a user-set eh_deadline time, and the host
will be reset with no attempt at recovery.
Fix this by resetting ->last_reset back to zero in scmd_eh_abort_handler()
if it is determined that the EH thread will not run to do this.
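The failure mode can be modelled in a few lines. This is a sketch with hypothetical names and plain integer timestamps; it assumes that last_reset == 0 means "no recovery episode open" and that the deadline is measured from last_reset, as the description above implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_host {
	unsigned long last_reset;  /* 0 means no recovery episode is open */
	unsigned long eh_deadline; /* user-set limit, same time units */
};

bool sketch_past_deadline(const struct sketch_host *h, unsigned long now)
{
	assert(h != NULL);
	if (h->last_reset == 0)
		return false;
	return now - h->last_reset >= h->eh_deadline;
}

/* The first timeout of an episode stamps the start time. */
void sketch_on_timeout(struct sketch_host *h, unsigned long now)
{
	assert(h != NULL);
	if (h->last_reset == 0)
		h->last_reset = now;
}

/* After a successful abort: clear the stamp if no EH thread will do it. */
void sketch_on_abort_done(struct sketch_host *h, bool eh_thread_will_run)
{
	assert(h != NULL);
	if (!eh_thread_will_run)
		h->last_reset = 0;
}
```

Without the clearing step, a stamp left over from an old abort makes a much later timeout look as if it were already past the deadline.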
Thanks to Gopinath Marappan for investigating this problem.
Link: https://lore.kernel.org/r/20211029194311.17504-2-emilne@redhat.com
Fixes: e494f6a72839 ("[SCSI] improved eh timeout handler")
Cc: stable@vger.kernel.org
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 68dbbe7d5b4fde736d104cbbc9a2fce875562012 upstream.
Some ATA drives are very slow to respond to READ_LOG_EXT and
READ_LOG_DMA_EXT commands issued from ata_dev_configure() when the
device is revalidated right after resuming a system or inserting the
ATA adapter driver (e.g. ahci). The default 5s timeout
(ATA_EH_CMD_DFL_TIMEOUT) used for these commands is too short, causing
errors during the device configuration. Ex:
...
ata9: SATA max UDMA/133 abar m524288@0x9d200000 port 0x9d200400 irq 209
ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata9.00: ATA-9: XXX XXXXXXXXXXXXXXX, XXXXXXXX, max UDMA/133
ata9.00: qc timeout (cmd 0x2f)
ata9.00: Read log page 0x00 failed, Emask 0x4
ata9.00: Read log page 0x00 failed, Emask 0x40
ata9.00: NCQ Send/Recv Log not supported
ata9.00: Read log page 0x08 failed, Emask 0x40
ata9.00: 27344764928 sectors, multi 16: LBA48 NCQ (depth 32), AA
ata9.00: Read log page 0x00 failed, Emask 0x40
ata9.00: ATA Identify Device Log not supported
ata9.00: failed to set xfermode (err_mask=0x40)
ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata9.00: configured for UDMA/133
...
The timeout error causes a soft reset of the drive link, followed in
most cases by a successful revalidation, as that gives the drive enough
time to become fully ready to quickly process the read log commands.
However, in some cases, this also fails resulting in the device being
dropped.
Fix this by adding the ata_eh_revalidate_timeouts entries for the
READ_LOG_EXT and READ_LOG_DMA_EXT commands. This defines a timeout
increased to 15s, retriable one time.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 07e8481d3c38f461d7b79c1d5c9afe013b162b0c upstream.
Regardless of KFENCE mode (CONFIG_KFENCE_STATIC_KEYS: either using
static keys to gate allocations, or using a simple dynamic branch),
always use a static branch to avoid the dynamic branch in kfence_alloc()
if KFENCE was disabled at boot.
For CONFIG_KFENCE_STATIC_KEYS=n, this now avoids the dynamic branch if
KFENCE was disabled at boot.
To simplify, also unify the location where kfence_allocation_gate is
read-checked to just be inline in kfence_alloc().
Link: https://lkml.kernel.org/r/20211019102524.2807208-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4d5b5539742d2554591751b4248b0204d20dcc9d upstream.
Use the 'struct cred' saved at binder_open() to lookup
the security ID via security_cred_getsecid(). This
ensures that the security context that opened binder
is the one used to generate the secctx.
Cc: stable@vger.kernel.org # 5.4+
Fixes: ec74136ded79 ("binder: create node flag to request sender's security context")
Signed-off-by: Todd Kjos <tkjos@google.com>
Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 52f88693378a58094c538662ba652aff0253c4fe upstream.
Since binder was integrated with selinux, it has passed
'struct task_struct' associated with the binder_proc
to represent the source and target of transactions.
The conversion of task to SID was then done in the hook
implementations. It turns out that there are race conditions
which can result in an incorrect security context being used.
Fix by using the 'struct cred' saved during binder_open and pass
it to the selinux subsystem.
Cc: stable@vger.kernel.org # 5.14 (need backport for earlier stables)
Fixes: 79af73079d75 ("Add security hooks to binder and implement the hooks for SELinux.")
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This reverts commit 58877b0824da15698bd85a0a9dbfa8c354e6ecb7.
It has been reported to be causing problems in Arch and Fedora bug
reports.
Reported-by: Hans de Goede <hdegoede@redhat.com>
Link: https://bbs.archlinux.org/viewtopic.php?pid=2000956#p2000956
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2019542
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2019576
Link: https://lore.kernel.org/r/42bcbea6-5eb8-16c7-336a-2cb72e71bc36@redhat.com
Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: Chris Chiu <chris.chiu@canonical.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add SMI event triggering support.
Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@intel.com>
Change-Id: I711b5642a654e671a2d97d3079e3a1a055d400a0
|
|
The size of the mailbox differs between AST2500, AST2600 A0 and A1. Add
ioctl support to fetch the mailbox size.
Tested:
Verified ioctl call returns mailbox size as expected.
Change-Id: I4e261aaf8aa3fb108d6ad152d30a17b114d70ccd
Signed-off-by: Arun P. Mohanan <arun.p.m@linux.intel.com>
|
|
- Move TDI state matrix to core header file
- These changes are done based on feedback from Paul
Fertser, from the OpenOCD.
Test:
SPR ASD Sanity and jtag_test finished successfully.
ICX ASD Sanity and jtag_test finished successfully.
Change-Id: Idb612e50d5a8ea5929f7c9241d279c345587983a
Signed-off-by: Castro, Omar Eduardo <omar.eduardo.castro@intel.com>
|
|
JTAG xfer length is measured in bits and it is allowed to send non 8-bit
aligned xfers. For such xfers we will read the content of the remaining
bits in the last byte of tdi buffer and restore those bits along with
the xfer readback.
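A minimal sketch of the tail-bit handling, assuming bits are packed LSB first so that the transfer occupies the low bits of the last byte. The function name and the way the saved byte is passed in are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * After a readback of bit_len bits into buf, put back the bits of the last
 * byte that were not part of the transfer. saved_last is the value that
 * byte had before the transfer.
 */
void sketch_restore_tail_bits(uint8_t *buf, size_t bit_len, uint8_t saved_last)
{
	size_t rem = bit_len % 8;
	size_t last = bit_len / 8;
	uint8_t mask;

	assert(buf != NULL);
	if (rem == 0)
		return; /* byte-aligned transfer: nothing to restore */
	mask = (uint8_t)((1u << rem) - 1u); /* low rem bits belong to the xfer */
	buf[last] = (uint8_t)((buf[last] & mask) | (saved_last & (uint8_t)~mask));
}
```

For a 12-bit transfer the low nibble of the second byte comes from the readback and the high nibble is taken from the saved value.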
Add also linux types to JTAG header to remove external dependencies.
Test:
SPR ASD Sanity and jtag_test finished successfully.
SKX ASD Sanity and jtag_test finished successfully.
Signed-off-by: Ernesto Corona <ernesto.corona@intel.com>
|
|
This commit adds CPU generation info for ICX-D Xeon family.
Signed-off-by: Saravanan Palanisamy <saravanan.palanisamy@intel.com>
Signed-off-by: Anoop S <anoopx.s@intel.com>
|
|
This commit adds CPU generation info for ICX family.
Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@intel.com>
|
|
Recently, aspeed-mctp driver functionality was extended to store BDF
values for already discovered MCTP endpoints on PCIe bus.
Let's expose kernel API to read BDF based on endpoint ID.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
AST2600 A1 has separate reset controls for LPC and eSPI, so this
commit fixes the index definition to make it work on AST2600 A1.
Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@intel.com>
|
|
This commit ports I3C updates from Aspeed SDK v00.06.00.
Note: Should be refined to get upstreamed.
Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com>
|
|
Right now, PECI revision is determined using the result of the GetDIB() PECI
command. Because GetDIB() may not be supported by all types of physical
media that provide PECI, we need an alternative.
Until we figure out how to determine PECI revision there (if we can't do
that, we'll fall back to device tree), let's allow hardcoding the PECI
revision as a property of the hardware adapter.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Some protocols that are already implemented in the kernel can be
encapsulated in MCTP packets. To allow aspeed-mctp to be used internally
in kernel space, let's make selected functions usable outside of
aspeed-mctp.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Implement two new ioctls for storing EID related information:
* ASPEED_MCTP_IOCTL_GET_EID_INFO
* ASPEED_MCTP_IOCTL_SET_EID_INFO
The driver stores the EID mapping in a list, which is traversed when
one tries to get information using the ASPEED_MCTP_IOCTL_GET_EID_INFO
ioctl. When the given EID mapping is not found in the list, the next entry
is returned. When there are no entries with EIDs higher than the one
specified in the IOCTL call, -ENODEV is returned.
Whenever new information about EID mapping is stored with the
ASPEED_MCTP_IOCTL_SET_EID_INFO ioctl, the driver empties the existing
list of mappings and creates a new one based on user input.
After insertion, the list is sorted by EID. Invalid input
such as duplicated EIDs will cause the driver to return -EINVAL.
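The lookup and replace semantics can be sketched with a plain array instead of the driver's list. The entry layout, the info field and the function names are hypothetical; only the return codes and the "exact match, else next higher EID" rule come from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_eid_entry {
	uint8_t eid;
	uint16_t info; /* hypothetical payload tied to the EID */
};

static int sketch_cmp_eid(const void *a, const void *b)
{
	const struct sketch_eid_entry *x = a;
	const struct sketch_eid_entry *y = b;

	return (int)x->eid - (int)y->eid;
}

/* SET: sort the new table by EID and reject duplicated EIDs. */
int sketch_set_eid_info(struct sketch_eid_entry *tbl, size_t n)
{
	if (n == 0)
		return 0;
	assert(tbl != NULL);
	qsort(tbl, n, sizeof(*tbl), sketch_cmp_eid);
	for (size_t i = 1; i < n; i++)
		if (tbl[i].eid == tbl[i - 1].eid)
			return -EINVAL;
	return 0;
}

/* GET: index of the exact match, else of the next higher EID, else -ENODEV. */
int sketch_get_eid_info(const struct sketch_eid_entry *tbl, size_t n,
			uint8_t eid)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].eid >= eid)
			return (int)i;
	return -ENODEV;
}
```

Returning the next entry when there is no exact match lets a caller walk the whole table by repeatedly asking for the last returned EID plus one.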
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
|
|
MCTP client can register for receiving packets with selected
MCTP message type or PCIE vendor defined message type.
Vendor defined type is 2 bytes but in Intel VDMs the first byte
is variable and only the second byte contains constant message
type - to support this use case we have to specify 2 byte mask
that is applied to packet type before comparing with registered
vendor type.
When MCTP packet arrives its header is compared with a list
of registered (vendor) types.
If no client registered for packet's (vendor) type then
the packet is dispatched to the default client.
Fragmented packets are not considered for type matching.
Only one client can register for given (vendor) type.
Client can register for multiple (vendor) types.
All packet fields must be specified in big endian byte
order.
This feature allows multiple clients to be supported simultaneously,
but only one client per (vendor) message type.
For example we can have PECI client in kernel that uses PECI
vendor message type, dcpmm daemon in user space that handles
NVDIMM vendor type messages and mctpd service that handles MCTP
control and PLDM message types.
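The masked comparison can be sketched as below. The handler struct and function names are hypothetical, and the sketch assumes the 2-byte vendor type is held as an integer with the first (variable) byte in the high bits, so a mask of 0x00ff compares only the second byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_type_handler {
	uint16_t vendor_type; /* registered 2-byte vendor type */
	uint16_t vendor_mask; /* bits of the type that must match */
	int client_id;
};

/* Returns the id of the client registered for the type, else the default. */
int sketch_dispatch(const struct sketch_type_handler *h, size_t n,
		    uint16_t pkt_vendor_type, int default_client)
{
	assert(h != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if ((pkt_vendor_type & h[i].vendor_mask) ==
		    (h[i].vendor_type & h[i].vendor_mask))
			return h[i].client_id;
	return default_client;
}
```

With a handler registered as type 0x0002 and mask 0x00ff, packets typed 0x8502 and 0x1102 both reach that client, while 0x8503 falls through to the default client.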
tested with peci_mctp_test application
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
|
|
Add IOCTL to register given client as default client that
receives all packets that were not dispatched to other
clients.
This IOCTL is intended to be used by mctpd service or test
application that should receive all packets that are not
claimed by other clients.
mctpd service might not be the first user space
client since dcpmm or telemetry client can start
before mctpd or mctpd can crash and be restarted
automatically at any time.
To preserve backward compatibility with mctpd, the first user space
client will be registered automatically as default client - once mctpd
is modified to call ASPEED_MCTP_IOCTL_REGISTER_DEFAULT_HANDLER we
can remove this workaround.
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
|
|
1. Helpers for reading/writing PCS registers added.
2. PECI sensor configuration structure definition and helpers added.
3. New PECI PCS index and parameters definitions added.
Tested:
* on WilsonCity platform
* hwmon/peci modules work as before the change
Signed-off-by: Zbigniew Lukwinski <zbigniew.lukwinski@linux.intel.com>
|
|
Currently, there is no proper MCTP networking subsystem in Linux.
Until we are able to work out the details of that, we are going to
expose HW to userspace using raw read/write interface.
Because of that, this driver is not intended to be submitted upstream.
Here we are providing a simple device driver for AST2600 MCTP
controller.
v2: Added workarounds for BMC reboot/reset, corrected endianness comment,
changed TX_BUF_ADDR to be consistent, fixed typos.
v3: Added workaround for RX hang, added swapping PCIe VDM header to
network order, corrected buffer allocation size.
v4: Fixed TX broken after sending 32 byte packet
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
|