summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2020-11-30SUNRPC: Prepare for xdr_stream-style decoding on the server-sideChuck Lever1-0/+16
A "permanent" struct xdr_stream is allocated in struct svc_rqst so that it is usable by all server-side decoders. A per-rqst scratch buffer is also allocated to handle decoding XDR data items that cross page boundaries. To demonstrate how it will be used, add the first call site for the new svcxdr_init_decode() API. As an additional part of the overall conversion, add symbolic constants for successful and failed XDR operations. Returning "0" is overloaded. Sometimes it means something failed, but sometimes it means success. To make it more clear when XDR decoding functions succeed or fail, introduce symbolic constants. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30SUNRPC: Add xdr_set_scratch_page() and xdr_reset_scratch_buffer()Chuck Lever1-1/+43
Clean up: De-duplicate some frequently-used code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30SUNRPC: Move the svc_xdr_recvfrom() tracepointChuck Lever1-24/+0
Commit c509f15a5801 ("SUNRPC: Split the xdr_buf event class") added display of the rqst's XID to the svc_xdr_buf_class. However, when the recvfrom tracepoint fires, rq_xid has yet to be filled in with the current XID. So it ends up recording the previous XID that was handled by that svc_rqst. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30svcrdma: Use the new parsed chunk list when pulling Read chunksChuck Lever1-3/+3
As a pre-requisite for handling multiple Read chunks in each Read list, convert svc_rdma_recv_read_chunk() to use the new parsed Read chunk list. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30svcrdma: Clean up chunk tracepointsChuck Lever1-96/+14
We already have trace_svcrdma_decode_rseg(), which records each ingress Read segment. Instead of reporting those again when they are about to be posted as RDMA Reads, let's fire one tracepoint before posting each type of chunk. So we'll get: nfsd-1998 [002] 321.666615: svcrdma_decode_rseg: cq.id=4 cid=42 segno=0 position=0 192@0x013ca9ebfae14000:0xb0010b05 nfsd-1998 [002] 321.666615: svcrdma_decode_rseg: cq.id=4 cid=42 segno=1 position=0 7688@0x013ca9ebf914e000:0xb0010a05 nfsd-1998 [002] 321.666615: svcrdma_decode_rseg: cq.id=4 cid=42 segno=2 position=0 28@0x013ca9ebfae15000:0xb0010905 nfsd-1998 [002] 321.666622: svcrdma_decode_rqst: cq.id=4 cid=42 xid=0x013ca9eb vers=1 credits=128 proc=RDMA_NOMSG hdrlen=100 nfsd-1998 [002] 321.666642: svcrdma_post_read_chunk: cq.id=3 cid=112 sqecount=3 kworker/2:1H-221 [002] 321.673949: svcrdma_wc_read: cq.id=3 cid=112 status=SUCCESS (0/0x0) Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30svcrdma: Remove chunk list pointersChuck Lever1-4/+0
Clean up: These pointers are no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30svcrdma: Support multiple Write chunks in svc_rdma_send_reply_chunkChuck Lever1-1/+1
Refactor svc_rdma_send_reply_chunk() so that it Sends only the parts of rq_res that do not contain a result payload. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30svcrdma: Support multiple Write chunks in svc_rdma_map_reply_msg()Chuck Lever2-1/+2
Refactor: svc_rdma_map_reply_msg() is restructured to DMA map only the parts of rq_res that do not contain a result payload. This change has been tested to confirm that it does not cause a regression in the no Write chunk and single Write chunk cases. Multiple Write chunks have not been tested. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30svcrdma: Support multiple write chunks when pulling upChuck Lever2-5/+17
When counting the number of SGEs needed to construct a Send request, do not count result payloads. And, when copying the Reply message into the pull-up buffer, result payloads are not to be copied to the Send buffer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30svcrdma: Use parsed chunk lists to encode Reply transport headersChuck Lever1-1/+36
Refactor: Instead of re-parsing the ingress RPC Call transport header when constructing the egress RPC Reply transport header, use the new parsed Write list and Reply chunk, which are version- agnostic and already XDR decoded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30svcrdma: Use parsed chunk lists to construct RDMA WritesChuck Lever1-3/+2
Refactor: Instead of re-parsing the ingress RPC Call transport header when constructing RDMA Writes, use the new parsed chunk lists for the Write list and Reply chunk, which are version-agnostic and already XDR-decoded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30svcrdma: Use parsed chunk lists to detect reverse direction repliesChuck Lever1-0/+1
Refactor: Don't duplicate header decoding smarts here. Instead, use the new parsed chunk lists. Note that the XID sanity test is also removed. The XID is already looked up by the cb handler, and is rejected if it's not recognized. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30svcrdma: Add a "parsed chunk list" data structureChuck Lever3-2/+213
This simple data structure binds the location of each data payload inside of an RPC message to the chunk that will be used to push it to or pull it from the client. There are several benefits to this small additional overhead: * It enables support for more than one chunk in incoming Read and Write lists. * It translates the version-specific on-the-wire format into a generic in-memory structure, enabling support for multiple versions of the RPC/RDMA transport protocol. * It enables the server to re-organize a chunk list if it needs to adjust where Read chunk data lands in server memory without altering the contents of the XDR-encoded Receive buffer. Construction of these lists is done while sanity checking each incoming RPC/RDMA header. Subsequent patches will make use of the generated data structures. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30svcrdma: Post RDMA Writes while XDR encoding repliesChuck Lever1-3/+1
The only RPC/RDMA ordering requirement between RDMA Writes and RDMA Sends is that the responder must post the Writes on the Send queue before posting the Send that conveys the RPC Reply for that Write payload. The Linux NFS server implementation now has a transport method that can post result Payload Writes earlier than svc_rdma_sendto: ->xpo_result_payload() This gets RDMA Writes going earlier so they are more likely to be complete at the remote end before the Send completes. Some care must be taken with pulled-up Replies. We don't want to push the Write chunk and then send the same payload data via Send. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30SUNRPC: Rename svc_encode_read_payload()Chuck Lever3-7/+7
Clean up: "result payload" is a less confusing name for these payloads. "READ payload" reflects only the NFS usage. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30SUNRPC: Adjust synopsis of xdr_buf_subsegment()Chuck Lever1-1/+2
Clean up: This enables xdr_buf_subsegment()'s callers to pass in a const pointer to that buffer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-28Merge tag 'asm-generic-fixes-5.10-2' of ↵Linus Torvalds1-0/+13
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic fix from Arnd Bergmann: "Add correct MAX_POSSIBLE_PHYSMEM_BITS setting to asm-generic. This is a single bugfix for a bug that Stefan Agner found on 32-bit Arm, but that exists on several other architectures" * tag 'asm-generic-fixes-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed
2020-11-28Merge tag 'arm-soc-fixes-v5.10-3' of ↵Linus Torvalds2-4/+1
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "Another set of patches for devicetree files and Arm SoC specific drivers: - A fix for OP-TEE shared memory on non-SMP systems - multiple code fixes for the OMAP platform, including one regression for the CPSW network driver and a few runtime warning fixes - Some DT patches for the Rockchip RK3399 platform, in particular fixing the MMC device ordering that recently became nondeterministic with async probe. - Multiple DT fixes for the Tegra platform, including a regression fix for suspend/resume on TX2 - A regression fix for a user-triggered fault in the NXP dpio driver - A regression fix for a bug caused by an earlier bug fix in the xilinx firmware driver - Two more DTC warning fixes - Sylvain Lemieux steps down as maintainer for the NXP LPC32xx platform" * tag 'arm-soc-fixes-v5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (24 commits) arm64: tegra: Fix Tegra234 VDK node names arm64: tegra: Wrong AON HSP reg property size arm64: tegra: Fix USB_VBUS_EN0 regulator on Jetson TX1 arm64: tegra: Correct the UART for Jetson Xavier NX arm64: tegra: Disable the ACONNECT for Jetson TX2 optee: add writeback to valid memory type firmware: xilinx: Use hash-table for api feature check firmware: xilinx: Fix SD DLL node reset issue soc: fsl: dpio: Get the cpumask through cpumask_of(cpu) ARM: dts: dra76x: m_can: fix order of clocks bus: ti-sysc: suppress err msg for timers used as clockevent/source MAINTAINERS: Remove myself as LPC32xx maintainers arm64: dts: qcom: clear the warnings caused by empty dma-ranges arm64: dts: broadcom: clear the warnings caused by empty dma-ranges ARM: dts: am437x-l4: fix compatible for cpsw switch dt node arm64: dts: rockchip: Reorder LED triggers from mmc devices on rk3399-roc-pc. arm64: dts: rockchip: Assign a fixed index to mmc devices on rk3399 boards. arm64: dts: rockchip: Remove system-power-controller from pmic on Odroid Go Advance arm64: dts: rockchip: fix NanoPi R2S GMAC clock name ARM: OMAP2+: Manage MPU state properly for omap_enter_idle_coupled() ...
2020-11-28Merge tag 'net-5.10-rc6' of ↵Linus Torvalds6-2/+26
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.10-rc6, including fixes from the WiFi driver, and CAN subtrees. Current release - regressions: - gro_cells: reduce number of synchronize_net() calls - ch_ktls: release a lock before jumping to an error path Current release - always broken: - tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header Previous release - regressions: - net/tls: fix missing received data after fast remote close - vsock/virtio: discard packets only when socket is really closed - sock: set sk_err to ee_errno on dequeue from errq - cxgb4: fix the panic caused by non smac rewrite Previous release - always broken: - tcp: fix corner cases around setting ECN with BPF selection of congestion control - tcp: fix race condition when creating child sockets from syncookies on loopback interface - usbnet: ipheth: fix connectivity with iOS 14 - tun: honor IOCB_NOWAIT flag - net/packet: fix packet receive on L3 devices without visible hard header - devlink: Make sure devlink instance and port are in same net namespace - net: openvswitch: fix TTL decrement action netlink message format - bonding: wait for sysfs kobject destruction before freeing struct slave - net: stmmac: fix upstream patch applied to the wrong context - bnxt_en: fix return value and unwind in probe error paths Misc: - devlink: add extra layer of categorization to the reload stats uAPI before it's released" * tag 'net-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits) sock: set sk_err to ee_errno on dequeue from errq mptcp: fix NULL ptr dereference on bad MPJ net: openvswitch: fix TTL decrement action netlink message format can: af_can: can_rx_unregister(): remove WARN() statement from list operation sanity check can: m_can: m_can_dev_setup(): add support for bosch mcan version 3.3.0 can: m_can: fix nominal bitiming tseg2 min for version >= 3.1 can: m_can: m_can_open(): remove IRQF_TRIGGER_FALLING from request_threaded_irq()'s flags can: mcp251xfd: mcp251xfd_probe(): bail out if no IRQ was given can: gs_usb: fix endianess problem with candleLight firmware ch_ktls: lock is not freed net/tls: Protect from calling tls_dev_del for TLS RX twice devlink: Make sure devlink instance and port are in same net namespace devlink: Hold rtnl lock while reading netdev attributes ptp: clockmatrix: bug fix for idtcm_strverscmp enetc: Let the hardware auto-advance the taprio base-time of 0 gro_cells: reduce number of synchronize_net() calls net: stmmac: fix incorrect merge of patch upstream ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_init Documentation: netdev-FAQ: suggest how to post co-dependent series ibmvnic: enhance resetting status check during module exit ...
2020-11-27net: openvswitch: fix TTL decrement action netlink message formatEelco Chaudron1-0/+2
Currently, the openvswitch module is not accepting the correctly formated netlink message for the TTL decrement action. For both setting and getting the dec_ttl action, the actions should be nested in the OVS_DEC_TTL_ATTR_ACTION attribute as mentioned in the openvswitch.h uapi. When the original patch was sent, it was tested with a private OVS userspace implementation. This implementation was unfortunately not upstreamed and reviewed, hence an erroneous version of this patch was sent out. Leaving the patch as-is would cause problems as the kernel module could interpret additional attributes as actions and vice-versa, due to the actions not being encapsulated/nested within the actual attribute, but being concatinated after it. Fixes: 744676e77720 ("openvswitch: add TTL decrement action") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/160622121495.27296.888010441924340582.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-27Merge tag 'writeback_for_v5.10-rc6' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull writeback fix from Jan Kara: "A fix of possible missing string termination in writeback tracepoints" * tag 'writeback_for_v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: trace: fix potenial dangerous pointer
2020-11-27Merge tag 'omap-for-v5.10/fixes-rc5-signed' of ↵Arnd Bergmann1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes Fixes for omaps for various issues noticed during the -rc cycle: - Earlier omap4 cpuidle fix was incomplete and needs to use a configured idle state instead - Fix am4 cpsw driver compatible to avoid invalid resource error for the legacy driver - Two kconfig fixes for genpd support that we added for for v5.10 for proper location of the option and adding missing option - Fix ti-sysc reset status checking on enabling modules to ignore quirky modules with reset status only usable when the quirk is activated during reset. Also fix bogus resetdone warning for cpsw and modules with no sysst register reset status bit - Suppress a ti-sysc warning for timers reserved as system timers - Fix the ordering of clocks for dra7 m_can * tag 'omap-for-v5.10/fixes-rc5-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: dts: dra76x: m_can: fix order of clocks bus: ti-sysc: suppress err msg for timers used as clockevent/source ARM: dts: am437x-l4: fix compatible for cpsw switch dt node ARM: OMAP2+: Manage MPU state properly for omap_enter_idle_coupled() bus: ti-sysc: Fix bogus resetdone warning on enable for cpsw bus: ti-sysc: Fix reset status check for modules with quirks ARM: OMAP2+: Fix missing select PM_GENERIC_DOMAINS_OF ARM: OMAP2+: Fix location for select PM_GENERIC_DOMAINS Link: https://lore.kernel.org/r/pull-1606460270-864284@atomide.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-11-26mm: memcg: relayout structure mem_cgroup to avoid cache interferenceFeng Tang1-14/+14
0day reported one -22.7% regression for will-it-scale page_fault2 case [1] on a 4 sockets 144 CPU platform, and bisected to it to be caused by Waiman's optimization (commit bd0b230fe1) of saving one 'struct page_counter' space for 'struct mem_cgroup'. Initially we thought it was due to the cache alignment change introduced by the patch, but further debug shows that it is due to some hot data members ('vmstats_local', 'vmstats_percpu', 'vmstats') sit in 2 adjacent cacheline (2N and 2N+1 cacheline), and when adjacent cache line prefetch is enabled, it triggers an "extended level" of cache false sharing for 2 adjacent cache lines. So exchange the 2 member blocks, while keeping mostly the original cache alignment, which can restore and even enhance the performance, and save 64 bytes of space for 'struct mem_cgroup' (from 2880 to 2816, with 0day's default RHEL-8.3 kernel config) [1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/ Fixes: bd0b230fe145 ("mm/memcg: unify swap and memsw page counters") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Feng Tang <feng.tang@intel.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-26net/tls: Protect from calling tls_dev_del for TLS RX twiceMaxim Mikityanskiy1-0/+6
tls_device_offload_cleanup_rx doesn't clear tls_ctx->netdev after calling tls_dev_del if TLX TX offload is also enabled. Clearing tls_ctx->netdev gets postponed until tls_device_gc_task. It leaves a time frame when tls_device_down may get called and call tls_dev_del for RX one extra time, confusing the driver, which may lead to a crash. This patch corrects this racy behavior by adding a flag to prevent tls_device_down from calling tls_dev_del the second time. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20201125221810.69870-1-saeedm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-25trace: fix potenial dangerous pointerHui Su1-4/+4
The bdi_dev_name() returns a char [64], and the __entry->name is a char [32]. It maybe dangerous to TP_printk("%s", __entry->name) after the strncpy(). CC: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201124165205.GA23937@rlk Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Hui Su <sh_def@163.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-11-25devlink: Fix reload stats structureMoshe Shemesh1-0/+2
Fix reload stats structure exposed to the user. Change stats structure hierarchy to have the reload action as a parent of the stat entry and then stat entry includes value per limit. This will also help to avoid string concatenation on iproute2 output. Reload stats structure before this fix: "stats": { "reload": { "driver_reinit": 2, "fw_activate": 1, "fw_activate_no_reset": 0 } } After this fix: "stats": { "reload": { "driver_reinit": { "unspecified": 2 }, "fw_activate": { "unspecified": 1, "no_reset": 0 } } Fixes: a254c264267e ("devlink: Add reload stats") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/1606109785-25197-1-git-send-email-moshe@mellanox.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-24firmware: xilinx: Use hash-table for api feature checkAmit Sunil Dhamne1-4/+0
Currently array of fix length PM_API_MAX is used to cache the pm_api version (valid or invalid). However ATF based PM APIs values are much higher then PM_API_MAX. So to include ATF based PM APIs also, use hash-table to store the pm_api version status. Signed-off-by: Amit Sunil Dhamne <amit.sunil.dhamne@xilinx.com> Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ravi Patel <ravi.patel@xilinx.com> Signed-off-by: Rajan Vaja <rajan.vaja@xilinx.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Michal Simek <michal.simek@xilinx.com> Fixes: f3217d6f2f7a ("firmware: xilinx: fix out-of-bounds access") Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/1606197161-25976-1-git-send-email-rajan.vaja@xilinx.com Signed-off-by: Michal Simek <michal.simek@xilinx.com>
2020-11-24net/packet: fix packet receive on L3 devices without visible hard headerEyal Birger1-0/+5
In the patchset merged by commit b9fcf0a0d826 ("Merge branch 'support-AF_PACKET-for-layer-3-devices'") L3 devices which did not have header_ops were given one for the purpose of protocol parsing on af_packet transmit path. That change made af_packet receive path regard these devices as having a visible L3 header and therefore aligned incoming skb->data to point to the skb's mac_header. Some devices, such as ipip, xfrmi, and others, do not reset their mac_header prior to ingress and therefore their incoming packets became malformed. Ideally these devices would reset their mac headers, or af_packet would be able to rely on dev->hard_header_len being 0 for such cases, but it seems this is not the case. Fix by changing af_packet RX ll visibility criteria to include the existence of a '.create()' header operation, which is used when creating a device hard header - via dev_hard_header() - by upper layers, and does not exist in these L3 devices. As this predicate may be useful in other situations, add it as a common dev_has_header() helper in netdevice.h. Fixes: b9fcf0a0d826 ("Merge branch 'support-AF_PACKET-for-layer-3-devices'") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20201121062817.3178900-1-eyal.birger@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-24tcp: fix race condition when creating child sockets from syncookiesRicardo Dias1-2/+3
When the TCP stack is in SYN flood mode, the server child socket is created from the SYN cookie received in a TCP packet with the ACK flag set. The child socket is created when the server receives the first TCP packet with a valid SYN cookie from the client. Usually, this packet corresponds to the final step of the TCP 3-way handshake, the ACK packet. But is also possible to receive a valid SYN cookie from the first TCP data packet sent by the client, and thus create a child socket from that SYN cookie. Since a client socket is ready to send data as soon as it receives the SYN+ACK packet from the server, the client can send the ACK packet (sent by the TCP stack code), and the first data packet (sent by the userspace program) almost at the same time, and thus the server will equally receive the two TCP packets with valid SYN cookies almost at the same instant. When such event happens, the TCP stack code has a race condition that occurs between the momement a lookup is done to the established connections hashtable to check for the existence of a connection for the same client, and the moment that the child socket is added to the established connections hashtable. As a consequence, this race condition can lead to a situation where we add two child sockets to the established connections hashtable and deliver two sockets to the userspace program to the same client. This patch fixes the race condition by checking if an existing child socket exists for the same client when we are adding the second child socket to the established connections socket. If an existing child socket exists, we drop the packet and discard the second child socket to the same client. Signed-off-by: Ricardo Dias <rdias@singlestore.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201120111133.GA67501@rdias-suse-pc.lan Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-23Merge tag 'sched-urgent-2020-11-22' of ↵Linus Torvalds1-2/+24
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: "A couple of scheduler fixes: - Make the conditional update of the overutilized state work correctly by caching the relevant flags state before overwriting them and checking them afterwards. - Fix a data race in the wakeup path which caused loadavg on ARM64 platforms to become a random number generator. - Fix the ordering of the iowaiter accounting operations so it can't be decremented before it is incremented. - Fix a bug in the deadline scheduler vs. priority inheritance when a non-deadline task A has inherited the parameters of a deadline task B and then blocks on a non-deadline task C. The second inheritance step used the static deadline parameters of task A, which are usually 0, instead of further propagating task B's parameters. The zero initialized parameters trigger a bug in the deadline scheduler" * tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Fix priority inheritance with multiple scheduling classes sched: Fix rq->nr_iowait ordering sched: Fix data-race in wakeup sched/fair: Fix overutilized update in enqueue_task_fair()
2020-11-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds4-15/+33
Merge misc fixes from Andrew Morton: "8 patches. Subsystems affected by this patch series: mm (madvise, pagemap, readahead, memcg, userfaultfd), kbuild, and vfs" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: fix madvise WILLNEED performance problem libfs: fix error cast of negative value in simple_attr_write() mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault() mm: memcg/slab: fix root memcg vmstats mm: fix readahead_page_batch for retry entries mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports compiler-clang: remove version check for BPF Tracing mm/madvise: fix memory leak from process_madvise
2020-11-22Merge tag 'ext4_for_linus_fixes2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "A final set of miscellaneous bug fixes for ext4" * tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix bogus warning in ext4_update_dx_flag() jbd2: fix kernel-doc markups ext4: drop fast_commit from /proc/mounts
2020-11-22mm: fix readahead_page_batch for retry entriesMatthew Wilcox (Oracle)1-0/+2
Both btrfs and fuse have reported faults caused by seeing a retry entry instead of the page they were looking for. This was caused by a missing check in the iterator. As can be seen in the below panic log, the accessing 0x402 causes a panic. In the xarray.h, 0x402 means RETRY_ENTRY. BUG: kernel NULL pointer dereference, address: 0000000000000402 CPU: 14 PID: 306003 Comm: as Not tainted 5.9.0-1-amd64 #1 Debian 5.9.1-1 Hardware name: Lenovo ThinkSystem SR665/7D2VCTO1WW, BIOS D8E106Q-1.01 05/30/2020 RIP: 0010:fuse_readahead+0x152/0x470 [fuse] Code: 41 8b 57 18 4c 8d 54 10 ff 4c 89 d6 48 8d 7c 24 10 e8 d2 e3 28 f9 48 85 c0 0f 84 fe 00 00 00 44 89 f2 49 89 04 d4 44 8d 72 01 <48> 8b 10 41 8b 4f 1c 48 c1 ea 10 83 e2 01 80 fa 01 19 d2 81 e2 01 RSP: 0018:ffffad99ceaebc50 EFLAGS: 00010246 RAX: 0000000000000402 RBX: 0000000000000001 RCX: 0000000000000002 RDX: 0000000000000000 RSI: ffff94c5af90bd98 RDI: ffffad99ceaebc60 RBP: ffff94ddc1749a00 R08: 0000000000000402 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000100 R12: ffff94de6c429ce0 R13: ffff94de6c4d3700 R14: 0000000000000001 R15: ffffad99ceaebd68 FS: 00007f228c5c7040(0000) GS:ffff94de8ed80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000402 CR3: 0000001dbd9b4000 CR4: 0000000000350ee0 Call Trace: read_pages+0x83/0x270 page_cache_readahead_unbounded+0x197/0x230 generic_file_buffered_read+0x57a/0xa20 new_sync_read+0x112/0x1a0 vfs_read+0xf8/0x180 ksys_read+0x5f/0xe0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 042124cc64c3 ("mm: add new readahead_control API") Reported-by: David Sterba <dsterba@suse.com> Reported-by: Wonhyuk Yang <vvghjk1234@gmail.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201103142852.8543-1-willy@infradead.org Link: https://lkml.kernel.org/r/20201103124349.16722-1-vvghjk1234@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exportsDan Williams2-15/+29
The core-mm has a default __weak implementation of phys_to_target_node() to mirror the weak definition of memory_add_physaddr_to_nid(). That symbol is exported for modules. However, while the export in mm/memory_hotplug.c exported the symbol in the configuration cases of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=y ...and: CONFIG_NUMA_KEEP_MEMINFO=n CONFIG_MEMORY_HOTPLUG=y ...it failed to export the symbol in the case of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=n Not only is that broken, but Christoph points out that the kernel should not be exporting any __weak symbol, which means that memory_add_physaddr_to_nid() example that phys_to_target_node() copied is broken too. Rework the definition of phys_to_target_node() and memory_add_physaddr_to_nid() to not require weak symbols. Move to the common arch override design-pattern of an asm header defining a symbol to replace the default implementation. The only common header that all memory_add_physaddr_to_nid() producing architectures implement is asm/sparsemem.h. In fact, powerpc already defines its memory_add_physaddr_to_nid() helper in sparsemem.h. Double-down on that observation and define phys_to_target_node() where necessary in asm/sparsemem.h. An alternate consideration that was discarded was to put this override in asm/numa.h, but that entangles with the definition of MAX_NUMNODES relative to the inclusion of linux/nodemask.h, and requires powerpc to grow a new header. The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid now that the symbol is properly exported / stubbed in all combinations of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG. [dan.j.williams@intel.com: v4] Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com [dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning] Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22compiler-clang: remove version check for BPF TracingNick Desaulniers1-0/+2
bpftrace parses the kernel headers and uses Clang under the hood. Remove the version check when __BPF_TRACING__ is defined (as bpftrace does) so that this tool can continue to parse kernel headers, even with older clang sources. Fixes: commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1") Reported-by: Chen Yu <yu.chen.surf@gmail.com> Reported-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lkml.kernel.org/r/20201104191052.390657-1-ndesaulniers@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22bonding: wait for sysfs kobject destruction before freeing struct slaveJamie Iles1-0/+8
syzkaller found that with CONFIG_DEBUG_KOBJECT_RELEASE=y, releasing a struct slave device could result in the following splat: kobject: 'bonding_slave' (00000000cecdd4fe): kobject_release, parent 0000000074ceb2b2 (delayed 1000) bond0 (unregistering): (slave bond_slave_1): Releasing backup interface ------------[ cut here ]------------ ODEBUG: free active (active state 0) object type: timer_list hint: workqueue_select_cpu_near kernel/workqueue.c:1549 [inline] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x98 kernel/workqueue.c:1600 WARNING: CPU: 1 PID: 842 at lib/debugobjects.c:485 debug_print_object+0x180/0x240 lib/debugobjects.c:485 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 842 Comm: kworker/u4:4 Tainted: G S 5.9.0-rc8+ #96 Hardware name: linux,dummy-virt (DT) Workqueue: netns cleanup_net Call trace: dump_backtrace+0x0/0x4d8 include/linux/bitmap.h:239 show_stack+0x34/0x48 arch/arm64/kernel/traps.c:142 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x174/0x1f8 lib/dump_stack.c:118 panic+0x360/0x7a0 kernel/panic.c:231 __warn+0x244/0x2ec kernel/panic.c:600 report_bug+0x240/0x398 lib/bug.c:198 bug_handler+0x50/0xc0 arch/arm64/kernel/traps.c:974 call_break_hook+0x160/0x1d8 arch/arm64/kernel/debug-monitors.c:322 brk_handler+0x30/0xc0 arch/arm64/kernel/debug-monitors.c:329 do_debug_exception+0x184/0x340 arch/arm64/mm/fault.c:864 el1_dbg+0x48/0xb0 arch/arm64/kernel/entry-common.c:65 el1_sync_handler+0x170/0x1c8 arch/arm64/kernel/entry-common.c:93 el1_sync+0x80/0x100 arch/arm64/kernel/entry.S:594 debug_print_object+0x180/0x240 lib/debugobjects.c:485 __debug_check_no_obj_freed lib/debugobjects.c:967 [inline] debug_check_no_obj_freed+0x200/0x430 lib/debugobjects.c:998 slab_free_hook mm/slub.c:1536 [inline] slab_free_freelist_hook+0x190/0x210 mm/slub.c:1577 slab_free mm/slub.c:3138 [inline] kfree+0x13c/0x460 mm/slub.c:4119 bond_free_slave+0x8c/0xf8 drivers/net/bonding/bond_main.c:1492 __bond_release_one+0xe0c/0xec8 drivers/net/bonding/bond_main.c:2190 bond_slave_netdev_event drivers/net/bonding/bond_main.c:3309 [inline] bond_netdev_event+0x8f0/0xa70 drivers/net/bonding/bond_main.c:3420 notifier_call_chain+0xf0/0x200 kernel/notifier.c:83 __raw_notifier_call_chain kernel/notifier.c:361 [inline] raw_notifier_call_chain+0x44/0x58 kernel/notifier.c:368 call_netdevice_notifiers_info+0xbc/0x150 net/core/dev.c:2033 call_netdevice_notifiers_extack net/core/dev.c:2045 [inline] call_netdevice_notifiers net/core/dev.c:2059 [inline] rollback_registered_many+0x6a4/0xec0 net/core/dev.c:9347 unregister_netdevice_many.part.0+0x2c/0x1c0 net/core/dev.c:10509 unregister_netdevice_many net/core/dev.c:10508 [inline] default_device_exit_batch+0x294/0x338 net/core/dev.c:10992 ops_exit_list.isra.0+0xec/0x150 net/core/net_namespace.c:189 cleanup_net+0x44c/0x888 net/core/net_namespace.c:603 process_one_work+0x96c/0x18c0 kernel/workqueue.c:2269 worker_thread+0x3f0/0xc30 kernel/workqueue.c:2415 kthread+0x390/0x498 kernel/kthread.c:292 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:925 This is a potential use-after-free if the sysfs nodes are being accessed whilst removing the struct slave, so wait for the object destruction to complete before freeing the struct slave itself. Fixes: 07699f9a7c8d ("bonding: add sysfs /slave dir for bond slave devices.") Fixes: a068aab42258 ("bonding: Fix reference count leak in bond_sysfs_slave_add.") Cc: Qiushi Wu <wu000273@umn.edu> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20201120142827.879226-1-jamie@nuviainc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21Merge tag 'scsi-fixes' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Fixes for two fairly obscure but annoying when triggered races in iSCSI" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: target: iscsi: Fix cmd abort fabric stop race scsi: libiscsi: Fix NOP race condition
2020-11-20Merge tag 'iommu-fixes' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull iommu fixes from Will Deacon: "Two straightforward vt-d fixes: - Fix boot when intel iommu initialisation fails under TXT (tboot) - Fix intel iommu compilation error when DMAR is enabled without ATS and temporarily update IOMMU MAINTAINERs entry" * tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: MAINTAINERS: Temporarily add myself to the IOMMU entry iommu/vt-d: Fix compile error with CONFIG_PCI_ATS not set iommu/vt-d: Avoid panic if iommu init fails in tboot system
2020-11-20Merge tag 'sound-5.10-rc5' of ↵Linus Torvalds1-0/+15
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of small fixes: the only core change is a minor error code handling in the control API, and all the rest are device-specific fixes, mostly quirks, fixups and ASoC Intel fixes. It looks boring, and good so" * tag 'sound-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: mixart: Fix mutex deadlock ALSA: hda/ca0132: Fix compile warning without PCI ASOC: Intel: kbl_rt5663_rt5514_max98927: Do not try to disable disabled clock ALSA: usb-audio: Add delay quirk for all Logitech USB devices ASoC: Intel: catpt: Correct clock selection for dai trigger ASoC: Intel: catpt: Skip position update for unprepared streams ASoC: qcom: lpass-platform: Fix memory leak ASoC: Intel: KMB: Fix S24_LE configuration ALSA: hda: Add Alderlake-S PCI ID and HDMI codec vid ALSA: usb-audio: Use ALC1220-VB-DT mapping for ASUS ROG Strix TRX40 mobo ALSA: firewire: Clean up a locking issue in copy_resp_to_buf() ASoC: rt1015: increase the time to detect BCLK ALSA: ctl: fix error path at adding user-defined element set ALSA: hda/realtek - HP Headset Mic can't detect after boot ALSA: hda/realtek - Add supported mute Led for HP ALSA: hda/realtek: Add some Clove SSID in the ALC293(ALC1220) ALSA: hda/realtek - Add supported for Lenovo ThinkPad Headset Button ASoC: rt1015: add delay to fix pop noise from speaker
2020-11-20jbd2: fix kernel-doc markupsMauro Carvalho Chehab1-1/+1
Kernel-doc markup should use this format: identifier - description They should not have any type before that, as otherwise the parser won't do the right thing. Also, some identifiers have different names between their prototypes and the kernel-doc markup. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/72f5c6628f5f278d67625f60893ffbc2ca28d46e.1605521731.git.mchehab+huawei@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-20Merge tag 'net-5.10-rc5' of ↵Linus Torvalds5-4/+71
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.10-rc5, including fixes from the WiFi (mac80211), can and bpf (including the strncpy_from_user fix). Current release - regressions: - mac80211: fix memory leak of filtered powersave frames - mac80211: free sta in sta_info_insert_finish() on errors to avoid sleeping in atomic context - netlabel: fix an uninitialized variable warning added in -rc4 Previous release - regressions: - vsock: forward all packets to the host when no H2G is registered, un-breaking AWS Nitro Enclaves - net: Exempt multicast addresses from five-second neighbor lifetime requirement, decreasing the chances neighbor tables fill up - net/tls: fix corrupted data in recvmsg - qed: fix ILT configuration of SRC block - can: m_can: process interrupt only when not runtime suspended Previous release - always broken: - page_frag: Recover from memory pressure by not recycling pages allocating from the reserves - strncpy_from_user: Mask out bytes after NUL terminator - ip_tunnels: Set tunnel option flag only when tunnel metadata is present, always setting it confuses Open vSwitch - bpf, sockmap: - Fix partial copy_page_to_iter so progress can still be made - Fix socket memory accounting and obeying SO_RCVBUF - net: Have netpoll bring-up DSA management interface - net: bridge: add missing counters to ndo_get_stats64 callback - tcp: brr: only postpone PROBE_RTT if RTT is < current min_rtt - enetc: Workaround MDIO register access HW bug - net/ncsi: move netlink family registration to a subsystem init, instead of tying it to driver probe - net: ftgmac100: unregister NC-SI when removing driver to avoid crash - lan743x: - prevent interrupt storm on open - fix freeing skbs in the wrong context - net/mlx5e: Fix socket refcount leak on kTLS RX resync - net: dsa: mv88e6xxx: Avoid VLAN database corruption on 6097 - fix 21 unset return codes and other mistakes on error paths, mostly detected by the Hulk Robot" * tag 'net-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (115 commits) fail_function: Remove a redundant mutex unlock selftest/bpf: Test bpf_probe_read_user_str() strips trailing bytes after NUL lib/strncpy_from_user.c: Mask out bytes after NUL terminator. net/smc: fix direct access to ib_gid_addr->ndev in smc_ib_determine_gid() net/smc: fix matching of existing link groups ipv6: Remove dependency of ipv6_frag_thdr_truncated on ipv6 module libbpf: Fix VERSIONED_SYM_COUNT number parsing net/mlx4_core: Fix init_hca fields offset atm: nicstar: Unmap DMA on send error page_frag: Recover from memory pressure net: dsa: mv88e6xxx: Wait for EEPROM done after HW reset mlxsw: core: Use variable timeout for EMAD retries mlxsw: Fix firmware flashing net: Have netpoll bring-up DSA management interface atl1e: fix error return code in atl1e_probe() atl1c: fix error return code in atl1c_probe() ah6: fix error return code in ah6_input() net: usb: qmi_wwan: Set DTR quirk for MR400 can: m_can: process interrupt only when not runtime suspended can: flexcan: flexcan_chip_start(): fix erroneous flexcan_transceiver_enable() during bus-off recovery ...
2020-11-19Merge tag 'spi-fix-v5.10-rc4' of ↵Linus Torvalds1-0/+19
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "This is a relatively large set of fixes, the bulk of it being a series from Lukas Wunner which fixes confusion with the lifetime of driver data allocated along with the SPI controller structure that's been created as part of the conversion to devm APIs. The simplest fix, explained in detail in Lukas' commit message, is to move to a devm_ function for allocation of the controller and hence driver data in order to push the free of that after anything tries to reference the driver data in the remove path. This results in a relatively large diff due to the addition of a new function but isn't particularly complex. There's also a fix from Sven van Asbroeck which fixes yet more fallout from the conflicts between the various different places one can configure the polarity of GPIOs in modern systems. Otherwise everything is fairly small and driver specific" * tag 'spi-fix-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: npcm-fiu: Don't leak SPI master in probe error path spi: dw: Set transfer handler before unmasking the IRQs spi: cadence-quadspi: Fix error return code in cqspi_probe spi: bcm2835aux: Restore err assignment in bcm2835aux_spi_probe spi: lpspi: Fix use-after-free on unbind spi: bcm-qspi: Fix use-after-free on unbind spi: bcm2835aux: Fix use-after-free on unbind spi: bcm2835: Fix use-after-free on unbind spi: Introduce device-managed SPI controller allocation spi: fsi: Fix transfer returning without finalizing message spi: fix client driver breakages when using GPIO descriptors
2020-11-19Merge tag 'asoc-fix-v5.10-rc4' of ↵Takashi Iwai1-0/+15
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v5.11 A collection of driver specific fixes, mostly for x86 systems (or CODECs used mostly on x86) and all for relatively minor issues, the biggest one being fixing S24_LE format on Keem Bay systems.
2020-11-19ipv6: Remove dependency of ipv6_frag_thdr_truncated on ipv6 moduleGeorg Kohmann2-2/+30
IPV6=m NF_DEFRAG_IPV6=y ld: net/ipv6/netfilter/nf_conntrack_reasm.o: in function `nf_ct_frag6_gather': net/ipv6/netfilter/nf_conntrack_reasm.c:462: undefined reference to `ipv6_frag_thdr_truncated' Netfilter is depending on ipv6 symbol ipv6_frag_thdr_truncated. This dependency is forcing IPV6=y. Remove this dependency by moving ipv6_frag_thdr_truncated out of ipv6. This is the same solution as used with a similar issues: Referring to commit 70b095c843266 ("ipv6: remove dependency of nf_defrag_ipv6 on ipv6 module") Fixes: 9d9e937b1c8b ("ipv6/netfilter: Discard first fragment not including all headers") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Georg Kohmann <geokohma@cisco.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/20201119095833.8409-1-geokohma@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18Merge tag 'nfsd-5.10-2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-1/+2
Pull nfsd fix from Bruce Fields: "Just one quick fix for a tracing oops" * tag 'nfsd-5.10-2' of git://linux-nfs.org/~bfields/linux: SUNRPC: Fix oops in the rpc_xdr_buf event class
2020-11-18Merge tag 'linux-kselftest-kunit-fixes-5.10-rc5' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kunit fixes from Shuah Khan: "Several fixes to Kunit documentation and tools, and to not pollute the source directory. Also remove the incorrect kunit .gitattributes file" * tag 'linux-kselftest-kunit-fixes-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: fix display of failed expectations for strings kunit: tool: fix extra trailing \n in raw + parsed test output kunit: tool: print out stderr from make (like build warnings) KUnit: Docs: usage: wording fixes KUnit: Docs: style: fix some Kconfig example issues KUnit: Docs: fix a wording typo kunit: Do not pollute source directory with generated files (test.log) kunit: Do not pollute source directory with generated files (.kunitconfig) kunit: tool: fix pre-existing python type annotation errors kunit: Fix kunit.py parse subcommand (use null build_dir) kunit: tool: unmark test_data as binary blobs
2020-11-18iommu/vt-d: Avoid panic if iommu init fails in tboot systemZhenzhong Duan1-1/+0
"intel_iommu=off" command line is used to disable iommu but iommu is force enabled in a tboot system for security reason. However for better performance on high speed network device, a new option "intel_iommu=tboot_noforce" is introduced to disable the force on. By default kernel should panic if iommu init fail in tboot for security reason, but it's unnecessory if we use "intel_iommu=tboot_noforce,off". Fix the code setting force_on and move intel_iommu_tboot_noforce from tboot code to intel iommu code. Fixes: 7304e8f28bb2 ("iommu/vt-d: Correctly disable Intel IOMMU force on") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com> Tested-by: Lukasz Hawrylko <lukasz.hawrylko@linux.intel.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20201110071908.3133-1-zhenzhong.duan@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-18net/tls: Fix wrong record sn in async mode of device resyncTariq Toukan1-1/+15
In async_resync mode, we log the TCP seq of records until the async request is completed. Later, in case one of the logged seqs matches the resync request, we return it, together with its record serial number. Before this fix, we mistakenly returned the serial number of the current record instead. Fixes: ed9b7646b06a ("net/tls: Add asynchronous resync") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Link: https://lore.kernel.org/r/20201115131448.2702-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17sched/deadline: Fix priority inheritance with multiple scheduling classesJuri Lelli1-1/+9
Glenn reported that "an application [he developed produces] a BUG in deadline.c when a SCHED_DEADLINE task contends with CFS tasks on nested PTHREAD_PRIO_INHERIT mutexes. I believe the bug is triggered when a CFS task that was boosted by a SCHED_DEADLINE task boosts another CFS task (nested priority inheritance). ------------[ cut here ]------------ kernel BUG at kernel/sched/deadline.c:1462! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 12 PID: 19171 Comm: dl_boost_bug Tainted: ... Hardware name: ... RIP: 0010:enqueue_task_dl+0x335/0x910 Code: ... RSP: 0018:ffffc9000c2bbc68 EFLAGS: 00010002 RAX: 0000000000000009 RBX: ffff888c0af94c00 RCX: ffffffff81e12500 RDX: 000000000000002e RSI: ffff888c0af94c00 RDI: ffff888c10b22600 RBP: ffffc9000c2bbd08 R08: 0000000000000009 R09: 0000000000000078 R10: ffffffff81e12440 R11: ffffffff81e1236c R12: ffff888bc8932600 R13: ffff888c0af94eb8 R14: ffff888c10b22600 R15: ffff888bc8932600 FS: 00007fa58ac55700(0000) GS:ffff888c10b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa58b523230 CR3: 0000000bf44ab003 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ? intel_pstate_update_util_hwp+0x13/0x170 rt_mutex_setprio+0x1cc/0x4b0 task_blocks_on_rt_mutex+0x225/0x260 rt_spin_lock_slowlock_locked+0xab/0x2d0 rt_spin_lock_slowlock+0x50/0x80 hrtimer_grab_expiry_lock+0x20/0x30 hrtimer_cancel+0x13/0x30 do_nanosleep+0xa0/0x150 hrtimer_nanosleep+0xe1/0x230 ? __hrtimer_init_sleeper+0x60/0x60 __x64_sys_nanosleep+0x8d/0xa0 do_syscall_64+0x4a/0x100 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fa58b52330d ... ---[ end trace 0000000000000002 ]— He also provided a simple reproducer creating the situation below: So the execution order of locking steps are the following (N1 and N2 are non-deadline tasks. D1 is a deadline task. M1 and M2 are mutexes that are enabled * with priority inheritance.) Time moves forward as this timeline goes down: N1 N2 D1 | | | | | | Lock(M1) | | | | | | Lock(M2) | | | | | | Lock(M2) | | | | Lock(M1) | | (!!bug triggered!) | Daniel reported a similar situation as well, by just letting ksoftirqd run with DEADLINE (and eventually block on a mutex). Problem is that boosted entities (Priority Inheritance) use static DEADLINE parameters of the top priority waiter. However, there might be cases where top waiter could be a non-DEADLINE entity that is currently boosted by a DEADLINE entity from a different lock chain (i.e., nested priority chains involving entities of non-DEADLINE classes). In this case, top waiter static DEADLINE parameters could be null (initialized to 0 at fork()) and replenish_dl_entity() would hit a BUG(). Fix this by keeping track of the original donor and using its parameters when a task is boosted. Reported-by: Glenn Elliott <glenn@aurora.tech> Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201117061432.517340-1-juri.lelli@redhat.com
2020-11-17sched: Fix data-race in wakeupPeter Zijlstra1-1/+15
Mel reported that on some ARM64 platforms loadavg goes bananas and Will tracked it down to the following race: CPU0 CPU1 schedule() prev->sched_contributes_to_load = X; deactivate_task(prev); try_to_wake_up() if (p->on_rq &&) // false if (smp_load_acquire(&p->on_cpu) && // true ttwu_queue_wakelist()) p->sched_remote_wakeup = Y; smp_store_release(prev->on_cpu, 0); where both p->sched_contributes_to_load and p->sched_remote_wakeup are in the same word, and thus the stores X and Y race (and can clobber one another's data). Whereas prior to commit c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu") the p->on_cpu handoff serialized access to p->sched_remote_wakeup (just as it still does with p->sched_contributes_to_load) that commit broke that by calling ttwu_queue_wakelist() with p->on_cpu != 0. However, due to p->XXX = X ttwu() schedule() if (p->on_rq && ...) // false smp_mb__after_spinlock() if (smp_load_acquire(&p->on_cpu) && deactivate_task() ttwu_queue_wakelist()) p->on_rq = 0; p->sched_remote_wakeup = Y; We can be sure any 'current' store is complete and 'current' is guaranteed asleep. Therefore we can move p->sched_remote_wakeup into the current flags word. Note: while the observed failure was loadavg accounting gone wrong due to ttwu() cobbering p->sched_contributes_to_load, the reverse problem is also possible where schedule() clobbers p->sched_remote_wakeup, this could result in enqueue_entity() wrecking ->vruntime and causing scheduling artifacts. Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu") Reported-by: Mel Gorman <mgorman@techsingularity.net> Debugged-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201117083016.GK3121392@hirez.programming.kicks-ass.net