summaryrefslogtreecommitdiff
path: root/include/uapi/linux
AgeCommit message (Collapse)AuthorFilesLines
2021-07-19net: fix mistake path for netdev_features_stringsJian Shen1-2/+2
[ Upstream commit 2d8ea148e553e1dd4e80a87741abdfb229e2b323 ] Th_strings arrays netdev_features_strings, tunable_strings, and phy_tunable_strings has been moved to file net/ethtool/common.c. So fixes the comment. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14media: vicodec: Use _BITUL() macro in UAPI headersJoe Richey1-11/+12
[ Upstream commit ce67eaca95f8ab5c6aae41a10adfe9a6e8efa58c ] Replace BIT() in v4l2's UPAI header with _BITUL(). BIT() is not defined in the UAPI headers and its usage may cause userspace build errors. Fixes: 206bc0f6fb94 ("media: vicodec: mark the stateless FWHT API as stable") Signed-off-by: Joe Richey <joerichey@google.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-23icmp: don't send out ICMP messages with a source address of 0.0.0.0Toke Høiland-Jørgensen1-0/+3
[ Upstream commit 321827477360934dc040e9d3c626bf1de6c3ab3c ] When constructing ICMP response messages, the kernel will try to pick a suitable source address for the outgoing packet. However, if no IPv4 addresses are configured on the system at all, this will fail and we end up producing an ICMP message with a source address of 0.0.0.0. This can happen on a box routing IPv4 traffic via v6 nexthops, for instance. Since 0.0.0.0 is not generally routable on the internet, there's a good chance that such ICMP messages will never make it back to the sender of the original packet that the ICMP message was sent in response to. This, in turn, can create connectivity and PMTUd problems for senders. Fortunately, RFC7600 reserves a dummy address to be used as a source for ICMP messages (192.0.0.8/32), so let's teach the kernel to substitute that address as a last resort if the regular source address selection procedure fails. Below is a quick example reproducing this issue with network namespaces: ip netns add ns0 ip l add type veth peer netns ns0 ip l set dev veth0 up ip a add 10.0.0.1/24 dev veth0 ip a add fc00:dead:cafe:42::1/64 dev veth0 ip r add 10.1.0.0/24 via inet6 fc00:dead:cafe:42::2 ip -n ns0 l set dev veth0 up ip -n ns0 a add fc00:dead:cafe:42::2/64 dev veth0 ip -n ns0 r add 10.0.0.0/24 via inet6 fc00:dead:cafe:42::1 ip netns exec ns0 sysctl -w net.ipv4.icmp_ratelimit=0 ip netns exec ns0 sysctl -w net.ipv4.ip_forward=1 tcpdump -tpni veth0 -c 2 icmp & ping -w 1 10.1.0.1 > /dev/null tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 29, seq 1, length 64 IP 0.0.0.0 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92 2 packets captured 2 packets received by filter 0 packets dropped by kernel With this patch the above capture changes to: IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 31127, seq 1, length 64 IP 192.0.0.8 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Juliusz Chroboczek <jch@irif.fr> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-18HID: hid-input: add mapping for emoji picker keyDmitry Torokhov1-0/+1
[ Upstream commit 7b229b13d78d112e2c5d4a60a3c6f602289959fa ] HUTRR101 added a new usage code for a key that is supposed to invoke and dismiss an emoji picker widget to assist users to locate and enter emojis. This patch adds a new key definition KEY_EMOJI_PICKER and maps 0x0c/0x0d9 usage code to this new keycode. Additionally hid-debug is adjusted to recognize this new usage code as well. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03KVM: X86: Use _BITUL() macro in UAPI headersJoe Richey1-2/+3
commit fb1070d18edb37daf3979662975bc54625a19953 upstream. Replace BIT() in KVM's UPAI header with _BITUL(). BIT() is not defined in the UAPI headers and its usage may cause userspace build errors. Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking") Signed-off-by: Joe Richey <joerichey@google.com> Message-Id: <20210521085849.37676-3-joerichey94@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19netfilter: xt_SECMARK: add new revision to fix structure layoutPablo Neira Ayuso1-0/+6
[ Upstream commit c7d13358b6a2f49f81a34aa323a2d0878a0532a2 ] This extension breaks when trying to delete rules, add a new revision to fix this. Fixes: 5e6874cdb8de ("[SECMARK]: Add xtables SECMARK target") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14tty: actually undefine superseded ASYNC flagsJohan Hovold1-2/+2
[ Upstream commit d09845e98a05850a8094ea8fd6dd09a8e6824fff ] Some kernel-internal ASYNC flags have been superseded by tty-port flags and should no longer be used by kernel drivers. Fix the misspelled "__KERNEL__" compile guards which failed their sole purpose to break out-of-tree drivers that have not yet been updated. Fixes: 5c0517fefc92 ("tty: core: Undefine ASYNC_* flags superceded by TTY_PORT* flags") Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20210407095208.31838-2-johan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12usb: webcam: Invalid size of Processing Unit DescriptorPawel Laszczak1-1/+2
[ Upstream commit 6a154ec9ef6762c774cd2b50215c7a8f0f08a862 ] According with USB Device Class Definition for Video Device the Processing Unit Descriptor bLength should be 12 (10 + bmControlSize), but it has 11. Invalid length caused that Processing Unit Descriptor Test Video form CV tool failed. To fix this issue patch adds bmVideoStandards into uvc_processing_unit_descriptor structure. The bmVideoStandards field was added in UVC 1.1 and it wasn't part of UVC 1.0a. Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Pawel Laszczak <pawell@cadence.com> Reviewed-by: Peter Chen <peter.chen@kernel.org> Link: https://lore.kernel.org/r/20210315071748.29706-1-pawell@gli-login.cadence.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-21capabilities: require CAP_SETFCAP to map uid 0Serge E. Hallyn1-1/+2
cap_setfcap is required to create file capabilities. Since commit 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a process running as uid 0 but without cap_setfcap is able to work around this as follows: unshare a new user namespace which maps parent uid 0 into the child namespace. While this task will not have new capabilities against the parent namespace, there is a loophole due to the way namespaced file capabilities are represented as xattrs. File capabilities valid in userns 1 are distinguished from file capabilities valid in userns 2 by the kuid which underlies uid 0. Therefore the restricted root process can unshare a new self-mapping namespace, add a namespaced file capability onto a file, then use that file capability in the parent namespace. To prevent that, do not allow mapping parent uid 0 if the process which opened the uid_map file does not have CAP_SETFCAP, which is the capability for setting file capabilities. As a further wrinkle: a task can unshare its user namespace, then open its uid_map file itself, and map (only) its own uid. In this case we do not have the credential from before unshare, which was potentially more restricted. So, when creating a user namespace, we record whether the creator had CAP_SETFCAP. Then we can use that during map_write(). With this patch: 1. Unprivileged user can still unshare -Ur ubuntu@caps:~$ unshare -Ur root@caps:~# logout 2. Root user can still unshare -Ur ubuntu@caps:~$ sudo bash root@caps:/home/ubuntu# unshare -Ur root@caps:/home/ubuntu# logout 3. Root user without CAP_SETFCAP cannot unshare -Ur: root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap unable to set CAP_SETFCAP effective capability: Operation not permitted root@caps:/home/ubuntu# unshare -Ur unshare: write failed /proc/self/uid_map: Operation not permitted Note: an alternative solution would be to allow uid 0 mappings by processes without CAP_SETFCAP, but to prevent such a namespace from writing any file capabilities. This approach can be seen at [1]. Background history: commit 95ebabde382 ("capabilities: Don't allow writing ambiguous v3 file capabilities") tried to fix the issue by preventing v3 fscaps to be written to disk when the root uid would map to the same uid in nested user namespaces. This led to regressions for various workloads. For example, see [2]. Ultimately this is a valid use-case we have to support meaning we had to revert this change in 3b0c2d3eaa83 ("Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities")"). Link: https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 [1] Link: https://github.com/containers/buildah/issues/3071 [2] Signed-off-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: Andrew G. Morgan <morgan@kernel.org> Tested-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-14Merge tag 'dmaengine-fix-5.12' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine fixes from Vinod Koul: "A couple of dmaengine driver fixes for: - race and descriptor issue for xilinx driver - fix interrupt handling, wq state & cleanup, field sizes for completion, msix permissions for idxd driver - runtime pm fix for tegra driver - double free fix in dma_async_device_register" * tag 'dmaengine-fix-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: dmaengine: idxd: fix wq cleanup of WQCFG registers dmaengine: idxd: clear MSIX permission entry on shutdown dmaengine: plx_dma: add a missing put_device() on error path dmaengine: tegra20: Fix runtime PM imbalance on error dmaengine: Fix a double free in dma_async_device_register dmaengine: dw: Make it dependent to HAS_IOMEM dmaengine: idxd: fix wq size store permission state dmaengine: idxd: fix opcap sysfs attribute output dmaengine: idxd: fix delta_rec and crc size field for completion record dmaengine: idxd: Fix clobbering of SWERR overflow bit on writeback dmaengine: xilinx: dpdma: Fix race condition in done IRQ dmaengine: xilinx: dpdma: Fix descriptor issuing on video group
2021-04-12dmaengine: idxd: fix delta_rec and crc size field for completion recordDave Jiang1-2/+2
The delta_rec_size and crc_val in the completion record should be 32bits and not 16bits. Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for Intel data accelerators") Reported-by: Nikhil Rao <nikhil.rao@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/161645618572.2003490.14466173451736323035.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-04-10Merge tag 'net-5.12-rc7' of ↵Linus Torvalds3-34/+102
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.12-rc7, including fixes from can, ipsec, mac80211, wireless, and bpf trees. No scary regressions here or in the works, but small fixes for 5.12 changes keep coming. Current release - regressions: - virtio: do not pull payload in skb->head - virtio: ensure mac header is set in virtio_net_hdr_to_skb() - Revert "net: correct sk_acceptq_is_full()" - mptcp: revert "mptcp: provide subflow aware release function" - ethernet: lan743x: fix ethernet frame cutoff issue - dsa: fix type was not set for devlink port - ethtool: remove link_mode param and derive link params from driver - sched: htb: fix null pointer dereference on a null new_q - wireless: iwlwifi: Fix softirq/hardirq disabling in iwl_pcie_enqueue_hcmd() - wireless: iwlwifi: fw: fix notification wait locking - wireless: brcmfmac: p2p: Fix deadlock introduced by avoiding the rtnl dependency Current release - new code bugs: - napi: fix hangup on napi_disable for threaded napi - bpf: take module reference for trampoline in module - wireless: mt76: mt7921: fix airtime reporting and related tx hangs - wireless: iwlwifi: mvm: rfi: don't lock mvm->mutex when sending config command Previous releases - regressions: - rfkill: revert back to old userspace API by default - nfc: fix infinite loop, refcount & memory leaks in LLCP sockets - let skb_orphan_partial wake-up waiters - xfrm/compat: Cleanup WARN()s that can be user-triggered - vxlan, geneve: do not modify the shared tunnel info when PMTU triggers an ICMP reply - can: fix msg_namelen values depending on CAN_REQUIRED_SIZE - can: uapi: mark union inside struct can_frame packed - sched: cls: fix action overwrite reference counting - sched: cls: fix err handler in tcf_action_init() - ethernet: mlxsw: fix ECN marking in tunnel decapsulation - ethernet: nfp: Fix a use after free in nfp_bpf_ctrl_msg_rx - ethernet: i40e: fix receiving of single packets in xsk zero-copy mode - ethernet: cxgb4: avoid collecting SGE_QBASE regs during traffic Previous releases - always broken: - bpf: Refuse non-O_RDWR flags in BPF_OBJ_GET - bpf: Refcount task stack in bpf_get_task_stack - bpf, x86: Validate computation of branch displacements - ieee802154: fix many similar syzbot-found bugs - fix NULL dereferences in netlink attribute handling - reject unsupported operations on monitor interfaces - fix error handling in llsec_key_alloc() - xfrm: make ipv4 pmtu check honor ip header df - xfrm: make hash generation lock per network namespace - xfrm: esp: delete NETIF_F_SCTP_CRC bit from features for esp offload - ethtool: fix incorrect datatype in set_eee ops - xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory model - openvswitch: fix send of uninitialized stack memory in ct limit reply Misc: - udp: add get handling for UDP_GRO sockopt" * tag 'net-5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (182 commits) net: fix hangup on napi_disable for threaded napi net: hns3: Trivial spell fix in hns3 driver lan743x: fix ethernet frame cutoff issue net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits net: dsa: lantiq_gswip: Don't use PHY auto polling net: sched: sch_teql: fix null-pointer dereference ipv6: report errors for iftoken via netlink extack net: sched: fix err handler in tcf_action_init() net: sched: fix action overwrite reference counting Revert "net: sched: bump refcount for new action in ACT replace mode" ice: fix memory leak of aRFS after resuming from suspend i40e: Fix sparse warning: missing error code 'err' i40e: Fix sparse error: 'vsi->netdev' could be null i40e: Fix sparse error: uninitialized symbol 'ring' i40e: Fix sparse errors in i40e_txrx.c i40e: Fix parameters in aq_get_phy_register() nl80211: fix beacon head validation bpf, x86: Validate computation of branch displacements for x86-32 bpf, x86: Validate computation of branch displacements for x86-64 ...
2021-04-08rfkill: revert back to old userspace API by defaultJohannes Berg1-12/+68
Recompiling with the new extended version of struct rfkill_event broke systemd in *two* ways: - It used "sizeof(struct rfkill_event)" to read the event, but then complained if it actually got something != 8, this broke it on new kernels (that include the updated API); - It used sizeof(struct rfkill_event) to write a command, but didn't implement the intended expansion protocol where the kernel returns only how many bytes it accepted, and errored out due to the unexpected smaller size on kernels that didn't include the updated API. Even though systemd has now been fixed, that fix may not be always deployed, and other applications could potentially have similar issues. As such, in the interest of avoiding regressions, revert the default API "struct rfkill_event" back to the original size. Instead, add a new "struct rfkill_event_ext" that extends it by the new field, and even more clearly document that applications should be prepared for extensions in two ways: * write might only accept fewer bytes on older kernels, and will return how many to let userspace know which data may have been ignored; * read might return anything between 8 (the original size) and whatever size the application sized its buffer at, indicating how much event data was supported by the kernel. Perhaps that will help avoid such issues in the future and we won't have to come up with another version of the struct if we ever need to extend it again. Applications that want to take advantage of the new field will have to be modified to use struct rfkill_event_ext instead now, which comes with the danger of them having already been updated to use it from 'struct rfkill_event', but I found no evidence of that, and it's still relatively new. Cc: stable@vger.kernel.org # 5.11 Reported-by: Takashi Iwai <tiwai@suse.de> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v12.0.0-r4 (x86-64) Link: https://lore.kernel.org/r/20210319232510.f1a139cfdd9c.Ic5c7c9d1d28972059e132ea653a21a427c326678@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-04-08ethtool: fix kdoc in headersJakub Kicinski1-0/+6
Fix remaining issues with kdoc in the ethtool headers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08ethtool: document reserved fields in the uAPIJakub Kicinski1-1/+21
Add a note on expected handling of reserved fields, and references to all kdocs. This fixes a bunch of kdoc warnings. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08ethtool: un-kdocify extended link stateJakub Kicinski1-20/+6
Extended link state structures and enums use kdoc headers but then do not describe any of the members. Convert to normal comments. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-03Merge tag 'block-5.12-2021-04-02' of git://git.kernel.dk/linux-blockLinus Torvalds1-26/+2
Pull block fixes from Jens Axboe: - Remove comment that never came to fruition in 22 years of development (Christoph) - Remove unused request flag (Christoph) - Fix for null_blk fake timeout handling (Damien) - Fix for IOCB_NOWAIT being ignored for O_DIRECT on raw bdevs (Pavel) - Error propagation fix for multiple split bios (Yufen) * tag 'block-5.12-2021-04-02' of git://git.kernel.dk/linux-block: block: remove the unused RQF_ALLOCED flag block: update a few comments in uapi/linux/blkpg.h block: don't ignore REQ_NOWAIT for direct IO null_blk: fix command timeout completion handling block: only update parent bi_status when bio fail
2021-04-02block: update a few comments in uapi/linux/blkpg.hChristoph Hellwig1-26/+2
The big top of the file comment talk about grand plans that never happened, so remove them to not confuse the readers. Also mark the devname and volname fields as ignored as they were never used by the kernel. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-29can: uapi: can.h: mark union inside struct can_frame packedMarc Kleine-Budde1-1/+1
In commit ea7800565a12 ("can: add optional DLC element to Classical CAN frame structure") the struct can_frame::can_dlc was put into an anonymous union with another u8 variable. For various reasons some members in struct can_frame and canfd_frame including the first 8 byes of data are expected to have the same memory layout. This is enforced by a BUILD_BUG_ON check in af_can.c. Since the above mentioned commit this check fails on ARM kernels compiled with the ARM OABI (which means CONFIG_AEABI not set). In this case -mabi=apcs-gnu is passed to the compiler, which leads to a structure size boundary of 32, instead of 8 compared to CONFIG_AEABI enabled. This means the the union in struct can_frame takes 4 bytes instead of the expected 1. Rong Chen illustrates the problem with pahole in the ARM OABI case: | struct can_frame { | canid_t can_id; /* 0 4 */ | union { | __u8 len; /* 4 1 */ | __u8 can_dlc; /* 4 1 */ | }; /* 4 4 */ | __u8 __pad; /* 8 1 */ | __u8 __res0; /* 9 1 */ | __u8 len8_dlc; /* 10 1 */ | | /* XXX 5 bytes hole, try to pack */ | | __u8 data[8] | __attribute__((__aligned__(8))); /* 16 8 */ | | /* size: 24, cachelines: 1, members: 6 */ | /* sum members: 19, holes: 1, sum holes: 5 */ | /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */ | /* last cacheline: 24 bytes */ | } __attribute__((__aligned__(8))); Marking the anonymous union as __attribute__((packed)) fixes the BUILD_BUG_ON problem on these compilers. Fixes: ea7800565a12 ("can: add optional DLC element to Classical CAN frame structure") Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Rong Chen <rong.a.chen@intel.com> Link: https://lore.kernel.org/linux-can/2c82ec23-3551-61b5-1bd8-178c3407ee83@hartkopp.net/ Link: https://lore.kernel.org/r/20210325125850.1620-3-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-03-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds2-9/+12
Pull networking fixes from David Miller: "Various fixes, all over: 1) Fix overflow in ptp_qoriq_adjfine(), from Yangbo Lu. 2) Always store the rx queue mapping in veth, from Maciej Fijalkowski. 3) Don't allow vmlinux btf in map_create, from Alexei Starovoitov. 4) Fix memory leak in octeontx2-af from Colin Ian King. 5) Use kvalloc in bpf x86 JIT for storing jit'd addresses, from Yonghong Song. 6) Fix tx ptp stats in mlx5, from Aya Levin. 7) Check correct ip version in tun decap, fropm Roi Dayan. 8) Fix rate calculation in mlx5 E-Switch code, from arav Pandit. 9) Work item memork leak in mlx5, from Shay Drory. 10) Fix ip6ip6 tunnel crash with bpf, from Daniel Borkmann. 11) Lack of preemptrion awareness in macvlan, from Eric Dumazet. 12) Fix data race in pxa168_eth, from Pavel Andrianov. 13) Range validate stab in red_check_params(), from Eric Dumazet. 14) Inherit vlan filtering setting properly in b53 driver, from Florian Fainelli. 15) Fix rtnl locking in igc driver, from Sasha Neftin. 16) Pause handling fixes in igc driver, from Muhammad Husaini Zulkifli. 17) Missing rtnl locking in e1000_reset_task, from Vitaly Lifshits. 18) Use after free in qlcnic, from Lv Yunlong. 19) fix crash in fritzpci mISDN, from Tong Zhang. 20) Premature rx buffer reuse in igb, from Li RongQing. 21) Missing termination of ip[a driver message handler arrays, from Alex Elder. 22) Fix race between "x25_close" and "x25_xmit"/"x25_rx" in hdlc_x25 driver, from Xie He. 23) Use after free in c_can_pci_remove(), from Tong Zhang. 24) Uninitialized variable use in nl80211, from Jarod Wilson. 25) Off by one size calc in bpf verifier, from Piotr Krysiuk. 26) Use delayed work instead of deferrable for flowtable GC, from Yinjun Zhang. 27) Fix infinite loop in NPC unmap of octeontx2 driver, from Hariprasad Kelam. 28) Fix being unable to change MTU of dwmac-sun8i devices due to lack of fifo sizes, from Corentin Labbe. 29) DMA use after free in r8169 with WoL, fom Heiner Kallweit. 30) Mismatched prototypes in isdn-capi, from Arnd Bergmann. 31) Fix psample UAPI breakage, from Ido Schimmel" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (171 commits) psample: Fix user API breakage math: Export mul_u64_u64_div_u64 ch_ktls: fix enum-conversion warning octeontx2-af: Fix memory leak of object buf ptp_qoriq: fix overflow in ptp_qoriq_adjfine() u64 calcalation net: bridge: don't notify switchdev for local FDB addresses net/sched: act_ct: clear post_ct if doing ct_clear net: dsa: don't assign an error value to tag_ops isdn: capi: fix mismatched prototypes net/mlx5: SF, do not use ecpu bit for vhca state processing net/mlx5e: Fix division by 0 in mlx5e_select_queue net/mlx5e: Fix error path for ethtool set-priv-flag net/mlx5e: Offload tuple rewrite for non-CT flows net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP net/mlx5: Add back multicast stats for uplink representor net: ipconfig: ic_dev can be NULL in ic_close_devs MAINTAINERS: Combine "QLOGIC QLGE 10Gb ETHERNET DRIVER" sections into one docs: networking: Fix a typo r8169: fix DMA being used after buffer free if WoL is enabled net: ipa: fix init header command validation ...
2021-03-25psample: Fix user API breakageIdo Schimmel1-4/+1
Cited commit added a new attribute before the existing group reference count attribute, thereby changing its value and breaking existing applications on new kernels. Before: # psample -l libpsample ERROR psample_group_foreach: failed to recv message: Operation not supported After: # psample -l Group Num Refcount Group Seq 1 1 0 Fix by restoring the value of the old attribute and remove the misleading comments from the enumerator to avoid future bugs. Cc: stable@vger.kernel.org Fixes: d8bed686ab96 ("net: psample: Add tunnel support") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Adiel Bidani <adielb@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16Merge tag 'fuse-fixes-5.12-rc4' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "Fix a deadlock and a couple of other bugs" * tag 'fuse-fixes-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: 32-bit user space ioctl compat for fuse device virtiofs: Fail dax mount if device does not support it fuse: fix live lock in fuse_iget()
2021-03-16fuse: 32-bit user space ioctl compat for fuse deviceAlessio Balsini1-1/+2
With a 64-bit kernel build the FUSE device cannot handle ioctl requests coming from 32-bit user space. This is due to the ioctl command translation that generates different command identifiers that thus cannot be used for direct comparisons without proper manipulation. Explicitly extract type and number from the ioctl command to enable 32-bit user space compatibility on 64-bit kernel builds. Signed-off-by: Alessio Balsini <balsini@android.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-03-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-5/+11
Daniel Borkmann says: ==================== pull-request: bpf 2021-03-10 The following pull-request contains BPF updates for your *net* tree. We've added 8 non-merge commits during the last 5 day(s) which contain a total of 11 files changed, 136 insertions(+), 17 deletions(-). The main changes are: 1) Reject bogus use of vmlinux BTF as map/prog creation BTF, from Alexei Starovoitov. 2) Fix allocation failure splat in x86 JIT for large progs. Also fix overwriting percpu cgroup storage from tracing programs when nested, from Yonghong Song. 3) Fix rx queue retrieval in XDP for multi-queue veth, from Maciej Fijalkowski. 4) Fix bpf_check_mtu() helper API before freeze to have mtu_len as custom skb/xdp L3 input length, from Jesper Dangaard Brouer. 5) Fix inode_storage's lookup_elem return value upon having bad fd, from Tal Lossos. 6) Fix bpftool and libbpf cross-build on MacOS, from Georgi Valkov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10Merge git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/netLinus Torvalds3-2/+2
Pull networking fixes from David Miller: 1) Fix transmissions in dynamic SMPS mode in ath9k, from Felix Fietkau. 2) TX skb error handling fix in mt76 driver, also from Felix. 3) Fix BPF_FETCH atomic in x86 JIT, from Brendan Jackman. 4) Avoid double free of percpu pointers when freeing a cloned bpf prog. From Cong Wang. 5) Use correct printf format for dma_addr_t in ath11k, from Geert Uytterhoeven. 6) Fix resolve_btfids build with older toolchains, from Kun-Chuan Hsieh. 7) Don't report truncated frames to mac80211 in mt76 driver, from Lorenzop Bianconi. 8) Fix watcdog timeout on suspend/resume of stmmac, from Joakim Zhang. 9) mscc ocelot needs NET_DEVLINK selct in Kconfig, from Arnd Bergmann. 10) Fix sign comparison bug in TCP_ZEROCOPY_RECEIVE getsockopt(), from Arjun Roy. 11) Ignore routes with deleted nexthop object in mlxsw, from Ido Schimmel. 12) Need to undo tcp early demux lookup sometimes in nf_nat, from Florian Westphal. 13) Fix gro aggregation for udp encaps with zero csum, from Daniel Borkmann. 14) Make sure to always use imp*_ndo_send when necessaey, from Jason A. Donenfeld. 15) Fix TRSCER masks in sh_eth driver from Sergey Shtylyov. 16) prevent overly huge skb allocationsd in qrtr, from Pavel Skripkin. 17) Prevent rx ring copnsumer index loss of sync in enetc, from Vladimir Oltean. 18) Make sure textsearch copntrol block is large enough, from Wilem de Bruijn. 19) Revert MAC changes to r8152 leading to instability, from Hates Wang. 20) Advance iov in 9p even for empty reads, from Jissheng Zhang. 21) Double hook unregister in nftables, from PabloNeira Ayuso. 22) Fix memleak in ixgbe, fropm Dinghao Liu. 23) Avoid dups in pkt scheduler class dumps, from Maximilian Heyne. 24) Various mptcp fixes from Florian Westphal, Paolo Abeni, and Geliang Tang. 25) Fix DOI refcount bugs in cipso, from Paul Moore. 26) One too many irqsave in ibmvnic, from Junlin Yang. 27) Fix infinite loop with MPLS gso segmenting via virtio_net, from Balazs Nemeth. * git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net: (164 commits) s390/qeth: fix notification for pending buffers during teardown s390/qeth: schedule TX NAPI on QAOB completion s390/qeth: improve completion of pending TX buffers s390/qeth: fix memory leak after failed TX Buffer allocation net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0 net: check if protocol extracted by virtio_net_hdr_set_proto is correct net: dsa: xrs700x: check if partner is same as port in hsr join net: lapbether: Remove netif_start_queue / netif_stop_queue atm: idt77252: fix null-ptr-dereference atm: uPD98402: fix incorrect allocation atm: fix a typo in the struct description net: qrtr: fix error return code of qrtr_sendmsg() mptcp: fix length of ADD_ADDR with port sub-option net: bonding: fix error return code of bond_neigh_init() net: enetc: allow hardware timestamping on TX queues with tc-etf enabled net: enetc: set MAC RX FIFO to recommended value net: davicom: Use platform_get_irq_optional() net: davicom: Fix regulator not turned off on driver removal net: davicom: Fix regulator not turned off on failed probe net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports ...
2021-03-09bpf: BPF-helper for MTU checking add length inputJesper Dangaard Brouer1-5/+11
The FIB lookup example[1] show how the IP-header field tot_len (iph->tot_len) is used as input to perform the MTU check. This patch extend the BPF-helper bpf_check_mtu() with the same ability to provide the length as user parameter input, via mtu_len parameter. This still needs to be done before the bpf_check_mtu() helper API becomes frozen. [1] samples/bpf/xdp_fwd_kern.c Fixes: 34b2021cc616 ("bpf: Add BPF-helper for MTU checking") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/161521555850.3515614.6533850861569774444.stgit@firesoul
2021-03-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-1/+1
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix incorrect enum type definition in nfnetlink_cthelper UAPI, from Dmitry V. Levin. 2) Remove extra space in deprecated automatic helper assignment notice, from Klemen Košir. 3) Drop early socket demux socket after NAT mangling, from Florian Westphal. Add a test to exercise this bug. 4) Fix bogus invalid packet report in the conntrack TCP tracker, also from Florian. 5) Fix access to xt[NFPROTO_UNSPEC] list with no mutex in target/match_revfn(), from Vasily Averin. 6) Disallow updates on the table ownership flag. 7) Fix double hook unregistration of tables with owner. 8) Remove bogus check on the table owner in __nft_release_tables(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+13
Pull KVM fixes from Paolo Bonzini: - Doc fixes - selftests fixes - Add runstate information to the new Xen support - Allow compiling out the Xen interface - 32-bit PAE without EPT bugfix - NULL pointer dereference bugfix * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Clear the CR4 register on reset KVM: x86/xen: Add support for vCPU runstate information KVM: x86/xen: Fix return code when clearing vcpu_info and vcpu_time_info selftests: kvm: Mmap the entire vcpu mmap area KVM: Documentation: Fix index for KVM_CAP_PPC_DAWR1 KVM: x86: allow compiling out the Xen hypercall interface KVM: xen: flush deferred static key before checking it KVM: x86/mmu: Set SPTE_AD_WRPROT_ONLY_MASK if and only if PML is enabled KVM: x86: hyper-v: Fix Hyper-V context null-ptr-deref KVM: x86: remove misplaced comment on active_mmu_pages KVM: Documentation: rectify rst markup in kvm_run->flags Documentation: kvm: fix messy conversion from .txt to .rst
2021-03-04net: l2tp: reduce log level of messages in receive path, add counter insteadMatthias Schiffer1-0/+1
Commit 5ee759cda51b ("l2tp: use standard API for warning log messages") changed a number of warnings about invalid packets in the receive path so that they are always shown, instead of only when a special L2TP debug flag is set. Even with rate limiting these warnings can easily cause significant log spam - potentially triggered by a malicious party sending invalid packets on purpose. In addition these warnings were noticed by projects like Tunneldigger [1], which uses L2TP for its data path, but implements its own control protocol (which is sufficiently different from L2TP data packets that it would always be passed up to userspace even with future extensions of L2TP). Some of the warnings were already redundant, as l2tp_stats has a counter for these packets. This commit adds one additional counter for invalid packets that are passed up to userspace. Packets with unknown session are not counted as invalid, as there is nothing wrong with the format of these packets. With the additional counter, all of these messages are either redundant or benign, so we reduce them to pr_debug_ratelimited(). [1] https://github.com/wlanslovenija/tunneldigger/issues/160 Fixes: 5ee759cda51b ("l2tp: use standard API for warning log messages") Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-02KVM: x86/xen: Add support for vCPU runstate informationDavid Woodhouse1-0/+13
This is how Xen guests do steal time accounting. The hypervisor records the amount of time spent in each of running/runnable/blocked/offline states. In the Xen accounting, a vCPU is still in state RUNSTATE_running while in Xen for a hypercall or I/O trap, etc. Only if Xen explicitly schedules does the state become RUNSTATE_blocked. In KVM this means that even when the vCPU exits the kvm_run loop, the state remains RUNSTATE_running. The VMM can explicitly set the vCPU to RUNSTATE_blocked by using the KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT attribute, and can also use KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST to retrospectively add a given amount of time to the blocked state and subtract it from the running state. The state_entry_time corresponds to get_kvmclock_ns() at the time the vCPU entered the current state, and the total times of all four states should always add up to state_entry_time. Co-developed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20210301125309.874953-2-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-28uapi: nfnetlink_cthelper.h: fix userspace compilation errorDmitry V. Levin1-1/+1
Apparently, <linux/netfilter/nfnetlink_cthelper.h> and <linux/netfilter/nfnetlink_acct.h> could not be included into the same compilation unit because of a cut-and-paste typo in the former header. Fixes: 12f7a505331e6 ("netfilter: add user-space connection tracking helper infrastructure") Cc: <stable@vger.kernel.org> # v3.6 Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-02-27Merge tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+1
Pull io_uring thread rewrite from Jens Axboe: "This converts the io-wq workers to be forked off the tasks in question instead of being kernel threads that assume various bits of the original task identity. This kills > 400 lines of code from io_uring/io-wq, and it's the worst part of the code. We've had several bugs in this area, and the worry is always that we could be missing some pieces for file types doing unusual things (recent /dev/tty example comes to mind, userfaultfd reads installing file descriptors is another fun one... - both of which need special handling, and I bet it's not the last weird oddity we'll find). With these identical workers, we can have full confidence that we're never missing anything. That, in itself, is a huge win. Outside of that, it's also more efficient since we're not wasting space and code on tracking state, or switching between different states. I'm sure we're going to find little things to patch up after this series, but testing has been pretty thorough, from the usual regression suite to production. Any issue that may crop up should be manageable. There's also a nice series of further reductions we can do on top of this, but I wanted to get the meat of it out sooner rather than later. The general worry here isn't that it's fundamentally broken. Most of the little issues we've found over the last week have been related to just changes in how thread startup/exit is done, since that's the main difference between using kthreads and these kinds of threads. In fact, if all goes according to plan, I want to get this into the 5.10 and 5.11 stable branches as well. That said, the changes outside of io_uring/io-wq are: - arch setup, simple one-liner to each arch copy_thread() implementation. - Removal of net and proc restrictions for io_uring, they are no longer needed or useful" * tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block: (30 commits) io-wq: remove now unused IO_WQ_BIT_ERROR io_uring: fix SQPOLL thread handling over exec io-wq: improve manager/worker handling over exec io_uring: ensure SQPOLL startup is triggered before error shutdown io-wq: make buffered file write hashed work map per-ctx io-wq: fix race around io_worker grabbing io-wq: fix races around manager/worker creation and task exit io_uring: ensure io-wq context is always destroyed for tasks arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread() io_uring: cleanup ->user usage io-wq: remove nr_process accounting io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERS net: remove cmsg restriction from io_uring based send/recvmsg calls Revert "proc: don't allow async path resolution of /proc/self components" Revert "proc: don't allow async path resolution of /proc/thread-self components" io_uring: move SQPOLL thread io-wq forked worker io-wq: make io_wq_fork_thread() available to other users io-wq: only remove worker from free_list, if it was there io_uring: remove io_identity io_uring: remove any grabbing of context ...
2021-02-27Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski1-1/+0
Alexei Starovoitov says: ==================== pull-request: bpf 2021-02-26 1) Fix for bpf atomic insns with src_reg=r0, from Brendan. 2) Fix use after free due to bpf_prog_clone, from Cong. 3) Drop imprecise verifier log message, from Dmitrii. 4) Remove incorrect blank line in bpf helper description, from Hangbin. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: No need to drop the packet when there is no geneve opt bpf: Remove blank line in bpf helper description comment tools/resolve_btfids: Fix build error with older host toolchains selftests/bpf: Fix a compiler warning in global func test bpf: Drop imprecise log message bpf: Clear percpu pointers in bpf_prog_clone_free() bpf: Fix a warning message in mark_ptr_not_null_reg() bpf, x86: Fix BPF_FETCH atomic and/or/xor with r0 as src ==================== Link: https://lore.kernel.org/r/20210226193737.57004-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-2/+2
Merge more updates from Andrew Morton: "118 patches: - The rest of MM. Includes kfence - another runtime memory validator. Not as thorough as KASAN, but it has unmeasurable overhead and is intended to be usable in production builds. - Everything else Subsystems affected by this patch series: alpha, procfs, sysctl, misc, core-kernel, MAINTAINERS, lib, bitops, checkpatch, init, coredump, seq_file, gdb, ubsan, initramfs, and mm (thp, cma, vmstat, memory-hotplug, mlock, rmap, zswap, zsmalloc, cleanups, kfence, kasan2, and pagemap2)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) MIPS: make userspace mapping young by default initramfs: panic with memory information ubsan: remove overflow checks kgdb: fix to kill breakpoints on initmem after boot scripts/gdb: fix list_for_each x86: fix seq_file iteration for pat/memtype.c seq_file: document how per-entry resources are managed. fs/coredump: use kmap_local_page() init/Kconfig: fix a typo in CC_VERSION_TEXT help text init: clean up early_param_on_off() macro init/version.c: remove Version_<LINUX_VERSION_CODE> symbol checkpatch: do not apply "initialise globals to 0" check to BPF progs checkpatch: don't warn about colon termination in linker scripts checkpatch: add kmalloc_array_node to unnecessary OOM message check checkpatch: add warning for avoiding .L prefix symbols in assembly files checkpatch: improve TYPECAST_INT_CONSTANT test message checkpatch: prefer ftrace over function entry/exit printks checkpatch: trivial style fixes checkpatch: ignore warning designated initializers using NR_CPUS checkpatch: improve blank line after declaration test ...
2021-02-26include/linux: remove repeated wordsRandy Dunlap2-2/+2
Drop the doubled word "for" in a comment. {firewire-cdev.h} Drop the doubled word "in" in a comment. {input.h} Drop the doubled word "a" in a comment. {mdev.h} Drop the doubled word "the" in a comment. {ptrace.h} Link: https://lkml.kernel.org/r/20210126232444.22861-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Kirti Wankhede <kwankhede@nvidia.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-25Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-0/+40
Pull virtio updates from Michael Tsirkin: - new vdpa features to allow creation and deletion of new devices - virtio-blk support per-device queue depth - fixes, cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (31 commits) virtio-input: add multi-touch support virtio_mmio: fix one typo vdpa/mlx5: fix param validation in mlx5_vdpa_get_config() virtio_net: Fix fall-through warnings for Clang virtio_input: Prevent EV_MSC/MSC_TIMESTAMP loop storm for MT. virtio-blk: support per-device queue depth virtio_vdpa: don't warn when fail to disable vq virtio-pci: introduce modern device module virito-pci-modern: rename map_capability() to vp_modern_map_capability() virtio-pci-modern: introduce helper to get notification offset virtio-pci-modern: introduce helper for getting queue nums virtio-pci-modern: introduce helper for setting/geting queue size virtio-pci-modern: introduce helper to set/get queue_enable virtio-pci-modern: introduce vp_modern_queue_address() virtio-pci-modern: introduce vp_modern_set_queue_vector() virtio-pci-modern: introduce vp_modern_generation() virtio-pci-modern: introduce helpers for setting and getting features virtio-pci-modern: introduce helpers for setting and getting status virtio-pci-modern: introduce helper to set config vector virtio-pci-modern: introduce vp_modern_remove() ...
2021-02-25Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+3
Merge misc updates from Andrew Morton: "A few small subsystems and some of MM. 172 patches. Subsystems affected by this patch series: hexagon, scripts, ntfs, ocfs2, vfs, and mm (slab-generic, slab, slub, debug, pagecache, swap, memcg, pagemap, mprotect, mremap, page-reporting, vmalloc, kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction, mempolicy, oom-kill, hugetlbfs, and migration)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (172 commits) mm/migrate: remove unneeded semicolons hugetlbfs: remove unneeded return value of hugetlb_vmtruncate() hugetlbfs: fix some comment typos hugetlbfs: correct some obsolete comments about inode i_mutex hugetlbfs: make hugepage size conversion more readable hugetlbfs: remove meaningless variable avoid_reserve hugetlbfs: correct obsolete function name in hugetlbfs_read_iter() hugetlbfs: use helper macro default_hstate in init_hugetlbfs_fs hugetlbfs: remove useless BUG_ON(!inode) in hugetlbfs_setattr() hugetlbfs: remove special hugetlbfs_set_page_dirty() mm/hugetlb: change hugetlb_reserve_pages() to type bool mm, oom: fix a comment in dump_task() mm/mempolicy: use helper range_in_vma() in queue_pages_test_walk() numa balancing: migrate on fault among multiple bound nodes mm, compaction: make fast_isolate_freepages() stay within zone mm/compaction: fix misbehaviors of fast_find_migrateblock() mm/compaction: correct deferral logic for proactive compaction mm/compaction: remove duplicated VM_BUG_ON_PAGE !PageLocked mm/compaction: remove rcu_read_lock during page compaction z3fold: simplify the zhdr initialization code in init_z3fold_page() ...
2021-02-25numa balancing: migrate on fault among multiple bound nodesHuang Ying1-1/+3
Now, NUMA balancing can only optimize the page placement among the NUMA nodes if the default memory policy is used. Because the memory policy specified explicitly should take precedence. But this seems too strict in some situations. For example, on a system with 4 NUMA nodes, if the memory of an application is bound to the node 0 and 1, NUMA balancing can potentially migrate the pages between the node 0 and 1 to reduce cross-node accessing without breaking the explicit memory binding policy. So in this patch, we add MPOL_F_NUMA_BALANCING mode flag to set_mempolicy() when mode is MPOL_BIND. With the flag specified, NUMA balancing will be enabled within the thread to optimize the page placement within the constrains of the specified memory binding policy. With the newly added flag, the NUMA balancing control mechanism becomes, - sysctl knob numa_balancing can enable/disable the NUMA balancing globally. - even if sysctl numa_balancing is enabled, the NUMA balancing will be disabled for the memory areas or applications with the explicit memory policy by default. - MPOL_F_NUMA_BALANCING can be used to enable the NUMA balancing for the applications when specifying the explicit memory policy (MPOL_BIND). Various page placement optimization based on the NUMA balancing can be done with these flags. As the first step, in this patch, if the memory of the application is bound to multiple nodes (MPOL_BIND), and in the hint page fault handler the accessing node are in the policy nodemask, the page will be tried to be migrated to the accessing node to reduce the cross-node accessing. If the newly added MPOL_F_NUMA_BALANCING flag is specified by an application on an old kernel version without its support, set_mempolicy() will return -1 and errno will be set to EINVAL. The application can use this behavior to run on both old and new kernel versions. And if the MPOL_F_NUMA_BALANCING flag is specified for the mode other than MPOL_BIND, set_mempolicy() will return -1 and errno will be set to EINVAL as before. Because we don't support optimization based on the NUMA balancing for these modes. In the previous version of the patch, we tried to reuse MPOL_MF_LAZY for mbind(). But that flag is tied to MPOL_MF_MOVE.*, so it seems not a good API/ABI for the purpose of the patch. And because it's not clear whether it's necessary to enable NUMA balancing for a specific memory area inside an application, so we only add the flag at the thread level (set_mempolicy()) instead of the memory area level (mbind()). We can do that when it become necessary. To test the patch, we run a test case as follows on a 4-node machine with 192 GB memory (48 GB per node). 1. Change pmbench memory accessing benchmark to call set_mempolicy() to bind its memory to node 1 and 3 and enable NUMA balancing. Some related code snippets are as follows, #include <numaif.h> #include <numa.h> struct bitmask *bmp; int ret; bmp = numa_parse_nodestring("1,3"); ret = set_mempolicy(MPOL_BIND | MPOL_F_NUMA_BALANCING, bmp->maskp, bmp->size + 1); /* If MPOL_F_NUMA_BALANCING isn't supported, fall back to MPOL_BIND */ if (ret < 0 && errno == EINVAL) ret = set_mempolicy(MPOL_BIND, bmp->maskp, bmp->size + 1); if (ret < 0) { perror("Failed to call set_mempolicy"); exit(-1); } 2. Run a memory eater on node 3 to use 40 GB memory before running pmbench. 3. Run pmbench with 64 processes, the working-set size of each process is 640 MB, so the total working-set size is 64 * 640 MB = 40 GB. The CPU and the memory (as in step 1.) of all pmbench processes is bound to node 1 and 3. So, after CPU usage is balanced, some pmbench processes run on the CPUs of the node 3 will access the memory of the node 1. 4. After the pmbench processes run for 100 seconds, kill the memory eater. Now it's possible for some pmbench processes to migrate their pages from node 1 to node 3 to reduce cross-node accessing. Test results show that, with the patch, the pages can be migrated from node 1 to node 3 after killing the memory eater, and the pmbench score can increase about 17.5%. Link: https://lkml.kernel.org/r/20210120061235.148637-2-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24Merge tag 'vfio-v5.12-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds1-0/+27
Pull VFIO updatesfrom Alex Williamson: - Virtual address update handling (Steve Sistare) - s390/zpci fixes and cleanups (Max Gurtovoy) - Fixes for dirty bitmap handling, non-mdev page pinning, and improved pinned dirty scope tracking (Keqian Zhu) - Batched page pinning enhancement (Daniel Jordan) - Page access permission fix (Alex Williamson) * tag 'vfio-v5.12-rc1' of git://github.com/awilliam/linux-vfio: (21 commits) vfio/type1: Batch page pinning vfio/type1: Prepare for batched pinning with struct vfio_batch vfio/type1: Change success value of vaddr_get_pfn() vfio/type1: Use follow_pte() vfio/pci: remove CONFIG_VFIO_PCI_ZDEV from Kconfig vfio/iommu_type1: Fix duplicate included kthread.h vfio-pci/zdev: fix possible segmentation fault issue vfio-pci/zdev: remove unused vdev argument vfio/pci: Fix handling of pci use accessor return codes vfio/iommu_type1: Mantain a counter for non_pinned_groups vfio/iommu_type1: Fix some sanity checks in detach group vfio/iommu_type1: Populate full dirty when detach non-pinned group vfio/type1: block on invalid vaddr vfio/type1: implement notify callback vfio: iommu driver notify callback vfio/type1: implement interfaces to update vaddr vfio/type1: massage unmap iteration vfio: interfaces to update vaddr vfio/type1: implement unmap all vfio/type1: unmap cleanup ...
2021-02-24Merge tag 'char-misc-5.12-rc1' of ↵Linus Torvalds4-3/+706
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the large set of char/misc/whatever driver subsystem updates for 5.12-rc1. Over time it seems like this tree is collecting more and more tiny driver subsystems in one place, making it easier for those maintainers, which is why this is getting larger. Included in here are: - coresight driver updates - habannalabs driver updates - virtual acrn driver addition (proper acks from the x86 maintainers) - broadcom misc driver addition - speakup driver updates - soundwire driver updates - fpga driver updates - amba driver updates - mei driver updates - vfio driver updates - greybus driver updates - nvmeem driver updates - phy driver updates - mhi driver updates - interconnect driver udpates - fsl-mc bus driver updates - random driver fix - some small misc driver updates (rtsx, pvpanic, etc.) All of these have been in linux-next for a while, with the only reported issue being a merge conflict due to the dfl_device_id addition from the fpga subsystem in here" * tag 'char-misc-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (311 commits) spmi: spmi-pmic-arb: Fix hw_irq overflow Documentation: coresight: Add PID tracing description coresight: etm-perf: Support PID tracing for kernel at EL2 coresight: etm-perf: Clarify comment on perf options ACRN: update MAINTAINERS: mailing list is subscribers-only regmap: sdw-mbq: use MODULE_LICENSE("GPL") regmap: sdw: use no_pm routines for SoundWire 1.2 MBQ regmap: sdw: use _no_pm functions in regmap_read/write soundwire: intel: fix possible crash when no device is detected MAINTAINERS: replace my with email with replacements mhi: Fix double dma free uapi: map_to_7segment: Update example in documentation uio: uio_pci_generic: don't fail probe if pdev->irq equals to IRQ_NOTCONNECTED drivers/misc/vmw_vmci: restrict too big queue size in qp_host_alloc_queue firewire: replace tricky statement by two simple ones vme: make remove callback return void firmware: google: make coreboot driver's remove callback return void firmware: xilinx: Use explicit values for all enum values sample/acrn: Introduce a sample of HSM ioctl interface usage virt: acrn: Introduce an interface for Service VM to control vCPU ...
2021-02-24Merge tag 'cxl-for-5.12' of ↵Linus Torvalds1-0/+172
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull initial support for CXL (Compute Express Link) from Dan Williams: "Introduce an initial driver for CXL 2.0 Type-3 Memory Devices. CXL is Compute Express Link which released the 2.0 specification in November. The Linux relevant changes in CXL 2.0 are support for an OS to dynamically assign address space to memory devices, support for switches, persistent memory, and hotplug. A Type-3 Memory Device is a PCI enumerated device presenting the CXL Memory Device Class Code and implementing the CXL.mem protocol. CXL.mem allows device to advertise CPU and I/O coherent memory to the system, i.e. typical "System RAM" and "Persistent Memory" in Linux /proc/iomem terms. In addition to the CXL.mem fast path there is an administrative command hardware mailbox interface for maintenance and provisioning. It is this command interface that is the focus of the initial driver. With this driver a CXL device that is mapped by the BIOS can be administered by Linux. Linux support for CXL PMEM and dynamic CXL address space management are to be implemented post v5.12" Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> 4cdadfd5e0a7 ("cxl/mem: Introduce a driver for CXL-2.0-Type-3 endpoints") 13237183c735 ("cxl/mem: Add a "RAW" send command") 472b1ce6e9d6 ("cxl/mem: Enable commands via CEL") 57ee605b976c ("cxl/mem: Add set of informational commands") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> 8adaf747c9f0 ("cxl/mem: Find device capabilities") b39cb1052a5c ("cxl/mem: Register CXL memX devices") * tag 'cxl-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: cxl/mem: Fix potential memory leak cxl/mem: Return -EFAULT if copy_to_user() fails MAINTAINERS: Add maintainers of the CXL driver cxl/mem: Add set of informational commands cxl/mem: Enable commands via CEL cxl/mem: Add a "RAW" send command cxl/mem: Add basic IOCTL interface cxl/mem: Register CXL memX devices cxl/mem: Find device capabilities cxl/mem: Introduce a driver for CXL-2.0-Type-3 endpoints
2021-02-24bpf: Remove blank line in bpf helper description commentHangbin Liu1-1/+0
Commit 34b2021cc616 ("bpf: Add BPF-helper for MTU checking") added an extra blank line in bpf helper description. This will make bpf_helpers_doc.py stop building bpf_helper_defs.h immediately after bpf_check_mtu(), which will affect future added functions. Fixes: 34b2021cc616 ("bpf: Add BPF-helper for MTU checking") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20210223131457.1378978-1-liuhangbin@gmail.com
2021-02-24io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERSJens Axboe1-0/+1
A few reasons to do this: - The naming of the manager and worker have changed. That's a user visible change, so makes sense to flag it. - Opening certain files that use ->signal (like /proc/self or /dev/tty) now works, and the flag tells the application upfront that this is the case. - Related to the above, using signalfd will now work as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-24Merge tag 'gfs2-for-5.12' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Log space and revoke accounting rework to fix some failed asserts. - Local resource group glock sharing for better local performance. - Add support for version 1802 filesystems: trusted xattr support and '-o rgrplvb' mounts by default. - Actually synchronize on the inode glock's FREEING bit during withdraw ("gfs2: fix glock confusion in function signal_our_withdraw"). - Fix parallel recovery of multiple journals ("gfs2: keep bios separate for each journal"). - Various other bug fixes. * tag 'gfs2-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (49 commits) gfs2: Don't get stuck with I/O plugged in gfs2_ail1_flush gfs2: Per-revoke accounting in transactions gfs2: Rework the log space allocation logic gfs2: Minor calc_reserved cleanup gfs2: Use resource group glock sharing gfs2: Allow node-wide exclusive glock sharing gfs2: Add local resource group locking gfs2: Add per-reservation reserved block accounting gfs2: Rename rs_{free -> requested} and rd_{reserved -> requested} gfs2: Check for active reservation in gfs2_release gfs2: Don't search for unreserved space twice gfs2: Only pass reservation down to gfs2_rbm_find gfs2: Also reflect single-block allocations in rgd->rd_extfail_pt gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end gfs2: Add trusted xattr support gfs2: Enable rgrplvb for sb_fs_format 1802 gfs2: Don't skip dlm unlock if glock has an lvb gfs2: Lock imbalance on error path in gfs2_recover_one gfs2: Move function gfs2_ail_empty_tr gfs2: Get rid of current_tail() ...
2021-02-24Merge tag 'idmapped-mounts-v5.12' of ↵Linus Torvalds1-0/+16
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8 In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...
2021-02-23Merge branches 'rgrp-glock-sharing' and 'gfs2-revoke' from ↵Andreas Gruenbacher6-24/+90
https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git Merge the resource group glock sharing feature and the revoke accounting rework. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-23vdpa: Enable user to query vdpa device infoParav Pandit1-0/+5
Enable user to query vdpa device information. $ vdpa dev add mgmtdev vdpasim_net name foo2 Show the newly created vdpa device by its name: $ vdpa dev show foo2 foo2: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 2 max_vq_size 256 $ vdpa dev show foo2 -jp { "dev": { "foo2": { "type": "network", "mgmtdev": "vdpasim_net", "vendor_id": 0, "max_vqs": 2, "max_vq_size": 256 } } } Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210105103203.82508-6-parav@nvidia.com Including a memory leak fix: Link: https://lore.kernel.org/r/20210217060614.59561-1-parav@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-23vdpa: Enable a user to add and delete a vdpa deviceParav Pandit1-0/+4
Add the ability to add and delete a vdpa device. Examples: Create a vdpa device of type network named "foo2" from the management device vdpasim: $ vdpa dev add mgmtdev vdpasim_net name foo2 Delete the vdpa device after its use: $ vdpa dev del foo2 Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210105103203.82508-5-parav@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-23vdpa: Define vdpa mgmt device, ops and a netlink interfaceParav Pandit1-0/+31
To add one or more VDPA devices, define a management device which allows adding or removing vdpa device. A management device defines set of callbacks to manage vdpa devices. To begin with, it defines add and remove callbacks through which a user defined vdpa device can be added or removed. A unique management device is identified by its unique handle identified by management device name and optionally the bus name. Hence, introduce routine through which driver can register a management device and its callback operations for adding and remove a vdpa device. Introduce vdpa netlink socket family so that user can query management device and its attributes. Example of show vdpa management device which allows creating vdpa device of networking class (device id = 0x1) of virtio specification 1.1 section 5.1.1. $ vdpa mgmtdev show vdpasim_net: supported_classes: net Example of showing vdpa management device in JSON format. $ vdpa mgmtdev show -jp { "show": { "vdpasim_net": { "supported_classes": [ "net" ] } } } Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210105103203.82508-4-parav@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Including a bugfix: vpda: correctly size vdpa_nl_policy We need to ensure last entry of vdpa_nl_policy[] is zero, otherwise out-of-bounds access is hurting us. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Parav Pandit <parav@nvidia.com> Cc: Eli Cohen <elic@nvidia.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20210210134911.4119555-1-eric.dumazet@gmail.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-22Merge tag 'for-5.12/dm-changes' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Fix DM integrity's HMAC support to provide enhanced security of internal_hash and journal_mac capabilities. - Various DM writecache fixes to address performance, fix table output to match what was provided at table creation, fix writing beyond end of device when shrinking underlying data device, and a couple other small cleanups. - Add DM crypt support for using trusted keys. - Fix deadlock when swapping to DM crypt device by throttling number of in-flight REQ_SWAP bios. Implemented in DM core so that other bio-based targets can opt-in by setting ti->limit_swap_bios. - Fix various inverted logic bugs in the .iterate_devices callout functions that are used to assess if specific feature or capability is supported across all devices being combined/stacked by DM. - Fix DM era target bugs that exposed users to lost writes or memory leaks. - Add DM core support for passing through inline crypto support of underlying devices. Includes block/keyslot-manager changes that enable extending this support to DM. - Various small fixes and cleanups (spelling fixes, front padding calculation cleanup, cleanup conditional zoned support in targets, etc). * tag 'for-5.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (31 commits) dm: fix deadlock when swapping to encrypted device dm: simplify target code conditional on CONFIG_BLK_DEV_ZONED dm: set DM_TARGET_PASSES_CRYPTO feature for some targets dm: support key eviction from keyslot managers of underlying devices dm: add support for passing through inline crypto support block/keyslot-manager: Introduce functions for device mapper support block/keyslot-manager: Introduce passthrough keyslot manager dm era: only resize metadata in preresume dm era: Use correct value size in equality function of writeset tree dm era: Fix bitset memory leaks dm era: Verify the data block size hasn't changed dm era: Reinitialize bitset cache before digesting a new writeset dm era: Update in-core bitset after committing the metadata dm era: Recover committed writeset after crash dm writecache: use bdev_nr_sectors() instead of open-coded equivalent dm writecache: fix writing beyond end of underlying device when shrinking dm table: remove needless request_queue NULL pointer checks dm table: fix zoned iterate_devices based device capability checks dm table: fix DAX iterate_devices based device capability checks dm table: fix iterate_devices based device capability checks ...