summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2021-04-26macvlan: Add nodst option to macvlan type sourceJethro Beekman1-0/+1
The default behavior for source MACVLAN is to duplicate packets to appropriate type source devices, and then do the normal destination MACVLAN flow. This patch adds an option to skip destination MACVLAN processing if any matching source MACVLAN device has the option set. This allows setting up a "catch all" device for source MACVLAN: create one or more devices with type source nodst, and one device with e.g. type vepa, and incoming traffic will be received on exactly one device. v2: netdev wants non-standard line length Signed-off-by: Jethro Beekman <kernel@jbeekman.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26Merge tag 'mlx5-updates-2021-04-21' of ↵David S. Miller3-11/+13
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-04-21 devlink external port attribute for SF (Sub-Function) port flavour This adds the support to instantiate Sub-Functions on external hosts E.g when Eswitch manager is enabled on the ARM SmarNic SoC CPU, users are now able to spawn new Sub-Functions on the Host server CPU. Parav Pandit Says: ================== This series introduces and uses external attribute for the SF port to indicate that a SF port belongs to an external controller. This is needed to generate unique phys_port_name when PF and SF numbers are overlapping between local and external controllers. For example two controllers 0 and 1, both of these controller have a SF. having PF number 0, SF number 77. Here, phys_port_name has duplicate entry which doesn't have controller number in it. Hence, add controller number optionally when a SF port is for an external controller. This extension is similar to existing PF and VF eswitch ports of the external controller. When a SF is for external controller an example view of external SF port and config sequence: On eswitch system: $ devlink dev eswitch set pci/0033:01:00.0 mode switchdev $ devlink port show pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false function: hw_addr 00:00:00:00:00:00 $ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1 pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached phys_port_name construction: $ cat /sys/class/net/eth1/phys_port_name c1pf0sf77 Patch summary: First 3 patches prepares the eswitch to handle vports in more generic way using xarray to lookup vport from its unique vport number. Patch-1 returns maximum eswitch ports only when eswitch is enabled Patch-2 prepares eswitch to return eswitch max ports from a struct Patch-3 uses xarray for vport and representor lookup Patch-4 considers SF for an additioanl range of SF vports Patch-5 relies on SF hw table to check SF support Patch-6 extends SF devlink port attribute for external flag Patch-7 stores the per controller SF allocation attributes Patch-8 uses SF function id for filtering events Patch-9 uses helper for allocation and free Patch-10 splits hw table into per controller table and generic one Patch-11 extends sf table for additional range ================== ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26net: ethernet: ixp4xx: Support device tree probingLinus Walleij1-0/+2
This adds device tree probing to the IXP4xx ethernet driver. Add a platform data bool to tell us whether to register an MDIO bus for the device or not, as well as the corresponding NPE. We need to drop the memory region request as part of this since the OF core will request the memory for the device. Cc: Zoltan HERPAI <wigyori@uid0.hu> Cc: Raylynn Knight <rayknight@me.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26netfilter: remove all xt_table anchors from struct netFlorian Westphal2-19/+0
No longer needed, table pointer arg is now passed via netfilter core. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26netfilter: ip6_tables: pass table pointer via nf_hook_opsFlorian Westphal1-3/+2
Same patch as the ip_tables one: removal of all accesses to ip6_tables xt_table pointers. After this patch the struct net xt_table anchors can be removed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26netfilter: arp_tables: pass table pointer via nf_hook_opsFlorian Westphal1-1/+1
Same change as previous patch. Only difference: no need to handle NULL template_ops parameter, the only caller (arptable_filter) always passes non-NULL argument. This removes all remaining accesses to net->ipv4.arptable_filter. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26netfilter: ip_tables: pass table pointer via nf_hook_opsFlorian Westphal2-4/+5
iptable_x modules rely on 'struct net' to contain a pointer to the table that should be evaluated. In order to remove these pointers from struct net, pass them via the 'priv' pointer in a similar fashion as nf_tables passes the rule data. To do that, duplicate the nf_hook_info array passed in from the iptable_x modules, update the ops->priv pointers of the copy to refer to the table and then change the hookfn implementations to just pass the 'priv' argument to the traverser. After this patch, the xt_table pointers can already be removed from struct net. However, changes to struct net result in re-compile of the entire network stack, so do the removal after arptables and ip6tables have been converted as well. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26netfilter: arptables: unregister the tables by nameFlorian Westphal1-2/+2
and again, this time for arptables. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26netfilter: ip6tables: unregister the tables by nameFlorian Westphal1-2/+2
Same as the previous patch, but for ip6tables. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26netfilter: iptables: unregister the tables by nameFlorian Westphal1-3/+3
xtables stores the xt_table structs in the struct net. This isn't needed anymore, the structures could be passed via the netfilter hook 'private' pointer to the hook functions, which would allow us to remove those pointers from struct net. As a first step, reduce the number of accesses to the net->ipv4.ip6table_{raw,filter,...} pointers. This allows the tables to get unregistered by name instead of having to pass the raw address. The xt_table structure cane looked up by name+address family instead. This patch is useless as-is (the backends still have the raw pointer address), but it lowers the bar to remove those. It also allows to put the 'was table registered in the first place' check into ip_tables.c rather than have it in each table sub module. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26netfilter: x_tables: add xt_find_tableFlorian Westphal1-0/+1
This will be used to obtain the xt_table struct given address family and table name. Followup patches will reduce the number of direct accesses to the xt_table structures via net->ipv{4,6}.ip(6)table_{nat,mangle,...} pointers, then remove them. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26netfilter: x_tables: remove ipt_unregister_tableFlorian Westphal2-5/+0
Its the same function as ipt_unregister_table_exit. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26netfilter: ebtables: remove the 3 ebtables pointers from struct netFlorian Westphal2-13/+4
ebtables stores the table internal data (what gets passed to the ebt_do_table() interpreter) in struct net. nftables keeps the internal interpreter format in pernet lists and passes it via the netfilter core infrastructure (priv pointer). Do the same for ebtables: the nf_hook_ops are duplicated via kmemdup, then the ops->priv pointer is set to the table that is being registered. After that, the netfilter core passes this table info to the hookfn. This allows to remove the pointers from struct net. Same pattern can be applied to ip/ip6/arptables. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26netfilter: disable defrag once its no longer neededFlorian Westphal2-2/+4
When I changed defrag hooks to no longer get registered by default I intentionally made it so that registration can only be un-done by unloading the nf_defrag_ipv4/6 module. In hindsight this was too conservative; there is no reason to keep defrag on while there is no feature dependency anymore. Moreover, this won't work if user isn't allowed to remove nf_defrag module. This adds the disable() functions for both ipv4 and ipv6 and calls them from conntrack, TPROXY and the xtables socket module. ipvs isn't converted here, it will behave as before this patch and will need module removal. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26netfilter: nft_socket: add support for cgroupsv2Pablo Neira Ayuso1-0/+4
Allow to match on the cgroupsv2 id from ancestor level. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26netfilter: nat: move nf_xfrm_me_harder to where it is usedFlorian Westphal1-2/+0
remove the export and make it static. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller8-7/+107
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-04-23 The following pull-request contains BPF updates for your *net-next* tree. We've added 69 non-merge commits during the last 22 day(s) which contain a total of 69 files changed, 3141 insertions(+), 866 deletions(-). The main changes are: 1) Add BPF static linker support for extern resolution of global, from Andrii. 2) Refine retval for bpf_get_task_stack helper, from Dave. 3) Add a bpf_snprintf helper, from Florent. 4) A bunch of miscellaneous improvements from many developers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-25dmaengine: idxd: Add IDXD performance monitor supportTom Zanussi1-0/+1
Implement the IDXD performance monitor capability (named 'perfmon' in the DSA (Data Streaming Accelerator) spec [1]), which supports the collection of information about key events occurring during DSA and IAX (Intel Analytics Accelerator) device execution, to assist in performance tuning and debugging. The idxd perfmon support is implemented as part of the IDXD driver and interfaces with the Linux perf framework. It has several features in common with the existing uncore pmu support: - it does not support sampling - does not support per-thread counting However it also has some unique features not present in the core and uncore support: - all general-purpose counters are identical, thus no event constraints - operation is always system-wide While the core perf subsystem assumes that all counters are by default per-cpu, the uncore pmus are socket-scoped and use a cpu mask to restrict counting to one cpu from each socket. IDXD counters use a similar strategy but expand the scope even further; since IDXD counters are system-wide and can be read from any cpu, the IDXD perf driver picks a single cpu to do the work (with cpu hotplug notifiers to choose a different cpu if the chosen one is taken off-line). More specifically, the perf userspace tool by default opens a counter for each cpu for an event. However, if it finds a cpumask file associated with the pmu under sysfs, as is the case with the uncore pmus, it will open counters only on the cpus specified by the cpumask. Since perfmon only needs to open a single counter per event for a given IDXD device, the perfmon driver will create a sysfs cpumask file for the device and insert the first cpu of the system into it. When a user uses perf to open an event, perf will open a single counter on the cpu specified by the cpu mask. This amounts to the default system-wide rather than per-cpu counting mentioned previously for perfmon pmu events. In order to keep the cpu mask up-to-date, the driver implements cpu hotplug support for multiple devices, as IDXD usually enumerates and registers more than one idxd device. The perfmon driver implements basic perfmon hardware capability discovery and configuration, and is initialized by the IDXD driver's probe function. During initialization, the driver retrieves the total number of supported performance counters, the pmu ID, and the device type from idxd device, and registers itself under the Linux perf framework. The perf userspace tool can be used to monitor single or multiple events depending on the given configuration, as well as event groups, which are also supported by the perfmon driver. The user configures events using the perf tool command-line interface by specifying the event and corresponding event category, along with an optional set of filters that can be used to restrict counting to specific work queues, traffic classes, page and transfer sizes, and engines (See [1] for specifics). With the configuration specified by the user, the perf tool issues a system call passing that information to the kernel, which uses it to initialize the specified event(s). The event(s) are opened and started, and following termination of the perf command, they're stopped. At that point, the perfmon driver will read the latest count for the event(s), calculate the difference between the latest counter values and previously tracked counter values, and display the final incremental count as the event count for the cycle. An overflow handler registered on the IDXD irq path is used to account for counter overflows, which are signaled by an overflow interrupt. Below are a couple of examples of perf usage for monitoring DSA events. The following monitors all events in the 'engine' category. Becuuse no filters are specified, this captures all engine events for the workload, which in this case is 19 iterations of the work generated by the kernel dmatest module. Details describing the events can be found in Appendix D of [1], Performance Monitoring Events, but briefly they are: event 0x1: total input data processed, in 32-byte units event 0x2: total data written, in 32-byte units event 0x4: number of work descriptors that read the source event 0x8: number of work descriptors that write the destination event 0x10: number of work descriptors dispatched from batch descriptors event 0x20: number of work descriptors dispatched from work queues # perf stat -e dsa0/event=0x1,event_category=0x1/, dsa0/event=0x2,event_category=0x1/, dsa0/event=0x4,event_category=0x1/, dsa0/event=0x8,event_category=0x1/, dsa0/event=0x10,event_category=0x1/, dsa0/event=0x20,event_category=0x1/ modprobe dmatest channel=dma0chan0 timeout=2000 iterations=19 run=1 wait=1 Performance counter stats for 'system wide': 5,332 dsa0/event=0x1,event_category=0x1/ 5,327 dsa0/event=0x2,event_category=0x1/ 19 dsa0/event=0x4,event_category=0x1/ 19 dsa0/event=0x8,event_category=0x1/ 0 dsa0/event=0x10,event_category=0x1/ 19 dsa0/event=0x20,event_category=0x1/ 21.977436186 seconds time elapsed The command below illustrates filter usage with a simple example. It specifies that MEM_MOVE operations should be counted for the DSA device dsa0 (event 0x8 corresponds to the EV_MEM_MOVE event - Number of Memory Move Descriptors, which is part of event category 0x3 - Operations. The detailed category and event IDs are available in Appendix D, Performance Monitoring Events, of [1]). In addition to the event and event category, a number of filters are also specified (the detailed filter values are available in Chapter 6.4 (Filter Support) of [1]), which will restrict counting to only those events that meet all of the filter criteria. In this case, the filters specify that only MEM_MOVE operations that are serviced by work queue wq0 and specifically engine number engine0 and traffic class tc0 having sizes between 0 and 4k and page size of between 0 and 1G result in a counter hit; anything else will be filtered out and not appear in the final count. Note that filters are optional - any filter not specified is assumed to be all ones and will pass anything. # perf stat -e dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7, filter_eng=0x1,event=0x8,event_category=0x3/ modprobe dmatest channel=dma0chan0 timeout=2000 iterations=19 run=1 wait=1 Performance counter stats for 'system wide': 19 dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7, filter_eng=0x1,event=0x8,event_category=0x3/ 21.865914091 seconds time elapsed The output above reflects that the unspecified workload resulted in the counting of 19 MEM_MOVE operation events that met the filter criteria. [1]: https://software.intel.com/content/www/us/en/develop/download/intel-data-streaming-accelerator-preliminary-architecture-specification.html [ Based on work originally by Jing Lin. ] Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Link: https://lore.kernel.org/r/0c5080a7d541904c4ad42b848c76a1ce056ddac7.1619276133.git.zanussi@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-04-25io_uring: add full-fledged dynamic buffers supportPavel Begunkov1-0/+1
Hook buffers into all rsrc infrastructure, including tagging and updates. Suggested-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/119ed51d68a491dae87eb55fb467a47870c86aad.1619356238.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-25io_uring: add generic rsrc update with tagsPavel Begunkov1-6/+16
Add IORING_REGISTER_RSRC_UPDATE, which also supports passing in rsrc tags. Implement it for registered files. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d4dc66df204212f64835ffca2c4eb5e8363f2f05.1619356238.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-25io_uring: add IORING_REGISTER_RSRCPavel Begunkov1-0/+8
Add a new io_uring_register() opcode for rsrc registeration. Instead of accepting a pointer to resources, fds or iovecs, it @arg is now pointing to a struct io_uring_rsrc_register, and the second argument tells how large that struct is to make it easily extendible by adding new fields. All that is done mainly to be able to pass in a pointer with tags. Pass it in and enable CQE posting for file resources. Doesn't support setting tags on update yet. A design choice made here is to not post CQEs on rsrc de-registration, but only when we updated-removed it by rsrc dynamic update. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c498aaec32a4bb277b2406b9069662c02cdda98c.1619356238.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-25io_uring: enumerate dynamic resourcesPavel Begunkov1-0/+4
As resources are getting more support and common parts, it'll be more convenient to index resources and use it for indexing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f0be63e9310212d5601d36277c2946ff7a040485.1619356238.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-24kbuild: redo fake deps at include/config/*.hAlexey Dobriyan1-1/+1
Make include/config/foo/bar.h fake deps files generation simpler. * delete .h suffix those aren't header files, shorten filenames, * delete tolower() Linux filesystems can deal with both upper and lowercase filenames very well, * put everything in 1 directory Presumably 'mkdir -p' split is from dark times when filesystems handled huge directories badly, disks were round adding to seek times. x86_64 allmodconfig lists 12364 files in include/config. ../obj/include/config/ ├── 104_QUAD_8 ├── 60XX_WDT ├── 64BIT ... ├── ZSWAP_DEFAULT_ON ├── ZSWAP_ZPOOL_DEFAULT └── ZSWAP_ZPOOL_DEFAULT_ZBUD 0 directories, 12364 files Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-04-24kbuild: add an elfnote for whether vmlinux is built with ltoYonghong Song1-0/+14
Currently, clang LTO built vmlinux won't work with pahole. LTO introduced cross-cu dwarf tag references and broke current pahole model which handles one cu as a time. The solution is to merge all cu's as one pahole cu as in [1]. We would like to do this merging only if cross-cu dwarf references happens. The LTO build mode is a pretty good indication for that. In earlier version of this patch ([2]), clang flag -grecord-gcc-switches is proposed to add to compilation flags so pahole could detect "-flto" and then merging cu's. This will increate the binary size of 1% without LTO though. Arnaldo suggested to use a note to indicate the vmlinux is built with LTO. Such a cheap way to get whether the vmlinux is built with LTO or not helps pahole but is also useful for tracing as LTO may inline/delete/demote global functions, promote static functions, etc. So this patch added an elfnote with a new type LINUX_ELFNOTE_LTO_INFO. The owner of the note is "Linux". With gcc 8.4.1 and clang trunk, without LTO, I got $ readelf -n vmlinux Displaying notes found in: .notes Owner Data size Description ... Linux 0x00000004 func description data: 00 00 00 00 ... With "readelf -x ".notes" vmlinux", I can verify the above "func" with type code 0x101. With clang thin-LTO, I got the same as above except the following: description data: 01 00 00 00 which indicates the vmlinux is built with LTO. [1] https://lore.kernel.org/bpf/20210325065316.3121287-1-yhs@fb.com/ [2] https://lore.kernel.org/bpf/20210331001623.2778934-1-yhs@fb.com/ Suggested-by: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> Signed-off-by: Yonghong Song <yhs@fb.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v12.0.0-rc4 (x86-64) Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-04-24Merge tag 'irqchip-5.13' of ↵Thomas Gleixner2-13/+2
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip and irqdomain updates from Marc Zyngier: New HW support: - New driver for the Nuvoton WPCM450 interrupt controller - New driver for the IDT 79rc3243x interrupt controller - Add support for interrupt trigger configuration to the MStar irqchip - Add more external interrupt support to the STM32 irqchip - Add new compatible strings for QCOM SC7280 to the qcom-pdc binding Fixes and cleanups: - Drop irq_create_strict_mappings() and irq_create_identity_mapping() from the irqdomain API, with cleanups in a couple of drivers - Fix nested NMI issue with spurious interrupts on GICv3 - Don't allow GICv4.1 vSGIs when the CPU doesn't support them - Various cleanups and minor fixes Link: https://lore.kernel.org/r/20210424094640.1731920-1-maz@kernel.org
2021-04-24devlink: Extend SF port attributes to have external attributeParav Pandit1-1/+4
Extended SF port attributes to have optional external flag similar to PCI PF and VF port attributes. External atttibute is required to generate unique phys_port_name when PF number and SF number are overlapping between two controllers similar to SR-IOV VFs. When a SF is for external controller an example view of external SF port and config sequence. On eswitch system: $ devlink dev eswitch set pci/0033:01:00.0 mode switchdev $ devlink port show pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false function: hw_addr 00:00:00:00:00:00 $ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1 pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached phys_port_name construction: $ cat /sys/class/net/eth1/phys_port_name c1pf0sf77 Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24net/mlx5: E-Switch, Prepare to return total vports from eswitch structParav Pandit1-8/+0
Total vports are already stored during eswitch initialization. Instead of calculating everytime, read directly from eswitch. Additionally, host PF's SF vport information is available using QUERY_HCA_CAP command. It is not available through HCA_CAP of the eswitch manager PF. Hence, this patch prepares the return total eswitch vport count from the existing eswitch struct. This further helps to keep eswitch port counting macros and logic within eswitch. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24net/mlx5: E-Switch, Return eswitch max ports when eswitch is supportedParav Pandit1-2/+9
mlx5_eswitch_get_total_vports() doesn't honor MLX5_ESWICH Kconfig flag. When MLX5_ESWITCH is disabled, FS layer continues to initialize eswitch specific ACL namespaces. Instead, start honoring MLX5_ESWITCH flag and perform vport specific initialization only when vport count is non zero. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-23Merge branch 'master' of ↵David S. Miller1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2021-04-23 1) The SPI flow key in struct flowi has no consumers, so remove it. From Florian Westphal. 2) Remove stray synchronize_rcu from xfrm_init. From Florian Westphal. 3) Use the new exit_pre hook to reset the netlink socket on net namespace destruction. From Florian Westphal. 4) Remove an unnecessary get_cpu() in ipcomp, that code is always called with BHs off. From Sabrina Dubroca. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23Merge remote-tracking branch 'regulator/for-5.13' into regulator-nextMark Brown1-1/+8
2021-04-23Merge remote-tracking branch 'asoc/for-5.13' into asoc-nextMark Brown11-106/+482
2021-04-23Merge tag 'gpio-fixes-for-v5.12' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fix from Bartosz Golaszewski: "Save and restore the sysconfig register in gpio-omap to fix a power-management issue" * tag 'gpio-fixes-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: omap: Save and restore sysconfig
2021-04-23regulator: core: Fix off_on_delay handlingVincent Whitchurch1-1/+1
The jiffies-based off_on_delay implementation has a couple of problems that cause it to sometimes not actually delay for the required time: (1) If, for example, the off_on_delay time is equivalent to one jiffy, and the ->last_off_jiffy is set just before a new jiffy starts, then _regulator_do_enable() does not wait at all since it checks using time_before(). (2) When jiffies overflows, the value of "remaining" becomes higher than "max_delay" and the code simply proceeds without waiting. Fix these problems by changing it to use ktime_t instead. [Note that since jiffies doesn't start at zero but at INITIAL_JIFFIES ("-5 minutes"), (2) above also led to the code not delaying if the first regulator_enable() is called when the ->last_off_jiffy is not initialised, such as for regulators with ->constraints->boot_on set. It's not clear to me if this was intended or not, but I've preserved this behaviour explicitly with the check for a non-zero ->last_off.] Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Link: https://lore.kernel.org/r/20210423114524.26414-1-vincent.whitchurch@axis.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-04-23Merge tag 'kvmarm-5.13' of ↵Paolo Bonzini12-11/+123
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.13 New features: - Stage-2 isolation for the host kernel when running in protected mode - Guest SVE support when running in nVHE mode - Force W^X hypervisor mappings in nVHE mode - ITS save/restore for guests using direct injection with GICv4.1 - nVHE panics now produce readable backtraces - Guest support for PTP using the ptp_kvm driver - Performance improvements in the S2 fault handler - Alexandru is now a reviewer (not really a new feature...) Fixes: - Proper emulation of the GICR_TYPER register - Handle the complete set of relocation in the nVHE EL2 object - Get rid of the oprofile dependency in the PMU code (and of the oprofile body parts at the same time) - Debug and SPE fixes - Fix vcpu reset
2021-04-23xen/arm: introduce XENFEAT_direct_mapped and XENFEAT_not_direct_mappedStefano Stabellini3-0/+22
Newer Xen versions expose two Xen feature flags to tell us if the domain is directly mapped or not. Only when a domain is directly mapped it makes sense to enable swiotlb-xen on ARM. Introduce a function on ARM to check the new Xen feature flags and also to deal with the legacy case. Call the function xen_swiotlb_detect. Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20210319200140.12512-1-sstabellini@kernel.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-04-23afs: Use ITER_XARRAY for writingDavid Howells1-32/+19
Use a single ITER_XARRAY iterator to describe the portion of a file to be transmitted to the server rather than generating a series of small ITER_BVEC iterators on the fly. This will make it easier to implement AIO in afs. In theory we could maybe use one giant ITER_BVEC, but that means potentially allocating a huge array of bio_vec structs (max 256 per page) when in fact the pagecache already has a structure listing all the relevant pages (radix_tree/xarray) that can be walked over. Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/153685395197.14766.16289516750731233933.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/158861251312.340223.17924900795425422532.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/159465828607.1377938.6903132788463419368.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588535018.3465195.14509994354240338307.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118152415.1232039.6452879415814850025.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161048194.2537118.13763612220937637316.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340411602.1303470.4661108879482218408.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539555629.286939.5241869986617154517.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653811456.2770958.7017388543246759245.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789095005.6155.6789055030327407928.stgit@warthog.procyon.org.uk/ # v6
2021-04-23afs: Don't truncate iter during data fetchDavid Howells1-1/+1
Don't truncate the iterator to correspond to the actual data size when fetching the data from the server - rather, pass the length we want to read to rxrpc. This will allow the clear-after-read code in future to simply clear the remaining iterator capacity rather than having to reinitialise the iterator. Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/158861249201.340223.13035445866976590375.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/159465825061.1377938.14403904452300909320.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588531418.3465195.10712005940763063144.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118148567.1232039.13380313332292947956.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161044610.2537118.17908520793806837792.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340407907.1303470.6501394859511712746.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539551721.286939.14655713136572200716.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653807790.2770958.14034599989374173734.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789090823.6155.15673999934535049102.stgit@warthog.procyon.org.uk/ # v6
2021-04-23afs: Pass page into dirty region helpers to provide THP sizeDavid Howells1-10/+13
Pass a pointer to the page being accessed into the dirty region helpers so that the size of the page can be determined in case it's a transparent huge page. This also required the page to be passed into the afs_page_dirty trace point - so there's no need to specifically pass in the index or private data as these can be retrieved directly from the page struct. Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/160588527183.3465195.16107942526481976308.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118144921.1232039.11377711180492625929.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161040747.2537118.11435394902674511430.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340404553.1303470.11414163641767769882.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539548385.286939.8864598314493255313.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653804285.2770958.3497360004849598038.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789087043.6155.16922142208140170528.stgit@warthog.procyon.org.uk/ # v6
2021-04-23fscache, cachefiles: Add alternate API to use kiocb for read/write to cacheDavid Howells2-0/+43
Add an alternate API by which the cache can be accessed through a kiocb, doing async DIO, rather than using the current API that tells the cache where all the pages are. The new API is intended to be used in conjunction with the netfs helper library. A filesystem must pick one or the other and not mix them. Filesystems wanting to use the new API must #define FSCACHE_USE_NEW_IO_API before #including the header. This prevents them from continuing to use the old API at the same time as there are incompatibilities in how the PG_fscache page bit is used. Changes: v6: - Provide a routine to shape a write so that the start and length can be aligned for DIO[3]. v4: - Use the vfs_iocb_iter_read/write() helpers[1] - Move initial definition of fscache_begin_read_operation() here. - Remove a commented-out line[2] - Combine ki->term_func calls in cachefiles_read_complete()[2]. - Remove explicit NULL initialiser[2]. - Remove extern on func decl[2]. - Put in param names on func decl[2]. - Remove redundant else[2]. - Fill out the kdoc comment for fscache_begin_read_operation(). - Rename fs/fscache/page2.c to io.c to match later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: Christoph Hellwig <hch@lst.de> cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20210216102614.GA27555@lst.de/ [1] Link: https://lore.kernel.org/r/20210216084230.GA23669@lst.de/ [2] Link: https://lore.kernel.org/r/161781047695.463527.7463536103593997492.stgit@warthog.procyon.org.uk/ [3] Link: https://lore.kernel.org/r/161118142558.1232039.17993829899588971439.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161037850.2537118.8819808229350326503.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340402057.1303470.8038373593844486698.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539545919.286939.14573472672781434757.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653801477.2770958.10543270629064934227.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789084517.6155.12799689829859169640.stgit@warthog.procyon.org.uk/ # v6
2021-04-23netfs: Add a tracepoint to log failures that would be otherwise unseenDavid Howells1-0/+58
Add a tracepoint to log internal failures (such as cache errors) that we don't otherwise want to pass back to the netfs. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/161781048813.463527.1557000804674707986.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/161789082749.6155.15498680577213140870.stgit@warthog.procyon.org.uk/ # v6
2021-04-23netfs: Define an interface to talk to a cacheDavid Howells2-0/+57
Add an interface to the netfs helper library for reading data from the cache instead of downloading it from the server and support for writing data just downloaded or cleared to the cache. The API passes an iov_iter to the cache read/write routines to indicate the data/buffer to be used. This is done using the ITER_XARRAY type to provide direct access to the netfs inode's pagecache. When the netfs's ->begin_cache_operation() method is called, this must fill in the cache_resources in the netfs_read_request struct, including the netfs_cache_ops used by the helper lib to talk to the cache. The helper lib does not directly access the cache. Changes: v6: - Call trace_netfs_read() after beginning the cache op so that the cookie debug ID can be logged[3]. - Don't record the error from writing to the cache. We don't want to pass it back to the netfs[4]. - Fix copy-to-cache subreq amalgamation to not round up as it goes along otherwise it overcalculates the length of the write[5]. v5: - Use end_page_fscache() rather than unlock_page_fscache()[2]. v4: - Added flag to netfs_subreq_terminated() to indicate that the caller may have been running async and stuff that might sleep needs punting to a workqueue (can't use in_softirq()[1]). - Add missing inc of netfs_n_rh_read stat. - Move initial definition of fscache_begin_read_operation() elsewhere. - Need to call op->begin_cache_operation() from netfs_write_begin(). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20210216084230.GA23669@lst.de/ [1] Link: https://lore.kernel.org/r/2499407.1616505440@warthog.procyon.org.uk/ [2] Link: https://lore.kernel.org/r/161781045123.463527.14533348855710902201.stgit@warthog.procyon.org.uk/ [3] Link: https://lore.kernel.org/r/161781046256.463527.18158681600085556192.stgit@warthog.procyon.org.uk/ [4] Link: https://lore.kernel.org/r/161781047695.463527.7463536103593997492.stgit@warthog.procyon.org.uk/ [5] Link: https://lore.kernel.org/r/161118141321.1232039.8296910406755622458.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161036700.2537118.11170748455436854978.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340399569.1303470.1138884774643385730.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539542874.286939.13337898213448136687.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653799826.2770958.9015430297426331950.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789081462.6155.3853904866933313256.stgit@warthog.procyon.org.uk/ # v6
2021-04-23netfs: Add write_begin helperDavid Howells2-1/+11
Add a helper to do the pre-reading work for the netfs write_begin address space op. Changes v6: - Fixed a missing rreq put in netfs_write_begin()[3]. - Use DEFINE_READAHEAD()[4]. v5: - Made the wait for PG_fscache in netfs_write_begin() killable[2]. v4: - Added flag to netfs_subreq_terminated() to indicate that the caller may have been running async and stuff that might sleep needs punting to a workqueue (can't use in_softirq()[1]). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20210216084230.GA23669@lst.de/ [1] Link: https://lore.kernel.org/r/2499407.1616505440@warthog.procyon.org.uk/ [2] Link: https://lore.kernel.org/r/161781042127.463527.9154479794406046987.stgit@warthog.procyon.org.uk/ [3] Link: https://lore.kernel.org/r/1234933.1617886271@warthog.procyon.org.uk/ [4] Link: https://lore.kernel.org/r/160588543960.3465195.2792938973035886168.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118140165.1232039.16418853874312234477.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161035539.2537118.15674887534950908530.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340398368.1303470.11242918276563276090.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539541541.286939.1889738674057013729.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653798616.2770958.17213315845968485563.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789080530.6155.1011847312392330491.stgit@warthog.procyon.org.uk/ # v6
2021-04-23netfs: Gather statsDavid Howells1-0/+1
Gather statistics from the netfs interface that can be exported through a seqfile. This is intended to be called by a later patch when viewing /proc/fs/fscache/stats. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/161118139247.1232039.10556850937548511068.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161034669.2537118.2761232524997091480.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340397101.1303470.17581910581108378458.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539539959.286939.6794352576462965914.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653797700.2770958.5801990354413178228.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789079281.6155.17141344853277186500.stgit@warthog.procyon.org.uk/ # v6
2021-04-23netfs: Add tracepointsDavid Howells2-0/+200
Add three tracepoints to track the activity of the read helpers: (1) netfs/netfs_read This logs entry to the read helpers and also expansion of the range in a readahead request. (2) netfs/netfs_rreq This logs the progress of netfs_read_request objects which track read requests. A read request may be a compound of multiple subrequests. (3) netfs/netfs_sreq This logs the progress of netfs_read_subrequest objects, which track the contributions from various sources to a read request. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/161118138060.1232039.5353374588021776217.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161033468.2537118.14021843889844001905.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340395843.1303470.7355519662919639648.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539538693.286939.10171713520419106334.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653796447.2770958.1870655382450862155.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789078003.6155.17814844411672989942.stgit@warthog.procyon.org.uk/ # v6
2021-04-23netfs: Provide readahead and readpage netfs helpersDavid Howells1-0/+83
Add a pair of helper functions: (*) netfs_readahead() (*) netfs_readpage() to do the work of handling a readahead or a readpage, where the page(s) that form part of the request may be split between the local cache, the server or just require clearing, and may be single pages and transparent huge pages. This is all handled within the helper. Note that while both will read from the cache if there is data present, only netfs_readahead() will expand the request beyond what it was asked to do, and only netfs_readahead() will write back to the cache. netfs_readpage(), on the other hand, is synchronous and only fetches the page (which might be a THP) it is asked for. The netfs gives the helper parameters from the VM, the cache cookie it wants to use (or NULL) and a table of operations (only one of which is mandatory): (*) expand_readahead() [optional] Called to allow the netfs to request an expansion of a readahead request to meet its own alignment requirements. This is done by changing rreq->start and rreq->len. (*) clamp_length() [optional] Called to allow the netfs to cut down a subrequest to meet its own boundary requirements. If it does this, the helper will generate additional subrequests until the full request is satisfied. (*) is_still_valid() [optional] Called to find out if the data just read from the cache has been invalidated and must be reread from the server. (*) issue_op() [required] Called to ask the netfs to issue a read to the server. The subrequest describes the read. The read request holds information about the file being accessed. The netfs can cache information in rreq->netfs_priv. Upon completion, the netfs should set the error, transferred and can also set FSCACHE_SREQ_CLEAR_TAIL and then call fscache_subreq_terminated(). (*) done() [optional] Called after the pages have been unlocked. The read request is still pinning the file and mapping and may still be pinning pages with PG_fscache. rreq->error indicates any error that has been accumulated. (*) cleanup() [optional] Called when the helper is disposing of a finished read request. This allows the netfs to clear rreq->netfs_priv. Netfs support is enabled with CONFIG_NETFS_SUPPORT=y. It will be built even if CONFIG_FSCACHE=n and in this case much of it should be optimised away, allowing the filesystem to use it even when caching is disabled. Changes: v5: - Comment why netfs_readahead() is putting pages[2]. - Use page_file_mapping() rather than page->mapping[2]. - Use page_index() rather than page->index[2]. - Use set_page_fscache()[3] rather then SetPageFsCache() as this takes an appropriate ref too[4]. v4: - Folded in a kerneldoc comment fix. - Folded in a fix for the error handling in the case that ENOMEM occurs. - Added flag to netfs_subreq_terminated() to indicate that the caller may have been running async and stuff that might sleep needs punting to a workqueue (can't use in_softirq()[1]). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20210216084230.GA23669@lst.de/ [1] Link: https://lore.kernel.org/r/20210321014202.GF3420@casper.infradead.org/ [2] Link: https://lore.kernel.org/r/2499407.1616505440@warthog.procyon.org.uk/ [3] Link: https://lore.kernel.org/r/CAHk-=wh+2gbF7XEjYc=HV9w_2uVzVf7vs60BPz0gFA=+pUm3ww@mail.gmail.com/ [4] Link: https://lore.kernel.org/r/160588497406.3465195.18003475695899726222.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118136849.1232039.8923686136144228724.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161032290.2537118.13400578415247339173.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340394873.1303470.6237319335883242536.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539537375.286939.16642940088716990995.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653795430.2770958.4947584573720000554.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789076581.6155.6745849361504760209.stgit@warthog.procyon.org.uk/ # v6
2021-04-23netfs, mm: Add set/end/wait_on_page_fscache() aliasesDavid Howells1-0/+57
Add set/end/wait_on_page_fscache() as aliases of set/end/wait_page_private_2(). These allow a page to marked with PG_fscache, the flag to be removed and waiters woken and waiting for the flag to be cleared. A ref on the page is also taken and dropped. [Linus suggested putting the fscache-themed functions into the caching-specific headers rather than pagemap.h[1]] Changes: v5: - Mirror the changes to the core routines[2]. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: Matthew Wilcox <willy@infradead.org> cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/1330473.1612974547@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/CAHk-=wjgA-74ddehziVk=XAEMTKswPu1Yw4uaro1R3ibs27ztw@mail.gmail.com/ [1] Link: https://lore.kernel.org/r/161340393568.1303470.4997526899111310530.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539536093.286939.5076448803512118764.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/2499407.1616505440@warthog.procyon.org.uk/ [2] Link: https://lore.kernel.org/r/161653793873.2770958.12157243390965814502.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789075327.6155.7432127924219092385.stgit@warthog.procyon.org.uk/ # v6
2021-04-23netfs, mm: Move PG_fscache helper funcs to linux/netfs.hDavid Howells2-10/+30
Move the PG_fscache related helper funcs (such as SetPageFsCache()) to linux/netfs.h rather than linux/fscache.h as the intention is to move to a model where they're used by the network filesystem and the helper library, but not by fscache/cachefiles itself. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/161340392347.1303470.18065131603507621762.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539534516.286939.6265142985563005000.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653792959.2770958.5386546945273988117.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789073997.6155.18442271115255650614.stgit@warthog.procyon.org.uk/ # v6
2021-04-23mm: Implement readahead_control pageset expansionDavid Howells1-0/+2
Provide a function, readahead_expand(), that expands the set of pages specified by a readahead_control object to encompass a revised area with a proposed size and length. The proposed area must include all of the old area and may be expanded yet more by this function so that the edges align on (transparent huge) page boundaries as allocated. The expansion will be cut short if a page already exists in either of the areas being expanded into. Note that any expansion made in such a case is not rolled back. This will be used by fscache so that reads can be expanded to cache granule boundaries, thereby allowing whole granules to be stored in the cache, but there are other potential users also. Changes: v6: - Fold in a patch from Matthew Wilcox to tell the ondemand readahead algorithm about the expansion so that the next readahead starts at the right place[2]. v4: - Moved the declaration of readahead_expand() to a better place[1]. Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Christoph Hellwig <hch@lst.de> cc: Mike Marshall <hubcap@omnibond.com> cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20210217161358.GM2858050@casper.infradead.org/ [1] Link: https://lore.kernel.org/r/20210407201857.3582797-4-willy@infradead.org/ [2] Link: https://lore.kernel.org/r/159974633888.2094769.8326206446358128373.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588479816.3465195.553952688795241765.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118131787.1232039.4863969952441067985.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161028670.2537118.13831420617039766044.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340389201.1303470.14353807284546854878.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539530488.286939.18085961677838089157.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653789422.2770958.2108046612147345000.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789069829.6155.4295672417565512161.stgit@warthog.procyon.org.uk/ # v6
2021-04-23fs: Document file_ra_stateMatthew Wilcox (Oracle)1-10/+14
Turn the comments into kernel-doc and improve the wording slightly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> Link: https://lore.kernel.org/r/20210407201857.3582797-3-willy@infradead.org/ Link: https://lore.kernel.org/r/161789068619.6155.1397999970593531574.stgit@warthog.procyon.org.uk/ # v6
2021-04-23mm/filemap: Pass the file_ra_state in the ractlMatthew Wilcox (Oracle)1-9/+11
For readahead_expand(), we need to modify the file ra_state, so pass it down by adding it to the ractl. We have to do this because it's not always the same as f_ra in the struct file that is already being passed. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> Link: https://lore.kernel.org/r/20210407201857.3582797-2-willy@infradead.org/ Link: https://lore.kernel.org/r/161789067431.6155.8063840447229665720.stgit@warthog.procyon.org.uk/ # v6