Diffstat (limited to 'Documentation/mm/damon')
-rw-r--r-- Documentation/mm/damon/design.rst | 142
-rw-r--r-- Documentation/mm/damon/index.rst | 6
-rw-r--r-- Documentation/mm/damon/maintainer-profile.rst | 35
-rw-r--r-- Documentation/mm/damon/monitoring_intervals_tuning_example.rst | 8
4 files changed, 125 insertions, 66 deletions
diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst
index e28c6a1b40ae..03f8137256f5 100644
--- a/Documentation/mm/damon/design.rst
+++ b/Documentation/mm/damon/design.rst
@@ -54,7 +54,7 @@ monitoring are address-space dependent.
 DAMON consolidates these implementations in a layer called DAMON Operations
 Set, and defines the interface between it and the upper layer. The upper layer
 is dedicated for DAMON's core logics including the mechanism for control of the
-monitoring accruracy and the overhead.
+monitoring accuracy and the overhead.
 
 Hence, DAMON can easily be extended for any address space and/or available
 hardware features by configuring the core logic to use the appropriate
@@ -313,6 +313,10 @@ sufficient for the given purpose, it shouldn't be unnecessarily further lowered.
 It is recommended to be set proportional to ``aggregation interval``. By
 default, the ratio is set as ``1/20``, and it is still recommended.
 
+Based on the manual tuning guide, DAMON provides more intuitive knob-based
+intervals auto tuning mechanism. Please refer to :ref:`the design document of
+the feature <damon_design_monitoring_intervals_autotuning>` for detail.
+
 Refer to below documents for an example tuning based on the above guide.
 
 .. toctree::
@@ -321,6 +325,52 @@ Refer to below documents for an example tuning based on the above guide.
    monitoring_intervals_tuning_example
 
 
+.. _damon_design_monitoring_intervals_autotuning:
+
+Monitoring Intervals Auto-tuning
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+DAMON provides automatic tuning of the ``sampling interval`` and ``aggregation
+interval`` based on the :ref:`the tuning guide idea
+<damon_design_monitoring_params_tuning_guide>`. The tuning mechanism allows
+users to set the aimed amount of access events to observe via DAMON within
+given time interval. The target can be specified by the user as a ratio of
+DAMON-observed access events to the theoretical maximum amount of the events
+(``access_bp``) that measured within a given number of aggregations
+(``aggrs``).
+
+The DAMON-observed access events are calculated in byte granularity based on
+DAMON :ref:`region assumption <damon_design_region_based_sampling>`. For
+example, if a region of size ``X`` bytes of ``Y`` ``nr_accesses`` is found, it
+means ``X * Y`` access events are observed by DAMON. Theoretical maximum
+access events for the region is calculated in same way, but replacing ``Y``
+with theoretical maximum ``nr_accesses``, which can be calculated as
+``aggregation interval / sampling interval``.
+
+The mechanism calculates the ratio of access events for ``aggrs`` aggregations,
+and increases or decrease the ``sampleing interval`` and ``aggregation
+interval`` in same ratio, if the observed access ratio is lower or higher than
+the target, respectively. The ratio of the intervals change is decided in
+proportion to the distance between current samples ratio and the target ratio.
+
+The user can further set the minimum and maximum ``sampling interval`` that can
+be set by the tuning mechanism using two parameters (``min_sample_us`` and
+``max_sample_us``). Because the tuning mechanism changes ``sampling interval``
+and ``aggregation interval`` in same ratio always, the minimum and maximum
+``aggregation interval`` after each of the tuning changes can automatically set
+together.
+
+The tuning is turned off by default, and need to be set explicitly by the user.
+As a rule of thumbs and the Parreto principle, 4% access samples ratio target
+is recommended. Note that Parreto principle (80/20 rule) has applied twice.
+That is, assumes 4% (20% of 20%) DAMON-observed access events ratio (source)
+to capture 64% (80% multipled by 80%) real access events (outcomes).
+
+To know how user-space can use this feature via :ref:`DAMON sysfs interface
+<sysfs_interface>`, refer to :ref:`intervals_goal <sysfs_scheme>` part of
+the documentation.
+
+
 .. _damon_design_damos:
 
 Operation Schemes
@@ -402,9 +452,9 @@ that supports each action are as below.
  - ``lru_deprio``: Deprioritize the region on its LRU lists.
    Supported by ``paddr`` operations set.
  - ``migrate_hot``: Migrate the regions prioritizing warmer regions.
-   Supported by ``paddr`` operations set.
+   Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
  - ``migrate_cold``: Migrate the regions prioritizing colder regions.
-   Supported by ``paddr`` operations set.
+   Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
  - ``stat``: Do nothing but count the statistics. Supported by all operations
    sets.
 
@@ -500,10 +550,10 @@ aggressiveness (the quota) of the corresponding scheme. For example, if DAMOS
 is under achieving the goal, DAMOS automatically increases the quota. If DAMOS
 is over achieving the goal, it decreases the quota.
 
-The goal can be specified with three parameters, namely ``target_metric``,
-``target_value``, and ``current_value``. The auto-tuning mechanism tries to
-make ``current_value`` of ``target_metric`` be same to ``target_value``.
-Currently, two ``target_metric`` are provided.
+The goal can be specified with four parameters, namely ``target_metric``,
+``target_value``, ``current_value`` and ``nid``. The auto-tuning mechanism
+tries to make ``current_value`` of ``target_metric`` be same to
+``target_value``.
 
 - ``user_input``: User-provided value. Users could use any metric that they
   has interest in for the value. Use space main workload's latency or
@@ -515,6 +565,11 @@ Currently, two ``target_metric`` are provided.
   in microseconds that measured from last quota reset to next quota reset.
   DAMOS does the measurement on its own, so only ``target_value`` need to be
   set by users at the initial time. In other words, DAMOS does self-feedback.
+- ``node_mem_used_bp``: Specific NUMA node's used memory ratio in bp (1/10,000).
+- ``node_mem_free_bp``: Specific NUMA node's free memory ratio in bp (1/10,000).
+
+``nid`` is optionally required for only ``node_mem_used_bp`` and
+``node_mem_free_bp`` to point the specific NUMA node.
 
 To know how user-space can set the tuning goal metric, the target value, and/or
 the current value via :ref:`DAMON sysfs interface <sysfs_interface>`, refer to
@@ -569,11 +624,22 @@ number of filters for each scheme. Each filter specifies
 - whether it is to allow (include) or reject (exclude) applying
   the scheme's action to the memory (``allow``).
 
-When multiple filters are installed, each filter is evaluated in the installed
-order. If a part of memory is matched to one of the filter, next filters are
-ignored. If the memory passes through the filters evaluation stage because it
-is not matched to any of the filters, applying the scheme's action to it is
-allowed, same to the behavior when no filter exists.
+For efficient handling of filters, some types of filters are handled by the
+core layer, while others are handled by operations set. In the latter case,
+hence, support of the filter types depends on the DAMON operations set. In
+case of the core layer-handled filters, the memory regions that excluded by the
+filter are not counted as the scheme has tried to the region. In contrast, if
+a memory regions is filtered by an operations set layer-handled filter, it is
+counted as the scheme has tried. This difference affects the statistics.
+
+When multiple filters are installed, the group of filters that handled by the
+core layer are evaluated first. After that, the group of filters that handled
+by the operations layer are evaluated. Filters in each of the groups are
+evaluated in the installed order. If a part of memory is matched to one of the
+filter, next filters are ignored. If the part passes through the filters
+evaluation stage because it is not matched to any of the filters, applying the
+scheme's action to it depends on the last filter's allowance type. If the last
+filter was for allowing, the part of memory will be rejected, and vice versa.
 
 For example, let's assume 1) a filter for allowing anonymous pages and 2)
 another filter for rejecting young pages are installed in the order. If a page
@@ -585,39 +651,29 @@ second reject-filter blocks it. If the page is neither anonymous nor young, the
 page will pass through the filters evaluation stage since there is no matching
 filter, and the action will be applied to the page.
 
-Note that the action can equally be applied to memory that either explicitly
-filter-allowed or filters evaluation stage passed. It means that installing
-allow-filters at the end of the list makes no practical change but only
-filters-checking overhead.
-
-For efficient handling of filters, some types of filters are handled by the
-core layer, while others are handled by operations set. In the latter case,
-hence, support of the filter types depends on the DAMON operations set. In
-case of the core layer-handled filters, the memory regions that excluded by the
-filter are not counted as the scheme has tried to the region. In contrast, if
-a memory regions is filtered by an operations set layer-handled filter, it is
-counted as the scheme has tried. This difference affects the statistics.
-
 Below ``type`` of filters are currently supported.
 
-- anonymous page
-    - Applied to pages that containing data that not stored in files.
-    - Handled by operations set layer. Supported by only ``paddr`` set.
-- memory cgroup
-    - Applied to pages that belonging to a given cgroup.
-    - Handled by operations set layer. Supported by only ``paddr`` set.
-- young page
-    - Applied to pages that are accessed after the last access check from the
-      scheme.
-    - Handled by operations set layer. Supported by only ``paddr`` set.
-- address range
-    - Applied to pages that belonging to a given address range.
-    - Handled by the core logic.
-- DAMON monitoring target
-    - Applied to pages that belonging to a given DAMON monitoring target.
-    - Handled by the core logic.
-
-To know how user-space can set the watermarks via :ref:`DAMON sysfs interface
+- Core layer handled
+    - addr
+        - Applied to pages that belonging to a given address range.
+    - target
+        - Applied to pages that belonging to a given DAMON monitoring target.
+- Operations layer handled, supported by only ``paddr`` operations set.
+    - anon
+        - Applied to pages that containing data that not stored in files.
+    - active
+        - Applied to active pages.
+    - memcg
+        - Applied to pages that belonging to a given cgroup.
+    - young
+        - Applied to pages that are accessed after the last access check from the
+          scheme.
+    - hugepage_size
+        - Applied to pages that managed in a given size range.
+    - unmapped
+        - Applied to pages that unmapped.
+
+To know how user-space can set the filters via :ref:`DAMON sysfs interface
 <sysfs_interface>`, refer to :ref:`filters <sysfs_filters>` part of the
 documentation.
 
diff --git a/Documentation/mm/damon/index.rst b/Documentation/mm/damon/index.rst
index 5a3359704cce..31c1fa955b3d 100644
--- a/Documentation/mm/damon/index.rst
+++ b/Documentation/mm/damon/index.rst
@@ -1,8 +1,8 @@
 .. SPDX-License-Identifier: GPL-2.0
 
-==========================
-DAMON: Data Access MONitor
-==========================
+================================================================
+DAMON: Data Access MONitoring and Access-aware System Operations
+================================================================
 
 DAMON is a Linux kernel subsystem that provides a framework for data access
 monitoring and the monitoring results based system operations. The core
diff --git a/Documentation/mm/damon/maintainer-profile.rst b/Documentation/mm/damon/maintainer-profile.rst
index ce3e98458339..5cd07905a193 100644
--- a/Documentation/mm/damon/maintainer-profile.rst
+++ b/Documentation/mm/damon/maintainer-profile.rst
@@ -7,9 +7,9 @@ The DAMON subsystem covers the files that are listed in 'DATA ACCESS MONITOR'
 section of 'MAINTAINERS' file.
 
 The mailing lists for the subsystem are damon@lists.linux.dev and
-linux-mm@kvack.org. Patches should be made against the `mm-unstable tree
-<https://git.kernel.org/akpm/mm/h/mm-unstable>`_ whenever possible and posted
-to the mailing lists.
+linux-mm@kvack.org. Patches should be made against the `mm-new tree
+<https://git.kernel.org/akpm/mm/h/mm-new>`_ whenever possible and posted to the
+mailing lists.
 
 SCM Trees
 ---------
@@ -17,17 +17,19 @@ SCM Trees
 There are multiple Linux trees for DAMON development. Patches under
 development or testing are queued in `damon/next
 <https://git.kernel.org/sj/h/damon/next>`_ by the DAMON maintainer.
-Sufficiently reviewed patches will be queued in `mm-unstable
-<https://git.kernel.org/akpm/mm/h/mm-unstable>`_ by the memory management
-subsystem maintainer. After more sufficient tests, the patches will be queued
-in `mm-stable <https://git.kernel.org/akpm/mm/h/mm-stable>`_, and finally
-pull-requested to the mainline by the memory management subsystem maintainer.
-
-Note again the patches for `mm-unstable tree
-<https://git.kernel.org/akpm/mm/h/mm-unstable>`_ are queued by the memory
-management subsystem maintainer. If the patches requires some patches in
-`damon/next tree <https://git.kernel.org/sj/h/damon/next>`_ which not yet merged
-in mm-unstable, please make sure the requirement is clearly specified.
+Sufficiently reviewed patches will be queued in `mm-new
+<https://git.kernel.org/akpm/mm/h/mm-new>`_ by the memory management subsystem
+maintainer. As more sufficient tests are done, the patches will move to
+`mm-unstable <https://git.kernel.org/akpm/mm/h/mm-unstable>`_ and then to
+`mm-stable <https://git.kernel.org/akpm/mm/h/mm-stable>`_. And finally those
+will be pull-requested to the mainline by the memory management subsystem
+maintainer.
+
+Note again the patches for `mm-new tree
+<https://git.kernel.org/akpm/mm/h/mm-new>`_ are queued by the memory management
+subsystem maintainer. If the patches requires some patches in `damon/next tree
+<https://git.kernel.org/sj/h/damon/next>`_ which not yet merged in mm-new,
+please make sure the requirement is clearly specified.
 
 Submit checklist addendum
 -------------------------
@@ -53,8 +55,9 @@ Further doing below and putting the results will be helpful.
 Key cycle dates
 ---------------
 
-Patches can be sent anytime. Key cycle dates of the `mm-unstable
-<https://git.kernel.org/akpm/mm/h/mm-unstable>`_ and `mm-stable
+Patches can be sent anytime. Key cycle dates of the `mm-new
+<https://git.kernel.org/akpm/mm/h/mm-new>`_, `mm-unstable
+<https://git.kernel.org/akpm/mm/h/mm-unstable>`_and `mm-stable
 <https://git.kernel.org/akpm/mm/h/mm-stable>`_ trees depend on the memory
 management subsystem maintainer.
 
diff --git a/Documentation/mm/damon/monitoring_intervals_tuning_example.rst b/Documentation/mm/damon/monitoring_intervals_tuning_example.rst
index 334a854efb40..7207cbed591f 100644
--- a/Documentation/mm/damon/monitoring_intervals_tuning_example.rst
+++ b/Documentation/mm/damon/monitoring_intervals_tuning_example.rst
@@ -36,7 +36,7 @@ Then, list the DAMON-found regions of different access patterns, sorted by the
 "access temperature". "Access temperature" is a metric representing the
 access-hotness of a region. It is calculated as a weighted sum of the access
 frequency and the age of the region. If the access frequency is 0 %, the
-temperature is multipled by minus one. That is, if a region is not accessed,
+temperature is multiplied by minus one. That is, if a region is not accessed,
 it gets minus temperature and it gets lower as not accessed for longer time.
 The sorting is in temperature-ascendint order, so the region at the top of the
 list is the coldest, and the one at the bottom is the hottest one. ::
@@ -58,11 +58,11 @@ list is the coldest, and the one at the bottom is the hottest one. ::
 The list shows not seemingly hot regions, and only minimum access pattern
 diversity. Every region has zero access frequency. The number of region is
 10, which is the default ``min_nr_regions value``. Size of each region is also
-nearly idential. We can suspect this is because “adaptive regions adjustment”
+nearly identical. We can suspect this is because “adaptive regions adjustment”
 mechanism was not well working. As the guide suggested, we can get relative
 hotness of regions using ``age`` as the recency information. That would be
 better than nothing, but given the fact that the longest age is only about 6
-seconds while we waited about ten minuts, it is unclear how useful this will
+seconds while we waited about ten minutes, it is unclear how useful this will
 be.
 
 The temperature ranges to total size of regions of each range histogram
@@ -190,7 +190,7 @@ for sampling and aggregation intervals, respectively). ::
 The number of regions having different access patterns has significantly
 increased. Size of each region is also more varied. Total size of non-zero
 access frequency regions is also significantly increased. Maybe this is already
-good enough to make some meaningful memory management efficieny changes.
+good enough to make some meaningful memory management efficiency changes.
 
 800ms/16s intervals: Another bias
 =================================
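
Reader note on the "Monitoring Intervals Auto-tuning" text added to design.rst: the observed-to-maximum access events ratio it describes can be sketched as below. This is only an illustration of the arithmetic in the added paragraphs, not the kernel implementation. The names `Region` and `access_events_bp` are hypothetical, and the sketch assumes `access_bp` is expressed in bp (1/10,000), the unit the patch defines for the node memory metrics. The interval adjustment step is left out because the patch text does not give its exact formula.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical stand-in for one DAMON region in one aggregation."""
    size_bytes: int   # "X" in the documentation text
    nr_accesses: int  # "Y" in the documentation text


def access_events_bp(snapshots, sample_us, aggr_us):
    """Return observed / theoretical-maximum access events in bp (1/10,000).

    `snapshots` holds one list of regions per aggregation, that is,
    `aggrs` lists in total.
    """
    # Theoretical maximum nr_accesses per the text:
    # aggregation interval / sampling interval.
    max_nr_accesses = aggr_us // sample_us
    observed = 0
    maximum = 0
    for regions in snapshots:
        for region in regions:
            # A region of X bytes with Y nr_accesses counts as X * Y events.
            observed += region.size_bytes * region.nr_accesses
            maximum += region.size_bytes * max_nr_accesses
    if maximum == 0:
        return 0
    return observed * 10000 // maximum
```

With a 5 ms sampling interval and a 100 ms aggregation interval, one fully hot 4 KiB region next to 12 KiB of idle memory yields 2500 bp (25%). Under the same unit assumption, the recommended 4% target corresponds to 400 bp.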
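
Reader note on the reworked filter evaluation paragraph in design.rst: the first-match rule and the new default for unmatched memory can be sketched as follows. This is a minimal illustration under stated assumptions, not DAMON code. `Filter` and `action_allowed` are hypothetical names, pages are modelled as plain dicts, and the filter list is assumed to be already in evaluation order (core layer group first, then operations layer group). Allowing everything when no filter is installed is taken from the removed wording "same to the behavior when no filter exists".

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Filter:
    """Hypothetical model of a DAMOS filter."""
    matches: Callable[[dict], bool]  # does this filter match the page?
    allow: bool                      # True: allow-filter, False: reject-filter


def action_allowed(page, filters):
    """Decide whether the scheme's action may be applied to `page`."""
    for flt in filters:
        if flt.matches(page):
            # The first matching filter decides; later filters are ignored.
            return flt.allow
    if not filters:
        # Assumption: with no filters installed, everything is allowed.
        return True
    # Nothing matched: the default is the opposite of the last filter's type.
    return not filters[-1].allow
```

For the example in the text (an allow-filter for anonymous pages followed by a reject-filter for young pages), `action_allowed` returns `True` for an anonymous young page, `False` for a young file-backed page, and `True` for a page that is neither, since the last filter is a reject-filter.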