diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2017-05-02 07:15:50 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2017-05-02 07:15:50 +0300 |
commit | a52bbaf4a3b81e07430a91ee37ea76557c2c02ed (patch) | |
tree | 237b8878b7b68b41a4bfd549b34d174c8439c3a9 /Documentation/x86 | |
parent | 16b76293c5c81e6345323d7aef41b26e8390f62d (diff) | |
parent | 4797b7dfdfcf457075c36743d71e2b0feeaaa20f (diff) | |
download | linux-a52bbaf4a3b81e07430a91ee37ea76557c2c02ed.tar.xz |
Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
"The biggest changes are an extension of the Intel RDT code to extend
it with Intel Memory Bandwidth Allocation CPU support: MBA allows
bandwidth allocation between cores, while CBM (already upstream)
allows CPU cache partitioning.
There's also misc smaller fixes and updates"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/intel_rdt: Return error for incorrect resource names in schemata
x86/intel_rdt: Trim whitespace while parsing schemata input
x86/intel_rdt: Fix padding when resource is enabled via mount
x86/intel_rdt: Get rid of anon union
x86/cpu: Keep model defines sorted by model number
x86/intel_rdt/mba: Add schemata file support for MBA
x86/intel_rdt: Make schemata file parsers resource specific
x86/intel_rdt/mba: Add info directory files for Memory Bandwidth Allocation
x86/intel_rdt: Make information files resource specific
x86/intel_rdt/mba: Add primary support for Memory Bandwidth Allocation (MBA)
x86/intel_rdt/mba: Memory bandwith allocation feature detect
x86/intel_rdt: Add resource specific msr update function
x86/intel_rdt: Move CBM specific data into a struct
x86/intel_rdt: Cleanup namespace to support multiple resource types
Documentation, x86: Intel Memory bandwidth allocation
x86/intel_rdt: Organize code properly
x86/intel_rdt: Init padding only if a device exists
x86/intel_rdt: Add cpus_list rdtgroup file
x86/intel_rdt: Cleanup kernel-doc
x86/intel_rdt: Update schemata read to show data in tabular format
...
Diffstat (limited to 'Documentation/x86')
-rw-r--r-- | Documentation/x86/intel_rdt_ui.txt | 124 |
1 files changed, 104 insertions, 20 deletions
diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt index 51cf6fa5591f..0f6d8477b66c 100644 --- a/Documentation/x86/intel_rdt_ui.txt +++ b/Documentation/x86/intel_rdt_ui.txt @@ -4,6 +4,7 @@ Copyright (C) 2016 Intel Corporation Fenghua Yu <fenghua.yu@intel.com> Tony Luck <tony.luck@intel.com> +Vikas Shivappa <vikas.shivappa@intel.com> This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig and the X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3". @@ -22,19 +23,34 @@ Info directory The 'info' directory contains information about the enabled resources. Each resource has its own subdirectory. The subdirectory -names reflect the resource names. Each subdirectory contains the -following files: +names reflect the resource names. +Cache resource(L3/L2) subdirectory contains the following files: -"num_closids": The number of CLOSIDs which are valid for this - resource. The kernel uses the smallest number of - CLOSIDs of all enabled resources as limit. +"num_closids": The number of CLOSIDs which are valid for this + resource. The kernel uses the smallest number of + CLOSIDs of all enabled resources as limit. -"cbm_mask": The bitmask which is valid for this resource. This - mask is equivalent to 100%. +"cbm_mask": The bitmask which is valid for this resource. + This mask is equivalent to 100%. -"min_cbm_bits": The minimum number of consecutive bits which must be - set when writing a mask. +"min_cbm_bits": The minimum number of consecutive bits which + must be set when writing a mask. +Memory bandwitdh(MB) subdirectory contains the following files: + +"min_bandwidth": The minimum memory bandwidth percentage which + user can request. + +"bandwidth_gran": The granularity in which the memory bandwidth + percentage is allocated. The allocated + b/w percentage is rounded off to the next + control step available on the hardware. The + available bandwidth control steps are: + min_bandwidth + N * bandwidth_gran. + +"delay_linear": Indicates if the delay scale is linear or + non-linear. This field is purely informational + only. Resource groups --------------- @@ -59,6 +75,9 @@ There are three files associated with each group: given to the default (root) group. You cannot remove CPUs from the default group. +"cpus_list": One or more CPU ranges of logical CPUs assigned to this + group. Same rules apply like for the "cpus" file. + "schemata": A list of all the resources available to this group. Each resource has its own line and format - see below for details. @@ -107,6 +126,22 @@ and 0xA are not. On a system with a 20-bit mask each bit represents 5% of the capacity of the cache. You could partition the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000. +Memory bandwidth(b/w) percentage +-------------------------------- +For Memory b/w resource, user controls the resource by indicating the +percentage of total memory b/w. + +The minimum bandwidth percentage value for each cpu model is predefined +and can be looked up through "info/MB/min_bandwidth". The bandwidth +granularity that is allocated is also dependent on the cpu model and can +be looked up at "info/MB/bandwidth_gran". The available bandwidth +control steps are: min_bw + N * bw_gran. Intermediate values are rounded +to the next control step available on the hardware. + +The bandwidth throttling is a core specific mechanism on some of Intel +SKUs. Using a high bandwidth and a low bandwidth setting on two threads +sharing a core will result in both threads being throttled to use the +low bandwidth. L3 details (code and data prioritization disabled) -------------------------------------------------- @@ -129,16 +164,38 @@ schemata format is always: L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... +Memory b/w Allocation details +----------------------------- + +Memory b/w domain is L3 cache. + + MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;... + +Reading/writing the schemata file +--------------------------------- +Reading the schemata file will show the state of all resources +on all domains. When writing you only need to specify those values +which you wish to change. E.g. + +# cat schemata +L3DATA:0=fffff;1=fffff;2=fffff;3=fffff +L3CODE:0=fffff;1=fffff;2=fffff;3=fffff +# echo "L3DATA:2=3c0;" > schemata +# cat schemata +L3DATA:0=fffff;1=fffff;2=3c0;3=fffff +L3CODE:0=fffff;1=fffff;2=fffff;3=fffff + Example 1 --------- On a two socket machine (one L3 cache per socket) with just four bits -for cache bit masks +for cache bit masks, minimum b/w of 10% with a memory bandwidth +granularity of 10% # mount -t resctrl resctrl /sys/fs/resctrl # cd /sys/fs/resctrl # mkdir p0 p1 -# echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata -# echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata +# echo "L3:0=3;1=c\nMB:0=50;1=50" > /sys/fs/resctrl/p0/schemata +# echo "L3:0=3;1=3\nMB:0=50;1=50" > /sys/fs/resctrl/p1/schemata The default resource group is unmodified, so we have access to all parts of all caches (its schemata file reads "L3:0=f;1=f"). @@ -147,6 +204,14 @@ Tasks that are under the control of group "p0" may only allocate from the "lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1. Tasks in group "p1" use the "lower" 50% of cache on both sockets. +Similarly, tasks that are under the control of group "p0" may use a +maximum memory b/w of 50% on socket0 and 50% on socket 1. +Tasks in group "p1" may also use 50% memory b/w on both sockets. +Note that unlike cache masks, memory b/w cannot specify whether these +allocations can overlap or not. The allocations specifies the maximum +b/w that the group may be able to use and the system admin can configure +the b/w accordingly. + Example 2 --------- Again two sockets, but this time with a more realistic 20-bit mask. @@ -160,9 +225,10 @@ of L3 cache on socket 0. # cd /sys/fs/resctrl First we reset the schemata for the default group so that the "upper" -50% of the L3 cache on socket 0 cannot be used by ordinary tasks: +50% of the L3 cache on socket 0 and 50% of memory b/w cannot be used by +ordinary tasks: -# echo "L3:0=3ff;1=fffff" > schemata +# echo "L3:0=3ff;1=fffff\nMB:0=50;1=100" > schemata Next we make a resource group for our first real time task and give it access to the "top" 25% of the cache on socket 0. @@ -185,6 +251,20 @@ Ditto for the second real time task (with the remaining 25% of cache): # echo 5678 > p1/tasks # taskset -cp 2 5678 +For the same 2 socket system with memory b/w resource and CAT L3 the +schemata would look like(Assume min_bandwidth 10 and bandwidth_gran is +10): + +For our first real time task this would request 20% memory b/w on socket +0. + +# echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata + +For our second real time task this would request an other 20% memory b/w +on socket 0. + +# echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata + Example 3 --------- @@ -198,18 +278,22 @@ the tasks. # cd /sys/fs/resctrl First we reset the schemata for the default group so that the "upper" -50% of the L3 cache on socket 0 cannot be used by ordinary tasks: +50% of the L3 cache on socket 0, and 50% of memory bandwidth on socket 0 +cannot be used by ordinary tasks: -# echo "L3:0=3ff" > schemata +# echo "L3:0=3ff\nMB:0=50" > schemata -Next we make a resource group for our real time cores and give -it access to the "top" 50% of the cache on socket 0. +Next we make a resource group for our real time cores and give it access +to the "top" 50% of the cache on socket 0 and 50% of memory bandwidth on +socket 0. # mkdir p0 -# echo "L3:0=ffc00;" > p0/schemata +# echo "L3:0=ffc00\nMB:0=50" > p0/schemata Finally we move core 4-7 over to the new group and make sure that the -kernel and the tasks running there get 50% of the cache. +kernel and the tasks running there get 50% of the cache. They should +also get 50% of memory bandwidth assuming that the cores 4-7 are SMT +siblings and only the real time threads are scheduled on the cores 4-7. # echo C0 > p0/cpus |