<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/Documentation/features/vm/TLB, branch v6.6.131</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-08-18T17:12:37+00:00</updated>
<entry>
<title>arm64: support batched/deferred tlb shootdown during page reclamation/migration</title>
<updated>2023-08-18T17:12:37+00:00</updated>
<author>
<name>Barry Song</name>
<email>v-songbaohua@oppo.com</email>
</author>
<published>2023-07-17T13:10:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=43b3dfdd04553171488cb11d46d21948b6b90e27'/>
<id>urn:sha1:43b3dfdd04553171488cb11d46d21948b6b90e27</id>
<content type='text'>
On x86, batched and deferred tlb shootdown has led to a 90% performance
increase on tlb shootdown.  On arm64, HW can do tlb shootdown without
software IPI, but a synchronous tlbi is still quite expensive.

Even the simplest program that requires swap-out is enough to
show this:
 #include &lt;sys/types.h&gt;
 #include &lt;unistd.h&gt;
 #include &lt;sys/mman.h&gt;
 #include &lt;string.h&gt;

 int main()
 {
 #define SIZE (1 * 1024 * 1024)
         volatile unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);

         memset((void *)p, 0x88, SIZE);

         for (int k = 0; k &lt; 10000; k++) {
                 /* swap in */
                 for (int i = 0; i &lt; SIZE; i += 4096) {
                         (void)p[i];
                 }

                 /* swap out */
                 madvise(p, SIZE, MADV_PAGEOUT);
         }
 }

Perf result on snapdragon 888 with 8 cores by using zRAM
as the swap block device.

 ~ # perf record taskset -c 4 ./a.out
 [ perf record: Woken up 10 times to write data ]
 [ perf record: Captured and wrote 2.297 MB perf.data (60084 samples) ]
 ~ # perf report
 # To display the perf.data header info, please use --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 60K of event 'cycles'
 # Event count (approx.): 35706225414
 #
 # Overhead  Command  Shared Object      Symbol
 # ........  .......  .................  ......
 #
    21.07%  a.out    [kernel.kallsyms]  [k] _raw_spin_unlock_irq
     8.23%  a.out    [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     6.67%  a.out    [kernel.kallsyms]  [k] filemap_map_pages
     6.16%  a.out    [kernel.kallsyms]  [k] __zram_bvec_write
     5.36%  a.out    [kernel.kallsyms]  [k] ptep_clear_flush
     3.71%  a.out    [kernel.kallsyms]  [k] _raw_spin_lock
     3.49%  a.out    [kernel.kallsyms]  [k] memset64
     1.63%  a.out    [kernel.kallsyms]  [k] clear_page
     1.42%  a.out    [kernel.kallsyms]  [k] _raw_spin_unlock
     1.26%  a.out    [kernel.kallsyms]  [k] mod_zone_state.llvm.8525150236079521930
     1.23%  a.out    [kernel.kallsyms]  [k] xas_load
     1.15%  a.out    [kernel.kallsyms]  [k] zram_slot_lock

ptep_clear_flush() takes 5.36% CPU in the micro-benchmark swapping in/out
a page mapped by only one process.  If the page is mapped by multiple
processes, typically more than 100 on a phone, the overhead would be
much higher as we have to run tlb flush 100 times for one single page.
Plus, tlb flush overhead will increase with the number of CPU cores due to
the poor scalability of tlb shootdown in HW, so ARM64 servers should
expect much higher overhead.

Further perf annotate shows that 95% of the CPU time of ptep_clear_flush()
is actually spent in the final dsb(), waiting for the tlb flush to
complete.  This gives us a very good chance to leverage the existing
batched tlb flush in the kernel.  The minimum modification is to send only
the async tlbi in the first stage and to issue the dsb in the second
stage, when we have to sync.
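
As a rough illustration of the two-stage idea, the following user-space
sketch counts how many waits each scheme needs.  It is only a model: the
struct and function names are hypothetical, the counters merely stand in
for tlbi and dsb, and none of this is the kernel code.

```c
/* Illustrative user-space model only; not the kernel implementation.
 * tlb_model, flush_each and flush_batched are hypothetical names. */
struct tlb_model {
        unsigned long invalidations;  /* stands in for tlbi issued */
        unsigned long barriers;       /* stands in for dsb waits */
};

/* Unbatched: every page issues an invalidate and waits immediately. */
void flush_each(struct tlb_model *m, unsigned long npages)
{
        unsigned long i;

        for (i = 0; i != npages; i++) {
                m->invalidations++;
                m->barriers++;
        }
}

/* Batched: stage 1 only issues invalidates, stage 2 waits once. */
void flush_batched(struct tlb_model *m, unsigned long npages)
{
        unsigned long i;

        for (i = 0; i != npages; i++)
                m->invalidations++;   /* stage 1: no wait */

        if (npages != 0)
                m->barriers++;        /* stage 2: one wait per batch */
}
```

In this model, the 256 pages touched by the micro-benchmark above
(1 MiB / 4096) cost 256 waits on the unbatched path and a single wait on
the batched path.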

With the above micro-benchmark, the elapsed time to finish the
program decreases by around 5%.

Typical elapsed time w/o patch:
 ~ # time taskset -c 4 ./a.out
 0.21user 14.34system 0:14.69elapsed
w/ patch:
 ~ # time taskset -c 4 ./a.out
 0.22user 13.45system 0:13.80elapsed

Also tested with the benchmark in the commit on a Kunpeng920 arm64 server
and observed an improvement of around 12.5% with the command
`time ./swap_bench`.
        w/o             w/
real    0m13.460s       0m11.771s
user    0m0.248s        0m0.279s
sys     0m12.039s       0m11.458s
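
The percentages quoted above can be rechecked from the raw timings.  The
helper below is a minimal sketch with a hypothetical name
(percent_reduction); it is not part of the patch.

```c
/* Hypothetical helper: relative time saved, in percent.
 * 'before' and 'after' are timings in the same unit; 'before' must be
 * non-zero. */
double percent_reduction(double before, double after)
{
        return (before - after) / before * 100.0;
}
```

With the elapsed times above, percent_reduction(14.69, 13.80) gives
roughly 6.1 on the Snapdragon 888 and percent_reduction(13.460, 11.771)
gives roughly 12.5 on the Kunpeng920.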

Originally, ptep_clear_flush() showed a 16.99% overhead, which this
patch eliminates:

[root@localhost yang]# perf record -- ./swap_bench &amp;&amp; perf report
[...]
16.99%  swap_bench  [kernel.kallsyms]  [k] ptep_clear_flush

It has been tested on 4, 8 and 128 CPU platforms and is shown to be
beneficial on large systems, but may bring no improvement on small
systems such as a 4 CPU platform.

This patch also improves the performance of page migration.  Using pmbench
and migrating its pages between node 0 and node 1 100 times for 1G of
memory, this patch decreases the time used by around 20% (18.338318910 sec
before, 13.981866350 sec after) and saves the time spent in
ptep_clear_flush().

Link: https://lkml.kernel.org/r/20230717131004.12662-5-yangyicong@huawei.com
Tested-by: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Tested-by: Xin Hao &lt;xhao@linux.alibaba.com&gt;
Tested-by: Punit Agrawal &lt;punit.agrawal@bytedance.com&gt;
Signed-off-by: Barry Song &lt;v-songbaohua@oppo.com&gt;
Signed-off-by: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Reviewed-by: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Reviewed-by: Xin Hao &lt;xhao@linux.alibaba.com&gt;
Reviewed-by: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Nadav Amit &lt;namit@vmware.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Darren Hart &lt;darren@os.amperecomputing.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: lipeifeng &lt;lipeifeng@oppo.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Steven Miao &lt;realmz6@gmail.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Zeng Tao &lt;prime.zeng@hisilicon.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Documentation/features: Use loongarch instead of loong</title>
<updated>2022-12-05T09:50:12+00:00</updated>
<author>
<name>Tiezhu Yang</name>
<email>yangtiezhu@loongson.cn</email>
</author>
<published>2022-12-04T12:18:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cc8c418b4fc09ed58ddd27b8e90ec797e9ca1e67'/>
<id>urn:sha1:cc8c418b4fc09ed58ddd27b8e90ec797e9ca1e67</id>
<content type='text'>
The official arch name is LoongArch [1], so we should use lowercase
loongarch instead of loong in Documentation/features. Just use
features-refresh.sh to refresh all the related files.

[1] https://www.kernel.org/doc/html/latest/loongarch/index.html

Fixes: 5860800e8696 ("Documentation/features: Update the arch support status files")
Signed-off-by: Tiezhu Yang &lt;yangtiezhu@loongson.cn&gt;
Link: https://lore.kernel.org/r/1670156327-9631-3-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</content>
</entry>
<entry>
<title>Documentation/features: Update the arch support status files</title>
<updated>2022-06-09T15:35:57+00:00</updated>
<author>
<name>Zheng Zengkai</name>
<email>zhengzengkai@huawei.com</email>
</author>
<published>2022-06-09T02:56:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5860800e8696d2cbbd1a0dd60b433549d176e668'/>
<id>urn:sha1:5860800e8696d2cbbd1a0dd60b433549d176e668</id>
<content type='text'>
The arch support status files don't match reality as of v5.19-rc1.
Use features-refresh.sh to refresh all the arch-support.txt files
in place.  The main effect is to add entries for the new loong
architecture.

Signed-off-by: Zheng Zengkai &lt;zhengzengkai@huawei.com&gt;
Link: https://lore.kernel.org/r/20220609025656.143460-1-zhengzengkai@huawei.com
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</content>
</entry>
<entry>
<title>Merge branch 'remove-h8300' of git://git.infradead.org/users/hch/misc into asm-generic</title>
<updated>2022-04-04T12:42:49+00:00</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2022-04-04T12:42:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fba2689ee77e63b05e203b3f26079ef915e55660'/>
<id>urn:sha1:fba2689ee77e63b05e203b3f26079ef915e55660</id>
<content type='text'>
* 'remove-h8300' of git://git.infradead.org/users/hch/misc:
  remove the h8300 architecture

This is clearly the least actively maintained architecture we have at
the moment, and probably the least useful. It is now the only one that
does not support MMUs at all, and most of the boards only support 4MB
of RAM, out of which the defconfig kernel needs more than half just
for .text/.data.

Guenter Roeck did the original patch to remove the architecture in 2013
after it had already been obsolete for a while, and Yoshinori Sato brought
it back in a much more modern form in 2015. Looking at the git history
since the reinstantiation, it's clear that almost all commits in the tree
are build fixes or cross-architecture cleanups:

$ git log --no-merges --format=%an v4.5.. arch/h8300/ | sort | uniq -c | sort -rn | head -n 12
     25 Masahiro Yamada
     18 Christoph Hellwig
     14 Mike Rapoport
      9 Arnd Bergmann
      8 Mark Rutland
      7 Peter Zijlstra
      6 Kees Cook
      6 Ingo Molnar
      6 Al Viro
      5 Randy Dunlap
      4 Yury Norov

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
</entry>
<entry>
<title>nds32: Remove the architecture</title>
<updated>2022-03-07T12:54:59+00:00</updated>
<author>
<name>Alan Kao</name>
<email>alankao@andestech.com</email>
</author>
<published>2022-03-02T07:42:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=aec499c75cf8e0b599be4d559e6922b613085f8f'/>
<id>urn:sha1:aec499c75cf8e0b599be4d559e6922b613085f8f</id>
<content type='text'>
The nds32 architecture, also known as AndeStar V3, is a custom 32-bit
RISC target designed by Andes Technologies. Support was added to the
kernel in 2016 as the replacement RISC-V based V5 processors were
already announced, and maintained by (current or former) Andes
employees.

As explained by Alan Kao, new customers are now all using RISC-V,
and all known nds32 users are already on longterm stable kernels
provided by Andes, with no development work going into mainline
support any more.

While the port is still in a reasonably good shape, it only gets
worse over time without active maintainers, so it seems best
to remove it before it becomes unusable. As always, if it turns
out that there are mainline users after all, and they volunteer
to maintain the port in the future, the removal can be reverted.

Link: https://lore.kernel.org/linux-mm/YhdWNLUhk+x9RAzU@yamatobi.andestech.com/
Link: https://lore.kernel.org/lkml/20220302065213.82702-1-alankao@andestech.com/
Link: https://www.andestech.com/en/products-solutions/andestar-architecture/
Signed-off-by: Alan Kao &lt;alankao@andestech.com&gt;
[arnd: rewrite changelog to provide more background]
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
</entry>
<entry>
<title>remove the h8300 architecture</title>
<updated>2022-02-23T07:52:50+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2022-02-23T07:47:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1c4b5ecb7ea190fa3e9f9d6891e6c90b60e04f24'/>
<id>urn:sha1:1c4b5ecb7ea190fa3e9f9d6891e6c90b60e04f24</id>
<content type='text'>
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>Documentation/features: mark BATCHED_UNMAP_TLB_FLUSH doesn't apply to ARM64</title>
<updated>2021-03-15T19:17:40+00:00</updated>
<author>
<name>Barry Song</name>
<email>song.bao.hua@hisilicon.com</email>
</author>
<published>2021-02-23T00:32:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6bfef171d0d74cb050112e0e49feb20bfddf7f42'/>
<id>urn:sha1:6bfef171d0d74cb050112e0e49feb20bfddf7f42</id>
<content type='text'>
BATCHED_UNMAP_TLB_FLUSH is used on x86 to do batched tlb shootdown by
sending one IPI to flush all TLB entries after unmapping pages, rather
than sending an IPI to flush each individual entry.
On arm64, tlb shootdown is done by hardware. Flush instructions are
inner-shareable. The local flushes are limited to boot (1 per CPU)
and to when a task is getting a new ASID.
So marking this feature as "TODO" is not proper, and ".." isn't good
either. This patch therefore adds "N/A" for features of this kind, which
are not needed on some architectures.

Signed-off-by: Barry Song &lt;song.bao.hua@hisilicon.com&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Link: https://lore.kernel.org/r/20210223003230.11976-1-song.bao.hua@hisilicon.com
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</content>
</entry>
<entry>
<title>Documentation: features: remove c6x references</title>
<updated>2021-02-25T18:25:57+00:00</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2021-02-25T14:27:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4f3c8320c78cdd11c8fdd23c33787407f719322e'/>
<id>urn:sha1:4f3c8320c78cdd11c8fdd23c33787407f719322e</id>
<content type='text'>
The references to arch/c6x are obsolete now that the architecture
is gone. Remove them.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Link: https://lore.kernel.org/r/20210225142841.3385428-1-arnd@kernel.org
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</content>
</entry>
<entry>
<title>arch: remove unicore32 port</title>
<updated>2020-07-01T09:09:13+00:00</updated>
<author>
<name>Mike Rapoport</name>
<email>rppt@linux.ibm.com</email>
</author>
<published>2020-06-10T06:45:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fb37409a01b011a664347702f44dbf13fa7c7486'/>
<id>urn:sha1:fb37409a01b011a664347702f44dbf13fa7c7486</id>
<content type='text'>
The unicore32 port does not seem to have been maintained for a long time
now. There is no upstream toolchain that can create unicore32 binaries and
all the links to prebuilt toolchains for unicore32 are dead. Even compilers
that were available are not supported by the kernel anymore.

Guenter Roeck says:

  I have stopped building unicore32 images since v4.19 since there is no
  available compiler that is still supported by the kernel. I am surprised
  that support for it has not been removed from the kernel.

Remove unicore32 port.

Signed-off-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Acked-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
</content>
</entry>
<entry>
<title>Documentation/features: Add csky kernel features</title>
<updated>2019-01-07T14:22:16+00:00</updated>
<author>
<name>Guo Ren</name>
<email>ren_guo@c-sky.com</email>
</author>
<published>2019-01-04T03:17:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8a5aaf97cc4876a9b61cb3b7c07128d4569ac536'/>
<id>urn:sha1:8a5aaf97cc4876a9b61cb3b7c07128d4569ac536</id>
<content type='text'>
      core/ cBPF-JIT             : TODO |
      core/ eBPF-JIT             : TODO |
      core/ generic-idle-thread  :  ok  |
      core/ jump-labels          : TODO |
      core/ tracehook            :  ok  |
     debug/ KASAN                : TODO |
     debug/ gcov-profile-all     : TODO |
     debug/ kgdb                 : TODO |
     debug/ kprobes-on-ftrace    : TODO |
     debug/ kprobes              : TODO |
     debug/ kretprobes           : TODO |
     debug/ optprobes            : TODO |
     debug/ stackprotector       : TODO |
     debug/ uprobes              : TODO |
     debug/ user-ret-profiler    : TODO |
        io/ dma-contiguous       :  ok  |
   locking/ cmpxchg-local        : TODO |
   locking/ lockdep              : TODO |
   locking/ queued-rwlocks       :  ok  |
   locking/ queued-spinlocks     : TODO |
   locking/ rwsem-optimized      : TODO |
      perf/ kprobes-event        : TODO |
      perf/ perf-regs            : TODO |
      perf/ perf-stackdump       : TODO |
     sched/ membarrier-sync-core : TODO |
     sched/ numa-balancing       :  ..  |
   seccomp/ seccomp-filter       : TODO |
      time/ arch-tick-broadcast  : TODO |
      time/ clockevents          :  ok  |
      time/ context-tracking     : TODO |
      time/ irq-time-acct        : TODO |
      time/ modern-timekeeping   :  ok  |
      time/ virt-cpuacct         : TODO |
        vm/ ELF-ASLR             : TODO |
        vm/ PG_uncached          : TODO |
        vm/ THP                  :  ..  |
        vm/ batch-unmap-tlb-flush: TODO |
        vm/ huge-vmap            : TODO |
        vm/ ioremap_prot         : TODO |
        vm/ numa-memblock        :  ..  |
        vm/ pte_special          : TODO |

Signed-off-by: Guo Ren &lt;ren_guo@c-sky.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
</entry>
</feed>
