diff options
author | Mike Rapoport <rppt@linux.vnet.ibm.com> | 2018-03-21 22:22:47 +0300 |
---|---|---|
committer | Jonathan Corbet <corbet@lwn.net> | 2018-04-16 23:18:15 +0300 |
commit | ad56b738c5dd223a2f66685830f82194025a6138 (patch) | |
tree | 3994f40f1f93aec279d0b5c9117c0085a9f9ab03 /Documentation/vm/unevictable-lru.txt | |
parent | 3406bb5c64a091ad887c3fb339ad88e9e88ef938 (diff) | |
download | linux-ad56b738c5dd223a2f66685830f82194025a6138.tar.xz |
docs/vm: rename documentation files to .rst
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Diffstat (limited to 'Documentation/vm/unevictable-lru.txt')
-rw-r--r-- | Documentation/vm/unevictable-lru.txt | 614 |
1 files changed, 0 insertions, 614 deletions
diff --git a/Documentation/vm/unevictable-lru.txt b/Documentation/vm/unevictable-lru.txt deleted file mode 100644 index fdd84cb8d511..000000000000 --- a/Documentation/vm/unevictable-lru.txt +++ /dev/null @@ -1,614 +0,0 @@ -.. _unevictable_lru: - -============================== -Unevictable LRU Infrastructure -============================== - -.. contents:: :local: - - -Introduction -============ - -This document describes the Linux memory manager's "Unevictable LRU" -infrastructure and the use of this to manage several types of "unevictable" -pages. - -The document attempts to provide the overall rationale behind this mechanism -and the rationale for some of the design decisions that drove the -implementation. The latter design rationale is discussed in the context of an -implementation description. Admittedly, one can obtain the implementation -details - the "what does it do?" - by reading the code. One hopes that the -descriptions below add value by provide the answer to "why does it do that?". - - - -The Unevictable LRU -=================== - -The Unevictable LRU facility adds an additional LRU list to track unevictable -pages and to hide these pages from vmscan. This mechanism is based on a patch -by Larry Woodman of Red Hat to address several scalability problems with page -reclaim in Linux. The problems have been observed at customer sites on large -memory x86_64 systems. - -To illustrate this with an example, a non-NUMA x86_64 platform with 128GB of -main memory will have over 32 million 4k pages in a single zone. When a large -fraction of these pages are not evictable for any reason [see below], vmscan -will spend a lot of time scanning the LRU lists looking for the small fraction -of pages that are evictable. This can result in a situation where all CPUs are -spending 100% of their time in vmscan for hours or days on end, with the system -completely unresponsive. - -The unevictable list addresses the following classes of unevictable pages: - - * Those owned by ramfs. - - * Those mapped into SHM_LOCK'd shared memory regions. - - * Those mapped into VM_LOCKED [mlock()ed] VMAs. - -The infrastructure may also be able to handle other conditions that make pages -unevictable, either by definition or by circumstance, in the future. - - -The Unevictable Page List -------------------------- - -The Unevictable LRU infrastructure consists of an additional, per-zone, LRU list -called the "unevictable" list and an associated page flag, PG_unevictable, to -indicate that the page is being managed on the unevictable list. - -The PG_unevictable flag is analogous to, and mutually exclusive with, the -PG_active flag in that it indicates on which LRU list a page resides when -PG_lru is set. - -The Unevictable LRU infrastructure maintains unevictable pages on an additional -LRU list for a few reasons: - - (1) We get to "treat unevictable pages just like we treat other pages in the - system - which means we get to use the same code to manipulate them, the - same code to isolate them (for migrate, etc.), the same code to keep track - of the statistics, etc..." [Rik van Riel] - - (2) We want to be able to migrate unevictable pages between nodes for memory - defragmentation, workload management and memory hotplug. The linux kernel - can only migrate pages that it can successfully isolate from the LRU - lists. If we were to maintain pages elsewhere than on an LRU-like list, - where they can be found by isolate_lru_page(), we would prevent their - migration, unless we reworked migration code to find the unevictable pages - itself. - - -The unevictable list does not differentiate between file-backed and anonymous, -swap-backed pages. This differentiation is only important while the pages are, -in fact, evictable. - -The unevictable list benefits from the "arrayification" of the per-zone LRU -lists and statistics originally proposed and posted by Christoph Lameter. - -The unevictable list does not use the LRU pagevec mechanism. Rather, -unevictable pages are placed directly on the page's zone's unevictable list -under the zone lru_lock. This allows us to prevent the stranding of pages on -the unevictable list when one task has the page isolated from the LRU and other -tasks are changing the "evictability" state of the page. - - -Memory Control Group Interaction --------------------------------- - -The unevictable LRU facility interacts with the memory control group [aka -memory controller; see Documentation/cgroup-v1/memory.txt] by extending the -lru_list enum. - -The memory controller data structure automatically gets a per-zone unevictable -list as a result of the "arrayification" of the per-zone LRU lists (one per -lru_list enum element). The memory controller tracks the movement of pages to -and from the unevictable list. - -When a memory control group comes under memory pressure, the controller will -not attempt to reclaim pages on the unevictable list. This has a couple of -effects: - - (1) Because the pages are "hidden" from reclaim on the unevictable list, the - reclaim process can be more efficient, dealing only with pages that have a - chance of being reclaimed. - - (2) On the other hand, if too many of the pages charged to the control group - are unevictable, the evictable portion of the working set of the tasks in - the control group may not fit into the available memory. This can cause - the control group to thrash or to OOM-kill tasks. - - -.. _mark_addr_space_unevict: - -Marking Address Spaces Unevictable ----------------------------------- - -For facilities such as ramfs none of the pages attached to the address space -may be evicted. To prevent eviction of any such pages, the AS_UNEVICTABLE -address space flag is provided, and this can be manipulated by a filesystem -using a number of wrapper functions: - - * ``void mapping_set_unevictable(struct address_space *mapping);`` - - Mark the address space as being completely unevictable. - - * ``void mapping_clear_unevictable(struct address_space *mapping);`` - - Mark the address space as being evictable. - - * ``int mapping_unevictable(struct address_space *mapping);`` - - Query the address space, and return true if it is completely - unevictable. - -These are currently used in two places in the kernel: - - (1) By ramfs to mark the address spaces of its inodes when they are created, - and this mark remains for the life of the inode. - - (2) By SYSV SHM to mark SHM_LOCK'd address spaces until SHM_UNLOCK is called. - - Note that SHM_LOCK is not required to page in the locked pages if they're - swapped out; the application must touch the pages manually if it wants to - ensure they're in memory. - - -Detecting Unevictable Pages ---------------------------- - -The function page_evictable() in vmscan.c determines whether a page is -evictable or not using the query function outlined above [see section -:ref:`Marking address spaces unevictable <mark_addr_space_unevict>`] -to check the AS_UNEVICTABLE flag. - -For address spaces that are so marked after being populated (as SHM regions -might be), the lock action (eg: SHM_LOCK) can be lazy, and need not populate -the page tables for the region as does, for example, mlock(), nor need it make -any special effort to push any pages in the SHM_LOCK'd area to the unevictable -list. Instead, vmscan will do this if and when it encounters the pages during -a reclamation scan. - -On an unlock action (such as SHM_UNLOCK), the unlocker (eg: shmctl()) must scan -the pages in the region and "rescue" them from the unevictable list if no other -condition is keeping them unevictable. If an unevictable region is destroyed, -the pages are also "rescued" from the unevictable list in the process of -freeing them. - -page_evictable() also checks for mlocked pages by testing an additional page -flag, PG_mlocked (as wrapped by PageMlocked()), which is set when a page is -faulted into a VM_LOCKED vma, or found in a vma being VM_LOCKED. - - -Vmscan's Handling of Unevictable Pages --------------------------------------- - -If unevictable pages are culled in the fault path, or moved to the unevictable -list at mlock() or mmap() time, vmscan will not encounter the pages until they -have become evictable again (via munlock() for example) and have been "rescued" -from the unevictable list. However, there may be situations where we decide, -for the sake of expediency, to leave a unevictable page on one of the regular -active/inactive LRU lists for vmscan to deal with. vmscan checks for such -pages in all of the shrink_{active|inactive|page}_list() functions and will -"cull" such pages that it encounters: that is, it diverts those pages to the -unevictable list for the zone being scanned. - -There may be situations where a page is mapped into a VM_LOCKED VMA, but the -page is not marked as PG_mlocked. Such pages will make it all the way to -shrink_page_list() where they will be detected when vmscan walks the reverse -map in try_to_unmap(). If try_to_unmap() returns SWAP_MLOCK, -shrink_page_list() will cull the page at that point. - -To "cull" an unevictable page, vmscan simply puts the page back on the LRU list -using putback_lru_page() - the inverse operation to isolate_lru_page() - after -dropping the page lock. Because the condition which makes the page unevictable -may change once the page is unlocked, putback_lru_page() will recheck the -unevictable state of a page that it places on the unevictable list. If the -page has become unevictable, putback_lru_page() removes it from the list and -retries, including the page_unevictable() test. Because such a race is a rare -event and movement of pages onto the unevictable list should be rare, these -extra evictabilty checks should not occur in the majority of calls to -putback_lru_page(). - - -MLOCKED Pages -============= - -The unevictable page list is also useful for mlock(), in addition to ramfs and -SYSV SHM. Note that mlock() is only available in CONFIG_MMU=y situations; in -NOMMU situations, all mappings are effectively mlocked. - - -History -------- - -The "Unevictable mlocked Pages" infrastructure is based on work originally -posted by Nick Piggin in an RFC patch entitled "mm: mlocked pages off LRU". -Nick posted his patch as an alternative to a patch posted by Christoph Lameter -to achieve the same objective: hiding mlocked pages from vmscan. - -In Nick's patch, he used one of the struct page LRU list link fields as a count -of VM_LOCKED VMAs that map the page. This use of the link field for a count -prevented the management of the pages on an LRU list, and thus mlocked pages -were not migratable as isolate_lru_page() could not find them, and the LRU list -link field was not available to the migration subsystem. - -Nick resolved this by putting mlocked pages back on the lru list before -attempting to isolate them, thus abandoning the count of VM_LOCKED VMAs. When -Nick's patch was integrated with the Unevictable LRU work, the count was -replaced by walking the reverse map to determine whether any VM_LOCKED VMAs -mapped the page. More on this below. - - -Basic Management ----------------- - -mlocked pages - pages mapped into a VM_LOCKED VMA - are a class of unevictable -pages. When such a page has been "noticed" by the memory management subsystem, -the page is marked with the PG_mlocked flag. This can be manipulated using the -PageMlocked() functions. - -A PG_mlocked page will be placed on the unevictable list when it is added to -the LRU. Such pages can be "noticed" by memory management in several places: - - (1) in the mlock()/mlockall() system call handlers; - - (2) in the mmap() system call handler when mmapping a region with the - MAP_LOCKED flag; - - (3) mmapping a region in a task that has called mlockall() with the MCL_FUTURE - flag - - (4) in the fault path, if mlocked pages are "culled" in the fault path, - and when a VM_LOCKED stack segment is expanded; or - - (5) as mentioned above, in vmscan:shrink_page_list() when attempting to - reclaim a page in a VM_LOCKED VMA via try_to_unmap() - -all of which result in the VM_LOCKED flag being set for the VMA if it doesn't -already have it set. - -mlocked pages become unlocked and rescued from the unevictable list when: - - (1) mapped in a range unlocked via the munlock()/munlockall() system calls; - - (2) munmap()'d out of the last VM_LOCKED VMA that maps the page, including - unmapping at task exit; - - (3) when the page is truncated from the last VM_LOCKED VMA of an mmapped file; - or - - (4) before a page is COW'd in a VM_LOCKED VMA. - - -mlock()/mlockall() System Call Handling ---------------------------------------- - -Both [do\_]mlock() and [do\_]mlockall() system call handlers call mlock_fixup() -for each VMA in the range specified by the call. In the case of mlockall(), -this is the entire active address space of the task. Note that mlock_fixup() -is used for both mlocking and munlocking a range of memory. A call to mlock() -an already VM_LOCKED VMA, or to munlock() a VMA that is not VM_LOCKED is -treated as a no-op, and mlock_fixup() simply returns. - -If the VMA passes some filtering as described in "Filtering Special Vmas" -below, mlock_fixup() will attempt to merge the VMA with its neighbors or split -off a subset of the VMA if the range does not cover the entire VMA. Once the -VMA has been merged or split or neither, mlock_fixup() will call -populate_vma_page_range() to fault in the pages via get_user_pages() and to -mark the pages as mlocked via mlock_vma_page(). - -Note that the VMA being mlocked might be mapped with PROT_NONE. In this case, -get_user_pages() will be unable to fault in the pages. That's okay. If pages -do end up getting faulted into this VM_LOCKED VMA, we'll handle them in the -fault path or in vmscan. - -Also note that a page returned by get_user_pages() could be truncated or -migrated out from under us, while we're trying to mlock it. To detect this, -populate_vma_page_range() checks page_mapping() after acquiring the page lock. -If the page is still associated with its mapping, we'll go ahead and call -mlock_vma_page(). If the mapping is gone, we just unlock the page and move on. -In the worst case, this will result in a page mapped in a VM_LOCKED VMA -remaining on a normal LRU list without being PageMlocked(). Again, vmscan will -detect and cull such pages. - -mlock_vma_page() will call TestSetPageMlocked() for each page returned by -get_user_pages(). We use TestSetPageMlocked() because the page might already -be mlocked by another task/VMA and we don't want to do extra work. We -especially do not want to count an mlocked page more than once in the -statistics. If the page was already mlocked, mlock_vma_page() need do nothing -more. - -If the page was NOT already mlocked, mlock_vma_page() attempts to isolate the -page from the LRU, as it is likely on the appropriate active or inactive list -at that time. If the isolate_lru_page() succeeds, mlock_vma_page() will put -back the page - by calling putback_lru_page() - which will notice that the page -is now mlocked and divert the page to the zone's unevictable list. If -mlock_vma_page() is unable to isolate the page from the LRU, vmscan will handle -it later if and when it attempts to reclaim the page. - - -Filtering Special VMAs ----------------------- - -mlock_fixup() filters several classes of "special" VMAs: - -1) VMAs with VM_IO or VM_PFNMAP set are skipped entirely. The pages behind - these mappings are inherently pinned, so we don't need to mark them as - mlocked. In any case, most of the pages have no struct page in which to so - mark the page. Because of this, get_user_pages() will fail for these VMAs, - so there is no sense in attempting to visit them. - -2) VMAs mapping hugetlbfs page are already effectively pinned into memory. We - neither need nor want to mlock() these pages. However, to preserve the - prior behavior of mlock() - before the unevictable/mlock changes - - mlock_fixup() will call make_pages_present() in the hugetlbfs VMA range to - allocate the huge pages and populate the ptes. - -3) VMAs with VM_DONTEXPAND are generally userspace mappings of kernel pages, - such as the VDSO page, relay channel pages, etc. These pages - are inherently unevictable and are not managed on the LRU lists. - mlock_fixup() treats these VMAs the same as hugetlbfs VMAs. It calls - make_pages_present() to populate the ptes. - -Note that for all of these special VMAs, mlock_fixup() does not set the -VM_LOCKED flag. Therefore, we won't have to deal with them later during -munlock(), munmap() or task exit. Neither does mlock_fixup() account these -VMAs against the task's "locked_vm". - -.. _munlock_munlockall_handling: - -munlock()/munlockall() System Call Handling -------------------------------------------- - -The munlock() and munlockall() system calls are handled by the same functions - -do_mlock[all]() - as the mlock() and mlockall() system calls with the unlock vs -lock operation indicated by an argument. So, these system calls are also -handled by mlock_fixup(). Again, if called for an already munlocked VMA, -mlock_fixup() simply returns. Because of the VMA filtering discussed above, -VM_LOCKED will not be set in any "special" VMAs. So, these VMAs will be -ignored for munlock. - -If the VMA is VM_LOCKED, mlock_fixup() again attempts to merge or split off the -specified range. The range is then munlocked via the function -populate_vma_page_range() - the same function used to mlock a VMA range - -passing a flag to indicate that munlock() is being performed. - -Because the VMA access protections could have been changed to PROT_NONE after -faulting in and mlocking pages, get_user_pages() was unreliable for visiting -these pages for munlocking. Because we don't want to leave pages mlocked, -get_user_pages() was enhanced to accept a flag to ignore the permissions when -fetching the pages - all of which should be resident as a result of previous -mlocking. - -For munlock(), populate_vma_page_range() unlocks individual pages by calling -munlock_vma_page(). munlock_vma_page() unconditionally clears the PG_mlocked -flag using TestClearPageMlocked(). As with mlock_vma_page(), -munlock_vma_page() use the Test*PageMlocked() function to handle the case where -the page might have already been unlocked by another task. If the page was -mlocked, munlock_vma_page() updates that zone statistics for the number of -mlocked pages. Note, however, that at this point we haven't checked whether -the page is mapped by other VM_LOCKED VMAs. - -We can't call try_to_munlock(), the function that walks the reverse map to -check for other VM_LOCKED VMAs, without first isolating the page from the LRU. -try_to_munlock() is a variant of try_to_unmap() and thus requires that the page -not be on an LRU list [more on these below]. However, the call to -isolate_lru_page() could fail, in which case we couldn't try_to_munlock(). So, -we go ahead and clear PG_mlocked up front, as this might be the only chance we -have. If we can successfully isolate the page, we go ahead and -try_to_munlock(), which will restore the PG_mlocked flag and update the zone -page statistics if it finds another VMA holding the page mlocked. If we fail -to isolate the page, we'll have left a potentially mlocked page on the LRU. -This is fine, because we'll catch it later if and if vmscan tries to reclaim -the page. This should be relatively rare. - - -Migrating MLOCKED Pages ------------------------ - -A page that is being migrated has been isolated from the LRU lists and is held -locked across unmapping of the page, updating the page's address space entry -and copying the contents and state, until the page table entry has been -replaced with an entry that refers to the new page. Linux supports migration -of mlocked pages and other unevictable pages. This involves simply moving the -PG_mlocked and PG_unevictable states from the old page to the new page. - -Note that page migration can race with mlocking or munlocking of the same page. -This has been discussed from the mlock/munlock perspective in the respective -sections above. Both processes (migration and m[un]locking) hold the page -locked. This provides the first level of synchronization. Page migration -zeros out the page_mapping of the old page before unlocking it, so m[un]lock -can skip these pages by testing the page mapping under page lock. - -To complete page migration, we place the new and old pages back onto the LRU -after dropping the page lock. The "unneeded" page - old page on success, new -page on failure - will be freed when the reference count held by the migration -process is released. To ensure that we don't strand pages on the unevictable -list because of a race between munlock and migration, page migration uses the -putback_lru_page() function to add migrated pages back to the LRU. - - -Compacting MLOCKED Pages ------------------------- - -The unevictable LRU can be scanned for compactable regions and the default -behavior is to do so. /proc/sys/vm/compact_unevictable_allowed controls -this behavior (see Documentation/sysctl/vm.txt). Once scanning of the -unevictable LRU is enabled, the work of compaction is mostly handled by -the page migration code and the same work flow as described in MIGRATING -MLOCKED PAGES will apply. - -MLOCKING Transparent Huge Pages -------------------------------- - -A transparent huge page is represented by a single entry on an LRU list. -Therefore, we can only make unevictable an entire compound page, not -individual subpages. - -If a user tries to mlock() part of a huge page, we want the rest of the -page to be reclaimable. - -We cannot just split the page on partial mlock() as split_huge_page() can -fail and new intermittent failure mode for the syscall is undesirable. - -We handle this by keeping PTE-mapped huge pages on normal LRU lists: the -PMD on border of VM_LOCKED VMA will be split into PTE table. - -This way the huge page is accessible for vmscan. Under memory pressure the -page will be split, subpages which belong to VM_LOCKED VMAs will be moved -to unevictable LRU and the rest can be reclaimed. - -See also comment in follow_trans_huge_pmd(). - -mmap(MAP_LOCKED) System Call Handling -------------------------------------- - -In addition the mlock()/mlockall() system calls, an application can request -that a region of memory be mlocked supplying the MAP_LOCKED flag to the mmap() -call. There is one important and subtle difference here, though. mmap() + mlock() -will fail if the range cannot be faulted in (e.g. because mm_populate fails) -and returns with ENOMEM while mmap(MAP_LOCKED) will not fail. The mmaped -area will still have properties of the locked area - aka. pages will not get -swapped out - but major page faults to fault memory in might still happen. - -Furthermore, any mmap() call or brk() call that expands the heap by a -task that has previously called mlockall() with the MCL_FUTURE flag will result -in the newly mapped memory being mlocked. Before the unevictable/mlock -changes, the kernel simply called make_pages_present() to allocate pages and -populate the page table. - -To mlock a range of memory under the unevictable/mlock infrastructure, the -mmap() handler and task address space expansion functions call -populate_vma_page_range() specifying the vma and the address range to mlock. - -The callers of populate_vma_page_range() will have already added the memory range -to be mlocked to the task's "locked_vm". To account for filtered VMAs, -populate_vma_page_range() returns the number of pages NOT mlocked. All of the -callers then subtract a non-negative return value from the task's locked_vm. A -negative return value represent an error - for example, from get_user_pages() -attempting to fault in a VMA with PROT_NONE access. In this case, we leave the -memory range accounted as locked_vm, as the protections could be changed later -and pages allocated into that region. - - -munmap()/exit()/exec() System Call Handling -------------------------------------------- - -When unmapping an mlocked region of memory, whether by an explicit call to -munmap() or via an internal unmap from exit() or exec() processing, we must -munlock the pages if we're removing the last VM_LOCKED VMA that maps the pages. -Before the unevictable/mlock changes, mlocking did not mark the pages in any -way, so unmapping them required no processing. - -To munlock a range of memory under the unevictable/mlock infrastructure, the -munmap() handler and task address space call tear down function -munlock_vma_pages_all(). The name reflects the observation that one always -specifies the entire VMA range when munlock()ing during unmap of a region. -Because of the VMA filtering when mlocking() regions, only "normal" VMAs that -actually contain mlocked pages will be passed to munlock_vma_pages_all(). - -munlock_vma_pages_all() clears the VM_LOCKED VMA flag and, like mlock_fixup() -for the munlock case, calls __munlock_vma_pages_range() to walk the page table -for the VMA's memory range and munlock_vma_page() each resident page mapped by -the VMA. This effectively munlocks the page, only if this is the last -VM_LOCKED VMA that maps the page. - - -try_to_unmap() --------------- - -Pages can, of course, be mapped into multiple VMAs. Some of these VMAs may -have VM_LOCKED flag set. It is possible for a page mapped into one or more -VM_LOCKED VMAs not to have the PG_mlocked flag set and therefore reside on one -of the active or inactive LRU lists. This could happen if, for example, a task -in the process of munlocking the page could not isolate the page from the LRU. -As a result, vmscan/shrink_page_list() might encounter such a page as described -in section "vmscan's handling of unevictable pages". To handle this situation, -try_to_unmap() checks for VM_LOCKED VMAs while it is walking a page's reverse -map. - -try_to_unmap() is always called, by either vmscan for reclaim or for page -migration, with the argument page locked and isolated from the LRU. Separate -functions handle anonymous and mapped file and KSM pages, as these types of -pages have different reverse map lookup mechanisms, with different locking. -In each case, whether rmap_walk_anon() or rmap_walk_file() or rmap_walk_ksm(), -it will call try_to_unmap_one() for every VMA which might contain the page. - -When trying to reclaim, if try_to_unmap_one() finds the page in a VM_LOCKED -VMA, it will then mlock the page via mlock_vma_page() instead of unmapping it, -and return SWAP_MLOCK to indicate that the page is unevictable: and the scan -stops there. - -mlock_vma_page() is called while holding the page table's lock (in addition -to the page lock, and the rmap lock): to serialize against concurrent mlock or -munlock or munmap system calls, mm teardown (munlock_vma_pages_all), reclaim, -holepunching, and truncation of file pages and their anonymous COWed pages. - - -try_to_munlock() Reverse Map Scan ---------------------------------- - -.. warning:: - [!] TODO/FIXME: a better name might be page_mlocked() - analogous to the - page_referenced() reverse map walker. - -When munlock_vma_page() [see section :ref:`munlock()/munlockall() System Call -Handling <munlock_munlockall_handling>` above] tries to munlock a -page, it needs to determine whether or not the page is mapped by any -VM_LOCKED VMA without actually attempting to unmap all PTEs from the -page. For this purpose, the unevictable/mlock infrastructure -introduced a variant of try_to_unmap() called try_to_munlock(). - -try_to_munlock() calls the same functions as try_to_unmap() for anonymous and -mapped file and KSM pages with a flag argument specifying unlock versus unmap -processing. Again, these functions walk the respective reverse maps looking -for VM_LOCKED VMAs. When such a VMA is found, as in the try_to_unmap() case, -the functions mlock the page via mlock_vma_page() and return SWAP_MLOCK. This -undoes the pre-clearing of the page's PG_mlocked done by munlock_vma_page. - -Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's -reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA. -However, the scan can terminate when it encounters a VM_LOCKED VMA. -Although try_to_munlock() might be called a great many times when munlocking a -large region or tearing down a large address space that has been mlocked via -mlockall(), overall this is a fairly rare event. - - -Page Reclaim in shrink_*_list() -------------------------------- - -shrink_active_list() culls any obviously unevictable pages - i.e. -!page_evictable(page) - diverting these to the unevictable list. -However, shrink_active_list() only sees unevictable pages that made it onto the -active/inactive lru lists. Note that these pages do not have PageUnevictable -set - otherwise they would be on the unevictable list and shrink_active_list -would never see them. - -Some examples of these unevictable pages on the LRU lists are: - - (1) ramfs pages that have been placed on the LRU lists when first allocated. - - (2) SHM_LOCK'd shared memory pages. shmctl(SHM_LOCK) does not attempt to - allocate or fault in the pages in the shared memory region. This happens - when an application accesses the page the first time after SHM_LOCK'ing - the segment. - - (3) mlocked pages that could not be isolated from the LRU and moved to the - unevictable list in mlock_vma_page(). - -shrink_inactive_list() also diverts any unevictable pages that it finds on the -inactive lists to the appropriate zone's unevictable list. - -shrink_inactive_list() should only see SHM_LOCK'd pages that became SHM_LOCK'd -after shrink_active_list() had moved them to the inactive list, or pages mapped -into VM_LOCKED VMAs that munlock_vma_page() couldn't isolate from the LRU to -recheck via try_to_munlock(). shrink_inactive_list() won't notice the latter, -but will pass on to shrink_page_list(). - -shrink_page_list() again culls obviously unevictable pages that it could -encounter for similar reason to shrink_inactive_list(). Pages mapped into -VM_LOCKED VMAs but without PG_mlocked set will make it all the way to -try_to_unmap(). shrink_page_list() will divert them to the unevictable list -when try_to_unmap() returns SWAP_MLOCK, as discussed above. |