Page owner: tracking who allocated each page
Introduction
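Page owner tracks who allocated each page. It can be used to debug memory leaks or to find memory hogs. When an allocation happens, information about it, such as the call stack and the page order, is stored in dedicated per-page storage. When we need to know the state of all pages, we can retrieve and analyze this information.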
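Although tracepoints for tracing page allocation and freeing already exist, using them to work out who allocated each page is rather complex. The trace buffer must be enlarged so that it is not overwritten before the user-space program is launched. The launched program then has to dump the trace buffer continually for later analysis, which is more likely to change system behaviour than simply keeping the information in memory, and that makes it a poor fit for debugging.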
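Page owner can be used for other purposes as well. For example, accurate fragmentation statistics can be obtained from the gfp flag information of each page. This is already implemented and is active whenever page owner is enabled. Other uses are more than welcome.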
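It can also show all stacks together with the number of base pages each currently has allocated, which gives a quick overview of where memory is going without having to screen through every page and match allocation and free operations.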
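Page owner is disabled by default. To use it, add "page_owner=on" to the boot command line. If the kernel is built with page owner but it stays disabled at runtime because the boot option was not given, the runtime overhead is marginal. When disabled at runtime, no memory is needed to store owner information, so there is no runtime memory overhead. In addition, page owner inserts only two unlikely branches into the page allocator hot path; if it is not enabled, allocation proceeds just as in a kernel without page owner. These two unlikely branches should not affect allocation performance, especially when the static keys jump label patching functionality is available.
The effect on kernel code size is as follows: enabling page owner increases the kernel size by several kilobytes, but most of this code lies outside the page allocator and its hot path. Building the kernel with page owner and turning it on only when needed is a good option for debugging kernel memory problems.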
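Because page owner only becomes active when the boot option is given, it can be handy to confirm that option before collecting data. The following is a minimal sketch, assuming the running kernel exposes its boot command line at /proc/cmdline and that the option was spelled exactly "page_owner=on" as described above. The function names are hypothetical, and the check only tells you that the option was requested, not that the kernel was built with page owner.

```python
def page_owner_requested(cmdline: str) -> bool:
    """Return True if the command line contains the token page_owner=on."""
    # Boot parameters are whitespace-separated, so compare whole tokens;
    # this avoids matching e.g. "page_owner=off" or "xpage_owner=on".
    return "page_owner=on" in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the boot command line (assumption: Linux with procfs mounted)."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
```

On a Linux system, `page_owner_requested(read_cmdline())` performs the check against the running kernel.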
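There is one caveat caused by an implementation detail. Page owner stores its information in memory provided by the struct page extension. On sparse memory systems this memory is initialized some time after the page allocator starts, so many pages can be allocated before initialization and they will have no owner information. To address this, these early-allocated pages are examined and marked as allocated during the initialization phase. This does not give them correct owner information, but it does let us tell more accurately whether a page is allocated. On an x86-64 VM with 2GB of memory, 13343 early-allocated pages were caught and marked, most of them allocated by the struct page extension feature itself. After that point, no page is left untracked.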
Usage
1) Build the user-space helper:

    cd tools/mm
    make page_owner_sort

2) Enable page owner: add "page_owner=on" to the boot command line.

3) Run the job that you want to debug.

4) Analyze the information from page owner:

    cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks.txt
    cat stacks.txt
     post_alloc_hook+0x177/0x1a0
     get_page_from_freelist+0xd01/0xd80
     __alloc_pages+0x39e/0x7e0
     allocate_slab+0xbc/0x3f0
     ___slab_alloc+0x528/0x8a0
     kmem_cache_alloc+0x224/0x3b0
     sk_prot_alloc+0x58/0x1a0
     sk_alloc+0x32/0x4f0
     inet_create+0x427/0xb50
     __sock_create+0x2e4/0x650
     inet_ctl_sock_create+0x30/0x180
     igmp_net_init+0xc1/0x130
     ops_init+0x167/0x410
     setup_net+0x304/0xa60
     copy_net_ns+0x29b/0x4a0
     create_new_namespaces+0x4a1/0x820
    nr_base_pages: 16
    ...
    ...
    echo 7000 > /sys/kernel/debug/page_owner_stacks/count_threshold
    cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks_7000.txt
    cat stacks_7000.txt
     post_alloc_hook+0x177/0x1a0
     get_page_from_freelist+0xd01/0xd80
     __alloc_pages+0x39e/0x7e0
     alloc_pages_mpol+0x22e/0x490
     folio_alloc+0xd5/0x110
     filemap_alloc_folio+0x78/0x230
     page_cache_ra_order+0x287/0x6f0
     filemap_get_pages+0x517/0x1160
     filemap_read+0x304/0x9f0
     xfs_file_buffered_read+0xe6/0x1d0 [xfs]
     xfs_file_read_iter+0x1f0/0x380 [xfs]
     __kernel_read+0x3b9/0x730
     kernel_read_file+0x309/0x4d0
     __do_sys_finit_module+0x381/0x730
     do_syscall_64+0x8d/0x150
     entry_SYSCALL_64_after_hwframe+0x62/0x6a
    nr_base_pages: 20824
    ...

    cat /sys/kernel/debug/page_owner > page_owner_full.txt
    ./page_owner_sort page_owner_full.txt sorted_page_owner.txt
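The show_stacks dump can be post-processed to rank stacks by how many base pages they currently hold. The sketch below is only an illustration: it assumes each entry is a run of stack-frame lines terminated by a "nr_base_pages: N" line, as in the sample output above, and the function names are hypothetical rather than part of any kernel tool.

```python
import re
from typing import List, Tuple

# Matches the line that closes each stack entry, e.g. "nr_base_pages: 16".
NR_BASE_PAGES_RE = re.compile(r"^nr_base_pages:\s*(\d+)\s*$")


def parse_show_stacks(text: str) -> List[Tuple[int, List[str]]]:
    """Return a list of (nr_base_pages, frames) entries from a show_stacks dump."""
    entries = []
    frames: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate entries
        match = NR_BASE_PAGES_RE.match(line)
        if match:
            # The count line ends the current stack; store it and start a new one.
            entries.append((int(match.group(1)), frames))
            frames = []
        else:
            frames.append(line)
    # Frames not followed by a count line (truncated input) are dropped.
    return entries


def top_stacks(text: str, limit: int = 5) -> List[Tuple[int, List[str]]]:
    """Return the `limit` entries holding the most base pages, largest first."""
    return sorted(parse_show_stacks(text), key=lambda e: e[0], reverse=True)[:limit]
```

For example, `top_stacks(open("stacks.txt").read(), limit=3)` would list the three stacks with the largest nr_base_pages values.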
The general output of page_owner_full.txt is as follows:

    Page allocated via order XXX, ...
    PFN XXX ...
    // Detailed stack

    Page allocated via order XXX, ...
    PFN XXX ...
    // Detailed stack

By default, it dumps every pfn. To start from a given pfn, page_owner supports fseek:

    FILE *fp = fopen("/sys/kernel/debug/page_owner", "r");
    fseek(fp, pfn_start, SEEK_SET);
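The page_owner_sort tool ignores the PFN lines, puts the remaining lines into buf, uses a regular expression to extract the page order value, counts the times and pages of each buf, and finally sorts them according to the given parameters. See sorted_page_owner.txt for the result showing who allocated each page. General output:

    XXX times, XXX pages:
    Page allocated via order XXX, ...
    // Detailed stack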
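The processing steps just described can be sketched in a few lines. This is not the page_owner_sort implementation, only an illustration under stated assumptions: records are separated by blank lines, lines starting with "PFN" are dropped, the order is taken from the first "order N" match, and a block of order N counts as 2^N pages. The names `summarize` and `sorted_report` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

ORDER_RE = re.compile(r"order\s+(\d+)")


def summarize(text: str) -> Dict[str, List[int]]:
    """Map each distinct block text (PFN lines removed) to [times, pages]."""
    stats: Dict[str, List[int]] = {}
    # Assumption: records in the dump are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip("\n")):
        kept = [ln for ln in block.splitlines() if not ln.lstrip().startswith("PFN")]
        buf = "\n".join(kept)
        match = ORDER_RE.search(buf)
        if match is None:
            continue  # not an allocation record, skip it
        entry = stats.setdefault(buf, [0, 0])
        entry[0] += 1                         # times this block was seen
        entry[1] += 1 << int(match.group(1))  # 2**order base pages
    return stats


def sorted_report(text: str, by_pages: bool = False) -> List[Tuple[str, List[int]]]:
    """Sort blocks by times (default) or by pages, largest first."""
    index = 1 if by_pages else 0
    return sorted(summarize(text).items(), key=lambda kv: kv[1][index], reverse=True)
```

Passing `by_pages=True` corresponds loosely to sorting by total memory instead of by times. Blocks are only merged here when their remaining text is identical.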
By default, page_owner_sort sorts by the times of buf. To sort by the page count of buf instead, use the -m parameter. The detailed parameters are:

Fundamental functions:
    Sort:
        -a              Sort by memory allocation time.
        -m              Sort by total memory.
        -p              Sort by pid.
        -P              Sort by tgid.
        -n              Sort by task command name.
        -r              Sort by memory release time.
        -s              Sort by stack trace.
        -t              Sort by times (default).
        --sort <order>  Specify sorting order. Sorting syntax is [+|-]key[,[+|-]key[,...]].
                        Choose a key from the **STANDARD FORMAT SPECIFIERS** section. The "+" is
                        optional since default direction is increasing numerical or lexicographic
                        order. Mixed use of abbreviated and complete-form of keys is allowed.

        Examples:
                        ./page_owner_sort <input> <output> --sort=n,+pid,-tgid
                        ./page_owner_sort <input> <output> --sort=at
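To make the --sort syntax concrete, here is a minimal sketch of how a specification such as n,+pid,-tgid breaks down into keys and directions. It is an illustration, not the tool's own parser: the key table is copied from the STANDARD FORMAT SPECIFIERS section, "-" is assumed to mean decreasing order, and `parse_sort_spec` is a hypothetical name.

```python
from typing import List, Tuple

# Abbreviated key -> complete form, as listed for the --sort option.
SORT_KEYS = {
    "p": "pid",
    "tg": "tgid",
    "n": "name",
    "st": "stacktrace",
    "T": "txt",
    "ft": "free_ts",
    "at": "alloc_ts",
    "ator": "allocator",
}


def parse_sort_spec(spec: str) -> List[Tuple[str, bool]]:
    """Parse "[+|-]key[,[+|-]key[,...]]" into (complete_key, descending) pairs."""
    complete_forms = set(SORT_KEYS.values())
    parsed = []
    for token in spec.split(","):
        descending = token.startswith("-")
        # Strip one optional leading sign; "+" is the default direction.
        key = token[1:] if token and token[0] in "+-" else token
        # Abbreviated and complete forms may be mixed freely.
        key = SORT_KEYS.get(key, key)
        if key not in complete_forms:
            raise ValueError(f"unknown sort key: {token!r}")
        parsed.append((key, descending))
    return parsed
```

With this sketch, the first example above parses to name ascending, pid ascending, then tgid descending.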
Enhanced functions:

    Cull:
        --cull <rules>
                        Specify culling rules. Culling syntax is key[,key[,...]]. Choose a
                        multi-letter key from the **STANDARD FORMAT SPECIFIERS** section.
        <rules> is a single argument in the form of a comma-separated list,
        which offers a way to specify individual culling rules. The recognized
        keywords are described in the **STANDARD FORMAT SPECIFIERS** section below.
        <rules> can be specified by the sequence of keys k1,k2, ..., as described in
        the **STANDARD FORMAT SPECIFIERS** section below. Mixed use of abbreviated and
        complete-form of keys is allowed.

        Examples:
                        ./page_owner_sort <input> <output> --cull=stacktrace
                        ./page_owner_sort <input> <output> --cull=st,pid,name
                        ./page_owner_sort <input> <output> --cull=n,f

    Filter:
        -f              Filter out the information of blocks whose memory has been released.

    Select:
        --pid <pidlist>         Select by pid. This selects the blocks whose process ID
                                numbers appear in <pidlist>.
        --tgid <tgidlist>       Select by tgid. This selects the blocks whose thread
                                group ID numbers appear in <tgidlist>.
        --name <cmdlist>        Select by task command name. This selects the blocks whose
                                task command name appear in <cmdlist>.

        <pidlist>, <tgidlist>, <cmdlist> are single arguments in the form of a comma-separated list,
        which offers a way to specify individual selecting rules.

        Examples:
                        ./page_owner_sort <input> <output> --pid=1
                        ./page_owner_sort <input> <output> --tgid=1,2,3
                        ./page_owner_sort <input> <output> --name name1,name2
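One way to picture culling and selecting is as grouping and filtering over parsed records. The sketch below assumes each record has already been parsed into a dict with fields named after the complete-form keys plus a "pages" count. That record layout, and the names `cull` and `select_by_pid`, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Abbreviated key -> complete form, as listed for the --cull option.
CULL_KEYS = {
    "p": "pid",
    "tg": "tgid",
    "n": "name",
    "f": "free",
    "st": "stacktrace",
    "ator": "allocator",
}


def cull(records: List[dict], rules: str) -> Dict[tuple, Tuple[int, int]]:
    """Group records that agree on every key in `rules`; return (times, pages) per group."""
    keys = [CULL_KEYS.get(k, k) for k in rules.split(",")]
    groups: Dict[tuple, Tuple[int, int]] = {}
    for rec in records:
        group_id = tuple(rec[k] for k in keys)
        times, pages = groups.get(group_id, (0, 0))
        groups[group_id] = (times + 1, pages + rec["pages"])
    return groups


def select_by_pid(records: List[dict], pidlist: str) -> List[dict]:
    """Keep only records whose pid appears in the comma-separated pidlist."""
    wanted = {int(p) for p in pidlist.split(",")}
    return [rec for rec in records if rec["pid"] in wanted]
```

Culling by "st" alone merges every record that shares a stack trace, while "st,pid,name" keeps separate totals per process for the same stack.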
STANDARD FORMAT SPECIFIERS
For --sort option:
    KEY     LONG        DESCRIPTION
    p       pid         process ID
    tg      tgid        thread group ID
    n       name        task command name
    st      stacktrace  stack trace of the page allocation
    T       txt         full text of block
    ft      free_ts     timestamp of the page when it was released
    at      alloc_ts    timestamp of the page when it was allocated
    ator    allocator   memory allocator for pages
For --cull option:
    KEY     LONG        DESCRIPTION
    p       pid         process ID
    tg      tgid        thread group ID
    n       name        task command name
    f       free        whether the page has been released or not
    st      stacktrace  stack trace of the page allocation
    ator    allocator   memory allocator for pages