    mm: thp: move deferred split queue to memcg's nodeinfo · 1d1b4c6c
    Yang Shi committed
    Commit 87eaceb3faa59b9b4d940ec9554ce251325d83fe ("mm: thp: make
    deferred split shrinker memcg aware") makes the deferred split queue
    per memcg to resolve the memcg premature OOM problem.  But all nodes
    end up sharing the same queue, whereas before that commit there was
    one queue per node.  This is not a big deal for memcg limit reclaim,
    but it may cause global kswapd to shrink THPs from a different node.
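
    To illustrate the cross-node effect, here is a minimal user-space
    model in C.  It is a sketch under simplified assumptions, not the
    kernel code: the names memcg_model, thp_page, MAX_NODES,
    deferred_split_add() and deferred_split_scan() are hypothetical, and
    a pthread mutex stands in for the kernel's spinlock.  With a single
    queue per memcg, a scan done on behalf of node 0 can pick up a THP
    that lives on node 1.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space model of the pre-change layout, not kernel code. */
#define MAX_NODES 2

struct thp_page {
    int nid;                 /* NUMA node the huge page lives on */
    struct thp_page *next;
};

/* One deferred split queue per memcg, shared by every node. */
struct memcg_model {
    pthread_mutex_t split_queue_lock;  /* a single lock for all nodes */
    struct thp_page *split_queue;
    unsigned long split_queue_len;
};

static void memcg_init(struct memcg_model *memcg)
{
    pthread_mutex_init(&memcg->split_queue_lock, NULL);
    memcg->split_queue = NULL;
    memcg->split_queue_len = 0;
}

/* Queue a THP for deferred splitting; pages from any node share one list. */
static void deferred_split_add(struct memcg_model *memcg, struct thp_page *page)
{
    assert(page->nid >= 0 && page->nid < MAX_NODES);
    pthread_mutex_lock(&memcg->split_queue_lock);
    page->next = memcg->split_queue;
    memcg->split_queue = page;
    memcg->split_queue_len++;
    pthread_mutex_unlock(&memcg->split_queue_lock);
}

/* Scan up to nr_to_scan pages on behalf of kswapd for node nid and
 * return how many of the dequeued pages actually belong to other nodes. */
static unsigned long deferred_split_scan(struct memcg_model *memcg, int nid,
                                         unsigned long nr_to_scan)
{
    unsigned long foreign = 0;

    pthread_mutex_lock(&memcg->split_queue_lock);
    while (nr_to_scan > 0 && memcg->split_queue) {
        struct thp_page *page = memcg->split_queue;

        memcg->split_queue = page->next;
        memcg->split_queue_len--;
        nr_to_scan--;
        if (page->nid != nid)
            foreign++;       /* a THP from a different node gets split */
    }
    pthread_mutex_unlock(&memcg->split_queue_lock);
    return foreign;
}
```

    For example, after queueing one THP on node 0 and then one on node 1,
    a scan for node 0 with nr_to_scan = 2 reports one foreign page.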
    
    And, 0-day testing reported a -19.6% regression in stress-ng's madvise
    test [1].  I didn't see that much regression on my test box (24 threads,
    48GB memory, 2 nodes).  With the same test (stress-ng --timeout 1
    --metrics-brief --sequential 72  --class vm --exclude spawn,exec), I saw
    an average -3% regression sometimes, but not every time; in some runs I
    saw no regression at all.  (I ran the same test 10 times and computed
    the average, since in my testing the test itself may vary by up to
    15%.)
    
    This might be caused by deferred split queue lock contention.  With some
    configurations (e.g. just one root memcg) the lock contention may be
    worse than before (given 2 nodes, two locks are reduced to one).
    
    So, move the deferred split queue to memcg's nodeinfo to make it NUMA
    aware again.
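
    The per-node layout can be sketched with the same kind of user-space
    model.  Again this is an illustration under assumptions, not the
    patch itself: split_queue_model, memcg_model, get_queue() and
    MAX_NODES are hypothetical names, and pthread mutexes stand in for
    kernel spinlocks.  Each node's slot owns its own queue and lock, the
    queue is chosen from the page's node, and a scan for node nid only
    touches that node's queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space model of the per-node layout, not kernel code. */
#define MAX_NODES 2

struct thp_page {
    int nid;                 /* NUMA node the huge page lives on */
    struct thp_page *next;
};

/* One deferred split queue with its own lock. */
struct split_queue_model {
    pthread_mutex_t lock;
    struct thp_page *head;
    unsigned long len;
};

/* Each per-node info slot of the memcg owns a separate queue. */
struct memcg_model {
    struct split_queue_model nodeinfo[MAX_NODES];
};

static void memcg_init(struct memcg_model *memcg)
{
    for (int nid = 0; nid < MAX_NODES; nid++) {
        pthread_mutex_init(&memcg->nodeinfo[nid].lock, NULL);
        memcg->nodeinfo[nid].head = NULL;
        memcg->nodeinfo[nid].len = 0;
    }
}

/* Select the queue of the page's own node, so pages on different
 * nodes are queued under different locks. */
static struct split_queue_model *get_queue(struct memcg_model *memcg,
                                           const struct thp_page *page)
{
    assert(page->nid >= 0 && page->nid < MAX_NODES);
    return &memcg->nodeinfo[page->nid];
}

static void deferred_split_add(struct memcg_model *memcg, struct thp_page *page)
{
    struct split_queue_model *q = get_queue(memcg, page);

    pthread_mutex_lock(&q->lock);
    page->next = q->head;
    q->head = page;
    q->len++;
    pthread_mutex_unlock(&q->lock);
}

/* kswapd for node nid scans only that node's queue; returns pages taken. */
static unsigned long deferred_split_scan(struct memcg_model *memcg, int nid,
                                         unsigned long nr_to_scan)
{
    assert(nid >= 0 && nid < MAX_NODES);
    struct split_queue_model *q = &memcg->nodeinfo[nid];
    unsigned long taken = 0;

    pthread_mutex_lock(&q->lock);
    while (taken < nr_to_scan && q->head) {
        struct thp_page *page = q->head;

        q->head = page->next;
        q->len--;
        taken++;             /* every page in this queue is on node nid */
    }
    pthread_mutex_unlock(&q->lock);
    return taken;
}
```

    Queueing one THP per node leaves one page in each node's queue; a scan
    for node 0 then takes only node 0's page and never touches node 1's
    queue or lock.  Splitting the queue by node both restores NUMA locality
    for kswapd and, on a 2-node box, turns one contended lock back into two.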
    
    With this change, stress-ng's madvise test sometimes shows an average 4%
    improvement, and I no longer saw any degradation.
    
    [1]: https://lore.kernel.org/lkml/20190930084604.GC17687@shao2-debian/
    
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: David Rientjes <rientjes@google.com>
    Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
    Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>