[kernel/f15] mm: Do not stall in synchronous compaction for THP allocations

Tue Nov 15 22:15:41 UTC 2011

commit b4cf83d3b8fcd978c7ff581208c470d78b4b6fba
Author: Dave Jones <davej at redhat.com>
Date:   Tue Nov 15 17:15:30 2011 -0500

    mm: Do not stall in synchronous compaction for THP allocations

 kernel.spec                                        |    5 +
 ...ynchronous-compaction-for-THP-allocations.patch |  115 ++++++++++++++++++++
 2 files changed, 120 insertions(+), 0 deletions(-)
---

diff --git a/kernel.spec b/kernel.spec
index f86b030..703893c 100644
--- a/kernel.spec
+++ b/kernel.spec
@@ -668,6 +668,7 @@ Patch21001: arm-smsc-support-reading-mac-address-from-device-tree.patch
 #rhbz #735946
 Patch21020: 0001-mm-vmscan-Limit-direct-reclaim-for-higher-order-allo.patch
 Patch21021: 0002-mm-Abort-reclaim-compaction-if-compaction-can-procee.patch
+Patch21022: mm-do-not-stall-in-synchronous-compaction-for-THP-allocations.patch
 
 #rhbz 748691
 Patch21030: be2net-non-member-vlan-pkts-not-received-in-promisco.patch
@@ -1245,6 +1246,7 @@ ApplyPatch utrace.patch
 #rhbz #735946
 ApplyPatch 0001-mm-vmscan-Limit-direct-reclaim-for-higher-order-allo.patch
 ApplyPatch 0002-mm-Abort-reclaim-compaction-if-compaction-can-procee.patch
+ApplyPatch mm-do-not-stall-in-synchronous-compaction-for-THP-allocations.patch
 
 #rhbz 748691
 ApplyPatch be2net-non-member-vlan-pkts-not-received-in-promisco.patch
@@ -1884,6 +1886,9 @@ fi
 # and build.
 
 %changelog
+* Tue Nov 15 2011 Dave Jones <davej at redhat.com>
+- mm: Do not stall in synchronous compaction for THP allocations
+
 * Mon Nov 14 2011 Josh Boyer <jwboyer at redhat.com>
 - Patch from Joshua Roys to add rtl8192* to modules.networking (rhbz 753645)
 - Add patch for wacom tablets for Bastien Nocera (upstream 3797ef6b6)
diff --git a/mm-do-not-stall-in-synchronous-compaction-for-THP-allocations.patch b/mm-do-not-stall-in-synchronous-compaction-for-THP-allocations.patch
new file mode 100644
index 0000000..6202341
--- /dev/null
+++ b/mm-do-not-stall-in-synchronous-compaction-for-THP-allocations.patch
@@ -0,0 +1,115 @@
+https://lkml.org/lkml/2011/11/10/173
+
+Date	Thu, 10 Nov 2011 10:06:16 +0000
+From	Mel Gorman <>
+Subject	[PATCH] mm: Do not stall in synchronous compaction for THP allocations
+	
+
+Occasionally during large file copies to slow storage, there are still
+reports of user-visible stalls when THP is enabled. Reports on this
+have been intermittent and not reliably reproducible locally, but:
+
+Andy Isaacson reported a problem copying to VFAT on SD Card
+	https://lkml.org/lkml/2011/11/7/2
+
+	In this case, it was stuck in munmap for between 20 and 60
+	seconds in compaction. It is also possible that khugepaged
+	was holding mmap_sem on this process if CONFIG_NUMA was set.
+
+Johannes Weiner reported stalls on USB
+	https://lkml.org/lkml/2011/7/25/378
+
+	In this case, there is no stack trace but it looks like the
+	same problem. The USB stick may have been using NTFS as a
+	filesystem based on other work done related to writing back
+	to USB around the same time.
+
+Internally in SUSE, I received a bug report related to stalls in firefox
+	when using Java and Flash heavily while copying from NFS
+	to VFAT on USB. It has not been confirmed to be the same problem
+	but if it looks like a duck and quacks like a duck.....
+In the past, commit [11bc82d6: mm: compaction: Use async migration for
+__GFP_NO_KSWAPD and enforce no writeback] forced that sync compaction
+would never be used for THP allocations. This was reverted in commit
+[c6a140bf: mm/compaction: reverse the change that forbade sync
+migraton with __GFP_NO_KSWAPD] on the grounds that it was uncertain
+it was beneficial.
+
+While user-visible stalls do not happen for me when writing to USB,
+I set up a test running postmark while short-lived processes created
+anonymous mappings. The objective was to exercise the paths that
+allocate transparent huge pages. I then logged when processes were
+stalled for more than 1 second, recorded a stack trace and did some
+analysis to aggregate unique "stall events", which revealed:
+
+Time stalled in this event:    47369 ms
+Event count:                      20
+usemem               sleep_on_page          3690 ms
+usemem               sleep_on_page          2148 ms
+usemem               sleep_on_page          1534 ms
+usemem               sleep_on_page          1518 ms
+usemem               sleep_on_page          1225 ms
+usemem               sleep_on_page          2205 ms
+usemem               sleep_on_page          2399 ms
+usemem               sleep_on_page          2398 ms
+usemem               sleep_on_page          3760 ms
+usemem               sleep_on_page          1861 ms
+usemem               sleep_on_page          2948 ms
+usemem               sleep_on_page          1515 ms
+usemem               sleep_on_page          1386 ms
+usemem               sleep_on_page          1882 ms
+usemem               sleep_on_page          1850 ms
+usemem               sleep_on_page          3715 ms
+usemem               sleep_on_page          3716 ms
+usemem               sleep_on_page          4846 ms
+usemem               sleep_on_page          1306 ms
+usemem               sleep_on_page          1467 ms
+[<ffffffff810ef30c>] wait_on_page_bit+0x6c/0x80
+[<ffffffff8113de9f>] unmap_and_move+0x1bf/0x360
+[<ffffffff8113e0e2>] migrate_pages+0xa2/0x1b0
+[<ffffffff81134273>] compact_zone+0x1f3/0x2f0
+[<ffffffff811345d8>] compact_zone_order+0xa8/0xf0
+[<ffffffff811346ff>] try_to_compact_pages+0xdf/0x110
+[<ffffffff810f773a>] __alloc_pages_direct_compact+0xda/0x1a0
+[<ffffffff810f7d5d>] __alloc_pages_slowpath+0x55d/0x7a0
+[<ffffffff810f8151>] __alloc_pages_nodemask+0x1b1/0x1c0
+[<ffffffff811331db>] alloc_pages_vma+0x9b/0x160
+[<ffffffff81142bb0>] do_huge_pmd_anonymous_page+0x160/0x270
+[<ffffffff814410a7>] do_page_fault+0x207/0x4c0
+[<ffffffff8143dde5>] page_fault+0x25/0x30
+The stall times are approximate at best but the estimates represent 25%
+of the worst stalls and even if the estimates are off by a factor of
+10, it's severe.
+
+This patch once again prevents sync migration for transparent
+hugepage allocations as it is preferable to fail a THP allocation
+than stall. It was suggested that __GFP_NORETRY be used instead of
+__GFP_NO_KSWAPD. This would look less like a special case but would
+still cause compaction to run at least once with sync compaction.
+
+If accepted, this is a -stable candidate.
+
+Reported-by: Andy Isaacson <adi at hexapodia.org>
+Reported-by: Johannes Weiner <hannes at cmpxchg.org>
+Signed-off-by: Mel Gorman <mgorman at suse.de>
+---
+
+diff --git a/mm/page_alloc.c b/mm/page_alloc.c
+index 9dd443d..84bf962 100644
+--- a/mm/page_alloc.c
++++ b/mm/page_alloc.c
+@@ -2168,7 +2168,13 @@ rebalance:
+ 					sync_migration);
+ 	if (page)
+ 		goto got_pg;
+-	sync_migration = true;
++
++	/*
++	 * Do not use sync migration for transparent hugepage allocations as
++	 * it could stall writing back pages which is far worse than simply
++	 * failing to promote a page.
++	 */
++	sync_migration = !(gfp_mask & __GFP_NO_KSWAPD);
+ 
+ 	/* Try direct reclaim and then allocating */
+ 	page = __alloc_pages_direct_reclaim(gfp_mask, order,
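
Illustrative note (not part of the patch): the changed assignment makes the
second compaction attempt synchronous only when the allocation does not carry
__GFP_NO_KSWAPD, the flag that THP allocations set according to the changelog
above. The following is a minimal userspace sketch of that decision. The
DEMO_* bit values and the demo_* helper names are hypothetical placeholders
chosen for illustration; the real flag definitions live in the kernel's
include/linux/gfp.h and may differ.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Placeholder bit values for illustration only (assumption): they are
 * not taken from the kernel headers.
 */
#define DEMO_GFP_WAIT       0x10u
#define DEMO_GFP_NO_KSWAPD  0x400000u

/*
 * Hypothetical helper mirroring the patched line
 *     sync_migration = !(gfp_mask & __GFP_NO_KSWAPD);
 * Returns true when the retry may use synchronous migration.
 */
static bool demo_use_sync_migration(unsigned int gfp_mask)
{
	return !(gfp_mask & DEMO_GFP_NO_KSWAPD);
}

/* Small self-check covering both cases. */
static void demo_check(void)
{
	/* Ordinary allocation: the retry is allowed to block on writeback. */
	assert(demo_use_sync_migration(DEMO_GFP_WAIT));

	/* THP-style allocation: stay asynchronous, fail rather than stall. */
	assert(!demo_use_sync_migration(DEMO_GFP_WAIT | DEMO_GFP_NO_KSWAPD));
}
```

Before the patch the equivalent of this helper always returned true, so a
THP page fault could end up sleeping in wait_on_page_bit() as shown in the
stack trace in the changelog.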