[kernel/f16] Add patch to fix BUG_ON mm->nr_ptes (rhbz 786632)

Josh Boyer jwboyer at fedoraproject.org
Mon Mar 5 20:39:37 UTC 2012


commit c07883e8e825714cc1dab4d6252902840ab23f24
Author: Josh Boyer <jwboyer at redhat.com>
Date:   Mon Mar 5 15:39:03 2012 -0500

    Add patch to fix BUG_ON mm->nr_ptes (rhbz 786632)

 kernel.spec                        |    7 ++
 mm-thp-fix-BUG-on-mm-nr_ptes.patch |  121 ++++++++++++++++++++++++++++++++++++
 2 files changed, 128 insertions(+), 0 deletions(-)
---
diff --git a/kernel.spec b/kernel.spec
index a917a86..e1f8dde 100644
--- a/kernel.spec
+++ b/kernel.spec
@@ -827,6 +827,9 @@ Patch21243: disable-threading-in-compression-for-hibernate.patch
 Patch21244: regset-Prevent-null-pointer-reference-on-readonly-re.patch
 Patch21245: regset-Return-EFAULT-not-EIO-on-host-side-memory-fau.patch
 
+#rhbz 786632
+Patch21246: mm-thp-fix-BUG-on-mm-nr_ptes.patch
+
 Patch21300: unhandled-irqs-switch-to-polling.patch
 
 # compat-wireless patches
@@ -1580,6 +1583,9 @@ ApplyPatch disable-threading-in-compression-for-hibernate.patch
 ApplyPatch regset-Prevent-null-pointer-reference-on-readonly-re.patch
 ApplyPatch regset-Return-EFAULT-not-EIO-on-host-side-memory-fau.patch
 
+#rhbz 786632
+ApplyPatch mm-thp-fix-BUG-on-mm-nr_ptes.patch
+
 ApplyPatch unhandled-irqs-switch-to-polling.patch
 
 # END OF PATCH APPLICATIONS
@@ -2400,6 +2406,7 @@ fi
 %changelog
 * Mon Mar 05 2012 Josh Boyer <jwboyer at redhat.com>
 - CVE-2012-1097 regset: Prevent null pointer reference on readonly regsets
+- Add patch to fix BUG_ON mm->nr_ptes (rhbz 786632)
 
 * Fri Mar 02 2012 Dave Jones <davej at redhat.com>
 - Enable VM debugging in non-debug kernels too.
diff --git a/mm-thp-fix-BUG-on-mm-nr_ptes.patch b/mm-thp-fix-BUG-on-mm-nr_ptes.patch
new file mode 100644
index 0000000..44be5d5
--- /dev/null
+++ b/mm-thp-fix-BUG-on-mm-nr_ptes.patch
@@ -0,0 +1,121 @@
+On Thu, 16 Feb 2012, Andrea Arcangeli wrote:
+> On Thu, Feb 16, 2012 at 01:53:04AM -0800, Hugh Dickins wrote:
+> > Yes (and I think less troublesome than most BUGs, coming at exit
+> > while not holding locks; though we could well make it a WARN_ON,
+> > I don't think that existed back in the day).
+> 
+> A WARN_ON would be fine with me, go ahead if you prefer it... only
+> risk would be to go unnoticed or be underestimated. I am ok with the
+> BUG_ON too (even if this time it triggered false positives... sigh).
+> 
+> > Acked-by: Hugh Dickins <hughd at google.com>
+> 
+> Thanks for the quick review!
+> 
+> > In looking into the bug, it had actually bothered me a little that you
+> > were setting aside those pages, yet not counting them into nr_ptes;
+> > though the only thing that cares is oom_kill.c, and the count of pages
+> > in each hugepage can only dwarf the count in nr_ptes (whereas, without
+> hugepages, it's possible to populate very sparsely and nr_ptes can
+> become significant).
+> 
+> Agreed, it's not significant either way.
+> 
+> I've been running my two primary systems with this applied for half a
+> day with no problems so far, so it should be good for -mm at least.
+
+And I've had no trouble running your patch since then (but I never hit
+the bug it fixes either).  But we've all forgotten about it, so let me
+bring your patch back inline (I've added one introductory sentence) and
+address it to akpm...
+
+
+From: Andrea Arcangeli <aarcange at redhat.com>
+Subject: [PATCH] mm: thp: fix BUG on mm->nr_ptes
+
+Dave Jones reports a few Fedora users hitting the BUG_ON(mm->nr_ptes...)
+in exit_mmap() recently.
+
+Quoting Hugh's discovery and explanation of the SMP race condition:
+
+===
+mm->nr_ptes had unusual locking: down_read mmap_sem plus
+page_table_lock when incrementing, down_write mmap_sem (or mm_users 0)
+when decrementing; whereas THP is careful to increment and decrement
+it under page_table_lock.
+
+Now most of those paths in THP also hold mmap_sem for read or write
+(with appropriate checks on mm_users), but two do not: when
+split_huge_page() is called by hwpoison_user_mappings(), and when
+called by add_to_swap().
+
+It's conceivable that the latter case is responsible for the
+exit_mmap() BUG_ON mm->nr_ptes that has been reported on Fedora.
+===
+
+The simplest way to fix it without altering the locking is to make
+split_huge_page() a noop in nr_ptes terms, by counting the preallocated
+pagetables that exist for every mapped hugepage. Not counting them was
+an arbitrary choice, and neither way is wrong, since the pagetables
+are unused but still allocated.
+
+Reported-by: Dave Jones <davej at redhat.com>
+Reported-by: Hugh Dickins <hughd at google.com>
+Signed-off-by: Andrea Arcangeli <aarcange at redhat.com>
+Acked-by: Hugh Dickins <hughd at google.com>
+---
+ mm/huge_memory.c |    6 +++---
+ 1 files changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/mm/huge_memory.c b/mm/huge_memory.c
+index 91d3efb..8f7fc39 100644
+--- a/mm/huge_memory.c
++++ b/mm/huge_memory.c
+@@ -671,6 +671,7 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
+ 		set_pmd_at(mm, haddr, pmd, entry);
+ 		prepare_pmd_huge_pte(pgtable, mm);
+ 		add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR);
++		mm->nr_ptes++;
+ 		spin_unlock(&mm->page_table_lock);
+ 	}
+ 
+@@ -789,6 +790,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
+ 	pmd = pmd_mkold(pmd_wrprotect(pmd));
+ 	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
+ 	prepare_pmd_huge_pte(pgtable, dst_mm);
++	dst_mm->nr_ptes++;
+ 
+ 	ret = 0;
+ out_unlock:
+@@ -887,7 +889,6 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
+ 	}
+ 	kfree(pages);
+ 
+-	mm->nr_ptes++;
+ 	smp_wmb(); /* make pte visible before pmd */
+ 	pmd_populate(mm, pmd, pgtable);
+ 	page_remove_rmap(page);
+@@ -1047,6 +1048,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ 			VM_BUG_ON(page_mapcount(page) < 0);
+ 			add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+ 			VM_BUG_ON(!PageHead(page));
++			tlb->mm->nr_ptes--;
+ 			spin_unlock(&tlb->mm->page_table_lock);
+ 			tlb_remove_page(tlb, page);
+ 			pte_free(tlb->mm, pgtable);
+@@ -1375,7 +1377,6 @@ static int __split_huge_page_map(struct page *page,
+ 			pte_unmap(pte);
+ 		}
+ 
+-		mm->nr_ptes++;
+ 		smp_wmb(); /* make pte visible before pmd */
+ 		/*
+ 		 * Up to this point the pmd is present and huge and
+@@ -1988,7 +1989,6 @@ static void collapse_huge_page(struct mm_struct *mm,
+ 	set_pmd_at(mm, address, pmd, _pmd);
+ 	update_mmu_cache(vma, address, _pmd);
+ 	prepare_pmd_huge_pte(pgtable, mm);
+-	mm->nr_ptes--;
+ 	spin_unlock(&mm->page_table_lock);
+ 
+ #ifndef CONFIG_NUMA

