[kernel] CVE-2012-2372 mm: 32bit PAE pmd walk vs populate SMP race (rhbz 822821 822825)

Josh Boyer jwboyer at fedoraproject.org
Thu May 24 11:47:28 UTC 2012


commit 440886d0908f9f900d0fcd4b77d0838e940df7d6
Author: Josh Boyer <jwboyer at redhat.com>
Date:   Thu May 24 07:43:48 2012 -0400

    CVE-2012-2372 mm: 32bit PAE pmd walk vs populate SMP race (rhbz 822821 822825)

 kernel.spec                                        |   11 +-
 ...d-walk-vs-pmd_populate-SMP-race-condition.patch |  272 ++++++++++++++++++++
 2 files changed, 282 insertions(+), 1 deletions(-)
---
diff --git a/kernel.spec b/kernel.spec
index 42854aa..0a91e02 100644
--- a/kernel.spec
+++ b/kernel.spec
@@ -62,7 +62,7 @@ Summary: The Linux kernel
 # For non-released -rc kernels, this will be appended after the rcX and
 # gitX tags, so a 3 here would become part of release "0.rcX.gitX.3"
 #
-%global baserelease 1
+%global baserelease 2
 %global fedora_build %{baserelease}
 
 # base_sublevel is the kernel version we're starting with and patching
@@ -753,6 +753,9 @@ Patch22000: weird-root-dentry-name-debug.patch
 #selinux ptrace child permissions
 Patch22001: selinux-apply-different-permission-to-ptrace-child.patch
 
+#rhbz 822825 822821 CVE-2012-2372
+Patch22021: mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch
+
 # END OF PATCH DEFINITIONS
 
 %endif
@@ -1451,6 +1454,9 @@ ApplyPatch highbank-export-clock-functions.patch
 #vgaarb patches.  blame mjg59
 ApplyPatch vgaarb-vga_default_device.patch
 
+#rhbz 822825 822821 CVE-2012-2372
+ApplyPatch mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch
+
 # END OF PATCH APPLICATIONS
 
 %endif
@@ -2301,6 +2307,9 @@ fi
 #                 ||----w |
 #                 ||     ||
 %changelog
+* Thu May 24 2012 Josh Boyer <jwboyer at redhat.com>
+- CVE-2012-2372 mm: 32bit PAE pmd walk vs populate SMP race (rhbz 822821 822825)
+
 * Thu May 24 2012 Peter Robinson <pbrobinson at fedoraproject.org>
 - Don't build Nokia ARM device support
 
diff --git a/mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch b/mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch
new file mode 100644
index 0000000..49ff98a
--- /dev/null
+++ b/mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch
@@ -0,0 +1,272 @@
+From: Andrea Arcangeli <aarcange at redhat.com>
+Newsgroups: gmane.linux.kernel.mm
+Subject: [PATCH] mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition
+Date: Thu, 24 May 2012 01:39:01 +0200
+Message-ID: <1337816341-30743-1-git-send-email-aarcange at redhat.com>
+References: <20120518230028.GF32479 at redhat.com>
+In-Reply-To: <20120518230028.GF32479 at redhat.com>
+To: linux-mm at kvack.org
+Cc: Andrew Morton <akpm at linux-foundation.org>, Mel Gorman <mgorman at suse.de>,
+        Hugh Dickins <hughd at google.com>, Larry Woodman <lwoodman at redhat.com>,
+        Petr Matousek <pmatouse at redhat.com>,
+        Ulrich Obergfell <uobergfe at redhat.com>, Rik van Riel <riel at redhat.com>
+Archived-At: <http://permalink.gmane.org/gmane.linux.kernel.mm/78936>
+
+When holding the mmap_sem for reading, pmd_offset_map_lock should only
+run on a pmd_t that has been read atomically from the pmdp pointer;
+otherwise we may read only half of it, leading to this crash.
+
+PID: 11679  TASK: f06e8000  CPU: 3   COMMAND: "do_race_2_panic"
+ #0 [f06a9dd8] crash_kexec at c049b5ec
+ #1 [f06a9e2c] oops_end at c083d1c2
+ #2 [f06a9e40] no_context at c0433ded
+ #3 [f06a9e64] bad_area_nosemaphore at c043401a
+ #4 [f06a9e6c] __do_page_fault at c0434493
+ #5 [f06a9eec] do_page_fault at c083eb45
+ #6 [f06a9f04] error_code (via page_fault) at c083c5d5
+    EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP: 00000000
+    DS:  007b     ESI: 9e201000 ES:  007b     EDI: 01fb4700 GS:  00e0
+    CS:  0060     EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
+ #7 [f06a9f38] _spin_lock at c083bc14
+ #8 [f06a9f44] sys_mincore at c0507b7d
+ #9 [f06a9fb0] system_call at c083becd
+                         start           len
+    EAX: ffffffda  EBX: 9e200000  ECX: 00001000  EDX: 6228537f
+    DS:  007b      ESI: 00000000  ES:  007b      EDI: 003d0f00
+    SS:  007b      ESP: 62285354  EBP: 62285388  GS:  0033
+    CS:  0073      EIP: 00291416  ERR: 000000da  EFLAGS: 00000286
+
+This should be a longstanding bug affecting x86 32bit PAE without
+THP. Only archs with a 64bit large pmd_t and a 32bit unsigned long
+should be affected.
+
+With THP enabled, the barrier() in
+pmd_none_or_trans_huge_or_clear_bad() would partly hide the bug when
+the pmd transitions from none to stable, by forcing a re-read of the
+*pmd in pmd_offset_map_lock, but a new set of problems then arises,
+because the pmd can transition freely between the none,
+pmd_trans_huge and pmd_trans_stable states. So making the barrier in
+pmd_none_or_trans_huge_or_clear_bad() unconditional isn't a good
+idea; it would be a flaky solution.
+
+This should be fully fixed by introducing a pmd_read_atomic that
+reads the pmd in a defined order when THP is disabled, or reads the
+pmd atomically with cmpxchg8b when THP is enabled.
+
+Luckily this new race condition only triggers in places that must
+already be covered by pmd_none_or_trans_huge_or_clear_bad(), so the
+fix is localized there, even though the bug itself is not related to
+THP.
+
+NOTE: this can only trigger on x86 32bit systems with PAE enabled and
+more than 4G of RAM; otherwise the high part of the pmd is zero at
+all times, so it can never be truncated, which in turn hides the SMP
+race (e.g. the populated entry 0x00000003fda38067 below maps a
+physical address above 4G, which is why its high word is nonzero).
+
+This bug was discovered and fully debugged by Ulrich, quote:
+
+----
+[..]
+pmd_none_or_trans_huge_or_clear_bad() loads the pmd entry into edx
+and eax.
+
+    496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t
+    *pmd)
+    497 {
+    498         /* depend on compiler for an atomic pmd read */
+    499         pmd_t pmdval = *pmd;
+
+                                // edi = pmd pointer
+0xc0507a74 <sys_mincore+548>:   mov    0x8(%esp),%edi
+...
+                                // edx = PTE page table high address
+0xc0507a84 <sys_mincore+564>:   mov    0x4(%edi),%edx
+...
+                                // eax = PTE page table low address
+0xc0507a8e <sys_mincore+574>:   mov    (%edi),%eax
+
+[..]
+
+Please note that the PMD is not read atomically. These are two "mov"
+instructions where the high order bits of the PMD entry are fetched
+first. Hence, the above machine code is prone to the following race.
+
+-  The PMD entry {high|low} is 0x0000000000000000.
+   The "mov" at 0xc0507a84 loads 0x00000000 into edx.
+
+-  A page fault (on another CPU) sneaks in between the two "mov"
+   instructions and instantiates the PMD.
+
+-  The PMD entry {high|low} is now 0x00000003fda38067.
+   The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
+----
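
To make the interleaving concrete, here is a minimal user-space analog
of the torn read (an editor's sketch, not part of the patch; the file
and function names are illustrative). One thread plays the part of
pmd_populate vs MADV_DONTNEED by flipping a 64bit value between the two
states above, while the other reads the two halves separately, high
word first, just like the sys_mincore disassembly:

    /* torn_read.c - build: gcc -O2 -pthread torn_read.c (-latomic
     * may be needed on 32bit targets); assumes little endian */
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t pmd;    /* stand-in for a PAE pmd entry */

    static void *writer(void *arg)
    {
            (void)arg;
            for (;;) {
                    /* "pmd_populate": like set_64bit, one atomic store */
                    __atomic_store_n(&pmd, 0x00000003fda38067ull,
                                     __ATOMIC_RELAXED);
                    /* "MADV_DONTNEED": back to none */
                    __atomic_store_n(&pmd, 0, __ATOMIC_RELAXED);
            }
            return NULL;
    }

    int main(void)
    {
            volatile uint32_t *half = (volatile uint32_t *)&pmd;
            pthread_t t;

            pthread_create(&t, NULL, writer, NULL);
            for (;;) {
                    /* non-atomic 64bit read, high word first, as gcc
                     * compiled the "*pmd" dereference in sys_mincore */
                    uint32_t hi = half[1];
                    uint32_t lo = half[0];

                    /* halves from two different states == partial pmd */
                    if ((hi == 0 && lo != 0) || (hi != 0 && lo == 0)) {
                            printf("torn read: %08x%08x\n", hi, lo);
                            return 1;
                    }
            }
    }

On an SMP machine the reader can observe a mixed value such as
00000000fda38067, which is precisely the partial pmd the crash above
dereferenced.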
+
+Reported-by: Ulrich Obergfell <uobergfe at redhat.com>
+Signed-off-by: Andrea Arcangeli <aarcange at redhat.com>
+---
+ arch/x86/include/asm/pgtable-3level.h |   50 +++++++++++++++++++++++++++++++++
+ include/asm-generic/pgtable.h         |   22 +++++++++++++-
+ 2 files changed, 70 insertions(+), 2 deletions(-)
+
+diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
+index effff47..43876f1 100644
+--- a/arch/x86/include/asm/pgtable-3level.h
++++ b/arch/x86/include/asm/pgtable-3level.h
+@@ -31,6 +31,56 @@ static inline void native_set_pte(pte_t *ptep, pte_t pte)
+ 	ptep->pte_low = pte.pte_low;
+ }
+ 
++#define pmd_read_atomic pmd_read_atomic
++/*
++ * pte_offset_map_lock on 32bit PAE kernels was reading the pmd_t with
++ * a plain "*pmdp" dereference done by gcc. The problem is that in
++ * certain places where pte_offset_map_lock is called, concurrent page
++ * faults are allowed if the mmap_sem is held for reading. An example
++ * is mincore vs page faults vs MADV_DONTNEED. On the page fault side
++ * pmd_populate rightfully does a set_64bit, but if we're reading the
++ * pmd_t with a "*pmdp" on the mincore side, an SMP race can happen
++ * because gcc will not read the 64bit of the pmd atomically. To fix
++ * this, all places running pmd_offset_map_lock() while holding the
++ * mmap_sem in read mode shall read the pmdp pointer using this
++ * function, to know if the pmd is null or not, and in turn to know
++ * if they can run pmd_offset_map_lock or pmd_trans_huge or other pmd
++ * operations.
++ *
++ * Without THP, if the mmap_sem is held for reading, the pmd can only
++ * transition from null to not null while pmd_read_atomic runs, so
++ * there is no need to literally read it atomically.
++ *
++ * With THP, if the mmap_sem is held for reading, the pmd can become
++ * THP or null or point to a pte (and in turn become "stable") at any
++ * time under pmd_read_atomic, so it's mandatory to read it atomically
++ * with cmpxchg8b.
++ */
++#ifndef CONFIG_TRANSPARENT_HUGEPAGE
++static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
++{
++	pmdval_t ret;
++	u32 *tmp = (u32 *)pmdp;
++
++	ret = (pmdval_t) (*tmp);
++	if (ret) {
++		/*
++		 * If the low part is null, we must not read the high part
++		 * or we can end up with a partial pmd.
++		 */
++		smp_rmb();
++		ret |= ((pmdval_t)*(tmp + 1)) << 32;
++	}
++
++	return (pmd_t) { ret };
++}
++#else /* CONFIG_TRANSPARENT_HUGEPAGE */
++static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
++{
++	return (pmd_t) { atomic64_read((atomic64_t *)pmdp) };
++}
++#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
++
+ static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte)
+ {
+ 	set_64bit((unsigned long long *)(ptep), native_pte_val(pte));
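
Before the generic fallback below, a note on the two branches just
added: without THP the pmd can only go from none to populated under
the reader, so reading the low word first and, only if it is set, the
high word after an smp_rmb() is sufficient; with THP a genuinely
atomic 64bit load is required, which atomic64_read() implements with
cmpxchg8b on 32bit x86. The same contrast can be expressed in user
space with compiler builtins (editor's illustration, not kernel code):

    #include <stdint.h>

    /* A plain 64bit load on 32bit x86 is compiled as two 32bit movs
     * and can tear between them. */
    static inline uint64_t load64_plain(const uint64_t *p)
    {
            return *p;
    }

    /* __atomic_load_n forces a single atomic access (cmpxchg8b on
     * i586 and later, possibly a libatomic call elsewhere), the same
     * guarantee the THP version above gets from atomic64_read(). */
    static inline uint64_t load64_atomic(const uint64_t *p)
    {
            return __atomic_load_n(p, __ATOMIC_RELAXED);
    }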
+diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
+index 125c54e..fa596d9 100644
+--- a/include/asm-generic/pgtable.h
++++ b/include/asm-generic/pgtable.h
+@@ -446,6 +446,18 @@ static inline int pmd_write(pmd_t pmd)
+ #endif /* __HAVE_ARCH_PMD_WRITE */
+ #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+ 
++#ifndef pmd_read_atomic
++static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
++{
++	/*
++	 * Depend on the compiler for an atomic pmd read. NOTE: this is
++	 * only going to work if the pmdval_t isn't larger than an
++	 * unsigned long.
++	 */
++	return *pmdp;
++}
++#endif
++
+ /*
+  * This function is meant to be used by sites walking pagetables with
+  * the mmap_sem hold in read mode to protect against MADV_DONTNEED and
+@@ -459,11 +471,17 @@ static inline int pmd_write(pmd_t pmd)
+  * undefined so behaving like if the pmd was none is safe (because it
+  * can return none anyway). The compiler level barrier() is critically
+  * important to compute the two checks atomically on the same pmdval.
++ *
++ * For 32bit kernels with a 64bit large pmd_t this automatically takes
++ * care of reading the pmd atomically to avoid SMP race conditions
++ * against pmd_populate() when the mmap_sem is held for reading by the
++ * caller (a special atomic read, not done by "gcc" as in the generic
++ * version above, is also needed when THP is disabled, because the
++ * page fault can populate the pmd from under us).
+  */
+ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
+ {
+-	/* depend on compiler for an atomic pmd read */
+-	pmd_t pmdval = *pmd;
++	pmd_t pmdval = pmd_read_atomic(pmd);
+ 	/*
+ 	 * The barrier will stabilize the pmdval in a register or on
+ 	 * the stack so that it will stop changing under the code.
+
+--
+
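
For context, the caller pattern this fix protects looks like the
simplified walker below (an editor's sketch of the kernel idiom around
pte_offset_map_lock in the style of the mincore path; walk_pte_range
is a hypothetical name, not code from this patch). Callers keep
testing the pmd through pmd_none_or_trans_huge_or_clear_bad(), which
now reads it via pmd_read_atomic():

    /* Sketch: walking a pte range with the mmap_sem held for reading.
     * A concurrent pmd_populate() on another CPU can no longer be
     * observed half-written, because the none/huge/bad check reads
     * the pmd with pmd_read_atomic() instead of a plain dereference. */
    static void walk_pte_range(struct mm_struct *mm, pmd_t *pmd,
                               unsigned long addr, unsigned long end)
    {
            spinlock_t *ptl;
            pte_t *pte;

            if (pmd_none_or_trans_huge_or_clear_bad(pmd))
                    return; /* none, huge or bad: no pte page to map */

            pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
            do {
                    /* ... inspect *pte, e.g. fill the mincore vector ... */
            } while (pte++, addr += PAGE_SIZE, addr != end);
            pte_unmap_unlock(pte - 1, ptl);
    }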

