rpms/kernel/devel linux-2.6-die-closed-source-bios-muppets-die.patch, NONE, 1.1 linux-2.6-intel-iommu-updates.patch, NONE, 1.1 config-x86-generic, 1.86, 1.87 config-x86_64-generic, 1.88, 1.89 kernel.spec, 1.1706, 1.1707

David Woodhouse dwmw2 at fedoraproject.org
Mon Aug 10 14:21:09 UTC 2009


Author: dwmw2

Update of /cvs/pkgs/rpms/kernel/devel
In directory cvs1.fedora.phx.redhat.com:/tmp/cvs-serv17907

Modified Files:
	config-x86-generic config-x86_64-generic kernel.spec 
Added Files:
	linux-2.6-die-closed-source-bios-muppets-die.patch 
	linux-2.6-intel-iommu-updates.patch 
Log Message:
Updates and workarounds for Intel IOMMU; re-enable it

linux-2.6-die-closed-source-bios-muppets-die.patch:
 pci-quirks.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- NEW FILE linux-2.6-die-closed-source-bios-muppets-die.patch ---
Subject: [PATCH] Work around BIOS bugs by quiescing USB controllers earlier
From: David Woodhouse <dwmw2 at infradead.org>
To: "linux-usb at vger.kernel.org" <linux-usb at vger.kernel.org>
Date: Mon, 03 Aug 2009 12:40:27 +0100
Message-Id: <1249299627.14968.1.camel at macbook.infradead.org>
Cc: iommu at lists.linux-foundation.org

We are seeing a number of crashes in SMM when VT-d is enabled on
systems whose BIOS has 'Legacy USB support' enabled.

The BIOS is supposed to indicate which addresses it uses for DMA in a
special ACPI table ("RMRR"), so that we can punch a hole for it when we
set up the IOMMU.

The problem is, as usual, that BIOS engineers are totally incompetent.
They write code which will crash if the DMA goes AWOL, and then they
either neglect to provide an RMRR table at all, or they put the wrong
addresses in it. And of course they don't do _any_ QA, since that would
take too much time away from their crack-smoking habit.

The real fix, of course, is for consumers to refuse to buy motherboards
which only have closed-source firmware available. If we had _open_
firmware, bugs like this would be easy to fix.

Since that's something I can only dream about, this patch implements an
alternative -- ensuring that the USB controllers are handed off from the
BIOS and quiesced _before_ the IOMMU is initialised. That would have
been a much better design than this RMRR nonsense in the first place, of
course. The bootloader has no business doing DMA after the OS has booted
anyway.

Signed-off-by: David Woodhouse <David.Woodhouse at intel.com>
---
Is this reasonable? At first glance, it looks like everything we do here
is perfectly OK to be done earlier.

diff --git a/drivers/usb/host/pci-quirks.c b/drivers/usb/host/pci-quirks.c
index 83b5f9c..7708886 100644
--- a/drivers/usb/host/pci-quirks.c
+++ b/drivers/usb/host/pci-quirks.c
@@ -475,4 +478,4 @@ static void __devinit quirk_usb_early_handoff(struct pci_dev *pdev)
 	else if (pdev->class == PCI_CLASS_SERIAL_USB_XHCI)
 		quirk_usb_handoff_xhci(pdev);
 }
-DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, quirk_usb_early_handoff);
+DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, quirk_usb_early_handoff);


-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse at intel.com                              Intel Corporation


linux-2.6-intel-iommu-updates.patch:
 arch/x86/kernel/pci-dma.c     |    4 
 arch/x86/kernel/pci-swiotlb.c |    5 
 drivers/pci/dmar.c            |   22 +++
 drivers/pci/intel-iommu.c     |  243 +++++++++++++++++++-----------------------
 drivers/pci/iova.c            |   16 --
 include/linux/iova.h          |    1 
 6 files changed, 135 insertions(+), 156 deletions(-)

--- NEW FILE linux-2.6-intel-iommu-updates.patch ---
commit 5fe60f4e5871b64e687229199fafd4ef13cd0886
Author: David Woodhouse <David.Woodhouse at intel.com>
Date:   Sun Aug 9 10:53:41 2009 +0100

    intel-iommu: make domain_add_dev_info() call domain_context_mapping()
    
    All callers of the former were also calling the latter, in one order or
    the other, and failing to correctly clean up if the second returned
    failure.
    
    Signed-off-by: David Woodhouse <David.Woodhouse at intel.com>

commit a131bc185528331451a93db6c50a7d2070376a61
Merge: 19943b0 ff1649f
Author: David Woodhouse <David.Woodhouse at intel.com>
Date:   Sat Aug 8 11:25:28 2009 +0100

    Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6
    
    Pull fixes in from 2.6.31 so that people testing the iommu-2.6.git tree
    no longer trip over bugs which were already fixed (sorry, Horms).

commit 19943b0e30b05d42e494ae6fef78156ebc8c637e
Author: David Woodhouse <David.Woodhouse at intel.com>
Date:   Tue Aug 4 16:19:20 2009 +0100

    intel-iommu: Unify hardware and software passthrough support
    
    This makes the hardware passthrough mode work a lot more like the
    software version, so that the behaviour of a kernel with 'iommu=pt'
    is the same whether the hardware supports passthrough or not.
    
    In particular:
     - We use a single si_domain for the pass-through devices.
     - 32-bit devices can be taken out of the pass-through domain so that
       they don't have to use swiotlb.
     - Devices will work again after being removed from a KVM guest.
     - A potential oops on OOM (in init_context_pass_through()) is fixed.
    
    Signed-off-by: David Woodhouse <David.Woodhouse at intel.com>

commit 0815565adfe3f4c369110c57d8ffe83caefeed68
Author: David Woodhouse <David.Woodhouse at intel.com>
Date:   Tue Aug 4 09:17:20 2009 +0100

    intel-iommu: Cope with broken HP DC7900 BIOS
    
    Yet another reason why trusting this stuff to the BIOS was a bad idea.
    The HP DC7900 BIOS reports an iommu at an address which just returns all
    ones, when VT-d is disabled in the BIOS.
    
    Fix up the missing iounmap in the error paths while we're at it.
    
    Signed-off-by: David Woodhouse <David.Woodhouse at intel.com>

commit cfc65dd57967f2e0c7b3a8b73e6d12470b1cf1c1
Author: Alex Williamson <alex.williamson at hp.com>
Date:   Thu Jul 30 16:15:18 2009 -0600

    iommu=pt is a valid early param
    
    This avoids a "Malformed early option 'iommu'" warning on boot when
    trying to use pass-through mode.
    
    Signed-off-by: Alex Williamson <alex.williamson at hp.com>
    Signed-off-by: David Woodhouse <David.Woodhouse at intel.com>

commit 86f4d0123b1fddb47d35b9a893f8c0b94bf89abe
Author: Dan Carpenter <error27 at gmail.com>
Date:   Sun Jul 19 14:47:45 2009 +0300

    intel-iommu: double kfree()
    
    g_iommus is freed after we "goto error;".
    
    Found by smatch (http://repo.or.cz/w/smatch.git).
    
    Signed-off-by: Dan Carpenter <error27 at gmail.com>
    Signed-off-by: David Woodhouse <David.Woodhouse at intel.com>

commit 0db9b7aebb6a1c2bba2d0636ae0b1f9ef729c827
Author: David Woodhouse <David.Woodhouse at intel.com>
Date:   Tue Jul 14 02:01:57 2009 +0100

    intel-iommu: Kill pointless intel_unmap_single() function
    
    Signed-off-by: David Woodhouse <David.Woodhouse at intel.com>

commit acea0018a24b794e32afea4f3be4230c58f2f8e3
Author: David Woodhouse <David.Woodhouse at intel.com>
Date:   Tue Jul 14 01:55:11 2009 +0100

    intel-iommu: Defer the iotlb flush and iova free for intel_unmap_sg() too.
    
    I see no reason why we did this _only_ in intel_unmap_page().
    
    Signed-off-by: David Woodhouse <David.Woodhouse at intel.com>

commit 3d39cecc4841e8d4c4abdb401d10180f5faaded0
Author: David Woodhouse <David.Woodhouse at intel.com>
Date:   Wed Jul 8 15:23:30 2009 +0100

    intel-iommu: Remove superfluous iova_alloc_lock from IOVA code
    
    We only ever obtain this lock immediately before the iova_rbtree_lock,
    and release it immediately after the iova_rbtree_lock. So ditch it and
    just use iova_rbtree_lock.
    
    [v2: Remove the lockdep bits this time too]
    Signed-off-by: David Woodhouse <David.Woodhouse at intel.com>

commit 147202aa772329a02c6e80bc2b7a6b8dd3deac0b
Author: David Woodhouse <David.Woodhouse at intel.com>
Date:   Tue Jul 7 19:43:20 2009 +0100

    intel-iommu: Speed up map routines by using cached domain ASAP
    
    We did before, in the end -- but it was at the bottom of a long stack of
    functions. Add an inline wrapper get_valid_domain_for_dev() which will
    use the cached one _first_ and only make the out-of-line call if it's
    not already set.
    
    This takes the average time taken for a 1-page intel_map_sg() from 5961
    cycles to 4812 cycles on my Lenovo x200s test box -- a modest 20%.
    
    Signed-off-by: David Woodhouse <David.Woodhouse at intel.com>
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 1a041bc..ae13e34 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -212,10 +212,8 @@ static __init int iommu_setup(char *p)
 		if (!strncmp(p, "soft", 4))
 			swiotlb = 1;
 #endif
-		if (!strncmp(p, "pt", 2)) {
+		if (!strncmp(p, "pt", 2))
 			iommu_pass_through = 1;
-			return 1;
-		}
 
 		gart_parse_options(p);
 
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index 6af96ee..1e66b18 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -71,9 +71,8 @@ void __init pci_swiotlb_init(void)
 {
 	/* don't initialize swiotlb if iommu=off (no_iommu=1) */
 #ifdef CONFIG_X86_64
-	if ((!iommu_detected && !no_iommu && max_pfn > MAX_DMA32_PFN) ||
-		iommu_pass_through)
-	       swiotlb = 1;
+	if ((!iommu_detected && !no_iommu && max_pfn > MAX_DMA32_PFN))
+		swiotlb = 1;
 #endif
 	if (swiotlb_force)
 		swiotlb = 1;
diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index 7b287cb..380b60e 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -632,20 +632,31 @@ int alloc_iommu(struct dmar_drhd_unit *drhd)
 	iommu->cap = dmar_readq(iommu->reg + DMAR_CAP_REG);
 	iommu->ecap = dmar_readq(iommu->reg + DMAR_ECAP_REG);
 
+	if (iommu->cap == (uint64_t)-1 && iommu->ecap == (uint64_t)-1) {
+		/* Promote an attitude of violence to a BIOS engineer today */
+		WARN(1, "Your BIOS is broken; DMAR reported at address %llx returns all ones!\n"
+		     "BIOS vendor: %s; Ver: %s; Product Version: %s\n",
+		     drhd->reg_base_addr,
+		     dmi_get_system_info(DMI_BIOS_VENDOR),
+		     dmi_get_system_info(DMI_BIOS_VERSION),
+		     dmi_get_system_info(DMI_PRODUCT_VERSION));
+		goto err_unmap;
+	}
+
 #ifdef CONFIG_DMAR
 	agaw = iommu_calculate_agaw(iommu);
 	if (agaw < 0) {
 		printk(KERN_ERR
 		       "Cannot get a valid agaw for iommu (seq_id = %d)\n",
 		       iommu->seq_id);
-		goto error;
+		goto err_unmap;
 	}
 	msagaw = iommu_calculate_max_sagaw(iommu);
 	if (msagaw < 0) {
 		printk(KERN_ERR
 			"Cannot get a valid max agaw for iommu (seq_id = %d)\n",
 			iommu->seq_id);
-		goto error;
+		goto err_unmap;
 	}
 #endif
 	iommu->agaw = agaw;
@@ -665,7 +676,7 @@ int alloc_iommu(struct dmar_drhd_unit *drhd)
 	}
 
 	ver = readl(iommu->reg + DMAR_VER_REG);
-	pr_debug("IOMMU %llx: ver %d:%d cap %llx ecap %llx\n",
+	pr_info("IOMMU %llx: ver %d:%d cap %llx ecap %llx\n",
 		(unsigned long long)drhd->reg_base_addr,
 		DMAR_VER_MAJOR(ver), DMAR_VER_MINOR(ver),
 		(unsigned long long)iommu->cap,
@@ -675,7 +686,10 @@ int alloc_iommu(struct dmar_drhd_unit *drhd)
 
 	drhd->iommu = iommu;
 	return 0;
-error:
+
+ err_unmap:
+	iounmap(iommu->reg);
+ error:
 	kfree(iommu);
 	return -1;
 }
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 2314ad7..09606e9 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -251,7 +251,8 @@ static inline int first_pte_in_page(struct dma_pte *pte)
  * 	2. It maps to each iommu if successful.
  *	3. Each iommu mapps to this domain if successful.
  */
-struct dmar_domain *si_domain;
+static struct dmar_domain *si_domain;
+static int hw_pass_through = 1;
 
 /* devices under the same p2p bridge are owned in one domain */
 #define DOMAIN_FLAG_P2P_MULTIPLE_DEVICES (1 << 0)
@@ -1309,7 +1310,6 @@ static void iommu_detach_domain(struct dmar_domain *domain,
 }
 
 static struct iova_domain reserved_iova_list;
-static struct lock_class_key reserved_alloc_key;
 static struct lock_class_key reserved_rbtree_key;
 
 static void dmar_init_reserved_ranges(void)
@@ -1320,8 +1320,6 @@ static void dmar_init_reserved_ranges(void)
 
 	init_iova_domain(&reserved_iova_list, DMA_32BIT_PFN);
 
-	lockdep_set_class(&reserved_iova_list.iova_alloc_lock,
-		&reserved_alloc_key);
 	lockdep_set_class(&reserved_iova_list.iova_rbtree_lock,
 		&reserved_rbtree_key);
 
@@ -1958,14 +1956,24 @@ static int iommu_prepare_identity_map(struct pci_dev *pdev,
 	struct dmar_domain *domain;
 	int ret;
 
-	printk(KERN_INFO
-	       "IOMMU: Setting identity map for device %s [0x%Lx - 0x%Lx]\n",
-	       pci_name(pdev), start, end);
-
 	domain = get_domain_for_dev(pdev, DEFAULT_DOMAIN_ADDRESS_WIDTH);
 	if (!domain)
 		return -ENOMEM;
 
+	/* For _hardware_ passthrough, don't bother. But for software
+	   passthrough, we do it anyway -- it may indicate a memory
+	   range which is reserved in E820, so which didn't get set
+	   up to start with in si_domain */
+	if (domain == si_domain && hw_pass_through) {
+		printk("Ignoring identity map for HW passthrough device %s [0x%Lx - 0x%Lx]\n",
+		       pci_name(pdev), start, end);
+		return 0;
+	}
+
+	printk(KERN_INFO
+	       "IOMMU: Setting identity map for device %s [0x%Lx - 0x%Lx]\n",
+	       pci_name(pdev), start, end);
+
 	ret = iommu_domain_identity_map(domain, start, end);
 	if (ret)
 		goto error;
@@ -2016,23 +2024,6 @@ static inline void iommu_prepare_isa(void)
 }
 #endif /* !CONFIG_DMAR_FLPY_WA */
 
-/* Initialize each context entry as pass through.*/
-static int __init init_context_pass_through(void)
-{
-	struct pci_dev *pdev = NULL;
-	struct dmar_domain *domain;
-	int ret;
-
-	for_each_pci_dev(pdev) {
-		domain = get_domain_for_dev(pdev, DEFAULT_DOMAIN_ADDRESS_WIDTH);
-		ret = domain_context_mapping(domain, pdev,
-					     CONTEXT_TT_PASS_THROUGH);
-		if (ret)
-			return ret;
-	}
-	return 0;
-}
-
 static int md_domain_init(struct dmar_domain *domain, int guest_width);
 
 static int __init si_domain_work_fn(unsigned long start_pfn,
@@ -2047,7 +2038,7 @@ static int __init si_domain_work_fn(unsigned long start_pfn,
 
 }
 
-static int si_domain_init(void)
+static int si_domain_init(int hw)
 {
 	struct dmar_drhd_unit *drhd;
 	struct intel_iommu *iommu;
@@ -2074,6 +2065,9 @@ static int si_domain_init(void)
 
 	si_domain->flags = DOMAIN_FLAG_STATIC_IDENTITY;
 
+	if (hw)
+		return 0;
+
 	for_each_online_node(nid) {
 		work_with_active_regions(nid, si_domain_work_fn, &ret);
 		if (ret)
@@ -2100,15 +2094,23 @@ static int identity_mapping(struct pci_dev *pdev)
 }
 
 static int domain_add_dev_info(struct dmar_domain *domain,
-				  struct pci_dev *pdev)
+			       struct pci_dev *pdev,
+			       int translation)
 {
 	struct device_domain_info *info;
 	unsigned long flags;
+	int ret;
 
 	info = alloc_devinfo_mem();
 	if (!info)
 		return -ENOMEM;
 
+	ret = domain_context_mapping(domain, pdev, translation);
+	if (ret) {
+		free_devinfo_mem(info);
+		return ret;
+	}
+
 	info->segment = pci_domain_nr(pdev->bus);
 	info->bus = pdev->bus->number;
 	info->devfn = pdev->devfn;
@@ -2165,27 +2167,25 @@ static int iommu_should_identity_map(struct pci_dev *pdev, int startup)
 	return 1;
 }
 
-static int iommu_prepare_static_identity_mapping(void)
+static int iommu_prepare_static_identity_mapping(int hw)
 {
 	struct pci_dev *pdev = NULL;
 	int ret;
 
-	ret = si_domain_init();
+	ret = si_domain_init(hw);
 	if (ret)
 		return -EFAULT;
 
 	for_each_pci_dev(pdev) {
 		if (iommu_should_identity_map(pdev, 1)) {
-			printk(KERN_INFO "IOMMU: identity mapping for device %s\n",
-			       pci_name(pdev));
+			printk(KERN_INFO "IOMMU: %s identity mapping for device %s\n",
+			       hw ? "hardware" : "software", pci_name(pdev));
 
-			ret = domain_context_mapping(si_domain, pdev,
+			ret = domain_add_dev_info(si_domain, pdev,
+						     hw ? CONTEXT_TT_PASS_THROUGH :
 						     CONTEXT_TT_MULTI_LEVEL);
 			if (ret)
 				return ret;
-			ret = domain_add_dev_info(si_domain, pdev);
-			if (ret)
-				return ret;
 		}
 	}
 
@@ -2199,14 +2199,6 @@ int __init init_dmars(void)
 	struct pci_dev *pdev;
 	struct intel_iommu *iommu;
 	int i, ret;
-	int pass_through = 1;
-
-	/*
-	 * In case pass through can not be enabled, iommu tries to use identity
-	 * mapping.
-	 */
-	if (iommu_pass_through)
-		iommu_identity_mapping = 1;
 
 	/*
 	 * for each drhd
@@ -2234,7 +2226,6 @@ int __init init_dmars(void)
 	deferred_flush = kzalloc(g_num_of_iommus *
 		sizeof(struct deferred_flush_tables), GFP_KERNEL);
 	if (!deferred_flush) {
-		kfree(g_iommus);
 		ret = -ENOMEM;
 		goto error;
 	}
@@ -2261,14 +2252,8 @@ int __init init_dmars(void)
 			goto error;
 		}
 		if (!ecap_pass_through(iommu->ecap))
-			pass_through = 0;
+			hw_pass_through = 0;
 	}
-	if (iommu_pass_through)
-		if (!pass_through) {
-			printk(KERN_INFO
-			       "Pass Through is not supported by hardware.\n");
-			iommu_pass_through = 0;
-		}
 
 	/*
 	 * Start from the sane iommu hardware state.
@@ -2323,64 +2308,57 @@ int __init init_dmars(void)
 		}
 	}
 
+	if (iommu_pass_through)
+		iommu_identity_mapping = 1;
+#ifdef CONFIG_DMAR_BROKEN_GFX_WA
+	else
+		iommu_identity_mapping = 2;
+#endif
 	/*
-	 * If pass through is set and enabled, context entries of all pci
-	 * devices are intialized by pass through translation type.
+	 * If pass through is not set or not enabled, setup context entries for
+	 * identity mappings for rmrr, gfx, and isa and may fall back to static
+	 * identity mapping if iommu_identity_mapping is set.
 	 */
-	if (iommu_pass_through) {
-		ret = init_context_pass_through();
+	if (iommu_identity_mapping) {
+		ret = iommu_prepare_static_identity_mapping(hw_pass_through);
 		if (ret) {
-			printk(KERN_ERR "IOMMU: Pass through init failed.\n");
-			iommu_pass_through = 0;
+			printk(KERN_CRIT "Failed to setup IOMMU pass-through\n");
+			goto error;
 		}
 	}
-
 	/*
-	 * If pass through is not set or not enabled, setup context entries for
-	 * identity mappings for rmrr, gfx, and isa and may fall back to static
-	 * identity mapping if iommu_identity_mapping is set.
+	 * For each rmrr
+	 *   for each dev attached to rmrr
+	 *   do
+	 *     locate drhd for dev, alloc domain for dev
+	 *     allocate free domain
+	 *     allocate page table entries for rmrr
+	 *     if context not allocated for bus
+	 *           allocate and init context
+	 *           set present in root table for this bus
+	 *     init context with domain, translation etc
+	 *    endfor
+	 * endfor
 	 */
-	if (!iommu_pass_through) {
-#ifdef CONFIG_DMAR_BROKEN_GFX_WA
-		if (!iommu_identity_mapping)
-			iommu_identity_mapping = 2;
-#endif
-		if (iommu_identity_mapping)
-			iommu_prepare_static_identity_mapping();
-		/*
-		 * For each rmrr
-		 *   for each dev attached to rmrr
-		 *   do
-		 *     locate drhd for dev, alloc domain for dev
-		 *     allocate free domain
-		 *     allocate page table entries for rmrr
-		 *     if context not allocated for bus
-		 *           allocate and init context
-		 *           set present in root table for this bus
-		 *     init context with domain, translation etc
-		 *    endfor
-		 * endfor
-		 */
-		printk(KERN_INFO "IOMMU: Setting RMRR:\n");
-		for_each_rmrr_units(rmrr) {
-			for (i = 0; i < rmrr->devices_cnt; i++) {
-				pdev = rmrr->devices[i];
-				/*
-				 * some BIOS lists non-exist devices in DMAR
-				 * table.
-				 */
-				if (!pdev)
-					continue;
-				ret = iommu_prepare_rmrr_dev(rmrr, pdev);
-				if (ret)
-					printk(KERN_ERR
-				 "IOMMU: mapping reserved region failed\n");
-			}
+	printk(KERN_INFO "IOMMU: Setting RMRR:\n");
+	for_each_rmrr_units(rmrr) {
+		for (i = 0; i < rmrr->devices_cnt; i++) {
+			pdev = rmrr->devices[i];
+			/*
+			 * some BIOS lists non-exist devices in DMAR
+			 * table.
+			 */
+			if (!pdev)
+				continue;
+			ret = iommu_prepare_rmrr_dev(rmrr, pdev);
+			if (ret)
+				printk(KERN_ERR
+				       "IOMMU: mapping reserved region failed\n");
 		}
-
-		iommu_prepare_isa();
 	}
 
+	iommu_prepare_isa();
+
 	/*
 	 * for each drhd
 	 *   enable fault log
@@ -2454,8 +2432,7 @@ static struct iova *intel_alloc_iova(struct device *dev,
 	return iova;
 }
 
-static struct dmar_domain *
-get_valid_domain_for_dev(struct pci_dev *pdev)
+static struct dmar_domain *__get_valid_domain_for_dev(struct pci_dev *pdev)
 {
 	struct dmar_domain *domain;
 	int ret;
@@ -2483,6 +2460,18 @@ get_valid_domain_for_dev(struct pci_dev *pdev)
 	return domain;
 }
 
+static inline struct dmar_domain *get_valid_domain_for_dev(struct pci_dev *dev)
+{
+	struct device_domain_info *info;
+
+	/* No lock here, assumes no domain exit in normal case */
+	info = dev->dev.archdata.iommu;
+	if (likely(info))
+		return info->domain;
+
+	return __get_valid_domain_for_dev(dev);
+}
+
 static int iommu_dummy(struct pci_dev *pdev)
 {
 	return pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
@@ -2525,10 +2514,10 @@ static int iommu_no_mapping(struct device *dev)
 		 */
 		if (iommu_should_identity_map(pdev, 0)) {
 			int ret;
-			ret = domain_add_dev_info(si_domain, pdev);
-			if (ret)
-				return 0;
-			ret = domain_context_mapping(si_domain, pdev, CONTEXT_TT_MULTI_LEVEL);
+			ret = domain_add_dev_info(si_domain, pdev,
+						  hw_pass_through ?
+						  CONTEXT_TT_PASS_THROUGH :
+						  CONTEXT_TT_MULTI_LEVEL);
 			if (!ret) {
 				printk(KERN_INFO "64bit %s uses identity mapping\n",
 				       pci_name(pdev));
@@ -2733,12 +2722,6 @@ static void intel_unmap_page(struct device *dev, dma_addr_t dev_addr,
 	}
 }
 
-static void intel_unmap_single(struct device *dev, dma_addr_t dev_addr, size_t size,
-			       int dir)
-{
-	intel_unmap_page(dev, dev_addr, size, dir, NULL);
-}
-
 static void *intel_alloc_coherent(struct device *hwdev, size_t size,
 				  dma_addr_t *dma_handle, gfp_t flags)
 {
@@ -2771,7 +2754,7 @@ static void intel_free_coherent(struct device *hwdev, size_t size, void *vaddr,
 	size = PAGE_ALIGN(size);
 	order = get_order(size);
 
-	intel_unmap_single(hwdev, dma_handle, size, DMA_BIDIRECTIONAL);
+	intel_unmap_page(hwdev, dma_handle, size, DMA_BIDIRECTIONAL, NULL);
 	free_pages((unsigned long)vaddr, order);
 }
 
@@ -2807,11 +2790,18 @@ static void intel_unmap_sg(struct device *hwdev, struct scatterlist *sglist,
 	/* free page tables */
 	dma_pte_free_pagetable(domain, start_pfn, last_pfn);
 
-	iommu_flush_iotlb_psi(iommu, domain->id, start_pfn,
-			      (last_pfn - start_pfn + 1));
-
-	/* free iova */
-	__free_iova(&domain->iovad, iova);
+	if (intel_iommu_strict) {
+		iommu_flush_iotlb_psi(iommu, domain->id, start_pfn,
+				      last_pfn - start_pfn + 1);
+		/* free iova */
+		__free_iova(&domain->iovad, iova);
+	} else {
+		add_unmap(domain, iova);
+		/*
+		 * queue up the release of the unmap to save the 1/6th of the
+		 * cpu used up by the iotlb flush operation...
+		 */
+	}
 }
 
 static int intel_nontranslate_map_sg(struct device *hddev,
@@ -3194,7 +3184,7 @@ int __init intel_iommu_init(void)
 	 * Check the need for DMA-remapping initialization now.
 	 * Above initialization will also be used by Interrupt-remapping.
 	 */
-	if (no_iommu || (swiotlb && !iommu_pass_through) || dmar_disabled)
+	if (no_iommu || swiotlb || dmar_disabled)
 		return -ENODEV;
 
 	iommu_init_mempool();
@@ -3214,14 +3204,7 @@ int __init intel_iommu_init(void)
 
 	init_timer(&unmap_timer);
 	force_iommu = 1;
-
-	if (!iommu_pass_through) {
-		printk(KERN_INFO
-		       "Multi-level page-table translation for DMAR.\n");
-		dma_ops = &intel_dma_ops;
-	} else
-		printk(KERN_INFO
-		       "DMAR: Pass through translation for DMAR.\n");
+	dma_ops = &intel_dma_ops;
 
 	init_iommu_sysfs();
 
@@ -3504,7 +3487,6 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
 	struct intel_iommu *iommu;
 	int addr_width;
 	u64 end;
-	int ret;
 
 	/* normally pdev is not mapped */
 	if (unlikely(domain_context_mapped(pdev))) {
@@ -3536,12 +3518,7 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
 		return -EFAULT;
 	}
 
-	ret = domain_add_dev_info(dmar_domain, pdev);
-	if (ret)
-		return ret;
-
-	ret = domain_context_mapping(dmar_domain, pdev, CONTEXT_TT_MULTI_LEVEL);
-	return ret;
+	return domain_add_dev_info(dmar_domain, pdev, CONTEXT_TT_MULTI_LEVEL);
 }
 
 static void intel_iommu_detach_device(struct iommu_domain *domain,
diff --git a/drivers/pci/iova.c b/drivers/pci/iova.c
index 46dd440..7914951 100644
--- a/drivers/pci/iova.c
+++ b/drivers/pci/iova.c
@@ -22,7 +22,6 @@
 void
 init_iova_domain(struct iova_domain *iovad, unsigned long pfn_32bit)
 {
-	spin_lock_init(&iovad->iova_alloc_lock);
 	spin_lock_init(&iovad->iova_rbtree_lock);
 	iovad->rbroot = RB_ROOT;
 	iovad->cached32_node = NULL;
@@ -205,7 +204,6 @@ alloc_iova(struct iova_domain *iovad, unsigned long size,
 	unsigned long limit_pfn,
 	bool size_aligned)
 {
-	unsigned long flags;
 	struct iova *new_iova;
 	int ret;
 
@@ -219,11 +217,9 @@ alloc_iova(struct iova_domain *iovad, unsigned long size,
 	if (size_aligned)
 		size = __roundup_pow_of_two(size);
 
-	spin_lock_irqsave(&iovad->iova_alloc_lock, flags);
 	ret = __alloc_and_insert_iova_range(iovad, size, limit_pfn,
 			new_iova, size_aligned);
 
-	spin_unlock_irqrestore(&iovad->iova_alloc_lock, flags);
 	if (ret) {
 		free_iova_mem(new_iova);
 		return NULL;
@@ -381,8 +377,7 @@ reserve_iova(struct iova_domain *iovad,
 	struct iova *iova;
 	unsigned int overlap = 0;
 
-	spin_lock_irqsave(&iovad->iova_alloc_lock, flags);
-	spin_lock(&iovad->iova_rbtree_lock);
+	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
 	for (node = rb_first(&iovad->rbroot); node; node = rb_next(node)) {
 		if (__is_range_overlap(node, pfn_lo, pfn_hi)) {
 			iova = container_of(node, struct iova, node);
@@ -402,8 +397,7 @@ reserve_iova(struct iova_domain *iovad,
 	iova = __insert_new_range(iovad, pfn_lo, pfn_hi);
 finish:
 
-	spin_unlock(&iovad->iova_rbtree_lock);
-	spin_unlock_irqrestore(&iovad->iova_alloc_lock, flags);
+	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
 	return iova;
 }
 
@@ -420,8 +414,7 @@ copy_reserved_iova(struct iova_domain *from, struct iova_domain *to)
 	unsigned long flags;
 	struct rb_node *node;
 
-	spin_lock_irqsave(&from->iova_alloc_lock, flags);
-	spin_lock(&from->iova_rbtree_lock);
+	spin_lock_irqsave(&from->iova_rbtree_lock, flags);
 	for (node = rb_first(&from->rbroot); node; node = rb_next(node)) {
 		struct iova *iova = container_of(node, struct iova, node);
 		struct iova *new_iova;
@@ -430,6 +423,5 @@ copy_reserved_iova(struct iova_domain *from, struct iova_domain *to)
 			printk(KERN_ERR "Reserve iova range %lx@%lx failed\n",
 				iova->pfn_lo, iova->pfn_lo);
 	}
-	spin_unlock(&from->iova_rbtree_lock);
-	spin_unlock_irqrestore(&from->iova_alloc_lock, flags);
+	spin_unlock_irqrestore(&from->iova_rbtree_lock, flags);
 }
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 228f6c9..76a0759 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -28,7 +28,6 @@ struct iova {
 
 /* holds all the iova translations for a domain */
 struct iova_domain {
-	spinlock_t	iova_alloc_lock;/* Lock to protect iova  allocation */
 	spinlock_t	iova_rbtree_lock; /* Lock to protect update of rbtree */
 	struct rb_root	rbroot;		/* iova domain rbtree root */
 	struct rb_node	*cached32_node; /* Save last alloced node */


Index: config-x86-generic
===================================================================
RCS file: /cvs/pkgs/rpms/kernel/devel/config-x86-generic,v
retrieving revision 1.86
retrieving revision 1.87
diff -u -p -r1.86 -r1.87
--- config-x86-generic	23 Jul 2009 17:45:33 -0000	1.86
+++ config-x86-generic	10 Aug 2009 14:21:08 -0000	1.87
@@ -96,7 +96,7 @@ CONFIG_FB_EFI=y
 CONFIG_DMAR=y
 CONFIG_DMAR_BROKEN_GFX_WA=y
 CONFIG_DMAR_FLOPPY_WA=y
-# CONFIG_DMAR_DEFAULT_ON is not set
+CONFIG_DMAR_DEFAULT_ON=y
 
 CONFIG_FB_GEODE=y
 CONFIG_FB_GEODE_LX=y


Index: config-x86_64-generic
===================================================================
RCS file: /cvs/pkgs/rpms/kernel/devel/config-x86_64-generic,v
retrieving revision 1.88
retrieving revision 1.89
diff -u -p -r1.88 -r1.89
--- config-x86_64-generic	23 Jul 2009 17:45:33 -0000	1.88
+++ config-x86_64-generic	10 Aug 2009 14:21:09 -0000	1.89
@@ -36,7 +36,7 @@ CONFIG_PCI_MMCONFIG=y
 CONFIG_DMAR=y
 CONFIG_DMAR_BROKEN_GFX_WA=y
 CONFIG_DMAR_FLOPPY_WA=y
-# CONFIG_DMAR_DEFAULT_ON is not set
+CONFIG_DMAR_DEFAULT_ON=y
 
 CONFIG_KEXEC_JUMP=y
 


Index: kernel.spec
===================================================================
RCS file: /cvs/pkgs/rpms/kernel/devel/kernel.spec,v
retrieving revision 1.1706
retrieving revision 1.1707
diff -u -p -r1.1706 -r1.1707
--- kernel.spec	10 Aug 2009 04:13:28 -0000	1.1706
+++ kernel.spec	10 Aug 2009 14:21:09 -0000	1.1707
@@ -606,6 +606,10 @@ Patch30: sched-introduce-SCHED_RESET_ON_
 
 Patch41: linux-2.6-sysrq-c.patch
 
+# Intel IOMMU fixes/workarounds
+Patch100: linux-2.6-die-closed-source-bios-muppets-die.patch
+Patch101: linux-2.6-intel-iommu-updates.patch
+
 Patch141: linux-2.6-ps3-storage-alias.patch
 Patch143: linux-2.6-g5-therm-shutdown.patch
 Patch144: linux-2.6-vio-modalias.patch
@@ -1138,6 +1142,16 @@ ApplyPatch via-hwmon-temp-sensor.patch
 ApplyPatch linux-2.6-dell-laptop-rfkill-fix.patch
 
 #
+# Intel IOMMU
+#
+# Quiesce USB host controllers before setting up the IOMMU
+ApplyPatch linux-2.6-die-closed-source-bios-muppets-die.patch
+# Performance fixes, unified hardware/software passthrough support and,
+# most importantly: notice when the BIOS claims there's an IOMMU at a
+# region that actually returns all 0xFF.
+ApplyPatch linux-2.6-intel-iommu-updates.patch
+
+#
 # PowerPC
 #
 ### NOT (YET) UPSTREAM:
@@ -1966,6 +1980,9 @@ fi
 # and build.
 
 %changelog
+* Mon Aug 10 2009 David Woodhouse <David.Woodhouse at intel.com>
+- Merge latest Intel IOMMU fixes and BIOS workarounds, re-enable by default.
+
 * Sun Aug 09 2009 Kyle McMartin <kyle at redhat.com>
 - btusb autosuspend: fix build on !CONFIG_PM by stubbing out
   suspend/resume methods.



