rpms/kernel/devel writeback-Update-dirty-flags-in-two-steps.patch, NONE, 1.1 writeback-bdi_writeback_task-must-set-task-state-bef.patch, NONE, 1.1 writeback-disable-periodic-old-data-writeback-for-di.patch, NONE, 1.1 writeback-ensure-that-WB_SYNC_NONE-writeback-with-sb.patch, NONE, 1.1 writeback-fix-WB_SYNC_NONE-writeback-from-umount.patch, NONE, 1.1 kernel.spec, 1.2007, 1.2008 fs-explicitly-pass-in-whether-sb-is-pinned-or-not.patch, 1.1, NONE

Kyle McMartin kyle at fedoraproject.org
Tue Jun 1 13:29:12 UTC 2010


Author: kyle

Update of /cvs/pkgs/rpms/kernel/devel
In directory cvs01.phx2.fedoraproject.org:/tmp/cvs-serv30488

Modified Files:
	kernel.spec 
Added Files:
	writeback-Update-dirty-flags-in-two-steps.patch 
	writeback-bdi_writeback_task-must-set-task-state-bef.patch 
	writeback-disable-periodic-old-data-writeback-for-di.patch 
	writeback-ensure-that-WB_SYNC_NONE-writeback-with-sb.patch 
	writeback-fix-WB_SYNC_NONE-writeback-from-umount.patch 
Removed Files:
	fs-explicitly-pass-in-whether-sb-is-pinned-or-not.patch 
Log Message:
* Tue Jun 01 2010 Kyle McMartin <kyle at redhat.com> 2.6.34-17
- backport writeback fixes from Jens until stable@ picks them up.


writeback-Update-dirty-flags-in-two-steps.patch:
 fs-writeback.c |   16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

--- NEW FILE writeback-Update-dirty-flags-in-two-steps.patch ---
From 1ada9aebe5ad7a564811b99c269a5b60fadfd0ce Mon Sep 17 00:00:00 2001
From: Dmitry Monakhov <dmonakhov at openvz.org>
Date: Fri, 7 May 2010 13:35:44 +0400
Subject: writeback: Update dirty flags in two steps

Filesystems with delalloc support may dirty inode during writepages.
As result inode will have dirty metadata flags even after write_inode.
In fact we have two dedicated functions for proper data and metadata
writeback. It is reasonable to separate flags updates in two stages.

https://bugzilla.kernel.org/show_bug.cgi?id=15906

Signed-off-by: Dmitry Monakhov <dmonakhov at openvz.org>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Jens Axboe <jens.axboe at oracle.com>
---
 fs/fs-writeback.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 0af2edf..7f5f006 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -463,11 +463,9 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 
 	BUG_ON(inode->i_state & I_SYNC);
 
-	/* Set I_SYNC, reset I_DIRTY */
-	dirty = inode->i_state & I_DIRTY;
+	/* Set I_SYNC, reset I_DIRTY_PAGES */
 	inode->i_state |= I_SYNC;
-	inode->i_state &= ~I_DIRTY;
-
+	inode->i_state &= ~I_DIRTY_PAGES;
 	spin_unlock(&inode_lock);
 
 	ret = do_writepages(mapping, wbc);
@@ -483,6 +481,15 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 			ret = err;
 	}
 
+	/*
+	 * Some filesystems may redirty the inode during the writeback
+	 * due to delalloc, clear dirty metadata flags right before
+	 * write_inode()
+	 */
+	spin_lock(&inode_lock);
+	dirty = inode->i_state & I_DIRTY;
+	inode->i_state &= ~(I_DIRTY_SYNC | I_DIRTY_DATASYNC);
+	spin_unlock(&inode_lock);
 	/* Don't write the inode if only I_DIRTY_PAGES was set */
 	if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
 		int err = write_inode(inode, wbc);
-- 
1.7.0.1
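
The ordering change described in the commit message above can be modelled in
userspace. The following is only a sketch, not the kernel code: the flag
values are local stand-ins, and model_writepages(), old_one_step() and
new_two_step() are hypothetical names introduced for illustration. It assumes
a delalloc filesystem that sets I_DIRTY_SYNC while pages are written out.

```c
#include <assert.h>

/* Local stand-ins for the kernel's inode state bits (assumed values). */
enum {
	I_DIRTY_SYNC     = 1 << 0,
	I_DIRTY_DATASYNC = 1 << 1,
	I_DIRTY_PAGES    = 1 << 2,
	I_DIRTY          = I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES,
};

/* Hypothetical delalloc model: writing pages redirties the metadata. */
static void model_writepages(unsigned *i_state)
{
	*i_state |= I_DIRTY_SYNC;
}

/*
 * Old ordering: sample and clear every dirty bit before writepages.
 * Returns the state left behind once write_inode() would have run.
 */
static unsigned old_one_step(unsigned i_state)
{
	unsigned dirty;

	assert((i_state & ~I_DIRTY) == 0);	/* model tracks dirty bits only */
	dirty = i_state & I_DIRTY;		/* stale once writepages runs */
	i_state &= ~I_DIRTY;
	model_writepages(&i_state);
	(void)dirty;				/* write_inode() keyed off this */
	return i_state;
}

/*
 * New ordering: clear I_DIRTY_PAGES first, then sample and clear the
 * metadata bits only after writepages, right before write_inode().
 */
static unsigned new_two_step(unsigned i_state)
{
	unsigned dirty;

	assert((i_state & ~I_DIRTY) == 0);
	i_state &= ~I_DIRTY_PAGES;
	model_writepages(&i_state);
	dirty = i_state & I_DIRTY;		/* sees the redirty */
	i_state &= ~(I_DIRTY_SYNC | I_DIRTY_DATASYNC);
	(void)dirty;
	return i_state;
}
```

Starting from an inode with only I_DIRTY_PAGES set, old_one_step() leaves
I_DIRTY_SYNC behind, while new_two_step() leaves the state clean.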


writeback-bdi_writeback_task-must-set-task-state-bef.patch:
 fs-writeback.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

--- NEW FILE writeback-bdi_writeback_task-must-set-task-state-bef.patch ---
From c33d894edcc3a45c46d4bb21f830ed85c8444950 Mon Sep 17 00:00:00 2001
From: Jens Axboe <jens.axboe at oracle.com>
Date: Tue, 18 May 2010 14:31:45 +0200
Subject: writeback: bdi_writeback_task() must set task state before calling schedule()

Calling schedule without setting the task state to non-running will
return immediately, so ensure that we set it properly and check our
sleep conditions after doing so.

This is a fixup for commit 69b62d01.

Signed-off-by: Jens Axboe <jens.axboe at oracle.com>
---
 fs/fs-writeback.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 76f546d..437a743 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -978,8 +978,13 @@ int bdi_writeback_task(struct bdi_writeback *wb)
 		if (dirty_writeback_interval) {
 			wait_jiffies = msecs_to_jiffies(dirty_writeback_interval * 10);
 			schedule_timeout_interruptible(wait_jiffies);
-		} else
-			schedule();
+		} else {
+			set_current_state(TASK_INTERRUPTIBLE);
+			if (list_empty_careful(&wb->bdi->work_list) &&
+			    !kthread_should_stop())
+				schedule();
+			__set_current_state(TASK_RUNNING);
+		}
 
 		try_to_freeze();
 	}
-- 
1.7.0.1


writeback-disable-periodic-old-data-writeback-for-di.patch:
 fs-writeback.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

--- NEW FILE writeback-disable-periodic-old-data-writeback-for-di.patch ---
From 89081a62c9c99d86efb322089f1e235e7532454a Mon Sep 17 00:00:00 2001
From: Jens Axboe <jens.axboe at oracle.com>
Date: Mon, 17 May 2010 12:51:03 +0200
Subject: writeback: disable periodic old data writeback for !dirty_writeback_centisecs

Prior to 2.6.32, setting /proc/sys/vm/dirty_writeback_centisecs disabled
periodic dirty writeback from kupdate. This got broken and now causes
excessive sys CPU usage if set to zero, as we'll keep beating on
schedule().

Cc: stable at kernel.org
Reported-by: Justin Maggard <jmaggard10 at gmail.com>
Signed-off-by: Jens Axboe <jens.axboe at oracle.com>
---
 fs/fs-writeback.c |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 7f5f006..76f546d 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -871,6 +871,12 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb)
 	unsigned long expired;
 	long nr_pages;
 
+	/*
+	 * When set to zero, disable periodic writeback
+	 */
+	if (!dirty_writeback_interval)
+		return 0;
+
 	expired = wb->last_old_flush +
 			msecs_to_jiffies(dirty_writeback_interval * 10);
 	if (time_before(jiffies, expired))
@@ -969,8 +975,12 @@ int bdi_writeback_task(struct bdi_writeback *wb)
 				break;
 		}
 
-		wait_jiffies = msecs_to_jiffies(dirty_writeback_interval * 10);
-		schedule_timeout_interruptible(wait_jiffies);
+		if (dirty_writeback_interval) {
+			wait_jiffies = msecs_to_jiffies(dirty_writeback_interval * 10);
+			schedule_timeout_interruptible(wait_jiffies);
+		} else
+			schedule();
+
 		try_to_freeze();
 	}
 
-- 
1.7.0.1
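
The zero-interval check added to wb_check_old_data_flush() above can be
sketched as a plain function. This is an illustration under stated
assumptions, not the kernel code: old_data_flush_due() and
centisecs_to_msecs() are hypothetical helpers, and times are plain
millisecond counters rather than jiffies.

```c
#include <stdbool.h>

/*
 * dirty_writeback_centisecs is expressed in 1/100 s; the patch multiplies
 * it by 10 to obtain milliseconds.
 */
static unsigned long centisecs_to_msecs(unsigned int centisecs)
{
	return (unsigned long)centisecs * 10;
}

/*
 * Hypothetical helper: is a periodic old-data flush due?
 * An interval of zero disables periodic writeback entirely.
 */
static bool old_data_flush_due(unsigned int interval_cs,
			       unsigned long last_flush_ms,
			       unsigned long now_ms)
{
	if (!interval_cs)
		return false;

	/* Unsigned subtraction keeps the comparison safe across wraparound. */
	return now_ms - last_flush_ms >= centisecs_to_msecs(interval_cs);
}
```

With an interval of 500 centiseconds, a flush becomes due 5000 ms after the
last one; with an interval of 0 it never does, however much time has passed.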


writeback-ensure-that-WB_SYNC_NONE-writeback-with-sb.patch:
 fs-writeback.c |   17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

--- NEW FILE writeback-ensure-that-WB_SYNC_NONE-writeback-with-sb.patch ---
From 0759fed096228ae36b7a27ab6f965411906821d6 Mon Sep 17 00:00:00 2001
From: Jens Axboe <jens.axboe at oracle.com>
Date: Tue, 18 May 2010 14:29:29 +0200
Subject: writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync

Even if the writeout itself isn't a data integrity operation, we need
to ensure that the caller doesn't drop the sb umount sem before we
have actually done the writeback.

This is a fixup for commit e913fc82.

Signed-off-by: Jens Axboe <jens.axboe at oracle.com>
---
 fs/fs-writeback.c |   16 +++++++++++-----
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index c9ac9cb..0af2edf 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -193,7 +193,8 @@ static void bdi_wait_on_work_clear(struct bdi_work *work)
 }
 
 static void bdi_alloc_queue_work(struct backing_dev_info *bdi,
-				 struct wb_writeback_args *args)
+				 struct wb_writeback_args *args,
+				 int wait)
 {
 	struct bdi_work *work;
 
@@ -205,6 +206,8 @@ static void bdi_alloc_queue_work(struct backing_dev_info *bdi,
 	if (work) {
 		bdi_work_init(work, args);
 		bdi_queue_work(bdi, work);
+		if (wait)
+			bdi_wait_on_work_clear(work);
 	} else {
 		struct bdi_writeback *wb = &bdi->wb;
 
@@ -279,7 +282,7 @@ void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
 		args.for_background = 1;
 	}
 
-	bdi_alloc_queue_work(bdi, &args);
+	bdi_alloc_queue_work(bdi, &args, sb_locked);
 }
 
 /*
@@ -896,6 +899,7 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
 
 	while ((work = get_next_work_item(bdi, wb)) != NULL) {
 		struct wb_writeback_args args = work->args;
+		int post_clear;
 
 		/*
 		 * Override sync mode, in case we must wait for completion
@@ -903,11 +907,13 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
 		if (force_wait)
 			work->args.sync_mode = args.sync_mode = WB_SYNC_ALL;
 
+		post_clear = WB_SYNC_ALL || args.sb_pinned;
+
 		/*
 		 * If this isn't a data integrity operation, just notify
 		 * that we have seen this work and we are now starting it.
 		 */
-		if (args.sync_mode == WB_SYNC_NONE)
+		if (!post_clear)
 			wb_clear_pending(wb, work);
 
 		wrote += wb_writeback(wb, &args);
@@ -916,7 +922,7 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
 		 * This is a data integrity writeback, so only do the
 		 * notification when we have completed the work.
 		 */
-		if (args.sync_mode == WB_SYNC_ALL)
+		if (post_clear)
 			wb_clear_pending(wb, work);
 	}
 
@@ -983,7 +989,7 @@ static void bdi_writeback_all(struct super_block *sb, long nr_pages)
 		if (!bdi_has_dirty_io(bdi))
 			continue;
 
-		bdi_alloc_queue_work(bdi, &args);
+		bdi_alloc_queue_work(bdi, &args, 0);
 	}
 
 	rcu_read_unlock();
-- 
1.7.0.1
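
One detail of the wb_do_writeback() hunk above is worth checking by hand:
post_clear is assigned `WB_SYNC_ALL || args.sb_pinned`, which tests the enum
constant rather than args.sync_mode. WB_SYNC_ALL is nonzero, so the
expression as written is always true. The commit message suggests the intent
was to compare the sync mode. The sketch below only illustrates the
difference; the enum is a local copy, both function names are hypothetical,
and it is a reading of the hunk rather than a statement about any later fix.

```c
#include <stdbool.h>

/* Local copy of the sync modes; WB_SYNC_NONE is 0 and WB_SYNC_ALL is 1. */
enum writeback_sync_modes {
	WB_SYNC_NONE,
	WB_SYNC_ALL,
};

/* The expression as it appears in the hunk: the constant is tested. */
static bool post_clear_as_written(enum writeback_sync_modes sync_mode,
				  bool sb_pinned)
{
	(void)sync_mode;	/* never consulted by the original expression */
	return WB_SYNC_ALL || sb_pinned;
}

/* Assumed intent: defer the notification for WB_SYNC_ALL or a pinned sb. */
static bool post_clear_comparing_mode(enum writeback_sync_modes sync_mode,
				      bool sb_pinned)
{
	return sync_mode == WB_SYNC_ALL || sb_pinned;
}
```

For plain WB_SYNC_NONE work without a pinned sb, the first form still returns
true, so the early wb_clear_pending() call is never taken; the second form
returns false for that case only.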


writeback-fix-WB_SYNC_NONE-writeback-from-umount.patch:
 fs/fs-writeback.c           |   48 +++++++++++++++++++++++++++++++++-----------
 fs/sync.c                   |    2 -
 include/linux/backing-dev.h |    2 -
 include/linux/writeback.h   |   10 +++++++++
 mm/page-writeback.c         |    3 --
 5 files changed, 50 insertions(+), 15 deletions(-)

--- NEW FILE writeback-fix-WB_SYNC_NONE-writeback-from-umount.patch ---
From f7d91a6ab7d536ac1b59c6d791929c4adcfffbec Mon Sep 17 00:00:00 2001
From: Jens Axboe <jens.axboe at oracle.com>
Date: Mon, 17 May 2010 12:55:07 +0200
Subject: writeback: fix WB_SYNC_NONE writeback from umount

When umount calls sync_filesystem(), we first do a WB_SYNC_NONE
writeback to kick off writeback of pending dirty inodes, then follow
that up with a WB_SYNC_ALL to wait for it. Since umount already holds
the sb s_umount mutex, WB_SYNC_NONE ends up doing nothing and all
writeback happens as WB_SYNC_ALL. This can greatly slow down umount,
since WB_SYNC_ALL writeback is a data integrity operation and thus
a bigger hammer than simple WB_SYNC_NONE. For barrier aware file systems
it's a lot slower.

Signed-off-by: Jens Axboe <jens.axboe at oracle.com>
---
 fs/fs-writeback.c           |   48 +++++++++++++++++++++++++++++++++---------
 fs/sync.c                   |    2 +-
 include/linux/backing-dev.h |    2 +-
 include/linux/writeback.h   |   10 +++++++++
 mm/page-writeback.c         |    2 +-
 5 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 4b37f7c..c9ac9cb 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -45,6 +45,7 @@ struct wb_writeback_args {
 	int for_kupdate:1;
 	int range_cyclic:1;
 	int for_background:1;
+	int sb_pinned:1;
 };
 
 /*
@@ -230,6 +231,11 @@ static void bdi_sync_writeback(struct backing_dev_info *bdi,
 		.sync_mode	= WB_SYNC_ALL,
 		.nr_pages	= LONG_MAX,
 		.range_cyclic	= 0,
+		/*
+		 * Setting sb_pinned is not necessary for WB_SYNC_ALL, but
+		 * lets make it explicitly clear.
+		 */
+		.sb_pinned	= 1,
 	};
 	struct bdi_work work;
 
@@ -245,21 +251,23 @@ static void bdi_sync_writeback(struct backing_dev_info *bdi,
  * @bdi: the backing device to write from
  * @sb: write inodes from this super_block
  * @nr_pages: the number of pages to write
+ * @sb_locked: caller already holds sb umount sem.
  *
  * Description:
  *   This does WB_SYNC_NONE opportunistic writeback. The IO is only
  *   started when this function returns, we make no guarentees on
- *   completion. Caller need not hold sb s_umount semaphore.
+ *   completion. Caller specifies whether sb umount sem is held already or not.
  *
  */
 void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
-			 long nr_pages)
+			 long nr_pages, int sb_locked)
 {
 	struct wb_writeback_args args = {
 		.sb		= sb,
 		.sync_mode	= WB_SYNC_NONE,
 		.nr_pages	= nr_pages,
 		.range_cyclic	= 1,
+		.sb_pinned	= sb_locked,
 	};
 
 	/*
@@ -577,7 +585,7 @@ static enum sb_pin_state pin_sb_for_writeback(struct writeback_control *wbc,
 	/*
 	 * Caller must already hold the ref for this
 	 */
-	if (wbc->sync_mode == WB_SYNC_ALL) {
+	if (wbc->sync_mode == WB_SYNC_ALL || wbc->sb_pinned) {
 		WARN_ON(!rwsem_is_locked(&sb->s_umount));
 		return SB_NOT_PINNED;
 	}
@@ -751,6 +759,7 @@ static long wb_writeback(struct bdi_writeback *wb,
 		.for_kupdate		= args->for_kupdate,
 		.for_background		= args->for_background,
 		.range_cyclic		= args->range_cyclic,
+		.sb_pinned		= args->sb_pinned,
 	};
 	unsigned long oldest_jif;
 	long wrote = 0;
@@ -1183,6 +1192,18 @@ static void wait_sb_inodes(struct super_block *sb)
 	iput(old_inode);
 }
 
+static void __writeback_inodes_sb(struct super_block *sb, int sb_locked)
+{
+	unsigned long nr_dirty = global_page_state(NR_FILE_DIRTY);
+	unsigned long nr_unstable = global_page_state(NR_UNSTABLE_NFS);
+	long nr_to_write;
+
+	nr_to_write = nr_dirty + nr_unstable +
+			(inodes_stat.nr_inodes - inodes_stat.nr_unused);
+
+	bdi_start_writeback(sb->s_bdi, sb, nr_to_write, sb_locked);
+}
+
 /**
  * writeback_inodes_sb	-	writeback dirty inodes from given super_block
  * @sb: the superblock
@@ -1194,18 +1215,23 @@ static void wait_sb_inodes(struct super_block *sb)
  */
 void writeback_inodes_sb(struct super_block *sb)
 {
-	unsigned long nr_dirty = global_page_state(NR_FILE_DIRTY);
-	unsigned long nr_unstable = global_page_state(NR_UNSTABLE_NFS);
-	long nr_to_write;
-
-	nr_to_write = nr_dirty + nr_unstable +
-			(inodes_stat.nr_inodes - inodes_stat.nr_unused);
-
-	bdi_start_writeback(sb->s_bdi, sb, nr_to_write);
+	__writeback_inodes_sb(sb, 0);
 }
 EXPORT_SYMBOL(writeback_inodes_sb);
 
 /**
+ * writeback_inodes_sb_locked	- writeback dirty inodes from given super_block
+ * @sb: the superblock
+ *
+ * Like writeback_inodes_sb(), except the caller already holds the
+ * sb umount sem.
+ */
+void writeback_inodes_sb_locked(struct super_block *sb)
+{
+	__writeback_inodes_sb(sb, 1);
+}
+
+/**
  * writeback_inodes_sb_if_idle	-	start writeback if none underway
  * @sb: the superblock
  *
diff --git a/fs/sync.c b/fs/sync.c
index 92b2281..de6a441 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -42,7 +42,7 @@ static int __sync_filesystem(struct super_block *sb, int wait)
 	if (wait)
 		sync_inodes_sb(sb);
 	else
-		writeback_inodes_sb(sb);
+		writeback_inodes_sb_locked(sb);
 
 	if (sb->s_op->sync_fs)
 		sb->s_op->sync_fs(sb, wait);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index bd0e3c6..90e677a 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -103,7 +103,7 @@ int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
 void bdi_unregister(struct backing_dev_info *bdi);
 int bdi_setup_and_register(struct backing_dev_info *, char *, unsigned int);
 void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
-				long nr_pages);
+				long nr_pages, int sb_locked);
 int bdi_writeback_task(struct bdi_writeback *wb);
 int bdi_has_dirty_io(struct backing_dev_info *bdi);
 
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 36520de..3790165 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -65,6 +65,15 @@ struct writeback_control {
 	 * so we use a single control to update them
 	 */
 	unsigned no_nrwrite_index_update:1;
+
+	/*
+	 * For WB_SYNC_ALL, the sb must always be pinned. For WB_SYNC_NONE,
+	 * the writeback code will pin the sb for the caller. However,
+	 * for eg umount, the caller does WB_SYNC_NONE but already has
+	 * the sb pinned. If the below is set, caller already has the
+	 * sb pinned.
+	 */
+	unsigned sb_pinned:1;
 };
 
 /*
@@ -73,6 +82,7 @@ struct writeback_control {
 struct bdi_writeback;
 int inode_wait(void *);
 void writeback_inodes_sb(struct super_block *);
+void writeback_inodes_sb_locked(struct super_block *);
 int writeback_inodes_sb_if_idle(struct super_block *);
 void sync_inodes_sb(struct super_block *);
 void writeback_inodes_wbc(struct writeback_control *wbc);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0b19943..49d3508 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -597,7 +597,7 @@ static void balance_dirty_pages(struct address_space *mapping,
 	    (!laptop_mode && ((global_page_state(NR_FILE_DIRTY)
 			       + global_page_state(NR_UNSTABLE_NFS))
 					  > background_thresh)))
-		bdi_start_writeback(bdi, NULL, 0);
+		bdi_start_writeback(bdi, NULL, 0, 0);
 }
 
 void set_page_dirty_balance(struct page *page, int page_mkwrite)
-- 
1.7.0.1



Index: kernel.spec
===================================================================
RCS file: /cvs/pkgs/rpms/kernel/devel/kernel.spec,v
retrieving revision 1.2007
retrieving revision 1.2008
diff -u -p -r1.2007 -r1.2008
--- kernel.spec	1 Jun 2010 12:51:29 -0000	1.2007
+++ kernel.spec	1 Jun 2010 13:29:10 -0000	1.2008
@@ -753,7 +753,11 @@ Patch2911: linux-2.6-v4l-dvb-add-kworld-
 
 # fs fixes
 
-Patch3000: fs-explicitly-pass-in-whether-sb-is-pinned-or-not.patch
+Patch3000: writeback-fix-WB_SYNC_NONE-writeback-from-umount.patch
+Patch3001: writeback-ensure-that-WB_SYNC_NONE-writeback-with-sb.patch
+Patch3002: writeback-Update-dirty-flags-in-two-steps.patch
+Patch3003: writeback-disable-periodic-old-data-writeback-for-di.patch
+Patch3004: writeback-bdi_writeback_task-must-set-task-state-bef.patch
 
 # NFSv4
 
@@ -1226,7 +1230,11 @@ ApplyPatch linux-2.6-execshield.patch
 #
 # bugfixes to drivers and filesystems
 #
-ApplyPatch fs-explicitly-pass-in-whether-sb-is-pinned-or-not.patch
+ApplyPatch writeback-fix-WB_SYNC_NONE-writeback-from-umount.patch
+ApplyPatch writeback-ensure-that-WB_SYNC_NONE-writeback-with-sb.patch
+ApplyPatch writeback-Update-dirty-flags-in-two-steps.patch
+ApplyPatch writeback-disable-periodic-old-data-writeback-for-di.patch
+ApplyPatch writeback-bdi_writeback_task-must-set-task-state-bef.patch
 
 # ext4
 
@@ -2049,6 +2057,9 @@ fi
 #                 ||     ||
 
 %changelog
+* Tue Jun 01 2010 Kyle McMartin <kyle at redhat.com> 2.6.34-17
+- backport writeback fixes from Jens until stable@ picks them up.
+
 * Tue Jun 01 2010 Kyle McMartin <kyle at redhat.com> 2.6.34-16
 - quiet-prove_RCU-in-cgroups.patch: shut RCU lockdep up
   as in 8b08ca52f5942c21564bbb90ccfb61053f2c26a1.


--- fs-explicitly-pass-in-whether-sb-is-pinned-or-not.patch DELETED ---


