FYI
-------- Forwarded Message --------
Subject: Re: file write that exceeds thin device capacity
Date: Wed, 14 Nov 2018 09:10:04 +1100
From: Dave Chinner <david(a)fromorbit.com>
To: Todd Gill <tgill(a)redhat.com>
CC: linux-xfs(a)vger.kernel.org
On Tue, Nov 13, 2018 at 02:57:18PM -0500, Todd Gill wrote:
Hi,
This script creates a 1 TB thin device (device mapper) backed by 1 GB
of physical space. The script then writes more than 1 GB of files to
XFS using dd with a block size of $BLOCK_SIZE. I'm testing to see if
recovery can be automated.
https://paste.fedoraproject.org/paste/ropelNyOQWCjk3hfK0jltA
When the $BLOCK_SIZE passed to dd is 4k, dd gets an error on the file
write that exceeds the physical capacity backing the thin device.
XFS doesn't indicate any problems.
User data write error.
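Since the 4k case surfaces as a user data write error, a harness that wants to react to it automatically could check for ENOSPC itself instead of parsing dd's output. Below is a minimal Python sketch, assuming the harness writes its own test files. `write_and_sync` and `is_out_of_space` are hypothetical helpers. The fsync() is included because buffered writes may only report a writeback failure when the data is forced to disk.

```python
import errno
import os


def write_and_sync(path: str, data: bytes) -> int:
    """Write data to path and fsync it (hypothetical helper).

    Returns 0 on success, or the errno of the call that failed. A buffered
    write() can succeed before writeback runs, so fsync() is where a
    deferred data write error is reported back to the caller.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    except OSError as exc:
        return exc.errno
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)


def is_out_of_space(err: int) -> bool:
    """True if the returned errno means the device ran out of space."""
    return err == errno.ENOSPC
```

A harness would call `write_and_sync()` for each file and stop, or trigger its recovery step, as soon as `is_out_of_space()` returns true.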
If I set $BLOCK_SIZE to 32k, I see entries in the system log
indicating that XFS loops retrying the writes.
Is that expected? Is it just more likely to happen with larger block
sizes?
I’m looking to understand how to recover when a thin device runs out of
space under XFS.
Example system log entries:
[ +5.048997] XFS (dm-3): metadata I/O error: block 0xf0000 ("xfs_buf_iodone_callback_error") error 28 numblks 32
[ +1.376913] XFS: Failing async write: 1164 callbacks suppressed
[ +0.000004] XFS (dm-3): Failing async write on buffer block 0xf0020. Retrying async write.
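The "error 28" field in these messages is an errno value, and on Linux 28 is ENOSPC, meaning the thin device has no space left to allocate. Automating recovery requires a monitor that spots these messages. Below is a minimal Python sketch that recognizes the metadata I/O error line quoted above. The regular expression is fitted to this one sample and is an assumption; it does not reflect a stable kernel log format.

```python
import errno
import re

# Fitted to the "metadata I/O error" line quoted in this thread; other
# kernel versions may word the message differently.
METADATA_ERR = re.compile(
    r"XFS \((?P<dev>[^)]+)\): metadata I/O error: block (?P<block>0x[0-9a-f]+) "
    r'\("(?P<func>\w+)"\) error (?P<err>\d+) numblks (?P<numblks>\d+)'
)


def parse_metadata_error(line: str):
    """Return (device, block, errno name, numblks), or None if no match."""
    m = METADATA_ERR.search(line)
    if m is None:
        return None
    code = int(m.group("err"))
    # errno.errorcode maps numbers to symbolic names (28 -> "ENOSPC" on Linux).
    name = errno.errorcode.get(code, str(code))
    return m.group("dev"), int(m.group("block"), 16), name, int(m.group("numblks"))
```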
Filesystem metadata write error. XFS is configured to retry these by
default. Failing this write will shut down the filesystem, as it is a
corruption vector.
If you expand your thin device at this point, the write will then
succeed and the filesystem will continue to operate normally.
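To automate the expansion step, a script first needs to notice that the pool backing the thin device is full. Below is a minimal Python sketch that parses one `dmsetup status` line for a thin-pool target. The field layout is an assumption taken from the kernel's device-mapper thin-provisioning documentation: transaction id, used/total metadata blocks, used/total data blocks, held metadata root, then mode. Check it against the kernel in use. `PoolUsage` and `parse_thin_pool_status` are hypothetical names.

```python
from typing import NamedTuple, Optional


class PoolUsage(NamedTuple):
    used_data_blocks: int
    total_data_blocks: int
    mode: Optional[str]  # e.g. "rw", "ro" or "out_of_data_space"

    @property
    def data_full(self) -> bool:
        return (self.mode == "out_of_data_space"
                or self.used_data_blocks >= self.total_data_blocks)


def parse_thin_pool_status(line: str) -> PoolUsage:
    """Parse one `dmsetup status` line for a thin-pool target.

    Assumed layout: <start> <length> thin-pool <transaction id>
    <used meta>/<total meta> <used data>/<total data> <held root> <mode> ...
    """
    fields = line.split()
    if len(fields) < 6 or fields[2] != "thin-pool":
        raise ValueError(f"not a thin-pool status line: {line!r}")
    used, total = (int(x) for x in fields[5].split("/"))
    mode = fields[7] if len(fields) > 7 else None
    return PoolUsage(used, total, mode)
```

A monitor could poll this for the pool device and grow the pool's data space once `data_full` becomes true, while XFS is still retrying the metadata writes.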
If you configure your filesystem (through
/sys/fs/xfs/<dev>/error/...) to fail metadata writes on ENOSPC
errors, it will shut down the filesystem rather than wait for
the thin device to be expanded.
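The sysfs path above is left elided. The following sketch assumes the per-error knobs described in the kernel's XFS admin guide, specifically `error/metadata/ENOSPC/max_retries`, where -1 means retry forever and 0 means no retries. Under that assumption, a minimal Python sketch of the "shut down instead of waiting" configuration might look like the code below. The function name is hypothetical, and writing to the real /sys requires root.

```python
from pathlib import Path

# Assumed location of the XFS sysfs tree; <dev> is the short block device
# name, e.g. "dm-3" in the log above.
XFS_SYSFS = Path("/sys/fs/xfs")


def fail_metadata_writes_on_enospc(dev: str, root: Path = XFS_SYSFS) -> Path:
    """Make XFS stop retrying metadata writes that fail with ENOSPC.

    Writes 0 ("no retries") to error/metadata/ENOSPC/max_retries, so the
    next such failure shuts the filesystem down instead of looping.
    Returns the path of the knob that was written.
    """
    knob = root / dev / "error" / "metadata" / "ENOSPC" / "max_retries"
    knob.write_text("0\n")
    return knob
```

The `root` parameter exists only so the function can be exercised against a scratch directory; on a live system it would be called as `fail_metadata_writes_on_enospc("dm-3")`.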
Cheers,
Dave.
--
Dave Chinner
david(a)fromorbit.com