I have two mounted disks, both ext3, mounted as /sdb1 and /sdc1.
On /sdb1 I have a directory, let's call it dirx.
1. rm -rf /sdc1/dirx
2. cd /sdb1
3. tar cf - dirx | tar -C /sdc1 -xpf -
Neither dir (/sdb1 nor /sdc1) is accessed by any program other than tar (and of course /sdb1 is the shell's CWD). The shell's history file is in my home dir.
After tar:
4. du -sk dirx /sdc1/dirx
   2904536 /sdc1/dirx
   2802124 dirx
So, why this size inflation by 104MiB ?
I repeated the process twice. Same difference.
Other dirs tarred in this way from sdb1 to sdc1 do not show this discrepancy.
Dirx contains mp3's.
On 03Sep2010 14:21, JD jd1008@gmail.com wrote:
| I have two mounted disks, both ext3 mounted
| as
| /sdb1
| /sdc1
|
| On /sdb1 I have a directory, let's call it dirx.
|
| 1. rm -rf /sdc1/dirx
|
| 2. cd /sdb1
| 3. tar cf - dirx | tar -C /sdc1 -xpf -
|
| Neither dir (/sdb1 and /sdc1) are not accessed by any programs other
| than the tar program (and of course /sdb1 is the shell's CWD).
| The shell's history file is in my home dir.
|
| After tar:
|
| 4. du -sk dirx /sdc1/dirx
| 2904536 /sdc1/dirx
| 2802124 dirx
|
| So, why this size inflation by 104MiB ?
|
| I repeated the process twice. Same difference.
|
| Other dirs tarred in this way from sdb1 to sdc1 do not show this
| discrepancy.
There are three possible sources of discrepancies that I can think of:
- different filesystem types
- different directory packing
- file fragmentation
I presume we can discount the first one.
Directory packing normally is _better_ in a new directory; older directories can accumulate holes from file deletions. So the second one seems unlikely too. The way to check is to walk the trees with find and tally sizes with awk:
find /sdb1/dirx -type d -ls | awk '{sum += $7} END { print sum }'
find /sdc1/dirx -type d -ls | awk '{sum += $7} END { print sum }'

The size difference seems too large for this anyway.

That leaves file fragmentation. Does sdc1 have a lot of other data? Maybe complete MP3s won't fit into the gaps, and must be broken up more. Again, like new directories, there is normally less fragmentation in copied files, not more. And MP3s tend to be written in one go anyway, so the source files are probably not fragmented either.
None of these choices seem likely to me.
There is a final option which should not apply because these are different filesystems and also because your files are definitely copies: hard link counting. du notices hard links and correctly does not count the second name twice. If you do this:
du -sk dir1 dir2
and dir1 and dir2 have some files hard linked between them then du will not count the hardlinked files when it encounters them, and you would then see "dir2" have a lower count than you might expect otherwise.
The way to check this one is to run two dus:
du -sk dir1
du -sk dir2
You can also scour your tree for hard links:
find /sdc1/dirx -type f -links +1 -ls
though your tar copy should preserve the hard linking in your copy, and thus not change the totals.
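To see the hard link effect in isolation, here is a small throwaway sketch. It builds its own scratch tree under a temporary directory; the names dir1, dir2 and song are made up for the demonstration and have nothing to do with your disks.

```shell
# hardlink_demo: show that du charges a hard-linked file only once per run.
# Everything lives in a scratch directory that is removed at the end.
hardlink_demo() {
    tmp=$(mktemp -d) || return 1
    mkdir "$tmp/dir1" "$tmp/dir2"

    # 1 MiB of random data in dir1, then a second name for the same inode in dir2
    head -c 1048576 /dev/urandom > "$tmp/dir1/song"
    ln "$tmp/dir1/song" "$tmp/dir2/song"

    echo "one du, two arguments:"
    du -sk "$tmp/dir1" "$tmp/dir2"

    echo "two separate dus:"
    du -sk "$tmp/dir1"
    du -sk "$tmp/dir2"

    rm -rf "$tmp"
}
```

Run `hardlink_demo` and compare the dir2 figure in the two halves of the output: in the single du run the linked file has already been charged to dir1, so dir2 looks nearly empty.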
In short, several things are listed above that can produce different "on disc" sizes for copied data, and I don't really think any of them explain your results. But do some of the checks I suggest - if nothing else they may reveal more clues.
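One more check that may give clues: find out which files actually occupy a different amount of space in the two trees. The following is only a sketch. The function name compare_alloc is my own invention, and it assumes GNU find (for -printf) and file names without tabs or newlines.

```shell
# compare_alloc SRC DST
# Print "src_blocks<TAB>dst_blocks<TAB>path" for every regular file that exists
# in both trees but occupies a different number of 512-byte blocks.
# Assumes GNU find (-printf) and file names without tabs or newlines.
compare_alloc() {
    {
        (cd "$1" && find . -type f -printf 'A\t%b\t%p\n')
        (cd "$2" && find . -type f -printf 'B\t%b\t%p\n')
    } | awk -F'\t' '
        $1 == "A" { blocks[$3] = $2; next }      # remember source allocation
        ($3 in blocks) && blocks[$3] != $2 {      # same path, different usage
            printf "%s\t%s\t%s\n", blocks[$3], $2, $3
        }'
}
```

For your case that would be `compare_alloc /sdb1/dirx /sdc1/dirx`. If every file grows by a little, suspect allocation granularity; if a handful of files grow by a lot, look at those files specifically.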
| Dirx contains mp3's.
"MP3s", please. There are no apostrophes in plurals!
Cheers,
On 09/03/2010 04:40 PM, Cameron Simpson wrote:
[...]
I believe it must be directory packing. sdc1 is 83% full and sdb1 is 81% full. There is also high fragmentation of free space on both sdb1 and sdc1. Thanx for the info!!!!
On 09/04/2010 05:21 AM, JD wrote:
[...]
What is the filesystem block size on both filesystems?

If /sdc1 is ext2/3/4 with a block size of 4 KiB (tune2fs -l will verify this), and if the source filesystem has a 1 KiB block size, then each file on the target filesystem may have up to 3 KiB of extra allocated "usage".
Just a thought.
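A rough way to test this idea without touching either disk is to add up the file sizes rounded to each candidate block size. This is a sketch only: alloc_estimate is a made-up helper name, it assumes GNU find (for -printf), and it ignores indirect blocks and directory overhead.

```shell
# alloc_estimate DIR BLOCKSIZE
# Sum each regular file's size rounded up to a multiple of BLOCKSIZE bytes and
# print the total in KiB. Ignores indirect blocks and directory overhead.
# Assumes GNU find (-printf).
alloc_estimate() {
    find "$1" -type f -printf '%s\n' |
    awk -v bs="$2" '
        { total += int(($1 + bs - 1) / bs) * bs }
        END { printf "%d\n", total / 1024 }'
}
```

Compare `alloc_estimate /sdb1/dirx 1024` with `alloc_estimate /sdb1/dirx 4096`. Note that going from 1 KiB to 4 KiB blocks costs at most 3 KiB per file, so the 102412 KiB difference reported by du would need something over 34,000 files to be explained by rounding alone. The real block size shows up in the "Block size:" line of tune2fs -l run against the underlying device (the device names are not given in this thread).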
-Greg

--
Please also check the log file at "/dev/null" for additional information.
(from /var/log/Xorg.setup.log)

Greg Hosler ghosler@redhat.com
On 09/03/2010 03:21 PM, JD wrote:
[...]
You may have some sparse files. Add --sparse to your tar command and see what happens.
If the destination were smaller in size, fragmentation could account for it. But since it's bigger than the original, it is most likely caused by sparse file(s).
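Before re-running tar you can check whether the source tree has any sparse files at all. A minimal sketch, assuming GNU find (for -printf); list_sparse is just a name picked for this example. It flags files whose allocated space is less than their length.

```shell
# list_sparse DIR
# Print "size_bytes<TAB>allocated_bytes<TAB>path" for regular files whose
# allocated space is smaller than their length, i.e. files with holes.
# Assumes GNU find (-printf); %b counts 512-byte blocks.
list_sparse() {
    find "$1" -type f -printf '%s\t%b\t%p\n' |
    awk -F'\t' '$2 * 512 < $1 { printf "%d\t%d\t%s\n", $1, $2 * 512, $3 }'
}
```

If `list_sparse /sdb1/dirx` prints nothing, sparse files are probably not the cause. If it does list files, repeating the copy with `tar --sparse -cf - dirx | tar -C /sdc1 -xpf -` should keep the holes.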
Good Luck!