Hi,
I'm planning to add filesystem-local database support to mlocate. This allows:
- running updatedb on a file server and making the database automatically available to clients without any client-side configuration
- using locate on GFS volumes without running updatedb on each host that has the volume mounted (which slows the volumes down due to lock contention)
The plan:
* the mlocate.db format is extended to support databases without a fixed path prefix, such that all entries in a database in /foo/bar/.mlocate/mlocate.db are implicitly prefixed by /foo/bar (this allows using /srv/home on the file server, mounting it as /home, and using a single database on both the server and the client).
* locate(1)'s default database is not just /var/lib/mlocate/mlocate.db; mlocate also checks each mounted filesystem for a .mlocate/mlocate.db file, owned by root or the invoking user, and not writable by anyone but the owner. Such files are automatically added to the database path (a sketch of this check appears below).
* To allow overriding this check, the LOCATE_PATH variable is changed to override the default database path instead of appending to the database path. *note*: this is an incompatible change.
* updatedb(8) gets a new option, --single-fs PATH. This option generates a database in PATH/.mlocate/mlocate.db that spans only the subtree of PATH. Filesystems mounted in subdirectories of PATH are automatically excluded, and PRUNEFS is ignored. PRUNEPATHS is honored, except for PATH itself.
* /etc/cron.daily/mlocate reads /etc/sysconfig/mlocate to get a list of single-fs PATHs. For each PATH it checks PATH/.mlocate/mlocate.db is older than 12 hours, creates a lock to prevent a concurrent updatedb, and runs updatedb --single-fs PATH (a sketch follows below).
The standard daily run is performed as well, with all entries of /etc/sysconfig/mlocate added to PRUNEPATHS automatically.
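For the cron part, a rough sketch of what /etc/cron.daily/mlocate could look like (the SINGLE_FS_PATHS variable name, the lock location and the --add-prunepaths spelling are assumptions, not final names):

  #!/bin/bash
  # Sketch of the proposed cron.daily logic; names are placeholders.
  . /etc/sysconfig/mlocate    # expected to set e.g. SINGLE_FS_PATHS="/srv/home"

  for path in $SINGLE_FS_PATHS; do
      db="$path/.mlocate/mlocate.db"
      # skip databases that are newer than 12 hours (720 minutes)
      [ -e "$db" ] && [ -n "$(find "$db" -mmin -720)" ] && continue
      # take a per-path lock so two updatedb runs cannot collide
      lock="/var/lock/mlocate$(echo "$path" | tr / _)"
      (
          flock -n 9 || exit 0
          updatedb --single-fs "$path"
      ) 9>"$lock"
  done

  # the standard daily run, with every single-fs PATH pruned automatically
  updatedb --add-prunepaths "$SINGLE_FS_PATHS"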
Usage for /home on NFS:
- NFS is automatically excluded by clients, so updatedb on clients does not walk the filesystem.
- On the server: add /srv/home to /etc/sysconfig/mlocate. If /srv/home is not a separate mount point, add LOCATE_PATH=:/srv/home/.mlocate/mlocate.db to the global environment.
Usage for /home on GFS:
- GFS is automatically excluded, so no host walks the filesystem by default.
- On all hosts: add /home to /etc/sysconfig/mlocate.
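To make the check from the second plan item concrete, here is a minimal sketch of the trust test locate would apply per mounted filesystem (bash; db_is_trusted is a made-up helper name):

  #!/bin/bash
  # Sketch of the trust test: the db must be owned by root or the invoking
  # user, and must not be writable by anyone but the owner.
  db_is_trusted() {
      local db=$1 owner mode
      owner=$(stat -c %u "$db" 2>/dev/null) || return 1
      mode=$(stat -c %a "$db" 2>/dev/null) || return 1
      [ "$owner" = 0 ] || [ "$owner" = "$(id -u)" ] || return 1
      [ $(( 8#$mode & 8#022 )) -eq 0 ]    # reject group/world-writable files
  }

  # candidate databases come from the list of mounted filesystems
  while read -r _ mountpoint _; do
      db="$mountpoint/.mlocate/mlocate.db"
      [ -f "$db" ] && db_is_trusted "$db" && echo "would also search: $db"
  done < /proc/mounts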
Can anyone see a problem with the plan, or an important feature that the above fails to address?
Thanks, Mirek
On Fri, Mar 16, 2007 05:16, Miloslav Trmac wrote:
Can anyone see a problem with the plan, or an important feature that the above fails to address?
1. You're spreading rw files all over the filesystems, which is contrary to the FHS.
2. You're putting hidden directories where people won't expect them, which will seriously annoy sysadmins.
I don't have any magic solution, but IMHO you need to think about file location & naming a little more.
On Fri, Mar 16, 2007 at 08:55:53AM +0100, Nicolas Mailhot wrote:
On Fri, Mar 16, 2007 05:16, Miloslav Trmac wrote:
Can anyone see a problem with the plan, or an important feature that the above fails to address?
1. You're spreading rw files all over the filesystems, which is contrary to the FHS.
2. You're putting hidden directories where people won't expect them, which will seriously annoy sysadmins.
I don't have any magic solution, but IMHO you need to think about file location & naming a little more.
It's quite similar to maintaining quotas (unless the filesystem is capable of maintaining quotas in its own metadata): consider the path information managed by mlocate to be metadata of the same kind as quotas, subject to similar exceptions.
Hello, Nicolas Mailhot wrote:
1. You're spreading rw files all over the filesystems, which is contrary to the FHS.
Moving the database to the filesystem described by the database is the whole point. It might be the only shared filesystem between the computers, there is no other place to put the database in general.
2. You're putting hidden directories where people won't expect them, which will seriously annoy sysadmins.
None of this happens automatically. The sysadmin must edit /etc/sysconfig/mlocate in order to use filesystem-local databases. Mirek
On Fri, Mar 16, 2007 at 05:16:26AM +0100, Miloslav Trmac wrote:
Hi, I'm planning to add filesystem-local database support to mlocate. This allows:
First of all thanks for attacking this!
- running updatedb on a file server and making the database automatically available to clients without any client-side configuration
- using locate on GFS volumes without running updatedb on each host that has the volume mounted (which slows the volumes down due to lock contention)
The plan:
- the mlocate.db format is extended to support databases without a fixed path prefix, such that all entries in a database in /foo/bar/.mlocate/mlocate.db are implicitly prefixed by /foo/bar. (this allows using /srv/home on the file server, mounting it as /home, and using a single database on both the server and the client).
- locate(1)'s default database is not just /var/lib/mlocate/mlocate.db; mlocate also checks each mounted filesystem for a .mlocate/mlocate.db file, owned by root or the invoking user, and not writable by anyone but the owner. Such files are automatically added to the database path.
locate should also include any .mlocate/mlocate.db that a previous updatedb has found and skipped. E.g. if updatedb detects a .mlocate/mlocate.db in a folder in its path, it skips it and registers it for locate to use.
Perhaps that way you can even avoid explicitly listing the --single-fs paths in /etc/sysconfig/mlocate. If a path is to be handled as such, the admin just creates an .mlocate folder, and updatedb and locate automatically pick it up.
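In sketch form, the walk would look something like this (register_subdb and index_entries are made-up names standing in for updatedb internals):

  # Sketch of the suggested behaviour: a directory carrying its own
  # .mlocate/mlocate.db is registered for locate and its subtree skipped.
  walk() {
      local dir=$1
      if [ "$dir" != "$ROOT" ] && [ -f "$dir/.mlocate/mlocate.db" ]; then
          register_subdb "$dir"    # remember it so locate consults it directly
          return                   # do not descend; the sub-db covers it
      fi
      index_entries "$dir"         # add this directory's contents to the main db
      for sub in "$dir"/*/; do
          [ -d "$sub" ] && walk "${sub%/}"
      done
  }
  walk "$ROOT"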
To allow overriding this check, the LOCATE_PATH variable is changed to override the default database path instead of appending to the database path. *note*: this is an incompatible change
updatedb(8) gets a new option, --single-fs PATH. This option generates a database in PATH/.mlocate/mlocate.db that spans only the subtree of PATH. filesystems mounted in subdirectories of PATH are automatically excluded, PRUNEFS is ignored. PRUNEPATHS is honored, except for PATH itself.
/etc/cron.daily/mlocate reads /etc/sysconfig/mlocate to get a list of single-fs PATHs. For each PATH it checks PATH/.mlocate/mlocate.db is older than 12 hours, creates a lock to prevent a concurrent updatedb, and runs updatedb --single-fs PATH.
The standard daily run is performed as well, with all entries of /etc/sysconfig/mlocate added to PRUNEPATHS automatically.
Usage for /home on NFS:
- NFS is automatically excluded by clients, so updatedb on clients does not walk the filesystem.
- On the server: Add /srv/home to /etc/sysconfig/mlocate. If /srv/home is not a separate mount point, add LOCATE_PATH=:/srv/home/.mlocate/mlocate.db to the global environment.
Usage for /home on GFS:
- GFS is automatically excluded, so no host walks the filesystem by default.
- On all hosts: add /home to /etc/sysconfig/mlocate
Can anyone see a problem with the plan, or an important feature that the above fails to address?
Thanks, Mirek
Hello, Axel Thimm wrote:
- locate(1)'s default database is not just /var/lib/mlocate/mlocate.db; mlocate also checks each mounted filesystem for a .mlocate/mlocate.db file, owned by root or the invoking user, and not writable by anyone but the owner. Such files are automatically added to the database path.
locate should also include any .mlocate/mlocate.db that a previous updatedb has found and skipped. E.g. if updatedb detects a .mlocate/mlocate.db in a folder in its path, it skips it and registers it for locate to use.
This would make a difference only for "subdirectory-local" databases within a filesystem. I can't think of a reason why they would be necessary.
Perhaps that way you can even avoid explicitly listing the --single-fs paths in /etc/sysconfig/mlocate. If a path is to be handled as such, the admin just creates an .mlocate folder, and updatedb and locate automatically pick it up.
I'd prefer a more specific administrator action; otherwise just extracting an archive could unintentionally add an mlocate database and, in the worst case, double the updatedb overhead. Mirek
On Mon, Mar 19, 2007 at 11:51:19PM +0100, Miloslav Trmac wrote:
Hello, Axel Thimm wrote:
- locate(1)'s default database is not just /var/lib/mlocate/mlocate.db; mlocate also checks each mounted filesystem for a .mlocate/mlocate.db file, owned by root or the invoking user, and not writable by anyone but the owner. Such files are automatically added to the database path.
locate should also include any .mlocate/mlocate.db that a previous updatedb has found and skipped. E.g. if updatedb detects a .mlocate/mlocate.db in a folder in its path, it skips it and registers it for locate to use.
This would make a difference only for "subdirectory-local" databases within a filesystem.
Exactly.
I can't think of a reason why they would be necessary.
Consider for example a single volume nfs server exporting /home. So you want to have updatedb create a subdirectory-local db in /home, so it can be used on remote clients.
E.g. you can't assume that every exported volume will be identical to a mounted volume on the server. Every exported dir may get its own .mlocate/mlocate.db, which to the server itself looks like an ordinary subdirectory.
And instead of having to configure it in /etc/sysconfig, it is easier to keep the metainformation about where such .mlocate/mlocate.db files should be maintained in the fs itself, simply by creating the folder .mlocate.
Perhaps that way you can even avoid explicitly listing the --single-fs paths in /etc/sysconfig/mlocate. If a path is to be handled as such, the admin just creates an .mlocate folder, and updatedb and locate automatically pick it up.
I'd prefer a more specific administrator action; otherwise just extracting an archive could unintentionally add an mlocate database and, in the worst case, double the updatedb overhead.
You mean an archive that contains .mlocate? That would be unfortunate archiving, and given that only root can do that, it means the admin did it. The same would apply to quota or backup metainformation, which also doesn't make sense to archive away.
Axel Thimm wrote:
On Mon, Mar 19, 2007 at 11:51:19PM +0100, Miloslav Trmac wrote:
Hello, Axel Thimm wrote:
- locate(1)'s default database is not just /var/lib/mlocate/mlocate.db; mlocate also checks each mounted filesystem for a .mlocate/mlocate.db file, owned by root or the invoking user, and not writable by anyone but the owner. Such files are automatically added to the database path.
locate should also include any .mlocate/mlocate.db that a previous updatedb has found and skipped. E.g. if updatedb detects a .mlocate/mlocate.db in a folder in its path, it skips it and registers it for locate to use.
I can't think of a reason why they would be necessary.
Consider for example a single volume nfs server exporting /home. So you want to have updatedb create a subdirectory-local db in /home, so it can be used on remote clients.
But on the client, where locate runs, /home/foo would be a separate filesystem.
As for updatedb,
And instead of having to configure it in /etc/sysconfig, it is easier to keep the metainformation about where such .mlocate/mlocate.db files should be maintained in the fs itself, simply by creating the folder .mlocate.
wouldn't it be even more practical to support FS_DB_GLOB=/srv/home/* in /etc/sysconfig/mlocate ? Mirek
On Tue, Mar 20, 2007 at 12:50:26AM +0100, Miloslav Trmac wrote:
Axel Thimm wrote:
On Mon, Mar 19, 2007 at 11:51:19PM +0100, Miloslav Trmac wrote:
Hello, Axel Thimm wrote:
- locate(1)'s default database is not just /var/lib/mlocate/mlocate.db; mlocate also checks each mounted filesystem for a .mlocate/mlocate.db file, owned by root or the invoking user, and not writable by anyone but the owner. Such files are automatically added to the database path.
locate should also include any .mlocate/mlocate.db that a previous updatedb has found and skipped. E.g. if updatedb detects a .mlocate/mlocate.db in a folder in its path, it skips it and registers it for locate to use.
I can't think of a reason why they would be necessary.
Consider for example a single volume nfs server exporting /home. So you want to have updatedb create a subdirectory-local db in /home, so it can be used on remote clients.
But on the client, where locate runs, /home/foo would be a separate filesystem.
Yes, but updatedb (and locate) runs also on the server.
As for updatedb,
And instead of having to configure it in /etc/sysconfig, it is easier to keep the metainformation about where such .mlocate/mlocate.db files should be maintained in the fs itself, simply by creating the folder .mlocate.
wouldn't it be even more practical to support FS_DB_GLOB=/srv/home/* in /etc/sysconfig/mlocate ?
It's better not to have any knowledge under /etc of where special .mlocate/mlocate.db are injected on the (local) filesystem. Consider the (local) volume in question to be mounted from a SAN device, maybe even in a cluster active/passive setup. Or consider moving a data disk from one system to another.
It's better to try and keep the metainformation non-global (e.g. out of /etc) and self-describing, so it is rather transparent for the admin. Otherwise he needs to cater for any change of host <-> storage mapping in mlocate as well.
On Tue, 2007-03-20 at 10:56 +0100, Axel Thimm wrote:
On Tue, Mar 20, 2007 at 12:50:26AM +0100, Miloslav Trmac wrote:
Axel Thimm wrote:
On Mon, Mar 19, 2007 at 11:51:19PM +0100, Miloslav Trmac wrote:
Hello, Axel Thimm wrote:
- locate(1)'s default database is not just /var/lib/mlocate/mlocate.db; mlocate also checks each mounted filesystem for a .mlocate/mlocate.db file, owned by root or the invoking user, and not writable by anyone but the owner. Such files are automatically added to the database path.
locate should also include any .mlocate/mlocate.db that a previous updatedb has found and skipped. E.g. if updatedb detects a .mlocate/mlocate.db in a folder in its path, it skips it and registers it for locate to use.
I can't think of a reason why they would be necessary.
Consider for example a single volume nfs server exporting /home. So you want to have updatedb create a subdirectory-local db in /home, so it can be used on remote clients.
But on the client, where locate runs, /home/foo would be a separate filesystem.
Yes, but updatedb (and locate) runs also on the server.
As for updatedb,
And instead of having to configure it in /etc/sysconfig, it is easier to keep the metainformation about where such .mlocate/mlocate.db files should be maintained in the fs itself, simply by creating the folder .mlocate.
wouldn't it be even more practical to support FS_DB_GLOB=/srv/home/* in /etc/sysconfig/mlocate ?
It's better not to have any knowledge under /etc of where special .mlocate/mlocate.db are injected on the (local) filesystem. Consider the (local) volume in question to be mounted from a SAN device, maybe even in a cluster active/passive setup. Or consider moving a data disk from one system to another.
It's better to try and keep the metainformation non-global (e.g. out of /etc) and self-describing, so it is rather transparent for the admin. Otherwise he needs to cater for any change of host <-> storage mapping in mlocate as well.
The more I think of it, the more I think .mlocate directories and files are a very BAD idea. Security is not enforceable this way across machine boundaries, so anything that shares the filesystem layout cross-machine IS a problem. The only way to solve the problem (and also avoid the ugliness of having writable files all over the place, but keep them in /var/cache where they should stay) is to have a network service that uses authentication mechanisms _per user_ (kerberized), and returns only the information such a user is entitled to see.
Simo.
On Tue, Mar 20, 2007 at 08:36:36AM -0400, Simo Sorce wrote:
On Tue, 2007-03-20 at 10:56 +0100, Axel Thimm wrote:
On Tue, Mar 20, 2007 at 12:50:26AM +0100, Miloslav Trmac wrote:
Axel Thimm wrote:
On Mon, Mar 19, 2007 at 11:51:19PM +0100, Miloslav Trmac wrote:
Hello, Axel Thimm wrote:
locate(1)'s default database is not just /var/lib/mlocate/mlocate.db; mlocate also checks each mounted filesystem for a .mlocate/mlocate.db file, owned by root or the invoking user, and not writable by anyone but the owner. Such files are automatically added to the database path.
locate should also include any .mlocate/mlocate.db that a previous updatedb has found and skipped. E.g. if updatedb detects a .mlocate/mlocate.db in a folder in its path, it skips it and registers it for locate to use.
I can't think of a reason why they would be necessary.
Consider for example a single volume nfs server exporting /home. So you want to have updatedb create a subdirectory-local db in /home, so it can be used on remote clients.
But on the client, where locate runs, /home/foo would be a separate filesystem.
Yes, but updatedb (and locate) runs also on the server.
As for updatedb,
And instead of having to configure it in /etc/sysconfig, it is easier to keep the metainformation about where such .mlocate/mlocate.db files should be maintained in the fs itself, simply by creating the folder .mlocate.
wouldn't it be even more practical to support FS_DB_GLOB=/srv/home/* in /etc/sysconfig/mlocate ?
It's better not to have any knowledge under /etc of where special .mlocate/mlocate.db are injected on the (local) filesystem. Consider the (local) volume in question to be mounted from a SAN device, maybe even in a cluster active/passive setup. Or consider moving a data disk from one system to another.
It's better to try and keep the metainformation non-global (e.g. out of /etc) and self-describing, so it is rather transparent for the admin. Otherwise he needs to cater for any change of host <-> storage mapping in mlocate as well.
The more I think of it, the more I think .mlocate directories and files are a very BAD idea. Security is not enforceable this way across machine boundaries, so anything that shares the filesystem layout cross-machine IS a problem.
The argument here is more about where to keep information about what is to be subindexed, not whether it makes sense security-wise to do so. Even so, if the .mlocate bits are only readable by the remote client when the remote client can traverse the filesystem anyway, then there is no security leak anywhere.
This means that either the remote filesystem is root-mounted w/o root_squashing, or per-user authentication is put in place. In the former case only the non-squashed root gets to read it (and is allowed to traverse the fs anyway, since he's unsquashed), and in the latter you have set up per-user trust (by nfs/krb or cifs) and can set up any finer-grained security model suiting your site's needs.
The default setup should assume the worst, e.g. have the indexes owned by root:root, so no remote fs, old or new, will be able to access the data if the admin of the server doesn't allow it.
On Tue, 2007-03-20 at 14:02 +0100, Axel Thimm wrote:
The default setup should assume the worst, e.g. have the indexes owned by root:root, so no remote fs, old or new, will be able to access the data if the admin of the server doesn't allow it.
Which kind of defeats the whole point of having per-FS locatedbs... and is a temptation for admins to change it to nobody:nobody and give away info easily without fully recognizing the security problem.
However, I see the value for those 0.01% users using clustered file systems. So, if we stop talking about net FSs and instead we talk about SANs and GFS/GPFS/Lustre/OCFS2/whatever, I think it makes more sense :)
Simo.
On Tue, Mar 20, 2007 at 09:22:01AM -0400, Simo Sorce wrote:
On Tue, 2007-03-20 at 14:02 +0100, Axel Thimm wrote:
The default setup should assume the worst, e.g. have the indexes owned by root:root, so no remote fs, old or new, will be able to access the data if the admin of the server doesn't allow it.
Which kind of defeats the whole point of having per-FS locatedbs... and is a temptation for admins to change it to nobody:nobody and give away info easily without fully recognizing the security problem.
The same admins would probably write the root password on their door, so they don't forget ;)
However, I see the value for those 0.01% users using clustered file systems. So, if we stop talking about net FSs and instead we talk about SANs and GFS/GPFS/Lustre/OCFS2/whatever, I think it makes more sense :)
Cluster users will certainly benefit, as will those juggling data storage around, either physically or by (re)assigning LUNs on the RAID, and those using NFS for the homes on trusted clients as well.
Hi,
On Fri, Mar 16, 2007 at 05:16:26AM +0100, Miloslav Trmac wrote:
Can anyone see a problem with the plan, or an important feature that the above fails to address?
If I understood it correctly, every locate search would read the files on the remote volumes, right? The performance will suffer a bit I think. For example, NFS over 11mbit wifi is fine, but waiting tens of seconds for the database to download isn't good. Probably a global locate cache db that merges all the fs-local ones would be nice.
On Fri, Mar 16, 2007 at 12:26:06PM +0100, Tomas Janousek wrote:
Hi,
On Fri, Mar 16, 2007 at 05:16:26AM +0100, Miloslav Trmac wrote:
Can anyone see a problem with the plan, or an important feature that the above fails to address?
If I understood it correctly, every locate search would read the files on the remote volumes, right? The performance will suffer a bit I think. For example, NFS over 11mbit wifi is fine, but waiting tens of seconds for the database to download isn't good. Probably a global locate cache db that merges all the fs-local ones would be nice.
Perhaps the remote .mlocatedbs could be cached based on size and timestamp?
Axel Thimm wrote:
If I understood it correctly, every locate search would read the files on the remote volumes, right? The performance will suffer a bit I think. For example, NFS over 11mbit wifi is fine, but waiting tens of seconds for the database to download isn't good. Probably a global locate cache db that merges all the fs-local ones would be nice.
Perhaps the remote .mlocatedbs could be cached based on size and timestamp?
Linux NFS has always had very poor performance wrt local filesystems, but adding another layer of complexity in updatedb to overcome the limitations of NFS over slow links is inappropriate.
The NFSv4 spec allows very aggressive client-side caching. Recent kernels with cachefs may even use local files for backing store. This general solution should speed up most usage patterns without the need to add specialized caches to all applications.
On Sun, Mar 18, 2007 at 12:11:27PM +0100, Bernardo Innocenti wrote:
Axel Thimm wrote:
If I understood it correctly, every locate search would read the files on the remote volumes, right? The performance will suffer a bit I think. For example, NFS over 11mbit wifi is fine, but waiting tens of seconds for the database to download isn't good. Probably a global locate cache db that merges all the fs-local ones would be nice.
Perhaps the remote .mlocatedbs could be cached based on size and timestamp?
Linux NFS has always had very poor performance wrt local filesystems, but adding another layer of complexity in updatedb to overcome the limitations of NFS over slow links is inappropriate.
The NFSv4 spec allows very aggressive client-side caching. Recent kernels with cachefs may even use local files for backing store. This general solution should speed up most usage patterns without the need to add specialized caches to all applications.
We're certainly not there yet to disallow connecting to NFS3 servers (or anything but NFSv4).
On Sun, 2007-03-18 at 12:19 +0100, Axel Thimm wrote:
We're certainly not there yet to disallow connecting to NFS3 servers (or anything but NFSv4).
However, for the proposition to work, the new mlocate must exist server-side (to create the files) and client-side (to read them). Is it reasonable to expect both ends to switch to a not-even-yet-released mlocate but not be able to do NFSv4?
On Sun, Mar 18, 2007 at 12:27:27PM +0100, Nicolas Mailhot wrote:
On Sun, 2007-03-18 at 12:19 +0100, Axel Thimm wrote:
We're certainly not there yet to disallow connecting to NFS3 servers (or anything but NFSv4).
However, for the proposition to work, the new mlocate must exist server-side (to create the files) and client-side (to read them). Is it reasonable to expect both ends to switch to a not-even-yet-released mlocate but not be able to do NFSv4?
With mlocate you will not really have a choice of when the changes are applied, while with NFS it's an admin's choice to use NFS3 or NFSv4, and 3 still has the larger share and will probably do so long after mlocate introduces these changes.
And NFS is not the only remote filesystem, nor the only filesystem in general where this will be applied. Other prominent fs that will benefit from this setup are GFS and openafs.
So we need not only a solution that works with NFSv4, but NFS3 and other filesystems as well. Therefore we can't rely on NFSv4 specifica.
Axel Thimm wrote:
With mlocate you will not really have a choice of when the changes are applied, while with NFS it's an admin's choice to use NFS3 or NFSv4, and 3 still has the larger share and will probably do so long after mlocate introduces these changes.
Both of which have always done a better job than NFSv3 with client-side caching. Even Samba is much better.
As far as I can tell, NFSv4 is just catching up. And as of today I still find many trivial workloads for which NFSv4 still performs poorly. Try "time find /nfs_share >/dev/null" versus the same command on a local filesystem to see what I mean.
And NFS is not the only remote filesystem, nor the only filesystem in general where this will be applied. Other prominent fs that will benefit from this setup are GFS and openafs.
Why would GFS be any slower than a non-clustered filesystem when it comes to raw data read performance? The DLM overhead would supposedly not get in the way of every single block being read.
GFS is usually accessed through the same bus types of ordinary filesystems, including SAS, fiber-channel and even SATA.
Even gigabit ethernet, which would be a very uncommon transport for block-level storage, would be fast enough for the bandwidth of today's ordinary hard drives.
On Sun, Mar 18, 2007 at 02:38:59PM +0100, Bernardo Innocenti wrote:
Axel Thimm wrote:
With mlocate you will not really have a choice of when the changes are applied, while with NFS it's an admin's choice to use NFS3 or NFSv4, and 3 still has the larger share and will probably do so long after mlocate introduces these changes.
Both of which have always done a better job than NFSv3 with client-side caching. Even Samba is much better.
As far as I can tell, NFSv4 is just catching up. And as of today I still find many trivial workloads for which NFSv4 still performs poorly. Try "time find /nfs_share >/dev/null" versus the same command on a local filesystem to see what I mean.
Well, aren't you just arguing against your original proposal to move everything to NFSv4 and rely on the caching done by NFSv4? ;)
And NFS is not the only remote filesystem, nor the only filesystem in general where this will be applied. Other prominent fs that will benefit from this setup are GFS and openafs.
Why would GFS be any slower than a non-clustered filesystem when it comes to raw data read performance? The DLM overhead would supposedly not get in the way of every single block being read.
You should try and time GFS. When it drops the domain locks, no caching survives.
GFS is usually accessed through the same bus types of ordinary filesystems, including SAS, fiber-channel and even SATA.
And network block devices.
Even gigabit ethernet, which would be a very uncommon transport for block-level storage, would be fast enough for the bandwidth of today's ordinary hard drives.
You are trying to solve an easy-to-solve caching problem by requiring
o usage of NFSv4
o high bandwidth of drives
o gigabit ethernet
o and more
while the original poster mentioned he needs this for his wireless connection of his laptop ...
Axel Thimm wrote:
As far as I can tell, NFSv4 is just catching up. And as of today I still find many trivial workloads for which NFSv4 still performs poorly. Try "time find /nfs_share >/dev/null" versus the same command on a local filesystem to see what I mean.
Well, aren't you just arguing against your original proposal to move everything to NFSv4 and rely on the caching done by NFSv4? ;)
:-) Telling the truth shall outweigh one's desire of being always right.
Well, the NFSv4 performance problem I was talking about affects stat()-ing many files. Filesystem metadata is not being cached as efficiently as one would expect and client requests are not being clustered together as much as possible.
*But* the contents of a single largish file such as mlocate.db should be cached on the clients. At least, that's what I'm experiencing with NFSv4 in my LAN.
To measure NFSv4 read() performance, I must create a file on the server, read it on the server to make sure it's in buffer-cache (otherwise I'd be measuring the performance of the RAID array too), then go on the client and cat the file to /dev/null. To repeat the test, I must remove the test file from the server and create a new one from scratch, otherwise the client would have cached it all and get much better results.
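In script form, the procedure is roughly this (hostnames and paths are placeholders):

  # On the server: create the file and pull it into the buffer cache.
  dd if=/dev/urandom of=/export/testfile bs=1M count=256
  cat /export/testfile > /dev/null

  # On the client: the first read measures the network path.
  time cat /mnt/share/testfile > /dev/null

  # To repeat, recreate the file server-side so the client cache can't help.
  ssh server 'rm /export/testfile; dd if=/dev/urandom of=/export/testfile bs=1M count=256'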
Why would GFS be any slower than a non-clustered filesystem when it comes to raw data read performance? The DLM overhead would supposedly not get in the way of every single block being read.
You should try and time GFS. When it drops the domain locks, no caching survives.
Are you talking about GFS2? I've never had a chance to try it.
It's been a while since I used GFS, and it was GFS1 on RHAS3 or maybe 4. At that time, GFS performance was poor wrt ext3 even when the storage was locally attached to a single server. But what it did was so useful for an HA cluster that you would excuse it for not being also fast.
You are trying to solve an easy-to-solve caching problem by requiring
o usage of NFSv4
o high bandwidth of drives
o gigabit ethernet
o and more
Ouch... But these conditions were just ORed together!
The reason I try to drive away from the caching solution is that most caches are more fragile and complex than their designers initially thought. Most break in the face of the user who's not even aware of them (because caches are designed to be transparent).
while the original poster mentioned he needs this for his wireless connection of his laptop ...
Then he's only got NFSv4 left...
On Sun, Mar 18, 2007 at 03:52:14PM +0100, Bernardo Innocenti wrote:
Axel Thimm wrote:
And as of today I still find many trivial workloads for which NFSv4 still performs poorly.
Well, aren't you just arguing against your original proposal to move everything to NFSv4 and rely on the caching done by NFSv4? ;)
:-) Telling the truth shall outweigh one's desire of being always right.
OK, but ...
Why would GFS be any slower than a non-clustered filesystem when it comes to raw data read performance? The DLM overhead would supposedly not get in the way of every single block being read.
It's been a while since I used GFS, and it was GFS1 on RHAS3 or maybe 4. At that time, GFS performance was poor wrt ext3 even when the storage was locally attached to a single server. But what it did was so useful for an HA cluster that you would excuse it for not being also fast.
... aren't you doing it again? In one post you assume GFS is as fast as any local fs, only to admit that it isn't.
Anyway, seems at the end we do agree ;)
You are trying to solve an easy-to-solve caching problem by requiring
o usage of NFSv4
o high bandwidth of drives
o gigabit ethernet
o and more
Ouch... But these conditions were just ORed together!
Even so, what does the poor fellow with a laptop and NFS3 do? Which is a very common setup?
The reason I try to drive away from the caching solution is that most caches are more fragile and complex than their designers initially thought. Most break in the face of the user who's not even aware of them (because caches are designed to be transparent).
In this case the caching is rather trivial, since it is just a copy operation and checking sizes & mtime. It can be made _perfect_ by adding a checksum at the beginning or end of the db.
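In sketch form (the cache location and database path are placeholders):

  # Refresh the local copy only when size or mtime of the remote db differ.
  remote=/home/.mlocate/mlocate.db
  cache=/var/cache/mlocate/home.db

  if [ ! -e "$cache" ] ||
     [ "$(stat -c '%s %Y' "$remote")" != "$(stat -c '%s %Y' "$cache")" ]; then
      cp -p "$remote" "$cache"    # -p keeps size+mtime aligned for the next check
  fi
  locate -d "$cache" "$@"         # search the cached copy instead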
Axel Thimm wrote:
It's been a while since I used GFS, and it was GFS1 on RHAS3 or maybe 4. At that time, GFS performance was poor wrt ext3 even when the storage was locally attached to a single server. But what it did was so useful for an HA cluster that you would excuse it for not being also fast.
... aren't you doing it again? In one post you assume GFS is as fast as any local fs, only to admit that it isn't.
Yes, I look confused but actually I'm not. Or so I believe.
The slow performance I'm talking about is again something you'd measure by running "find . >/dev/null", maybe twice. Issuing thousands of small queries makes most network filesystems, and the old GFS1, crawl. That's probably because these filesystems can't cache metadata at the VFS layer and must go through the lower layers to answer.
If you think this access pattern is uncommon, consider that git, svn, cvs, and even make are designed around the assumption that stat'ting is cheap.
When it comes to read() and write() in big chunks -- which is what you do to access mlocate.db -- I'd expect any half-decent filesystem to deliver almost the same raw performance of its underlying media.
Anyway, seems at the end we do agree ;)
Yep :)
Even so, what does the poor fellow with a laptop and NFS3 do? Which is a very common setup?
A local cache would be needed in this case.
In this case the caching is rather trivial, since it is just a copy operation and checking sizes & mtime. It can be made _perfect_ by adding a checksum at the beginning or end of the db.
Yes, I wasn't considering the whole picture: mlocate.db already *is* a cache. Caching a cache is trivial :-)
Hi.
Bernardo Innocenti schrieb:
To repeat the test, I must remove the test file from the server and create a new one from scratch, otherwise the client would have cached it all and get much better results.
Newer kernels make this a bit easier for you: http://linux-mm.org/Drop_Caches
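For reference, that interface boils down to:

  sync                                 # write dirty pages back first
  echo 3 > /proc/sys/vm/drop_caches    # 1=pagecache, 2=dentries+inodes, 3=both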
Hi,
On Sun, Mar 18, 2007 at 03:12:30PM +0100, Axel Thimm wrote:
while the original poster mentioned he needs this for his wireless connection of his laptop ...
Well, the original poster had something like NFS over the globe in mind, but mentioned wifi because 1) it's more common and 2) nobody would say I'm crazy.
:]
Tomas Janousek wrote:
Well, the original poster had something like NFS over the globe in mind, but mentioned wifi because 1) it's more common and 2) nobody would say I'm crazy.
Back when the web was still a new idea, wuarchive.wustl.edu used to run a public NFS share to access its huge Amiga software collection.
Mounting that over an analog modem link and saying: "Finally, I can get rid of the ftp client"... *that* would be crazy!
Nicolas Mailhot wrote:
However, for the proposition to work, the new mlocate must exist server-side (to create the files) and client-side (to read them). Is it reasonable to expect both ends to switch to a not-even-yet-released mlocate but not be able to do NFSv4?
Not in Fedora, but people running mlocate on proprietary systems such as Win32 and MacOSX will have a hard time trying to get NFSv4 support from their vendors ;-)
Axel Thimm wrote:
The NFSv4 spec allows very aggressive client-side caching. Recent kernels with cachefs may even use local files for backing store. This general solution should speed up most usage patterns without the need to add specialized caches to all applications.
We're certainly not there yet to disallow connecting to NFS3 servers (or anything but NFSv4).
NFSv3 still works, but will do less caching.
Actually, I'm not sure whether NFSv3 can do efficient read-only caching or not. Maybe yes.
And btw, I can't find cachefs anywhere in recent 2.6.20 kernels. Does anyone know where it's gone?
Hello, Tomas Janousek wrote:
If I understood it correctly, every locate search would read the files on the remote volumes, right?
Yes.
The performance will suffer a bit I think. For example, NFS over 11mbit wifi is fine, but waiting tens of seconds for the database to download isn't good.
Good point. It's still an improvement over no mlocate database, though.
Probably a global locate cache db that merges all the fs-local ones would be nice.
This might inflate the local storage requirements a lot, though. In the most efficient case (system-global cache that is updated on each access) it also expands the part of locate that must run with elevated privileges.
Given that this mlocate feature certainly won't be ready in time for FC7, we have enough time to experiment with caching after some practical experience is collected. Mirek
Miloslav Trmac wrote:
Can anyone see a problem with the plan, or an important feature that the above fails to address?
I love your proposal, but I'm concerned with littering the roots of all mountpoints with .mlocate and possibly 10 other dot files from other applications.
I hope that you and the authors of other system services with similar requirements could get together and come up with a standard place for these files named .volume/, .info/, .db/ or something similar. Subsystems that may want to use it include full-text search engines, quota, etc.
Great care must be taken to make all this metadata interoperable between different architectures, operating systems and software versions.
Maybe the LSB would care to publish a standard within the next 10 years?
On Sun, 2007-03-18 at 12:18 +0100, Bernardo Innocenti wrote:
Hi Bernardo,
You're rephrasing my opinion much better than I stated it originally
Maybe the LSB would care to publish a standard within the next 10 years?
Replace LSB with FHS, if we want something quick
Nicolas Mailhot wrote:
You're rephrasing my opinion much better than I stated it originally
Thanks :)
Maybe the LSB would care to publish a standard within the next 10 years?
Replace LSB with FHS, if we want something quick
I've looked at their web site, but the FHS spec has not been updated since early 2004, and their mailing-list archives look like a spam dumpyard.
These two clues make me suppose the FHS standardization body may have retired or maybe just moved elsewhere.
On Sun, Mar 18, 2007 at 09:40:08PM +0100, Bernardo Innocenti wrote:
Nicolas Mailhot wrote:
You're rephrasing my opinion much better than I stated it originally
Thanks :)
Maybe the LSB would care to publish a standard within the next 10 years?
Replace LSB with FHS, if we want something quick
I've looked at their web site, but the FHS spec has not been updated since early 2004, and their mailing-list archives look like a spam dumpyard.
These two clues make me suppose the FHS standardization body may have retired or maybe just moved elsewhere.
Unfortunately it hasn't moved. The spamfest is indeed the main mailing list :(
Hello, Bernardo Innocenti wrote:
Can anyone see a problem with the plan, or an important feature that the above fails to address?
I love your proposal, but I'm concerned with littering the roots of all mountpoints with .mlocate and possibly 10 other dot files from other applications.
I hope that you and the authors of other system services with similar requirements could get together and come up with a standard place for these files named .volume/, .info/, .db/ or something similar. Subsystems that may want to use it include full-text search engines, quota, etc.
Using a service-specific directory is slightly more secure: accessing files within the .mlocate directory is allowed only to root and the slocate user, so the risk of exploiting some other application and using the privileges to attack mlocate data files is smaller.
Maybe the LSB would care to publish a standard within the next 10 years?
It seems most successful standards codify existing practice, so there must be some practical applications first.
Anyway, I'll make sure the mlocate database format does not depend on the specific .mlocate/mlocate.db suffix. Mirek
On Fri, 2007-03-16 at 05:16 +0100, Miloslav Trmac wrote:
Hi, I'm planning to add filesystem-local database support to mlocate. This allows:
- running updatedb on a file server and making the database automatically available to clients without any client-side configuration
- using locate on GFS volumes without running updatedb on each host that has the volume mounted (which slows the volumes down due to lock contention)
[...]
Usage for /home on NFS:
- NFS is automatically excluded by clients, so updatedb on clients does not walk the filesystem.
- On the server: Add /srv/home to /etc/sysconfig/mlocate. If /srv/home is not a separate mount point, add LOCATE_PATH=:/srv/home/.mlocate/mlocate.db to the global environment.
I am deeply concerned about the security implications of this idea. You are basically making it possible for everyone to get access to the complete remote FS layout ???
Can anyone see a problem with the plan, or an important feature that the above fails to address?
Yes, security and privacy wise it is BAD BAAD BAAAD :-)
Simo.
Simo Sorce wrote:
Usage for /home on NFS:
- NFS is automatically excluded by clients, so updatedb on clients does not walk the filesystem.
- On the server: Add /srv/home to /etc/sysconfig/mlocate. If /srv/home is not a separate mount point, add LOCATE_PATH=:/srv/home/.mlocate/mlocate.db to the global environment.
I am deeply concerned about the security implications of this idea. You are basically making it possible for everyone to get access to the complete remote FS layout ???
In the local case, mlocate.db contains the whole directory structure as read by the root user.
Local security is based on unix permissions: the locate.db is not readable to normal users and the locate binary is set-gid locate.
Remote databases exported in NFS shares cannot of course use this trick, because it requires trusting the remote root of all clients.
A solution could be crawling the filesystem as user nobody to avoid disclosing private information, but this would make the shared locate.db completely useless for indexing home directories.
How did Apple solve the problem with Spotlight? Spotlight also stores its database in the root directory of all volumes, including flash pens and remote NFS shares.
On Mon, Mar 19, 2007 at 02:10:41AM -0400, Simo Sorce wrote:
On Fri, 2007-03-16 at 05:16 +0100, Miloslav Trmac wrote:
Hi, I'm planning to add filesystem-local database support to mlocate. This allows:
- running updatedb on a file server and making the database automatically available to clients without any client-side configuration
- using locate on GFS volumes without running updatedb on each host that has the volume mounted (which slows the volumes down due to lock contention)
[...]
Usage for /home on NFS:
- NFS is automatically excluded by clients, so updatedb on clients does not walk the filesystem.
- On the server: Add /srv/home to /etc/sysconfig/mlocate. If /srv/home is not a separate mount point, add LOCATE_PATH=:/srv/home/.mlocate/mlocate.db to the global environment.
I am deeply concerned about the security implications of this idea. You are basically making it possible for everyone to get access to the complete remote FS layout ???
The remote mlocate.db can be exported as owned by root with 0600, and depending on root_squash or other factors the database will be remotely readable or not.
Or put differently: if the remote server allows root mounts, then reading the mlocate.db will only be possible if the remote client can also traverse the real paths anyway (due to unsquashed root privileges), so you're not giving away more security-sensitive information than what's already accessible.
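Concretely, something like this on the server (the export host and options are just an example):

  # keep the database root-owned and private
  chown root:root /srv/home/.mlocate/mlocate.db
  chmod 0600 /srv/home/.mlocate/mlocate.db

  # in /etc/exports, root_squash (the default) maps remote root to nobody,
  # which then cannot read the 0600 database at all:
  #   /srv/home  client.example.com(ro,root_squash)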
Can anyone see a problem with the plan, or an important feature that the above fails to address?
Yes, security and privacy wise it is BAD BAAD BAAAD :-)
It would need to elevate /usr/bin/locate from an sgid to an suid program. That's a risk that needs to be weighed, but other than that I don't see any further issues. Or is there something still?
Simo Sorce wrote:
- NFS is automatically excluded by clients, so updatedb on clients does not walk the filesystem.
- On the server: Add /srv/home to /etc/sysconfig/mlocate. If /srv/home is not a separate mount point, add LOCATE_PATH=:/srv/home/.mlocate/mlocate.db to the global environment.
I am deeply concerned about the security implications of this idea. You are basically making it possible for everyone to get access to the complete remote FS layout ???
No, only the layout of the specific NFS filesystem that can be mounted from the client. mlocate.db would be readable only by the slocate user, like the current /var/lib/mlocate/mlocate.db.
Therefore, if a client can fake the UID and read the whole mlocate.db, it can fake the UID and traverse the whole NFS filesystem just the same. Mirek
On Tue, 2007-03-20 at 00:12 +0100, Miloslav Trmac wrote:
Simo Sorce wrote:
- NFS is automatically excluded by clients, so updatedb on clients does not walk the filesystem.
- On the server: Add /srv/home to /etc/sysconfig/mlocate. If /srv/home is not a separate mount point, add LOCATE_PATH=:/srv/home/.mlocate/mlocate.db to the global environment.
I am deeply concerned about the security implications of this idea. You are basically making it possible for everyone to get access to the complete remote FS layout ???
No, only the layout of the specific NFS filesystem that can be mounted from the client.
This is what I am talking about, the whole remote (exported) FS.
mlocate.db would be readable only by the slocate user, like the current /var/lib/mlocate/mlocate.db.
You are thinking in 1990 terms (NFSv3); how do you authenticate the slocate user on NFSv4 and CIFS?
Therefore, if a client can fake the UID and read the whole mlocate.db, it can fake the UID and traverse the whole NFS filesystem just the same.
You are thinking in 1990 terms (NFSv3); in 2007 we have CIFS and NFSv4, which authenticate per user, and such a file would be a considerable security breach on these file systems.
Simo.
On Tue, Mar 20, 2007 at 08:28:51AM -0400, Simo Sorce wrote:
On Tue, 2007-03-20 at 00:12 +0100, Miloslav Trmac wrote:
Simo Sorce wrote:
- NFS is automatically excluded by clients, so updatedb on clients does not walk the filesystem.
- On the server: Add /srv/home to /etc/sysconfig/mlocate. If /srv/home is not a separate mount point, add LOCATE_PATH=:/srv/home/.mlocate/mlocate.db to the global environment.
I am deeply concerned about the security implications of this idea. You are basically making it possible for everyone to get access to the complete remote FS layout ???
No, only the layout of the specific NFS filesystem that can be mounted from the client.
This is what I am talking about, the whole remote (exported) FS.
mlocate.db would be readable only by the slocate user, like the current /var/lib/mlocate/mlocate.db.
You are thinking in 1990 terms (NFSv3); how do you authenticate the slocate user on NFSv4 and CIFS?
Therefore, if a client can fake the UID and read the whole mlocate.db, it can fake the UID and traverse the whole NFS filesystem just the same.
You are thinking in 1990 terms (NFSv3); in 2007 we have CIFS and NFSv4, which authenticate per user, and such a file would be a considerable security breach on these file systems.
Why, on these filesystems (assuming the proper authentication mechanism is in place) you cannot fake the uid anymore, so you have even fewer access rights to any root-owned file. It's the older, fakable protocols that need attention.