glusterfs

Kevin Fenzi kevin at scrye.com
Wed Feb 1 19:32:20 UTC 2012


Greetings. 

I've been playing around with glusterfs the last few days, and I
thought I would send out a note about what I had found and ideas for
how we could use it. ;) 

glusterfs is a distributed filesystem. It's actually very easy to
set up and manage, which is nice. ;)

You can set up a local one-node gluster volume in just a few commands:

yum install glusterfs\*
service glusterd start
gluster volume create testvolume yourhostname:/testbrick
gluster volume start testvolume
mkdir /mnt/testvolume
mount -t glusterfs yourhostname:/testvolume /mnt/testvolume

Setting up multiple nodes/peers/bricks is pretty easy. 

Setting up data distribution and replication is pretty easy too.
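
For example, a 2-node replicated volume would look something like this
(with glusterd running on both nodes; the hostnames and brick paths here
are just made up):

gluster peer probe node2.example.com
gluster volume create testrep replica 2 node1.example.com:/testbrick node2.example.com:/testbrick
gluster volume start testrep
mkdir /mnt/testrep
mount -t glusterfs node1.example.com:/testrep /mnt/testrep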

The replication seems to work pretty transparently. I set up a 2-node
volume, kill -9'ed the gluster processes on one node, and the other kept
on trucking just fine; it resynced fine after I restarted it.
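
If I'm reading the 3.2 docs right, you can also force a full resync
after a node comes back by stat'ing everything through the mount,
something like:

find /mnt/testvolume -print0 | xargs --null stat > /dev/null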

There's also an NFS mount ability, although it's not that good IMHO,
since whatever hostname you specify in the mount becomes a single point
of failure. It could, however, be a handy fallback.
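
If you want to try it, the NFS export is plain NFSv3 over TCP, so the
mount would be something like (reusing the names from above):

mount -t nfs -o vers=3,mountproto=tcp yourhostname:/testvolume /mnt/testnfs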

Warts: 

- iptables rules to allow the nodes to talk to each other are a bit
  annoying (example rules after this list). See:
  http://europe.gluster.org/community/documentation/index.php/Gluster_3.2:_Installing_GlusterFS_from_Source

- There is a geo-replication feature to allow you to replicate over a
  WAN link to a slave gluster instance or directory. However, the slave
  can't be a live instance, it's just for disaster recovery, and
  currently it requires a root ssh login with a passwordless key. Pass.

- df is a bit useless, as mounts show the space on the backing
  filesystem that you created the brick on. Unless we set up a dedicated
  mount for each volume to use, it won't really reflect the space
  available. On the other hand, du should work fine.
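
For the iptables wart above, the rules end up being roughly like the
following (ports per the doc linked above; the source subnet is made up,
the brick port range needs to grow with the number of bricks, and the
last two lines only matter if you use the NFS mounts):

iptables -A INPUT -p tcp -s 192.168.1.0/24 --dport 24007:24008 -j ACCEPT
iptables -A INPUT -p tcp -s 192.168.1.0/24 --dport 24009:24109 -j ACCEPT
iptables -A INPUT -p tcp -s 192.168.1.0/24 --dport 111 -j ACCEPT
iptables -A INPUT -p tcp -s 192.168.1.0/24 --dport 38465:38467 -j ACCEPT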

Possible uses for us: 

We could look at using this for shared virt storage, which would let
us move things around more easily. However, there are a number of
problems with that: we would have to run it on the bare virthosts, and
we would have to switch (as far as I can tell) to filesystem .img files
for the virt images, which may not be as nice as lvm volumes. Also, we
haven't really moved things around much in the past, and libvirt allows
for migrations anyhow. So, I don't think this usage is much of a win.

So, looking at sharing application-level data, I would think we would
want to set up a virt on each of our virthosts (called 'glusterN' or
something). Then we could make volumes and share them out to the
applications that need them, with whatever replication/distribution
that data requires.
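
Assuming guests named gluster01..gluster04 or so (names made up),
carving out a replicated volume for some application's data would look
roughly like:

gluster peer probe gluster02
gluster volume create appdata replica 2 gluster01:/bricks/appdata gluster02:/bricks/appdata
gluster volume start appdata

and then on whatever hosts need it:

mount -t glusterfs gluster01:/appdata /srv/appdata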

What things could we put on this? 

- The tracker xapian db?

- How about other databases? I'm not sure how some db's would handle
  it, but it would let us stop the db on one host and bring it up on
  another with virtually no outage (rough sketch after this list). If
  each database is its own volume, we can move each one around pretty
  easily. (ie, have 4 db servers, move all db's to one, reboot/update
  the other 3, move them back, reboot the last, etc.)

- Web/static content? Right now we rsync that to all the proxies every
  hour. If we had a gluster volume for it, we could just build and
  rsync to the gluster instance. Or, if the build doesn't do anything
  wacky, just build on there directly.

- Hosted and Collab data? This would be better than the current drbd
  setup, as we could have two instances actively using the mount/data at
  the same time. We would need to figure out how to distribute requests
  though.

- Moving forward to later this year/next year, if we get hold of a
  bunch of storage, how about /mnt/koji? We would need at least 2 nodes
  that have enough space, but that would get us good replication and
  the ability to survive a machine crash much more easily.

- Insert your crazy idea here. What do we have that could be made
  better by replication/distribution in this way? Is there enough here
  to make it worth deploying? 
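
As a very rough sketch of the database move idea above (host names, db,
and paths all made up):

on db01 (stop the db and let go of the volume):
  service postgresql stop
  umount /var/lib/pgsql

on db02 (mount the same gluster volume and bring the db up):
  mount -t glusterfs gluster01:/db-app /var/lib/pgsql
  service postgresql start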

Thoughts? 

kevin