On 09/07/2012 08:21 AM, M. Mohan Kumar wrote:
On Thu, 6 Sep 2012 18:59:19 -0400 (EDT), Ayal Baron
<abaron(a)redhat.com> wrote:
>
>
> ----- Original Message -----
>> ----- Original Message -----
>>> From: "M. Mohan Kumar" <mohan(a)in.ibm.com>
>>> To: vdsm-devel(a)lists.fedorahosted.org
>>> Sent: Wednesday, July 25, 2012 1:26:15 PM
>>> Subject: [vdsm] [RFC] GlusterFS domain specific changes
>>>
>>>
>>> We are developing a GlusterFS server translator to export block devices
>>> as regular files to the client. Using block devices to serve VM images
>>> gives performance improvements, since it avoids some file system
>>> bottlenecks in the host kernel. The goal is to use one block device
>>> (i.e. a file on the client side) per VM image and feed this file to QEMU
>>> to get the performance improvements. QEMU will talk to the GlusterFS
>>> server directly using libgfapi.
>>>
>>> Currently we support only exporting Volume groups and Logical
>>> Volumes. Logical volumes are exported as regular files to the
>>> client.
>
> Are you actually using LVM behind the scenes?
> If so, why bother with exposing the LVs as files and not raw block devices?
>
Ayal,
The idea is to provide a file system interface for managing block devices.
One can mount the Block Device Gluster volume, then create an LV and size
it with just:
$ touch lv1
$ truncate -s5G lv1
Other file commands can be used to clone and snapshot LVs:
$ ln lv1 lv2 # clones
$ ln -s lv1 lv1.sn # creates snapshot
With this feature enabled, GlusterFS can directly export storage in a
SAN. We are also planning to add a feature to export LUNs as regular files
in the future.
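
For anyone who wants to drive the same steps from a script rather than a
shell, here is a minimal sketch using only the Python standard library. The
function name provision_lv_files and its parameters are made up for
illustration; they are not part of GlusterFS or VDSM. On an ordinary file
system these calls only produce a sparse file, a hard link and a symlink;
the LV, clone and snapshot semantics would apply only on a mount backed by
the BD xlator, as described above.

```python
import os
import tempfile


def provision_lv_files(mount_dir, lv_name, clone_name, size_bytes):
    """Replay the touch / truncate / ln / ln -s steps with plain file calls.

    Hypothetical helper for illustration only. The caller supplies the
    mount directory; nothing here talks to GlusterFS or LVM directly.
    """
    base = os.path.join(mount_dir, lv_name)
    clone = os.path.join(mount_dir, clone_name)
    snap = os.path.join(mount_dir, lv_name + ".sn")

    # touch lv1: create the file if it does not exist yet
    with open(base, "a"):
        pass

    # truncate -s<size> lv1: set the file size
    os.truncate(base, size_bytes)

    # ln lv1 lv2: hard link (described above as a clone on a BD xlator mount)
    os.link(base, clone)

    # ln -s lv1 lv1.sn: relative symlink (described above as a snapshot)
    os.symlink(lv_name, snap)

    return base, clone, snap
```

Calling provision_lv_files(mount_dir, "lv1", "lv2", 5 * 1024**3) replays
the four commands shown above against whatever directory is passed in.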
>>>
>>> In GlusterFS terminology, a volume capable of exporting block devices is
>>> created by specifying the 'Volume Group' (i.e. the VG in Logical Volume
>>> Management). The Block Device translator (BD xlator) exports this volume
>>> group as a directory and the LVs under it as regular files. In the
>>> gluster mount point, creating a file results in creating a logical
>>> volume, removing a file results in removing a logical volume, etc.
>>>
>>> When a GlusterFS volume enabled with the BD xlator is used, directory
>>> creation in that gluster mount path is not supported, because directories
>>> map to volume groups in the BD xlator. This could be an issue in a VDSM
>>> environment: when a new VDSM volume is created for a GlusterFS domain,
>>> VDSM mounts the storage domain, creates directories under it, and
>>> creates files for the VM image and other uses (such as metadata).
>>
>>> Is it possible to modify this behavior in VDSM to use a flat structure
>>> instead of creating directories with VM images and other files
>>> underneath them? That is, for a GlusterFS domain with the BD xlator, VDSM
>>> would not create any directories and would create all required files
>>> directly under the mount point directory.
>>
>> From your description I think that the GlusterFS for block devices is
>> actually more similar to what happens with the regular block domains.
>> You would probably need to mount the share somewhere in the system and
>> then use symlinks to point to the volumes.
>>
>> Create a regular block domain and look inside
>> /rhev/data-center/mnt/blockSD,
>> you'll probably get the idea of what I mean.
>>
>> That said we'd need to come up with a way of extending the LVs on the
>> gluster server when required (for thin provisioning).
>
> Why? If it's exposed as a file, that probably means it supports sparseness.
> I.e. if this becomes a new type of block domain, it should only support
> 'preallocated' images.
>
To start using the LVs, we will always truncate to the required size, which
resizes the LV. I didn't understand your point about thin provisioning, but
I have some rough code using dm-thin targets showing that the BD xlator can
be extended to use dm-thin targets for thin provisioning.
So even though this is block storage, it will be extended as needed? How
does that work exactly?
Say I have a VM with a 100GB disk.
Thin provisioning means we allocate only 1GB to it at first; then, as the
guest uses that storage, we allocate more as needed (lvextend, pause guest,
lvrefresh, resume guest).
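
To make the question concrete, here is a rough sketch of the extension
decision I have in mind. It is a simulation only: nothing is executed, and
the step names are just the ones listed above. The function name
extend_if_needed, the 1 GiB chunk and the 512 MiB free-space threshold are
assumptions chosen for illustration, not values taken from VDSM.

```python
GIB = 1024 ** 3


def extend_if_needed(allocated, used, virtual_size,
                     chunk=GIB, min_free=GIB // 2):
    """Simulate growing a thinly provisioned volume as the guest fills it.

    Hypothetical helper: chunk and min_free are illustrative defaults.
    Returns the new allocated size and the list of steps that would run.
    """
    steps = []
    # Extend while free space is below the threshold, but never allocate
    # more than the disk's virtual size.
    while allocated < virtual_size and allocated - used < min_free:
        allocated = min(allocated + chunk, virtual_size)
        # Step names as listed in this message; they are not executed here.
        steps += ["lvextend", "pause guest", "lvrefresh", "resume guest"]
    return allocated, steps
```

With a 100GB virtual disk that starts at 1GB allocated, the function
returns the allocation unchanged until the guest has used enough that the
free space drops below the threshold, then grows it one chunk at a time up
to the virtual size.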