On Fri, 7 Sep 2012 17:07:28 -0400 (EDT), Ayal Baron <abaron(a)redhat.com> wrote:
----- Original Message -----
> As of now BD xlator supports only working with linear Logical
> volumes,
> they are thick provisioned. gluster cli command "gluster volume
> create"
> with option "device=lv" allows to work with logical volumes as files.
>
> As a POC I have a code(not posted to external list), with option
> "device=thin" to gluster volume create command it allows to work with
> thin provisioned targets. But it does not take care of resizing
> thin-pool when it reaches low-level threshold. Supporting thin
> targets
> is in our TODO list. We have dependency on lvm2 library to provide
> apis
> to create thin-targets.
I'm definitely missing some background here.
1. Can the LV span multiple bricks in Gluster?
i. If 'yes' then
a. do you use Gluster's replication and distribution schemes to gain performance
and redundancy?
b. what performance gain is there over normal Gluster with files?
ii. If 'no' then you're only exposing single-host local storage LVM? (in
which case I don't see why Gluster is used at all and where).
No, as of now BD Xlator works only with one brick. There are some issues
in supporting GlusterFS features such as replication and stripe from BD
xlator. We are still evaluating BD xlator for such scenarios.
Advantages of BD Xlator:
(*) Ease of use and unified management for both file and block based
storage.
(*) Making block devices available to nodes which don't have direct
access to SAN. Supporting migration to nodes which don't have SAN
access.
(*) With FS interfaces, it becomes easier to support T10 extensions
like xcopy, writesame (Currently not supported, future plan)
(*) Use of dm-thin logical volumes to provide VM images that are
inherently thin provisioned. It allows multi-level snapshots. When we
support thin-provisioned logical volumes with 'unmap' support, it's
almost equivalent to sparse files. This is also a future plan.
From a different angle, the only benefit I can think of in exposing a
fs interface over LVM is for consumers who do not wish to know the details of the
underlying storage but want the performance gain of using block storage.
vdsm is already intimately familiar with LVM and block devices, so adding the FS layer
scheme on top doesn't strike me as adding any value. In addition, you require the
consumer to know a lot about your interface because it's not truly a FS interface.
E.g. the consumer is not allowed to create directories, files are not sparse, not to mention
that if you're indeed using LVM then I don't think you're considering the VG
MD and extent size limitations:
1. LVM currently has severe limitations wrt number of objects it can manage (the
limitation is actually the size of the VG metadata, but the distinction is not important
just yet). This means that creating a metadata LV in addition to each data LV is very
costly (at around 1000 LVs you'd hit a problem). vdsm currently creates 2 files per
snapshot (the data and a small file with metadata describing it), meaning that you'd
reach this limit really fast.
2. LVM max LV size is extent size * 65K. This means that if I choose a 4K extent size
then my max LV size would be 256MB. This obviously won't do for VM disks, so you'd
choose a much larger extent size. However, a larger extent size means that each metadata
file vdsm creates wastes a lot of storage space. So even if LVM could scale, your storage
utilization plummets and your $/MB ratio increases.
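To put rough numbers on the two points above, here is a back-of-the-envelope sketch. It takes the figures quoted in this mail at face value (65K extents per LV, trouble at around 1000 LVs, 2 LVs per snapshot) and does not check them against any particular LVM version. The 100 GiB disk and the 4 KiB metadata file are illustrative values only, and the helper names are made up for this example.

```python
# Rough arithmetic for the two LVM limits discussed above.
# All constants are taken from the mail or are illustrative assumptions;
# none of them are read from a real LVM installation.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

MAX_EXTENTS_PER_LV = 64 * 1024   # "extent size * 65K" from the mail
MAX_LVS_PER_VG = 1000            # "at around 1000 LVs you'd hit a problem"
LVS_PER_SNAPSHOT = 2             # one data LV plus one metadata LV


def max_lv_size(extent_size):
    """Largest LV (in bytes) under the assumed extent-count limit."""
    return extent_size * MAX_EXTENTS_PER_LV


def min_extent_size(lv_size):
    """Smallest power-of-two extent size (>= 4 KiB) that can hold lv_size bytes."""
    extent = 4 * KIB
    while max_lv_size(extent) < lv_size:
        extent *= 2
    return extent


def metadata_waste(extent_size, metadata_bytes):
    """Bytes left unused when a small metadata file gets an LV of its own."""
    extents = -(-metadata_bytes // extent_size)  # ceiling division
    return extents * extent_size - metadata_bytes


if __name__ == "__main__":
    print(max_lv_size(4 * KIB) // MIB)            # 256 (MiB, as in the mail)
    print(MAX_LVS_PER_VG // LVS_PER_SNAPSHOT)     # 500 snapshots before the limit
    extent = min_extent_size(100 * GIB)           # hypothetical 100 GiB VM disk
    print(extent // MIB)                          # 2 (MiB extents needed)
    print(metadata_waste(extent, 4 * KIB))        # bytes lost per metadata LV
```

With these assumptions, every small metadata LV occupies a full extent, so the waste per snapshot grows in step with the extent size needed for the largest disk.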
The way around this is of course not to have a metadata file per volume but have 1 file
containing all the metadata, but then that means I'm fully aware of the limitations of
the environment and treating my objects as files gains me nothing (but does require a new
hybrid domain, a lot more code etc).
GlusterFS + BD xlator domain will be similar to block based storage
domain. IIUC in block based storage, VDSM will not create as many
LVs (files) as it does in POSIX based storage.
BD xlator provides a filesystem kind of interface to create/manipulate LVs,
while in block based storage domain commands like lvcreate and lvextend
are used to manipulate them, i.e. BD xlator provides an FS interface
for block based storage domain.
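As a rough illustration of what "FS interface" means for a consumer, the sketch below uses only ordinary POSIX file calls. It assumes, without this mail confirming the exact mapping, that creating a file on a BD xlator mount creates the backing LV and that truncating the file resizes it. The function names and the mount path in the comment are hypothetical.

```python
import os


def create_volume(path, size_bytes):
    """Create a new file of the requested size.

    Assumption: on a BD xlator mount this would create and size the
    backing LV, in place of running lvcreate by hand.
    """
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    try:
        os.ftruncate(fd, size_bytes)
    finally:
        os.close(fd)


def resize_volume(path, new_size_bytes):
    """Change the file size.

    Assumption: on a BD xlator mount this would resize the backing LV,
    in place of running lvextend by hand.
    """
    os.truncate(path, new_size_bytes)


# Hypothetical usage on a mounted volume:
#   create_volume("/mnt/bd-vol/vm-disk-1", 10 * 1024**3)
#   resize_volume("/mnt/bd-vol/vm-disk-1", 20 * 1024**3)
```

On a regular filesystem the same calls simply produce a sparse file, which is the behaviour the thin-provisioned targets are meant to approach.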
In future, when we have proper support for reflink[1], cp --reflink can be
used for creating linked clones. Also there was a discussion in the past
on a copyfile[2] interface which could be used to create full clones of LVs.
[1]
http://marc.info/?l=linux-fsdevel&m=125296717319013&w=2
[2]
http://www.spinics.net/lists/linux-nfs/msg26203.html
Also note that without thin provisioning we lose our ability to
create snapshots.
Could you please explain it?