Re: Fwd: Encryption
by Edward Shishkin
On 02/24/2011 06:05 PM, Jeff Darcy wrote:
>
> There are three basic issues that need to be addressed in the encryption
> module: type of cipher used, initialization-vector handling, and
> conflict management. Each is non-trivial, so I'll address them in turn.
>
> = Cipher
>
> The main factor affecting our choice of ciphers (or APIs to them) is
> that we need to be able to deal efficiently with updates both in the
> middle of the file and at the end. At EOF, the problem is that we need
> a whole cipher-block in order to decrypt, but the file might actually
> end at any byte boundary within that cipher-block. Therefore, we have
> to deal with the "residue" somehow.
>
> * Store the residue in an xattr.
>
> * Store a whole cipher-block at the end, record the amount of padding in
> an xattr.
>
> * Use a stream cipher (or block cipher converted to a stream cipher).
Nup. A "block cipher converted to a stream cipher" doesn't resolve the EOF
problem. Moreover, this phrasing is misleading, so I suggest we adhere to
the following terminology.
There are block ciphers and stream ciphers.
. A stream cipher translates 1 byte of plain text to 1 byte of cipher
text. Example of a stream cipher: RC4.
. A block cipher translates one block of plain text to one block of
cipher text (the block size, e.g. 128 bits for AES, is independent of the
key size). Examples of block ciphers: DES, AES.
A block cipher can be run in a chaining ("stream") mode, which requires an
initialization vector (IV). Regardless of mode, a block cipher produces
output whose size is a multiple of the block size.
So stream ciphers don't have the EOF problem, while block ciphers do,
regardless of any stream mode.
Block ciphers in non-chaining modes (e.g. ECB) are weak against attacks,
and we won't consider them.
So from here on let's speak simply of "block ciphers" and "stream ciphers".
>
> This problem is further compounded by the striping case, where EOF for a
> stripe component (local file stored on one brick) might not be EOF for
> the entire file (union of all stripe components).
>
> Since the two xattr-based approaches both require extra calls, the
> stream-cipher approach has been used, with the cipher resetting at block
> (e.g. 4KB) boundaries to allow efficient middle-of-file updates. As it
> turns out, pure stream ciphers are relatively uncommon.
Stream ciphers are an EU standard, whereas block ciphers are a US standard.
In short, stream ciphers are no worse than block ciphers in terms of speed
and strength, but they require more care. I know OpenSSL supports a stream
cipher algorithm (RC4), but I can neither recommend nor reject it for now
(I haven't had a chance to survey the existing stream algorithms).
> More often,
> CFB/OFB/CTR methods are used to convert a block cipher into a stream
> cipher. The OpenSSL documentation is *amazingly* bad, but it looks like
> it should be pretty easy to use any of these techniques with AES as well
> as with DES.
>
> = Initialization vector
>
> Right now, the code uses a constant IV, which is totally unacceptable
> from a security standpoint and was always meant to be changed before
> release. The question is: what should we use for an IV? GlusterFS does
> attach a supposedly unique "gfid" as an xattr on each file, so that
> might be usable as a basis for the IV so long as we can verify that it's
> universal and stable enough to be sure that data won't become
> unrecoverable because a gfid is missing or changed.
There is an ESSIV technique for assigning IVs (used in the Linux dm-crypt
subsystem):
IV(sector) = E_s(sector), where s = hash(K);
see http://en.wikipedia.org/wiki/Disk_encryption_theory for details.
I think we can use it with the following modification: instead of
"sector" we should take (N + gfid), where N is the number of the logical
cluster within the file (*). This ensures that files with identical
plain content will have different cipher text.
(*) Logical clusters are chunks of a file that are ciphered independently,
for good random access. I suggest considering only 4K clusters.
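A minimal sketch of this modified ESSIV scheme, with SHA-256 standing in for the block-cipher encryption E_s (a real implementation would use an actual cipher, and the 16-byte IV width is an assumption):

```python
import hashlib
import uuid

def essiv_iv(key, gfid_int, cluster_index):
    """IV for cluster N, in the spirit of ESSIV: IV = E_s(N + gfid),
    where s = hash(K).  SHA-256 stands in for the encryption E_s here."""
    s = hashlib.sha256(key).digest()                 # s = hash(K)
    n = (cluster_index + gfid_int) % (1 << 128)      # (N + gfid)
    return hashlib.sha256(s + n.to_bytes(16, "big")).digest()[:16]

key = b"tenant key material"
gfid = uuid.uuid4().int          # GlusterFS gfids are UUIDs

# Each cluster of a file gets a distinct IV, and a different key
# (or different file) gives different IVs for the same cluster.
assert essiv_iv(key, gfid, 0) != essiv_iv(key, gfid, 1)
assert essiv_iv(key, gfid, 0) != essiv_iv(b"other key", gfid, 0)
```

One caveat with the additive form: cluster N of gfid g and cluster N-1 of gfid g+1 would share an IV, so concatenating N and the gfid instead of adding them might be preferable.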
Edward.
>
> = Conflict management
>
> For partial-block writes, the encryption module needs to do the
> following atomically.
>
> * Read the current block contents.
>
> * Decrypt.
>
> * Overlay the new partial block on the old whole block.
>
> * Encrypt.
>
> * Write the entire block.
>
> There's some additional complexity to do with EOF, but that's the basic
> idea. The current code eschews locks in favor of "optimistic"
> concurrency control in which a server-side "oplock" translator maintains
> a generation number for each inode. Clients can start a "transaction"
> before they read, associating the current inode generation with their
> connection. The next write on that connection will compare the stored
> generation number vs. the current one. If they're not the same, that
> means there was another write since the transaction started, and the
> write is rejected so the client can start over. Unfortunately, this
> does not account for "self conflicts" when one client sends multiple
> writes to the same file in parallel. The standard
> performance/write-behind translator does this constantly, which is why
> it has to be disabled when using cloudfs encryption, and there are many
> other ways for it to happen.
>
> My first inclination would be to add client code which detects and
> avoids such self-conflict, but I have a sneaking suspicion that will be
> pretty complex and have to be tweaked a lot to avoid compromising
> performance. I kind of suspect that server-side queuing might be the
> right answer here. If a transaction is begun which conflicts with
> another already in progress, then the new one is simply queued behind
> the old one and the transaction-begin call (actually a special setxattr)
> will be resumed when the old ones complete. This also addresses
> fairness/forward-progress issues inherent in both the locking and retry
> models, though we'll need to put some thought into recovery from faults.
> _______________________________________________
> cloudfs-devel mailing list
> cloudfs-devel(a)lists.fedorahosted.org
> https://fedorahosted.org/mailman/listinfo/cloudfs-devel
Authentication and transport encryption
by Jeff Darcy
The thing we need in CloudFS is authentication at the tenant level, so
that we can map a tenant's connection to the appropriate namespace and
UID/GID space. For all practical purposes, this is the same as
machine-level authentication. Trying to authenticate at a finer
granularity, e.g. for separate users on a single machine or set of
machines, is too cumbersome. Tenants will not accept having to register
each individual with a site-wide service when no such need exists on e.g.
Amazon or Rackspace, and integrating that service alongside whatever
auth* service they're already using themselves is likely to be painful.
We give each tenant a sandbox, and they can do whatever the heck they
want within it, just like they can with compute instances and so on already.
So, let's say that we have some way of providing strong authentication
between the login/cloud translators. At the end of the authentication
handshake we know, to a cryptographic certainty, that connection X
belongs to tenant Y. After that, though, all bets are off. Without
transport-level encryption, or at least per-message authentication via
HMAC or similar, we're vulnerable to session hijacking and other
sorts of tampering with subsequent requests. Here's where we run into a
problem: there's no reasonable way to make GlusterFS (knowingly) use a
connection with these sorts of encryption/authentication properties.
We'd have to change a lot of code in the existing 3500-line socket
transport module, and do a very careful sweep through the management
code to make sure that our new transport can be configured etc.
This pretty much means that we need to use some external means - VPN,
sshuttle, cloud-provided firewalls, manually-set-up ssh tunnels - to
ensure security at the transport level. If we do that, we really
shouldn't need our own separate authentication. We should be able to
re-use whatever authenticated identity was used to establish the
connection in the first place, but I don't happen to know of any
generically-good way to do that e.g. with a VPN. Also, such an approach
might run into the same "requiring integration with a site-wide service
would be bad" problem as with the accounts themselves. That leads us
back to requiring our own authentication even though we're relying on
something else at the transport level, and we have two choices:
* Implement our own authentication using filesystem calls as the
communications channel. This is what we currently do, using setxattr.
It could be modified so that clients do something like read a block of
data from a pseudo-file and then use a private key to generate an HMAC
which is actually used to log in. This could be provably as secure as
other methods, but would be non-standard.
* Use somebody else's authentication. For example, we could use the
OpenSSL suite to generate client certificates signed by the provider.
The server could then open a TLS port for the client to connect to,
verify the cert's validity (using standard mechanisms and code) when the
client does so, and then map the identity in the cert to an internal
tenant ID. This might be more appealing to the "not safe unless it's on
our list" types, but it's a bit unpleasant in other ways (e.g. opening a
second hole in a firewall for the very brief TLS connection).
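The first, setxattr-based option above could look something like the following sketch, using only a shared per-tenant secret (the pseudo-file and call names are placeholders, not an actual interface):

```python
import hashlib
import hmac
import os

# Server side: per-tenant secrets, distributed out of band.
tenant_keys = {"fred": b"fred-secret-key"}

def server_issue_challenge():
    # What the client would read from the pseudo-file.
    return os.urandom(32)

def client_login_token(secret, challenge):
    # Client proves knowledge of its key without ever sending it.
    return hmac.new(secret, challenge, hashlib.sha256).hexdigest()

def server_verify(tenant, challenge, token):
    key = tenant_keys.get(tenant)
    if key is None:
        return False
    expected = hmac.new(key, challenge, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)   # constant-time compare

challenge = server_issue_challenge()
token = client_login_token(b"fred-secret-key", challenge)
assert server_verify("fred", challenge, token)
assert not server_verify("fred", challenge, "0" * 64)
```

With a fresh random challenge per login, this is a standard challenge-response and resists replay, which is the sense in which it could be "provably as secure as other methods" despite being non-standard.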
I'm mostly leaning toward doing our own authentication and leaving the
transport-layer encryption as "somebody else's problem" (with
appropriate documentation) but I'm open to other suggestions. Any ideas?
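For completeness, the server-side mapping step of the TLS alternative is small; a sketch using the dict form that Python's ssl.SSLSocket.getpeercert() returns for a verified peer (the cert-to-tenant table itself is an assumption):

```python
def tenant_from_cert(peercert, cert_map):
    """Map the identity in a verified client cert to an internal tenant ID.
    peercert has the structure returned by ssl.SSLSocket.getpeercert()."""
    for rdn in peercert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return cert_map.get(value)
    return None        # no usable identity in the cert

cert = {"subject": ((("commonName", "fred.example.com"),),)}
assert tenant_from_cert(cert, {"fred.example.com": "tenant-42"}) == "tenant-42"
```

The hard parts (cert issuance, verification, and the extra firewall hole) all happen before this step; the mapping itself is trivial.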
Configuration/management improvements
by Jeff Darcy
As I've discussed with many people, the current methods of setting up
and maintaining CloudFS on top of GlusterFS are cumbersome and
error-prone. Mostly this is the result of trying too hard to re-use too
much of the GlusterFS infrastructure for tasks such as distributing
volfiles and starting servers. Here are some basic requirements for
what a "next gen" management interface for CloudFS should be able to do:
* "Import" a GlusterFS volume, creating an equivalent set of server and
(generic) client volfiles which include the CloudFS translators.
* Handle addition and removal of tenants by updating our own database
and generating tenant-specific client volfiles from the generic ones.
When we implement stronger authentication, this might also involve some
key distribution.
* Ensure that all CloudFS volfiles remain in sync across servers and
clients, including regeneration when either the CloudFS volfiles or the
underlying GlusterFS volfiles are changed.
* Start/stop server daemons using the CloudFS volfiles.
* Mount/unmount on clients using the CloudFS volfiles.
The way I propose to handle these requirements is to create "cloudfsd"
which largely corresponds to glusterd (the management daemon, not
glusterfsd which is the server daemon). This would be responsible for
distributing volfiles among servers and starting/stopping server daemons
using those volfiles. We should rely on the existing glusterd to deal
with cluster ("pool") membership issues, instead of inventing our own
infrastructure for that. Similarly, we should use glusterd's
port-mapping infrastructure instead of our own. When we start servers,
we point them to glusterd for registration. When we mount on clients,
they'll get the information they need from there.
Importing volumes, or adding/removing tenants, should still be done by
the cloudfs script/executable, which can poke the local cloudfsd as
appropriate to handle distribution etc. Note that we don't actually
need to deal with starting up new translators in live server processes
and so on initially. For now it's sufficient to deal with the
volfile-distribution issues, and possibly restart server daemons
entirely. In fact, we might want to stick with that approach for quite
a while. Actually inserting and removing translators in a running
server process is tricky and probably not well tested yet; starting a
new process to serve new tenants would work very nearly as well without
those problems. The options for the CloudFS CLI should be a proper
superset of those for the GlusterFS CLI. If we can parse the command
and recognize it as one of our own, then we can handle it internally.
Otherwise, we should pass the command verbatim to "glusterfs" and then
take any necessary actions (e.g. regenerating our volfiles) when that
returns.
The last part is client mounts. Currently, mount.glusterfs will contact
a server to fetch its volfile - the same for anyone - and then use that
to mount. We can sort of do that, but we have to deal with issues of
having volfiles be tenant-specific and carry authentication information
*which should never be on the server(s)*. One way to do this would be
to have mount.cloudfs fetch a generic CloudFS client volfile (still not
the same as the original GlusterFS client volfile) from the server, then
post-process locally to add tenant identity and credentials before
passing the result to glusterfs for actual mounting. After that,
mapping ports and making connections can be handled by the existing
GlusterFS methods.
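The post-processing step could be as small as the sketch below. The "cluster/login" translator type comes from the code layout described earlier; the tenant-id/tenant-password option names are assumptions:

```python
def add_tenant_creds(volfile_text, tenant, password):
    """Insert tenant credentials into each cluster/login section of a
    generic client volfile fetched from the server."""
    out = []
    for line in volfile_text.splitlines():
        out.append(line)
        if line.strip() == "type cluster/login":
            out.append("    option tenant-id %s" % tenant)
            out.append("    option tenant-password %s" % password)
    return "\n".join(out) + "\n"

generic = """volume fubar-login
    type cluster/login
    subvolumes fubar-client-0
end-volume
"""
out = add_tenant_creds(generic, "fred", "s3cret")
assert "option tenant-id fred" in out
```

mount.cloudfs would write the result to a temporary file and hand it to glusterfs for the actual mount, so the credentials never live on the server.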
Encryption
by Jeff Darcy
There are three basic issues that need to be addressed in the encryption
module: type of cipher used, initialization-vector handling, and
conflict management. Each is non-trivial, so I'll address them in turn.
= Cipher
The main factor affecting our choice of ciphers (or APIs to them) is
that we need to be able to deal efficiently with updates both in the
middle of the file and at the end. At EOF, the problem is that we need
a whole cipher-block in order to decrypt, but the file might actually
end at any byte boundary within that cipher-block. Therefore, we have
to deal with the "residue" somehow.
* Store the residue in an xattr.
* Store a whole cipher-block at the end, record the amount of padding in
an xattr.
* Use a stream cipher (or block cipher converted to a stream cipher).
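For comparison, the second option (pad the final cipher-block and record the padding amount) takes only a few lines; the xattr call itself is omitted here:

```python
BLOCK = 16   # cipher-block size, e.g. AES

def pad_final(data):
    """Pad the tail to a whole cipher-block.  Returns (padded, padding);
    the padding count is what would be stored in the xattr."""
    padding = (-len(data)) % BLOCK
    return data + b"\0" * padding, padding

def unpad_final(padded, padding):
    return padded[:len(padded) - padding] if padding else padded

tail = b"19 bytes of residue"            # file ends mid-block
padded, pad = pad_final(tail)
assert len(padded) % BLOCK == 0          # now decryptable as whole blocks
assert unpad_final(padded, pad) == tail
```

The cost is exactly the extra setxattr/getxattr round trips mentioned below.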
This problem is further compounded by the striping case, where EOF for a
stripe component (local file stored on one brick) might not be EOF for
the entire file (union of all stripe components).
Since the two xattr-based approaches both require extra calls, the
stream-cipher approach has been used, with the cipher resetting at block
(e.g. 4KB) boundaries to allow efficient middle-of-file updates. As it
turns out, pure stream ciphers are relatively uncommon. More often,
CFB/OFB/CTR methods are used to convert a block cipher into a stream
cipher. The OpenSSL documentation is *amazingly* bad, but it looks like
it should be pretty easy to use any of these techniques with AES as well
as with DES.
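A toy sketch of this CTR-style conversion with the per-4KB keystream reset, using a SHA-256-based keystream as a stand-in for a real block cipher (the actual module would use AES or DES via OpenSSL):

```python
import hashlib

BLOCK = 16       # stand-in cipher-block size
CLUSTER = 4096   # keystream restarts here, keeping mid-file updates cheap

def _ks_block(key, cluster, ctr):
    # Hypothetical stand-in for encrypting a counter with a block cipher.
    return hashlib.sha256(key + cluster.to_bytes(8, "big") +
                          ctr.to_bytes(8, "big")).digest()[:BLOCK]

def ctr_xor(key, offset, data):
    """Encrypt/decrypt data at absolute file offset.  CTR mode makes the
    block cipher act as a stream cipher, so any byte length works."""
    out = bytearray()
    for i, b in enumerate(data):
        pos = offset + i
        cluster, within = divmod(pos, CLUSTER)
        ctr, byte = divmod(within, BLOCK)
        out.append(b ^ _ks_block(key, cluster, ctr)[byte])
    return bytes(out)

key = b"k" * 32
plain = b"ends at an odd byte"           # 19 bytes, not a block multiple
ct = ctr_xor(key, 0, plain)
assert len(ct) == len(plain)             # no EOF residue problem
assert ctr_xor(key, 0, ct) == plain      # XOR twice restores plaintext
```

Because the keystream depends only on the absolute position, a 4KB cluster in the middle of the file can be re-encrypted without touching its neighbors, which is exactly the efficient-update property needed.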
= Initialization vector
Right now, the code uses a constant IV, which is totally unacceptable
from a security standpoint and was always meant to be changed before
release. The question is: what should we use for an IV? GlusterFS does
attach a supposedly unique "gfid" as an xattr on each file, so that
might be usable as a basis for the IV so long as we can verify that it's
universal and stable enough to be sure that data won't become
unrecoverable because a gfid is missing or changed.
= Conflict management
For partial-block writes, the encryption module needs to do the
following atomically.
* Read the current block contents.
* Decrypt.
* Overlay the new partial block on the old whole block.
* Encrypt.
* Write the entire block.
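The five steps can be sketched as follows, with a toy XOR cipher standing in for the real crypt translator and a dict standing in for the brick:

```python
import hashlib

BLOCK = 4096

class XorCrypt:
    """Toy per-block XOR cipher; a stand-in, not the real crypt code."""
    def __init__(self, key):
        self.key = key
    def _ks(self, block, n):
        out = b""
        i = 0
        while len(out) < n:
            out += hashlib.sha256(self.key + block.to_bytes(8, "big") +
                                  i.to_bytes(8, "big")).digest()
            i += 1
        return out[:n]
    def encrypt(self, block, data):
        return bytes(a ^ b for a, b in zip(data, self._ks(block, len(data))))
    decrypt = encrypt    # XOR is its own inverse

store = {}                                  # block number -> ciphertext
crypt = XorCrypt(b"key")
store[0] = crypt.encrypt(0, b"\0" * BLOCK)

def partial_write(block, off, data):
    # The five steps; the real module must perform them atomically.
    plain = bytearray(crypt.decrypt(block, store[block]))   # read + decrypt
    plain[off:off + len(data)] = data                       # overlay
    store[block] = crypt.encrypt(block, bytes(plain))       # encrypt + write

partial_write(0, 100, b"hello")
assert crypt.decrypt(0, store[0])[100:105] == b"hello"
assert len(store[0]) == BLOCK               # whole block rewritten
```

Without the atomicity, two concurrent partial writes to the same block would each overlay a stale decryption and one update would be lost, which is what the oplock scheme below exists to prevent.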
There's some additional complexity to do with EOF, but that's the basic
idea. The current code eschews locks in favor of "optimistic"
concurrency control in which a server-side "oplock" translator maintains
a generation number for each inode. Clients can start a "transaction"
before they read, associating the current inode generation with their
connection. The next write on that connection will compare the stored
generation number vs. the current one. If they're not the same, that
means there was another write since the transaction started, and the
write is rejected so the client can start over. Unfortunately, this
does not account for "self conflicts" when one client sends multiple
writes to the same file in parallel. The standard
performance/write-behind translator does this constantly, which is why
it has to be disabled when using cloudfs encryption, and there are many
other ways for it to happen.
My first inclination would be to add client code which detects and
avoids such self-conflict, but I have a sneaking suspicion that will be
pretty complex and have to be tweaked a lot to avoid compromising
performance. I kind of suspect that server-side queuing might be the
right answer here. If a transaction is begun which conflicts with
another already in progress, then the new one is simply queued behind
the old one and the transaction-begin call (actually a special setxattr)
will be resumed when the old ones complete. This also addresses
fairness/forward-progress issues inherent in both the locking and retry
models, though we'll need to put some thought into recovery from faults.
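The generation-number scheme plus server-side queuing can be sketched in a few lines; the class and method names here are hypothetical, not the oplock translator's actual interface:

```python
import collections

class OplockServer:
    """Per-inode generation numbers, with conflicting transaction-begins
    queued behind the one in progress instead of being rejected."""
    def __init__(self):
        self.generation = collections.defaultdict(int)
        self.active = {}      # inode -> generation held by current txn
        self.waiters = collections.defaultdict(collections.deque)

    def begin(self, inode, resume):
        if inode in self.active:
            # Conflict: queue the begin call; it will be resumed when the
            # current transaction completes (fairness for free).
            self.waiters[inode].append(resume)
            return None
        self.active[inode] = self.generation[inode]
        return self.generation[inode]

    def write(self, inode, gen):
        if gen != self.generation[inode]:
            return False                      # someone wrote since begin()
        self.generation[inode] += 1
        del self.active[inode]
        if self.waiters[inode]:
            self.waiters[inode].popleft()()   # resume next queued txn
        return True

log = []
srv = OplockServer()
g = srv.begin("inode1", None)
assert srv.begin("inode1", lambda: log.append("resumed")) is None  # queued
assert srv.write("inode1", g)          # commit wakes the waiter
assert log == ["resumed"]
```

Fault recovery (a client that begins a transaction and then dies without writing) would need a timeout or connection-teardown hook to release the queue, which is the open question noted above.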
4 commits - fedora-ize pkg/cloudfs.spec.in pkg/configure.ac README.md scripts/cloudfs
by Jeff Darcy
README.md | 171 +++++++++++++++++++++++++++-------------------------
fedora-ize | 4 -
pkg/cloudfs.spec.in | 2
pkg/configure.ac | 2
scripts/cloudfs | 16 +++-
5 files changed, 104 insertions(+), 91 deletions(-)
New commits:
commit a0893f212e0b68a1c49a18c9677cbc1370d31070
Author: Jeff Darcy <jdarcy(a)redhat.com>
Date: Tue Feb 22 12:21:19 2011 -0500
More doc updates.
diff --git a/README.md b/README.md
index c570777..2e12ed7 100644
--- a/README.md
+++ b/README.md
@@ -59,8 +59,8 @@ If that disappears, the -2 RPMs for el6 (which should be equivalent) are at:
http://jdarcy.fedorapeople.org/el6_rpms/
-To build CloudFS, you need to install the -devel RPMs. Once you've done that,
-you can go into your git tree and do the following:
+To build CloudFS, you need to install the glusterfs-devel RPMs. Once you've
+done that, you can go into your cloudfs git tree[3] and do the following:
./fedora-ize
rsync -aptv SOURCES/ ~/rpmbuild/SOURCES/
@@ -75,8 +75,10 @@ quick. Install the resulting RPM and you're ready for configuration.
## Configuration ##
-You use the "cloudfs" script to set up CloudFS-specific features. There are
-two main cloudfs commands.
+CloudFS operates on volumes that are modified from those created by GlusterFS,
+as described in the GlusterFS documentation[4]. Once you have created the
+volume in GlusterFS, do *not* start it. Use the "cloudfs" script to set up
+CloudFS-specific features. There are two main cloudfs commands.
* cloudfs init VOLUME USERS
This initalizes the "multi-tenant" (namespace isolation) features of CloudFS,
@@ -94,7 +96,9 @@ two main cloudfs commands.
These commands must be run *on every server*, and re-run any time you use the
"gluster volume set" command to change volume parameters (which will re-write
-the originals that CloudFS has copied and modified).
+the originals that CloudFS has copied and modified). Also, these changes will
+- unlike changes made with the "gluster" command - not take effect until the
+next time the volume is started.
In addition to rewriting volfiles, you must create subdirectories - again on
each server - for each user plus one for the "junk" pseudo-user. Thus, if you
@@ -102,6 +106,13 @@ have a brick belonging to a volume at server1:/exports/glu and you add a user
"fred" you will need to create /exports/glu/fred on server1 yourself . . . and
likewise for every other brick in the volume.
+Finally, you're ready to mount. Since the GlusterFS volfile-fetching
+infrastructure can't handle per-tenant volfiles, you'll have to do this the
+"old fashioned" (i.e. pre-3.1) way, by specifying the actual file instead
+of a server.
+
+ glusterfs --volfile my.vol.file /my/mount/point
+
NOTE: this process is recognized to be cumbersome, and will become less so
shortly. In the next version, there will still be a "cloudfs" command/script
to provide perform actions across the entire storage pool from a single
@@ -139,5 +150,6 @@ the work-list items below as well.
[2] https://fedoraproject.org/wiki/Features/CloudFS
+[3] http://git.fedorahosted.org/git/?p=CloudFS.git
-
+[4] http://gluster.com/community/documentation/index.php/Gluster_3.1_Filesyst...
commit 527b046b6bd6a22769bfedcba8c8dff89fd18f6b
Author: Jeff Darcy <jdarcy(a)redhat.com>
Date: Mon Feb 21 17:00:02 2011 -0500
New doc for building and configuration.
Also changed Markdown header styles.
diff --git a/README.md b/README.md
index 6af15bd..c570777 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,8 @@
-CloudFS
-=======
+# CloudFS #
-Introduction
-------------
+## Introduction ##
-CloudFS is a set of enhancements to GlusterFS[1], allowing a cloud provider to
+CloudFS is a set of enhancements to GlusterFS[1] allowing a cloud provider to
set up a permanent, shared filesystem for their users. This mostly involves
protecting users from each other in various ways, but includes other features
as well:
@@ -27,11 +25,10 @@ as well:
In the future, CloudFS will also include an improved distribution ("DHT")
translator, and multi-site replication. These features are not part of the
-current release. The first-release functionality is more fully described
-in the Fedora 15 feature page[2].
+current release. The first-release functionality is more fully described in
+the Fedora 15 feature page[2].
-Code Structure
---------------
+## Code Structure ##
Most GlusterFS functionality is contained in "translators" which translate a
higher-level operation (e.g. a write) into one or more lower-level operations
@@ -50,80 +47,80 @@ implementation):
* auth (client, TBD): auxiliary/helper code for authentication
-Building
---------
+## Building ##
-To avoid distributing an entire GlusterFS tree without permission, CloudFS is
-currently distributed as a set of overlays (for new files) and patches (for
-existing files) to the official GlusterFS tree. To create a complete CloudFS
-tree, follow these steps:
+CloudFS depends on a specific version of GlusterFS, which is currently only
+packaged for Fedora 15. This version has (as of 2011/02/21) not hit the yum
+repositories yet, but the latest build can be downloaded from:
- git clone git://git.gluster.com/glusterfs.git cloudfs
- cd cloudfs
- rsync -apt $CLOUDFS_DIR/xlators/ xlators/
- for i in $CLOUDFS_DIR/patches/*; do patch -p1 < $i; done
+ http://koji.fedoraproject.org/koji/buildinfo?buildID=223571
-At this point you can follow the usual GlusterFS build process. If you're
-familiar with building RPMs, you can do something like this:
+If that disappears, the -2 RPMs for el6 (which should be equivalent) are at:
- ./autogen.sh
- ./configure --enable-fusermount
- make dist-gzip
- cp glusterfs-3.1.0git.tar.gz ~/rpmbuild/SOURCES
- cp glusterfs.spec ~/rpmbuild/SPECS
- cd ~/rpmbuild
- rpmbuild -bb SPECS/glusterfs.spec
-
-Work is currently under way to allow building translators "out of tree" much
-as can be done for kernel modules. Once that work is complete, a simpler
-CloudFS build procedure will be implemented, and a separate RPM specfile will
-be created embodying that process.
-
-Configuration
--------------
-
-The work to integrate configuration of the new translators with the current
-gluster CLI is, unfortunately, still TBD. The only method available to
-configure and use the new translators is to edit the "volfiles" by hand, as
-you would have done in 3.0 but with an extra twist. If you have created a
-volume named "fubar" then your volfiles will be in /etc/glusterd/vols/fubar on
-the servers. There will be one fubar-fuse.vol for the clients, and one
-fubar.${HOST}.${PATH}.vol for each "brick" making up the filesystem. To make
-a change globally, you'll need to do the following:
-
-1. Edit one of the brick volfiles, e.g. on host "gnarly"
-
-2. Propagate the changes to the other volfiles on the same host. If you only
- have one brick per server, and all bricks use the same path, you can simply
- copy the edited volfile.
-
-3. On every *other* server, do "volume sync gnarly all" to fetch the edited
- volfiles.
-
-See the CONFIG.txt in each translator's directory for instructions specific
-to that translator.
+ http://jdarcy.fedorapeople.org/el6_rpms/
-The .../scripts directory contains some Python scripts that can help automate
-the process of modifying volfiles. Specifically:
+To build CloudFS, you need to install the -devel RPMs. Once you've done that,
+you can go into your git tree and do the following:
-* filt-log-io.py: inserts a debug/log-io translator between a protocol/server
- volume and each of its subvolumes
-
-* filt-crypto.py: inserts an encryption/crypto translator on top of each
- protocol/client volume. Also disables performance/quick-read, which is
- incompatible with encryption/crypto.
-
-* filt-cloud.py: replaces a simple translator "stack" (from storage/posix up
- to whatever is below protocol/server) with one such stack per named tenant,
- plus a cluster/cloud translator to tie them together.
-
-Running these scripts after each gluster "volume create" or "volume set"
-command should generate a new volfile with the desired enhancements. They use
-a common volfile parsing/modification library that might be useful for other
-tasks as well.
-
-Work List
----------
+ ./fedora-ize
+ rsync -aptv SOURCES/ ~/rpmbuild/SOURCES/
+ rsync -aptv SPECS/ ~/rpmbuild/SPECS/
+ cd ~/rpmbuild
+ rpmbuild -bb SPECS/cloudfs.spec
+
+The rest is standard rpmbuild stuff. For debugging, you'll probably want to
+prepend CFLAGS=-g to your rpmbuild command line. Since this process only
+builds the CloudFS-specific translators and not all of GlusterFS, it's pretty
+quick. Install the resulting RPM and you're ready for configuration.
+
+## Configuration ##
+
+You use the "cloudfs" script to set up CloudFS-specific features. There are
+two main cloudfs commands.
+
+* cloudfs init VOLUME USERS
+ This initalizes the "multi-tenant" (namespace isolation) features of CloudFS,
+ by rewriting the server volfiles to include the "cloud" translator and
+ generating per-user client volfiles which include the "login" translator.
+ The USERS file is simply a list of name/password pairs, one pair per line,
+ separated by spaces. There is no provision currently for extra indentation,
+ comments, etc. Note that the per-user client volfiles are placed in
+ /var/lib/glusterd/vols/VOLUME/VOLUME-fuse.vol.USER and you will need to get
+ them to the client(s) yourself.
+
+* cloudfs initc VOLFILE KEY
+ This initializes the encryption feature of CloudFS, by rewriting a client
+ volfile (not a volume name) to include the "crypt" translator.
+
+These commands must be run *on every server*, and re-run any time you use the
+"gluster volume set" command to change volume parameters (which will re-write
+the originals that CloudFS has copied and modified).
+
+In addition to rewriting volfiles, you must create subdirectories - again on
+each server - for each user plus one for the "junk" pseudo-user. Thus, if you
+have a brick belonging to a volume at server1:/exports/glu and you add a user
+"fred" you will need to create /exports/glu/fred on server1 yourself . . . and
+likewise for every other brick in the volume.
+
+NOTE: this process is recognized to be cumbersome, and will become less so
+shortly. In the next version, there will still be a "cloudfs" command/script
+to provide perform actions across the entire storage pool from a single
+command line, but it will be improved in the following ways:
+
+* Instead of specifying users via a file, the list of users and corresponding
+ passwords (or other credentials - see work list) will be maintained by
+ cloudfs itself. The interface will include add-user and del-user commands,
+ which can even be issued dynamically.
+
+* Adding (or removing) users will automatically create (or delete) the
+ per-brick subdirectories. The "junk" pseudo-user will go away.
+
+As a side effect of the way these features are implemented, there will be
+separate cloudfsd and mount.cloudfs commands corresponding to glusterd and
+mount.glusterfs respectively. There will be other changes corresponding to
+the work-list items below as well.
+
+## Work List ##
* crypt translator: stronger encryption, keys in files
@@ -131,17 +128,15 @@ Work List
* auth translator: create
-* build system: out-of-tree build process, specfile
-
* config: CLI integration, other tools for UID/GID mapping, billing, cert/key
management
* doc: pull together per-translator options, CLI extensions, other tools
-Notes
------
+## Notes ##
[1] http://www.gluster.org or http://www.gluster.com
+
[2] https://fedoraproject.org/wiki/Features/CloudFS
commit 3c13ff84acc4f2ba7621b1ed545051ca4307940d
Author: Jeff Darcy <jdarcy(a)redhat.com>
Date: Thu Feb 3 21:58:46 2011 -0500
Even more packaging changes.
diff --git a/fedora-ize b/fedora-ize
index a5ac4b9..682812d 100755
--- a/fedora-ize
+++ b/fedora-ize
@@ -20,8 +20,8 @@ cp -r xlators/encryption/crypt/src $work/crypt/
cp -r xlators/features/oplock/src $work/oplock/
cp -r xlators/cluster/login/src $work/login/
cp pkg/* $work/
-cp scripts/cloudfs $work/
cp scripts/volfilter.py $work/
+cp scripts/cloudfs $work/
# Configure just enough to get a decent specfile.
cd $work
@@ -32,8 +32,6 @@ cd -
# Create and populate the SOURCES directory.
mkdir -p SOURCES
(cd $mytmp; tar cvfz ../SOURCES/cloudfs-0.5.tgz cloudfs-0.5)
-cp scripts/volfilter.py SOURCES/
-cp scripts/cloudfs SOURCES/
# Create and populate the SPECS directory.
mkdir -p SPECS
diff --git a/pkg/cloudfs.spec.in b/pkg/cloudfs.spec.in
index 30c8a3f..1397438 100644
--- a/pkg/cloudfs.spec.in
+++ b/pkg/cloudfs.spec.in
@@ -17,7 +17,7 @@ URL: http://cloudfs.org
Source0: http://cloudfs.org/dist/0.5/cloudfs-0.5.tgz
BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root
-Requires: glusterfs >= 3.1.1
+Requires: glusterfs = 3.1.2
Requires: openssl
Requires: python
BuildRequires: glusterfs-devel >= 3.1.1
diff --git a/pkg/configure.ac b/pkg/configure.ac
index bc64915..8d3e2c0 100644
--- a/pkg/configure.ac
+++ b/pkg/configure.ac
@@ -65,7 +65,7 @@ if test "x${have_spinlock}" = "xyes"; then
AC_DEFINE(HAVE_SPINLOCK, 1, [define if found spinlock])
fi
-GLUSTER_VERSION=3.1.1
+GLUSTER_VERSION=3.1.2
GF_HOST_OS=""
GF_LDFLAGS="-rdynamic"
GF_HOST_OS="GF_LINUX_HOST_OS"
commit c03e1fec8bb8975b31d620cf1798df86d0ce5108
Author: Jeff Darcy <jdarcy(a)redhat.com>
Date: Thu Feb 3 21:58:28 2011 -0500
Be even more conservative about performance translators.
diff --git a/scripts/cloudfs b/scripts/cloudfs
index 4b6615c..ed521f1 100755
--- a/scripts/cloudfs
+++ b/scripts/cloudfs
@@ -50,6 +50,14 @@ glusterd_dirs = [
"/etc/glusterd" # Gluster
]
+# These are incompatible with crypt in various ways.
+bad_translators = [
+ "performance/quick-read",
+ "performance/read-ahead",
+ "performance/write-behind",
+ "performance/io-cache"
+]
+
def copy_stack (old_xl,suffix,recursive=False):
if recursive:
new_name = old_xl.name + "-" + suffix
@@ -179,17 +187,17 @@ def do_init_crypt ():
graph, last = volfilter.load(vfname+".save")
opts = { "key": sys.argv[3] }
to_do = [xl for xl in graph.itervalues()
- if xl.type == "performance/quick-read"]
+ if xl.type in bad_translators]
for td in to_do:
volfilter.delete(graph,td)
to_do = [xl for xl in graph.itervalues()
- if xl.type == "performance/write-behind"]
+ if xl.type == "cluster/dht"]
if to_do:
+ # Nice to push as close to dht as we can.
for td in to_do:
- # Nice to push below io-stats etc. if we can
volfilter.push_filter(graph,td,"encryption/crypt",opts)
else:
- # Might as well push it on top.
+ # Push on top if all else fails.
volfilter.push_filter(graph,last,"encryption/crypt",opts)
volfilter.generate(graph,last,file(vfname,"w"))
5 commits - fedora-ize pkg/cloudfs.spec.in pkg/configure.ac pkg/Makefile.am scripts/cloudfs scripts/volfilter.py xlators/cluster xlators/encryption xlators/features
by Jeff Darcy
fedora-ize | 2
pkg/Makefile.am | 2
pkg/cloudfs.spec.in | 21 +----
pkg/configure.ac | 2
scripts/cloudfs | 13 ++-
scripts/volfilter.py | 9 +-
xlators/cluster/cloud/src/cloud.c | 6 +
xlators/encryption/crypt/src/crypt.c | 136 ++++++++++++++++++++++++-----------
xlators/encryption/crypt/src/crypt.h | 1
xlators/features/oplock/src/oplock.c | 2
10 files changed, 132 insertions(+), 62 deletions(-)
New commits:
commit a2786b98842257a29b197e5d416713e01b7c4000
Author: Jeff Darcy <jdarcy@redhat.com>
Date: Thu Feb 3 17:27:06 2011 -0500
Packaging changes (mostly from rpmlint).
diff --git a/fedora-ize b/fedora-ize
index c5bca93..a5ac4b9 100755
--- a/fedora-ize
+++ b/fedora-ize
@@ -20,6 +20,8 @@ cp -r xlators/encryption/crypt/src $work/crypt/
cp -r xlators/features/oplock/src $work/oplock/
cp -r xlators/cluster/login/src $work/login/
cp pkg/* $work/
+cp scripts/cloudfs $work/
+cp scripts/volfilter.py $work/
# Configure just enough to get a decent specfile.
cd $work
diff --git a/pkg/Makefile.am b/pkg/Makefile.am
index 0c83e6c..8fe762f 100644
--- a/pkg/Makefile.am
+++ b/pkg/Makefile.am
@@ -1,3 +1,5 @@
+bin_SCRIPTS = cloudfs
+python_PYTHON = volfilter.py
EXTRA_DIST = autogen.sh COPYING cloudfs.spec
SUBDIRS = cloud crypt oplock login
diff --git a/pkg/cloudfs.spec.in b/pkg/cloudfs.spec.in
index 3e0a705..30c8a3f 100644
--- a/pkg/cloudfs.spec.in
+++ b/pkg/cloudfs.spec.in
@@ -9,14 +9,12 @@ Summary: Cloud File System
Name: @PACKAGE_NAME@
Version: @PACKAGE_VERSION@
Release: %{release}
-License: AGPLv3+
+License: AGPLv3
+Group: Applications/File
Group: System Environment/Base
Vendor: Red Hat
-Packager: @PACKAGE_BUGREPORT@
URL: http://cloudfs.org
-Source0: cloudfs-0.5.tgz
-Source1: volfilter.py
-Source2: cloudfs
+Source0: http://cloudfs.org/dist/0.5/cloudfs-0.5.tgz
BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root
Requires: glusterfs >= 3.1.1
@@ -31,10 +29,6 @@ BuildRequires: openssl-devel
CloudFS is a cloud-capable filesystem based on GlusterFS (http://gluster.org)
with additional authentication/encryption/multi-tenancy features.
-Summary: CloudFS
-Group: Applications/File
-Provides: cloudfs = %{version}-%{release}
-
%prep
%setup -q -n %{name}-%{version}
@@ -48,14 +42,11 @@ Provides: cloudfs = %{version}-%{release}
%install
%{__rm} -rf %{buildroot}
%{__make} install DESTDIR=%{buildroot}
-%{__install} -D -p -m 0644 %{SOURCE1} \
- %{buildroot}%{python_sitelib}/volfilter.py
-%{__install} -D -p -m 0755 %{SOURCE2} \
- %{buildroot}%{_bindir}/cloudfs
# Remove unwanted files from all the shared libraries
find %{buildroot}%{_libdir} -name '*.a' -delete
find %{buildroot}%{_libdir} -name '*.la' -delete
+find %{buildroot}%{_libdir} -name '*.so.0.0.0' | xargs strip
%clean
%{__rm} -rf %{buildroot}
@@ -66,10 +57,10 @@ find %{buildroot}%{_libdir} -name '*.la' -delete
%{_libdir}/glusterfs/@GLUSTER_VERSION@/xlator/cluster/*.so*
%{_libdir}/glusterfs/@GLUSTER_VERSION@/xlator/encryption/*.so*
%{_libdir}/glusterfs/@GLUSTER_VERSION@/xlator/features/*.so*
-%{python_sitelib}/volfilter.py
+%{python_sitelib}/volfilter.py*
%{_bindir}/cloudfs
%changelog
-* Fri Jan 21 2011 Jeff Darcy <jdarcy@redhat.com> - 0.5
+* Fri Jan 21 2011 Jeff Darcy <jdarcy@redhat.com> - 0.5-1
- Original version based on GlusterFS 3.1.1
diff --git a/pkg/configure.ac b/pkg/configure.ac
index d53fafa..bc64915 100644
--- a/pkg/configure.ac
+++ b/pkg/configure.ac
@@ -20,6 +20,8 @@ AM_INIT_AUTOMAKE
AM_CONFIG_HEADER([config.h])
+AM_PATH_PYTHON([])
+
AC_CONFIG_FILES([Makefile
cloud/Makefile
crypt/Makefile
commit 84de7f10d03e37f33b15579e6d42d6aedc571a92
Author: Jeff Darcy <jdarcy@redhat.com>
Date: Thu Jan 27 20:33:49 2011 -0500
Allow statfs even when unbound.
Without this, DHT gets errors back as it's trying to get disk-full
information so that it can avoid over-filling a subvolume. This seems
more important than the minor amount of information leakage from the
pre-binding dummy subvolume (which will in almost all cases be the same
actual local filesystem as the real per-tenant subvolumes).
diff --git a/xlators/cluster/cloud/src/cloud.c b/xlators/cluster/cloud/src/cloud.c
index 5cdbd3f..0bddb3a 100644
--- a/xlators/cluster/cloud/src/cloud.c
+++ b/xlators/cluster/cloud/src/cloud.c
@@ -192,12 +192,14 @@ cloud_stat (call_frame_t *frame, xlator_t *this, loc_t *loc)
return 0;
}
+#if 0
int32_t
cloud_statfs (call_frame_t *frame, xlator_t *this, loc_t *loc)
{
STACK_UNWIND_STRICT (statfs, frame, -1, EPERM, NULL);
return 0;
}
+#endif
int
cloud_symlink (call_frame_t *frame, xlator_t *this, const char *linkpath,
@@ -495,7 +497,11 @@ struct xlator_fops fops = {
.rmdir = cloud_rmdir,
.setattr = cloud_setattr,
.stat = cloud_stat,
+ /*
+ * Allow statfs, because it's fairly harmless and blocking it confuses
+ * DHT's capacity-balancing code.
.statfs = cloud_statfs,
+ */
.symlink = cloud_symlink,
.truncate = cloud_truncate,
.unlink = cloud_unlink,
commit a0f1109e5bc5bf5e077925ef07d8d34c32d0f733
Author: Jeff Darcy <jdarcy@redhat.com>
Date: Thu Jan 27 20:31:22 2011 -0500
Insert crypt above write-behind to avoid excessive conflict retries.
While conflicts are handled now, write-behind can cause an awful lot
of "self conflicts" (where one of two simultaneous writes breaks the
other's oplock). This brings the conflict/retry count down to a more
reasonable level.
diff --git a/scripts/cloudfs b/scripts/cloudfs
index d8abe72..4b6615c 100755
--- a/scripts/cloudfs
+++ b/scripts/cloudfs
@@ -92,6 +92,7 @@ def cloudify_server (volfile, users):
for user, pw in users:
new_stack = copy_stack(last.subvols[0],user)
volfilter.push_filter(graph,new_stack,"features/oplock")
+ new_stack.name = user
subvols.append(new_stack)
# One cloud to bring them all...
@@ -182,9 +183,14 @@ def do_init_crypt ():
for td in to_do:
volfilter.delete(graph,td)
to_do = [xl for xl in graph.itervalues()
- if xl.type == "protocol/client"]
- for td in to_do:
- volfilter.push_filter(graph,td,"encryption/crypt",opts)
+ if xl.type == "performance/write-behind"]
+ if to_do:
+ for td in to_do:
+ # Nice to push below io-stats etc. if we can
+ volfilter.push_filter(graph,td,"encryption/crypt",opts)
+ else:
+ # Might as well push it on top.
+ volfilter.push_filter(graph,last,"encryption/crypt",opts)
volfilter.generate(graph,last,file(vfname,"w"))
if len(sys.argv) < 3:
commit f9a4fd656c857406ae0db0ba541c20454f393211
Author: Jeff Darcy <jdarcy@redhat.com>
Date: Thu Jan 27 17:33:26 2011 -0500
Implement retries when conflicts are detected.
diff --git a/xlators/encryption/crypt/src/crypt.c b/xlators/encryption/crypt/src/crypt.c
index 6110816..d2d3be1 100644
--- a/xlators/encryption/crypt/src/crypt.c
+++ b/xlators/encryption/crypt/src/crypt.c
@@ -31,6 +31,13 @@
#include "crypt.h"
+/* Forward decls so crypt_launch can retry. */
+int32_t
+crypt_rmw_done (call_frame_t *frame, xlator_t *this);
+int32_t
+crypt_lock_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
+ int32_t op_ret, int32_t op_errno);
+
/*
* The "lame" stuff is just for testing (makes it easier to verify correctness
* of contents) and must be enabled by hand.
@@ -86,7 +93,7 @@ lame_decrypt_iovec (crypt_private_t *priv, struct iovec *vector, int count)
*/
void
-good_encrypt (crypt_private_t *priv, char * buf, int len)
+good_crypt_buf (crypt_private_t *priv, char * buf, int len, int dir)
{
DES_cblock ivec;
int num;
@@ -94,8 +101,9 @@ good_encrypt (crypt_private_t *priv, char * buf, int len)
while (len >= priv->block_size) {
memset(&ivec,0,sizeof(ivec));
num = 0;
- DES_cfb64_encrypt((const unsigned char *)buf,(unsigned char *)buf,
- priv->block_size,&priv->sched,&ivec,&num,1);
+ DES_cfb64_encrypt((const unsigned char *)buf,
+ (unsigned char *)buf, priv->block_size,
+ &priv->sched,&ivec,&num,dir);
buf += priv->block_size;
len -= priv->block_size;
}
@@ -103,18 +111,25 @@ good_encrypt (crypt_private_t *priv, char * buf, int len)
if (len > 0) {
memset(&ivec,0,sizeof(ivec));
num = 0;
- DES_cfb64_encrypt((const unsigned char *)buf,(unsigned char *)buf,
- len,&priv->sched, &ivec,&num,1);
+ DES_cfb64_encrypt((const unsigned char *)buf,
+ (unsigned char *)buf,len,&priv->sched,&ivec,&num,dir);
}
}
-/*
- * We don't need good_decrypt because good_decrypt_iovec just does the
- * looping internally.
- */
+void
+good_encrypt (crypt_private_t *priv, char * buf, int len)
+{
+ good_crypt_buf(priv,buf,len,1);
+}
void
-good_decrypt_iovec (crypt_private_t *priv, struct iovec *vector, int count)
+good_decrypt (crypt_private_t *priv, char * buf, int len)
+{
+ good_crypt_buf(priv,buf,len,0);
+}
+
+void
+good_crypt_iov (crypt_private_t *priv, struct iovec *vector, int count, int dir)
{
DES_cblock ivec;
int num;
@@ -137,7 +152,7 @@ good_decrypt_iovec (crypt_private_t *priv, struct iovec *vector, int count)
}
DES_cfb64_encrypt((const unsigned char *)buf,
(unsigned char *)buf,
- bytes,&priv->sched,&ivec,&num,0);
+ bytes,&priv->sched,&ivec,&num,dir);
buf += bytes;
b_resid = (b_resid + bytes) % priv->block_size;
if (!b_resid) {
@@ -147,6 +162,18 @@ good_decrypt_iovec (crypt_private_t *priv, struct iovec *vector, int count)
}
}
+void
+good_encrypt_iovec (crypt_private_t *priv, struct iovec *vector, int count)
+{
+ good_crypt_iov(priv,vector,count,1);
+}
+
+void
+good_decrypt_iovec (crypt_private_t *priv, struct iovec *vector, int count)
+{
+ good_crypt_iov(priv,vector,count,0);
+}
+
int32_t
crypt_readv_cbk (call_frame_t *frame,
void *cookie,
@@ -344,6 +371,48 @@ err:
}
int32_t
+crypt_launch (call_frame_t *frame, xlator_t *this)
+{
+ crypt_wlocal_t *local = frame->local;
+ crypt_private_t *priv = this->private;
+ int32_t op_errno = ENOMEM;
+
+ local->call_count = (local->head_resid != 0) + (local->tail_resid != 0);
+
+ /* Check for the only case which doesn't require a lock. */
+ if (!local->call_count) {
+ /* Head and tail were encrypted on the previous pass. */
+ good_decrypt(priv,local->head_data,priv->block_size);
+ good_encrypt(priv,local->head_data,priv->block_size);
+ return crypt_rmw_done(frame,this);
+ }
+
+ local->xattr = get_new_dict();
+ if (!local->xattr) {
+ op_errno = ENOMEM;
+ goto err;
+ }
+
+ if (dict_set_str(local->xattr,"trusted.glusterfs.lock","fubar") != 0) {
+ op_errno = EIO;
+ dict_unref(local->xattr);
+ goto err;
+ }
+
+ local->op_ret = 0;
+
+ STACK_WIND (frame, crypt_lock_cbk, FIRST_CHILD(this),
+ FIRST_CHILD(this)->fops->fsetxattr,
+ local->fd, local->xattr, 0);
+
+ return 0;
+
+err:
+ STACK_UNWIND_STRICT(writev,frame,-1,op_errno,NULL,NULL);
+ return 0;
+}
+
+int32_t
crypt_writev_cbk (call_frame_t *frame,
void *cookie,
xlator_t *this,
@@ -357,9 +426,15 @@ crypt_writev_cbk (call_frame_t *frame,
/*
* This is where we might get an error indicating that somebody else
* wrote to the file between our read and write. In that case, we
- * would simply re-start at the lock call.
+ * simply re-start at the lock call.
*/
+ if ((op_ret < 0) && (op_errno == EBUSY)) {
+ gf_log(this->name,GF_LOG_WARNING,"retrying conflicted write");
+ local->is_retry = _gf_true;
+ return crypt_launch(frame,this);
+ }
+
if (op_ret > local->orig_size) {
op_ret = local->orig_size;
}
@@ -425,10 +500,12 @@ crypt_rmw_done (call_frame_t *frame, xlator_t *this)
b_offset = 0;
to_go = local->vector[v_index].iov_len - v_offset;
while (to_go >= priv->block_size) {
- good_encrypt(priv,
- (char *)(local->vector[v_index].iov_base)
- + v_offset + b_offset,
- priv->block_size);
+ if (!local->is_retry) {
+ good_encrypt(priv,
+ (char *)(local->vector[v_index].iov_base)
+ + v_offset + b_offset,
+ priv->block_size);
+ }
b_offset += priv->block_size;
to_go -= priv->block_size;
}
@@ -480,6 +557,7 @@ crypt_rmw_done (call_frame_t *frame, xlator_t *this)
goto err;
}
}
+
/*
* We always start our writes at block boundaries, so we subtract the
* "residue" before passing the offset to the next translator. Since
@@ -671,31 +749,9 @@ crypt_writev (call_frame_t *frame,
}
local->head_resid = head_resid;
local->tail_resid = tail_resid;
- local->call_count = (head_resid != 0) + (tail_resid != 0);
- /* Check for the only case which doesn't require a lock. */
- if (!local->call_count) {
- return crypt_rmw_done(frame,this);
- }
-
- local->xattr = get_new_dict();
- if (!local->xattr) {
- op_errno = ENOMEM;
- goto err;
- }
-
- if (dict_set_str(local->xattr,"trusted.glusterfs.lock","fubar") != 0) {
- op_errno = EIO;
- dict_unref(local->xattr);
- goto err;
- }
-
- local->op_ret = 0;
-
- STACK_WIND (frame, crypt_lock_cbk, FIRST_CHILD(this),
- FIRST_CHILD(this)->fops->fsetxattr, fd, local->xattr, 0);
-
- return 0;
+ local->is_retry = _gf_false;
+ return crypt_launch(frame,this);
err:
STACK_UNWIND_STRICT(writev,frame,-1,op_errno,NULL,NULL);
diff --git a/xlators/encryption/crypt/src/crypt.h b/xlators/encryption/crypt/src/crypt.h
index 0c6548c..fde242b 100644
--- a/xlators/encryption/crypt/src/crypt.h
+++ b/xlators/encryption/crypt/src/crypt.h
@@ -61,6 +61,7 @@ typedef struct {
size_t head_resid;
size_t tail_resid;
dict_t *xattr;
+ gf_boolean_t is_retry;
} crypt_wlocal_t;
#endif /* __CRYPT_H__ */
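The `is_retry` flag added to `crypt_wlocal_t` above supports the optimistic-concurrency loop factored into `crypt_launch`: encrypt and write, and if the write comes back EBUSY because another writer broke the oplock, mark the operation as a retry and restart from the lock call (so already-ciphered whole blocks are not encrypted again). A toy Python sketch of that retry loop, with a hypothetical `write_attempt` callable standing in for the STACK_WIND/STACK_UNWIND round trip:

```python
# Toy sketch of the EBUSY retry loop in crypt_writev_cbk/crypt_launch.
# write_attempt(is_retry) is a hypothetical callable returning
# (op_ret, op_errno); it stands in for the wind/unwind machinery.

EBUSY = 16

def write_with_retry(write_attempt, max_retries=5):
    is_retry = False
    for _ in range(max_retries + 1):
        op_ret, op_errno = write_attempt(is_retry)
        if op_ret >= 0:
            return op_ret
        if op_errno != EBUSY:
            raise OSError(op_errno, "write failed")
        # A conflicting writer broke our oplock: restart at the lock
        # call, but remember the data is already encrypted.
        is_retry = True
    raise OSError(EBUSY, "too many conflicting writers")
```

The bounded retry count is an assumption of this sketch; the actual translator simply re-winds on each conflict.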
diff --git a/xlators/features/oplock/src/oplock.c b/xlators/features/oplock/src/oplock.c
index 940398e..402e6ea 100644
--- a/xlators/features/oplock/src/oplock.c
+++ b/xlators/features/oplock/src/oplock.c
@@ -148,6 +148,7 @@ oplock_writev (call_frame_t *frame, xlator_t *this, fd_t *fd, struct iovec *vect
entry = fetch_op_lock(&priv->locks,inode,state->conn);
if (entry) {
+ LIST_REMOVE(entry,links);
if (entry->value != inode->gen) {
gf_log(this->name,GF_LOG_DEBUG,
"would reject write for %d from %p",
@@ -155,7 +156,6 @@ oplock_writev (call_frame_t *frame, xlator_t *this, fd_t *fd, struct iovec *vect
op_errno = EBUSY;
goto err;
}
- LIST_REMOVE(entry,links);
FREE(entry);
}
++(inode->gen);
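The oplock hunk above moves `LIST_REMOVE` ahead of the generation comparison, so a stale lock entry is discarded whether or not the write is rejected. The check itself is a per-inode generation counter: a lock records `inode->gen` at lock time, every write bumps it, and a write is refused with EBUSY if the counter moved in between. A hedged Python sketch of that scheme (the names here are illustrative, not the real oplock API):

```python
# Sketch of oplock's generation check: each inode carries a counter
# bumped on every write; a lock records the counter at lock time, and
# a write is rejected if the counter has moved since.

class Inode:
    def __init__(self):
        self.gen = 0

def take_oplock(inode):
    """Record the generation at lock time."""
    return {"value": inode.gen}

def try_write(inode, lock):
    """Return True and bump the generation iff no conflicting write
    happened since the lock was taken (else the caller gets EBUSY)."""
    if lock["value"] != inode.gen:
        return False  # conflict: translator would return EBUSY
    inode.gen += 1
    return True
```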
commit 5beada814cdbb2bc104669892a322890838fff56
Author: Jeff Darcy <jdarcy@redhat.com>
Date: Thu Jan 27 17:33:11 2011 -0500
Make slightly less ugly volume names.
diff --git a/scripts/cloudfs b/scripts/cloudfs
index 30d73bd..d8abe72 100755
--- a/scripts/cloudfs
+++ b/scripts/cloudfs
@@ -127,7 +127,6 @@ def do_init ():
voldir = ""
for gdir in glusterd_dirs:
probe = "%s/vols/%s" % (gdir, volname)
- print "trying %s" % probe
if os.access(probe,os.X_OK):
voldir = probe
break
diff --git a/scripts/volfilter.py b/scripts/volfilter.py
index 7453c32..98ab20e 100755
--- a/scripts/volfilter.py
+++ b/scripts/volfilter.py
@@ -85,15 +85,20 @@ def generate (graph, last, stream=sys.stdout):
print >> stream, "end-volume"
def push_filter (graph, old_xl, filt_type, opts={}):
- suffix = string.split(old_xl.type,"/")[1]
- new_xl = Translator(old_xl.name+"-"+suffix)
+ suffix = "-" + old_xl.type.split("/")[1]
+ if len(old_xl.name) > len(suffix):
+ if old_xl.name[-len(suffix):] == suffix:
+ old_xl.name = old_xl.name[:-len(suffix)]
+ new_xl = Translator(old_xl.name+suffix)
new_xl.type = old_xl.type
new_xl.opts = old_xl.opts
new_xl.subvols = old_xl.subvols
graph[new_xl.name] = new_xl
+ old_xl.name += ("-" + filt_type.split("/")[1])
old_xl.type = filt_type
old_xl.opts = opts
old_xl.subvols = [new_xl]
+ graph[old_xl.name] = old_xl
def delete (graph, victim):
if len(victim.subvols) != 1:
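The reworked `push_filter` above splices a new filter into the graph by renaming: the original translator's contents move into a clone under a type-suffixed name (stripping any suffix left by a previous push), while the original object becomes the filter and takes a name derived from the filter type. A compact runnable sketch of that splice, assuming a minimal `Translator` stand-in rather than the real volfilter module:

```python
# Sketch of push_filter's splice: clone old_xl under a type-suffixed
# name, then turn old_xl itself into the new filter above the clone.

class Translator:
    def __init__(self, name, xl_type="", subvols=None, opts=None):
        self.name = name
        self.type = xl_type
        self.subvols = subvols if subvols is not None else []
        self.opts = opts if opts is not None else {}

def push_filter(graph, old_xl, filt_type, opts=None):
    suffix = "-" + old_xl.type.split("/")[1]
    # Strip an existing type suffix so names don't pile up on reuse.
    if len(old_xl.name) > len(suffix) and old_xl.name.endswith(suffix):
        old_xl.name = old_xl.name[:-len(suffix)]
    new_xl = Translator(old_xl.name + suffix, old_xl.type,
                        old_xl.subvols, old_xl.opts)
    graph[new_xl.name] = new_xl
    # The original object becomes the filter, one level up.
    old_xl.name += "-" + filt_type.split("/")[1]
    old_xl.type = filt_type
    old_xl.opts = opts if opts is not None else {}
    old_xl.subvols = [new_xl]
    graph[old_xl.name] = old_xl
```

Renaming the original object (instead of rewiring its parents) keeps every parent's `subvols` pointer valid, which is why the filter slides in without touching the rest of the graph.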