Re: Fwd: Encryption
by Edward Shishkin
On 02/24/2011 06:05 PM, Jeff Darcy wrote:
>
> There are three basic issues that need to be addressed in the encryption
> module: type of cipher used, initialization-vector handling, and
> conflict management. Each is non-trivial, so I'll address them in turn.
>
> = Cipher
>
> The main factor affecting our choice of ciphers (or APIs to them) is
> that we need to be able to deal efficiently with updates both in the
> middle of the file and at the end. At EOF, the problem is that we need
> a whole cipher-block in order to decrypt, but the file might actually
> end at any byte boundary within that cipher-block. Therefore, we have
> to deal with the "residue" somehow.
>
> * Store the residue in an xattr.
>
> * Store a whole cipher-block at the end, record the amount of padding in
> an xattr.
>
> * Use a stream cipher (or block cipher converted to a stream cipher).
Nup. A "block cipher converted to a stream cipher" doesn't resolve the EOF
problem. Moreover, this phrasing is misleading, so I suggest we adhere to
the following terminology.
There are block ciphers and stream ciphers.
. A stream cipher translates 1 byte of plain text to 1 byte of cipher
text. Example of a stream cipher: RC4.
. A block cipher translates one block of plain text to one block of
cipher text (the block size, e.g. 128 bits for AES, is independent of the
key size). Examples of block ciphers: DES, AES.
A block cipher can be run in a chaining ("stream") mode, which requires an
initialization vector (IV). Regardless of mode, a block cipher produces
output whose size is a multiple of the block size.
So stream ciphers don't have the EOF problem, while block ciphers do,
regardless of any stream mode.
Block ciphers in non-chaining modes (e.g. ECB) are weak against attacks,
and we won't consider them.
So from here on let's speak simply of "block ciphers" and "stream ciphers".
>
> This problem is further compounded by the striping case, where EOF for a
> stripe component (local file stored on one brick) might not be EOF for
> the entire file (union of all stripe components).
>
> Since the two xattr-based approaches both require extra calls, the
> stream-cipher approach has been used, with the cipher resetting at block
> (e.g. 4KB) boundaries to allow efficient middle-of-file updates. As it
> turns out, pure stream ciphers are relatively uncommon.
Stream ciphers are an EU standard, whereas block ciphers are a US standard.
In short, stream ciphers are no worse than block ciphers in terms of speed
and strength, but they require more care. I know OpenSSL supports a stream
cipher algorithm (RC4), but I can neither recommend nor reject it for now
(I haven't had a chance to survey the existing stream algorithms).
> More often,
> CFB/OFB/CTR methods are used to convert a block cipher into a stream
> cipher. The OpenSSL documentation is *amazingly* bad, but it looks like
> it should be pretty easy to use any of these techniques with AES as well
> as with DES.
>
> = Initialization vector
>
> Right now, the code uses a constant IV, which is totally unacceptable
> from a security standpoint and was always meant to be changed before
> release. The question is: what should we use for an IV? GlusterFS does
> attach a supposedly unique "gfid" as an xattr on each file, so that
> might be usable as a basis for the IV so long as we can verify that it's
> universal and stable enough to be sure that data won't become
> unrecoverable because a gfid is missing or changed.
There is an ESSIV technique for assigning IVs (used in the Linux dm-crypt
subsystem):
IV(sector) = E_s(sector), where s = hash(K);
see http://en.wikipedia.org/wiki/Disk_encryption_theory for details.
I think we can use it with the following modification: instead of
"sector" we should take (N + gfid), where N is the number of the logical
cluster within the file (*). This ensures that files with identical
plain content will have different cipher text.
(*) Logical clusters are chunks of a file that are ciphered independently,
for good random access. I suggest considering only 4K clusters.
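A minimal sketch of this modified ESSIV scheme, with SHA-256 standing in for the block-cipher encryption E_s (a real implementation would use an actual cipher, and the 16-byte IV width is an assumption):

```python
import hashlib
import uuid

def essiv_iv(key, gfid_int, cluster_index):
    """IV for cluster N, in the spirit of ESSIV: IV = E_s(N + gfid),
    where s = hash(K).  SHA-256 stands in for the encryption E_s here."""
    s = hashlib.sha256(key).digest()                 # s = hash(K)
    n = (cluster_index + gfid_int) % (1 << 128)      # (N + gfid)
    return hashlib.sha256(s + n.to_bytes(16, "big")).digest()[:16]

key = b"tenant key material"
gfid = uuid.uuid4().int          # GlusterFS gfids are UUIDs

# Each cluster of a file gets a distinct IV, and a different key
# (or different file) gives different IVs for the same cluster.
assert essiv_iv(key, gfid, 0) != essiv_iv(key, gfid, 1)
assert essiv_iv(key, gfid, 0) != essiv_iv(b"other key", gfid, 0)
```

One caveat with the additive form: cluster N of gfid g and cluster N-1 of gfid g+1 would share an IV, so concatenating N and the gfid instead of adding them might be preferable.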
Edward.
>
> = Conflict management
>
> For partial-block writes, the encryption module needs to do the
> following atomically.
>
> * Read the current block contents.
>
> * Decrypt.
>
> * Overlay the new partial block on the old whole block.
>
> * Encrypt.
>
> * Write the entire block.
>
> There's some additional complexity to do with EOF, but that's the basic
> idea. The current code eschews locks in favor of "optimistic"
> concurrency control in which a server-side "oplock" translator maintains
> a generation number for each inode. Clients can start a "transaction"
> before they read, associating the current inode generation with their
> connection. The next write on that connection will compare the stored
> generation number vs. the current one. If they're not the same, that
> means there was another write since the transaction started, and the
> write is rejected so the client can start over. Unfortunately, this
> does not account for "self conflicts" when one client sends multiple
> writes to the same file in parallel. The standard
> performance/write-behind translator does this constantly, which is why
> it has to be disabled when using cloudfs encryption, and there are many
> other ways for it to happen.
>
> My first inclination would be to add client code which detects and
> avoids such self-conflict, but I have a sneaking suspicion that will be
> pretty complex and have to be tweaked a lot to avoid compromising
> performance. I kind of suspect that server-side queuing might be the
> right answer here. If a transaction is begun which conflicts with
> another already in progress, then the new one is simply queued behind
> the old one and the transaction-begin call (actually a special setxattr)
> will be resumed when the old ones complete. This also addresses
> fairness/forward-progress issues inherent in both the locking and retry
> models, though we'll need to put some thought into recovery from faults.
> _______________________________________________
> cloudfs-devel mailing list
> cloudfs-devel(a)lists.fedorahosted.org
> https://fedorahosted.org/mailman/listinfo/cloudfs-devel
Authentication and transport encryption
by Jeff Darcy
The thing we need in CloudFS is authentication at the tenant level, so
that we can map a tenant's connection to the appropriate namespace and
UID/GID space. For all practical purposes, this is the same as
machine-level authentication. Trying to authenticate at a finer
granularity, e.g. for separate users on a single machine or set of
machines, is too cumbersome. Tenants will not accept having to register
each individual with a site-wide service when no such need exists on e.g.
Amazon or Rackspace, and integrating that service alongside whatever
auth* service they're already using themselves is likely to be painful.
We give each tenant a sandbox, and they can do whatever the heck they
want within it, just like they can with compute instances and so on already.
So, let's say that we have some way of providing strong authentication
between the login/cloud translators. At the end of the authentication
handshake we know, to a cryptographic certainty, that connection X
belongs to tenant Y. After that, though, all bets are off. Without
transport-level encryption, or at least per-message authentication via
HMAC or similar, we're vulnerable to session hijacking and other
sorts of tampering with subsequent requests. Here's where we run into a
problem: there's no reasonable way to make GlusterFS (knowingly) use a
connection with these sorts of encryption/authentication properties.
We'd have to change a lot of code in the existing 3500-line socket
transport module, and do a very careful sweep through the management
code to make sure that our new transport can be configured etc.
This pretty much means that we need to use some external means - VPN,
sshuttle, cloud-provided firewalls, manually-set-up ssh tunnels - to
ensure security at the transport level. If we do that, we really
shouldn't need our own separate authentication. We should be able to
re-use whatever authenticated identity was used to establish the
connection in the first place, but I don't happen to know of any
generically-good way to do that e.g. with a VPN. Also, such an approach
might run into the same "requiring integration with a site-wide service
would be bad" problem as with the accounts themselves. That leads us
back to requiring our own authentication even though we're relying on
something else at the transport level, and we have two choices:
* Implement our own authentication using filesystem calls as the
communications channel. This is what we currently do, using setxattr.
It could be modified so that clients do something like read a block of
data from a pseudo-file and then use a private key to generate an HMAC
which is actually used to log in. This could be provably as secure as
other methods, but would be non-standard.
* Use somebody else's authentication. For example, we could use the
OpenSSL suite to generate client certificates signed by the provider.
The server could then open a TLS port for the client to connect to,
verify the cert's validity (using standard mechanisms and code) when the
client does so, and then map the identity in the cert to an internal
tenant ID. This might be more appealing to the "not safe unless it's on
our list" types, but it's a bit unpleasant in other ways (e.g. opening a
second hole in a firewall for the very brief TLS connection).
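The first, setxattr-based option above could look something like the following sketch, using only a shared per-tenant secret (the pseudo-file and call names are placeholders, not an actual interface):

```python
import hashlib
import hmac
import os

# Server side: per-tenant secrets, distributed out of band.
tenant_keys = {"fred": b"fred-secret-key"}

def server_issue_challenge():
    # What the client would read from the pseudo-file.
    return os.urandom(32)

def client_login_token(secret, challenge):
    # Client proves knowledge of its key without ever sending it.
    return hmac.new(secret, challenge, hashlib.sha256).hexdigest()

def server_verify(tenant, challenge, token):
    key = tenant_keys.get(tenant)
    if key is None:
        return False
    expected = hmac.new(key, challenge, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)   # constant-time compare

challenge = server_issue_challenge()
token = client_login_token(b"fred-secret-key", challenge)
assert server_verify("fred", challenge, token)
assert not server_verify("fred", challenge, "0" * 64)
```

With a fresh random challenge per login, this is a standard challenge-response and resists replay, which is the sense in which it could be "provably as secure as other methods" despite being non-standard.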
I'm mostly leaning toward doing our own authentication and leaving the
transport-layer encryption as "somebody else's problem" (with
appropriate documentation) but I'm open to other suggestions. Any ideas?
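For completeness, the server-side mapping step of the TLS alternative is small; a sketch using the dict form that Python's ssl.SSLSocket.getpeercert() returns for a verified peer (the cert-to-tenant table itself is an assumption):

```python
def tenant_from_cert(peercert, cert_map):
    """Map the identity in a verified client cert to an internal tenant ID.
    peercert has the structure returned by ssl.SSLSocket.getpeercert()."""
    for rdn in peercert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return cert_map.get(value)
    return None        # no usable identity in the cert

cert = {"subject": ((("commonName", "fred.example.com"),),)}
assert tenant_from_cert(cert, {"fred.example.com": "tenant-42"}) == "tenant-42"
```

The hard parts (cert issuance, verification, and the extra firewall hole) all happen before this step; the mapping itself is trivial.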
Configuration/management improvements
by Jeff Darcy
As I've discussed with many people, the current methods of setting up
and maintaining CloudFS on top of GlusterFS are cumbersome and
error-prone. Mostly this is the result of trying too hard to re-use too
much of the GlusterFS infrastructure for tasks such as distributing
volfiles and starting servers. Here are some basic requirements for
what a "next gen" management interface for CloudFS should be able to do:
* "Import" a GlusterFS volume, creating an equivalent set of server and
(generic) client volfiles which include the CloudFS translators.
* Handle addition and removal of tenants by updating our own database
and generating tenant-specific client volfiles from the generic ones.
When we implement stronger authentication, this might also involve some
key distribution.
* Ensure that all CloudFS volfiles remain in sync across servers and
clients, including regeneration when either the CloudFS volfiles or the
underlying GlusterFS volfiles are changed.
* Start/stop server daemons using the CloudFS volfiles.
* Mount/unmount on clients using the CloudFS volfiles.
The way I propose to handle these requirements is to create "cloudfsd"
which largely corresponds to glusterd (the management daemon, not
glusterfsd which is the server daemon). This would be responsible for
distributing volfiles among servers and starting/stopping server daemons
using those volfiles. We should rely on the existing glusterd to deal
with cluster ("pool") membership issues, instead of inventing our own
infrastructure for that. Similarly, we should use glusterd's
port-mapping infrastructure instead of our own. When we start servers,
we point them to glusterd for registration. When we mount on clients,
they'll get the information they need from there.
Importing volumes, or adding/removing tenants, should still be done by
the cloudfs script/executable, which can poke the local cloudfsd as
appropriate to handle distribution etc. Note that we don't actually
need to deal with starting up new translators in live server processes
and so on initially. For now it's sufficient to deal with the
volfile-distribution issues, and possibly restart server daemons
entirely. In fact, we might want to stick with that approach for quite
a while. Actually inserting and removing translators in a running
server process is tricky and probably not well tested yet; starting a
new process to serve new tenants would work very nearly as well without
those problems. The options for the CloudFS CLI should be a proper
superset of those for the GlusterFS CLI. If we can parse the command
and recognize it as one of our own, then we can handle it internally.
Otherwise, we should pass the command verbatim to "glusterfs" and then
take any necessary actions (e.g. regenerating our volfiles) when that
returns.
The last part is client mounts. Currently, mount.glusterfs will contact
a server to fetch its volfile - the same for anyone - and then use that
to mount. We can sort of do that, but we have to deal with issues of
having volfiles be tenant-specific and carry authentication information
*which should never be on the server(s)*. One way to do this would be
to have mount.cloudfs fetch a generic CloudFS client volfile (still not
the same as the original GlusterFS client volfile) from the server, then
post-process locally to add tenant identity and credentials before
passing the result to glusterfs for actual mounting. After that,
mapping ports and making connections can be handled by the existing
GlusterFS methods.
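The post-processing step could be as small as the sketch below. The "cluster/login" translator type comes from the code layout described earlier; the tenant-id/tenant-password option names are assumptions:

```python
def add_tenant_creds(volfile_text, tenant, password):
    """Insert tenant credentials into each cluster/login section of a
    generic client volfile fetched from the server."""
    out = []
    for line in volfile_text.splitlines():
        out.append(line)
        if line.strip() == "type cluster/login":
            out.append("    option tenant-id %s" % tenant)
            out.append("    option tenant-password %s" % password)
    return "\n".join(out) + "\n"

generic = """volume fubar-login
    type cluster/login
    subvolumes fubar-client-0
end-volume
"""
out = add_tenant_creds(generic, "fred", "s3cret")
assert "option tenant-id fred" in out
```

mount.cloudfs would write the result to a temporary file and hand it to glusterfs for the actual mount, so the credentials never live on the server.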
Encryption
by Jeff Darcy
There are three basic issues that need to be addressed in the encryption
module: type of cipher used, initialization-vector handling, and
conflict management. Each is non-trivial, so I'll address them in turn.
= Cipher
The main factor affecting our choice of ciphers (or APIs to them) is
that we need to be able to deal efficiently with updates both in the
middle of the file and at the end. At EOF, the problem is that we need
a whole cipher-block in order to decrypt, but the file might actually
end at any byte boundary within that cipher-block. Therefore, we have
to deal with the "residue" somehow.
* Store the residue in an xattr.
* Store a whole cipher-block at the end, record the amount of padding in
an xattr.
* Use a stream cipher (or block cipher converted to a stream cipher).
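For comparison, the second option (pad the final cipher-block and record the padding amount) takes only a few lines; the xattr call itself is omitted here:

```python
BLOCK = 16   # cipher-block size, e.g. AES

def pad_final(data):
    """Pad the tail to a whole cipher-block.  Returns (padded, padding);
    the padding count is what would be stored in the xattr."""
    padding = (-len(data)) % BLOCK
    return data + b"\0" * padding, padding

def unpad_final(padded, padding):
    return padded[:len(padded) - padding] if padding else padded

tail = b"19 bytes of residue"            # file ends mid-block
padded, pad = pad_final(tail)
assert len(padded) % BLOCK == 0          # now decryptable as whole blocks
assert unpad_final(padded, pad) == tail
```

The cost is exactly the extra setxattr/getxattr round trips mentioned below.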
This problem is further compounded by the striping case, where EOF for a
stripe component (local file stored on one brick) might not be EOF for
the entire file (union of all stripe components).
Since the two xattr-based approaches both require extra calls, the
stream-cipher approach has been used, with the cipher resetting at block
(e.g. 4KB) boundaries to allow efficient middle-of-file updates. As it
turns out, pure stream ciphers are relatively uncommon. More often,
CFB/OFB/CTR methods are used to convert a block cipher into a stream
cipher. The OpenSSL documentation is *amazingly* bad, but it looks like
it should be pretty easy to use any of these techniques with AES as well
as with DES.
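A toy sketch of this CTR-style conversion with the per-4KB keystream reset, using a SHA-256-based keystream as a stand-in for a real block cipher (the actual module would use AES or DES via OpenSSL):

```python
import hashlib

BLOCK = 16       # stand-in cipher-block size
CLUSTER = 4096   # keystream restarts here, keeping mid-file updates cheap

def _ks_block(key, cluster, ctr):
    # Hypothetical stand-in for encrypting a counter with a block cipher.
    return hashlib.sha256(key + cluster.to_bytes(8, "big") +
                          ctr.to_bytes(8, "big")).digest()[:BLOCK]

def ctr_xor(key, offset, data):
    """Encrypt/decrypt data at absolute file offset.  CTR mode makes the
    block cipher act as a stream cipher, so any byte length works."""
    out = bytearray()
    for i, b in enumerate(data):
        pos = offset + i
        cluster, within = divmod(pos, CLUSTER)
        ctr, byte = divmod(within, BLOCK)
        out.append(b ^ _ks_block(key, cluster, ctr)[byte])
    return bytes(out)

key = b"k" * 32
plain = b"ends at an odd byte"           # 19 bytes, not a block multiple
ct = ctr_xor(key, 0, plain)
assert len(ct) == len(plain)             # no EOF residue problem
assert ctr_xor(key, 0, ct) == plain      # XOR twice restores plaintext
```

Because the keystream depends only on the absolute position, a 4KB cluster in the middle of the file can be re-encrypted without touching its neighbors, which is exactly the efficient-update property needed.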
= Initialization vector
Right now, the code uses a constant IV, which is totally unacceptable
from a security standpoint and was always meant to be changed before
release. The question is: what should we use for an IV? GlusterFS does
attach a supposedly unique "gfid" as an xattr on each file, so that
might be usable as a basis for the IV so long as we can verify that it's
universal and stable enough to be sure that data won't become
unrecoverable because a gfid is missing or changed.
= Conflict management
For partial-block writes, the encryption module needs to do the
following atomically.
* Read the current block contents.
* Decrypt.
* Overlay the new partial block on the old whole block.
* Encrypt.
* Write the entire block.
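The five steps can be sketched as follows, with a toy XOR cipher standing in for the real crypt translator and a dict standing in for the brick:

```python
import hashlib

BLOCK = 4096

class XorCrypt:
    """Toy per-block XOR cipher; a stand-in, not the real crypt code."""
    def __init__(self, key):
        self.key = key
    def _ks(self, block, n):
        out = b""
        i = 0
        while len(out) < n:
            out += hashlib.sha256(self.key + block.to_bytes(8, "big") +
                                  i.to_bytes(8, "big")).digest()
            i += 1
        return out[:n]
    def encrypt(self, block, data):
        return bytes(a ^ b for a, b in zip(data, self._ks(block, len(data))))
    decrypt = encrypt    # XOR is its own inverse

store = {}                                  # block number -> ciphertext
crypt = XorCrypt(b"key")
store[0] = crypt.encrypt(0, b"\0" * BLOCK)

def partial_write(block, off, data):
    # The five steps; the real module must perform them atomically.
    plain = bytearray(crypt.decrypt(block, store[block]))   # read + decrypt
    plain[off:off + len(data)] = data                       # overlay
    store[block] = crypt.encrypt(block, bytes(plain))       # encrypt + write

partial_write(0, 100, b"hello")
assert crypt.decrypt(0, store[0])[100:105] == b"hello"
assert len(store[0]) == BLOCK               # whole block rewritten
```

Without the atomicity, two concurrent partial writes to the same block would each overlay a stale decryption and one update would be lost, which is what the oplock scheme below exists to prevent.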
There's some additional complexity to do with EOF, but that's the basic
idea. The current code eschews locks in favor of "optimistic"
concurrency control in which a server-side "oplock" translator maintains
a generation number for each inode. Clients can start a "transaction"
before they read, associating the current inode generation with their
connection. The next write on that connection will compare the stored
generation number vs. the current one. If they're not the same, that
means there was another write since the transaction started, and the
write is rejected so the client can start over. Unfortunately, this
does not account for "self conflicts" when one client sends multiple
writes to the same file in parallel. The standard
performance/write-behind translator does this constantly, which is why
it has to be disabled when using cloudfs encryption, and there are many
other ways for it to happen.
My first inclination would be to add client code which detects and
avoids such self-conflict, but I have a sneaking suspicion that will be
pretty complex and have to be tweaked a lot to avoid compromising
performance. I kind of suspect that server-side queuing might be the
right answer here. If a transaction is begun which conflicts with
another already in progress, then the new one is simply queued behind
the old one and the transaction-begin call (actually a special setxattr)
will be resumed when the old ones complete. This also addresses
fairness/forward-progress issues inherent in both the locking and retry
models, though we'll need to put some thought into recovery from faults.
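The generation-number scheme plus server-side queuing can be sketched in a few lines; the class and method names here are hypothetical, not the oplock translator's actual interface:

```python
import collections

class OplockServer:
    """Per-inode generation numbers, with conflicting transaction-begins
    queued behind the one in progress instead of being rejected."""
    def __init__(self):
        self.generation = collections.defaultdict(int)
        self.active = {}      # inode -> generation held by current txn
        self.waiters = collections.defaultdict(collections.deque)

    def begin(self, inode, resume):
        if inode in self.active:
            # Conflict: queue the begin call; it will be resumed when the
            # current transaction completes (fairness for free).
            self.waiters[inode].append(resume)
            return None
        self.active[inode] = self.generation[inode]
        return self.generation[inode]

    def write(self, inode, gen):
        if gen != self.generation[inode]:
            return False                      # someone wrote since begin()
        self.generation[inode] += 1
        del self.active[inode]
        if self.waiters[inode]:
            self.waiters[inode].popleft()()   # resume next queued txn
        return True

log = []
srv = OplockServer()
g = srv.begin("inode1", None)
assert srv.begin("inode1", lambda: log.append("resumed")) is None  # queued
assert srv.write("inode1", g)          # commit wakes the waiter
assert log == ["resumed"]
```

Fault recovery (a client that begins a transaction and then dies without writing) would need a timeout or connection-teardown hook to release the queue, which is the open question noted above.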
4 commits - fedora-ize pkg/cloudfs.spec.in pkg/configure.ac README.md scripts/cloudfs
by Jeff Darcy
README.md | 171 +++++++++++++++++++++++++++-------------------------
fedora-ize | 4 -
pkg/cloudfs.spec.in | 2
pkg/configure.ac | 2
scripts/cloudfs | 16 +++-
5 files changed, 104 insertions(+), 91 deletions(-)
New commits:
commit a0893f212e0b68a1c49a18c9677cbc1370d31070
Author: Jeff Darcy <jdarcy(a)redhat.com>
Date: Tue Feb 22 12:21:19 2011 -0500
More doc updates.
diff --git a/README.md b/README.md
index c570777..2e12ed7 100644
--- a/README.md
+++ b/README.md
@@ -59,8 +59,8 @@ If that disappears, the -2 RPMs for el6 (which should be equivalent) are at:
http://jdarcy.fedorapeople.org/el6_rpms/
-To build CloudFS, you need to install the -devel RPMs. Once you've done that,
-you can go into your git tree and do the following:
+To build CloudFS, you need to install the glusterfs-devel RPMs. Once you've
+done that, you can go into your cloudfs git tree[3] and do the following:
./fedora-ize
rsync -aptv SOURCES/ ~/rpmbuild/SOURCES/
@@ -75,8 +75,10 @@ quick. Install the resulting RPM and you're ready for configuration.
## Configuration ##
-You use the "cloudfs" script to set up CloudFS-specific features. There are
-two main cloudfs commands.
+CloudFS operates on volumes that are modified from those created by GlusterFS,
+as described in the GlusterFS documentation[4]. Once you have created the
+volume in GlusterFS, do *not* start it. Use the "cloudfs" script to set up
+CloudFS-specific features. There are two main cloudfs commands.
* cloudfs init VOLUME USERS
This initalizes the "multi-tenant" (namespace isolation) features of CloudFS,
@@ -94,7 +96,9 @@ two main cloudfs commands.
These commands must be run *on every server*, and re-run any time you use the
"gluster volume set" command to change volume parameters (which will re-write
-the originals that CloudFS has copied and modified).
+the originals that CloudFS has copied and modified). Also, these changes will
+- unlike changes made with the "gluster" command - not take effect until the
+next time the volume is started.
In addition to rewriting volfiles, you must create subdirectories - again on
each server - for each user plus one for the "junk" pseudo-user. Thus, if you
@@ -102,6 +106,13 @@ have a brick belonging to a volume at server1:/exports/glu and you add a user
"fred" you will need to create /exports/glu/fred on server1 yourself . . . and
likewise for every other brick in the volume.
+Finally, you're ready to mount. Since the GlusterFS volfile-fetching
+infrastructure can't handle per-tenant volfiles, you'll have to do this the
+"old fashioned" (i.e. pre-3.1) way, by specifying the actual file instead
+of a server.
+
+ glusterfs --volfile my.vol.file /my/mount/point
+
NOTE: this process is recognized to be cumbersome, and will become less so
shortly. In the next version, there will still be a "cloudfs" command/script
to provide perform actions across the entire storage pool from a single
@@ -139,5 +150,6 @@ the work-list items below as well.
[2] https://fedoraproject.org/wiki/Features/CloudFS
+[3] http://git.fedorahosted.org/git/?p=CloudFS.git
-
+[4] http://gluster.com/community/documentation/index.php/Gluster_3.1_Filesyst...
commit 527b046b6bd6a22769bfedcba8c8dff89fd18f6b
Author: Jeff Darcy <jdarcy(a)redhat.com>
Date: Mon Feb 21 17:00:02 2011 -0500
New doc for building and configuration.
Also changed Markdown header styles.
diff --git a/README.md b/README.md
index 6af15bd..c570777 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,8 @@
-CloudFS
-=======
+# CloudFS #
-Introduction
-------------
+## Introduction ##
-CloudFS is a set of enhancements to GlusterFS[1], allowing a cloud provider to
+CloudFS is a set of enhancements to GlusterFS[1] allowing a cloud provider to
set up a permanent, shared filesystem for their users. This mostly involves
protecting users from each other in various ways, but includes other features
as well:
@@ -27,11 +25,10 @@ as well:
In the future, CloudFS will also include an improved distribution ("DHT")
translator, and multi-site replication. These features are not part of the
-current release. The first-release functionality is more fully described
-in the Fedora 15 feature page[2].
+current release. The first-release functionality is more fully described in
+the Fedora 15 feature page[2].
-Code Structure
---------------
+## Code Structure ##
Most GlusterFS functionality is contained in "translators" which translate a
higher-level operation (e.g. a write) into one or more lower-level operations
@@ -50,80 +47,80 @@ implementation):
* auth (client, TBD): auxiliary/helper code for authentication
-Building
---------
+## Building ##
-To avoid distributing an entire GlusterFS tree without permission, CloudFS is
-currently distributed as a set of overlays (for new files) and patches (for
-existing files) to the official GlusterFS tree. To create a complete CloudFS
-tree, follow these steps:
+CloudFS depends on a specific version of GlusterFS, which is currently only
+packaged for Fedora 15. This version has (as of 2011/02/21) not hit the yum
+repositories yet, but the latest build can be downloaded from:
- git clone git://git.gluster.com/glusterfs.git cloudfs
- cd cloudfs
- rsync -apt $CLOUDFS_DIR/xlators/ xlators/
- for i in $CLOUDFS_DIR/patches/*; do patch -p1 < $i; done
+ http://koji.fedoraproject.org/koji/buildinfo?buildID=223571
-At this point you can follow the usual GlusterFS build process. If you're
-familiar with building RPMs, you can do something like this:
+If that disappears, the -2 RPMs for el6 (which should be equivalent) are at:
- ./autogen.sh
- ./configure --enable-fusermount
- make dist-gzip
- cp glusterfs-3.1.0git.tar.gz ~/rpmbuild/SOURCES
- cp glusterfs.spec ~/rpmbuild/SPECS
- cd ~/rpmbuild
- rpmbuild -bb SPECS/glusterfs.spec
-
-Work is currently under way to allow building translators "out of tree" much
-as can be done for kernel modules. Once that work is complete, a simpler
-CloudFS build procedure will be implemented, and a separate RPM specfile will
-be created embodying that process.
-
-Configuration
--------------
-
-The work to integrate configuration of the new translators with the current
-gluster CLI is, unfortunately, still TBD. The only method available to
-configure and use the new translators is to edit the "volfiles" by hand, as
-you would have done in 3.0 but with an extra twist. If you have created a
-volume named "fubar" then your volfiles will be in /etc/glusterd/vols/fubar on
-the servers. There will be one fubar-fuse.vol for the clients, and one
-fubar.${HOST}.${PATH}.vol for each "brick" making up the filesystem. To make
-a change globally, you'll need to do the following:
-
-1. Edit one of the brick volfiles, e.g. on host "gnarly"
-
-2. Propagate the changes to the other volfiles on the same host. If you only
- have one brick per server, and all bricks use the same path, you can simply
- copy the edited volfile.
-
-3. On every *other* server, do "volume sync gnarly all" to fetch the edited
- volfiles.
-
-See the CONFIG.txt in each translator's directory for instructions specific
-to that translator.
+ http://jdarcy.fedorapeople.org/el6_rpms/
-The .../scripts directory contains some Python scripts that can help automate
-the process of modifying volfiles. Specifically:
+To build CloudFS, you need to install the -devel RPMs. Once you've done that,
+you can go into your git tree and do the following:
-* filt-log-io.py: inserts a debug/log-io translator between a protocol/server
- volume and each of its subvolumes
-
-* filt-crypto.py: inserts an encryption/crypto translator on top of each
- protocol/client volume. Also disables performance/quick-read, which is
- incompatible with encryption/crypto.
-
-* filt-cloud.py: replaces a simple translator "stack" (from storage/posix up
- to whatever is below protocol/server) with one such stack per named tenant,
- plus a cluster/cloud translator to tie them together.
-
-Running these scripts after each gluster "volume create" or "volume set"
-command should generate a new volfile with the desired enhancements. They use
-a common volfile parsing/modification library that might be useful for other
-tasks as well.
-
-Work List
----------
+ ./fedora-ize
+ rsync -aptv SOURCES/ ~/rpmbuild/SOURCES/
+ rsync -aptv SPECS/ ~/rpmbuild/SPECS/
+ cd ~/rpmbuild
+ rpmbuild -bb SPECS/cloudfs.spec
+
+The rest is standard rpmbuild stuff. For debugging, you'll probably want to
+prepend CFLAGS=-g to your rpmbuild command line. Since this process only
+builds the CloudFS-specific translators and not all of GlusterFS, it's pretty
+quick. Install the resulting RPM and you're ready for configuration.
+
+## Configuration ##
+
+You use the "cloudfs" script to set up CloudFS-specific features. There are
+two main cloudfs commands.
+
+* cloudfs init VOLUME USERS
+ This initalizes the "multi-tenant" (namespace isolation) features of CloudFS,
+ by rewriting the server volfiles to include the "cloud" translator and
+ generating per-user client volfiles which include the "login" translator.
+ The USERS file is simply a list of name/password pairs, one pair per line,
+ separated by spaces. There is no provision currently for extra indentation,
+ comments, etc. Note that the per-user client volfiles are placed in
+ /var/lib/glusterd/vols/VOLUME/VOLUME-fuse.vol.USER and you will need to get
+ them to the client(s) yourself.
+
+* cloudfs initc VOLFILE KEY
+ This initializes the encryption feature of CloudFS, by rewriting a client
+ volfile (not a volume name) to include the "crypt" translator.
+
+These commands must be run *on every server*, and re-run any time you use the
+"gluster volume set" command to change volume parameters (which will re-write
+the originals that CloudFS has copied and modified).
+
+In addition to rewriting volfiles, you must create subdirectories - again on
+each server - for each user plus one for the "junk" pseudo-user. Thus, if you
+have a brick belonging to a volume at server1:/exports/glu and you add a user
+"fred" you will need to create /exports/glu/fred on server1 yourself . . . and
+likewise for every other brick in the volume.
+
+NOTE: this process is recognized to be cumbersome, and will become less so
+shortly. In the next version, there will still be a "cloudfs" command/script
+to provide perform actions across the entire storage pool from a single
+command line, but it will be improved in the following ways:
+
+* Instead of specifying users via a file, the list of users and corresponding
+ passwords (or other credentials - see work list) will be maintained by
+ cloudfs itself. The interface will include add-user and del-user commands,
+ which can even be issued dynamically.
+
+* Adding (or removing) users will automatically create (or delete) the
+ per-brick subdirectories. The "junk" pseudo-user will go away.
+
+As a side effect of the way these features are implemented, there will be
+separate cloudfsd and mount.cloudfs commands corresponding to glusterd and
+mount.glusterfs respectively. There will be other changes corresponding to
+the work-list items below as well.
+
+## Work List ##
* crypt translator: stronger encryption, keys in files
@@ -131,17 +128,15 @@ Work List
* auth translator: create
-* build system: out-of-tree build process, specfile
-
* config: CLI integration, other tools for UID/GID mapping, billing, cert/key
management
* doc: pull together per-translator options, CLI extensions, other tools
-Notes
------
+## Notes ##
[1] http://www.gluster.org or http://www.gluster.com
+
[2] https://fedoraproject.org/wiki/Features/CloudFS
commit 3c13ff84acc4f2ba7621b1ed545051ca4307940d
Author: Jeff Darcy <jdarcy(a)redhat.com>
Date: Thu Feb 3 21:58:46 2011 -0500
Even more packaging changes.
diff --git a/fedora-ize b/fedora-ize
index a5ac4b9..682812d 100755
--- a/fedora-ize
+++ b/fedora-ize
@@ -20,8 +20,8 @@ cp -r xlators/encryption/crypt/src $work/crypt/
cp -r xlators/features/oplock/src $work/oplock/
cp -r xlators/cluster/login/src $work/login/
cp pkg/* $work/
-cp scripts/cloudfs $work/
cp scripts/volfilter.py $work/
+cp scripts/cloudfs $work/
# Configure just enough to get a decent specfile.
cd $work
@@ -32,8 +32,6 @@ cd -
# Create and populate the SOURCES directory.
mkdir -p SOURCES
(cd $mytmp; tar cvfz ../SOURCES/cloudfs-0.5.tgz cloudfs-0.5)
-cp scripts/volfilter.py SOURCES/
-cp scripts/cloudfs SOURCES/
# Create and populate the SPECS directory.
mkdir -p SPECS
diff --git a/pkg/cloudfs.spec.in b/pkg/cloudfs.spec.in
index 30c8a3f..1397438 100644
--- a/pkg/cloudfs.spec.in
+++ b/pkg/cloudfs.spec.in
@@ -17,7 +17,7 @@ URL: http://cloudfs.org
Source0: http://cloudfs.org/dist/0.5/cloudfs-0.5.tgz
BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root
-Requires: glusterfs >= 3.1.1
+Requires: glusterfs = 3.1.2
Requires: openssl
Requires: python
BuildRequires: glusterfs-devel >= 3.1.1
diff --git a/pkg/configure.ac b/pkg/configure.ac
index bc64915..8d3e2c0 100644
--- a/pkg/configure.ac
+++ b/pkg/configure.ac
@@ -65,7 +65,7 @@ if test "x${have_spinlock}" = "xyes"; then
AC_DEFINE(HAVE_SPINLOCK, 1, [define if found spinlock])
fi
-GLUSTER_VERSION=3.1.1
+GLUSTER_VERSION=3.1.2
GF_HOST_OS=""
GF_LDFLAGS="-rdynamic"
GF_HOST_OS="GF_LINUX_HOST_OS"
commit c03e1fec8bb8975b31d620cf1798df86d0ce5108
Author: Jeff Darcy <jdarcy(a)redhat.com>
Date: Thu Feb 3 21:58:28 2011 -0500
Be even more conservative about performance translators.
diff --git a/scripts/cloudfs b/scripts/cloudfs
index 4b6615c..ed521f1 100755
--- a/scripts/cloudfs
+++ b/scripts/cloudfs
@@ -50,6 +50,14 @@ glusterd_dirs = [
"/etc/glusterd" # Gluster
]
+# These are incompatible with crypt in various ways.
+bad_translators = [
+ "performance/quick-read",
+ "performance/read-ahead",
+ "performance/write-behind",
+ "performance/io-cache"
+]
+
def copy_stack (old_xl,suffix,recursive=False):
if recursive:
new_name = old_xl.name + "-" + suffix
@@ -179,17 +187,17 @@ def do_init_crypt ():
graph, last = volfilter.load(vfname+".save")
opts = { "key": sys.argv[3] }
to_do = [xl for xl in graph.itervalues()
- if xl.type == "performance/quick-read"]
+ if xl.type in bad_translators]
for td in to_do:
volfilter.delete(graph,td)
to_do = [xl for xl in graph.itervalues()
- if xl.type == "performance/write-behind"]
+ if xl.type == "cluster/dht"]
if to_do:
+ # Nice to push as close to dht as we can.
for td in to_do:
- # Nice to push below io-stats etc. if we can
volfilter.push_filter(graph,td,"encryption/crypt",opts)
else:
- # Might as well push it on top.
+ # Push on top if all else fails.
volfilter.push_filter(graph,last,"encryption/crypt",opts)
volfilter.generate(graph,last,file(vfname,"w"))
5 commits - fedora-ize pkg/cloudfs.spec.in pkg/configure.ac pkg/Makefile.am scripts/cloudfs scripts/volfilter.py xlators/cluster xlators/encryption xlators/features
by Jeff Darcy
fedora-ize | 2
pkg/Makefile.am | 2
pkg/cloudfs.spec.in | 21 +----
pkg/configure.ac | 2
scripts/cloudfs | 13 ++-
scripts/volfilter.py | 9 +-
xlators/cluster/cloud/src/cloud.c | 6 +
xlators/encryption/crypt/src/crypt.c | 136 ++++++++++++++++++++++++-----------
xlators/encryption/crypt/src/crypt.h | 1
xlators/features/oplock/src/oplock.c | 2
10 files changed, 132 insertions(+), 62 deletions(-)
New commits:
commit a2786b98842257a29b197e5d416713e01b7c4000
Author: Jeff Darcy <jdarcy@redhat.com>
Date: Thu Feb 3 17:27:06 2011 -0500
Packaging changes (mostly from rpmlint).
diff --git a/fedora-ize b/fedora-ize
index c5bca93..a5ac4b9 100755
--- a/fedora-ize
+++ b/fedora-ize
@@ -20,6 +20,8 @@ cp -r xlators/encryption/crypt/src $work/crypt/
cp -r xlators/features/oplock/src $work/oplock/
cp -r xlators/cluster/login/src $work/login/
cp pkg/* $work/
+cp scripts/cloudfs $work/
+cp scripts/volfilter.py $work/
# Configure just enough to get a decent specfile.
cd $work
diff --git a/pkg/Makefile.am b/pkg/Makefile.am
index 0c83e6c..8fe762f 100644
--- a/pkg/Makefile.am
+++ b/pkg/Makefile.am
@@ -1,3 +1,5 @@
+bin_SCRIPTS = cloudfs
+python_PYTHON = volfilter.py
EXTRA_DIST = autogen.sh COPYING cloudfs.spec
SUBDIRS = cloud crypt oplock login
diff --git a/pkg/cloudfs.spec.in b/pkg/cloudfs.spec.in
index 3e0a705..30c8a3f 100644
--- a/pkg/cloudfs.spec.in
+++ b/pkg/cloudfs.spec.in
@@ -9,14 +9,12 @@ Summary: Cloud File System
Name: @PACKAGE_NAME@
Version: @PACKAGE_VERSION@
Release: %{release}
-License: AGPLv3+
+License: AGPLv3
+Group: Applications/File
Group: System Environment/Base
Vendor: Red Hat
-Packager: @PACKAGE_BUGREPORT@
URL: http://cloudfs.org
-Source0: cloudfs-0.5.tgz
-Source1: volfilter.py
-Source2: cloudfs
+Source0: http://cloudfs.org/dist/0.5/cloudfs-0.5.tgz
BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root
Requires: glusterfs >= 3.1.1
@@ -31,10 +29,6 @@ BuildRequires: openssl-devel
CloudFS is a cloud-capable filesystem based on GlusterFS (http://gluster.org)
with additional authentication/encryption/multi-tenancy features.
-Summary: CloudFS
-Group: Applications/File
-Provides: cloudfs = %{version}-%{release}
-
%prep
%setup -q -n %{name}-%{version}
@@ -48,14 +42,11 @@ Provides: cloudfs = %{version}-%{release}
%install
%{__rm} -rf %{buildroot}
%{__make} install DESTDIR=%{buildroot}
-%{__install} -D -p -m 0644 %{SOURCE1} \
- %{buildroot}%{python_sitelib}/volfilter.py
-%{__install} -D -p -m 0755 %{SOURCE2} \
- %{buildroot}%{_bindir}/cloudfs
# Remove unwanted files from all the shared libraries
find %{buildroot}%{_libdir} -name '*.a' -delete
find %{buildroot}%{_libdir} -name '*.la' -delete
+find %{buildroot}%{_libdir} -name '*.so.0.0.0' | xargs strip
%clean
%{__rm} -rf %{buildroot}
@@ -66,10 +57,10 @@ find %{buildroot}%{_libdir} -name '*.la' -delete
%{_libdir}/glusterfs/@GLUSTER_VERSION@/xlator/cluster/*.so*
%{_libdir}/glusterfs/@GLUSTER_VERSION@/xlator/encryption/*.so*
%{_libdir}/glusterfs/@GLUSTER_VERSION@/xlator/features/*.so*
-%{python_sitelib}/volfilter.py
+%{python_sitelib}/volfilter.py*
%{_bindir}/cloudfs
%changelog
-* Fri Jan 21 2011 Jeff Darcy <jdarcy@redhat.com> - 0.5
+* Fri Jan 21 2011 Jeff Darcy <jdarcy@redhat.com> - 0.5-1
- Original version based on GlusterFS 3.1.1
diff --git a/pkg/configure.ac b/pkg/configure.ac
index d53fafa..bc64915 100644
--- a/pkg/configure.ac
+++ b/pkg/configure.ac
@@ -20,6 +20,8 @@ AM_INIT_AUTOMAKE
AM_CONFIG_HEADER([config.h])
+AM_PATH_PYTHON([])
+
AC_CONFIG_FILES([Makefile
cloud/Makefile
crypt/Makefile
commit 84de7f10d03e37f33b15579e6d42d6aedc571a92
Author: Jeff Darcy <jdarcy@redhat.com>
Date: Thu Jan 27 20:33:49 2011 -0500
Allow statfs even when unbound.
Without this, DHT gets errors back as it's trying to get disk-full
information so that it can avoid over-filling a subvolume. This seems
more important than the minor amount of information leakage from the
pre-binding dummy subvolume (which will in almost all cases be the same
actual local filesystem as the real per-tenant subvolumes).
diff --git a/xlators/cluster/cloud/src/cloud.c b/xlators/cluster/cloud/src/cloud.c
index 5cdbd3f..0bddb3a 100644
--- a/xlators/cluster/cloud/src/cloud.c
+++ b/xlators/cluster/cloud/src/cloud.c
@@ -192,12 +192,14 @@ cloud_stat (call_frame_t *frame, xlator_t *this, loc_t *loc)
return 0;
}
+#if 0
int32_t
cloud_statfs (call_frame_t *frame, xlator_t *this, loc_t *loc)
{
STACK_UNWIND_STRICT (statfs, frame, -1, EPERM, NULL);
return 0;
}
+#endif
int
cloud_symlink (call_frame_t *frame, xlator_t *this, const char *linkpath,
@@ -495,7 +497,11 @@ struct xlator_fops fops = {
.rmdir = cloud_rmdir,
.setattr = cloud_setattr,
.stat = cloud_stat,
+ /*
+ * Allow statfs, because it's fairly harmless and blocking it confuses
+ * DHT's capacity-balancing code.
.statfs = cloud_statfs,
+ */
.symlink = cloud_symlink,
.truncate = cloud_truncate,
.unlink = cloud_unlink,
commit a0f1109e5bc5bf5e077925ef07d8d34c32d0f733
Author: Jeff Darcy <jdarcy@redhat.com>
Date: Thu Jan 27 20:31:22 2011 -0500
Insert crypt above write-behind to avoid excessive conflict retries.
While conflicts are handled now, write-behind can cause an awful lot
of "self conflicts" (where one of two simultaneous writes breaks the
other's oplock). This brings the conflict/retry count down to a more
reasonable level.
diff --git a/scripts/cloudfs b/scripts/cloudfs
index d8abe72..4b6615c 100755
--- a/scripts/cloudfs
+++ b/scripts/cloudfs
@@ -92,6 +92,7 @@ def cloudify_server (volfile, users):
for user, pw in users:
new_stack = copy_stack(last.subvols[0],user)
volfilter.push_filter(graph,new_stack,"features/oplock")
+ new_stack.name = user
subvols.append(new_stack)
# One cloud to bring them all...
@@ -182,9 +183,14 @@ def do_init_crypt ():
for td in to_do:
volfilter.delete(graph,td)
to_do = [xl for xl in graph.itervalues()
- if xl.type == "protocol/client"]
- for td in to_do:
- volfilter.push_filter(graph,td,"encryption/crypt",opts)
+ if xl.type == "performance/write-behind"]
+ if to_do:
+ for td in to_do:
+ # Nice to push below io-stats etc. if we can
+ volfilter.push_filter(graph,td,"encryption/crypt",opts)
+ else:
+ # Might as well push it on top.
+ volfilter.push_filter(graph,last,"encryption/crypt",opts)
volfilter.generate(graph,last,file(vfname,"w"))
if len(sys.argv) < 3:
commit f9a4fd656c857406ae0db0ba541c20454f393211
Author: Jeff Darcy <jdarcy@redhat.com>
Date: Thu Jan 27 17:33:26 2011 -0500
Implement retries when conflicts are detected.
diff --git a/xlators/encryption/crypt/src/crypt.c b/xlators/encryption/crypt/src/crypt.c
index 6110816..d2d3be1 100644
--- a/xlators/encryption/crypt/src/crypt.c
+++ b/xlators/encryption/crypt/src/crypt.c
@@ -31,6 +31,13 @@
#include "crypt.h"
+/* Forward decls so crypt_launch can retry. */
+int32_t
+crypt_rmw_done (call_frame_t *frame, xlator_t *this);
+int32_t
+crypt_lock_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
+ int32_t op_ret, int32_t op_errno);
+
/*
* The "lame" stuff is just for testing (makes it easier to verify correctness
* of contents) and must be enabled by hand.
@@ -86,7 +93,7 @@ lame_decrypt_iovec (crypt_private_t *priv, struct iovec *vector, int count)
*/
void
-good_encrypt (crypt_private_t *priv, char * buf, int len)
+good_crypt_buf (crypt_private_t *priv, char * buf, int len, int dir)
{
DES_cblock ivec;
int num;
@@ -94,8 +101,9 @@ good_encrypt (crypt_private_t *priv, char * buf, int len)
while (len >= priv->block_size) {
memset(&ivec,0,sizeof(ivec));
num = 0;
- DES_cfb64_encrypt((const unsigned char *)buf,(unsigned char *)buf,
- priv->block_size,&priv->sched,&ivec,&num,1);
+ DES_cfb64_encrypt((const unsigned char *)buf,
+ (unsigned char *)buf, priv->block_size,
+ &priv->sched,&ivec,&num,dir);
buf += priv->block_size;
len -= priv->block_size;
}
@@ -103,18 +111,25 @@ good_encrypt (crypt_private_t *priv, char * buf, int len)
if (len > 0) {
memset(&ivec,0,sizeof(ivec));
num = 0;
- DES_cfb64_encrypt((const unsigned char *)buf,(unsigned char *)buf,
- len,&priv->sched, &ivec,&num,1);
+ DES_cfb64_encrypt((const unsigned char *)buf,
+ (unsigned char *)buf,len,&priv->sched,&ivec,&num,dir);
}
}
-/*
- * We don't need good_decrypt because good_decrypt_iovec just does the
- * looping internally.
- */
+void
+good_encrypt (crypt_private_t *priv, char * buf, int len)
+{
+ good_crypt_buf(priv,buf,len,1);
+}
void
-good_decrypt_iovec (crypt_private_t *priv, struct iovec *vector, int count)
+good_decrypt (crypt_private_t *priv, char * buf, int len)
+{
+ good_crypt_buf(priv,buf,len,0);
+}
+
+void
+good_crypt_iov (crypt_private_t *priv, struct iovec *vector, int count, int dir)
{
DES_cblock ivec;
int num;
@@ -137,7 +152,7 @@ good_decrypt_iovec (crypt_private_t *priv, struct iovec *vector, int count)
}
DES_cfb64_encrypt((const unsigned char *)buf,
(unsigned char *)buf,
- bytes,&priv->sched,&ivec,&num,0);
+ bytes,&priv->sched,&ivec,&num,dir);
buf += bytes;
b_resid = (b_resid + bytes) % priv->block_size;
if (!b_resid) {
@@ -147,6 +162,18 @@ good_decrypt_iovec (crypt_private_t *priv, struct iovec *vector, int count)
}
}
+void
+good_encrypt_iovec (crypt_private_t *priv, struct iovec *vector, int count)
+{
+ good_crypt_iov(priv,vector,count,1);
+}
+
+void
+good_decrypt_iovec (crypt_private_t *priv, struct iovec *vector, int count)
+{
+ good_crypt_iov(priv,vector,count,0);
+}
+
int32_t
crypt_readv_cbk (call_frame_t *frame,
void *cookie,
@@ -344,6 +371,48 @@ err:
}
int32_t
+crypt_launch (call_frame_t *frame, xlator_t *this)
+{
+ crypt_wlocal_t *local = frame->local;
+ crypt_private_t *priv = this->private;
+ int32_t op_errno = ENOMEM;
+
+ local->call_count = (local->head_resid != 0) + (local->tail_resid != 0);
+
+ /* Check for the only case which doesn't require a lock. */
+ if (!local->call_count) {
+ /* Head and tail were encrypted on the previous pass. */
+ good_decrypt(priv,local->head_data,priv->block_size);
+ good_encrypt(priv,local->head_data,priv->block_size);
+ return crypt_rmw_done(frame,this);
+ }
+
+ local->xattr = get_new_dict();
+ if (!local->xattr) {
+ op_errno = ENOMEM;
+ goto err;
+ }
+
+ if (dict_set_str(local->xattr,"trusted.glusterfs.lock","fubar") != 0) {
+ op_errno = EIO;
+ dict_unref(local->xattr);
+ goto err;
+ }
+
+ local->op_ret = 0;
+
+ STACK_WIND (frame, crypt_lock_cbk, FIRST_CHILD(this),
+ FIRST_CHILD(this)->fops->fsetxattr,
+ local->fd, local->xattr, 0);
+
+ return 0;
+
+err:
+ STACK_UNWIND_STRICT(writev,frame,-1,op_errno,NULL,NULL);
+ return 0;
+}
+
+int32_t
crypt_writev_cbk (call_frame_t *frame,
void *cookie,
xlator_t *this,
@@ -357,9 +426,15 @@ crypt_writev_cbk (call_frame_t *frame,
/*
* This is where we might get an error indicating that somebody else
* wrote to the file between our read and write. In that case, we
- * would simply re-start at the lock call.
+ * simply re-start at the lock call.
*/
+ if ((op_ret < 0) && (op_errno == EBUSY)) {
+ gf_log(this->name,GF_LOG_WARNING,"retrying conflicted write");
+ local->is_retry = _gf_true;
+ return crypt_launch(frame,this);
+ }
+
if (op_ret > local->orig_size) {
op_ret = local->orig_size;
}
@@ -425,10 +500,12 @@ crypt_rmw_done (call_frame_t *frame, xlator_t *this)
b_offset = 0;
to_go = local->vector[v_index].iov_len - v_offset;
while (to_go >= priv->block_size) {
- good_encrypt(priv,
- (char *)(local->vector[v_index].iov_base)
- + v_offset + b_offset,
- priv->block_size);
+ if (!local->is_retry) {
+ good_encrypt(priv,
+ (char *)(local->vector[v_index].iov_base)
+ + v_offset + b_offset,
+ priv->block_size);
+ }
b_offset += priv->block_size;
to_go -= priv->block_size;
}
@@ -480,6 +557,7 @@ crypt_rmw_done (call_frame_t *frame, xlator_t *this)
goto err;
}
}
+
/*
* We always start our writes at block boundaries, so we subtract the
* "residue" before passing the offset to the next translator. Since
@@ -671,31 +749,9 @@ crypt_writev (call_frame_t *frame,
}
local->head_resid = head_resid;
local->tail_resid = tail_resid;
- local->call_count = (head_resid != 0) + (tail_resid != 0);
- /* Check for the only case which doesn't require a lock. */
- if (!local->call_count) {
- return crypt_rmw_done(frame,this);
- }
-
- local->xattr = get_new_dict();
- if (!local->xattr) {
- op_errno = ENOMEM;
- goto err;
- }
-
- if (dict_set_str(local->xattr,"trusted.glusterfs.lock","fubar") != 0) {
- op_errno = EIO;
- dict_unref(local->xattr);
- goto err;
- }
-
- local->op_ret = 0;
-
- STACK_WIND (frame, crypt_lock_cbk, FIRST_CHILD(this),
- FIRST_CHILD(this)->fops->fsetxattr, fd, local->xattr, 0);
-
- return 0;
+ local->is_retry = _gf_false;
+ return crypt_launch(frame,this);
err:
STACK_UNWIND_STRICT(writev,frame,-1,op_errno,NULL,NULL);
diff --git a/xlators/encryption/crypt/src/crypt.h b/xlators/encryption/crypt/src/crypt.h
index 0c6548c..fde242b 100644
--- a/xlators/encryption/crypt/src/crypt.h
+++ b/xlators/encryption/crypt/src/crypt.h
@@ -61,6 +61,7 @@ typedef struct {
size_t head_resid;
size_t tail_resid;
dict_t *xattr;
+ gf_boolean_t is_retry;
} crypt_wlocal_t;
#endif /* __CRYPT_H__ */
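The `is_retry` flag added to `crypt_wlocal_t` above supports the optimistic-concurrency loop factored into `crypt_launch`: encrypt and write, and if the write comes back EBUSY because another writer broke the oplock, mark the operation as a retry and restart from the lock call (so already-ciphered whole blocks are not encrypted again). A toy Python sketch of that retry loop, with a hypothetical `write_attempt` callable standing in for the STACK_WIND/STACK_UNWIND round trip:

```python
# Toy sketch of the EBUSY retry loop in crypt_writev_cbk/crypt_launch.
# write_attempt(is_retry) is a hypothetical callable returning
# (op_ret, op_errno); it stands in for the wind/unwind machinery.

EBUSY = 16

def write_with_retry(write_attempt, max_retries=5):
    is_retry = False
    for _ in range(max_retries + 1):
        op_ret, op_errno = write_attempt(is_retry)
        if op_ret >= 0:
            return op_ret
        if op_errno != EBUSY:
            raise OSError(op_errno, "write failed")
        # A conflicting writer broke our oplock: restart at the lock
        # call, but remember the data is already encrypted.
        is_retry = True
    raise OSError(EBUSY, "too many conflicting writers")
```

The bounded retry count is an assumption of this sketch; the actual translator simply re-winds on each conflict.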
diff --git a/xlators/features/oplock/src/oplock.c b/xlators/features/oplock/src/oplock.c
index 940398e..402e6ea 100644
--- a/xlators/features/oplock/src/oplock.c
+++ b/xlators/features/oplock/src/oplock.c
@@ -148,6 +148,7 @@ oplock_writev (call_frame_t *frame, xlator_t *this, fd_t *fd, struct iovec *vect
entry = fetch_op_lock(&priv->locks,inode,state->conn);
if (entry) {
+ LIST_REMOVE(entry,links);
if (entry->value != inode->gen) {
gf_log(this->name,GF_LOG_DEBUG,
"would reject write for %d from %p",
@@ -155,7 +156,6 @@ oplock_writev (call_frame_t *frame, xlator_t *this, fd_t *fd, struct iovec *vect
op_errno = EBUSY;
goto err;
}
- LIST_REMOVE(entry,links);
FREE(entry);
}
++(inode->gen);
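The oplock hunk above moves `LIST_REMOVE` ahead of the generation comparison, so a stale lock entry is discarded whether or not the write is rejected. The check itself is a per-inode generation counter: a lock records `inode->gen` at lock time, every write bumps it, and a write is refused with EBUSY if the counter moved in between. A hedged Python sketch of that scheme (the names here are illustrative, not the real oplock API):

```python
# Sketch of oplock's generation check: each inode carries a counter
# bumped on every write; a lock records the counter at lock time, and
# a write is rejected if the counter has moved since.

class Inode:
    def __init__(self):
        self.gen = 0

def take_oplock(inode):
    """Record the generation at lock time."""
    return {"value": inode.gen}

def try_write(inode, lock):
    """Return True and bump the generation iff no conflicting write
    happened since the lock was taken (else the caller gets EBUSY)."""
    if lock["value"] != inode.gen:
        return False  # conflict: translator would return EBUSY
    inode.gen += 1
    return True
```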
commit 5beada814cdbb2bc104669892a322890838fff56
Author: Jeff Darcy <jdarcy@redhat.com>
Date: Thu Jan 27 17:33:11 2011 -0500
Make slightly less ugly volume names.
diff --git a/scripts/cloudfs b/scripts/cloudfs
index 30d73bd..d8abe72 100755
--- a/scripts/cloudfs
+++ b/scripts/cloudfs
@@ -127,7 +127,6 @@ def do_init ():
voldir = ""
for gdir in glusterd_dirs:
probe = "%s/vols/%s" % (gdir, volname)
- print "trying %s" % probe
if os.access(probe,os.X_OK):
voldir = probe
break
diff --git a/scripts/volfilter.py b/scripts/volfilter.py
index 7453c32..98ab20e 100755
--- a/scripts/volfilter.py
+++ b/scripts/volfilter.py
@@ -85,15 +85,20 @@ def generate (graph, last, stream=sys.stdout):
print >> stream, "end-volume"
def push_filter (graph, old_xl, filt_type, opts={}):
- suffix = string.split(old_xl.type,"/")[1]
- new_xl = Translator(old_xl.name+"-"+suffix)
+ suffix = "-" + old_xl.type.split("/")[1]
+ if len(old_xl.name) > len(suffix):
+ if old_xl.name[-len(suffix):] == suffix:
+ old_xl.name = old_xl.name[:-len(suffix)]
+ new_xl = Translator(old_xl.name+suffix)
new_xl.type = old_xl.type
new_xl.opts = old_xl.opts
new_xl.subvols = old_xl.subvols
graph[new_xl.name] = new_xl
+ old_xl.name += ("-" + filt_type.split("/")[1])
old_xl.type = filt_type
old_xl.opts = opts
old_xl.subvols = [new_xl]
+ graph[old_xl.name] = old_xl
def delete (graph, victim):
if len(victim.subvols) != 1:
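The reworked `push_filter` above splices a new filter into the graph by renaming: the original translator's contents move into a clone under a type-suffixed name (stripping any suffix left by a previous push), while the original object becomes the filter and takes a name derived from the filter type. A compact runnable sketch of that splice, assuming a minimal `Translator` stand-in rather than the real volfilter module:

```python
# Sketch of push_filter's splice: clone old_xl under a type-suffixed
# name, then turn old_xl itself into the new filter above the clone.

class Translator:
    def __init__(self, name, xl_type="", subvols=None, opts=None):
        self.name = name
        self.type = xl_type
        self.subvols = subvols if subvols is not None else []
        self.opts = opts if opts is not None else {}

def push_filter(graph, old_xl, filt_type, opts=None):
    suffix = "-" + old_xl.type.split("/")[1]
    # Strip an existing type suffix so names don't pile up on reuse.
    if len(old_xl.name) > len(suffix) and old_xl.name.endswith(suffix):
        old_xl.name = old_xl.name[:-len(suffix)]
    new_xl = Translator(old_xl.name + suffix, old_xl.type,
                        old_xl.subvols, old_xl.opts)
    graph[new_xl.name] = new_xl
    # The original object becomes the filter, one level up.
    old_xl.name += "-" + filt_type.split("/")[1]
    old_xl.type = filt_type
    old_xl.opts = opts if opts is not None else {}
    old_xl.subvols = [new_xl]
    graph[old_xl.name] = old_xl
```

Renaming the original object (instead of rewiring its parents) keeps every parent's `subvols` pointer valid, which is why the filter slides in without touching the rest of the graph.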