hp@redhat.com said:
> Dan Reed has some code written that takes a somewhat simpler approach - it just rsyncs the homedir periodically.
This would work fine, at least for people with small /home directories, and it has the distinct advantage that it can be done right now, without a lot of development. It might be a problem for folks with larger /home directories, though. On my laptop, I have about 20GB of user files (images, maps, data, documents) and it takes a few minutes of disk grinding for rsync just to walk the tree and decide what's changed. This might be prohibitive (or at least prohibitively annoying) for the user. It also cuts into the battery life.
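A minimal sketch of the periodic-rsync approach, just to make the moving parts concrete. The flags are standard rsync(1) options; the destination host and exclude patterns are placeholders, not anything proposed in this thread.

```python
import subprocess

def build_rsync_cmd(src, dest, excludes=(), bwlimit_kbps=None):
    """Build a periodic homedir sync command (flags per rsync(1))."""
    cmd = ["rsync", "-az", "--delete", "--partial"]
    for pat in excludes:
        cmd.append("--exclude=%s" % pat)   # skip caches, ~/rpm, etc.
    if bwlimit_kbps:
        cmd.append("--bwlimit=%d" % bwlimit_kbps)
    cmd += [src, dest]
    return cmd

def sync_home(src="~/", dest="backup.example.com:backup/", **kw):
    """Run one sync pass; a cron job or timer would call this periodically."""
    return subprocess.call(build_rsync_cmd(src, dest, **kw))
```

The tree-walk cost Bryan describes is inherent to this scheme: rsync stats every file on every pass whether or not anything changed.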
Another possible user-space option would be something based on SGI::FAM. Around here, I use a fam-based intrusion-detection system to monitor a few system files that are commonly modified by root kits, and send me a notice when one changes. A similar system could monitor the directories in /home and, whenever a file changes, queue it up for synchronization with a remote mirror. This has the advantage of not needing to walk the whole file tree repeatedly, as you'd need to do with rsync. I suspect that you might run into problems with the number of file descriptors or memory use for large /home directories, though. When I get a chance, I'll whip up a script and try it out on my laptop.
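The notification side of that idea can be sketched independently of FAM itself: whatever delivers per-file change events (FAM, dnotify, or anything similar), the client mostly needs a queue that de-duplicates events and waits for writes to quiesce before shipping a file. A toy version, with the settle time as an arbitrary example value:

```python
import time

class SyncQueue:
    """Coalesce per-file change notifications (as FAM/dnotify would
    deliver them) into a de-duplicated batch for the next sync pass."""

    def __init__(self, settle_secs=30):
        self.settle_secs = settle_secs   # wait for writes to quiesce
        self.pending = {}                # path -> time of last change

    def file_changed(self, path, now=None):
        """Called from the monitor's callback; repeats just refresh the stamp."""
        self.pending[path] = time.time() if now is None else now

    def take_batch(self, now=None):
        """Return paths idle for settle_secs; leave still-hot files queued."""
        now = time.time() if now is None else now
        ready = [p for p, t in self.pending.items()
                 if now - t >= self.settle_secs]
        for p in ready:
            del self.pending[p]
        return sorted(ready)
```

This avoids re-walking the tree, but as noted above, the monitor itself still has to watch every directory, which is where the descriptor/memory cost comes in.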
Moving out of user space, and requiring some development, you could have the kernel's VFS layer generate a notice, maybe via DBUS, whenever a file changes. It'd be nice to be able to turn this on only for selected filesystems: monitor /home, but don't bother with /var, for example. A client would watch for changes and queue up files for synchronization.
Back to my original suggestion of a "RAID 1" mirror composed of a local disk and a network block device: It seems like you'd need to make the RAID system smart enough to realize that one device has much bigger latencies than the other, otherwise you'd get performance problems. You'd want to preferentially read from the local disk, for example, and you'd want to queue up writes to the remote disk instead of waiting for them to complete synchronously. I don't know if the current software RAID implementation supports this sort of thing.
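To make the asymmetric-mirror idea concrete, here is a toy model of the behavior being asked for: serve reads locally, queue remote writes behind the local ones. This is pure illustration, not the md driver; mdadm's --write-mostly/--write-behind options, where your md version supports them, are the real-world analogue of exactly this.

```python
from collections import deque

class LazyMirror:
    """Toy mirror with one fast local leg and one slow remote leg:
    reads are served locally, remote writes are queued (write-behind)."""

    def __init__(self):
        self.local = {}
        self.remote = {}
        self.writeback = deque()   # blocks not yet on the remote leg

    def write(self, block, data):
        self.local[block] = data      # completes at local-disk speed
        self.writeback.append(block)  # remote copy happens later

    def read(self, block):
        return self.local[block]      # never pay remote latency on reads

    def flush_one(self):
        """Push one queued block to the remote leg (called when idle)."""
        if self.writeback:
            b = self.writeback.popleft()
            self.remote[b] = self.local[b]

    def in_sync(self):
        return not self.writeback and self.remote == self.local
```

The hard part the toy ignores is crash consistency: a real implementation needs a persistent bitmap of dirty blocks so the queue survives a reboot.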
Ideas are easy. Coding's hard. Thanks again for pulling together a lot of disconnected useful ideas into "stateless linux" and starting to instantiate them in code.
Bryan
On Wed, Sep 15, 2004 at 10:43:57AM -0400, Bryan K. Wright wrote:
> directories, though. On my laptop, I have about 20GB of user files (images, maps, data, documents) and it takes a few minutes of disk grinding for rsync just to walk the tree and decide what's changed. This might be prohibitive (or at least prohibitively annoying) for the user. It also cuts into the battery life.
Another problem to worry about is saturation of the upstream link. I'm sure the average user wouldn't want the browser choked by rsync. Yes, you can tell rsync to use at most N KB/s, but that's not always easy to get right, if the user is in a position to estimate it at all - not to mention that link speed might change at any time for e.g. mobile users. And you run the risk of the opposite scenario: forcing rsync to use only a fraction of the bandwidth when there's nothing else using the rest.
Or do we just assume that there's going to be enough bandwidth? A saturated DSL is likely to be still more responsive than a saturated 56k connection.
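One crude way around the "user can't estimate N" problem is to re-measure and re-derive the limit on every sync pass instead of fixing it once. The share and floor below are arbitrary example values; proper fairness really belongs in the network layer (tc queueing disciplines), and this sketch only shows the re-estimation idea.

```python
def pick_bwlimit(measured_kbps, other_traffic_kbps=0.0, share=0.5,
                 floor_kbps=4):
    """Guess a polite rsync --bwlimit (in KB/s): take a share of what
    currently looks free, but never drop below a tiny floor so the
    sync still makes progress on a busy link. Re-measure and re-invoke
    rsync periodically, since link speed changes (mobile users, etc.)."""
    free = max(measured_kbps - other_traffic_kbps, 0.0)
    return max(int(free * share), floor_kbps)
```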
On Wed, 2004-09-15 at 11:13, Rudi Chiarito wrote:
> On Wed, Sep 15, 2004 at 10:43:57AM -0400, Bryan K. Wright wrote:
> > directories, though. On my laptop, I have about 20GB of user files (images, maps, data, documents) and it takes a few minutes of disk grinding for rsync just to walk the tree and decide what's changed. This might be prohibitive (or at least prohibitively annoying) for the user. It also cuts into the battery life.
> Another problem to worry about is saturation of the upstream link. I'm sure the average user wouldn't want the browser choked by rsync. Yes, you can tell rsync to use at most N KB/s, but that's not always easy to get right, if the user is in a position to estimate it at all - not to mention that link speed might change at any time for e.g. mobile users. And you run the risk of the opposite scenario: forcing rsync to use only a fraction of the bandwidth when there's nothing else using the rest.
> Or do we just assume that there's going to be enough bandwidth? A saturated DSL is likely to be still more responsive than a saturated 56k connection.
Well, a bigger problem with rsync is that in many cases, the file-listing part is the biggest time sink. And if that gets interrupted, you start from the beginning. (*)
So, starting a backup via wireless/vpn when you open your laptop for 5 minutes at the coffee shop doesn't usually make sense.
So, you might want to look at it as "backup only when on these networks". I think it's pretty reasonable to assume that people have lots of bandwidth at home and at work these days.
Regards, Owen
(*) This may actually not be the case for most people's home directories; they probably don't have source trees, maildir folders, etc, so perhaps 200 files is more typical than the 50,000 you might find in a hacker's homedir.
> So, you might want to look at it as "backup only when on these networks". I think it's pretty reasonable to assume that people have lots of bandwidth at home and at work these days.
I think this is a pretty safe assumption in the United States and some portions of the western world, but not a good assumption in a lot of other places.
-sv
On Wed, Sep 15, 2004 at 12:54:34PM -0400, seth vidal wrote:
> I think this is a pretty safe assumption in the United States and some portions of the western world, but not a good assumption in a lot of other places.
I disagree even with that. There are a lot of networks that have volume charges, and you don't want to mistake them for the user's home LAN. OTOH, if you've got the kind of autoconfiguration that can find the backup service and see if it is the right one for this user, you can get the right results.
Alan
On Wed, 2004-09-15 at 12:54, seth vidal wrote:
> > So, you might want to look at it as "backup only when on these networks". I think it's pretty reasonable to assume that people have lots of bandwidth at home and at work these days.
> I think this is a pretty safe assumption in the United States and some portions of the western world, but not a good assumption in a lot of other places.
The question in my mind is:
If you never have a high bandwidth connection to the backup server, does the automatic homedir backup concept make sense?
Maybe it does, but it's certainly more of a challenge to make it work well in such circumstances, so I'd think of that as something to be looked at later, rather than a primary use case.
Regards, Owen
> The question in my mind is:
> If you never have a high bandwidth connection to the backup server, does the automatic homedir backup concept make sense?
> Maybe it does, but it's certainly more of a challenge to make it work well in such circumstances, so I'd think of that as something to be looked at later, rather than a primary use case.
The question in my mind is: is an automatic homedir backup concept attainable?
I think the word we're going to keep running into trouble on is 'automatic'. We've got to get the user to leave it the hell alone for long enough to get the data, and you're going to need to tell the user how long that will be in some sort of progress bar or notification.
I think a good place to start would be a user-enabled homedir backup utility. Something graphical, simple but reasonably configurable (exclusion lists, inclusion lists, etc). Then move on from there on how to invoke it automatically.
But we're not really anywhere close to the latter afaict.
-sv
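The include/exclude part of that user-enabled utility is easy to pin down. A sketch of the selection logic (the "includes win over excludes" precedence is a design choice of this sketch, not something decided in the thread; patterns are fnmatch-style globs):

```python
import fnmatch
import os

def want(path, includes, excludes):
    """Decide whether a path gets backed up: include patterns win over
    exclude patterns, and anything matched by neither is kept."""
    for pat in includes:
        if fnmatch.fnmatch(path, pat):
            return True
    for pat in excludes:
        if fnmatch.fnmatch(path, pat):
            return False
    return True

def select_files(root, includes=(), excludes=()):
    """Walk a homedir and apply the lists; the GUI would edit the lists."""
    chosen = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            p = os.path.join(dirpath, name)
            if want(p, includes, excludes):
                chosen.append(p)
    return chosen
```

With an exclude like `*/rpm/*` you get the "cut out my ~/rpm hierarchy" case mentioned below, while an include like `*.spec` can still rescue individual files from an excluded tree.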
> The question in my mind is: is an automatic homedir backup concept attainable?
> I think the word we're going to keep running into trouble on is 'automatic'. We've got to get the user to leave it the hell alone for long enough to get the data, and you're going to need to tell the user how long that will be in some sort of progress bar or notification.
> I think a good place to start would be a user-enabled homedir backup utility. Something graphical, simple but reasonably configurable (exclusion lists, inclusion lists, etc). Then move on from there on how to invoke it automatically.
> But we're not really anywhere close to the latter afaict.
I could definitely see that a good place to start is to make the backup software nagware. It tracks the last backup date and throbs like the rhn-applet until the user runs it to completion.
-sv
On Wed, 15 Sep 2004 13:22:23 -0400, seth vidal skvidal@phy.duke.edu wrote:
> I could definitely see that a good place to start is to make the backup software nagware. It tracks the last backup date and throbs like the rhn-applet until the user runs it to completion.
Gee... wouldn't inclusion of rdiff-backup and associated python gui-goodness built on top of it make some sense as a way to flush out some of these /home directory backup issues without layering on the extra complexity of "automatic" and "network-aware"? Whatever the stateless solution to backing up and restoring a /home directory actually is, if it can't also be used for user- or admin-initiated backups, that solution seems too fragile and special-purposed to me. Let's cut back on the complexity of autonegotiation and just look at the logistics of actually moving and syncing the bits over a network to see if this is even plausible, unless you're going to demand laptop users not do silly things like dump whole movies onto their disks to watch when they travel. If we can't do user-initiated backups with ease, where the user is aware that backup is going on, and is aware of the amount of data and time involved to complete, we aren't going to come close to being able to do this when the user is not aware of what's going on.
I for one pledge to beat the crap out of any drop-dead easy system-config* styled interface for doing simple "user or admin initiated" home directory backups to disk or to a remote server that shows up in Core. A little applet or notification nagware to inform me of a running backup if initiated by 'me the user' or 'me the admin', and to tell me when the last backup was, wouldn't be an unwelcome addition.
-jef"preaching to the choir, hoping other people are watching and get the point"spaleta
-jef"preaching to the choir, hoping other people are watching and get the point"spaleta
Sadly, I'm in the choir. I'm one of those crazy guys who start small and try to add things on as the idea progresses.
So who has the time? Who has the interest?
-sv
On Wed, 2004-09-15 at 13:20 -0400, seth vidal wrote:
> I think the word we're going to keep running into trouble on is 'automatic'. We've got to get the user to leave it the hell alone for long enough to get the data and you're going to need to tell the user how long that will be in some sort of progress bar or notification.
> I think a good place to start would be a user-enabled homedir backup utility. Something graphical, simple but reasonably configurable (exclusion lists, inclusion lists, etc). Then move on from there on how to invoke it automatically.
> But we're not really anywhere close to the latter afaict.
I definitely agree here. It can't be an all-or-nothing proposition. And there have to be provisions to exclude certain directories (if I exclude my ~/rpm hierarchy, I cut out quite a few gigs right there). And the ability to 'pause' the sync for those cases where the user knows they aren't going to be up long and/or really needs the pipe so they can get that critical file for their boss.
On Wed, 2004-09-15 at 12:09 -0400, Owen Taylor wrote:
> So, you might want to look at it as "backup only when on these networks". I think it's pretty reasonable to assume that people have lots of bandwidth at home and at work these days.
Which presents the question: are there any attributes that NetworkManager can expose about the network that would help this? A 'time connected so far' attribute (though that wouldn't really tell you anything about what the user might do 5 seconds from now when they pull the plug and walk out of the coffee shop)?
NetworkManager doesn't have a concept of profiles, since that was a specific exclusion from the beginning (profiles suck). I'm not quite sure how to go about a "backup only when on these networks", except perhaps for these ideas:
1) on wired networks, use your hostname as returned via DHCP, match that against a "home network" sort of thing. But remember, NetworkManager keeps the hostname of the actual machine constant (because otherwise X falls over and dies), so NM would save the hostname right before setting it back and expose that via DBus
2) On wireless networks, we could key off of the ESSID of the base station to figure out whether you were on a "home" network or not.
3) other, more complicated ways?
Dan
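A sketch of how those hints could be combined into a "network bookmark" lookup, rather than trusting any single one of them. Everything here (key structure, hint names) is an illustrative assumption, not NetworkManager's actual interface.

```python
def network_key(dhcp_hostname=None, essid=None, router_mac=None):
    """Collapse whatever hints we have into a comparable key. None of
    these is authoritative on its own (ESSIDs like 'netgear' are
    ubiquitous, MACs can be spoofed), so combine all that are available."""
    return (dhcp_hostname or "",
            (essid or "").lower(),
            (router_mac or "").lower())

class NetworkBookmarks:
    def __init__(self):
        self.known = {}   # key -> user-visible network name

    def bookmark(self, name, **hints):
        """User says 'this is my home network' while attached to it."""
        self.known[network_key(**hints)] = name

    def identify(self, **hints):
        """Return the bookmarked name, or None for an unknown network."""
        return self.known.get(network_key(**hints))
```

Because the ESSID alone is not part of a unique key, a stranger's 'netgear' access point with a different router MAC simply comes back unidentified.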
> 2) On wireless networks, we could key off of the ESSID of the base station to figure out whether you were on a "home" network or not.
umm - something like 90% of all NetGear WAPs have 'netgear' as the ESSID.
I think that might be unsafe to key off of. :)
-sv
I'd imagine that MAC addresses were more likely to work, given the whole 'designed uniqueness' thing. -R
On Wednesday 15 September 2004 01:07 pm, seth vidal wrote:
> > 2) On wireless networks, we could key off of the ESSID of the base station to figure out whether you were on a "home" network or not.
> umm - something like 90% of all NetGear WAPs have 'netgear' as the ESSID.
> I think that might be unsafe to key off of. :)
> -sv
On Wed, 2004-09-15 at 13:10, Roberto Peon wrote:
> I'd imagine that MAC addresses were more likely to work, given the whole 'designed uniqueness' thing.
you mean 'theoretical uniqueness'
There are MAC duplicates out there. Trust me. :)
-sv
Well, design is often theoretical, yes. =)
-R
On Wednesday 15 September 2004 01:13 pm, seth vidal wrote:
> On Wed, 2004-09-15 at 13:10, Roberto Peon wrote:
> > I'd imagine that MAC addresses were more likely to work, given the whole 'designed uniqueness' thing.
> you mean 'theoretical uniqueness'
> There are MAC duplicates out there. Trust me. :)
> -sv
seth vidal wrote:
> you mean 'theoretical uniqueness'
> There are MAC duplicates out there. Trust me. :)
> -sv
We bet a lot of things on theoretical uniqueness in computing. Short of spoofing and freakish occurrences, I'm personally willing to bet my "home" on the uniqueness of a MAC address.
Michael Favia wrote:
> We bet a lot of things on theoretical uniqueness in computing. Short of spoofing and freakish occurances im personally willing to bet my "home" on the uniqueness of a mac address.
Really? Even though there is hardware out there that lets the user manually assign the MAC and that there are routers out there that "spoof" MACs for use with broadband ISP access controls?
Carwyn
Well, uniquely and securely determining whether this is the network you want is not an easy problem -- it would require changes in a lot of routing infrastructure...
In any case, I can't think of another ID -likely- to be unique on the network.
-R
On Wednesday 15 September 2004 01:48 pm, Carwyn Edwards wrote:
> Michael Favia wrote:
> > We bet a lot of things on theoretical uniqueness in computing. Short of spoofing and freakish occurances im personally willing to bet my "home" on the uniqueness of a mac address.
> Really? Even though there is hardware out there that lets the user manually assign the MAC and that there are routers out there that "spoof" MACs for use with broadband ISP access controls?
> Carwyn
Carwyn Edwards wrote:
> Michael Favia wrote:
> > Short of spoofing
> Really? Even though there is hardware out there that lets the user manually assign the MAC and that there are routers out there that "spoof" MACs for use with broadband ISP access controls?
Yes, that is why I said short of spoofing. In fact, most commercial routers (Netgear, Cisco, Linksys) and OSes (Windows and Linux certainly, Mac presumably, though I'm uncertain) allow you to spoof your MAC address these days. The ability to do so is crucial to the redundancy of the internet. However, consider this: if the MAC address is spoofed *and* it happens to be spoofed with the same 6-byte address as another MAC address you have recorded in your preferences/syncing file, there is most likely some sort of relationship present. This is the reason for spoofing (backup router, etc). I'm not saying this is by any means entirely safe, but with my limited understanding I am willing to accept the risks, and I believe a reasonable person might agree. That, after all, is the real question.
On Wed, 2004-09-15 at 13:04 -0500, Michael Favia wrote:
> Yes that is why i said short of spoofing. In fact, most comercial routers (Netgear, Cisco, Linksys) and OS's (Windows, linux, mac surely but uncertian?) allow you to spoof your MAC addres these days. The ability to do so is crucial to the redundancy of the internet. However, Consider this: If the MAC address is spoofed *and* it happens to be spoofed with the same 6 byte address as another MAC address you have recorded in your preferences/syncing file there is most likely some sort of relationship present. This is the reason for spoofing (Backup router, etc). I'm not saying this is entirely certian by any means safe but that with my limited understanding i am willing to accept the risks and i believe a resonable person might agree. That after all is the real question.
I think trying to determine location via Layer 2 means is going to be a big mistake. Networks tend to be very amorphous these days and node location is very non-physical. The workstation should not care at all WHERE it is; it's more a question of "can I get to where I need to?" If contact can be made to the "home directory backup server", it is able to sync. Based on factors such as bandwidth and the like, it may be more or less aggressive in its backup, but will still be able to attempt it. Many companies have people that may never connect to the local physical network; rather, they work from home, a client site, Starbucks, the neighbor's wireless hookup, etc.
MAC address should play no part in this. Do you really want all of the client systems to stop backing up because you had to change an interface card in the router? Do you want to have to keep the same MAC address on that router forever so that you don't have to change every client in the org that may connect on that subnet?
Didn't think so.
David Hollis wrote:
> I think trying to determine location via Layer 2 means is going to be a big mistake.
Perhaps you are correct.
> Networks tend to be very amorphous these days and node location is very non-physical.
Very valid point. Floating between wireless access points on the same network included.
> The workstation should not care at all WHERE it is, it's more of a question of "can I get to where I need to?" If contact can be made to the "home directory backup server", it is able to sync.
Yes, but the reason someone raised the MAC address issue was a bandwidth concern. For instance, you can reach my server from the internet via cellphone infrared internet access (or dialup, or a public coffee shop), but I wouldn't want to sync from there most of the time. Especially if my work were more sensitive (financial, etc).
> MAC address should play no part in this. Do you really want all of the client systems to stop backing up because you had to change an interface card in the router? Do you want to have to keep the same MAC address on that route forever so that you don't have to change every client in the org that may connect on that subnet?
I don't really have a problem with eternally spoofing the router address, but you convinced me that it isn't necessary or a good reference point to judge ability/willingness to sync. Perhaps what we really want is this:
1. Assume we can reach the sync server.
2. If we are on an acknowledged MAC, sync up without asking.
3. If we aren't, test bandwidth and assess the amount to transfer, then present the user with a nice throbber (with size of transfer, avg speed and time expected) that allows him/her to decide, or perhaps set a rule to decide (e.g. under 5 mins, sync automatically).
Regardless, you have raised a good point that MAC addresses are NOT the determining factor in our willingness/desire to sync. Instead, bandwidth, expected time on connection and perhaps the security of the connection really determine it. In either case, administrators could write rules that prohibit outside syncing if desired (via MAC rules) and normal users could benefit from increased portability. Sound better?
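Those three steps collapse into one small decision function. A sketch, with the 5-minute threshold as the user-tunable rule from the example above (the units and cut-offs are illustrative, not specified anywhere in the thread):

```python
def sync_decision(pending_bytes, kbytes_per_sec, on_trusted_net,
                  auto_minutes=5.0):
    """Trusted networks sync silently; elsewhere, estimate the transfer
    time from measured bandwidth and only auto-sync when it fits under
    the user's threshold, otherwise ask (show the throbber dialog)."""
    if on_trusted_net:
        return "sync"
    if kbytes_per_sec <= 0:
        return "ask"                 # no usable bandwidth estimate
    minutes = pending_bytes / 1024.0 / kbytes_per_sec / 60.0
    return "sync" if minutes <= auto_minutes else "ask"
```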
There was somebody here who mentioned taking advantage of the auth mechanism in SSH to decide which network we are on.
Another who mentioned that backupserver.location.organisation.com did not have to be the same machine - what if I normally resided in Oslo (domain oslo.organisation.com - my computer gets this from dhcp - and since this is my "main" location, my main backup server is backupserver.oslo.organisation.com)? Then I travel to the Sydney office - and the dhcp happily tells me that my domain is "sydney.organisation.com" - which means that the backup server is backupserver.sydney.organisation.com. Both these backup servers have an alias - "backupserver". And my laptop happily connects to that through the internal LAN (fat pipe). But the best thing is: "something" in my profile tells the server that my main site is Oslo - so while I am in flight home, the Sydney server slowly sends the backup "back home" through a small pipe.
But what if I am in Bangkok - and there is no office there (no backupserver)? Well, then I could tell the machine (manually) to backup anyway - using ssh (it should anyway use ssh in order to establish that it is really on a trusted network) back to my "home" backupserver.
And all this could be managed by the user from a small taskbar applet (see my former mail).
Kyrre
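The server-discovery half of that scheme is a one-liner once DHCP hands over the domain. A sketch, using the example domains from the scenario above (organisation.com and the "backupserver" alias are hypothetical names, not a real convention):

```python
def backup_server_for(dhcp_domain,
                      home_server="backupserver.oslo.organisation.com"):
    """Each site runs a host under the well-known alias 'backupserver'
    inside its DHCP-advertised domain; with no site server in reach
    (on the road, hotel, Bangkok), fall back to the user's home server
    over ssh."""
    if dhcp_domain:
        return "backupserver." + dhcp_domain
    return home_server
```

The other half - the Sydney server forwarding the backup "back home" - is pure server-side policy and needs nothing from the client beyond recording the user's main site.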
On 15/09/2004, at 19:48, Carwyn Edwards wrote:
> Michael Favia wrote:
> > We bet a lot of things on theoretical uniqueness in computing. Short of spoofing and freakish occurances im personally willing to bet my "home" on the uniqueness of a mac address.
> Really? Even though there is hardware out there that lets the user manually assign the MAC and that there are routers out there that "spoof" MACs for use with broadband ISP access controls?
I can manually override the MAC address of my PowerBook's Ethernet ports. Also, in clustering environments, it's pretty normal to have duplicated MAC addresses.
On Wed, Sep 15, 2004 at 01:06:26PM -0400, Dan Williams wrote:
> 1) on wired networks, use your hostname as returned via DHCP, match that against a "home network" sort of thing. But remember, NetworkManager
MAC address of router is another good hint
On Wed, 2004-09-15 at 13:08 -0400, Alan Cox wrote:
> On Wed, Sep 15, 2004 at 01:06:26PM -0400, Dan Williams wrote:
> > 1) on wired networks, use your hostname as returned via DHCP, match that against a "home network" sort of thing. But remember, NetworkManager
> MAC address of router is another good hint
If that's the case, then NetworkManager doesn't need to do anything :)
Dan
On Wed, 2004-09-15 at 13:12 -0400, Dan Williams wrote:
> On Wed, 2004-09-15 at 13:08 -0400, Alan Cox wrote:
> > On Wed, Sep 15, 2004 at 01:06:26PM -0400, Dan Williams wrote:
> > > 1) on wired networks, use your hostname as returned via DHCP, match that against a "home network" sort of thing. But remember, NetworkManager
> > MAC address of router is another good hint
> If that's the case, then NetworkManager doesn't need to do anything :)
> Dan
Though the router MAC is certainly subject to change. I could be on my company network but on a different segment and have a different gateway MAC. Or when the network guys finally decide to ditch that old Wellfleet router and put in something from this century....
And what about VPN connectivity? If I have a VPN open, I'm cool with it syncing files across that - though there still is the bandwidth congestion issue, of course. I often go months at a time before I'm back in the office, and that's a long time to not have things backed up. I think the test/validation/whatever-you-want-to-call-it may need to be more of a "can I access X location" type of test. It could be as simple as a ping or something slightly more substantial such as a TCP connection. You can also get some crude metrics of latency that may be useful to tune how much/how fast the sync will go.
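That "can I access X" test plus crude latency tuning could look like this sketch. The TCP probe uses only stock socket calls; the latency bands and mode names are made-up example policy, not measurements from anywhere.

```python
import socket
import time

def probe(host, port=22, timeout=5.0):
    """'Can I access X' as a TCP connect to the backup server (the ssh
    port here doubles as a crude RTT measure). Returns latency in ms,
    or None if unreachable."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    start = time.time()
    try:
        s.connect((host, port))
        return (time.time() - start) * 1000.0
    except socket.error:
        return None
    finally:
        s.close()

def sync_aggressiveness(latency_ms):
    """Crude latency-based tuning of how hard the sync should push."""
    if latency_ms is None:
        return "skip"          # backup server unreachable right now
    if latency_ms < 20:
        return "full"          # LAN-ish: sync everything
    if latency_ms < 150:
        return "throttled"     # broadband / VPN: rate-limit the sync
    return "metadata-only"     # thin pipe: just record what changed
```

This sidesteps the whole "which network am I on" question: reachability of the server, over VPN or not, is what actually matters.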
On Wed, 2004-09-15 at 13:06, Dan Williams wrote:
> On Wed, 2004-09-15 at 12:09 -0400, Owen Taylor wrote:
> > So, you might want to look at it as "backup only when on these networks". I think it's pretty reasonable to assume that people have lots of bandwidth at home and at work these days.
> Which presents the question, is there any attributes that NetworkManager can expose about the network that would help this? Time connected so far attribute (though that wouldn't really tell you anything about what the user might do 5 seconds from now when they pull the plug and walk out of the coffee shop)?
> NetworkManager doesn't have a concept of profiles, since that was a specific exclusion from the beginning (profiles suck). I'm not quite sure how to go about a "backup only when on these networks", except perhaps for these two ideas:
I don't have a good answer to this, but I do think the concept of a "network" corresponds to a fairly definite idea in the user's head, so it is at least potentially something that we can expose in the user interface. And "Do A only on network B" may be something we need more of in the future.
> 1) on wired networks, use your hostname as returned via DHCP, match that against a "home network" sort of thing. But remember, NetworkManager keeps the hostname of the actual machine constant (because otherwise X falls over and dies), so NM would save the hostname right before setting it back and expose that via DBus
The hostname that dhcp returns should be the same as a reverse DNS lookup on the network address, right?
> 2) On wireless networks, we could key off of the ESSID of the base station to figure out whether you were on a "home" network or not.
> 3) other, more complicated ways?
The best idea that comes to my mind is:
- Use a restrictive criterion like MAC address of the router that seldom gives false positives for "am on the same network"
- Then have a "bookmark this network" function to give a name to the current network; if the hardware changes or some such, then the user will have to rebookmark the network, which is a little annoying, but manageable.
We could also try to establish a "Fedora" way to authoritatively identify and name networks so that network administrators who cared could do better. (Extra bit of information from DHCP, extra DNS records, whatever.)
Generally, networks probably correspond pretty closely to domains for sufficiently well administered networks... e.g., my current network is "boston.redhat.com", but I don't think you want to rely on that congruence.
Regards, Owen
<summary>Stuff about identifying which network we are on to see if we want to do backup.</summary>
If I'm away at a conference with my laptop and they have WiFi access to the internet, I don't care which network I am on. I still want to back up my home directory though.
If I'm working in a multi-site corporation that has multiple backup servers (one per site, perhaps), I want to use the local backup server, not the one back home that's through a thin pipe. I'd want my laptop to discover the local backup server and authenticate it to me and me to it.
OK, these ideas don't quite work with the diskless Stateless model, but they are still valid when talking about homedir backup.
Carwyn
On Wed, 2004-09-15 at 19:15 +0100, Carwyn Edwards wrote:
> <summary>Stuff about identifying which network we are on to see if we want to do backup.</summary>
> If I'm away at a conference with my laptop and they have WiFi access to the internet, I don't care which network I am on. I still want to back up my home directory though.
> If I'm working in a multi site corporation that has multiple backup servers (one per site perhaps). I want to use the local backup server not the one back home that's through a thin pipe. I'd want my laptop to discover the local backup server and authenticate it to me and me to it.
> Ok these ideas don't quite work with the diskless Stateless model but they are still valid when talking about homedir backup.
> Carwyn
If the client is configured to connect to a system via DNS name, that provides the enterprise with some flexibility like this. In Australia, homebackupsvr.corp.org could resolve to a different box than in the UK, etc. On the backend, I could work out how to merge them together over faster pipes. Such merging could get sticky, of course, and may be more of a 2.0 kind of thing.
Carwyn Edwards wrote:
> If I'm away at a conference with my laptop and they have WiFi access to the internet, I don't care which network I am on. I still want to back up my home directory though.
And you'd be able to define the choice to do so. This concerned the ability to say: "Only sync when attached Here and Here". I'm sure an "always attempt to sync" option would be available.
Dan Williams wrote:
On Wed, 2004-09-15 at 12:09 -0400, Owen Taylor wrote:
So, you might want to look at it as "backup only when on these networks". I think it's pretty reasonable to assume that people have lots of bandwidth at home and at work these days.
Which presents the question, is there any attributes that NetworkManager can expose about the network that would help this? Time connected so far attribute (though that wouldn't really tell you anything about what the user might do 5 seconds from now when they pull the plug and walk out of the coffee shop)?
NetworkManager doesn't have a concept of profiles, since that was a specific exclusion from the beginning (profiles suck). I'm not quite sure how to go about a "backup only when on these networks", except perhaps for these two ideas:
- On wired networks, use your hostname as returned via DHCP and match that against a "home network" sort of thing. But remember, NetworkManager keeps the hostname of the actual machine constant (because otherwise X falls over and dies), so NM would save the DHCP hostname right before setting it back and expose that via DBus.
- On wireless networks, we could key off the ESSID of the base station to figure out whether you were on a "home" network or not.
- other, more complicated ways?
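Dan's two matching ideas could be implemented with little more than a lookup table. A minimal sketch, assuming NetworkManager could expose the wireless ESSID or the saved DHCP hostname; the table entries, network names, and function are hypothetical, not NetworkManager API:

```python
# Sketch: decide per-network backup behaviour from identifiers NetworkManager
# could plausibly expose. All names below are hypothetical examples.

TRUSTED_NETWORKS = {
    ("essid", "home-wlan"): "backup",                 # wireless: key off the base station
    ("dhcp-hostname", "corp.example.com"): "backup",  # wired: DHCP hostname match
}

def backup_policy(kind, name):
    """Return 'backup' for known-good networks, 'ask' for anything unknown."""
    return TRUSTED_NETWORKS.get((kind, name), "ask")
```

Defaulting to "ask" rather than "skip" keeps the user in the loop the first time a new network is seen.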
I think the way this should work is:
Can I make a _secure_ connection to _my_ server? (Think ssh connection with the keys set up to know that the other side is who you think it is.) If so, start the backup process over that link. Do scheduling of the network traffic so user-initiated traffic gets higher priority than the backup. Keep in mind that the backup may be initiated by the user as well and would need to get scheduled as such. The user needs a way to tell the system "Don't do that right now" in case they are on an expensive link, or know something else the computer doesn't know, can't know, or misunderstands. There should also be a user-configurable daemon (or whatever) that can tell the backup system whether it should do its thing right now or not, based on arbitrary factors. For instance: battery life, disk is spun down, we're plugged into power, it's been X hours, we have Y kB of deltas that need to be backed up, certain important files have been modified, etc., etc.
Otherwise, you're confusing "where I am" with "what I can do". Sometimes they correlate, but they aren't really the same.
Eli

--------------------.  "If it ain't broke now,
Eli Carter            \  it will be soon." -- crypto-gram
eli.carter(a)inet.com  `-------------------------------------------------
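Eli's user-configurable policy daemon could combine its "arbitrary factors" roughly like this. A sketch only; the factor names and thresholds are illustrative, not recommendations:

```python
def should_backup_now(on_ac_power, battery_pct, hours_since_last, delta_kb,
                      user_suspended=False):
    """Yes/no from arbitrary factors: power state, battery, elapsed time,
    and pending delta size. user_suspended is the explicit
    "Don't do that right now" override Eli describes."""
    if user_suspended:
        return False
    if not on_ac_power and battery_pct < 30:
        return False  # don't spend a low battery on background sync
    # back up if it's been a while *or* there's a lot queued up
    return hours_since_last >= 4 or delta_kb >= 10_000
```

The point is that the backup system asks this one question and stays ignorant of why the answer is no.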
On Wed, 2004-09-15 at 19:57, Eli Carter wrote:
Otherwise, you're confusing "where I am" with "what I can do". Sometimes they correlate, but they aren't really the same.
I agree with this. I'm not sure that it ought to matter where the machine is, provided that the link is good.
Perhaps the most basic question for an automatic sync is "is the link likely to disappear whilst I'm working?" We can guess that links with certain qualities are bad (bandwidth too low, connection hasn't been available or stable for a period of X seconds, user specifies that the link should not be used for sync).
We can't automatically tell when the user will want to disconnect the link themselves, so the concept of arranging completely automatic (rather than scheduled) backups may be problematic unless the backups can be safely broken up into small sets that can complete quickly. The Windows file sync software can automatically backup on login and logout. It drove me nuts that it would spend several minutes working through my entire home directory after I had decided that I wanted to switch off the laptop.
On Thu, 2004-09-16 at 00:48 +0100, Stuart Ellis wrote:
Perhaps the most basic question for an automatic sync is "is the link likely to disappear whilst I'm working?" We can guess that links with certain qualities are bad (bandwidth too low, connection hasn't been available or stable for a period of X seconds, user specifies that the link should not be used for sync).
We can't automatically tell when the user will want to disconnect the link themselves, so the concept of arranging completely automatic (rather than scheduled) backups may be problematic unless the backups can be safely broken up into small sets that can complete quickly. The Windows file sync software can automatically backup on login and logout. It drove me nuts that it would spend several minutes working through my entire home directory after I had decided that I wanted to switch off the laptop.
Windows provides a good example of how it should not work. It needs to be something that can be performed in the background and can be interrupted and recover gracefully. And it shouldn't be triggered by login/logout - those are probably the times when the user wants it least. In the morning they boot up and want to get into email/web/whatever to start getting stuff done; the last thing you need is the drive sitting there chugging while it determines what it needs to sync. At logout, you're heading home for the day, trying to beat traffic, etc., and the last thing you want is to have to wait for that 650MB ISO you just downloaded to sync!
tor, 16.09.2004 kl. 02.09 skrev David Hollis:
Windows provides a good example of how it should not work. It needs to be something that can be performed in the background and can be interrupted and recover gracefully. And it shouldn't be triggered by login/logout - those are probably the times when the user wants it least. In the morning they boot up and want to get into email/web/whatever to start getting stuff done; the last thing you need is the drive sitting there chugging while it determines what it needs to sync. At logout, you're heading home for the day, trying to beat traffic, etc., and the last thing you want is to have to wait for that 650MB ISO you just downloaded to sync!
-- David Hollis dhollis@davehollis.com
Sounds kinda like the Windows network at school - it has a 2-10 minute login time, and a logout time that's about the same. This is due to Windows downloading the WHOLE profile, instead of the Unix model with hidden config files and the desktop as a folder. It makes quite a few people use the Linux machines instead - they take 10 seconds max, and that's when the server is under high load (it's one of the thick clients that was too old to run XP, with no DMA support, while the Windows guys got a Xeon with SCSI disks...), and on a slow machine (600 MHz Celeron/128 MB RAM running FC2).
On Thursday 16 September 2004 04:50 pm, Kyrre Ness Sjobak wrote:
Sounds kinda like the Windows network at school - it has a 2-10 minute login time, and a logout time that's about the same. This is due to Windows downloading the WHOLE profile, instead of the Unix model with hidden config files and the desktop as a folder. It makes quite a few people use the Linux machines instead - they take 10 seconds max, and that's when the server is under high load (it's one of the thick clients that was too old to run XP, with no DMA support, while the Windows guys got a Xeon with SCSI disks...), and on a slow machine (600 MHz Celeron/128 MB RAM running FC2).
Could we build a files-changed journal using dnotify or FAM, in order to get rid of the scanning time?
Ideally, you would know which parts of the file(s) were updated and only ship those parts, but dnotify and FAM don't have that level of specificity.
If we took this approach, then we'd have a near-optimal update time. The merge problem, of course, still exists, but it will -always- exist unless you update from the server on login and to the server on logout.
-R
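The files-changed journal suggested above can be little more than a deduplicating set fed by change events. A sketch under the assumption that dnotify/FAM/inotify delivers per-file events; here delivery is simulated by calling record() directly:

```python
# Sketch of a files-changed journal: a dnotify/FAM/inotify consumer would call
# record() for each change event, and the syncer periodically calls drain()
# for a deduplicated batch - no full-tree walk is ever needed.

class ChangeJournal:
    def __init__(self):
        self._pending = set()

    def record(self, path):
        """Note that a file changed; repeated writes queue it only once."""
        self._pending.add(path)

    def drain(self):
        """Hand the current batch to the syncer and start a fresh journal."""
        batch, self._pending = sorted(self._pending), set()
        return batch
```

Because drain() swaps in a fresh set, events arriving during a sync are simply picked up by the next batch.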
otaylor@redhat.com said:
Well, a bigger problem with rsync is that in many cases, the listing-files part is the biggest time sink. And if that gets interrupted, you start from the beginning. (*)
Maybe I'm incorrect, but this is all the more true when rsync is doing checksums. Remove --checksum from the command line and the file-listing time becomes negligible (at least when lots of files have changed).
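The cost difference Theo describes comes from what the two comparisons have to touch. A sketch of the two checks (the concept, not rsync's actual code):

```python
import hashlib
import os

def quick_check(path_a, path_b):
    """rsync's default comparison: size and mtime only, from a cheap stat()."""
    sa, sb = os.stat(path_a), os.stat(path_b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)

def full_checksum(path):
    """What --checksum adds: reading every byte of every file."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()
```

The quick check is one stat() per file; the checksum pass grinds the disk over the whole tree, which is exactly the listing-phase cost being discussed.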
So, starting a backup via wireless/vpn when you open your laptop for 5 minutes at the coffee shop doesn't usually make sense.
Right, but it would make a lot of sense to let the user decide. Two things are needed:
When a backup has started, the popup displayed should show that a backup is in progress and ask the user what to decide (reboot later or postpone the backup). This should allow
At connection time, let gdm have a "no backup for this session" option, and if the computer is connected to a given network for the first time, prompt the user with a choice:
- Never back up on this network.
- Always back up on this network [optionally through a VPN].
- Always ask me for this network.
That should pretty much cover all the situations, at least for laptops. For workstations, basically all this is not necessary IMHO (configure everything at installation time).
One important feature is that a backup should always be interruptible, leaving the computer in a coherent state. Another question is who should initiate the backup: the laptop/workstation, or the network server holding the backup (which makes sense to avoid congestion when everyone connects their laptop to the network in the morning)?
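A server-initiated scheme could avoid the morning congestion by hashing each client into a stable slot within the backup window. A minimal sketch; the window length is arbitrary and the function name is made up:

```python
import hashlib

def sync_slot(hostname, window_minutes=60):
    """Stable per-host minute offset within the backup window, so laptops
    connecting at 9am don't all start syncing at once."""
    digest = hashlib.sha256(hostname.encode()).digest()
    return int.from_bytes(digest[:4], "big") % window_minutes
```

Hashing the hostname rather than keeping server-side state means every backup server in a multi-site setup computes the same schedule.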
I have heard about dirvish for doing such server-initiated backups. Does anyone have experience with it?
Theo.
--------------------------------------------------------------------
Theodore Papadopoulo
Email: Theodore.Papadopoulo@sophia.inria.fr   Tel: (33) 04 92 38 76 01
--------------------------------------------------------------------
On Wed, 15 Sep 2004 12:09:28 -0400, Owen Taylor otaylor@redhat.com wrote:
Well, a bigger problem with rsync is that in many cases, the listing-files part is the biggest time sink. And if that gets interrupted, you start from the beginning. (*)
Hey, this is a totally different approach: what if each user's home directory were an encrypted file (mounted as a local loopback)? These are usually laptops - hence the need for occasional sync - so why not also encrypt them, in case of theft, secrets, etc.?
Then, as an option at login, you can answer yes/no to having your profile (the encrypted file) synced before login. rdiff then does a compare against one (albeit large) file remotely.
Bonuses: encrypted; safe against losing the laptop/local computer; at login, exactly the same profile/home directory.
Drawbacks: slow? Container size of the encrypted file; version-control management difficult?
Just a thought! --Josiah (RHCE)
Rudi Chiarito wrote:
Another problem to worry about is saturation of the link upstream. I'm sure the average user wouldn't want the browser choked by rsync. Yes, you can tell rsync to use at most N KB/s, but that's not always easy to get right, if the user is in the position to estimate it at all - not to mention that link speed might change at any time for e.g. mobile users.
I've always wondered why applications are so greedy individually. Is there no mechanism to arbitrate requested bandwidth between apps? I often run into instances where a BitTorrent uplink is saturating my connection and crippling my web browsing, because I don't even have enough headroom left to send requests (I'd imagine that's the cause, anyway). Obviously I could manually divide my bandwidth, but it often changes (laptop on a cable modem with variable up/down at home, bottomless connection speeds at work). Is the overhead of such a monitoring system too high for the benefit? Has it been attempted? There seem to be so many advantages to such a system given the increasing popularity of high-bandwidth activities among general users (BitTorrent, video on demand, aMule, music services). It just seems like a self-auditing network interface would make sense here.
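A cooperative per-application throttle is commonly built from a token bucket. A minimal sketch of the idea, independent of whatever rsync's --bwlimit or the kernel QoS machinery actually do internally:

```python
import time

class TokenBucket:
    """Allow at most `rate` bytes/sec, with bursts up to `burst` bytes.
    The injectable clock is only there to make the behaviour testable."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens, self.last = burst, clock()

    def consume(self, nbytes):
        """True if nbytes may be sent now; False means sleep and retry."""
        now = self.clock()
        # refill proportionally to elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

An application-level bucket only throttles the app that opts in, which is why the follow-ups point at kernel-level QoS for arbitrating between uncooperative programs.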
On Wed, 2004-09-15 at 10:22, Michael Favia wrote:
Is there no mechanism to throttle requested bandwidth between apps?
Take a look at ebtables and the network QOS in 2.6. A friend of mine has set up a small router whose sole purpose in life is to limit BitTorrent traffic so that his ssh sessions aren't painfully slow. He says it works great.
-JE
Yes, use the SFQ or another fair scheduler for the network stuff.
-R
On Sep 15, 2004, "Bryan K. Wright" bryan@ayesha.phys.Virginia.EDU wrote:
Back to my original suggestion of a "RAID 1" mirror composed of a local disk and a network block device:
Better to do LVM mirroring than RAID 1; otherwise you'll end up having to resync the entire volume every time something changes. I understand LVM mirrors do it on an extent basis.
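The extent-based behaviour described above can be sketched with a dirty-extent log: only extents written since the mirrors diverged need resyncing. This is the concept only, not LVM's implementation, and the extent size is illustrative:

```python
EXTENT = 4 * 1024 * 1024  # 4 MiB extents (an illustrative size)

class DirtyLog:
    """Track which extents have been written while a mirror leg was out of
    sync, so a reconnect copies only those instead of the whole volume."""

    def __init__(self):
        self.dirty = set()

    def write(self, offset, length):
        """Mark every extent touched by a write of `length` bytes at `offset`."""
        first = offset // EXTENT
        last = (offset + length - 1) // EXTENT
        self.dirty.update(range(first, last + 1))

    def resync_plan(self):
        """Extents to copy to the lagging mirror; everything else is skipped."""
        return sorted(self.dirty)
```

Whole-device RAID 1 resync corresponds to a plan that always contains every extent, which is exactly the cost being avoided.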
But what would be really appropriate for this application is InterMezzo: keeping the FS transaction log and using it to tell what to send to the other replica sounds like the right way to do this kind of synchronization. Too bad InterMezzo is dead :-(
On Thu, 16 Sep 2004 00:43, "Bryan K. Wright" bryan@ayesha.phys.Virginia.EDU wrote:
Another possible user-space option would be something based on SGI::FAM.
We are moving away from FAM for security reasons. Giving all user processes access to a daemon running with read access to all files on disk is not something that we desire.
Also, doesn't dnotify etc. take significant amounts of RAM when monitoring large numbers of files?
Moving out of user space, and requiring some of development, you could have the kernel's VFS layer generate a notice, maybe via DBUS, whenever a file changes. It'd be nice to be able to turn this on only
This has some awkward possibilities. I can imagine DBUS changing a file, causing a notification which then makes DBUS change a file...
On Sat, Oct 02, 2004 at 12:55:35AM +1000, Russell Coker wrote:
We are moving away from FAM for security reasons. Giving all user processes access to a daemon running with read access to all files on disk is not something that we desire.
Also doesn't dnotify etc take significant amounts of RAM when monitoring large numbers of files?
The problem is rather that it requires opening all the directories containing said files and keeping them open until monitoring is no longer needed. Inotify kernel support should fix this.
Daniel
On Sat, 2004-10-02 at 00:55 +1000, Russell Coker wrote:
Moving out of user space, and requiring some of development, you could have the kernel's VFS layer generate a notice, maybe via DBUS, whenever a file changes. It'd be nice to be able to turn this on only
This has some awkward possibilities. I can imagine DBUS changing a file, causing a notification which then makes DBUS change a file...
We've already been there and done that a few times (hint, don't write to a log file when you get file change notifications... ;-))
Anyway, it can be managed.
Havoc