Hi,
Just noticed this change on rawhide... https://github.com/systemd/systemd/blob/master/NEWS#L29
* systemd-logind will now by default terminate user processes that are part of the user session scope unit (session-XX.scope) when the user logs out. This behavior is controlled by the KillUserProcesses= setting in logind.conf, and the previous default of "no" is now changed to "yes". This means that user sessions will be properly cleaned up after, but additional steps are necessary to allow intentionally long-running processes to survive logout.
While the user is logged in at least once, user@.service is running, and any service that should survive the end of any individual login session can be started at a user service or scope using systemd-run. systemd-run(1) man page has been extended with an example which shows how to run screen in a scope unit underneath user@.service. The same command works for tmux.
After the user logs out of all sessions, user@.service will be terminated too, by default, unless the user has "lingering" enabled. To effectively allow users to run long-term tasks even if they are logged out, lingering must be enabled for them. See loginctl(1) for details. The default polkit policy was modified to allow users to set lingering for themselves without authentication.
Previous defaults can be restored at compile time by the --without-kill-user-processes option to "configure".
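For reference, the two mechanisms the NEWS entry describes can be sketched roughly as follows. The `systemd-run --scope --user` invocation is the one the NEWS text attributes to the systemd-run(1) man page; `run_persistent` and `DRY_RUN` are hypothetical names introduced here for illustration only.

```shell
#!/bin/sh
# Sketch: start a command in its own scope under user@.service, so that it
# is not part of session-XX.scope and is not killed when that session ends.
# run_persistent is a hypothetical helper, not something systemd ships.
run_persistent() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be executed.
        echo systemd-run --scope --user "$@"
    else
        systemd-run --scope --user "$@"
    fi
}

# To keep user@.service itself alive after the last logout, lingering
# has to be enabled once per user (see loginctl(1)):
#   loginctl enable-linger "$USER"
#
# Typical use:
#   run_persistent screen
#   run_persistent tmux
```

Without lingering, a scope started this way would still be stopped once the user's last session ends and user@.service is terminated.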
So, now, I've read this and I could possibly remember to use systemd-run or to set myself as lingering... Except that I don't want to have to go through the pain of remembering to either change the system config on all my servers or always starting stuff with systemd-run if it's a bit long and I think I might want to ^Z/bg/disown it to let it finish.
Thinking further: when my users get that update, I don't see myself telling them to do that whenever they want to start a screen/tmux/nohup job. Users do not read every update changelog (tbh I don't either unless there's a problem), and they probably wouldn't think of systemd if they ever hit that particular issue. Heck, they probably don't even know what systemd and logind are (yes, they are "advanced" enough to ssh into other servers to run *long* tasks that must continue overnight or after the user logs out, but that doesn't mean they know exactly what they're using).
Sure, this change will work for the whole probably targeted audience of simple desktop users on shared workstations where we probably want to kill lingering processes; but how much is that compared to servers?
I know that if this gets through I will have to change the system default on all my servers... And while the big batches of thousands of compute nodes are automated there's still quite a few places to update, especially since this will be the first time we need to change logind.conf so it's not just adding a line to a file already propagated
Anyway, I don't really want to start (yet) a(nother) troll on systemd, I appreciate it's also brought good things; I'd just like the default values to be sane for most of the users. I did not see any discussion about this particular setting in the systemd-devel mailing list so I have hope that it is still open to change, but I'd rather start with a community where there are more admins who will likely agree that this change will do more harm than good.
Even if nothing comes out of it, at least more people will be aware of the issue and will be able to prepare to avoid most of the chaos that will come if this stays like this...
Thanks for reading,
On Fri, May 27, 2016 at 5:51 AM, Dominique Martinet dominique.martinet@cea.fr wrote:
Hi,
Just noticed this change on rawhide... https://github.com/systemd/systemd/blob/master/NEWS#L29
systemd-logind will now by default terminate user processes that are part of the user session scope unit (session-XX.scope) when the user logs out. This behavior is controlled by the KillUserProcesses= setting in logind.conf, and the previous default of "no" is now changed to "yes". This means that user sessions will be properly cleaned up after, but additional steps are necessary to allow intentionally long-running processes to survive logout.
While the user is logged in at least once, user@.service is running, and any service that should survive the end of any individual login session can be started at a user service or scope using systemd-run. systemd-run(1) man page has been extended with an example which shows how to run screen in a scope unit underneath user@.service. The same command works for tmux.
After the user logs out of all sessions, user@.service will be terminated too, by default, unless the user has "lingering" enabled. To effectively allow users to run long-term tasks even if they are logged out, lingering must be enabled for them. See loginctl(1) for details. The default polkit policy was modified to allow users to set lingering for themselves without authentication.
Previous defaults can be restored at compile time by the --without-kill-user-processes option to "configure".
So, now, I've read this and I could possibly remember to use systemd-run or to set myself as lingering... Except that I don't want to have to go through the pain of remembering to either change the system config on all my servers or always starting stuff with systemd-run if it's a bit long and I think I might want to ^Z/bg/disown it to let it finish.
This breaks the storage of ssh-agent credentials for the one-time enabling of SSH credentials for access on running hosts. Gods alone know what else it will break.
On Fri, May 27, 2016 at 08:51:23AM -0400, Nico Kadel-Garcia wrote:
This breaks the storage of ssh-agent credentials for the one-time enabling of SSH credentials for access on running hosts.
You mean you start ssh-agent somewhere during the first login and then access it from any process from further sessions? You can get a setup to work like this by running the agent in a service, like any long running service.
Gods alone know what else it will break.
File the bugs, we'll deal with them one at a time.
Zbyszek
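One way the "run the agent in a service" suggestion could look is a per-user unit. This is only a sketch under assumptions: the unit name `ssh-agent.service`, the socket path, the `/usr/bin/ssh-agent` location, and the helper `write_ssh_agent_unit` are all illustrative and not taken from the thread.

```shell
#!/bin/sh
# Sketch: write a user unit that runs ssh-agent under user@.service.
# write_ssh_agent_unit is a hypothetical helper; the optional argument
# overrides the target directory (useful for trying it out safely).
write_ssh_agent_unit() {
    dir="${1:-$HOME/.config/systemd/user}"
    mkdir -p "$dir" || return 1
    cat > "$dir/ssh-agent.service" <<'EOF'
[Unit]
Description=SSH key agent (illustrative user unit)

[Service]
Type=simple
# -D keeps the agent in the foreground; %t is the user's runtime directory.
ExecStart=/usr/bin/ssh-agent -D -a %t/ssh-agent.socket

[Install]
WantedBy=default.target
EOF
}

# Afterwards, roughly:
#   systemctl --user daemon-reload
#   systemctl --user enable --now ssh-agent.service
#   export SSH_AUTH_SOCK="$XDG_RUNTIME_DIR/ssh-agent.socket"
```

An agent started this way belongs to user@.service rather than to any one login session, so keys unlocked in one session would remain available to later sessions; it would still go away after the last logout unless lingering is enabled.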
On Fri, May 27, 2016 at 9:13 AM, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Fri, May 27, 2016 at 08:51:23AM -0400, Nico Kadel-Garcia wrote:
This breaks the storage of ssh-agent credentials for the one-time enabling of SSH credentials for access on running hosts.
You mean you start ssh-agent somewhere during the first login and then access it from any process from further sessions? You can get a setup to work like this by running the agent in a service, like any long running service.
It's a historically useful way to require an authorized user to actually log into the system and unlock the key. It's similar to the requirement of secure Kerberos servers and Java keystore systems to have a user attend the startup of the daemons, in order to unlock the protected credentials on request and prevent unauthorized use of the service from a stolen backup or disk image.
Gods alone know what else it will break.
File the bugs, we'll deal with them one at a time.
If I could list all the bugs caused by this change, in advance, in all of Fedora userland, I'd be paid a lot more.
On Fri, May 27, 2016 at 06:20:44PM -0400, Nico Kadel-Garcia wrote:
On Fri, May 27, 2016 at 9:13 AM, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Fri, May 27, 2016 at 08:51:23AM -0400, Nico Kadel-Garcia wrote:
This breaks the storage of ssh-agent credentials for the one-time enabling of SSH credentials for access on running hosts.
You mean you start ssh-agent somewhere during the first login and then access it from any process from further sessions? You can get a setup to work like this by running the agent in a service, like any long running service.
It's a historically useful way to require an authorized user to actually log into the system and unlock the key. It's similar to the requirement of secure Kerberos servers and Java keystore systems to have a user attend the startup of the daemons, in order to unlock the protected credentials on request and prevent unauthorized use of the service from a stolen backup or disk image.
Sure, but there's more than one way to do this. Unless you provide more details, there is no way to guess what is broken for you. Based on your general description, there should be no reason for this to not work.
Zbyszek
On 05/27/2016 11:51 AM, Dominique Martinet wrote:
Just noticed this change on rawhide... https://github.com/systemd/systemd/blob/master/NEWS#L29
systemd-logind will now by default terminate user processes that are part of the user session scope unit (session-XX.scope) when the user logs out. This behavior is controlled by the KillUserProcesses= setting in logind.conf, and the previous default of "no" is now changed to "yes". This means that user sessions will be properly cleaned up after, but additional steps are necessary to allow intentionally long-running processes to survive logout.
While the user is logged in at least once, user@.service is running, and any service that should survive the end of any individual login session can be started at a user service or scope using systemd-run. systemd-run(1) man page has been extended with an example which shows how to run screen in a scope unit underneath user@.service. The same command works for tmux.
After the user logs out of all sessions, user@.service will be terminated too, by default, unless the user has "lingering" enabled. To effectively allow users to run long-term tasks even if they are logged out, lingering must be enabled for them. See loginctl(1) for details. The default polkit policy was modified to allow users to set lingering for themselves without authentication.
Previous defaults can be restored at compile time by the --without-kill-user-processes option to "configure".
...
I did not see any discussion about this particular setting in the systemd-devel mailing list
The commit that made the change: https://github.com/systemd/systemd/commit/97e5530cf20 referenced two bugreports: https://bugs.freedesktop.org/show_bug.cgi?id=94508 https://github.com/systemd/systemd/issues/2900
Michal
On Fri, May 27, 2016 at 11:51:42AM +0200, Dominique Martinet wrote:
Hi,
Just noticed this change on rawhide... https://github.com/systemd/systemd/blob/master/NEWS#L29
- systemd-logind will now by default terminate user processes that are part of the user session scope unit (session-XX.scope) when the user logs out. This behavior is controlled by the KillUserProcesses= setting in logind.conf, and the previous default of "no" is now changed to "yes". This means that user sessions will be properly cleaned up after, but additional steps are necessary to allow intentionally long-running processes to survive logout.
[...]
Sure, this change will work for the whole probably targeted audience of simple desktop users on shared workstations where we probably want to kill lingering processes; but how much is that compared to servers?
That's always debatable, but I'd say that there are orders of magnitude more desktops than servers into which random users can log in to run jobs. Also note that running jobs in a systemd service has advantages on the server: better accounting, more transparency, logs are easier to read. The (old) default of allowing left-over session processes to live on seems especially bad on a server with multiple users.
I know that if this gets through I will have to change the system default on all my servers... And while the big batches of thousands of compute nodes are automated there's still quite a few places to update, especially since this will be the first time we need to change logind.conf so it's not just adding a line to a file already propagated
It's two lines: [Login]\nKillUserProcesses=no. But please consider switching to the new mode of using systemd-run instead.
Zbyszek
PS. You asked if this was discussed on systemd-devel: it was on the bugtracker. See https://github.com/systemd/systemd/pull/3005, and also https://bugs.freedesktop.org/show_bug.cgi?id=94508.
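The two-line setting mentioned above could be rolled out as a drop-in file instead of editing logind.conf itself, which may be easier to propagate to many machines. This sketch assumes a systemd version that reads `/etc/systemd/logind.conf.d/`; the file name and the helper `write_logind_dropin` are made up for illustration.

```shell
#!/bin/sh
# Sketch: restore the previous default (KillUserProcesses=no) via a drop-in.
# write_logind_dropin is a hypothetical helper; the optional argument
# overrides the target directory so the result can be inspected first.
write_logind_dropin() {
    dir="${1:-/etc/systemd/logind.conf.d}"
    mkdir -p "$dir" || return 1
    printf '[Login]\nKillUserProcesses=no\n' > "$dir/50-keep-user-processes.conf"
}

# As root on a real system:
#   write_logind_dropin
# The setting is expected to take effect once systemd-logind has re-read
# its configuration, for example after the next reboot.
```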
Once upon a time, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl said:
Also note that running jobs in a systemd service has advantages on the server: better accounting, more transparency, logs are easier to read. The (old) default of allowing left-over session processes to live on seems especially bad on a server with multiple users.
Starting a one-off task under screen and detaching is an age-old server management process. Breaking that is not acceptable IMHO.
On Fri, May 27, 2016 at 08:09:33AM -0500, Chris Adams wrote:
Once upon a time, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl said:
Also note that running jobs in a systemd service has advantages on the server: better accounting, more transparency, logs are easier to read. The (old) default of allowing left-over session processes to live on seems especially bad on a server with multiple users.
Starting a one-off task under screen and detaching is an age-old server management process. Breaking that is not acceptable IMHO.
This change was done for a reason: left-over session processes are causing real problems. You still can start a one-off task under screen, you just need to invoke it in one of the different ways described in https://www.freedesktop.org/software/systemd/man/systemd-run.html#Examples
Zbyszek
Once upon a time, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl said:
You still can start a one-off task under screen, you just need to invoke it in one of the different ways described in https://www.freedesktop.org/software/systemd/man/systemd-run.html#Examples
This should not be made a default unless programs like screen can be modified to notify systemd directly not to kill them. Another popular use for screen is to not lose a session due to an accidental disconnect; having to change the way you run screen _every time_ is not acceptable.
Zbigniew Jędrzejewski-Szmek wrote:
On Fri, May 27, 2016 at 08:09:33AM -0500, Chris Adams wrote:
Once upon a time, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl said:
Also note that running jobs in a systemd service has advantages on the server: better accounting, more transparency, logs are easier to read. The (old) default of allowing left-over session processes to live on seems especially bad on a server with multiple users.
Starting a one-off task under screen and detaching is an age-old server management process. Breaking that is not acceptable IMHO.
This change was done for a reason: left-over session processes are causing real problems. You still can start a one-off task under screen, you just need to invoke it in one of the different ways described in https://www.freedesktop.org/software/systemd/man/systemd-run.html#Examples
Zbyszek
devel mailing list devel@lists.fedoraproject.org https://lists.fedoraproject.org/admin/lists/devel@lists.fedoraproject.org
Similarly, we need a way to fix x2go
On Fri, May 27, 2016 at 7:19 AM, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Fri, May 27, 2016 at 08:09:33AM -0500, Chris Adams wrote:
Once upon a time, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl said:
Also note that running jobs in a systemd service has advantages on the server: better accounting, more transparency, logs are easier to read. The (old) default of allowing left-over session processes to live on seems especially bad on a server with multiple users.
Starting a one-off task under screen and detaching is an age-old server management process. Breaking that is not acceptable IMHO.
This change was done for a reason: left-over session processes are causing real problems. You still can start a one-off task under screen, you just need to invoke it in one of the different ways described in https://www.freedesktop.org/software/systemd/man/systemd-run.html#Examples
I have to agree, but there is a difference in expectations depending on the system.
Fedora Workstation, I expect all processes launched by/owned by me, to be quit on logout. Actually what I expect is by telling GNOME I'm logging out, restarting, or shutting down, that it should send a quit message to all applications. Those applications should be able to interrupt this if there's unsaved data and prompt the user; but better than this would be applications that can save their own state because an application cancelling a reboot is archaic. But often this doesn't work, processes continue to keep the user-session alive because they won't stop running. So we keep seeing these problems on Fedora where the system won't reboot for 1m30s which is the systemd timeout for user sessions that haven't yet quit.
So it's a problem.
Fedora Server, I expect to login, run tmux, start sessions, detach, and log out and I expect those tmux sessions to keep running. If this workflow is going to change I need some super clean and obvious way to know the right new way to do things or I'll just get annoyed and cranky. Running tmux as a systemd service? I don't know how that'll work, and I'm very skeptical that the user should get dinged with a workflow change just because there are some stubborn programs floating around that won't quit without delay.
It seems to me systemd should be able to know the difference between a program that's zombie or unresponsive but isn't doing anything or is unresponsive but is doing something; and if not then some way for programs to say "hey wait just a minute, I need to clean things up" or whatever, rather than just abruptly killing them.
On Fri, 27 May 2016, Chris Murphy wrote:
It seems to me systemd should be able to know the difference between a program that's zombie or unresponsive but isn't doing anything or is unresponsive but is doing something; and if not then some way for programs to say "hey wait just a minute, I need to clean things up" or whatever, rather than just abruptly killing them.
That invention is otherwise known as "unix signals".
systemd should not be the process police. If there is a systematic problem of badly written code leaving orphaned code running when a user logs out, then that broken code should be fixed instead of adding another layer of process management. systemd is not capable of interpreting the user's intent.
Paul
On Fri, May 27, 2016 at 07:03:23PM -0400, Paul Wouters wrote:
On Fri, 27 May 2016, Chris Murphy wrote:
It seems to me systemd should be able to know the difference between a program that's zombie or unresponsive but isn't doing anything or is unresponsive but is doing something; and if not then some way for programs to say "hey wait just a minute, I need to clean things up" or whatever, rather than just abruptly killing them.
That invention is otherwise known as "unix signals".
systemd should not be the process police. If there is a systematic problem of badly written code leaving orphaned code running when a user logs out, then that broken code should be fixed instead of adding another layer of process management. systemd is not capable of interpreting the user's intent.
systemd *is* process police. That's the job of init.
The sentiment of fixing processes which cause a problem is nice, but it's a game of whack-a-mole that you cannot win. For example in https://bugs.freedesktop.org/show_bug.cgi?id=94508#c10 it's hp-systray and some ibus related processes. Another time it'll be some other random process that is hung or misimplemented or confused. Once you have at least one process staying around, the login session remains in "closing" state. As long as the session stays around, the user's user@.service stays around, and this means many more processes staying around. It's a problem on a single-user system because when the user logs in again the state is not clean and processes from the old session are still holding files and resources. On a multi-user system it's also a problem, for the same reasons, and also because by default you don't want users consuming resources after they have logged out.
Before cgroups came around we really didn't have a mechanism to make accounting of processes automatic, so the only possibility was to hope that processes behave nicely, and let the administrator kill misbehaved processes by hand. This applied to both system services and user sessions. With systemd as pid1 we moved to a model where system services are managed and anything they leave behind is killed. This is a corresponding change to user sessions and user services. The same as with system services, we need to figure out what the exceptions are and which user services need special handling, but the default should be to clean up everything.
Zbyszek
On May 28, 2016 6:42 AM, "Zbigniew Jędrzejewski-Szmek" zbyszek@in.waw.pl wrote:
Once you have at least one process staying around, the login session remains in "closing" state. As long as the session stays around, the user's user@.service stays around, and this means many more processes staying around.
That is a design decision on systemd's part and could be changed.
but the default should be to clean up everything.
I agree that the default should be to clean up, but I'm not sure I agree that the default should change before a large fraction of userspace is ready, and it's definitely not ready yet.
Heck, as evidenced by the bug I filed, systemd-run itself isn't ready yet, and all I did is follow systemd's instructions.
It would seem entirely reasonable to me to go through the systemwide change process, identify what needs to get fixed, and fix it.
--Andy
On 28 May 2016 14:42, "Zbigniew Jędrzejewski-Szmek" zbyszek@in.waw.pl wrote:
On Fri, May 27, 2016 at 07:03:23PM -0400, Paul Wouters wrote:
On Fri, 27 May 2016, Chris Murphy wrote:
It seems to me systemd should be able to know the difference between a program that's zombie or unresponsive but isn't doing anything or is unresponsive but is doing something; and if not then some way for programs to say "hey wait just a minute, I need to clean things up" or whatever, rather than just abruptly killing them.
That invention is otherwise known as "unix signals".
systemd should not be the process police. If there is a systematic problem of badly written code leaving orphaned code running when a user logs out, then that broken code should be fixed instead of adding another layer of process management. systemd is not capable of interpreting the user's intent.
systemd *is* process police. That's the job of init.
The sentiment of fixing processes which cause a problem is nice, but it's a game of whack-a-mole that you cannot win. For example in https://bugs.freedesktop.org/show_bug.cgi?id=94508#c10 it's hp-systray and some ibus related processes. Another time it'll be some other random process that is hung or misimplemented or confused. Once you have at least one process staying around, the login session remains in "closing" state. As long as the session stays around, the user's user@.service stays around, and this means many more processes staying around. It's a problem on a single-user system because when the user logs in again the state is not clean and processes from the old session are still holding files and resources. On a multi-user system it's also a problem, for the same reasons, and also because by default you don't want users consuming resources after they have logged out.
Before cgroups came around we really didn't have a mechanism to make accounting of processes automatic, so the only possibility was to hope that processes behave nicely, and let the administrator kill misbehaved processes by hand. This applied to both system services and user sessions. With systemd as pid1 we moved to a model where system services are managed and anything they leave behind is killed. This is a corresponding change to user sessions and user services. The same as with system services, we need to figure out what the exceptions are and which user services need special handling, but the default should be to clean up everything.
There is some sense there but I do concur with the others that this should be listed as a system wide (not even self contained) change for F25 due to the impact and the critical path component.
For now might I suggest you do a fresh rawhide build with logind.conf setting the old behaviour, and then issue a F25 change for FESCO to discuss.
Once FESCO have ruled on this then we can follow their direction in assuming the new behaviour, and document how to background a process in the F25 changes/release notes.
On Sat, May 28, 2016 at 9:41 AM, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Fri, May 27, 2016 at 07:03:23PM -0400, Paul Wouters wrote:
On Fri, 27 May 2016, Chris Murphy wrote:
It seems to me systemd should be able to know the difference between a program that's zombie or unresponsive but isn't doing anything or is unresponsive but is doing something; and if not then some way for programs to say "hey wait just a minute, I need to clean things up" or whatever, rather than just abruptly killing them.
That invention is otherwise known as "unix signals".
systemd should not be the process police. If there is a systematic problem of badly written code leaving orphaned code running when a user logs out, then that broken code should be fixed instead of adding another layer of process management. systemd is not capable of interpreting the user's intent.
systemd *is* process police. That's the job of init.
daemon police != process police, especially user processes which have nothing whatsoever to do with system daemons. If it's going to manage user personal environments, it should be a separate set of tools called "userd".
The sentiment of fixing processes which cause a problem is nice, but it's a game of whack-a-mole that you cannot win. For example in https://bugs.freedesktop.org/show_bug.cgi?id=94508#c10 it's hp-systray and some ibus related processes. Another time it'll be some other random process that is hung or misimplemented or confused. Once you have at least one process staying around, the login session remains in "closing" state. As long as the session stays around, the user's user@.service stays around, and this means many more processes staying around. It's a problem on a single-user system because when the user logs in again the state is not clean and processes from the old session are still holding files and resources. On a multi-user system it's also a problem, for the same reasons, and also because by default you don't want users consuming resources after they have logged out.
It can be a problem. Enable this kind of aggressive userland manipulation by *request*, instead of by default, and you're far less likely to break longstanding procedures such as the ssh-agent configurations I just mentioned, and the user-activation of other credentials such as Java keystore.
Before cgroups came around we really didn't have a mechanism to make accounting of processes automatic, so the only possibility was to hope that processes behave nicely, and let the administrator kill misbehaved processes by hand. This applied to both system services and user sessions. With systemd as pid1 we moved to a model where system services are managed and anything they leave behind is killed. This is a corresponding change to user sessions and user services. The same as with system services, we need to figure out what the exceptions are and which user services need special handling, but the default should be to clean up everything.
Zbyszek
On Fri, 27.05.16 19:03, Paul Wouters (paul@nohats.ca) wrote:
On Fri, 27 May 2016, Chris Murphy wrote:
It seems to me systemd should be able to know the difference between a program that's zombie or unresponsive but isn't doing anything or is unresponsive but is doing something; and if not then some way for programs to say "hey wait just a minute, I need to clean things up" or whatever, rather than just abruptly killing them.
That invention is otherwise known as "unix signals".
systemd should not be the process police. If there is a systematic problem of badly written code leaving orphaned code running when a user logs out, then that broken code should be fixed instead of adding another layer of process management. systemd is not capable of interpreting the user's intent.
Sorry, but systemd is pretty exactly this: a process babysitter. In fact, before it was named "systemd" it actually was called "BabyKit", in reference to its job of babysitting processes.
Its job is to run processes in clean and well-defined execution environments, and to ensure these execution environments are cleaned up properly afterwards.
Lennart
Lennart Poettering mzerqung@0pointer.de writes:
Sorry, but systemd is pretty exactly this: a process babysitter.
It's becoming a user nanny instead. I wish it would stop trying to enforce its "my way or the highway" approach to system rules. I've been playing whack-a-mole trying to keep up with all the tweaks I need (assuming I can find them) to let me do what I want to do with my own machine.
I am not a baby and do not need a babysitter.
On Tue, 31.05.16 15:11, DJ Delorie (dj@redhat.com) wrote:
Lennart Poettering mzerqung@0pointer.de writes:
Sorry, but systemd is pretty exactly this: a process babysitter.
It's becoming a user nanny instead. I wish it would stop trying to enforce its "my way or the highway" approach to system rules. I've been playing whack-a-mole trying to keep up with all the tweaks I need (assuming I can find them) to let me do what I want to do with my own machine.
I am not a baby and do not need a babysitter.
Thank god it's a process babysitter, not a DJ Delorie babysitter...
Lennart
On Wed, Jun 01, 2016 at 11:50:59AM +0200, Lennart Poettering wrote:
Sorry, but systemd is pretty exactly this: a process babysitter.
It's becoming a user nanny instead. I wish it would stop trying to enforce its "my way or the highway" approach to system rules. I've been playing whack-a-mole trying to keep up with all the tweaks I need (assuming I can find them) to let me do what I want to do with my own machine. I am not a baby and do not need a babysitter.
Thank god it's a process babysitter, not a DJ Delorie babysitter...
Both of you: let's please not go this direction in the thread. There's no way it's constructive.
On Fri, May 27, 2016 at 5:03 PM, Paul Wouters paul@nohats.ca wrote:
If there is a systematic problem of badly written code leaving orphaned code running when a user logs out, then that broken code should be fixed instead of adding another layer of process management. systemd is not capable of interpreting the user's intent.
That isn't working. Users are constantly running into restart and shutdown delays. Troubleshooting this to find out what process is holding things up is totally non-obvious. Identifying the process is half the problem, and then getting it fixed and released to Fedora can be months, by which time some other process is affected.
The latest one I've run into [1] I can't figure out what the culprit is. All processes have a status of S or derivative thereof. Clearly it's something in session-c1.scope since in the end that's what systemd forcibly kills. But it only does that after 90 seconds, which is just untenable. And as you can see, does the user blame gnome-shell because that's where the hang occurs? Or is it gdm because that's what owns the scope that won't quit? Or is it some process within that scope that's the problem, and if so how to find it? Non-obvious.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1337307
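For readers facing the same question of what is still running inside a session scope, a starting point might look like the sketch below. It only uses `loginctl session-status` and `systemctl status`; the helper names `session_scope` and `show_session_processes` are hypothetical, and the session ID `c1` is simply the one from the report above.

```shell
#!/bin/sh
# Sketch: list the processes logind still tracks for a given session.
# session_scope and show_session_processes are hypothetical helpers.

# Build the scope unit name for a session ID, e.g. c1 -> session-c1.scope
session_scope() {
    printf 'session-%s.scope\n' "$1"
}

show_session_processes() {
    # Session state (for instance "closing") plus its process tree
    loginctl --no-pager session-status "$1"
    # The same control group, viewed as a systemd unit
    systemctl --no-pager status "$(session_scope "$1")"
}

# Typical use:
#   loginctl list-sessions
#   show_session_processes c1
```

This shows which processes remain in the scope, though it does not by itself explain why they fail to exit.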
On Sun, 29 May 2016, Chris Murphy wrote:
On Fri, May 27, 2016 at 5:03 PM, Paul Wouters paul@nohats.ca wrote:
If there is a systematic problem of badly written code leaving orphaned code running when a user logs out, then that broken code should be fixed instead of adding another layer of process management. systemd is not capable of interpreting the user's intent.
That isn't working. Users are constantly running into restart and shutdown delays. Troubleshooting this to find out what process is holding things up is totally non-obvious. Identifying the process is half the problem, and then getting it fixed and released to Fedora can be months, by which time some other process is affected.
Taking a shotgun isn't going to help that. Actually, if "bad code upstream" is the problem, you can just wait on all of that code starting to tell systemd they want to linger to avoid getting shot by systemd for doing something wrong.
So this whole thing becomes another abstraction layer that serves no purpose, and just causes collateral damage.
Paul
On Mon, May 30, 2016 at 7:12 AM, Paul Wouters paul@nohats.ca wrote:
On Sun, 29 May 2016, Chris Murphy wrote:
On Fri, May 27, 2016 at 5:03 PM, Paul Wouters paul@nohats.ca wrote:
If there is a systematic problem of badly written code leaving orphaned code running when a user logs out, then that broken code should be fixed instead of adding another layer of process management. systemd is not capable of interpreting the user's intent.
That isn't working. Users are constantly running into restart and shutdown delays. Troubleshooting this to find out what process is holding things up is totally non-obvious. Identifying the process is half the problem, and then getting it fixed and released to Fedora can be months, by which time some other process is affected.
Taking a shotgun isn't going to help that. Actually, if "bad code upstream" is the problem, you can just wait on all of that code starting to tell systemd they want to linger to avoid getting shot by systemd for doing something wrong.
The metaphor is vaguely entertaining, but it's not like the shotgun is the first or second action in the sequence. It's only once the first two fail that the shotgun method is invoked, which by the way happens anyway on restart and shutdown 1m30s later.
I think there's a distinction: incidentally bad code causing these logout and restart delays is one thing, whereas code starting to request linger when this isn't strictly necessary is intentional subterfuge.
So this whole thing becomes another abstraction layer that serves no purpose, and just causes collateral damage.
For restart and shutdown it seems uncontroversial behavior to obtain escalated action requested by the user. The gray area might be strictly with logouts. So is there a way to carve out the more aggressive clean up of user processes only at restart/shutdown time, while leaving less aggressive clean ups when merely logging out?
On 05/29/2016 05:14 PM, Chris Murphy wrote:
On Fri, May 27, 2016 at 5:03 PM, Paul Wouters paul@nohats.ca wrote:
If there is a systematic problem of badly written code leaving orphaned code running when a user logs out, then that broken code should be fixed instead of adding another layer of process management. systemd is not capable of interpreting the user's intent.
That isn't working. Users are constantly running into restart and shutdown delays. Troubleshooting this to find out what process is holding things up is totally non-obvious. Identifying the process is half the problem, and then getting it fixed and released to Fedora can be months, by which time some other process is affected.
But the delays in system shutdown/restart were introduced by systemd in the first place. I would argue that these delays are much too long.
On Tue, May 31, 2016 at 03:04:52PM -0600, Orion Poplawski wrote:
On 05/29/2016 05:14 PM, Chris Murphy wrote:
On Fri, May 27, 2016 at 5:03 PM, Paul Wouters paul@nohats.ca wrote:
If there is a systematic problem of badly written code leaving orphaned code running when a user logs out, then that broken code should be fixed instead of adding another layer of process management. systemd is not capable of interpreting the user's intent.
That isn't working. Users are constantly running into restart and shutdown delays. Troubleshooting this to find out what process is holding things up is totally non-obvious. Identifying the process is half the problem, and then getting it fixed and released to Fedora can be months, by which time some other process is affected.
But the delays in system shutdown/restart were introduced by systemd in the first place. I would argue that these delays are much too long.
Set DefaultTimeoutStartSec= in /etc/systemd/systemd.conf.
The default default has to be fairly high because of the wide range of hardware that people are using (e.g. a 10 year old uni-processor machine swapping to a slow rotational disk).
Zbyszek
On Fri, Jun 03, 2016 at 01:44:46AM +0000, Zbigniew Jędrzejewski-Szmek wrote:
Set DefaultTimeoutStartSec= in /etc/systemd/systemd.conf.
(Note typo: /etc/systemd/system.conf, with no d in the filename.)
FWIW, the default is 90s
Chris Murphy wrote:
On Fri, May 27, 2016 at 7:19 AM, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Fri, May 27, 2016 at 08:09:33AM -0500, Chris Adams wrote:
Once upon a time, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl said:
Also note that running jobs in a systemd service has advantages on the server: better accounting, more transparency, logs are easier to read. The (old) default of allowing left-over session processes to live on seems especially bad on a server with multiple users.
Starting a one-off task under screen and detaching is an age-old server management process. Breaking that is not acceptable IMHO.
This change was done for a reason: left-over session processes are causing real problems. You still can start a one-off task under screen, you just need to invoke it in one of the different ways described in https://www.freedesktop.org/software/systemd/man/systemd-run.html#Examples
I have to agree, but there is a difference in expectations depending on the system.
Fedora Workstation, I expect all processes launched by/owned by me, to be quit on logout. Actually what I expect is by telling GNOME I'm logging out, restarting, or shutting down, that it should send a quit message to all applications. Those applications should be able to interrupt this if there's unsaved data and prompt the user; but better than this would be applications that can save their own state because an application cancelling a reboot is archaic. But often this doesn't work, processes continue to keep the user-session alive because they won't stop running. So we keep seeing these problems on Fedora where the system won't reboot for 1m30s which is the systemd timeout for user sessions that haven't yet quit.
If all you care about is your desktop environment getting cleaned up, then you're looking for a feature that is only activated within a desktop context. Historically, people accomplished this by adding commands to .xinitrc. Making this a system-wide default, whether you used a GUI desktop environment or not, is grossly mis-targeting this "fix".
So it's a problem.
Fedora Server, I expect to login, run tmux, start sessions, detach, and log out and I expect those tmux sessions to keep running. If this workflow is going to change I need some super clean and obvious way to know the right new way to do things or I'll just get annoyed and cranky. Running tmux as a systemd service? I don't know how that'll work, and I'm very skeptical that the user should get dinged with a workflow change just because there are some stubborn programs floating around that won't quit without delay.
It seems to me systemd should be able to know the difference between a program that's zombie or unresponsive but isn't doing anything or is unresponsive but is doing something; and if not then some way for programs to say "hey wait just a minute, I need to clean things up" or whatever, rather than just abruptly killing them.
On 05/27/2016 01:54 PM, Chris Murphy wrote:
It seems to me systemd should be able to know the difference between a program that's zombie or unresponsive but isn't doing anything or is unresponsive but is doing something; and if not then some way for programs to say "hey wait just a minute, I need to clean things up" or whatever, rather than just abruptly killing them.
I think our technical debt is catching up with us, because there's no consistent way to treat some processes as persistent, and others as disposable: we just did this type of process management by hand. Solving this systematically may be tricky because there are many different scenarios. Processes can be:
- totally disposable across logins, shouldn't even be relaunched on logging back
- disposable across login, but should reappear, e.g. the calendar on the desktop
- should keep running, e.g. the battery tester collecting the battery discharge data I am running now
- should keep running and restart if killed/crashed/rebooted: e.g. a weather/thermostat monitoring app
Systemd at least offers facilities to manage processes that, I think, allow for all those use cases. The difficulty is that systemd is fairly complex, and it would be nice if those use cases were easily available to a desktop user who may not be intimately familiar with writing systemd unit descriptions. I don't even know what would be an appropriate workflow for dealing with this: a right-click GUI option? a command-line wrapper that runs the process as a service? Maybe some anointed processes like tmux/screen should automatically be exempt from termination? How about shell background processes: should an explicit & mean that the process keeps running across logouts?
On 05/27/2016 03:19 PM, Zbigniew Jędrzejewski-Szmek wrote:
This change was done for a reason: left-over session processes are causing real problems.
The original error was in fact having random processes spawned without user consent: configuration handlers, sound mixing, policy handlers and other stuff.
Now a problem is solved by adding a new problem.
On Fri, 27.05.16 08:09, Chris Adams (linux@cmadams.net) wrote:
Once upon a time, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl said:
Also note that running jobs in a systemd service has advantages on the server: better accounting, more transparency, logs are easier to read. The (old) default of allowing left-over session processes to live on seems especially bad on a server with multiple users.
Starting a one-off task under screen and detaching is an age-old server management process. Breaking that is not acceptable IMHO.
And it is still supported.
In my view it was actually quite strange of UNIX that it by default let arbitrary user code stay around unrestricted after logout. It has been discussed for ages now among many OS people, that this should be possible but certainly not be the default, but nobody dared so far to flip the switch to turn it from a default to an option. Not cleaning up user sessions after logout is not only ugly and somewhat hackish but also a security problem.
systemd 230 now finally flipped the switch and finally by default cleans everything up correctly when the user logs out. But we do so in a very conservative way actually:
a) there's a compile time switch to turn this off globally (--without-kill-user-processes, not used in Fedora)
b) there's a runtime switch to turn this off locally on the system (in logind.conf)
c) there's a way to opt out individually for each user and each task from the cleanup logic, via systemd-run/loginctl linger. This operation goes through PK, and thus can be configured in a more strict or more open policy, depending on what the admin prefers.
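For reference, the runtime and per-user/per-task opt-outs in (b) and (c) could look roughly like the sketch below. The function names and the drop-in file name are hypothetical, chosen only for illustration; the `[Login]` section, the `KillUserProcesses=` setting, `systemd-run --scope --user` and `loginctl enable-linger` are the mechanisms named in the announcement and in the systemd-run(1) and loginctl(1) man pages. On a real system the drop-in directory would presumably be /etc/systemd/logind.conf.d and writing to it needs root.

```shell
#!/bin/sh
# Sketch only: function names and the drop-in file name are hypothetical.

# (b) System-wide runtime switch: write a logind drop-in that restores the
# previous default. $1 = target directory (assumed to be
# /etc/systemd/logind.conf.d on a real system, written as root).
write_kill_dropin() {
    mkdir -p "$1" &&
    printf '[Login]\nKillUserProcesses=no\n' > "$1/50-keep-user-processes.conf"
}

# (c) Per task: start screen in its own scope unit underneath user@.service,
# so it is not part of the session scope that gets cleaned up on logout.
screen_in_scope() {
    systemd-run --scope --user screen "$@"
}

# (c) Per user: enable lingering so user@.service outlives the last session.
# Defaults to the calling user when no name is given.
allow_linger() {
    loginctl enable-linger "${1:-$USER}"
}
```

None of the functions run anything until called; logind also has to re-read its configuration before a new drop-in takes effect.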
I am pretty sure we should consider it our duty as Fedora developers to improve the Linux platform, and I am pretty sure that properly cleaning up processes on logout is a step towards that, not against it.
Lennart
Once upon a time, Lennart Poettering mzerqung@0pointer.de said:
I am pretty sure we should consider it our duty as Fedora developers to improve the Linux platform, and I am pretty sure that properly cleaning up processes on logout is a step towards that, not against it.
When you "clean up" by killing things that are designed to run after logout, you are being over-zealous. It is incumbent upon you to fix your cleanup methods to handle this case, not the thousands of users to change their process to avoid your broken methods.
On 27/05/16 15:25, Chris Adams wrote:
Once upon a time, Lennart Poettering mzerqung@0pointer.de said:
I am pretty sure we should consider it our duty as Fedora developers to improve the Linux platform, and I am pretty sure that properly cleaning up processes on logout is a step towards that, not against it.
When you "clean up" by killing things that are designed to run after logout, you are being over-zealous. It is incumbent upon you to fix your cleanup methods to handle this case, not the thousands of users to change their process to avoid your broken methods.
But that's effectively calling for the impossible - if there has historically been no way for things to announce that they are expected to remain after exit then there's no way to magically identify them now.
What would be good would be if screen and similar programs could learn to do the necessary magic automatically rather than having to be wrapped in systemd-run by the user.
Tom
Once upon a time, Tom Hughes tom@compton.nu said:
On 27/05/16 15:25, Chris Adams wrote:
When you "clean up" by killing things that are designed to run after logout, you are being over-zealous. It is incumbent upon you to fix your cleanup methods to handle this case, not the thousands of users to change their process to avoid your broken methods.
But that's effectively calling for the impossible - if there has historically been no way for things to announce that they are expected to remain after exit then there's no way to magically identify them now.
What would be good would be if screen and similar programs could learn to do the necessary magic automatically rather than having to be wrapped in systemd-run by the user.
And that's what I'm suggesting. Before making this the default in systemd, the systemd developers should come up with a way for processes to make some type of notification, and then get patches into the common things (screen and tmux at least, with freely-licensed example code for others to use) to handle this. Only then should the default be changed.
Flipping the switch and saying everybody has to change their behavior to match is rude. Working with people to transparently handle very common work processes is much more acceptable.
On Fri, May 27, 2016 at 09:36:58AM -0500, Chris Adams wrote:
Once upon a time, Tom Hughes tom@compton.nu said:
On 27/05/16 15:25, Chris Adams wrote:
When you "clean up" by killing things that are designed to run after logout, you are being over-zealous. It is incumbent upon you to fix your cleanup methods to handle this case, not the thousands of users to change their process to avoid your broken methods.
But that's effectively calling for the impossible - if there has historically been no way for things to announce that they are expected to remain after exit then there's no way to magically identify them now.
What would be good would be if screen and similar programs could learn to do the necessary magic automatically rather than having to be wrapped in systemd-run by the user.
And that's what I'm suggesting. Before making this the default in systemd, the systemd developers should come up with a way for processes to make some type of notification, and then get patches into the common things (screen and tmux at least, with freely-licensed example code for others to use) to handle this. Only then should the default be changed.
On Fr, 2016-05-27 at 15:29 +0100, Tom Hughes wrote:
On 27/05/16 15:25, Chris Adams wrote:
Once upon a time, Lennart Poettering mzerqung@0pointer.de said:
I am pretty sure we should consider it our duty as Fedora developers to improve the Linux platform, and I am pretty sure that properly cleaning up processes on logout is a step towards that, not against it.
When you "clean up" by killing things that are designed to run after logout, you are being over-zealous. It is incumbent upon you to fix your cleanup methods to handle this case, not the thousands of users to change their process to avoid your broken methods.
But that's effectively calling for the impossible - if there has historically been no way for things to announce that they are expected to remain after exit then there's no way to magically identify them now.
Wrong.
There is a way, and it's called SIGHUP. Processes get that signal if your controlling terminal is gone, which typically happens for background processes on logout. Default action for the signal is to terminate the process. So if you run "tail -f $somelog &", forget about it, then logout (or close the xterm) the SIGHUP will kill it.
But it's also possible for programs to ignore and/or handle SIGHUP. Simplest case is the "nohup" utility which basically sets the SIGHUP action to ignore then goes execute whatever you specified on the command line. IIRC vim catches SIGHUP to save the current buffer somewhere before exiting so your unsaved work isn't lost. And of course tools like screen and tmux don't exit but simply detach from terminal on SIGHUP.
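A minimal shell sketch of the difference, assuming a POSIX sh: the helper name `hup_demo` is made up for illustration, and the empty trap stands in for what nohup arranges before it executes the real command.

```shell
#!/bin/sh
# Hypothetical helper: start a child shell that sends SIGHUP to itself.
# With "ignore" the child first installs an empty trap, which sets the
# SIGHUP action to ignore; otherwise the inherited action applies.
hup_demo() {
    if [ "$1" = "ignore" ]; then
        sh -c 'trap "" HUP; kill -HUP $$; echo survived'
    else
        sh -c 'kill -HUP $$; echo survived'
    fi
}

# Child ignores SIGHUP and carries on to the echo.
hup_demo ignore

# With the default action the child is terminated before it reaches the echo
# (unless the caller itself was started with SIGHUP already ignored).
hup_demo default || echo "child was terminated by SIGHUP"
```

This only illustrates the signal disposition; it says nothing about what logind does with the session scope afterwards.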
There is nothing similar for gui processes (which typically don't have a controlling tty) though. But having gui processes continue when your display server is gone looks pretty pointless to me, except for save-state-then-exit style actions (like firefox does so it can offer to restore tabs next time you start it).
IMO systemd should allow to specify the KillUserProcesses policy separately for processes with/without controlling terminal. So you could ask systemd to zap any gnome process going wild on logout without breaking screen and tmux.
While at it: I also think a logging option would be useful. If a (gui) process doesn't exit by itself on logout but needs to be killed, this indicates a bug and you might want to know about it.
cheers, Gerd
On Tue, 31.05.16 11:31, Gerd Hoffmann (kraxel@redhat.com) wrote:
IMO systemd should allow to specify the KillUserProcesses policy separately for processes with/without controlling terminal. So you could ask systemd to zap any gnome process going wild on logout without breaking screen and tmux.
Again, as mentioned before: key here is that permitting user processes to stick around after all sessions of the user ended needs to be a privileged concept. It should not be allowed for user code to stick around after logout, unless this is explicitly permitted by the admin, and this hence needs to be enforced by privileged code.
Hence, whether a process reacts to SIGHUP or SIGTERM or not is not suitable at all as indication on whether to permit them to stay around or not, because that's something that is exclusively up to the processes themselves, and requires no privileges at all to make use of.
Lennart
On Di, 2016-05-31 at 11:56 +0200, Lennart Poettering wrote:
On Tue, 31.05.16 11:31, Gerd Hoffmann (kraxel@redhat.com) wrote:
IMO systemd should allow to specify the KillUserProcesses policy separately for processes with/without controlling terminal. So you could ask systemd to zap any gnome process going wild on logout without breaking screen and tmux.
Again, as mentioned before: key here is that permitting user processes to stick around after all sessions of the user ended needs to be a privileged concept.
Sure.
It should not be allowed for user code to stick around after logout, unless this is explicitly permitted by the admin,
Having a switch for that so the admin can permit or deny this is fine. Having it default to "deny" is rude IMO.
Traditionally this has worked and there are some very reasonable use cases for this. Logouts are not always intentional. If some internet hiccup breaks your ssh connection you are logged out too, and the usual way to avoid your remote session being killed by that event is running your stuff in screen.
Hence, whether a process reacts to SIGHUP or SIGTERM or not is not suitable at all as indication on whether to permit them to stay around or not, because that's something that is exclusively up to the processes themselves, and requires no privileges at all to make use of.
It's still useful as an indication of whether the user wants the processes to stay around or not. So systemd should IMO use it for a more fine-grained control over the kill behavior.
Maybe turn KillUserProcesses into a tristate: Yes/WithoutTerminal/No.
Maybe make it depend on the "linger" option to avoid a new config switch. From a security point of view that makes sense: if the admin allows users to enable lingering for themselves they are able to leave processes running after logout anyway, so systemd could also allow that using the traditional unix way via SIGHUP handler.
cheers, Gerd
Lennart Poettering mzerqung@0pointer.de writes:
Again, as mentioned before: key here is that permitting user processes to stick around after all sessions of the user ended needs to be a privileged concept. It should not be allowed for user code to stick around after logout, unless this is explicitly permitted by the admin, and this hence needs to be enforced by privileged code.
How many Fedora installs are multi-user these days? How many single-user desktops are we afflicting with a "you must ask an admin" rule, when there is no admin besides the user sitting at the keyboard?
Any rule that tries to split users into "unprivileged" and "admin" is short-sighted.
On Tue, May 31, 2016 at 3:20 PM, DJ Delorie dj@redhat.com wrote:
Lennart Poettering mzerqung@0pointer.de writes:
Again, as mentioned before: key here is that permitting user processes to stick around after all sessions of the user ended needs to be a privileged concept. It should not be allowed for user code to stick around after logout, unless this is explicitly permitted by the admin, and this hence needs to be enforced by privileged code.
How many Fedora installs are multi-user these days? How many single-user desktops are we afflicting with a "you must ask an admin" rule, when there is no admin besides the user sitting at the keyboard?
Any rule that tries to split users into "unprivileged" and "admin" is short-sighted.
I'm actually in agreement with DJ here. For systems that -are- multi-user, having some users able to linger and others not makes sense. What if the Anaconda team changed it so the "Make this user an administrator" checkbox also enabled linger? This way those that are not meant to be administrators (and therefore are likely not 'advanced users') can't persist by default.
On Tue, 2016-05-31 at 15:26 -0400, Eric Griffith wrote:
What if the Anaconda team changed it so the "Make this user an administrator" checkbox also enabled linger?
anaconda team is (rightly) opposed to anything like this, in terms of magic code in anaconda that changes things. All that box does is put the user in the wheel group (and maybe tell PolicyKit they're an admin, I forget if that's a separate thing or not). It does not and will not do anything else. Anything else has to be achieved in terms of saying 'wheel members / PK admins can do X'.
DJ Delorie wrote:
Lennart Poettering mzerqung@0pointer.de writes:
Again, as mentioned before: key here is that permitting user processes to stick around after all sessions of the user ended needs to be a privileged concept. It should not be allowed for user code to stick around after logout, unless this is explicitly permitted by the admin, and this hence needs to be enforced by privileged code.
How many Fedora installs are multi-user these days? How many single-user desktops are we afflicting with a "you must ask an admin" rule, when there is no admin besides the user sitting at the keyboard?
Any rule that tries to split users into "unprivileged" and "admin" is short-sighted.
Agreed. And the basic premise is utterly wrong. The user was obviously permitted to login to the machine, they are therefore permitted to run processes on the machine. Whether their shell process stays alive or not is utterly irrelevant, any other processes that continue to run after their login shell terminates is still legitimately using the machine. To call running without a control terminal "privileged" is inventing new definitions out of thin air. There is no logical basis for it. The entire premise is invalid.
On May 31, 2016 3:24 PM, "Howard Chu" hyc@symas.com wrote:
DJ Delorie wrote:
Lennart Poettering mzerqung@0pointer.de writes:
Again, as mentioned before: key here is that permitting user processes to stick around after all sessions of the user ended needs to be a privileged concept. It should not be allowed for user code to stick around after logout, unless this is explicitly permitted by the admin, and this hence needs to be enforced by privileged code.
How many Fedora installs are multi-user these days? How many single-user desktops are we afflicting with a "you must ask an admin" rule, when there is no admin besides the user sitting at the keyboard?
Any rule that tries to split users into "unprivileged" and "admin" is short-sighted.
Agreed. And the basic premise is utterly wrong. The user was obviously permitted to login to the machine, they are therefore permitted to run processes on the machine. Whether their shell process stays alive or not is utterly irrelevant, any other processes that continue to run after their login shell terminates is still legitimately using the machine. To call running without a control terminal "privileged" is inventing new definitions out of thin air. There is no logical basis for it. The entire premise is invalid.
Sure it is. An admin might reasonably want users who aren't logged in not to have processes running. So there are really two issues here:
1. What's a reasonable default? I would venture that allowing processes to persist is the right default. (Current Fedora 24 appears to be set up like that, although the behavior is awkward.)
2. Assuming that users are allowed to have persistent processes, how do they make them persistent? I would argue that the current approach of twiddling both loginctl (via polkit) *and* using systemd-run (currently buggy) is too complicated and is a poor API / user experience.
--Andy
On Tue, May 31, 2016 at 4:23 PM, Howard Chu hyc@symas.com wrote:
DJ Delorie wrote:
Lennart Poettering mzerqung@0pointer.de writes:
Again, as mentioned before: key here is that permitting user processes to stick around after all sessions of the user ended needs to be a privileged concept. It should not be allowed for user code to stick around after logout, unless this is explicitly permitted by the admin, and this hence needs to be enforced by privileged code.
How many Fedora installs are multi-user these days? How many single-user desktops are we afflicting with a "you must ask an admin" rule, when there is no admin besides the user sitting at the keyboard?
Any rule that tries to split users into "unprivileged" and "admin" is short-sighted.
Agreed. And the basic premise is utterly wrong. The user was obviously permitted to login to the machine, they are therefore permitted to run processes on the machine. Whether their shell process stays alive or not is utterly irrelevant, any other processes that continue to run after their login shell terminates is still legitimately using the machine. To call running without a control terminal "privileged" is inventing new definitions out of thin air. There is no logical basis for it. The entire premise is invalid.
The consistent theme by all parties I'm hearing is that there should be better sanctioning for the bad apples. Right now the perception of this feature is that sanctioning is impacting users and the upstreams of non-offending tools, more than it's impacting the actual bad apples that are the impetus behind the feature.
Dne 31.5.2016 v 21:20 DJ Delorie napsal(a):
Lennart Poettering mzerqung@0pointer.de writes:
Again, as mentioned before: key here is that permitting user processes to stick around after all sessions of the user ended needs to be a privileged concept. It should not be allowed for user code to stick around after logout, unless this is explicitly permitted by the admin, and this hence needs to be enforced by privileged code.
How many Fedora installs are multi-user these days? How many single-user desktops are we afflicting with a "you must ask an admin" rule, when there is no admin besides the user sitting at the keyboard?
How many users log out if they leave their (single-user) computer? I myself lock the computer, so everything keeps running and this problem appears to be artificial in this context.
Vít
On Jun 1, 2016 3:03 AM, "Vít Ondruch" vondruch@redhat.com wrote:
Dne 31.5.2016 v 21:20 DJ Delorie napsal(a):
Lennart Poettering mzerqung@0pointer.de writes:
Again, as mentioned before: key here is that permitting user processes to stick around after all sessions of the user ended needs to be a privileged concept. It should not be allowed for user code to stick around after logout, unless this is explicitly permitted by the admin, and this hence needs to be enforced by privileged code.
How many Fedora installs are multi-user these days? How many single-user desktops are we afflicting with a "you must ask an admin" rule, when there is no admin besides the user sitting at the keyboard?
How many users log out if they leave their (single-user) computer? I myself lock the computer, so everything keeps running and this problem appears to be artificial in this context.
Me, all the time. I ssh in, start a job in tmux, and log out.
--Andy
-----Original Message----- From: Vít Ondruch [mailto:vondruch@redhat.com] Sent: Wednesday, June 01, 2016 06:03 To: devel@lists.fedoraproject.org Subject: Re: systemd 230 change - KillUserProcesses defaults to yes
How many users log out if they leave their (single-user) computer? I myself lock the computer, so everything keeps running and this problem appears to be artificial in this context.
Vít
Thank you! While I find this debate interesting and appreciate both sides of the argument, I had to wonder the same. I only log out because I need to reboot. I only log in after that, or when my workstation crashed. Otherwise, it's simply locked. I wish I could complete what I'm working on in a single session, but my workflow is such that I'd be much happier if a single session could survive for months (or more but now I'm fantasizing). I use screen/tmux at times, but generally only to multiplex a single TTY when X isn't available (e.g., to tail a log in real time while I fiddle in the shell). Otherwise I'd much rather utilize my X window manager. If the host I'm accessing is not my workstation, then it's almost always ssh (in an xterm).
Given all this, it shouldn't be hard to imagine I'd prefer the proposed change but I have no qualms in changing the default to suit my purposes -- I've been a deviant ever since RHL switched bash from vi mode to emacs mode. :-) (Anybody remember when that occurred? 5.x?)
-- John Florian
On Fri, May 27, 2016 at 9:58 AM Lennart Poettering mzerqung@0pointer.de wrote:
On Fri, 27.05.16 08:09, Chris Adams (linux@cmadams.net) wrote:
Once upon a time, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl said:
Also note that running jobs in a systemd service has advantages on the server: better accounting, more transparency, logs are easier to read. The (old) default of allowing left-over session processes to live on seems especially bad on a server with multiple users.
Starting a one-off task under screen and detaching is an age-old server management process. Breaking that is not acceptable IMHO.
And it is still supported.
In my view it was actually quite strange of UNIX that it by default let arbitrary user code stay around unrestricted after logout. It has been discussed for ages now among many OS people, that this should be possible but certainly not be the default, but nobody dared so far to flip the switch to turn it from a default to an option. Not cleaning up user sessions after logout is not only ugly and somewhat hackish but also a security problem.
[snip]
Apologies for a metaphor, but...
Imagine a map of a terrain, and a transparent plastic overlay containing landmarks. Most of the time, people find it valuable to view the map with the overlay laid on top of it. But, sometimes it's useful to remove the overlay and look at the natural terrain. It would be a mistake to think that the only perspective is the one with the overlay on top... and it would be a big mistake to glue the overlay down so that particular perspective is effectively enforced.
The "login" concept here seems to me nothing more than a conceptual overlay of what's going on underneath (running user processes). Sure, it's a convenient way of describing a particular experience with a computer. But, it's not the only way to describe that experience. One could also describe it as a graph of arbitrary processes.
It seems to me that what's happening is that systemd is now enforcing this "login session" perspective... metaphorically speaking, gluing the transparent overlay onto the map (but don't worry! they also provide a special adhesive remover!). This makes it that much harder for people to make use of what's underneath without viewing it through the overlay... which, as it turns out, is a *very* common thing to do (screen, tmux, nohup, etc.).
Whether or not this as default is a good thing in the long run, I don't know. I can see pros and cons (ease of cleanup / unexpected behavior for a big group of folks). However, I am concerned that it seems the conceptual perspective of a "login" is now being enforced within the internals. I think it's a mistake to think that the internals *must* match our human experience/understanding from the outside (the experience of a "login" session/environment), and this change appears to be stepping in that direction.
Perhaps one intermediate compromise would be that, instead of being required to use systemd-run, users could have a whitelist of processes (like screen, tmux, etc.) which are not killed as "cleanup". (Clearly, "screen" is intentionally long-running, and should never be treated as "leftovers" from a login session. I'm sure there are others which would fall under this scenario too.)
On 05/27/2016 12:45 PM, Christopher wrote:
It seems to me that what's happening is that systemd is now enforcing this "login session" perspective... metaphorically speaking, gluing the transparent overlay onto the map (but don't worry! they also provide a special adhesive remover!). This makes it that much harder for people to make use of what's underneath without viewing it through the overlay... which, as it turns out, is a *very* common thing to do (screen, tmux, nohup, etc.).
This is a very good observation. The 'login' infrastructure deals with authorization to run processes on the computer, which is orthogonal to managing characteristics of individual processes, such as whether they are transient or persistent. Admittedly, the logout process has to deal with the lingering processes: Windows, for instance, throws a dialog box asking to terminate the apps. This is somehow a violation of layering which I just pointed out above, but I think it is correct in asking for user intent.
In any case, the common use case nowadays is a personal device, where this whole issue is somehow moot: there are no multiple users, the user is the administrator, and the login session is really from startup to shutdown---so the proposed change doesn't change the user-visible behavior much, except making the reboot quicker.
Actually, how does this proposal deal with network logins? If I SSH to another system and run backup in the background, will it kill it when I log out?
On 06/01/2016 02:19 PM, Przemek Klosowski wrote:
On 05/27/2016 12:45 PM, Christopher wrote:
It seems to me that what's happening is that systemd is now enforcing this "login session" perspective... metaphorically speaking, gluing the transparent overlay onto the map (but don't worry! they also provide a special adhesive remover!). This makes it that much harder for people to make use of what's underneath without viewing it through the overlay... which, as it turns out, is a *very* common thing to do (screen, tmux, nohup, etc.).
This is a very good observation. The 'login' infrastructure deals with authorization to run processes on the computer, which is orthogonal to managing characteristics of individual processes, such as whether they are transient or persistent. Admittedly, the logout process has to deal with the lingering processes: Windows, for instance, throws a dialog box asking to terminate the apps. This is somehow a violation of layering which I just pointed out above, but I think it is correct in asking for user intent.
In any case, the common use case nowadays is a personal device, where this whole issue is somehow moot: there are no multiple users, the user is the administrator, and the login session is really from startup to shutdown---so the proposed change doesn't change the user-visible behavior much, except making the reboot quicker.
Actually, how does this proposal deal with network logins? If I SSH to another system and run backup in the background, will it kill it when I log out?
That's too broad of a question. If the backup is provided by a service running as a daemon on the system bus or as a system-wide systemd unit and all your session is doing is telling it to start, it will continue to run.
If you have your user set to "linger", it will continue to run.
If you just have it running in the background, it would have died when you logged out before this change (depending on shell behavior; some shells reparent background tasks so it's ambiguous).
If you had it running under a 'screen' session, previously it would have kept running, now it would exit.
So the "how" makes a big difference.
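For the "linger" case mentioned above, here is a minimal sketch of how one might check whether lingering is switched on for a user. It assumes logind records lingering as a marker file named after the user under /var/lib/systemd/linger (the usual default location, but treat it as an assumption); linger_enabled is a hypothetical helper name, not a systemd command.

```shell
#!/bin/sh
# Sketch only: report whether lingering appears to be enabled for a user.
# Assumption: logind keeps one marker file per lingering user in
# /var/lib/systemd/linger; pass another directory as $2 if yours differs.
linger_enabled() {
    user="$1"
    dir="${2:-/var/lib/systemd/linger}"
    [ -e "$dir/$user" ]
}

# Usage (not executed here):
#   loginctl enable-linger "$USER"     # opt in; goes through polkit
#   linger_enabled "$USER" && echo "lingering is on for $USER"
```

Checking the marker file keeps the helper free of any D-Bus dependency, which may help on systems where the session bus or polkit is not available.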
On Wed, Jun 1, 2016 at 2:19 PM, Przemek Klosowski <przemek.klosowski@nist.gov> wrote:
On 05/27/2016 12:45 PM, Christopher wrote:
It seems to me that what's happening is that systemd is now enforcing this "login session" perspective... metaphorically speaking, gluing the transparent overlay onto the map (but don't worry! they also provide a special adhesive remover!). This makes it that much harder for people to make use of what's underneath without viewing it through the overlay... which, as it turns out, is a *very* common thing to do (screen, tmux, nohup, etc.).
This is a very good observation. The 'login' infrastructure deals with authorization to run processes on the computer, which is orthogonal to managing characteristics of individual processes, such as whether they are transient or persistent. Admittedly, the logout process has to deal with the lingering processes: Windows, for instance, throws a dialog box asking to terminate the apps. This is somehow a violation of layering which I just pointed out above, but I think it is correct in asking for user intent.
In any case, the common use case nowadays is a personal device, where this whole issue is somehow moot: there are no multiple users, the user is the administrator, and the login session is really from startup to shutdown---so the proposed change doesn't change the user-visible behavior much, except making the reboot quicker.
I think one needs to be careful with even this assumption though. I have used my server in a hybrid fashion, where I'll log into it both in a desktop environment and via SSH and use the same tmux window or backgrounded processes from each. Killing these processes just because I started them from the desktop instead of via SSH is not an agreeable default.
On 27.05.2016 15:57, Lennart Poettering wrote:
On Fri, 27.05.16 08:09, Chris Adams (linux@cmadams.net) wrote:
Once upon a time, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl said:
Also note that running jobs in a systemd service has advantages on the server: better accounting, more transparency, logs are easier to read. The (old) default of allowing left-over session processes to live on seems especially bad on a server with multiple users.
Starting a one-off task under screen and detaching is an age-old server management process. Breaking that is not acceptable IMHO.
And it is still supported.
In my view it was actually quite strange of UNIX that it by default let arbitrary user code stay around unrestricted after logout. It has been discussed for ages now among many OS people, that this should be possible but certainly not be the default, but nobody dared so far to flip the switch to turn it from a default to an option. Not cleaning up user sessions after logout is not only ugly and somewhat hackish but also a security problem.
systemd 230 now finally flipped the switch and finally by default cleans everything up correctly when the user logs out. But we do so in a very conservative way actually:
a) there's a compile time switch to turn this off globally (--without-kill-user-processes, not used in Fedora)
b) there's a runtime switch to turn this off locally on the system (in logind.conf)
c) there's a way to opt-out individually for each user and each task from the cleanup logic, via systemd-run/loginctl linger. This operation goes through PK, and thus can be configured in a more strict or more open policy, depending on what the admin prefers.
Would it make sense to add an extended file attribute that allows systemd to query if an application should be killed or not? screen and tmux could have those added as long as they don't talk to systemd.
Either this extended attribute could be checked lazily or the elf process loader could inform systemd about the new requested scope.
Has this been looked at?
Bye, Hannes
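As a rough illustration of option (b) in the quoted list, the runtime switch can be set with a drop-in rather than by editing logind.conf itself. This is a sketch under assumptions: it relies on the logind.conf.d drop-in directory being honoured by the installed systemd, and both the file name 50-keep-user-processes.conf and the helper name write_kill_override are made up for this example.

```shell
#!/bin/sh
# Sketch only: restore the old behaviour (KillUserProcesses=no) system-wide.
# Assumption: /etc/systemd/logind.conf.d/*.conf drop-ins are read by logind.
# The drop-in file name is hypothetical; any name ending in .conf would do.
write_kill_override() {
    dir="${1:-/etc/systemd/logind.conf.d}"
    mkdir -p "$dir" || return 1
    # The same two lines quoted elsewhere in this thread.
    printf '[Login]\nKillUserProcesses=no\n' > "$dir/50-keep-user-processes.conf"
}

# Usage (as root, not executed here):
#   write_kill_override
# logind reads its configuration when it starts, so the setting should
# apply after the service is restarted or the machine is rebooted.
```

The directory is a parameter only so the helper can be tried against a scratch directory before touching /etc.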
On 31 May 2016 at 11:10, Hannes Frederic Sowa hannes@stressinduktion.org wrote:
On 27.05.2016 15:57, Lennart Poettering wrote:
On Fri, 27.05.16 08:09, Chris Adams (linux@cmadams.net) wrote:
Once upon a time, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl said:
Also note that running jobs in a systemd service has advantages on the server: better accounting, more transparency, logs are easier to read. The (old) default of allowing left-over session processes to live on seems especially bad on a server with multiple users.
Starting a one-off task under screen and detaching is an age-old server management process. Breaking that is not acceptable IMHO.
And it is still supported.
In my view it was actually quite strange of UNIX that it by default let arbitrary user code stay around unrestricted after logout. It has been discussed for ages now among many OS people, that this should be possible but certainly not be the default, but nobody dared so far to flip the switch to turn it from a default to an option. Not cleaning up user sessions after logout is not only ugly and somewhat hackish but also a security problem.
systemd 230 now finally flipped the switch and finally by default cleans everything up correctly when the user logs out. But we do so in a very conservative way actually:
a) there's a compile time switch to turn this off globally (--without-kill-user-processes, not used in Fedora)
b) there's a runtime switch to turn this off locally on the system (in logind.conf)
c) there's a way to opt-out individually for each user and each task from the cleanup logic, via systemd-run/loginctl linger. This operation goes through PK, and thus can be configured in a more strict or more open policy, depending on what the admin prefers.
Would it make sense to add an extended file attribute that allows systemd to query if an application should be killed or not? screen and tmux could have those added as long as they don't talk to systemd.
Either this extended attribute could be checked lazily or the elf process loader could inform systemd about the new requested scope.
Has this been looked at?
So I thought I'd give this a quick go in a rawhide VM for a test, one clone and dnf update later ...
Indeed you need to loginctl enable-linger <user> before systemd-run --user --scope screen will persist through a logout.
Rawhide currently appears to have selinux issues with initiating the session though, as pam_systemd.so activities were being skipped (so loginctl list-sessions showed no sessions) until the system was flipped to permissive.
With no loginctl enable-linger screen was killed off with both regular screen and via systemd-run
With enable-linger but no systemd-run screen was killed off
With enable-linger and systemd-run screen was persisted
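The observations just listed amount to a small truth table, sketched below. It only encodes what was reported for this one rawhide VM and is not a statement of how systemd behaves in general; screen_outcome_after_logout is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: encodes the outcomes reported above for screen on rawhide.
# $1 = lingering enabled (yes/no), $2 = started via systemd-run (yes/no).
screen_outcome_after_logout() {
    if [ "$1" = yes ] && [ "$2" = yes ]; then
        echo persisted
    else
        echo killed
    fi
}

# Print all four combinations.
for linger in no yes; do
    for via_run in no yes; do
        printf 'linger=%s systemd-run=%s -> %s\n' \
            "$linger" "$via_run" \
            "$(screen_outcome_after_logout "$linger" "$via_run")"
    done
done
```

In this test, only the combination of enable-linger and systemd-run kept screen alive; either one alone was not enough.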
The polkit bit appears broken in rawhide as well since the unprivileged 'localuser' user got:
[localuser@localhost ~]$ loginctl enable-linger localuser
Could not enable linger: The name org.freedesktop.PolicyKit1 was not provided by any .service files
As root this could be enabled for localuser though
This does seem a fairly invasive change and currently semi-broken for testing in its entirety, can you please change the default in Rawhide to not follow upstream at this time and issue the F25 change for FESCO to decide on?
On 27/05/16 14:02, Zbigniew Jędrzejewski-Szmek wrote:
It's two lines: [Login]\nKillUserProcesses=no. But please consider switching to the new mode of using systemd-run instead.
How do I run screen with systemd-run?
I tried "systemd-run --user -t screen" but as soon as I detach from the screen session it seems to get killed.
Tom
On Fri, May 27, 2016 at 02:16:45PM +0100, Tom Hughes wrote:
On 27/05/16 14:02, Zbigniew Jędrzejewski-Szmek wrote:
It's two lines: [Login]\nKillUserProcesses=no. But please consider switching to the new mode of using systemd-run instead.
How do I run screen with systemd-run?
I tried "systemd-run --user -t screen" but as soon as I detach from the screen session it seems to get killed.
See https://www.freedesktop.org/software/systemd/man/systemd-run.html#Examples (example 5).
Zbyszek
On 27/05/16 14:19, Zbigniew Jędrzejewski-Szmek wrote:
On Fri, May 27, 2016 at 02:16:45PM +0100, Tom Hughes wrote:
On 27/05/16 14:02, Zbigniew Jędrzejewski-Szmek wrote:
It's two lines: [Login]\nKillUserProcesses=no. But please consider switching to the new mode of using systemd-run instead.
How do I run screen with systemd-run?
I tried "systemd-run --user -t screen" but as soon as I detach from the screen session it seems to get killed.
See https://www.freedesktop.org/software/systemd/man/systemd-run.html#Examples (example 5).
Which works fine except that the scope remains even after the screen has exited... Well that and it's a pain to alias screen to that because you only want to do it when it will be starting a new session. That can be solved with a bit of hackery though ;-)
Tom
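The "bit of hackery" for aliasing screen might look roughly like the sketch below: wrap screen so that systemd-run is used only when a new session would be started. Everything here is an assumption for illustration. starts_new_session and screen_scoped are hypothetical names, and the flag list is a crude heuristic that ignores combined flags such as -dr and reattach-or-create flags such as -R.

```shell
#!/bin/sh
# Sketch only: decide whether a screen invocation would start a new session.
# Heuristic: flags that reattach to, list, or send commands to an existing
# session mean "no new session". Inside screen ($STY is set) a plain
# "screen" opens a window in the current session, so that also counts as
# "no new session".
starts_new_session() {
    [ -z "${STY:-}" ] || return 1
    for arg in "$@"; do
        case "$arg" in
            -r|-x|-ls|-list|-wipe|-X) return 1 ;;
        esac
    done
    return 0
}

# Hypothetical wrapper: new sessions go into a scope under user@.service,
# everything else calls screen directly. Not executed here.
screen_scoped() {
    if starts_new_session "$@"; then
        systemd-run --scope --user screen "$@"
    else
        command screen "$@"
    fi
}
```

One could then alias screen to screen_scoped in an interactive shell. Keeping the decision in a separate function makes the heuristic easy to adjust without touching the systemd-run call.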
On Fri, May 27, 2016 at 03:26:45PM +0100, Tom Hughes wrote:
On 27/05/16 14:19, Zbigniew Jędrzejewski-Szmek wrote:
On Fri, May 27, 2016 at 02:16:45PM +0100, Tom Hughes wrote:
On 27/05/16 14:02, Zbigniew Jędrzejewski-Szmek wrote:
It's two lines: [Login]\nKillUserProcesses=no. But please consider switching to the new mode of using systemd-run instead.
How do I run screen with systemd-run?
I tried "systemd-run --user -t screen" but as soon as I detach from the screen session it seems to get killed.
See https://www.freedesktop.org/software/systemd/man/systemd-run.html#Examples (example 5).
Which works fine except that the scope remains even after the screen has exited...
Hm, it shouldn't I think. Seems to work fine here. How are you running the command and what exactly remains behind?
Well that and it's a pain to alias screen to that because you only want to do it when it will be starting a new session. That can be solved with a bit of hackery though ;-)
Zbyszek
On 27/05/16 15:48, Zbigniew Jędrzejewski-Szmek wrote:
On Fri, May 27, 2016 at 03:26:45PM +0100, Tom Hughes wrote:
Which works fine except that the scope remains even after the screen has exited...
Hm, it shouldn't I think. Seems to work fine here. How are you running the command and what exactly remains behind?
So I'm trying this in F23 currently, so F24 might be different, but basically I did:
systemd-run --scope --user /bin/screen
and got a scope like this:
% systemctl --user status run-17952.scope
● run-17952.scope - /bin/screen
   Loaded: loaded (/run/user/2067/systemd/user/run-17952.scope; static; vendor preset: enabled)
  Drop-In: /run/user/2067/systemd/user/run-17952.scope.d
           └─50-Description.conf
   Active: active (running) since Fri 2016-05-27 16:01:41 BST; 33s ago
   CGroup: /user.slice/user-2067.slice/user@2067.service/run-17952.scope
           ├─17952 /bin/screen
           ├─17953 /bin/SCREEN
           └─17954 /bin/zsh
then hit ctrl-D to exit screen and was left with:
% systemctl --user status run-17952.scope
● run-17952.scope - /bin/screen
   Loaded: loaded (/run/user/2067/systemd/user/run-17952.scope; static; vendor preset: enabled)
  Drop-In: /run/user/2067/systemd/user/run-17952.scope.d
           └─50-Description.conf
   Active: active (running) since Fri 2016-05-27 16:01:41 BST; 39s ago
Tom
On 27/05/16 16:04, Tom Hughes wrote:
On 27/05/16 15:48, Zbigniew Jędrzejewski-Szmek wrote:
On Fri, May 27, 2016 at 03:26:45PM +0100, Tom Hughes wrote:
Which works fine except that the scope remains even after the screen has exited...
Hm, it shouldn't I think. Seems to work fine here. How are you running the command and what exactly remains behind?
So I'm trying this in F23 currently, so F24 might be different, but basically I did:
Confirmed the same in rawhide - the scope now has a hash in the name instead of a number but otherwise the same behaviour.
Note that there is a systemd update pending in rawhide but it has broken dependencies.
Tom
On Fri, May 27, 2016 at 04:06:36PM +0100, Tom Hughes wrote:
On 27/05/16 16:04, Tom Hughes wrote:
On 27/05/16 15:48, Zbigniew Jędrzejewski-Szmek wrote:
On Fri, May 27, 2016 at 03:26:45PM +0100, Tom Hughes wrote:
Which works fine except that the scope remains even after the screen has exited...
Hm, it shouldn't I think. Seems to work fine here. How are you running the command and what exactly remains behind?
So I'm trying this in F23 currently, so F24 might be different, but basically I did:
Confirmed the same in rawhide - the scope now has a hash in the name instead of a number but otherwise the same behaviour.
Note that there is a systemd update pending in rawhide but it has broken dependencies.
The problem was that samba-client-libs was dependent on libsystemd-daemon.so, which has been replaced by libsystemd.so. It was recompiled on the 23rd, but it's not available in rawhide. Something strange is going on here.
Zbyszek
On Fri, 27 May 2016 15:54:34 +0000 Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
The problem was that samba-client-libs was dependent on libsystemd-daemon.so, which has been replaced by libsystemd.so. It was recompiled on the 23rd, but it's not available in rawhide. Something strange is going on here.
It's simply that rawhide has failed to compose for the last 4 days now.
It's being worked on... ;)
The last issue I saw was that tmux (used by anaconda) grew a dep that lorax removes from the install, so the installer wouldn't work.
kevin
On Fri, 2016-05-27 at 15:54 +0000, Zbigniew Jędrzejewski-Szmek wrote:
The problem was that samba-client-libs was dependent on libsystemd-daemon.so, which has been replaced by libsystemd.so. It was recompiled on the 23rd, but it's not available in rawhide. Something strange is going on here.
Composes have been failing for the last few days. Packages don't reach the repos until a compose succeeds - a 'compose' builds an entire tree of repositories and deliverables, and syncs it to the primary mirror when it completes, so if composes are failing, the mirrors do not get updated.
We got a 24 compose for the first time in a few days today, so I guess Rawhide should not be far behind.
It's easy to know when this is the case, as the 'compose report' mails are sent to this very list...there is always a 'compose report' when a compose completes, so if you don't see one for a few days, it means composes are failing. The last Rawhide compose was 20160524.n.0 (which presumably ran just too early to get the samba rebuild).
On Fri, May 27, 2016 at 09:05:04AM -0700, Adam Williamson wrote:
It's easy to know when this is the case, as the 'compose report' mails are sent to this very list...there is always a 'compose report' when a compose completes, so if you don't see one for a few days, it means composes are failing. The last Rawhide compose was 20160524.n.0 (which presumably ran just too early to get the samba rebuild).
Oh, I just thought that maybe we have no broken deps so no reason to send the report :P
Zbyszek
On Fri, 27 May 2016, Zbigniew Jędrzejewski-Szmek wrote:
On Fri, May 27, 2016 at 09:05:04AM -0700, Adam Williamson wrote:
It's easy to know when this is the case, as the 'compose report' mails are sent to this very list...there is always a 'compose report' when a compose completes, so if you don't see one for a few days, it means composes are failing. The last Rawhide compose was 20160524.n.0 (which presumably ran just too early to get the samba rebuild).
Oh, I just thought that maybe we have no broken deps so no reason to send the report :P
:) No, I was getting broken deps reports almost every day. I rebuilt samba a couple of days ago after fixing your patch, so we should be fine once composes start working. The patch was also merged upstream.
On Fri, May 27, 2016 at 09:40:15PM +0300, Alexander Bokovoy wrote:
On Fri, 27 May 2016, Zbigniew Jędrzejewski-Szmek wrote:
On Fri, May 27, 2016 at 09:05:04AM -0700, Adam Williamson wrote:
It's easy to know when this is the case, as the 'compose report' mails are sent to this very list...there is always a 'compose report' when a compose completes, so if you don't see one for a few days, it means composes are failing. The last Rawhide compose was 20160524.n.0 (which presumably ran just too early to get the samba rebuild).
Oh, I just thought that maybe we have no broken deps so no reason to send the report :P
:) No, I was getting broken deps reports almost every day. I rebuilt samba a couple of days ago after fixing your patch, so we should be fine once composes start working. The patch was also merged upstream.
Thank you for taking care of this.
Zbyszek
On Fri, May 27, 2016 at 8:04 AM, Tom Hughes tom@compton.nu wrote:
On 27/05/16 15:48, Zbigniew Jędrzejewski-Szmek wrote:
On Fri, May 27, 2016 at 03:26:45PM +0100, Tom Hughes wrote:
Which works fine except that the scope remains even after the screen has exited...
Hm, it shouldn't I think. Seems to work fine here. How are you running the command and what exactly remains behind?
So I'm trying this in F23 currently, so F24 might be different, but basically I did:
systemd-run --scope --user /bin/screen
and got a scope like this:
% systemctl --user status run-17952.scope
● run-17952.scope - /bin/screen
   Loaded: loaded (/run/user/2067/systemd/user/run-17952.scope; static; vendor preset: enabled)
  Drop-In: /run/user/2067/systemd/user/run-17952.scope.d
           └─50-Description.conf
   Active: active (running) since Fri 2016-05-27 16:01:41 BST; 33s ago
   CGroup: /user.slice/user-2067.slice/user@2067.service/run-17952.scope
           ├─17952 /bin/screen
           ├─17953 /bin/SCREEN
           └─17954 /bin/zsh
then hit ctrl-D to exit screen and was left with:
% systemctl --user status run-17952.scope
● run-17952.scope - /bin/screen
   Loaded: loaded (/run/user/2067/systemd/user/run-17952.scope; static; vendor preset: enabled)
  Drop-In: /run/user/2067/systemd/user/run-17952.scope.d
           └─50-Description.conf
   Active: active (running) since Fri 2016-05-27 16:01:41 BST; 39s ago
Tom
Either the scope code is buggy or has IMO very strange behavior:
$ systemd-run --user --scope echo foo
Running scope as unit run-4980.scope.
foo
$ systemctl --user status run-4980.scope
● run-4980.scope - /usr/bin/echo foo
   Loaded: loaded (/run/user/1000/systemd/user/run-4980.scope; static; vendor preset: enabled)
  Drop-In: /run/user/1000/systemd/user/run-4980.scope.d
           └─50-Description.conf
   Active: active (running) since Fri 2016-05-27 08:16:03 PDT; 14s ago
Shouldn't the default be to remove the scope when all its processes are gone? The manpage for systemd-run says:
--remain-after-exit
    After the service or scope process has terminated, keep the service around until it is explicitly stopped. This is useful to collect runtime information about the service after it finished running. Also see RemainAfterExit= in systemd.service(5).
and I did *not* set that flag. At the very least, the manpage description of --scope should IMO be improved and the recommendation to use it for long-running processes should be reconsidered.
Now let's try the service mode:
$ systemctl status --user run-5457.service
● run-5457.service - /bin/sleep 30
   Loaded: loaded (/run/user/1000/systemd/user/run-5457.service; static; vendor preset: enabled)
  Drop-In: /run/user/1000/systemd/user/run-5457.service.d
           └─50-Description.conf, 50-ExecStart.conf
   Active: active (running) since Fri 2016-05-27 08:24:03 PDT; 5s ago
 Main PID: 5458 (sleep)
   CGroup: /user.slice/user-1000.slice/user@1000.service/run-5457.service
           └─5458 /bin/sleep 30
Then, after a while:
$ systemctl status --user run-5457.service
● run-5457.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)
Meanwhile, it's entirely unclear to me whether --scope will allow screen to survive past when all user sessions are logged out and how this interacts with "linger". The linger docs are not very good IMO.
Anyway, ISTM the best fix would be for tmux and screen to learn how to start *services* (not scopes) when they start a whole new session.
--Andy
On Fri, May 27, 2016 at 8:28 AM, Andrew Lutomirski luto@mit.edu wrote:
Either the scope code is buggy or has IMO very strange behavior:
$ systemd-run --user --scope echo foo
Running scope as unit run-4980.scope.
foo
$ systemctl --user status run-4980.scope
● run-4980.scope - /usr/bin/echo foo
   Loaded: loaded (/run/user/1000/systemd/user/run-4980.scope; static; vendor preset: enabled)
  Drop-In: /run/user/1000/systemd/user/run-4980.scope.d
           └─50-Description.conf
   Active: active (running) since Fri 2016-05-27 08:16:03 PDT; 14s ago
I played with this a bit. At least for now, I can get useful behavior like this:
$ systemd-run --user --service-type=forking /bin/bash -c "/bin/sleep 20 &"
Running as unit run-7737.service.
The service will continue to exist until the whole process tree goes away. Shouldn't this be the *default* for --scope (and for tmux, screen, and nohup)?
Also, ISTM someone (systemd?) should supply a really simple DSO that programs can use to indicate their desire to create a long-running process tree like this.
Dominique Martinet dominique.martinet@cea.fr wrote:
Just noticed this change on rawhide... https://github.com/systemd/systemd/blob/master/NEWS#L29
- systemd-logind will now by default terminate user processes that are part of the user session scope unit (session-XX.scope) when the user logs out. This behavior is controlled by the KillUserProcesses= setting in logind.conf, and the previous default of "no" is now changed to "yes". This means that user sessions will be properly cleaned up after, but additional steps are necessary to allow intentionally long-running processes to survive logout.
This sounds very much like a system-wide Change. Where can I find the Change proposal?
Björn Persson
On Fri, 2016-05-27 at 18:15 +0200, Björn Persson wrote:
Dominique Martinet dominique.martinet@cea.fr wrote:
Just noticed this change on rawhide... https://github.com/systemd/systemd/blob/master/NEWS#L29 * systemd-logind will now by default terminate user processes that are part of the user session scope unit (session-XX.scope) when the user logs out. This behavior is controlled by the KillUserProcesses= setting in logind.conf, and the previous default of "no" is now changed to "yes". This means that user sessions will be properly cleaned up after, but additional steps are necessary to allow intentionally long-running processes to survive logout.
This sounds very much like a system-wide Change. Where can I find the Change proposal?
Björn Persson
devel mailing list devel@lists.fedoraproject.org https://lists.fedoraproject.org/admin/lists/devel@lists.fedoraproject.org
Also I could see separate Workstation & Server changes - as the impact of this change is IMHO much bigger on the Server than on the Workstation, where it actually might have benefits in some cases.
Martin Kolman wrote:
On Fri, 2016-05-27 at 18:15 +0200, Björn Persson wrote:
Dominique Martinet dominique.martinet@cea.fr wrote:
Just noticed this change on rawhide... https://github.com/systemd/systemd/blob/master/NEWS#L29
- systemd-logind will now by default terminate user processes that are part of the user session scope unit (session-XX.scope) when the user logs out. This behavior is controlled by the KillUserProcesses= setting in logind.conf, and the previous default of "no" is now changed to "yes". This means that user sessions will be properly cleaned up after, but additional steps are necessary to allow intentionally long-running processes to survive logout.
This sounds very much like a system-wide Change. Where can I find the Change proposal?
Björn Persson
Also I could see separate Workstation & Server changes - as the impact of this change is IMHO much bigger on the Server than on the Workstation, where it actually might have benefits in some cases.
This change might be reasonable on a Workstation but it makes no sense on a Server. It's debatable whether it's actually reasonable on a Workstation too. Jobs backgrounded with "&" in a shell are expected to keep running. Having to create rules and profiles for every possible command is idiotic.
On Fri, May 27, 2016 at 06:15:08PM +0200, Björn Persson wrote:
Dominique Martinet dominique.martinet@cea.fr wrote:
Just noticed this change on rawhide... https://github.com/systemd/systemd/blob/master/NEWS#L29
- systemd-logind will now by default terminate user processes that are part of the user session scope unit (session-XX.scope) when the user logs out. This behavior is controlled by the KillUserProcesses= setting in logind.conf, and the previous default of "no" is now changed to "yes". This means that user sessions will be properly cleaned up after, but additional steps are necessary to allow intentionally long-running processes to survive logout.
This sounds very much like a system-wide Change. Where can I find the Change proposal?
It's not a Fedora-specific change. See the other parts of the thread for links to upstream discussions (in particular the mail from Michal).
Zbyszek
On Fri, May 27, 2016 at 12:20 PM, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Fri, May 27, 2016 at 06:15:08PM +0200, Björn Persson wrote:
Dominique Martinet dominique.martinet@cea.fr wrote:
Just noticed this change on rawhide... https://github.com/systemd/systemd/blob/master/NEWS#L29
- systemd-logind will now by default terminate user processes that are part of the user session scope unit (session-XX.scope) when the user logs out. This behavior is controlled by the KillUserProcesses= setting in logind.conf, and the previous default of "no" is now changed to "yes". This means that user sessions will be properly cleaned up after, but additional steps are necessary to allow intentionally long-running processes to survive logout.
This sounds very much like a system-wide Change. Where can I find the Change proposal?
It's not a Fedora-specific change. See the other parts of the thread for links to upstream discussions (in particular the mail from Michal).
I don't see anything in the Changes policy[1] that limits it to Fedora-specific changes. In fact, changes like this are exactly what the policy seems to capture: "Complex system wide changes involve system-wide defaults, critical path components, or other changes that are not eligible as self contained changes."
This is a system-wide default change in a critpath component, which qualifies twice! But seriously, large changes in new upstream releases are often captured by the change process (boost updates, glibc and gcc updates, etc.) Using the change process in this case would let us coordinate efforts and address and answer technical questions like "how do you indicate that you want a process to persist after logout," "can we/how do we modify programs that rely on this behavior to do the right thing (screen, tmux, etc.)," and "does it make sense to deviate from upstream in any fedora products," in addition to philosophical questions like what a "logout" really means or should mean, before the change landed.
Rich
On Fri, May 27, 2016 at 1:41 PM, Rich Mattes richmattes@gmail.com wrote:
On Fri, May 27, 2016 at 12:20 PM, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Fri, May 27, 2016 at 06:15:08PM +0200, Björn Persson wrote:
Dominique Martinet dominique.martinet@cea.fr wrote:
Just noticed this change on rawhide... https://github.com/systemd/systemd/blob/master/NEWS#L29
- systemd-logind will now by default terminate user processes that are part of the user session scope unit (session-XX.scope) when the user logs out. This behavior is controlled by the KillUserProcesses= setting in logind.conf, and the previous default of "no" is now changed to "yes". This means that user sessions will be properly cleaned up after, but additional steps are necessary to allow intentionally long-running processes to survive logout.
This sounds very much like a system-wide Change. Where can I find the Change proposal?
It's not a Fedora-specific change. See the other parts of the thread for links to upstream discussions (in particular the mail from Michal).
I don't see anything in the Changes policy[1] that limits it to Fedora-specific changes. In fact, changes like this are exactly what the policy seems to capture: "Complex system wide changes involve system-wide defaults, critical path components, or other changes that are not eligible as self contained changes."
This is a system-wide default change in a critpath component, which qualifies twice! But seriously, large changes in new upstream releases are often captured by the change process (boost updates, glibc and gcc updates, etc.) Using the change process in this case would let us coordinate efforts and address and answer technical questions like "how do you indicate that you want a process to persist after logout," "can we/how do we modify programs that rely on this behavior to do the right thing (screen, tmux, etc.)," and "does it make sense to deviate from upstream in any fedora products," in addition to philosophical questions like what a "logout" really means or should mean, before the change landed.
Rich
[1] https://fedoraproject.org/wiki/Changes/Policy
I agree; just because the change happened upstream in systemd doesn't mean that this shouldn't be evaluated in Fedora itself before being turned on by default.
This absolutely seems like the kind of thing that should be a system-wide change proposal (for F25, I guess).
Ben Rosser
So, my 2 cents...
Some questions for upstream:
* I assume killed processes are logged in the journal, but is there any way to have a 'permissive' version? I.e., simply log what would have been killed, but not do anything? That would be very helpful to folks to identify things that would be affected here without disrupting them at first. It would also allow bugs in other packages to get fixed up.
* Does 'loginctl enable-linger <user>' take effect in the current session? Or do you have to start a new one? Does it persist over sessions or only affect the current/next one?
* How can I tell if linger is enabled or disabled on a user?
* enable-linger/disable-linger need root? So, the only way the user can exclude things is to use systemd-run?
For the Fedora side:
I agree that it should be a F25 change so things can be coordinated and so it has higher visibility for users. This would also allow time/a chance for working groups to decide if they want to have a per edition default that's different from the base one.
If something like a 'permissive' mode is possible, I would think it would be nice to move rawhide to that for now, if not, I don't feel too strongly either way on leaving it enabled for now. (On one hand disruption for users, on the other hand rawhide users should all be subscribed to this list and can change the default if they wish).
If we can handle the common cases, I think this is a lovely step forward.
kevin
On Sun, May 29, 2016 at 11:53 AM, Kevin Fenzi kevin@scrye.com wrote:
- Does 'loginctl enable-linger <user>' take effect in the current session? Or do you have to start a new one? Does it persist over sessions or only affect the current/next one?
Also, what exactly does linger do? Does it turn off the kill thing entirely? Does it allow services to survive but not scopes? Or do services survive even without linger? The docs are IMO very unclear here.
On Sun, May 29, 2016 at 12:53:26PM -0600, Kevin Fenzi wrote:
So, my 2 cents...
Some questions for upstream:
- I assume killed processes are logged in the journal, but is there any way to have a 'permissive' version? I.e., simply log what would have been killed, but not do anything? That would be very helpful to folks to identify things that would be affected here without disrupting them at first. It would also allow bugs in other packages to get fixed up.
No, the killed processes are not logged, even at debug level, afaik. PID1 simply loops over the control cgroup and sends signals.
I see how having a 'permissive' option could be useful.
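To make the 'permissive' idea concrete, here is a minimal sketch of a dry run that only reports what would be signalled. It is purely illustrative and is not how PID 1 or logind work internally. It assumes the caller has already read the contents of the session scope's cgroup.procs file (one PID per line); the function names are hypothetical.

```python
def parse_cgroup_procs(text):
    """Return the PIDs listed in the contents of a cgroup.procs file."""
    return [int(token) for token in text.split() if token.isdigit()]


def dry_run_report(procs_text):
    """Describe what *would* be signalled, without sending any signal."""
    return [f"would kill PID {pid}" for pid in parse_cgroup_procs(procs_text)]


# Example with made-up PIDs, as they might appear in cgroup.procs.
for line in dry_run_report("1201\n1345\n"):
    print(line)
```

A real implementation would also have to record the unit and command name, since a bare PID is of little use once the process is gone.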
- Does 'loginctl enable-linger <user>' take effect in the current session? Or do you have to start a new one? Does it persist over sessions or only affect the current/next one?
Lingering applies to the systemd --user instance, a.k.a. user@.service, not to the session. Lingering means that user@.service is present even if you are not logged in. If lingering is disabled, it is started on login, and stopped on logout of that user.
Killing processes which are part of the session (session-<n>.scope) doesn't have anything to do directly with lingering. It is controlled by the global KillUserProcesses= setting.
The connection between KillUserProcesses= and long-running processes is that if KillUserProcesses=yes is set (the new default), to successfully create a process which survives logout two steps are needed: 1. move it out of the session into a systemd --user unit, 2. make that systemd --user instance persistent, i.e. enable lingering.
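The two steps can be sketched as plain command lines. This is only an illustration: the user name "alice" and the helper name are made up, and the systemd-run invocation follows the screen example that the NEWS entry says was added to systemd-run(1).

```python
import shlex


def survive_logout_commands(user, command):
    """Return the command lines for the two steps described above."""
    return [
        # Step 1: start the command in a scope under the systemd --user instance
        ["systemd-run", "--user", "--scope", *command],
        # Step 2: keep that systemd --user instance around after logout
        ["loginctl", "enable-linger", user],
    ]


# Print the commands instead of running them, so nothing is changed here.
for argv in survive_logout_commands("alice", ["screen"]):
    print(shlex.join(argv))
```

Nothing is executed by this sketch; it only shows which two commands a user would have to remember.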
Setting lingering is done over dbus, takes effect immediately, and is persistent (/var/lib/systemd/linger/<user> is created).
Setting KillUserProcesses can be done by modifying /etc/systemd/logind.conf, and also takes effect immediately, if systemd-logind is reloaded (using SIGHUP).
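For admins who want to check what a given logind.conf says, a minimal sketch follows. It assumes the setting lives in the [Login] section, ignores any drop-in directories, and needs the compiled-in default passed in, since that default is exactly what changed. The function name is hypothetical.

```python
import configparser


def kill_user_processes(conf_text, default):
    """Read KillUserProcesses= from logind.conf-style text.

    `default` is the compiled-in value, supplied by the caller because it
    differs between builds.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the CamelCase key names as written
    parser.read_string(conf_text)
    return parser.getboolean("Login", "KillUserProcesses", fallback=default)


example = "[Login]\n#KillUserProcesses=yes\nKillUserProcesses=no\n"
print(kill_user_processes(example, default=True))
```

Lines starting with '#' are treated as comments, so a commented-out setting falls back to the default.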
- How can I tell if linger is enabled or disabled on a user?
loginctl user-status <user> or loginctl show-user -p Linger <user>
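For scripting, the second command is the easier one to parse. A small sketch, assuming the output is a single "Linger=yes" or "Linger=no" line; the helper names are made up for illustration.

```python
import subprocess


def parse_linger(show_user_output):
    """Interpret the output of `loginctl show-user -p Linger <user>`."""
    for line in show_user_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "Linger":
            return value.strip() == "yes"
    return None  # property not present in the output


def query_linger(user):
    """Run loginctl for `user` and return True, False, or None."""
    result = subprocess.run(
        ["loginctl", "show-user", "-p", "Linger", user],
        capture_output=True, text=True, check=True,
    )
    return parse_linger(result.stdout)
```

query_linger() needs a running logind and a user it knows about; parse_linger() can be tried on its own with a captured string.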
- enable-linger/disable-linger need root? So, the only way the user can exclude things is to use systemd-run?
It is controlled through polkit. There are two operations: org.freedesktop.login1.set-user-linger, which by default requires admin privileges, and org.freedesktop.login1.set-self-linger, which by default allows any user to enable lingering for themselves. So lingering (with the default policy) requires no privileges, just an explicit enabling.
For the Fedora side:
I agree that it should be a F25 change so things can be coordinated and so it has higher visibility for users. This would also allow time/a chance for working groups to decide if they want to have a per edition default that's different from the base one.
If something like a 'permissive' mode is possible, I would think it would be nice to move rawhide to that for now, if not, I don't feel too strongly either way on leaving it enabled for now. (On one hand disruption for users, on the other hand rawhide users should all be subscribed to this list and can change the default if they wish).
If we can handle the common cases, I think this is a lovely step forward.
Zbyszek
So there's tmux, screen, curl, wget, and probably quite a few others that don't necessarily get daemonized that are probably affected.
It may be that most problems are processes instantiated by the DE running in the user session; and if that's true then maybe the DE needs to be more aggressive with cleanup rather than depending on systemd?
Found this related thread, which is now closed for comments but has interesting perspectives. https://github.com/tmux/tmux/issues/428
--- Chris Murphy
On Sun, May 29, 2016 at 06:51:20PM -0600, Chris Murphy wrote:
So there's tmux, screen, curl, wget, and probably quite a few others that don't necessarily get daemonized that are probably affected.
Currently you can `scp ... &' and the process will survive a logout and continue running. Very useful when you want to copy files between machines without waiting around.
Rich.
On Mon, May 30, 2016 at 6:20 PM, Richard W.M. Jones rjones@redhat.com wrote:
On Sun, May 29, 2016 at 06:51:20PM -0600, Chris Murphy wrote:
So there's tmux, screen, curl, wget, and probably quite a few others that don't necessarily get daemonized that are probably affected.
Currently you can `scp ... &' and the process will survive a logout and continue running. Very useful when you want to copy files between machines without waiting around.
Rich.
IMO it is expected that any process started with & will survive a logout. If this does not allow that behavior by default it will invite much discord.
On Sun, May 29, 2016 at 06:51:20PM -0600, Chris Murphy wrote:
So there's tmux, screen, curl, wget, and probably quite a few others that don't necessarily get daemonized that are probably affected.
I would really like to see a solution whereby tmux and screen _just work_ without any required changes to user behavior. They're basically commands which _indicate_ "I want a new session that persists".
It seems fine to have some administrative option which prevents that, but I think allowing that behavior should be the default. That way, accidental lingering processes will be cleaned up, but people's expectations around tmux/screen will still be met.
I liked the suggestion of having those programs become "scope" aware (https://github.com/tmux/tmux/issues/428) but it looks like upstream tmux at least is not keen on it. What can we do instead?
----- Original Message -----
On Sun, May 29, 2016 at 06:51:20PM -0600, Chris Murphy wrote:
So there's tmux, screen, curl, wget, and probably quite a few others that don't necessarily get daemonized that are probably affected.
I would really like to see a solution whereby tmux and screen _just work_ without any required changes to user behavior. They're basically commands which _indicate_ "I want a new session that persists".
Really? The only times I ever used it was to access serial consoles with a better emulation than separate apps.
It seems fine to have some administrative option which prevents that, but I think allowing that behavior should be the default. That way, accidental lingering processes will be cleaned up, but people's expectations around tmux/screen will still be met.
I liked the suggestion of having those programs become "scope" aware (https://github.com/tmux/tmux/issues/428) but it looks like upstream tmux at least is not keen on it. What can we do instead?
Patch the applications downstream, or document things with enough details and mention it in the release notes.
Both the writing of the documentation and the patching would show whether implementing lingering is well documented enough, and viable enough.
On 01/06/16 10:20, Bastien Nocera wrote:
On Sun, May 29, 2016 at 06:51:20PM -0600, Chris Murphy wrote:
So there's tmux, screen, curl, wget, and probably quite a few others that don't necessarily get daemonized that are probably affected.
I would really like to see a solution whereby tmux and screen _just work_ without any required changes to user behavior. They're basically commands which _indicate_ "I want a new session that persists".
Really? The only times I ever used it was to access serial consoles with a better emulation than separate apps.
You've obviously never had to run something that's going to take hours or days to complete on a remote server and not wanted it to abort half way through because of a network glitch then.
That's when I use screen, either just setting something running in the background, or leaving it connected but knowing it will continue if anything goes wrong and I can just reattach from a new login.
Tom
----- Original Message -----
On 01/06/16 10:20, Bastien Nocera wrote:
On Sun, May 29, 2016 at 06:51:20PM -0600, Chris Murphy wrote:
So there's tmux, screen, curl, wget, and probably quite a few others that don't necessarily get daemonized that are probably affected.
I would really like to see a solution whereby tmux and screen _just work_ without any required changes to user behavior. They're basically commands which _indicate_ "I want a new session that persists".
Really? The only times I ever used it was to access serial consoles with a better emulation than separate apps.
You've obviously never had to run something that's going to take hours or days to complete on a remote server and not wanted it to abort half way through because of a network glitch then.
Yeah, I never used nohup. *roll eyes*
That's when I use screen, either just setting something running in the background, or leaving it connected but knowing it will continue if anything goes wrong and I can just reattach from a new login.
The screen manual says "Screen is a full-screen window manager that multiplexes a physical terminal between several processes (typically interactive shells).".
It doesn't say "it persists in the session by default, always, that's always how you're going to use it".
On 01/06/16 11:41, Bastien Nocera wrote:
You've obviously never had to run something that's going to take hours or days to complete on a remote server and not wanted it to abort half way through because of a network glitch then.
Yeah, I never used nohup. *roll eyes*
Sure, nohup is like the cheap version of screen, and systemd-run would do something basically equivalent when linger is enabled.
The advantage of screen is being able to interact with the process when necessary, for example when doing an OS upgrade on a remote machine and it wants to ask you questions.
That's when I use screen, either just setting something running in the background, or leaving it connected but knowing it will continue if anything goes wrong and I can just reattach from a new login.
The screen manual says "Screen is a full-screen window manager that multiplexes a physical terminal between several processes (typically interactive shells).".
Sure, and I don't generally use any of that. I'm a very simple screen user and just use it for long running tasks that I need to be able to monitor the output of or interact with remotely.
I'm not against the change in systemd at all, I would just like cleaner ways to make the handful of things like screen and tmux continue to work.
Tom
Tom Hughes wrote:
On 01/06/16 11:41, Bastien Nocera wrote:
You've obviously never had to run something that's going to take hours or days to complete on a remote server and not wanted it to abort half way through because of a network glitch then.
Yeah, I never used nohup. *roll eyes*
Sure, nohup is like the cheap version of screen, and systemd-run would do something basically equivalent when linger is enabled.
The advantage of screen is being able to interact with the process when necessary, for example when doing an OS upgrade on a remote machine and it wants to ask you questions.
That's when I use screen, either just setting something running in the background, or leaving it connected but knowing it will continue if anything goes wrong and I can just reattach from a new login.
The screen manual says "Screen is a full-screen window manager that multiplexes a physical terminal between several processes (typically interactive shells).".
Sure, and I don't generally use any of that. I'm a very simple screen user and just use it for long running tasks that I need to be able to monitor the output of or interact with remotely.
I'm not against the change in systemd at all, I would just like cleaner ways to make the handful of things like screen and tmux continue to work.
This is still looking at the problem back-asswards. The problem isn't that screen and tmux are special cases. The problem is that some handful of programs that got spawned in a GUI desktop environment are special cases, not exiting when they should.
Fix the broken programs, don't force every well-behaved program in the universe to change to accommodate your broken GUI environment. This is Programming 101.
The fact is that screen and tmux *aren't* the only programs that can legitimately run in the background. *Any* command can be backgrounded / nohup'd by a user and *all* of them are legitimate in that case.
On 01/06/16 12:19, Howard Chu wrote:
This is still looking at the problem back-asswards. The problem isn't that screen and tmux are special cases. The problem is that some handful of programs that got spawned in a GUI desktop environment are special cases, not exiting when they should.
I'm sorry, but I disagree.
There are basically three things that I'm aware of that are used from a user session to run something in background in a way that will survive the end of the user session and you named them - nohup, screen and tmux.
Anything else I put in background with "&" in my shell will be killed when I log out unless it has been disowned which is basically a shell builtin version of nohup.
So things which are intended to survive the end of a login session really are the special case. The default behaviour has always been that things are killed when you log out, it's just that the way of enforcing that (sending a SIGHUP) is fairly gentle so things can easily survive without really intending to.
Tom
Tom Hughes wrote:
On 01/06/16 12:19, Howard Chu wrote:
This is still looking at the problem back-asswards. The problem isn't that screen and tmux are special cases. The problem is that some handful of programs that got spawned in a GUI desktop environment are special cases, not exiting when they should.
I'm sorry, but I disagree.
There are basically three things that I'm aware of that are used from a user session to run something in background in a way that will survive the end of the user session and you named them - nohup, screen and tmux.
Your awareness is apparently limited.
Anything else I put in background with "&" in my shell will be killed when I log out unless it has been disowned which is basically a shell builtin version of nohup.
Your shell is not *every* shell. csh and derivatives don't provide (and don't need) "nohup".
So things which are intended to survive the end of a login session really are the special case. The default behaviour has always been that things are killed when you log out,
False. Your premise is wrong, and your conclusion is wrong.
it's just that the way of enforcing that (sending a SIGHUP) is fairly gentle so things can easily survive without really intending to.
Tom
On Wed, Jun 01, 2016 at 12:43:38PM +0100, Howard Chu wrote:
There are basically three things that I'm aware of that are used from a user session to run something in background in a way that will survive the end of the user session and you named them - nohup, screen and tmux.
Your awareness is apparently limited.
Of course, the awareness of every human being is limited. Let's take that for granted in threads like this, and keep the discussion focused on _increasing_ understanding and coming to a solution.
On Wed, Jun 1, 2016 at 7:43 AM, Howard Chu hyc@symas.com wrote:
Tom Hughes wrote:
On 01/06/16 12:19, Howard Chu wrote:
This is still looking at the problem back-asswards. The problem isn't that screen and tmux are special cases. The problem is that some handful of programs that got spawned in a GUI desktop environment are special cases, not exiting when they should.
I'm sorry, but I disagree.
There are basically three things that I'm aware of that are used from a user session to run something in background in a way that will survive the end of the user session and you named them - nohup, screen and tmux.
Your awareness is apparently limited.
mysqldump, pg_dump, and rsync leap to mind as examples of potentially lengthy procedures that need to continue even if the connecting user shell from which they are executed is interrupted. So do some userland "daemon" processes, such as a horrific "perl daemon" I've seen and the Python based "buildbot" daemon.
On Wed, 2016-06-01 at 12:28 +0100, Tom Hughes wrote:
On 01/06/16 12:19, Howard Chu wrote:
This is still looking at the problem back-asswards. The problem isn't that screen and tmux are special cases. The problem is that some handful of programs that got spawned in a GUI desktop environment are special cases, not exiting when they should.
I'm sorry, but I disagree.
There are basically three things that I'm aware of that are used from a user session to run something in background in a way that will survive the end of the user session and you named them - nohup, screen and tmux.
You forgot emacs.
Yes, really, emacs the text editor -- which happens to have a server mode that allows it to do something similar to what tmux does, but more emacs-specific. Now you're thinking, "Is anyone using that?!" Yes, people use it -- someone at a Lisp user group was shocked that I had emacs running in a tmux session and told me all about the emacs server and how great it is.
So things which are intended to survive the end of a login session really are the special case. The default behaviour has always been that things are killed when you log out,
No, the default behavior has been that *some* things are killed when you log out. There are plenty of well-documented ways to avoid receiving SIGHUP and a large ecosystem of software out there that expects those techniques to work. Yes it is possible to push ecosystem-wide fixes, but it is a massive undertaking that needs to have real motivation. What is the strong motivation for this change? Why should anyone change their workflow, patch their code, or do anything unrelated to their day job to accommodate this new and unexpected behavior?
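One of those techniques, shown as a minimal sketch: a process can simply ignore SIGHUP, which is roughly what nohup arranges before running the command. The helper name is made up, and the example only signals itself.

```python
import os
import signal


def ignore_hangup():
    """Ignore SIGHUP in this process; returns the previous handler."""
    return signal.signal(signal.SIGHUP, signal.SIG_IGN)


previous = ignore_hangup()
# Without the line above, this self-sent SIGHUP would terminate the process.
os.kill(os.getpid(), signal.SIGHUP)
print("still running after SIGHUP")
```

This only covers the traditional hangup signal. As described elsewhere in this thread, with KillUserProcesses=yes the processes left in the session scope are signalled by PID 1, so ignoring SIGHUP alone should not be expected to help there.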
-- Ben
On Wed, 01.06.16 12:19, Howard Chu (hyc@symas.com) wrote:
This is still looking at the problem back-asswards. The problem isn't that screen and tmux are special cases. The problem is that some handful of programs that got spawned in a GUI desktop environment are special cases, not exiting when they should.
Fix the broken programs, don't force every well-behaved program in the universe to change to accommodate your broken GUI environment. This is Programming 101.
Again, this isn't just work-arounds around broken programs. It's a security thing. It's privileged code (logind, PID 1) that enforces a clear life-cycle on unprivileged programs.
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able to consume further runtime resources on the system, regardless if he does that because of a bug or on purpose.
Lennart
On Wed, Jun 01, 2016 at 03:48:04PM +0200, Lennart Poettering wrote:
Again, this isn't just work-arounds around broken programs. It's a security thing. It's privileged code (logind, PID 1) that enforces a clear life-cycle on unprivileged programs.
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able to consume further runtime resources on the system, regardless if he does that because of a bug or on purpose.
This paints a very specific premise of what a "logout" is, and I'm not sure I agree with it. There are actually many cases where I want to use resources on systems I have accounts on without specifically being logged in — the login session is just a connection in to manage things.
Otherwise, we should remove user crontabs, at, and similar. And there are definitely some systems where that policy has a place, but I don't see it making sense as Fedora default, either system wide or for any of the Editions.
On Wed, 2016-06-01 at 09:59 -0400, Matthew Miller wrote:
This paints a very specific premise of what a "logout" is, and I'm not sure I agree with it. There are actually many cases where I want to use resources on systems I have accounts on without specifically being logged in — the login session is just a connection in to manage things.
Otherwise, we should remove user crontabs, at, and similar. And there are definitely some systems where that policy has a place, but I don't see it making sense as Fedora default, either system wide or for any of the Editions.
Explicitly marking things to escape the session (nohup, crontab, starting system services, etc) is very different from just leaking any and all non-terminating processes out of the session.
I am very much in favor of systemd enforcing that the session actually ends when I log out, so that I don't accidentally leave processes running. Leaking session processes have been a perennial problem that we have been battling forever (gconf, ibus, pulseaudio, the list goes on...). And they are causing actual problems, from preventing re-login to subtly breaking the next session to slowing down shutdown.
That doesn't mean that you can't have user crontabs. As Lennart says, using those mechanisms should ideally be a privileged operation (with a lenient policy on single-user systems).
Matthias
Matthias Clasen wrote:
On Wed, 2016-06-01 at 09:59 -0400, Matthew Miller wrote:
This paints a very specific premise of what a "logout" is, and I'm not sure I agree with it. There are actually many cases where I want to use resources on systems I have accounts on without specifically being logged in — the login session is just a connection in to manage things.
Otherwise, we should remove user crontabs, at, and similar. And there are definitely some systems where that policy has a place, but I don't see it making sense as Fedora default, either system wide or for any of the Editions.
Explicitly marking things to escape the session (nohup, crontab, starting system services, etc) is very different from just leaking any and all non-terminating processes out of the session.
I am very much in favor of systemd enforcing that the session actually ends when I log out, so that I don't accidentally leave processes running. Leaking session processes have been a perennial problem that we have been battling forever (gconf, ibus, pulseaudio, the list goes on...). And they are causing actual problems, from preventing re-login to subtly breaking the next session to slowing down shutdown.
So far you have only identified problems associated with GUI sessions. I still see no justification for terminating *all* user processes when it's clear there's only a problem with one very specific class of processes, all being launched in a very specific context.
That doesn't mean that you can't have user crontabs. As Lennart says, using those mechanisms should ideally be a privileged operation (with a lenient policy on single-user systems).
On Wed, Jun 01, 2016 at 04:09:19PM +0100, Howard Chu wrote:
Matthias Clasen wrote:
I am very much in favor of systemd enforcing that the session actually ends when I log out, so that I don't accidentally leave processes running. Leaking session processes have been a perennial problem that we have been battling forever (gconf, ibus, pulseaudio, the list goes on...). And they are causing actual problems, from preventing re-login to subtly breaking the next session to slowing down shutdown.
So far you have only identified problems associated with GUI sessions. I still see no justification for terminating *all* user processes when it's clear there's only a problem with one very specific class of processes, all being launched in a very specific context.
Yeah, to me it sounds like these problematic processes should say "kill me when the session ends", rather than others having to opt-out.
If the processes are killed by default, won't other applications start to rely on it and users who change the default will have more and more applications running after logout?
On Wed, Jun 1, 2016 at 10:58 AM, Matthias Clasen mclasen@redhat.com wrote:
On Wed, 2016-06-01 at 09:59 -0400, Matthew Miller wrote:
This paints a very specific premise of what a "logout" is, and I'm not sure I agree with it. There are actually many cases where I want to use resources on systems I have accounts on without specifically being logged in — the login session is just a connection in to manage things.
Otherwise, we should remove user crontabs, at, and similar. And there are definitely some systems where that policy has a place, but I don't see it making sense as Fedora default, either system wide or for any of the Editions.
Explicitly marking things to escape the session (nohup, crontab, starting system services, etc) is very different from just leaking any and all non-terminating processes out of the session.
I am very much in favor of systemd enforcing that the session actually ends when I log out, so that I don't accidentally leave processes running. Leaking session processes have been a perennial problem that we have been battling forever (gconf, ibus, pulseaudio, the list goes on...). And they are causing actual problems, from preventing re-login to subtly breaking the next session to slowing down shutdown.
That doesn't mean that you can't have user crontabs. As Lennart says, using those mechanisms should ideally be a privileged operation (with a lenient policy on single-user systems).
Matthias
Why should the policy only be lenient on single-user systems?
Even if I accept for the moment that letting a user keep processes running on a system when they log out should be considered "privileged", this is a privilege that has more or less always been granted to users by default. Why do we suddenly need to change the default?
Sure, providing functionality to *remove* that privilege from a user as necessary is a nice feature. But I would strongly be opposed to the distribution suddenly changing the status quo here without good reason.
Ben Rosser
Dne 1.6.2016 v 18:18 Ben Rosser napsal(a):
On Wed, Jun 1, 2016 at 10:58 AM, Matthias Clasen mclasen@redhat.com wrote:
On Wed, 2016-06-01 at 09:59 -0400, Matthew Miller wrote:
This paints a very specific premise of what a "logout" is, and I'm not sure I agree with it. There are actually many cases where I want to use resources on systems I have accounts on without specifically being logged in — the login session is just a connection in to manage things.
Otherwise, we should remove user crontabs, at, and similar. And there are definitely some systems where that policy has a place, but I don't see it making sense as Fedora default, either system wide or for any of the Editions.
Explicitly marking things to escape the session (nohup, crontab, starting system services, etc) is very different from just leaking any and all non-terminating processes out of the session.
I am very much in favor of systemd enforcing that the session actually ends when I log out, so that I don't accidentally leave processes running. Leaking session processes have been a perennial problem that we have been battling forever (gconf, ibus, pulseaudio, the list goes on...). And they are causing actual problems, from preventing re-login to subtly breaking the next session to slowing down shutdown.
That doesn't mean that you can't have user crontabs. As Lennart says, using those mechanisms should ideally be a privileged operation (with a lenient policy on single-user systems).
Matthias
Why should the policy only be lenient on single-user systems?
Even if I accept for the moment that letting a user keep processes running on a system when they log out should be considered "privileged", this is a privilege that has more or less always been granted to users by default. Why do we suddenly need to change the default?
I'd say that the privilege was granted by accident, not by design, and this should change now, since systemd introduces infrastructure to fix this. I consider this reasonable, although it apparently breaks some workflows. As long as there is a way to change the defaults for experienced users, I welcome such a change. I dare say that this is a good feature for the majority of Fedora users, although from the discussion among experienced users on this list it might seem to break the whole world.
Vít
Sure, providing functionality to *remove* that privilege from a user as necessary is a nice feature. But I would strongly be opposed to the distribution suddenly changing the status quo here without good reason.
Ben Rosser
-- devel mailing list devel@lists.fedoraproject.org https://lists.fedoraproject.org/admin/lists/devel@lists.fedoraproject.org
Hi,
On Wed, Jun 1, 2016 at 10:58 AM, Matthias Clasen mclasen@redhat.com wrote:
Leaking session processes have been a perennial problem that we have been battling forever (gconf, ibus, pulseaudio, the list goes on...). And they are causing actual problems, from preventing re-login to subtly breaking the next session to slowing down shutdown.
This is definitely true. It's a class of bug that's hit us over and over again. (in addition to gconf, ibus, and pulseaudio above, you could add bonobo-activation-server, evolution-data-server, gam_server off the top of my head). The problem is that background services don't generally take display connections, since they don't need to display anything. So they don't die when the display goes away.
We tried to fix this a long time ago with the introduction of dbus into the desktop. The idea was that the session dbus-daemon would define the scope of the session, and services would grab a bus connection if they wanted to be scoped to the session.
Of course, starting in Fedora 24, we no longer have a session bus. It's a user bus now. So the bus won't go away until the last user session (for a user) ends, and those background services won't go away until they lose their bus connections, since they still rely on dbus-daemon to cut the cord when the session ends. While those background services are waiting for their bus connection to disappear, they're keeping the session alive (but in a "closing" state).
To me, KillUserProcesses=yes is better from a theoretical it-should-have-always-done-this-if-it-could-have standpoint, and it's better from real world it-eliminates-a-class-of-bugs-that-has-plagued-us standpoint.
I don't like that it requires users to have to change workflows, so that's a negative and I understand why the change is controversial.
We may want to consider reverting the user bus change for F24 and revisit in F25, not sure.
Ray
On 2 June 2016 at 11:01, Ray Strode halfline@gmail.com wrote:
Hi,
....
We may want to consider reverting the user bus change for F24 and revisit in F25, not sure.
I believe we are less than a week from releasing F24... if there is a need to do this how far back does testing need to go? Are we back to alpha? beta? gamma?
Ray
On 06/02/2016 11:01 AM, Ray Strode wrote:
We may want to consider reverting the user bus change for F24 and revisit in F25, not sure.
I don't think we need to change Fedora 24 for this. Unless I misunderstood, this systemd change has not been pushed to Fedora 24 (nor proposed for it). We're prepping for how to deal with things in Fedora 25.
So to Smooge's point, I think we should leave this as-is and avoid any new fallout during Final Freeze. We have months to address things in Fedora 25.
On 06/02/2016 03:13 PM, Stephen Gallagher wrote:
I don't think we need to change Fedora 24 for this. Unless I misunderstood, this systemd change has not been pushed to Fedora 24 (nor proposed for it). We're prepping for how to deal with things in Fedora 25.
You should not so easily dismiss and rule out core/baseOS (and even other) components adopting a rebase scheme similar or identical to the one the kernel community is using (and which has proven to work).
There were upcoming changes in systemd that prevented this back in 2013, when I wrote this proposal [1] and we discussed it, but those roadblocks are no more, afaik; hence there is nothing preventing systemd from adopting a rebase scheme similar or identical to the one the kernel community is using.
JBG
1. https://fedoraproject.org/wiki/User:Johannbg/Systemd/systemd-rebase#DRAFT_Sy...
On 06/02/2016 11:36 AM, Jóhann B. Guðmundsson wrote:
On 06/02/2016 03:13 PM, Stephen Gallagher wrote:
I don't think we need to change Fedora 24 for this. Unless I misunderstood, this systemd change has not been pushed to Fedora 24 (nor proposed for it). We're prepping for how to deal with things in Fedora 25.
You should not so easily dismiss and rule out core/baseOS ( and even other ) components adapting similar or same updating rebase scheme as the kernel community is using as ( and has prove to be working ).
There where upcoming changes in systemd that prevented this back in 2013 when I wrote this [1] proposal and we discussed it but those road blocks are no more afaik hence there is nothing preventing systemd from adapting an rebase scheme similar/same to the one that the kernel community is using.
I'm not saying that upstream systemd wouldn't or couldn't rebase; I was saying that my understanding was that there were no plans for the KillUserProcesses default to be changed post-release in any Fedora. That would be a significant violation of the stable update policy, which I'm pretty certain the systemd maintainers are aware of.
I should also have been more specific with the term "this" in my last email; I was referring to whether we needed to revert the user bus change because of systemd. By my current understanding, that would not be necessary.
Hi,
I don't think we need to change Fedora 24 for this. Unless I misunderstood, this systemd change has not been pushed to Fedora 24 (nor proposed for it). We're prepping for how to deal with things in Fedora 25.
No, I was the one misunderstanding things. I thought the systemd change got pushed into F24 in early May, but reading through my IRC logs, that was just miscommunication.
Of course the systemd change was going to address a bug we have with lingering processes at log out, so we'll probably have to come up with some other fix for f24 at some point, but we can do it as a post-release update.
--Ray
On Thu, Jun 2, 2016 at 9:01 AM, Ray Strode halfline@gmail.com wrote:
Of course, starting in Fedora 24, we no longer have a session bus. It's a user bus now. So the bus won't go away until the last user session (for a user) ends, and those background services won't go away until they lose their bus connections, since they still rely on dbus-daemon to cut the cord when the session ends. While those background services are waiting for their bus connection to disappear, they're keeping the session alive (but in a "closing" state).
Sounds familiar. While it's not reproducible in a VM, on two bare-metal systems I can reproduce 1m30s restart/shutdown delays on default clean installations.
To me, KillUserProcesses=yes is better from a theoretical it-should-have-always-done-this-if-it-could-have standpoint, and it's better from real world it-eliminates-a-class-of-bugs-that-has-plagued-us standpoint.
KillUserProcesses=yes isn't solving the problem I'm easily able to reproduce, because it isn't killing the gdm owned session-c1.scope, which appears to hang due to ibus-daemon not quitting.
There's a gdm-owned ibus-daemon process, and a chris-owned one. With the default of KillUserProcesses=no, restart, shutdown, and logout are delayed. If I set it to yes, the logout problem is fixed, but the restart and shutdown delays aren't fixed.
I don't like that it requires users to have to change workflows, so that's a negative and I understand why the change is controversial.
We may want to consider reverting the user bus change for F24 and revisit in F25, not sure.
Well, it's uncertain to me whether the testers so far are just desensitized to restart delays, or whether they're just not encountering them and it's a conditional problem. If it's encountered broadly and is fixable some reasonable time after release (a month? two?), fine. But already I'm in the habit of rebooting Fedora 24 with 'sudo reboot -f' because I don't have 30 seconds of patience in me, let alone nearly two minutes. But I think we're stuck between a rock and a hard place: excessive restart delays or reversion this late in the game.
On Thu, 2016-06-02 at 11:01 -0400, Ray Strode wrote:
Hi,
On Wed, Jun 1, 2016 at 10:58 AM, Matthias Clasen mclasen@redhat.com wrote:
Leaking session processes have been a perennial problem that we have been battling forever (gconf, ibus, pulseaudio, the list goes on...). And they are causing actual problems, from preventing re-login to subtly breaking the next session to slowing down shutdown.
This is definitely true. It's a class of bug that's hit us over and over again. (in addition to gconf, ibus, and pulseaudio above, you could add bonobo-activation-server, evolution-data-server, gam_server off the top of my head).
Annnnd gdm: https://bugzilla.redhat.com/show_bug.cgi?id=1195485#c13
On Wed, Jun 1, 2016 at 9:48 AM, Lennart Poettering mzerqung@0pointer.de wrote:
On Wed, 01.06.16 12:19, Howard Chu (hyc@symas.com) wrote:
This is still looking at the problem back-asswards. The problem isn't that screen and tmux are special cases. The problem is that some handful of programs that got spawned in a GUI desktop environment are special cases, not exiting when they should.
Fix the broken programs, don't force every well-behaved program in the universe to change to accommodate your broken GUI environment. This is Programming 101.
Again, this isn't just work-arounds around broken programs. It's a security thing. It's privileged code (logind, PID 1) that enforces a clear life-cycle on unprivileged programs.
Correct, but it's a new lifecycle that currently doesn't exist. Traditional PID 1 was essentially "start things I'm told to start, reap children that are zombied to me." Systemd has obviously improved that in various ways, and this functionality can be seen as another such improvement but it is not a clear-cut, easy decision.
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able consume further runtime resources on the system, regardless if he does that because of a bug or on purpose.
You keep saying this, and logically that makes sense. However, that is essentially putting blinders on to how Linux systems have worked and have been used for a very long time. The functionality you are describing should exist because it does solve a problem. The issue is what to do by default.
Given the principle of least surprise, it would make more sense to default with this being disabled out of the box. People leave tasks running in the background in various ways all the time for a variety of valid reasons. Builds, IRC sessions running in screen/tmux, local server processes, etc.
That default could be set in Fedora itself if upstream systemd wants to default with it on. Or it could even vary between Workstation, Server, and Cloud (though I'd argue that doesn't make sense).
josh
On 06/01/2016 02:01 PM, Josh Boyer wrote:
Given the principle of least surprise, it would make more sense to default with this being disabled out of the box.
I have to disagree with this statement.
Upstream should always reflect how things should be, while downstream reflects how things are, or at least how things are as relevant to them, since these can deviate in as many ways as there are different downstreams.
JBG
On Wed, Jun 1, 2016 at 9:48 AM, Lennart Poettering mzerqung@0pointer.de wrote:
On Wed, 01.06.16 12:19, Howard Chu (hyc@symas.com) wrote:
This is still looking at the problem back-asswards. The problem isn't that screen and tmux are special cases. The problem is that some handful of programs that got spawned in a GUI desktop environment are special cases, not exiting when they should.
Fix the broken programs, don't force every well-behaved program in the universe to change to accommodate your broken GUI environment. This is Programming 101.
Again, this isn't just work-arounds around broken programs. It's a security thing. It's privileged code (logind, PID 1) that enforces a clear life-cycle on unprivileged programs.
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able consume further runtime resources on the system, regardless if he does that because of a bug or on purpose.
Lennart
That's your opinion, and while many sysadmins may share it, many will not. Having this as an optional security feature would be fantastic. Enforcing it by default on every user, many of whom use tmux, screen, nohup, and & to persist long-running processes for daily work, is not something to do just because you think it is what people should do.
On Wed, Jun 01, 2016 at 10:04:27AM -0400, Dan Book wrote:
Again, this isn't just work-arounds around broken programs. It's a security thing. It's privileged code (logind, PID 1) that enforces a clear life-cycle on unprivileged programs.
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able consume further runtime resources on the system, regardless if he does that because of a bug or on purpose.
Lennart
That's your opinion, and while many sysadmins may share it, many will not. Having this as an optional security feature would be fantastic. Enforcing it by default on every user many of which use tmux, screen, nohup, and & to persist long running processes for daily work, is not something to do just because you think it is what people should do.
Just a little perspective: this isn't a new option. The KillUserProcesses functionality seems to have been added by commit 202630822f52e06dce8404633407329c38099278 (Date: Mon May 23 23:55:06 2011 +0200).
Five years ago, so basically from day one. We have this optional security feature: fantastic! Also, the concept of a "session" isn't anything new; it's a core UNIX concept (setsid(), anyone?).
I think that programs needing special treatment should use the operating system's facilities to communicate that. So tmux, screen, and nohup should really open a new session. It's unfortunate that the tmux author is hostile to that, but maybe a clean, compile-time optional patch would persuade him? Anyway, I think some examples of "how to inform systemd I'm a special program not to reap" would be welcome. Does it need to be done through D-Bus interaction with logind? Is using PAM sufficient/required? (N.B. screen already uses PAM for some functionality.)
On Jun 1, 2016 7:29 AM, "Tomasz Torcz" tomek@pipebreaker.pl wrote:
I think that programs needing special treatment should use operating system's facilities to communicate that. So tmux, screen, nohup should really open a new session. It's unfortunate that tmux author is hostile against that, but maybe a clean, compile-time optional patch would persuade him?
You lost me. Tmux almost certainly *already* uses setsid(). The author is hostile to adding a dbus dependency to tmux to tell systemd that it wants a new session.
(I suspect that most terminal emulators also call setsid(), so this approach wouldn't actually work.)
--Andy
On Wed, Jun 01, 2016 at 07:35:21AM -0700, Andrew Lutomirski wrote:
On Jun 1, 2016 7:29 AM, "Tomasz Torcz" tomek@pipebreaker.pl wrote:
I think that programs needing special treatment should use operating system's facilities to communicate that. So tmux, screen, nohup should really open a new session. It's unfortunate that tmux author is hostile against that, but maybe a clean, compile-time optional patch would persuade him?
You lost me. Tmux almost certainly *already* uses setsid(). The author is hostile to adding a dbus dependency to tmux to tell systemd that it wants a new session.
(I suspect that most terminal emulators also call setsid(), so this approach wouldn't actually work.)
That's kind of the point. It would be great to know what exactly needs to be done by programs. Examples, examples, examples. Apparently setsid() is not enough.
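One way to see why setsid() on its own is not enough: the announcement quoted at the top of the thread says logind kills processes that are part of the session scope unit (session-XX.scope), and scope units are backed by cgroups. The sketch below is only an illustration, not anything from systemd or tmux; the helper names read_cgroup and cgroup_across_setsid are made up for this example. It forks a child, calls setsid() in it, and compares /proc/self/cgroup before and after. The child becomes a kernel session leader, but its cgroup membership does not move.

```python
import json
import os


def read_cgroup():
    """Return this process's cgroup membership, or None where /proc lacks it."""
    try:
        with open("/proc/self/cgroup") as f:
            return f.read()
    except OSError:
        return None


def cgroup_across_setsid():
    """Fork a child, call setsid() in it, report [before, after, is_leader]."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        try:
            os.close(read_end)
            before = read_cgroup()
            os.setsid()  # new kernel session; the child becomes its leader
            after = read_cgroup()
            is_leader = os.getsid(0) == os.getpid()
            payload = json.dumps([before, after, is_leader]).encode()
            os.write(write_end, payload)
        finally:
            os._exit(0)  # never fall through into the parent's code path
    os.close(write_end)
    with os.fdopen(read_end) as f:
        report = json.load(f)
    os.waitpid(pid, 0)
    return report


if __name__ == "__main__":
    before, after, is_leader = cgroup_across_setsid()
    print("session leader after setsid():", is_leader)
    print("cgroup unchanged:", before == after)
    print(after or "(no /proc/self/cgroup on this system)")
```

Run from an ssh login on a systemd machine, the printed cgroup path would be expected to still name the login's session scope, which is what the systemd-run approach from the announcement changes.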
On Wed, Jun 1, 2016 at 10:28 AM, Tomasz Torcz tomek@pipebreaker.pl wrote:
I think that programs needing special treatment should use operating system's facilities to communicate that. So tmux, screen, nohup should really open a new session. It's unfortunate that tmux author is hostile against that, but maybe a clean, compile-time optional patch would persuade him? Anyway, I think some examples of ”how to inform systemd I'm a special program not to reap” would be welcome. Does it need to be done through D-Bus interaction with logind? Is using PAM sufficient/required? (Nb. screen already uses PAM for some functionality).
As mentioned, this isn't just about screen, tmux, and nohup (or if there's any other programs used in a similar context). *Any* command run with a trailing & is commonly expected to survive logout, usually from remote shells. Setting this as a default security policy without allowing that standard behavior is going to be, at best, very surprising to a lot of people, and documenting a new way to do the same thing isn't good enough.
On Wed, Jun 01, 2016 at 10:35:31AM -0400, Dan Book wrote:
As mentioned, this isn't just about screen, tmux, and nohup (or if there's any other programs used in a similar context). *Any* command run with a trailing & is commonly expected to survive logout, usually from remote shells.
Um, no. There is *no* expectation of a random command surviving loss of its controlling tty (i.e. logout) unless explicit steps were taken to mitigate it, such as the command daemonizing itself or ignoring SIGHUP -- either explicitly or via use of nohup (which also gives the added benefit of capturing stdout).
'&' in and of itself was *never* any sort of guarantee, regardless of foolish expectations to the contrary.
- Solomon
Solomon Peachy wrote:
On Wed, Jun 01, 2016 at 10:35:31AM -0400, Dan Book wrote:
As mentioned, this isn't just about screen, tmux, and nohup (or if there's any other programs used in a similar context). *Any* command run with a trailing & is commonly expected to survive logout, usually from remote shells.
Um, no. There is *no* expectation of a random command surviving loss of its controlling tty (i.e. logout) unless explicit steps were taken to mitigate it, such as the command daemonizing itself or ignoring SIGHUP -- either explicitly or via use of nohup (which also gives the added benefit of capturing stdout).
'&' in and of itself was *never* any sort of guarantee, regardless of foolish expectations to the contrary.
Wrong, for all csh users.
You folks are all talking from quite narrow perspectives.
On Wed, Jun 01, 2016 at 05:11:13PM +0100, Howard Chu wrote:
'&' in and of itself was *never* any sort of guarantee, regardless of foolish expectations to the contrary.
Wrong, for all csh users.
You folks are all talking from quite narrow perspectives.
You inadvertently proved my point -- one can't make assumptions about what shells are installed or used.
(Heck, csh isn't even installed by default on Fedora. You have to explicitly install and enable/invoke it)
- Solomon
Hi,
As mentioned, this isn't just about screen, tmux, and nohup (or if there's any other programs used in a similar context). *Any* command run with a trailing & is commonly expected to survive logout, usually from remote shells.
No. They get SIGHUP when you logout, and the default action for SIGHUP is to exit.
So if programs want to survive logout for whatever reason, they have to change the SIGHUP action, either to a signal handler or to ignore.
And IMO systemd should continue to allow programs to stay around that way in case lingering is enabled for the user.
cheers, Gerd
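The SIGHUP behaviour described here can be tried out with a small sketch. This is only an illustration under stated assumptions: the helper name survives_sighup and the one-second wait are made up for the example, and it models the hangup by sending SIGHUP directly rather than by closing a real terminal. One child keeps the default SIGHUP action and one sets it to ignore, which is roughly what nohup arranges before running a command.

```python
import signal
import subprocess
import sys

# Source of the child: it installs the requested SIGHUP action, says it is
# ready, and then just sleeps so there is something to hang up on.
CHILD_SOURCE = """
import signal, sys, time
action = signal.SIG_IGN if sys.argv[1] == "ignore" else signal.SIG_DFL
signal.signal(signal.SIGHUP, action)
print("ready", flush=True)
time.sleep(30)
"""


def survives_sighup(mode):
    """Start a child with mode 'default' or 'ignore'; True if SIGHUP spares it."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE, mode],
        stdout=subprocess.PIPE,
    )
    try:
        child.stdout.readline()  # wait until the SIGHUP action is installed
        child.send_signal(signal.SIGHUP)
        try:
            child.wait(timeout=1.0)
            return False  # the child exited: SIGHUP terminated it
        except subprocess.TimeoutExpired:
            return True  # still running one second later
    finally:
        child.kill()  # clean up whichever way it went
        child.wait()
        child.stdout.close()


if __name__ == "__main__":
    for mode in ("default", "ignore"):
        print(mode, "-> survives SIGHUP:", survives_sighup(mode))
```

Note that ignoring SIGHUP only addresses the hangup itself; it says nothing about what logind does to the session scope with KillUserProcesses=yes, which is the point under dispute in this thread.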
On 06/02/2016 01:39 AM, Gerd Hoffmann wrote:
Hi,
As mentioned, this isn't just about screen, tmux, and nohup (or if there's any other programs used in a similar context). *Any* command run with a trailing & is commonly expected to survive logout, usually from remote shells.
No. They get SIGHUP when you logout, and the default action for SIGHUP is to exit.
So if programs want survive logout for whatever reason they have to change the SIGHUP action to either a signal handler or set it to ignore.
And IMO systemd should continue to allow programs to stay around that way in case lingering is enabled for the user.
cheers, Gerd
What if systemd sent (optionally?) a SIGHUP to the stray processes instead of TERM/KILL? Or do the problematic ones that have been mentioned ignore SIGHUP as well?
On Wed, 01.06.16 16:28, Tomasz Torcz (tomek@pipebreaker.pl) wrote:
Five years ago, so basically from day one. We have this optional security feature – fantastic! Also, the concept of a ”session” isn't anything new, it's core UNIX concept (setsid() enyone?)
setsid() is really mostly about TTY job control. It doesn't really map nicely to logind or PAM sessions. And because the mapping is skewed audit came up with its own session ID (/proc/$PID/sessionid).
Lennart
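To make the distinction concrete, here is a small sketch that prints the two identifiers side by side for the current process: the session ID in the setsid() sense, and the audit session ID from /proc/self/sessionid mentioned above. The function name session_ids is made up for this example, and the file is treated as optional, since it is only present on Linux kernels built with audit support.

```python
import os


def session_ids():
    """Return the setsid()-style session ID and the audit session ID (or None)."""
    try:
        with open("/proc/self/sessionid") as f:
            audit = int(f.read().strip())
    except (OSError, ValueError):
        audit = None  # no audit support in this kernel, or not Linux at all
    return {"kernel_sid": os.getsid(0), "audit_sessionid": audit}


if __name__ == "__main__":
    ids = session_ids()
    # kernel_sid is the PID of the session leader, typically the login shell.
    print("setsid() session:", ids["kernel_sid"])
    # The audit ID is assigned at login; a value of 4294967295 means "unset".
    print("audit session:   ", ids["audit_sessionid"])
```

The two numbers are unrelated: a process that calls setsid() gets a new kernel session ID, while the audit session ID is expected to stay whatever was assigned at login.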
On Wed, 2016-06-01 at 15:48 +0200, Lennart Poettering wrote:
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able consume further runtime resources on the system, regardless if he does that because of a bug or on purpose.
I don't think you've yet explained exactly why this constitutes a 'security problem'. Could you please do so?
On Wed, 01.06.16 07:20, Adam Williamson (adamwill@fedoraproject.org) wrote:
On Wed, 2016-06-01 at 15:48 +0200, Lennart Poettering wrote:
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able consume further runtime resources on the system, regardless if he does that because of a bug or on purpose.
I don't think you've yet explained exactly why this constitutes a 'security problem'. Could you please do so?
Well. Let's say you are responsible for the Linux desktops of a large security-sensitive company (let's say a bank, whatever), and the desktops are installed as fixed workstations, with different employees using them at different times. They log in, they do some "important company stuff", and then they log out again. Now, it's a large company, so it doesn't have the closest control on every single employee, and sometimes employees leave the company. Sometimes the employees even browse to the wrong web sites, catch a browser exploit and suddenly start running spam bots under their user identity, without even knowing.
In all of these cases you really want to make sure that whatever the user did ends – really ends – by the time he logs out. So that the employee can't do stuff there except when logged in, and that he can't do stuff there even long after he left the company, and that the spam bot he caught gets killed as soon as he logs out.
This is really just one example. This model I think really needs to be the default everywhere. On desktops and on servers: unless the admin permitted it explicitly, there should not be user code running. If you allow your intern user access to a webserver to quickly check out the resource consumption of some service, that doesn't mean that he shall be allowed to run stuff there forever, just because he once had the login privilege for the server. And even more: after you disabled his user account and logged him out, he really should be gone.
Yes, UNIX is pretty much a swiss cheese: it's really hard to secure a system properly so that somebody who once had access won't have access anymore at a later point. However, we need to start somewhere, and actually defining a clear lifecycle is a good start.
Pretty much all more modern OS designs tend to have such a clear lifecycle btw: when the user is logged out, he's *really* logged out. And it's completely OK if certain users get excluded from that, but if so, then the admin needs to sign off on that, and thus a privilege check needs to be enforced.
Lennart
On Thursday, June 02, 2016 13:04:44 Lennart Poettering wrote:
On Wed, 01.06.16 07:20, Adam Williamson (adamwill@fedoraproject.org) wrote:
On Wed, 2016-06-01 at 15:48 +0200, Lennart Poettering wrote:
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able consume further runtime resources on the system, regardless if he does that because of a bug or on purpose.
I don't think you've yet explained exactly why this constitutes a 'security problem'. Could you please do so?
Well. Let's say you are responsible for the Linux desktops of a large security-sensitive company (let's say a bank, whatever), and the desktops are installed as fixed workstations, with different employees using them at different times. They log in, they do some "important company stuff", and then they log out again. Now, it's a large company, so it doesn't have the closest control on every single employee, and sometimes employees leave the company. Sometimes the employees even browse to the wrong web sites, catch a browser exploit and suddenly start running spam bots under their user identity, without even knowing.
In all of these cases you really want to make sure that whatever the user did ends – really ends – by the time he logs out. So that the employee can't do stuff there except when logged in, and that he can't do stuff there even long after he left the company, and that the spam bot he caught gets killed as soon as he logs out.
Then what prevents the user from keeping a session forever?
This is really just one example. This model I think really needs to be the default everywhere. On desktops and on servers: unless the admin permitted it explicitly, there should not be user code running. If you allow your intern user access to a webserver to quickly check out the resource consumption of some service, that doesn't mean that he shall be allowed to run stuff there forever, just because he once had the login privilege for the server. And even more: after you disabled his user account and logged him out, he really should be gone.
What exactly do you mean by "logged him out"?
You must be a privileged user to do that. If you are a privileged user, killing his/her processes is just one more command on top of that...
Kamil
On Thursday, June 02, 2016 13:04:44 Lennart Poettering wrote:
On Wed, 01.06.16 07:20, Adam Williamson (adamwill@fedoraproject.org) wrote:
On Wed, 2016-06-01 at 15:48 +0200, Lennart Poettering wrote:
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able consume further runtime resources on the system, regardless if he does that because of a bug or on purpose.
I don't think you've yet explained exactly why this constitutes a 'security problem'. Could you please do so?
Well. Let's say you are responsible for the Linux desktops of a large security-sensitive company (let's say a bank, whatever), and the desktops are installed as fixed workstations, with different employees using them at different times. They log in, they do some "important company stuff", and then they log out again. Now, it's a large company, so it doesn't have the closest control on every single employee, and sometimes employees leave the company. Sometimes the employees even browse to the wrong web sites, catch a browser exploit and suddenly start running spam bots under their user identity, without even knowing.
Well, if i'm writing a malware i'll make sure it uses systemd-run so it keeps on running.
In all of these cases you really want to make sure that whatever the user did ends – really ends – by the time he logs out. So that the employee can't do stuff there except when logged in, and that he can't do stuff there even long after he left the company, and that the spam bot he caught gets killed as soon as he logs out.
Then what prevents the user from keeping a session forever?
Nothing, because this is not the proper solution to the security problem.
This is really just one example. This model I think really needs to be the default everywhere. On desktops and on servers: unless the admin permitted it explicitly, there should not be user code running. If you allow your intern user access to a webserver to quickly check out the resource consumption of some service, that doesn't mean that he shall be allowed to run stuff there forever, just because he once had the login privilege for the server. And even more: after you disabled his user account and logged him out, he really should be gone.
What exactly do you mean by "logged him out"?
You must be a privileged user to do that. If you are a privileged user, killing his/her processes is just one more command on top of that...
Kamil
Yes, UNIX is pretty much a Swiss cheese: it's really hard to secure a system properly so that somebody who once had access won't have access anymore at a later point. However, we need to start somewhere, and actually defining a clear lifecycle is a good start.
Pretty much all more modern OS designs tend to have such a clear lifecycle btw: when the user is logged out, he's *really* logged out. And it's completely OK if certain users get excluded from that, but if so, then the admin needs to sign off on that, and thus a privilege check needs to be enforced.
This default is nonsense: the only thing it really does is break stuff that relies on processes being executed after the user closes his session. Yes, there's an obscure systemd-run command that only the systemd devs know and that can make your programs run forever, but what's wrong with "&" or just running "screen" to create a persistent session? I know you're going to start with the "it's been 40 years since Unix was designed" argument. Unix was designed with simplicity in mind; this default is not simple. It adds another layer of complexity that is not really needed, breaks stuff and makes ALL THE WORLD change the way they work.
Going back to the bank example, the change of KillUserProcesses should be decided by the security team or the system administrator instead of magically being there.
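For an admin who wants to make that decision explicitly rather than inherit whatever upstream ships, a minimal sketch follows. It assumes logind picks up drop-in files from /etc/systemd/logind.conf.d/ as described in logind.conf(5); the function name and the file name 50-kill-user-processes.conf are made up for illustration.

```shell
#!/bin/sh
# Sketch only: write a logind drop-in that pins KillUserProcesses to an
# explicit value. Function and file names are hypothetical.
# Usage: write_kill_user_processes_dropin <conf_dir> <yes|no>
write_kill_user_processes_dropin() {
    conf_dir=$1
    value=$2

    # Only the two boolean spellings used in the NEWS entry are accepted here.
    case "$value" in
        yes|no) ;;
        *) echo "value must be 'yes' or 'no'" >&2; return 1 ;;
    esac

    mkdir -p "$conf_dir" || return 1
    printf '[Login]\nKillUserProcesses=%s\n' "$value" \
        > "$conf_dir/50-kill-user-processes.conf"
}
```

Run as root, something like `write_kill_user_processes_dropin /etc/systemd/logind.conf.d no` would restore the previous behaviour on that machine; I would expect it to take effect once logind is restarted or the machine is rebooted.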
And no, the "you should learn systemd" argument does not apply here: this changes the way Linux behaves, it's not obvious to the user, and believe me, nobody outside of the people who have their eyes on systemd reads this document: https://github.com/systemd/systemd/blob/master/NEWS#L29
The change of KillUserProcesses to "yes" should be made on a case-by-case basis, not by the upstream dev team. It changes the way Linux behaves in a very aggressive way and can lead to a lot of bug reports, break scripts, and could make people stop using Linux with systemd, because you should not break the "least surprise" principle.
It's easier to remove just one commit [1] than to make EVERYBODY change the way they work...
Cheers, Ivan
[1] https://github.com/systemd/systemd/pull/3005/commits/97e5530cf2076a2b4fc5575...
On Thu, Jun 2, 2016 at 1:26 PM, Ivan Chavero ichavero@redhat.com wrote:
Well, if I'm writing malware I'll make sure it uses systemd-run so it keeps on running.
The point of the feature is not to prevent users from running anything in the background. It's that *anything* the user runs has proper systemd confinement, so it's obvious and manageable by the administrator. Without this feature, the only reliable way to achieve the same thing is to reboot every system.
This default is nonsense: the only thing it really does is break stuff that relies on processes being executed after the user closes his session. Yes, there's an obscure systemd-run command that only the systemd devs know and that can make your programs run forever, but what's wrong with "&" or just running "screen" to create a persistent session?
Maybe it's obscure to you, but it's foolish to suggest that it will forever be so. What's wrong with your shell understanding that "&" needs more sophisticated handling than fork/exec* these days? There's no reason why shells can't handle this for you, or you can set up your shell to handle it for you. There's already been discussion about creating wrapper scripts in Fedora for screen and tmux that automatically handle execution via systemd-run, so I'm unsure what the issue is.
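To make the wrapper idea concrete, here is a rough sketch of what such a script might look like. The `systemd-run --scope --user` invocation is the one the NEWS entry attributes to the systemd-run(1) man page; the function name and the DRY_RUN switch are my own assumptions, not anything Fedora has actually shipped.

```shell
#!/bin/sh
# Sketch only: start a command in its own scope unit underneath
# user@.service, so it is not part of session-XX.scope.
# run_in_user_scope and DRY_RUN are hypothetical names.
run_in_user_scope() {
    # With DRY_RUN set to any non-empty value, print the command
    # instead of running it.
    ${DRY_RUN:+echo} systemd-run --scope --user "$@"
}
```

A user could then put `alias tmux='run_in_user_scope tmux'` in their shell startup file. Per the NEWS text, the process would still go away once user@.service itself is stopped, so lingering has to be enabled as well if it should survive logging out of every session.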
----- Original Message -----
From: "Justin Brown" justin.brown@fandingo.org
To: "Development discussions related to Fedora" devel@lists.fedoraproject.org
Sent: Thursday, June 2, 2016 1:17:22 PM
Subject: Re: systemd 230 change - KillUserProcesses defaults to yes
On Thu, Jun 2, 2016 at 1:26 PM, Ivan Chavero < ichavero@redhat.com > wrote:
Well, if I'm writing malware I'll make sure it uses systemd-run so it keeps on running.
The point of the feature is not to prevent users from running anything in the background. It's that *anything* the user runs has proper systemd confinement, so it's obvious and manageable by the administrator. Without this feature, the only reliable way to achieve the same thing is to reboot every system.
Why does user activity need to have systemd confinement?
A well crafted script can kill user processes if desired. This is pretty basic Unix system administration stuff.
This default is nonsense: the only thing it really does is break stuff that relies on processes being executed after the user closes his session. Yes, there's an obscure systemd-run command that only the systemd devs know and that can make your programs run forever, but what's wrong with "&" or just running "screen" to create a persistent session?
Maybe it's obscure to you, but it's foolish to suggest that it will forever be so.
Actually it's not obscure to me, I can read manuals (BTW, a typical ad hominem argument), and I follow systemd development because it's an important part of Linux systems. If changing every Unix manual and textbook is required to remove this from obscurity, I'm pretty sure it will remain like that for a while...
What's wrong with your shell understanding that "&" needs more sophisticated handling than fork/exec* these days? There's no reason why shells can't handle this for you, or you can set up your shell to handle it for you. There's already been discussion about creating wrapper scripts in Fedora for screen and tmux that automatically handle execution via systemd-run, so I'm unsure what the issue is.
Really?? I'm a little speechless here: you're suggesting that shell developers should change the behaviour of their software because of this default!
What's the issue? A lot of users expect their processes to behave in a certain way, and this introduces a big change in that behaviour; it will break a lot of stuff.
BTW I'm not a systemd hater. I think it does pretty cool stuff, but sometimes developers make decisions that have bigger repercussions than the use case they are trying to solve.
On 2 June 2016 at 15:17, Justin Brown justin.brown@fandingo.org wrote:
On Thu, Jun 2, 2016 at 1:26 PM, Ivan Chavero ichavero@redhat.com wrote:
Well, if I'm writing malware I'll make sure it uses systemd-run so it keeps on running.
The point of the feature is not to prevent users from running anything in the background. It's that *anything* the user runs has proper systemd confinement, so it's obvious and manageable by the administrator. Without this feature, the only reliable way to achieve the same thing is to reboot every system.
This default is nonsense: the only thing it really does is break stuff that relies on processes being executed after the user closes his session. Yes, there's an obscure systemd-run command that only the systemd devs know and that can make your programs run forever, but what's wrong with "&" or just running "screen" to create a persistent session?
Maybe it's obscure to you, but it's foolish to suggest that it will forever be so. What's wrong with your shell understanding that "&" needs more sophisticated handling than fork/exec* these days? There's no reason why shells can't handle this for you, or you can set up your shell to handle it for you. There's already been discussion about creating wrapper scripts in Fedora for screen and tmux that automatically handle execution via systemd-run, so I'm unsure what the issue is.
Mostly because that is a naive view of the amount of work that will need to be done. It has to be more than a wrapper script; it will take a bunch of patches to many applications for it to work. Those projects need to be aware of this change, or some patch needs to be carried in every OS when the upstream says 'screw this'.
This isn't to say that these things can't happen. Supposedly various ports to MacOS-X (and possibly other OS's) have carried various patches to work with init systems that do something similar. So it is possible. However, the will to do any of that was cut short from the beginning because of the standard cycle of systemd squabbling.
1. There is a problem for a certain group that systemd people care about (usually desktop but not always).
2. Systemd puts in a fix for that problem.
3. Someone who isn't using the system that way gets affected and asks/complains/bitches about the fix (depending on the person).
4. Communication goes downhill with the following items you can checkmark regularly:
   A. Someone 'representing' systemd says it's right and it is dragging the neanderthals into the light of 21st century computing.
   B. Someone 'representing' grognards says it's right and it doesn't want know-it-all eggheads pissing on it all the time.
   C. Someone 'explains' how this fixes a security problem.
   D. Someone 'explains' how it causes a security problem.
   E. Both sides tear apart each other's arguments.
   F. Both sides yell, scream, throw insults, email 'anonymous' death threats to people on the other side.
   G. Both sides say they are going to take their toys and go home.
   H. Eventually FESCO has to play adult and tell the groups to work together or no one gets to play.
I am guessing we are hitting E and will be going to F soon. G will come sometime after F24 is released (usually 2-3 weeks after release). H. will happen after the deadline for features in F25 occurs.
[PS I do not condone or think that any of the steps are good or should be done. It just seems to be the standard bingo for this software. Someone might also be able to put dnf or GNOME into similar categories. ]
Stephen John Smoogen wrote:
1. There is a problem for a certain group that systemd people care about (usually desktop but not always).
2. Systemd puts in a fix for that problem.
In this timeline, your step (2) is crucially missing a piece. Systemd has put in a *change* but it has been shown *not* to address the actual problem.
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...
It's also quite obviously *not* an actual fix, it is a bandaid at best and the real problems remain in other code.
Whether or not there's any actual security benefit to the change is a pure non-sequitur. The fact remains that the original problem that prompted the change hasn't been fixed, while numerous legitimate use cases spanning 30+ years of practice get broken. This is *not* good software engineering, by any measure.
3. Someone who isn't using the system that way gets affected and asks/complains/bitches about the fix (depending on the person).
4. Communication goes downhill with the following items you can checkmark regularly:
   A. Someone 'representing' systemd says it's right and it is dragging the neanderthals into the light of 21st century computing.
   B. Someone 'representing' grognards says it's right and it doesn't want know-it-all eggheads pissing on it all the time.
   C. Someone 'explains' how this fixes a security problem.
   D. Someone 'explains' how it causes a security problem.
   E. Both sides tear apart each other's arguments.
   F. Both sides yell, scream, throw insults, email 'anonymous' death threats to people on the other side.
   G. Both sides say they are going to take their toys and go home.
   H. Eventually FESCO has to play adult and tell the groups to work together or no one gets to play.
I am guessing we are hitting E and will be going to F soon. G will come sometime after F24 is released (usually 2-3 weeks after release). H. will happen after the deadline for features in F25 occurs.
[PS I do not condone or think that any of the steps are good or should be done. It just seems to be the standard bingo for this software. Someone might also be able to put dnf or GNOME into similar categories. ]
Thanks.. I forgot an important part.
5. People comment about the broken cycle and then various people nitpick that comment in some fashion that doesn't improve anything but 'proves' that they are 'correcter' than the commenter. Overall everyone involved feels worse off.
On 2 June 2016 at 16:31, Howard Chu hyc@symas.com wrote:
Stephen John Smoogen wrote:
1. There is a problem for a certain group that systemd people care about (usually desktop but not always).
2. Systemd puts in a fix for that problem.
In this timeline, your step (2) is crucially missing a piece. Systemd has put in a *change* but it has been shown *not* to address the actual problem.
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...
It's also quite obviously *not* an actual fix, it is a bandaid at best and the real problems remain in other code.
Whether or not there's any actual security benefit to the change is a pure non-sequitur. The fact remains that the original problem that prompted the change hasn't been fixed, while numerous legitimate use cases spanning 30+ years of practice get broken. This is *not* good software engineering, by any measure.
3. Someone who isn't using the system that way gets affected and asks/complains/bitches about the fix (depending on the person).
4. Communication goes downhill with the following items you can checkmark regularly:
   A. Someone 'representing' systemd says it's right and it is dragging the neanderthals into the light of 21st century computing.
   B. Someone 'representing' grognards says it's right and it doesn't want know-it-all eggheads pissing on it all the time.
   C. Someone 'explains' how this fixes a security problem.
   D. Someone 'explains' how it causes a security problem.
   E. Both sides tear apart each other's arguments.
   F. Both sides yell, scream, throw insults, email 'anonymous' death threats to people on the other side.
   G. Both sides say they are going to take their toys and go home.
   H. Eventually FESCO has to play adult and tell the groups to work together or no one gets to play.
I am guessing we are hitting E and will be going to F soon. G will come sometime after F24 is released (usually 2-3 weeks after release). H. will happen after the deadline for features in F25 occurs.
[PS I do not condone or think that any of the steps are good or should be done. It just seems to be the standard bingo for this software. Someone might also be able to put dnf or GNOME into similar categories. ]
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Lennart Poettering mzerqung@0pointer.de wrote:
On Wed, 01.06.16 07:20, Adam Williamson (adamwill@fedoraproject.org) wrote:
On Wed, 2016-06-01 at 15:48 +0200, Lennart Poettering wrote:
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able to consume further runtime resources on the system, regardless of whether he does that because of a bug or on purpose.
I don't think you've yet explained exactly why this constitutes a 'security problem'. Could you please do so?
Well. Let's say you are responsible for the Linux desktops of a large security-sensitive company (let's say bank, whatever),
I suppose a bank might have a policy that the systems that handle the money must not be touched while the bank is closed. Then they might want to kill processes at closing time. That's a rather special use case that should be configured where it's needed, and it seems to me that it should be tied to the bank's opening-hours, not users' login sessions.
and the desktops are installed as fixed workstations, with different employees using them at different times. They log in, they do some "important company stuff", and then they log out again. Now, it's a large company, so it doesn't have the closest control on every single employee,
But apparently they do trust their employees enough to allow them to log in and run processes. If an employee wants to do something bad, then what prevents them from doing it while sitting at their desk? It seems that it would be a very special situation where an action that is OK for an employee to do while sitting at their desk suddenly becomes a problem when they go home for the night. In that case the admins would also have to disable Cron, At, SSH and any other means of remote access, and then they could enable KillUserProcesses at the same time.
and sometimes employees leave the company.
Then their user accounts shall be disabled so that they can no longer log in, and *then* it makes sense to kill all their processes.
Sometimes even the employees browse to the wrong web sites, catch a browser exploit and suddenly start running spam bots under their user identity, without even knowing.
I don't see how a spambot is a problem during nights and weekends but not during work hours. A spambot shall be wiped out as soon as it's discovered. It shall definitely not be left in place and allowed to respawn every time the user logs in.
Sysadmins can choose to run a cron job that checks for unexpectedly old processes and alerts the admin who then takes a closer look and judges whether the process is legitimate or not.
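As a rough illustration of such a check (a sketch under assumptions, not a finished tool): the function names below are hypothetical, and it relies on the `etimes` output column (elapsed time in seconds) that the procps-ng version of ps provides.

```shell
#!/bin/sh
# Sketch only: report processes that have been running longer than a
# threshold. Function names are hypothetical.

# Reads "pid user elapsed_seconds command" lines on stdin and prints the
# ones whose elapsed time exceeds the threshold given as $1 (in seconds).
filter_old_processes() {
    # $1 = $1 makes awk rebuild the record with single spaces between fields.
    awk -v max="$1" '$3 + 0 > max + 0 { $1 = $1; print }'
}

# Feeds the current process list into the filter.
report_old_processes() {
    ps -eo pid=,user=,etimes=,comm= | filter_old_processes "$1"
}
```

A cron job could call `report_old_processes 86400` and mail any output to the admin. As written it also lists long-running system daemons, so a real deployment would presumably restrict it to the user accounts of interest.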
unless the admin permitted it explicitly, there should not be user code running.
The admin did permit it explicitly when they created the user account.
If you allow your intern user access to a webserver to quickly check out the resource consumption of some service, that doesn't mean that he shall be allowed to run stuff there forever, just because he once had the login privilege for the server.
Then disable the account and run "killall --signal KILL --user intern".
And even more: after you disabled his user account and logged him out, he really should be gone.
After you disabled his user account, he really should be gone. If he's just logged out, he will be back tomorrow. Logging out is one thing. Disabling a user account is another. Kill any lingering processes when disabling the account, not every time the user logs out.
A single command that both disables a user account and kills any processes running as that user might be handy. Anyone who thinks it's needed can write such a tool.
All of your examples are either very special cases, or else the enforcement belongs in other places than the logout procedure. Thus I'm still not convinced that there is a security problem that affects a typical Fedora system.
Björn Persson
On Thu, Jun 2, 2016 at 7:14 AM, Björn Persson <Bjorn@rombobjörn.se> wrote:
Lennart Poettering mzerqung@0pointer.de wrote:
And even more: after you disabled his user account and logged him out, he really should be gone.
After you disabled his user account, he really should be gone. If he's just logged out, he will be back tomorrow. Logging out is one thing. Disabling a user account is another. Kill any lingering processes when disabling the account, not every time the user logs out.
A single command that both disables a user account and kills any processes running as that user might be handy. Anyone who thinks it's needed can write such a tool.
I looked and so far there does not seem to be a one-command solution.
But 3 steps suffice:
1. Disable the account so that they cannot make new sessions:
usermod -L --expiredate 1 <user>
2. Set the pid limit of the user's cgroup to 0, so that they cannot fork new processes:
systemctl set-property user-<uid>.slice TasksMax=0
3. Kill the user's processes:
loginctl kill-user --signal=SIGKILL <user>
This could be wrapped up in a single command like "loginctl kickban <user-or-uid>". I'm guessing a lot of sysadmins would appreciate it.
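Until something like that exists, the three steps could be wrapped roughly as follows. This is only a sketch: `kickban` here is a hypothetical shell function, not a loginctl verb, and the DRY_RUN switch is my own addition so the commands can be inspected before anything is actually run.

```shell
#!/bin/sh
# Sketch only: lock an account, stop it from forking, and kill its
# processes. kickban and DRY_RUN are hypothetical names.
kickban() {
    user=$1
    # Resolve the numeric uid, which is needed for the slice unit name.
    uid=$(id -u "$user") || return 1

    # With DRY_RUN set to any non-empty value, print each command
    # instead of running it.
    run=${DRY_RUN:+echo}

    # 1. Disable the account so that no new sessions can be created.
    $run usermod -L --expiredate 1 "$user" &&
    # 2. Set the task limit of the user's slice to 0 so nothing new forks.
    $run systemctl set-property "user-$uid.slice" TasksMax=0 &&
    # 3. Kill whatever is still running as that user.
    $run loginctl kill-user --signal=SIGKILL "$user"
}
```

Running `DRY_RUN=1 kickban intern` first shows the three commands without touching the system; dropping DRY_RUN and running it as root performs them. The caveat about sssd and other networked account setups applies here just the same.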
There's some trickiness involved, in that usermod does not handle networked setups like sssd, but perfect is the enemy of good.
-- Allan Gardner
On Thu, 2 Jun 2016, Lennart Poettering wrote:
Well. Let's say you are responsible for the Linux desktops of a large security-sensitive company (let's say bank, whatever), and the desktops are installed as fixed workstations, with different employees using them at different times. They log in, they do some "important company stuff", and then they log out again. Now, it's a large company, so it doesn't have the closest control on every single employee, and sometimes employees leave the company. Sometimes even the employees browse to the wrong web sites, catch a browser exploit and suddenly start running spam bots under their user identity, without even knowing.
This all has nothing to do with individual processes on machines. If you are a big bank you better well detect rogue processes using up CPU on your install base.
In all of these cases you really want to make sure that whatever the user did ends – really ends – by the time he logs out.
No you don't. You are creating simplistic world views that are not there.
As others have said, the only simple case of killing processes is those with no use when the user is gone - that is, locally started windowing applications. Really, we need to fix GNOME and gdm and the stuff that lingers, which is where the problem is. We don't need systemd to kill my 200 gdm lockscreen binaries that eventually run me out of resources to unlock my screen. We need gdm to see its bugs and fix them.
This is really just one example. This model I think really needs to be the default everywhere.
People aren't agreeing with you. So making it a default seems like a bad idea. People do seem to agree on "obviously broken windowing apps" that are left lingering. Why can't we just let those get killed?
On desktops and on servers: unless the admin permitted it explicitly, there should not be user code running.
No, user code may be running everywhere as long as it does not affect the purpose or policies of the machines. Such policies are not written by filenames of binary files.
If you allow your intern user access to a webserver to quickly check out the resource consumption of some service, that doesn't mean that he shall be allowed to run stuff there forever, just because he once had the login privilege for the server. And even more: after you disabled his user account and logged him out, he really should be gone.
Apart from your use case taking up 4 lines, which seems like a difficult policy to code into applications (remember you would also need to be able to code the reverse of that policy), the only thing I do agree with you on here is that unlisted uids/gids might be fair game to shoot. But one has to wonder how well that works in the case of network outages where a NIS server or something is temporarily unavailable and you start shooting legitimate processes.
Yes, UNIX is pretty much a Swiss cheese: it's really hard to secure a system properly so that somebody who once had access won't have access anymore at a later point. However, we need to start somewhere, and actually defining a clear lifecycle is a good start.
But your definition is already running afoul of just a handful of software developers and it will cause large unexpected problems in the real world.
For example, a decade ago at a major airline, they had their core database automatically deleted each night. It turned out an overeager cronjob was deleting all the "core" files that crashed applications left all over the servers. To me, systemd shooting processes is no different.
If you are that concerned about processes, you need a strict security policy on what processes you allow to be _started_, rather than trying to fix your mistakes afterwards by shooting.
Paul
On Thu, 2016-06-02 at 10:02 -0400, Paul Wouters wrote:
People aren't agreeing with you. So making it a default seems like a bad idea. People do seem to agree on "obviously broken windowing apps" that are left lingering. Why can't we just let those get killed?
You are misinformed. This is not about 'obviously broken' windowing apps. Applications that have X or wayland connections get killed reliably when the session ends, because that connection is going away.
On Do, 2016-06-02 at 10:07 -0400, Matthias Clasen wrote:
On Thu, 2016-06-02 at 10:02 -0400, Paul Wouters wrote:
People aren't agreeing with you. So making it a default seems like a bad idea. People do seem to agree on "obviously broken windowing apps" that are left lingering. Why can't we just let those get killed?
You are misinformed. This is not about 'obviously broken' windowing apps. Applications that have X or wayland connections get killed reliably when the session ends, because that connection is going away.
No.
It's sort-of default behavior, a bit similar to how terminal apps get zapped by SIGHUP when the terminal closes. But it isn't enforced at all: apps can keep running when the X or wayland connection goes away, either just a short moment (firefox saving open tabs to disk, then exit) or even longer in case they are running some kind of batch job which they can finish without user interaction. Or they keep on trying to read from the closed connection due to some stupid bug ...
cheers, Gerd
Gerd Hoffmann writes:
On Do, 2016-06-02 at 10:07 -0400, Matthias Clasen wrote:
You are misinformed. This is not about 'obviously broken' windowing apps. Applications that have X or wayland connections get killed reliably when the session ends, because that connection is going away.
No.
It's sort-of default behavior, a bit similar to how terminal apps get zapped by SIGHUP when the terminal closes. But it isn't enforced at all: apps can keep running when the X or wayland connection goes away, either just a short moment (firefox saving open tabs to disk, then exit) or even longer in case they are running some kind of batch job which they can finish without user interaction. Or they keep on trying to read from the closed connection due to some stupid bug ...
ssh into a box, and start emacs. Close emacs. Log out from the shell. The shell logs out, but the ssh session remains. The terminal session doesn't end because gconfd-2 is still running in the background.
Why does some kind of a configuration framework API need a freaking daemon to run in the background? And why is that bloody thing still running after I logout, and especially since neither the client, nor the server, runs the Gnome desktop?
Back when emacs converted to GTK, the switch to GTK made sense. Both emacs and Gnome, after all, were GNU projects. But now, years later, with Gnome jumping the shark it all winds up breaking unrelated stuff, like tmux and screen, that has nothing to do with Gnome.
On Thu, 2016-06-02 at 18:01 -0400, Sam Varshavchik wrote:
Gerd Hoffmann writes:
On Do, 2016-06-02 at 10:07 -0400, Matthias Clasen wrote:
You are misinformed. This is not about 'obviously broken' windowing apps. Applications that have X or wayland connections get killed reliably when the session ends, because that connection is going away.
No.
It's sort-of default behavior, a bit similar to how terminal apps get zapped by SIGHUP when the terminal closes. But it isn't enforced at all: apps can keep running when the X or wayland connection goes away, either just a short moment (firefox saving open tabs to disk, then exit) or even longer in case they are running some kind of batch job which they can finish without user interaction. Or they keep on trying to read from the closed connection due to some stupid bug ...
ssh into a box, and start emacs. Close emacs. Log out from the shell. The shell logs out, but the ssh session remains. The terminal session doesn't end because gconfd-2 is still running in the background.
Why does some kind of a configuration framework API need a freaking daemon to run in the background? And why is that bloody thing still running after I logout, and especially since neither the client, nor the server, runs the Gnome desktop?
Back when emacs converted to GTK, the switch to GTK made sense. Both emacs and Gnome, after all, were GNU projects. But now, years later, with Gnome jumping the shark it all winds up breaking unrelated stuff, like tmux and screen, that has nothing to do with Gnome.
gconf has been deprecated for like...five years?...now, so I'd say yelling and screaming about GNOME is kind of missing the point here. The more salient question being, why hasn't emacs-gtk or whatever moved off gconf yet?
On Thu, 2016-06-02 at 13:04 +0200, Lennart Poettering wrote:
On Wed, 01.06.16 07:20, Adam Williamson (adamwill@fedoraproject.org) wrote:
On Wed, 2016-06-01 at 15:48 +0200, Lennart Poettering wrote:
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able to consume further runtime resources on the system, regardless of whether he does that because of a bug or on purpose.
I don't think you've yet explained exactly why this constitutes a 'security problem'. Could you please do so?
Well. Let's say you are responsible for the Linux desktops of a large security-sensitive company (let's say bank, whatever), and the desktops are installed as fixed workstations, with different employees using them at different times. They log in, they do some "important company stuff", and then they log out again. Now, it's a large company, so it doesn't have the closest control on every single employee, and sometimes employees leave the company. Sometimes even the employees browse to the wrong web sites, catch a browser exploit and suddenly start running spam bots under their user identity, without even knowing.
These are all bad things, yes. Yet there is a large logic gap right here.
In all of these cases you really want to make sure that whatever the user did ends – really ends – by the time he logs out.
Er...why? Why is it OK for the employee to be running malicious code (spambots, rootkits, whatever you like) just so long as they're logged in? How is making sure the evil code only runs while the employee is logged in helping in any significant way?
So that the employee can't do stuff there except when logged in, and that he can't do stuff there even long after he left the company, and that the spam bot he caught gets killed as soon as he logs out.
I guess a spambot that only runs 9-5 is slightly better than one that runs 24x7, but it hardly seems like the ideal fix for the problem. As for 'after he left the company', well, killing everyone's processes every time they log out is an awfully large hammer to solve the problem of killing someone's processes the *one time* they leave the company.
This is really just one example. This model I think really needs to be the default everywhere. On desktops and on servers: unless the admin permitted it explicitly, there should not be user code running. If you allow your intern user access to a webserver to quickly check our the resource consumption of some service that doesn't mean that he shall be allowed to run stuff there forever, just because he once had the login privilege for the server. And even more: after you disabled his user account and logged him out, he really should be gone.
Yes, UNIX is pretty much a swiss cheese: it's really hard to secure a system properly so that somebody who once had access won't have access anymore at a later point. However, we need to start somewhere, and actually defining a clear lifecycle is a good start.
Pretty much all more modern OS designs tend to have such a clear lifecycle btw: when the user is logged out, he's *really* logged out. And it's completely OK if certain users get excluded from that, but if so, then the admin needs to sign off on that, and thus a privilege check needs to be enforced.
I think the design has a lot to be said for it, especially if you're starting from scratch and don't have decades of existing expectations and workflows to deal with. But I'm not hugely convinced by the 'security' aspect of it.
Hi,
In all of these cases you really want to make sure that whatever the user did ends – really ends – by the time he logs out.
Sure, there are valid use cases for that. The admin will probably also turn off lingering then, right?
So, what is the problem with simply allowing screen + tmux to continue to run in case lingering is enabled, by simply letting the usual SIGHUP logic do its job for processes which have a controlling terminal?
cheers, Gerd
On Thu, Jun 02, 2016 at 01:04:44PM +0200, Lennart Poettering wrote:
Well. Let's say you are responsible for the Linux desktops of a large security-sensitive company (let's say bank, whatever), and the desktops are installed as fixed workstations, with different employees using them at different times. They log in, they do some "important company
I definitely see the use of the option.
However, the above isn't the target for _any_ of the Fedora Editions, except _maybe_ "Developer in a large organization" for Workstation, and even then I think it's not likely to be the above.
This is really just one example. This model I think really needs to be the default everywhere. On desktops and on servers: unless the admin permitted it explicitly, there should not be user code running. If you allow your intern user access to a webserver to quickly check out the resource consumption of some service, that doesn't mean that he shall be allowed to run stuff there forever, just because he once had the login privilege for the server. And even more: after you disabled his user account and logged him out, he really should be gone.
"On desktops and on servers: unless the admin permitted it explicitly, there should not be user code running" is a fine statement of policy, but it's _definitely_ policy, not fact, or even generalized best practice.
Disabling user accounts and logging someone out seems like a separate management problem not necessarily addressed by this anyway (how do you ensure logout on all systems?).
On 6/2/2016 7:04 AM, Lennart Poettering wrote:
In all of these cases you really want to make sure that whatever the user did ends – really ends – by the time he logs out.
I apologize if this has already been brought up, but I didn't see this particular point raised in the replies I've read and I wanted to be sure it is mentioned.
The potential problem I see with changing the default behavior of systemd is that it is non-intuitive and could be potentially harmful if the user is not aware of it. Consider the following example. I routinely use screen when I connect to the systems I manage remotely specifically when I try to apply updates. I do this because if my VPN connection or my Internet connection is interrupted, the update process will not die with my login session. Incidentally, I learned to use screen when applying updates the hard way many years ago when a killed update wreaked havoc with a system of mine and I found screen to be an excellent solution.
Currently screen is a utility whose purpose is, in part, to allow a user to disconnect from a running process in a safe way that will allow it to keep running and be resumed in the future from another session entirely. Now that will only be valid if the user knows whether or not systemd is configured on the system in question to not automatically reap these kinds of processes. This would have the effect of transforming what is clear and precise documentation of a well known and widely used utility into something that relies on another set of system configuration settings that are arguably not as intuitive.
For the record, I am not trying to say I am opposed to increasing system security and progress in general in the slightest - both are laudable goals. What I am trying to say is that there is a lot of merit in making sure things are not made unnecessarily complex and difficult to discern. Whatever needs to be done to increase security needs to be done while embracing the notion of not pulling the rug out from under legitimate uses of programs whose indiscriminate and unexpected reaping could cause disastrous results.
Tom
On 06/02/2016 12:37 PM, Tom Rivers wrote:
The potential problem I see with changing the default behavior of systemd is that it is non-intuitive and could be potentially harmful if the user is not aware of it. Consider the following example. I routinely use screen when I connect to the systems I manage remotely specifically when I try to apply updates. I do this because if my VPN connection or my Internet connection is interrupted, the update process will not die with my login session. Incidentally, I learned to use screen when applying updates the hard way many years ago when a killed update wreaked havoc with a system of mine and I found screen to be an excellent solution.
Yes, and to make things worse, many systems with elevated security/sensitivity enforce idle session timeout that just looks at keyboard activity---so they tend to abort long-running jobs, such as updates.
Lennart Poettering writes:
Well. Let's say you are responsible for the Linux desktops of a large security-sensitive company (let's say bank, whatever), and the desktops are installed as fixed workstations, with different employees using them at different times. They log in, they do some "important company stuff", and then they log out again. Now, it's a large company, so it doesn't have the closest control on every single employee, and sometimes employees leave the company. Sometimes even the employees browse to the wrong web sites, catch a browser exploit and suddenly start running spam bots under their user identity, without even knowing.
In all of these cases you really want to make sure that whatever the user did ends – really ends – by the time he logs out. So that the
Nice theory. But there's a problem with this proposal.
If an unprivileged program, like tmux, or screen, or nohup, can do whatever dbus/ibus thingy it needs to do in order to elevate itself to a new "session", and make arrangements to prevent itself from getting nuked from high orbit by KillUserProcesses, then the same thing can obviously be done by any other process. Like the same rogue spambot that's being discussed here. The rogue spambot in question can simply talk to systemd itself, and arrange for it not to be killed when the user logs out. Just like any other process. There goes the added "security" we were hoping to achieve, here.
This KillUserProcesses feature offers no real "security" benefit here whatsoever. I am confident that any professional, who's actually making some bread in information security, will reach this conclusion without taking much time; just a brief, cursory analysis. The claim that KillUserProcesses implements any kind of system security is quite funny. And this doesn't exactly give me a warm and fuzzy feeling – who knows what other things that are also thought to be enforcing system security are lurking inside the systemd monolith…
Anyway, in order for this to be truly effective security, it should not be possible for any ordinary process, like tmux, screen, or nohup, to exempt itself without being a privileged binary in some way (either via s[gu]id, capabilities(7), or selinux)[*].
Well, good luck with that.
[*] Not 100% true, actually. There are ways to come up with a fairly bullet-proof framework for privileged processes on Linux, without relying on system-level support like suid/capabilities/selinux, but this is veering more off-topic than this already is.
On Thu, 02.06.16 18:00, Sam Varshavchik (mrsam@courier-mta.com) wrote:
If an unprivileged program, like tmux, or screen, or nohup, can do whatever dbus/ibus thingy it needs to do in order to elevate itself to a new "session", and make arrangements to prevent itself from getting nuked from high orbit by KillUserProcesses, then the same thing can obviously be done by any other process. Like the same rogue spambot that's being discussed here. The rogue spambot in question can simply talk to systemd itself, and arrange for it not to be killed when the user logs out. Just like any other process. There goes the added "security" we were hoping to achieve, here.
Key here is that the life-cycle is enforced by privileged code, and that this privileged code checks system policy (as in PolicyKit) when deciding what to do. Yes, the default policy we ship is friendly, and says that users can stick around if they want, via lingering, but key here is that this policy check is done by privileged code, and stored in privileged policy.
Lennart
Lennart Poettering mzerqung@0pointer.de wrote:
On Thu, 02.06.16 18:00, Sam Varshavchik (mrsam@courier-mta.com) wrote:
The rogue spambot in question can simply talk to systemd itself, and arrange for it not to be killed when the user logs out.
Yes, the default policy we ship is friendly, and says that users can stick around if they want, via lingering
And therefore the change that is being debated in this thread – the default value of KillUserProcesses – does not change anything security-wise, right? There already was, and there still is, a feature that sysadmins can opt in to use to enforce an unusually strict policy if they want, but there has not been, is not, and will not be such a policy by default, right?
If that's the case, then can we please stop talking about security and instead debate the usability aspects of this change?
Björn Persson
On Fri, Jun 03, 2016 at 03:30:33PM +0200, Björn Persson wrote:
Lennart Poettering mzerqung@0pointer.de wrote:
On Thu, 02.06.16 18:00, Sam Varshavchik (mrsam@courier-mta.com) wrote:
The rogue spambot in question can simply talk to systemd itself, and arrange for it not to be killed when the user logs out.
Yes, the default policy we ship is friendly, and says that users can stick around if they want, via lingering
And therefore the change that is being debated in this thread – the default value of KillUserProcesses – does not change anything security-wise, right? There already was, and there still is, a feature that sysadmins can opt in to use to enforce an unusually strict policy if they want, but there has not been, is not, and will not be such a policy by default, right?
There is both the default *policy* (i.e. what you can ask for using polkit), and the default *behaviour* (i.e. what happens when you log out if you haven't asked for special treatment). We are trying to make the second stricter, while keeping the first more permissive, at least for now. This way the change is more incremental.
If that's the case, then can we please stop talking about security and instead debate the usability aspects of this change?
The change is related to security. Current policy is lax to make the change easier by allowing users to revert to the previous behaviour at will. But the new default brings us one step closer to what we consider a better out-of-the-box behaviour of the system.
Of course usability is important. I'll be looking into allowing screen to persist automatically, but that needs a bit of thought and coding.
Zbyszek
Is it/should it be true that any 'sudo' process is privileged and automatically is put into a session that would not be killed by the user logging out? So if the user starts some background process with sudo, they can log out of their DE session and that process continues to run?
Chris Murphy
On Fri, Jun 3, 2016 at 11:24 AM, Chris Murphy lists@colorremedies.com wrote:
Is it/should it be true that any 'sudo' process is privileged and automatically is put into a session that would not be killed by the user logging out? So if the user starts some background process with sudo, they can log out of their DE session and that process continues to run?
OK so I have an example where there is breakage. The example itself doesn't matter, but because it's so basic (to me anyway) I think it opens up a rat's nest of other workflow problems, people just have to imagine their own and try them out.
1. Set /etc/systemd/logind.conf so that KillUserProcesses=yes
2. Start a btrfs scrub, which by default is a background process:

    [chris@f24m ~]$ sudo btrfs scrub start /
    scrub started on /, fsid dbf2e938-1f28-4e93-aa6c-1e193004931b (pid=9527)
    [chris@f24m ~]$

3. Log out of the DE (this is gnome-shell). Wait a minute. Log back in.
4. Check the scrub status:

    [chris@f24m ~]$ sudo btrfs scrub status /
    [sudo] password for chris:
    scrub status for dbf2e938-1f28-4e93-aa6c-1e193004931b
            scrub started at Fri Jun 3 20:38:15 2016, interrupted after 00:00:05, not running
            total bytes scrubbed: 2.52GiB with 0 errors
If I repeat this with #KillUserProcesses=yes (commented out), the scrub completes without interruption. This is not an unprivileged process near as I can tell. Scrub is perhaps not the best example, it may well be better workflow to put such a thing on a timer instead. But it could take hours or days so, on demand usage means some kind of workflow change: stay logged in, or drop to a console and login as root to run the command? KillExcludeUsers=root is the default so presumably this avoids the interruption.
But what about device replacement? The command, 'btrfs replace start <olddev> <newdev> <mountpoint>', follows a similar structure and behavior to the scrub: it goes to the background and starts migrating data from the old to the new drive. If I log out of the desktop session before that completes, I suspect that too will be interrupted similar to the scrub example. Obviously device replacement would not be put on a timer, it would be done on demand.
Anyway it seems problematic, presumably there are other examples of programs that users want to run on demand, with escalated privileges, in the background, and persist through a logout from the DE?
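One possible way around the interruption described above, assuming the job can be launched through systemd-run instead of directly from the login shell, is to start it as a transient system service. The sketch below is only an illustration: the helper name run_outside_session, the DRY_RUN switch and the unit name scrub-root are made up for this example, while systemd-run --unit= and btrfs scrub start -B (stay in the foreground) are existing options. A unit started this way belongs to the system manager and not to session-XX.scope, so the logout cleanup should not apply to it.

```shell
#!/bin/sh
# Sketch: start a long-running privileged job outside the login session's
# scope unit. run_outside_session and DRY_RUN are hypothetical names used
# only for this example.

run_outside_session() {
    # $1: name for the transient unit; remaining arguments: the command to run
    unit=$1
    shift
    # Prepend the systemd-run invocation to the command line.
    set -- sudo systemd-run --unit="$unit" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print the command instead of executing it.
        echo "$*"
    else
        "$@"
    fi
}
```

Usage, under the same assumptions: "run_outside_session scrub-root btrfs scrub start -B /". The -B keeps the scrub in the foreground of the service, so the unit stays active until the scrub finishes, and its output should then be visible with "journalctl -u scrub-root".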
On Fri, Jun 3, 2016 at 9:37 PM, Chris Murphy lists@colorremedies.com wrote:
[chris@f24m ~]$ sudo btrfs scrub status /
[sudo] password for chris:
scrub status for dbf2e938-1f28-4e93-aa6c-1e193004931b
        scrub started at Fri Jun 3 20:38:15 2016, interrupted after 00:00:05, not running
        total bytes scrubbed: 2.52GiB with 0 errors
The other problem is the journal doesn't contain any hint why there was an interruption.
What this incentivizes me to do is just stay logged in, in cases where I probably should log out. So the security enhancement claims for the feature may be true, but I don't think it's taking into account the potential for users to just remain logged in (with or without a lock screen) which then obviates the security claims.
Chris Murphy writes:
On Fri, Jun 3, 2016 at 9:37 PM, Chris Murphy lists@colorremedies.com wrote:
[chris@f24m ~]$ sudo btrfs scrub status /
[sudo] password for chris:
scrub status for dbf2e938-1f28-4e93-aa6c-1e193004931b
        scrub started at Fri Jun 3 20:38:15 2016, interrupted after 00:00:05, not running
        total bytes scrubbed: 2.52GiB with 0 errors
The other problem is the journal doesn't contain any hint why there was an interruption.
What this incentivizes me to do is just stay logged in, in cases where I probably should log out.
I would think this would incentivize you to put KillUserProcesses back to "no".
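For reference, a minimal sketch of what putting the setting back could look like, assuming the system reads drop-in files from /etc/systemd/logind.conf.d. The function name and the file name 10-kill-user-processes.conf are arbitrary choices for this example; the [Login] section and the KillUserProcesses= key are the ones logind.conf uses.

```shell
#!/bin/sh
# Sketch: write a logind drop-in that sets KillUserProcesses=no.
# The function name and the drop-in file name are hypothetical.

write_kill_user_processes_dropin() {
    # $1: target directory, normally /etc/systemd/logind.conf.d (needs root)
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/10-kill-user-processes.conf" <<'EOF'
[Login]
KillUserProcesses=no
EOF
}
```

Called as root with /etc/systemd/logind.conf.d as the argument, this only writes the file; logind still has to re-read its configuration (for example after a reboot) before the setting takes effect.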
On Sat, Jun 4, 2016 at 5:53 AM, Sam Varshavchik mrsam@courier-mta.com wrote:
Chris Murphy writes:
On Fri, Jun 3, 2016 at 9:37 PM, Chris Murphy lists@colorremedies.com wrote:
[chris@f24m ~]$ sudo btrfs scrub status /
[sudo] password for chris:
scrub status for dbf2e938-1f28-4e93-aa6c-1e193004931b
        scrub started at Fri Jun 3 20:38:15 2016, interrupted after 00:00:05, not running
        total bytes scrubbed: 2.52GiB with 0 errors
The other problem is the journal doesn't contain any hint why there was an interruption.
What this incentivizes me to do is just stay logged in, in cases where I probably should log out.
I would think this would incentivize you to put KillUserProcesses back to "no".
No because then the user session isn't cleaned up at all; as I understand it it's due in part to the change in F24 from session bus to user bus. The desktop dbus session has to get killed in order for background services using it to know they need to die.
Since I logout and reboot more often than I do drive replacements, I would leave KillUserProcesses on yes so I stop doing "sudo reboot -f" to avoid restart delays. And I'll just stay logged in and let whatever spambots are using my user session to keep on running, perhaps indefinitely by depending on a lock screen instead.
On Sat, Jun 4, 2016 at 7:53 AM, Sam Varshavchik mrsam@courier-mta.com wrote:
Chris Murphy writes:
On Fri, Jun 3, 2016 at 9:37 PM, Chris Murphy lists@colorremedies.com wrote:
[chris@f24m ~]$ sudo btrfs scrub status /
[sudo] password for chris:
scrub status for dbf2e938-1f28-4e93-aa6c-1e193004931b
        scrub started at Fri Jun 3 20:38:15 2016, interrupted after 00:00:05, not running
        total bytes scrubbed: 2.52GiB with 0 errors
The other problem is the journal doesn't contain any hint why there was an interruption.
What this incentivizes me to do is just stay logged in, in cases where I probably should log out.
I would think this would incentivize you to put KillUserProcesses back to "no".
I don't necessarily have such detailed administrative control of every Fedora box I might connect to, especially a fresh-built one on which I've had no opportunity to hand-craft and add new configuration management settings.
Björn Persson writes:
If that's the case, then can we please stop talking about security and instead debate the usability aspects of this change?
Agreed.
But if someone still wishes to argue that this is some kind of a security feature, I'll be delighted to continue this discussion.
Lennart Poettering writes:
On Thu, 02.06.16 18:00, Sam Varshavchik (mrsam@courier-mta.com) wrote:
If an unprivileged program, like tmux, or screen, or nohup, can do whatever dbus/ibus thingy it needs to do in order to elevate itself to a new "session", and make arrangements to prevent itself from getting nuked from high orbit by KillUserProcesses, then the same thing can obviously be done by any other process. Like the same rogue spambot that's being discussed here. The rogue spambot in question can simply talk to systemd itself, and arrange for it not to be killed when the user logs out. Just like any other process. There goes the added "security" we were hoping to achieve, here.
Key here is that the life-cycle is enforced by privileged code, and that this privileged code checks system policy (as in PolicyKit) when deciding what to do. Yes, the default policy we ship is friendly, and says that users can stick around if they want, via lingering, but key here is that this policy check is done by privileged code, and stored in privileged policy.
That's not the issue. As I wrote, "if an unprivileged program, like tmux, or screen, or nohup, can do whatever dbus/ibus thingy it needs to do in order to elevate itself to a new "session", and make arrangements to prevent itself from getting nuked from high orbit by KillUserProcesses, then the same thing can obviously be done by any other process … like the same, rogue spambot".
So, this is "enforced by privileged code". That's wonderful news, but as Benny Hill would say: biiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiig deal.
KillUserProcesses becomes an effective security measure only if it can be made mandatory for a given userid's processes, and only if these two conditions are met:
1) It is possible to have, say, tmux, make whatever arrangements are necessary to prevent itself from getting killed, but
2) It is impossible for any other random process, running under the same uid, to do the same.
KillUserProcesses enforces some kind of security only if it is mandatory, and perhaps with exceptions for approved processes. Which is not possible without having a privilege escalation occur at some point in the execution chain for those approved processes. This is POSIX Security Model 101.
But if, as is being proposed, the option on the table is to make KillUserProcesses optional, even if turned on by default, and to make it possible for a process to request to linger past logout, and to use this for tmux/screen/nohup, then this offers absolutely no added security whatsoever. Because if tmux/screen/nohup can do it, then so can any other process.
It can certainly be a useful feature perhaps, in many situations. But it will not stop a rogue process that wants to linger. KillUserProcesses in its proposed optional enabling in Fedora is not a security measure.
On Thu, Jun 2, 2016 at 7:04 AM, Lennart Poettering mzerqung@0pointer.de wrote:
In all of these cases you really want to make sure that whatever the user did ends – really ends – by the time he logs out. So that the employee can't do stuff there except when logged in, and that he can't do stuff there even long after he left the company, and that the spam bot he caught gets killed as soon as he logs out.
You may personally want this, and it may be part of your "big picture". But when "you", as in the generic "sys-admin" you, kill the critical task that has always worked this way, and especially when you kill it as part of the system upgrades, you will be called in for the "post-mortem" for killing working systems. Do this once or twice in a quarter, and you will get a "performance review". If it happens one more time after a performance review, you will usually be *gone* after the next annual review or when the next layoffs happen, because you've irritated countless developers, nightly operational groups, and managers from other groups who just expect things to work the same way they worked last year.
Been there, done that, got the layoff bonus.
Pretty much all more modern OS designs tend to have such a clear lifecycle btw: when the user is logged out, he's *really* logged out. And it's completely OK if certain users get excluded from that, but if so, then the admin needs to sign off on that, and thus a privilege check needs to be enforced.
Lennart
It's a reasonable approach. It definitely needed to be reviewed in the Fedora release cycle, so it can be selected or not selected as part of the announced release changes, because there are a *lot* of casual processes that it will screw up. In particular, unintentional logouts due to interrupted connectivity are a very, very common scenario for environments with poor connectivity. When I'm administering servers in other countries, especially for a fragile operation, I use screen and "ssh remote hostname process &" and nohup all the time to help ensure the continuity of critical operations.
On 06/02/2016 01:04 PM, Lennart Poettering wrote:
Well. Let's say you are responsible for the Linux desktops of a large security-sensitive company (let's say bank, whatever), and the desktops are installed as fixed workstations, with different employees using them at different times. They log in, they do some "important company stuff", and then they log out again. Now, it's a large company, so it doesn't have the closest control on every single employee, and sometimes employees leave the company. Sometimes even the employees browse to the wrong web sites, catch a browser exploit and suddenly start running spam bots under their user identity, without even knowing.
Do you really want to support a disruptive change in default behaviour with such a specific use case?
On Sat, 2016-06-04 at 19:36 +0200, Roberto Ragusa wrote:
On 06/02/2016 01:04 PM, Lennart Poettering wrote:
Well. Let's say you are responsible for the Linux desktops of a large security-sensitive company (let's say bank, whatever), and the desktops are installed as fixed workstations, with different employees using them at different times. They log in, they do some "important company stuff", and then they log out again. Now, it's a large company, so it doesn't have the closest control on every single employee, and sometimes employees leave the company. Sometimes even the employees browse to the wrong web sites, catch a browser exploit and suddenly start running spam bots under their user identity, without even knowing.
Do you really want to support a disruptive change in default behaviour with such a specific use case?
It is a common *enterprise* concern -- which is fine for enterprises that can afford to pay full-time sysadmins to configure systemd, policykit, SELinux, etc. The problem here is that there are a large number of non-enterprise users who are going to have to deal with yet another unexpected behavior and yet another hard-to-locate configuration option.
Yes, hard-to-locate, not because systemd's documentation is lacking but because it will take time to even realize that systemd is the problem. It took me three days to find the problem the last time systemd caused unexpected behavior on my system. What possible reason is there to think that systemd is killing processes? There are dozens of other things that would need to be ruled out first, and when you are not a full-time admin that is unacceptable.
-- Ben
On Mon, 2016-06-06 at 16:34 +0000, Jóhann B. Guðmundsson wrote:
On 06/06/2016 03:56 PM, Benjamin Kreuter wrote:
It took me three days to find the problem the last time systemd caused unexpected behavior on my system.
What was this hard to find unexpected behaviour you encountered?
The system would immediately suspend whenever I locked the screen.
The cause is that systemd reported my chassis type as "tablet," which triggers that particular behavior in GNOME. The problem is that my system is a laptop*. GNOME shares in the blame here for having hard-coded something so confusing, but systemd triggered that behavior and the only current workaround is to explicitly configure the chassis type via systemd.
My real point is not about the specific problem, but about how long it takes to figure out that systemd is even involved. It took a lot of googling to find that workaround. There is no reason to think about systemd when confronted with such behavior. With the way things are going I suppose that may change -- eventually we may assume that systemd is somehow responsible for all unwanted behavior.
-- Ben
* The marketing term for my system is "convertible," which is a laptop featuring a touchscreen lid that can be folded all the way around and used like a tablet. This is not captured by any of systemd's chassis types, and setting CHASSIS=laptop results in other unexpected (but not as bad) behavior: when folding the lid back into the more conventional laptop form the system will suspend.
On Mon, Jun 6, 2016 at 9:56 AM, Benjamin Kreuter ben.kreuter@gmail.com wrote:
On Sat, 2016-06-04 at 19:36 +0200, Roberto Ragusa wrote:
On 06/02/2016 01:04 PM, Lennart Poettering wrote:
Well. Let's say you are responsible for the Linux desktops of a large security-sensitive company (let's say bank, whatever), and the desktops are installed as fixed workstations, with different employees using them at different times. They log in, they do some "important company stuff", and then they log out again. Now, it's a large company, so it doesn't have the closest control on every single employee, and sometimes employees leave the company. Sometimes even the employees browse to the wrong web sites, catch a browser exploit and suddenly start running spam bots under their user identity, without even knowing.
Do you really want to support a disruptive change in default behaviour with such a specific use case?
It is a common *enterprise* concern -- which is fine for enterprises that can afford to pay full-time sysadmins to configure systemd, policykit, SELinux, etc. The problem here is that there are a large number of non-enterprise users who are going to have to deal with yet another unexpected behavior and yet another hard-to-locate configuration option.
Yes, hard-to-locate, not because systemd's documentation is lacking but because it will take time to even realize that systemd is the problem.
Even if you suspect systemd, which is reasonable because it legitimately has the authority to manage processes, there's nothing in the log that shows that it is systemd killing the process and why, in a traceable manner, so the user can alter policy if they don't like what's happening.
I think the point of the feature is to improve the ratio of deliberate to incidental/unintended behaviors. But the example I've come up with is deliberately initiated and privileged, yet it's killed by this feature on logout. So, is that a bug?
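Lacking a log message, one way to at least check whether a process is a candidate for being killed at logout is to look at which cgroup it sits in. The sketch below assumes the usual systemd layout, where login session processes live under a session-XX.scope path and processes started through the user manager live under user@UID.service; the function names classify_cgroup and where_is are invented for this example, and /proc/PID/cgroup is the standard Linux interface it reads.

```shell
#!/bin/sh
# Sketch: tell whether a process sits in a login session scope.
# classify_cgroup and where_is are hypothetical helper names.

classify_cgroup() {
    # $1: a cgroup path as shown in /proc/PID/cgroup
    case $1 in
        */user@*.service*)   echo user-manager ;;   # under user@.service
        */session-*.scope*)  echo session-scope ;;  # part of a login session
        *)                   echo other ;;          # e.g. a system service
    esac
}

where_is() {
    # $1: PID to inspect (defaults to the current shell)
    pid=${1:-$$}
    # Use the unified hierarchy line ("0::") or the cgroup v1 systemd line.
    path=$(grep -E '^(0::|[0-9]+:name=systemd:)' "/proc/$pid/cgroup" \
        | head -n 1 | cut -d: -f3-)
    classify_cgroup "$path"
}
```

Under these assumptions, a process reported as session-scope is the kind that KillUserProcesses=yes targets at logout, while one reported as other (for example a unit in system.slice) is outside the session.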
Hi,
Lennart Poettering wrote on Wed, Jun 01, 2016 at 03:48:04PM +0200:
Again, this isn't just work-arounds around broken programs. It's a security thing. It's privileged code (logind, PID 1) that enforces a clear life-cycle on unprivileged programs.
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able to consume further runtime resources on the system, regardless if he does that because of a bug or on purpose.
I actually don't understand this as being security, at least with the current default values where a user can set lingering for themselves and run explicitly separate sessions; anything they could do can still be done and it's just more work.
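To make the "more work" concrete, here is a rough sketch of the two steps the NEWS entry describes: enabling lingering for oneself and starting the program in a scope under user@.service. The helper names run and keep_alive and the DRY_RUN switch are invented for this example; loginctl enable-linger and systemd-run --user --scope are the existing commands, and the default polkit policy mentioned in the NEWS entry is assumed.

```shell
#!/bin/sh
# Sketch: keep a program alive across logout when KillUserProcesses=yes.
# run, keep_alive and DRY_RUN are hypothetical names for this example.

run() {
    # Print the command when DRY_RUN=1, otherwise execute it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

keep_alive() {
    # Keep user@.service running after the last session ends.
    run loginctl enable-linger
    # Start the program in its own scope under user@.service,
    # outside session-XX.scope.
    run systemd-run --user --scope "$@"
}
```

For example, "keep_alive tmux" or "keep_alive screen". Nothing here requires privileges under the default policy, which is the point being made: any process running as the same user could issue the same two commands.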
If a sysadmin wants to "secure" their environment (and it definitely does make sense in some use cases, like shared stations), they will also want to disable lingering, so they'll need to change something anyway to disable that possibility; so the default doesn't suit them.
If a sysadmin wants their users to be "least surprised", they will very probably want to change the new default back off, so they need to do something too.
All that is left is single-user workstations that may be happy with the new default value, but my personal narrow-minded opinion is that this represents less...
While I'm actually writing a mail I might address a few more points:
- I definitely have more programs than just screen & tmux I want to keep running; although most could be started with nohup/systemd-run, I usually don't bother (&/^Z, bg and disown is how I usually do it afterwards if I notice)
- Only screen/tmux come back all the time, but there are other alternatives - neercs, dtach to name two I know by name and occasionally use. I'm sure there are more. I'm not sure they will all want to adapt, like tmux that seems to refuse right now, and it will be a pain to manage downstream...
- I'll agree I mostly lock my screen on my own station if I want things to keep running; there are however particular cases where I need to close firefox/crap running and I will logout to close the X cruft; I'd still expect things like my fetchmail loop to run then (it's in a screen). This mainly happens on Friday afternoons when I have meetings in another building and will log in from a shared post there, and the home directory is shared so my main station needs some cleanup (and logging out cleans all I want cleaned right now, but it doesn't have the new user session dbus system yet... Although I guess I don't care if that keeps running.)
- FWIW, I'm also of the opinion there are fewer non-X things that I want killed than things I want to stay (assuming things connected to X will die anyway when the session closes); so if we're going through the trouble to make an interface so programs can explicitly ask to not be killed, I don't see why these programs that we want killed (the user dbus, pulse, gnome calendar) could not just voluntarily ask "please kill me when the session is closed".
I think both solutions make sense, and if we want to allow both usages doing both might actually be the way forward that will please both side: admins can chose which camp they're in through a simple switch AND their relogging is not broken in either case.
Lennart Poettering mzerqung@0pointer.de writes:
Again, this isn't just work-arounds around broken programs. It's a security thing. It's privileged code (logind, PID 1) that enforces a clear life-cycle on unprivileged programs.
You're making three invalid assumptions here:
1. You're assuming that such programs are unprivileged (or undesired)
2. You're assuming that it's PID 1's job to enforce security policy
3. You're assuming that this rule is desired by all users
Fedora as a distro needs to determine which of these assumptions are valid *for Fedora* and set the defaults accordingly, as well as determining if/how to give users the freedom to set them differently.
On Wed, Jun 01, 2016 at 01:21:06PM -0400, DJ Delorie wrote:
Fedora as a distro needs to determine which of these assumptions are valid *for Fedora* and set the defaults accordingly, as well as determining if/how to give users the freedom to set them differently.
I don't think it's possible to come up with a default that is globally applicable. Even the current status quo has its problems.
As an end-user on a multi-user system, I find auto-reaping annoying and inconvenient. I don't want my being disconnected to kill something I had running in the background, and I don't want to leave a login window open unnecessarily. Oh, and another pony.
As an end-user on a single-user (GUI) system, I want *everything* to be cleaned up when I log out because sometimes my desktop environment doesn't terminate things cleanly. (Replace "my desktop" with "the guest login on my system" if you'd prefer) ...Except when I don't. Only I don't know what I'll need to keep until after it's already running.
Yet as someone who administers multi-user systems, I absolutely want stuff to be completely cleaned up after the user logs out, in order to not waste resources. If there are long-running jobs then there are other mechanisms in place to handle them.
Anyway. I've enabled KillUserProcesses on my personal systems, because it solves more headaches than it creates.
On the other hand, my multi-user systems need screen/nohup/tmux to automagically do the right thing before I can turn KillUserProcesses on, or I'll have a minor user revolt on my hands..
- Solomon
On Wed, Jun 01, 2016 at 02:08:13PM -0400, Solomon Peachy wrote:
Fedora as a distro needs to determine which of these assumptions are valid *for Fedora* and set the defaults accordingly, as well as determining if/how to give users the freedom to set them differently.
I don't think it's possible to come up with a default that is globally applicable. Even the current status quo has its problems.
Well, we do end up needing _some_ default, since that's what a default is. Theoretically, we could have different defaults for Atomic/Cloud, Server, and Workstation depending on needs of the appropriate target audiences we've defined — but this is such a big thing that I think it's valuable to give some weight to consistency.
I couldn't agree more. Despite Lennart's repeatedly mentioning that this is substantially -- if not primarily -- a security feature, a lot of people are disregarding it. I think it's pretty dangerous and counter-productive in the long-term to have different security settings across the Fedora products.
SELinux is a great illustration of applying security settings consistently. Everyone has had problems with SELinux, especially when the target policy package was less mature. Everyone. Yet, Fedora ships in enforcing mode in all three products, even though in some people's eyes it's unnecessary for Workstation. (I don't share this opinion; I like SELinux enforcing everywhere.) There's another lesson from SELinux as well: Neither RHEL nor Fedora ship SELinux in Multi-Level Security (MLS) mode, allowing users to run unconfined_t as a compromise. Nonetheless, we're consistent everywhere with this setting. While SELinux is still daunting, the consistency of Fedora's default configuration ameliorates that to some extent.
Hacky, but it'd work for me if it worked transparently. (Or, make /usr/bin/tmux et al be shell scripts which do the work.)
On the topic of consistency, it makes the most sense to do the same for nohup as /usr/bin/yum currently does (tmux/screen/etc can become actual wrappers):
executable="/usr/bin/dnf"
msg="Yum command has been deprecated, redirecting to '$executable $@'.\n"\
"See 'man dnf' and 'man yum2dnf' for more information.\n"\
"To transfer transaction metadata from yum to DNF, run:\n"\
"'dnf install python-dnf-plugins-extras-migrate && dnf-2 migrate'\n"

echo -e $msg >&2
exec $executable "$@"
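Following the same pattern, a wrapper for tmux or screen might look roughly like the sketch below. It is only an illustration: the function name run_in_user_scope and the DRY_RUN variable are made up here, and the only real command it relies on is the "systemd-run --scope --user" form that the systemd-run(1) man page shows for screen.

```shell
#!/bin/sh
# Hypothetical wrapper sketch: start a command in its own scope under the
# user's systemd instance, so that it is not part of session-XX.scope.
# run_in_user_scope and DRY_RUN are illustrative names, not existing tools.

run_in_user_scope() {
    if [ -n "${DRY_RUN:-}" ]; then
        # Only print what would be executed.
        echo "systemd-run --scope --user $*"
        return 0
    fi
    if command -v systemd-run >/dev/null 2>&1; then
        systemd-run --scope --user "$@"
    else
        # No systemd-run available: run the command directly.
        "$@"
    fi
}
```

A /usr/bin/tmux wrapper would then call something like run_in_user_scope /usr/bin/tmux.real "$@" (tmux.real being a hypothetical renamed binary). Lingering would still have to be enabled for the scope to survive the user's last logout.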
On Wed, Jun 1, 2016 at 2:57 PM, Matthew Miller mattdm@fedoraproject.org wrote:
On Wed, Jun 01, 2016 at 02:08:13PM -0400, Solomon Peachy wrote:
Fedora as a distro needs to determine which of these assumptions are valid *for Fedora* and set the defaults accordingly, as well as determining if/how to give users the freedom to set them differently.
I don't think it's possible to come up with a default that is globally applicable. Even the current status quo has its problems.
Well, we do end up needing _some_ default, since that's what a default is. Theoretically, we could have different defaults for Atomic/Cloud, Server, and Workstation depending on needs of the appropriate target audiences we've defined — but this is such a big thing that I think it's valuable to give some weight to consistency.
-- Matthew Miller mattdm@fedoraproject.org Fedora Project Leader -- devel mailing list devel@lists.fedoraproject.org https://lists.fedoraproject.org/admin/lists/devel@lists.fedoraproject.org
On Wed, Jun 01, 2016 at 03:22:45PM -0500, Justin Brown wrote:
On the topic of consistency, it makes the most sense to do same as /usr/bin/yum currently does for nohup (tmux/screen/etc can become actual
Good call. Yum and dnf take logout inhibitors on the desktop, which helps in some cases, but not so much with "ssh'd into server and connection dies".
On 06/01/2016 09:48 AM, Lennart Poettering wrote:
On Wed, 01.06.16 12:19, Howard Chu (hyc@symas.com) wrote:
This is still looking at the problem back-asswards. The problem isn't that screen and tmux are special cases. The problem is that some handful of programs that got spawned in a GUI desktop environment are special cases, not exiting when they should.
Fix the broken programs, don't force every well-behaved program in the universe to change to accommodate your broken GUI environment. This is Programming 101.
Again, this isn't just work-arounds around broken programs. It's a security thing. It's privileged code (logind, PID 1) that enforces a clear life-cycle on unprivileged programs.
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able consume further runtime resources on the system, regardless if he does that because of a bug or on purpose.
As presently designed, it's a usability problem because it collides with the often-required idle session timeout. Your desktop session will stay up, but any remote connections subject to idle timeout will kill long-running jobs on logout. Since in general we can't predict how long the command will take (especially the system administration commands), we will have to use the convoluted invocation to persist the jobs across the unpredictable idle logout, or disable the feature.
It's ironic because as you point out it's a security risk to leave those processes running, and yet the sensitive systems are more likely to have the idle timeout turned on.
On Jun 1, 2016, at 09:48, Lennart Poettering wrote:
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able consume further runtime resources on the system, regardless if he does that because of a bug or on purpose.
You are redefining the meaning of (a graphical) logout. It simply means another user can use the mouse, keyboard and screen of this device. It makes no statement on whether the machines resources are shared or not.
It allows you to kill anything that has to do with the user controlling the screen, keyboard and mouse but the killing should be limited to those processes. And then we are back at "just fix those broken processes".
As others pointed out, the security feature does not really apply if the user is allowed to use any and all resources while logged in.
Paul
On Thu, 2016-06-02 at 14:19 -0400, Paul Wouters wrote:
On Jun 1, 2016, at 09:48, Lennart Poettering wrote:
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able consume further runtime resources on the system, regardless if he does that because of a bug or on purpose.
You are redefining the meaning of (a graphical) logout. It simply means another user can use the mouse, keyboard and screen of this device. It makes no statement on whether the machines resources are shared or not.
It allows you to kill anything that has to do with the user controlling the screen, keyboard and mouse but the killing should be limited to those processes. And then we are back at "just fix those broken processes".
I think the discussion is starting to go in circles. It is pretty clear that we have different opinions about the desired behavior of logout.
On Thursday 02 June 2016 14:38:38 Matthias Clasen wrote:
I think the discussion is starting to go in circles. It is pretty clear that we have different opinions about the desired behavior of logout.
I'll take this as an opportunity to raise a separate issue.
The current implementation has only 2 levels of control: global and individual (lingering). For non-tiny organizations this isn't good enough:
* I would expect that root may set lingering for *groups* as well.
* Otherwise, administrators need to set policy per-individual and we are back to square one (killing individual user processes).
* Then we can have better default policy (e.g: members of groups wheel and staff have "lingering" on).
* Example: something similar to access.conf(5) (but "<foo>.d/*.conf" not a monolithic file).
* The design should assume that in the future, large organizations would expect it in their directory service. (e.g: like sudoers can now be integrated in IPA).
A separate thought: maybe have a list of exceptions (tmux/screen/vnc/whatever) but this really opens a new can of worms, so it may be better not to mix this with the user/group granularity issue.
Thanks,
On Fri, Jun 03, 2016 at 11:28:42AM +0300, Oron Peled wrote:
On Thursday 02 June 2016 14:38:38 Matthias Clasen wrote:
I think the discussion is starting to go in circles. It is pretty clear that we have different opinions about the desired behavior of logout.
I'll take this as an opportunity to raise a separate issue.
The current implementation has only 2 levels of control: global and individual (lingering). For non-tiny organizations this isn't good enough:
- I would expect that root may set lingering for *groups* as well.
That's not a bad idea. You might want to file an RFE at https://github.com/systemd/systemd/issues/new to move this forward.
Otherwise, administrators need to set policy per-individual and we are back to square one (killing individual user processes).
Then we can have better default policy (e.g: members of groups wheel and staff have "lingering" on).
Example: something similar to access.conf(5) (but "<foo>.d/*.conf" not a monolithic file).
logind reads configuration snippets from /usr/lib/systemd/logind.conf.d/ and /etc/systemd/logind.conf.d/. It should be just a matter of extending the configuration directive parsing to support groups and whatnot.
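For the settings that exist today, such a snippet can already carry the global switch. The sketch below writes one; the function name write_kill_dropin and the file name 10-kill-user-processes.conf are illustrative choices, not anything shipped by systemd or Fedora, and group support remains hypothetical.

```shell
#!/bin/sh
# Sketch: write a logind drop-in that sets KillUserProcesses= explicitly.
# write_kill_dropin and the drop-in file name are illustrative assumptions.

write_kill_dropin() {
    dir=$1      # e.g. /etc/systemd/logind.conf.d
    value=$2    # "yes" or "no"
    case $value in
        yes|no) ;;
        *) echo "value must be yes or no" >&2; return 1 ;;
    esac
    mkdir -p "$dir" || return 1
    # logind.conf settings live in the [Login] section.
    printf '[Login]\nKillUserProcesses=%s\n' "$value" \
        > "$dir/10-kill-user-processes.conf"
}
```

Run as root, write_kill_dropin /etc/systemd/logind.conf.d no would restore the previous default on one machine; systemd-logind still has to re-read its configuration before the change applies.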
- The design should assume that in the future, large organizations would expect it in their directory service. (e.g: like sudoers can now be integrated in IPA).
I think polkit should have no issue with talking to IPA, so 'loginctl enable-linger' should support such policies already. If logind gained understanding of groups, this should work automatically too: it would use getpwent or similar call, which would query either the local database or the directory service, depending on local configuration.
Zbyszek
On 06/02/2016 02:19 PM, Paul Wouters wrote:
On Jun 1, 2016, at 09:48, Lennart Poettering wrote:
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able consume further runtime resources on the system, regardless if he does that because of a bug or on purpose.
You are redefining the meaning of (a graphical) logout. It simply means another user can use the mouse, keyboard and screen of this device. It makes no statement on whether the machines resources are shared or not.
It allows you to kill anything that has to do with the user controlling the screen, keyboard and mouse but the killing should be limited to those processes. And then we are back at "just fix those broken processes".
Actually, we have the capacity for dual login (switching users), where the first session is still active, and the new user runs his display session on a different console which grabs the mouse, keyboard and screen devices. The proposed change, as I understand it now, allows the processes from the first session to continue running.
On Thu, 02.06.16 14:19, Paul Wouters (paul@nohats.ca) wrote:
On Jun 1, 2016, at 09:48, Lennart Poettering wrote:
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able consume further runtime resources on the system, regardless if he does that because of a bug or on purpose.
You are redefining the meaning of (a graphical) logout. It simply means another user can use the mouse, keyboard and screen of this device. It makes no statement on whether the machines resources are shared or not.
Actually, with logind, current kernel, current X11 and/or wayland there's a very clear statement on sharing devices: logind will ensure that only the fg session can access the various evdev and DRM devices, and will suspend access for all sessions not currently in the fg. Similar, ACLs for a couple of other device nodes are patched depending on the fg session (but only for DRM and evdev the ongoing connection of bg users is suspended, as there's no concept of a generic revoke() in the Linux kernel, but only DRM and evdev-specific mechanisms). Locking this down properly, so that background sessions or even non-console logins don't get access to your devices has been something various folks from various communities have been working on for a while.
So yeah, sessions (as defined by logind) are a security concept already, and they will make sure that only the right users get access to the devices at the right times.
Lennart
On Fri, 3 Jun 2016, Lennart Poettering wrote:
You are redefining the meaning of (a graphical) logout. It simply means another user can use the mouse, keyboard and screen of this device. It makes no statement on whether the machines resources are shared or not.
Actually, with logind, current kernel, current X11 and/or wayland there's a very clear statement on sharing devices: logind will ensure that only the fg session can access the various evdev and DRM devices, and will suspend access for all sessions not currently in the fg. Similar, ACLs for a couple of other device nodes are patched depending on the fg session (but only for DRM and evdev the ongoing connection of bg users is suspended, as there's no concept of a generic revoke() in the Linux kernel, but only DRM and evdev-specific mechanisms). Locking this down properly, so that background sessions or even non-console logins don't get access to your devices has been something various folks from various communities have been working on for a while.
So yeah, sessions (as defined by logind) are a security concept already, and they will make sure that only the right users get access to the devices at the right times.
That's great. It has however, absolutely nothing to do with backgrounded processes, and their interpretation of good vs evil by systemd.
No one is saying when a graphical session ends, you cannot reclaim the devices required for the next graphical session to start.
No one is saying you cannot protect physical devices from graphical or network logins.
What is offered now is garbage collection of the global process list, and people are stating systemd does not have the required knowledge to successfully perform that task - and therefore should not try.
Paul
On Sun, Jun 5, 2016 at 2:20 PM, Paul Wouters paul@nohats.ca wrote:
On Fri, 3 Jun 2016, Lennart Poettering wrote:
You are redefining the meaning of (a graphical) logout. It simply means another user can use the mouse, keyboard and screen of this device. It makes no statement on whether the machines resources are shared or not.
Actually, with logind, current kernel, current X11 and/or wayland there's a very clear statement on sharing devices: logind will ensure that only the fg session can access the various evdev and DRM devices, and will suspend access for all sessions not currently in the fg. Similar, ACLs for a couple of other device nodes are patched depending on the fg session (but only for DRM and evdev the ongoing connection of bg users is suspended, as there's no concept of a generic revoke() in the Linux kernel, but only DRM and evdev-specific mechanisms). Locking this down properly, so that background sessions or even non-console logins don't get access to your devices has been something various folks from various communities have been working on for a while.
So yeah, sessions (as defined by logind) are a security concept already, and they will make sure that only the right users get access to the devices at the right times.
That's great. It has however, absolutely nothing to do with backgrounded processes, and their interpretation of good vs evil by systemd.
No one is saying when a graphical session ends, you cannot reclaim the devices required for the next graphical session to start.
No one is saying you cannot protect physical devices from graphical or network logins.
What is offered now is garbage collection of the global process list, and people are stating systemd does not have the required knowledge to successfully perform that task - and therefore should not try.
Paul
It can do it successfully. It can't do it safely.
On Wed, Jun 1, 2016 at 9:48 AM, Lennart Poettering mzerqung@0pointer.de wrote:
On Wed, 01.06.16 12:19, Howard Chu (hyc@symas.com) wrote:
This is still looking at the problem back-asswards. The problem isn't that screen and tmux are special cases. The problem is that some handful of programs that got spawned in a GUI desktop environment are special cases, not exiting when they should.
Fix the broken programs, don't force every well-behaved program in the universe to change to accommodate your broken GUI environment. This is Programming 101.
Again, this isn't just work-arounds around broken programs. It's a security thing. It's privileged code (logind, PID 1) that enforces a clear life-cycle on unprivileged programs.
Any scheme that relies on unprivileged programs "being nice" doesn't fix the inherent security problem: after logout a user should not be able consume further runtime resources on the system, regardless if he does that because of a bug or on purpose.
Lennart
That's what an optional, nightly, reaping cron job is for.
On Wed, Jun 01, 2016 at 10:25:32AM +0100, Tom Hughes wrote:
On 01/06/16 10:20, Bastien Nocera wrote:
On Sun, May 29, 2016 at 06:51:20PM -0600, Chris Murphy wrote:
So there's tmux, screen, curl, wget, and probably quite a few others that don't necessarily get daemonized that are probably affected.
I would really like to see a solution whereby tmux and screen _just work_ without any required changes to user behavior. They're basically commands which _indicate_ "I want a new session that persists".
Really? The only times I ever used it was to access serial consoles with a better emulation than separate apps.
You've obviously never had to run something that's going to take hours or days to complete on a remote server and not wanted it to abort half way through because of a network glitch then.
That's when I use screen, either just setting something running in the background, or leaving it connected but knowing it will continue if anything goes wrong and I can just reattach from a new login.
I'm using 'screen /dev/ttyUSBX 115200' to monitor serial consoles while I'm logged out :-)
Rich.
On Sun, May 29, 2016 at 06:51:20PM -0600, Chris Murphy wrote:
So there's tmux, screen, curl, wget, and probably quite a few others that don't necessarily get daemonized that are probably affected.
I would really like to see a solution whereby tmux and screen _just work_ without any required changes to user behavior. They're basically commands which _indicate_ "I want a new session that persists".
Really? The only times I ever used it was to access serial consoles with a better emulation than separate apps.
Yes really! I use it extensively, as do most sysadmins (or tmux), to run jobs on remote systems where you can disconnect and leave it running (or even on local systems where I might lose X for some reason). I suspect most sysadmins would use screen or tmux more than they'd use an email client.
Of course I also use it extensively on serial console too for SBCs.
Peter
On Wed, Jun 1, 2016 at 3:20 AM, Bastien Nocera bnocera@redhat.com wrote:
----- Original Message -----
On Sun, May 29, 2016 at 06:51:20PM -0600, Chris Murphy wrote:
So there's tmux, screen, curl, wget, and probably quite a few others that don't necessarily get daemonized that are probably affected.
I would really like to see a solution whereby tmux and screen _just work_ without any required changes to user behavior. They're basically commands which _indicate_ "I want a new session that persists".
Really? The only times I ever used it was to access serial consoles with a better emulation than separate apps.
Really, yes. I use PKA to login to Fedora 23 server where I then run tmux and then in a session I run weechat. I then disconnect from tmux and logout, and sometimes it's hours or days before I go log back in and of course I expect weechat to be there having logged everything since the last time I looked.
There's no way to make this work on a workstation that gets rebooted possibly a dozen times a day (the suffering life of testing and dual boot).
It seems fine to have some administrative option which prevents that, but I think allowing that behavior should be the default. That way, accidental lingering processes will be cleaned up, but people's expectations around tmux/screen will still be met.
I liked the suggestion of having those programs become "scope" aware (https://github.com/tmux/tmux/issues/428) but it looks like upstream tmux at least is not keen on it. What can we do instead?
Patch the applications downstream, or document things with enough details and mention it in the release notes.
Really?
I remain unconvinced the 80/20 rule doesn't apply here; where 80% of the problem this solution is trying to solve relates to DE's not collapsing its own user session. And the other 20% of the problem is something an admin could opt in to avoid, apparently for 5 years now, rather than opt out.
On Wed, 2016-06-01 at 04:43 -0400, Matthew Miller wrote:
On Sun, May 29, 2016 at 06:51:20PM -0600, Chris Murphy wrote:
So there's tmux, screen, curl, wget, and probably quite a few others that don't necessarily get daemonized that are probably affected.
I would really like to see a solution whereby tmux and screen _just work_ without any required changes to user behavior. They're basically commands which _indicate_ "I want a new session that persists".
It seems fine to have some administrative option which prevents that, but I think allowing that behavior should be the default. That way, accidental lingering processes will be cleaned up, but people's expectations around tmux/screen will still be met.
Yeah - I think that making tmux/screen & co persistent makes sense regardless of how the systemd default is set on Fedora in the end. That's really the only way to make stuff consistent for users once the persistence on/off switch exists.
I liked the suggestion of having those programs become "scope" aware (https://github.com/tmux/tmux/issues/428) but it looks like upstream tmux at least is not keen on it. What can we do instead?
On 06/01/2016 04:43 AM, Matthew Miller wrote:
On Sun, May 29, 2016 at 06:51:20PM -0600, Chris Murphy wrote:
So there's tmux, screen, curl, wget, and probably quite a few others that don't necessarily get daemonized that are probably affected.
I would really like to see a solution whereby tmux and screen _just work_ without any required changes to user behavior. They're basically commands which _indicate_ "I want a new session that persists".
What about a default shell alias for those commands? Fedora already adds a few aliases, not sure if there are packaging guidelines for that.
The alias script could check if lingering is enabled and warn the user about it.
The only problem is when people call tmux/screen from a script. I don't do it, I start them by hand on a terminal when I want it. Probably more people use them as part of a script.
It seems fine to have some administrative option which prevents that, but I think allowing that behavior should be the default. That way, accidental lingering processes will be cleaned up, but people's expectations around tmux/screen will still be met.
I liked the suggestion of having those programs become "scope" aware (https://github.com/tmux/tmux/issues/428) but it looks like upstream tmux at least is not keen on it. What can we do instead?
On Wed, Jun 01, 2016 at 02:34:01PM -0400, Robert Marcano wrote:
I would really like to see a solution whereby tmux and screen _just work_ without any required changes to user behavior. They're basically commands which _indicate_ "I want a new session that persists".
What about a default shell alias for those commands? Fedora already adds a few aliases, not sure if there are packaging guidelines for that. The alias script could check if lingering is enabled and warn the user about it. The only problem is when people call tmux/screen from a script. I don't do it, I start them by hand on a terminal when I want it. Probably more people use them as part of a script.
Hacky, but it'd work for me if it worked transparently. (Or, make /usr/bin/tmux et al be shell scripts which do the work.)
My opinion is that if we can solve tmux, screen, and nohup (and possibly dtach), we'll hit the majority case, and we can document that to get this behavior in other cases, either 1) use one of those, 2) use the systemd-run command, or 3) change the config option.
(Actually, although I am not volunteering, one idea would be for nohup to move into the systemd fold and be an alias to systemd-run with special behavior (as telinit is to systemctl).)
OK so back to a specific example on Fedora 24 with a restart/shutdown delay. User gdm owns session-c1.scope, and for some reason I can't figure out, it won't quit on its own. So it enters a failed state 1m30s after I ask for a restart/shutdown. [1] I edited /etc/systemd/logind.conf, uncommented KillUserProcesses=yes and rebooted. And at the next reboot, the problem still happens. While the user session itself is gone, and lsof /home no longer shows user chris processes holding things up, user gdm is still causing the hang.
Since some other user process is the cause of the delay, and apparently isn't subject to KillUserProcesses, at least in this instance it's not fixing (or papering over) this particular example of a restart/shutdown delay. I don't know that this is a bug, but I went ahead and filed it because it seems like KillUserProcesses=yes should apply to user gdm, because that user isn't an excepted user (unless it's functionally the same as root, in which case the default #KillExcludeUsers=root is probably why), and now we're just back where we started where there are wayward processes that are causing restart hangs and are difficult to identify.
Chris
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1337307
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1341837
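The interplay between KillUserProcesses= and KillExcludeUsers= described in this message can be modelled roughly as below. This is a simplified model based on one reading of logind.conf(5), not logind's actual code; the function names are illustrative, and the assumed precedence (exclude list first, then KillOnlyUsers=, then the global switch) should be checked against the man page for the systemd version in use.

```shell
#!/bin/sh
# Rough model of "would logind kill this user's session processes?".
# shall_kill and in_list are illustrative names, not part of systemd.
# Arguments to shall_kill: user, KillUserProcesses (yes/no),
# KillExcludeUsers list, KillOnlyUsers list. Lists are space-separated;
# the exclude list defaults to "root" when the argument is omitted.

in_list() {
    needle=$1; shift
    for item in $*; do
        [ "$item" = "$needle" ] && return 0
    done
    return 1
}

shall_kill() {
    user=$1
    kill_user_processes=$2
    exclude=${3-root}
    only=${4-}
    # Excluded users are never killed.
    in_list "$user" "$exclude" && return 1
    # A non-empty KillOnlyUsers= list decides for everyone else.
    if [ -n "$only" ]; then
        in_list "$user" "$only"
        return
    fi
    # Otherwise the global switch applies.
    [ "$kill_user_processes" = yes ]
}
```

Under this model, shall_kill gdm yes succeeds (gdm is not excluded by default) while shall_kill root yes does not, which is consistent with the expectation that gdm's session should have been subject to the setting.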
On Sun, May 29, 2016 at 5:06 PM, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
- Does 'loginctl enable-linger <user>' take effect in the current session? Or do you have to start a new one? does it persist over sessions or only affects the current/next one?
Lingering applies to the systemd --user instance, a.k.a. user@.service, not to the session. Lingering means that user@.service is present even if you are not logged in. If lingering is disabled, it is started on login, and stopped on logout of that user.
Killing processes which are part of the session (session-<n>.scope) doesn't have anything to do directly with lingering. It is controlled by the global KillUserProcesses= setting.
The connection between KillUserProcesses= and long-running processes is that if KillUserProcesses=yes is set (the new default), to successfully create a process which survives logout two steps are needed:
- move it out of the session into a systemd --user unit,
- make that systemd --user instance persistent, i.e. enable lingering.
Setting lingering is done over dbus, takes effect immediately, and is persistent (/var/lib/systemd/linger/<user> is created).
Setting KillUserProcesses can be done by modifying /etc/systemd/logind.conf, and also takes effect immediately, if systemd-logind is reloaded (using SIGHUP).
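Since lingering is recorded as a file under /var/lib/systemd/linger/, a script can check for it without talking to logind. The sketch below does that; linger_enabled is an illustrative name, and the optional second argument exists only so the directory can be overridden.

```shell
#!/bin/sh
# Sketch: report whether lingering is enabled for a user by looking for
# the marker file described above. linger_enabled is an illustrative name.

linger_enabled() {
    user=$1
    dir=${2:-/var/lib/systemd/linger}
    # Succeeds only for a non-empty user name with an existing marker file.
    [ -n "$user" ] && [ -e "$dir/$user" ]
}
```

For example, linger_enabled "$USER" || echo "lingering is off; try: loginctl enable-linger" could be used by a wrapper to warn before starting a long-running job.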
Can you clarify how systemd-run --user --scope fits in to this?
While I certainly understand the motivation of running services in a clean environment (as systemd-run without --scope would do), there are cases where that's the wrong thing to do. For example, if nohup were adjusted to work in the new regime, it would *not* want a clean environment. But I still don't understand how scopes work, what they have to do with lingering, whether every scope lives strictly within a service, or pretty much anything else about them. The systemd.scope(5) manpage isn't particularly helpful.
--Andy
On Sun, 29.05.16 12:53, Kevin Fenzi (kevin@scrye.com) wrote:
So, my 2 cents...
Some questions for upstream:
- I assume killed processes are logged in the journal, but Is there any way to have a 'permissive' version? ie, simply log what would have been killed, but not do anything? That would be very helpful to folks to identify things that would be affected here without disrupting them at first. It would also allow bugs in other packages to get fixed up.
No, there is no such scheme. There's only off and on. But I am not sure adding something like this is really necessary, as you can always see what's going to be killed via "loginctl user-status $USER"...
- Does 'loginctl enable-linger <user>' take effect in the current session? Or do you have to start a new one? does it persist over sessions or only affects the current/next one?
It's a per-user setting and it applies per-user instantly.
- How can I tell if linger is enabled or disabled on a user?
loginctl user-status $USER
- enable-linger/disable-linger need root? So, the only way the user can exclude things is to use systemd-run?
It's PolicyKit protected, but the default policy is liberal, and opens this up for unprivileged clients.
Lennart
Dne 28.5.2016 v 05:11 Ben Rosser napsal(a):
I agree; just because the change happened upstream in systemd doesn't mean that this shouldn't be evaluated in Fedora itself before being turned on by default.
This absolutely seems like the kind of thing that should be a system-wide change proposal (for F25, I guess).
+1 Systemd has the right to enable it upstream. But the distribution should respect the whole community. I.e., the Fedora maintainer should disable this change in the package, notify affected packages (and file BZs), and enable the feature in the distribution only once the majority of them are resolved.
On 30-05-2016 at 07:56, Miroslav Suchý wrote:
On 28.5.2016 at 05:11, Ben Rosser wrote:
I agree; just because the change happened upstream in systemd doesn't mean that this shouldn't be evaluated in Fedora itself before being turned on by default.
This absolutely seems like the kind of thing that should be a system-wide change proposal (for F25, I guess).
+1 Systemd has the right to enable it upstream. But the distribution should respect the whole community. I.e., the Fedora maintainer should disable this change in the package, notify affected packages (and file BZs), and enable the feature in the distribution only once the majority of them are resolved.
+1 fwiw
Marcelo
You know, it seems to me that systemd doing this to work around a Gnome problem (and a problem I have not seen outside of Gnome), is sort of like glibc working around a bug in Firefox and at the same time breaking bash. We're taking a bug in the Gnome stack and putting a 'fix' in systemd that breaks all sorts of applications. In turn, the 'fix' to those breakages is to add new, systemd specific code. I fail to see how this is even acceptable. I know right off that projects such as Mediagoblin are going to refuse to include such code, and rightfully so.
There are distros, such as Void, that exist specifically to avoid systemd. While obviously the systemd developers do not care about such distros, it is really not cool to force dependencies that they would rather avoid on them.
Here's an idea. How about Gnome fix their broken crap, and let's not enable this misfeature in systemd? All these problems (including the true, root problem!) go away. Alas, this seems to be too difficult a solution.
John.
On Mon, 30 May 2016 01:43:30 -0000 "John Dulaney" jdulaney@gnu.org wrote:
You know, it seems to me that systemd doing this to work around a Gnome problem (and a problem I have not seen outside of Gnome), is sort of like glibc working around a bug in Firefox and at the same time breaking bash. We're taking a bug in the Gnome stack and putting a 'fix' in systemd that breaks all sorts of applications. In turn, the 'fix' to those breakages is to add new, systemd specific code. I fail to see how this is even acceptable. I know right off that projects such as Mediagoblin are going to refuse to include such code, and rightfully so.
I've seen lingering processes/applications from pretty much all desktop environments at various times and places. I don't think this is Gnome-specific.
I think it would be helpful if it could log these offenders and we could actually gather data on them instead of speculating.
kevin
On Mon, May 30, 2016 at 01:43:30 -0000, John Dulaney jdulaney@gnu.org wrote:
Here's an idea. How about Gnome fix their broken crap, and let's not enable this misfeature in systemd? All these problems (including the true, root problem!) go away. Alas, this seems to be too difficult a solution.
Let's be excellent to each other. Everyone here wants Fedora to be better and we should keep that in mind when discussing problems.
On Mon, 30.05.16 01:43, John Dulaney (jdulaney@gnu.org) wrote:
You know, it seems to me that systemd doing this to work around a Gnome problem (and a problem I have not seen outside of Gnome), is sort of like glibc working around a bug in Firefox and at the same time breaking bash. We're taking a bug in the Gnome stack and putting a 'fix' in systemd that breaks all sorts of applications.
This is a misunderstanding. Key here is that it is privileged code that enforces clean-up after logout. While it certainly would be great if all userspace software would clean up after itself, this is ultimately of no relevance, as long as this clean-up is voluntary and not enforced by the system.
The changed default here is really about defining the lifecycle of unprivileged code by privileged code, and thus about security. An unprivileged user should not be able to run code at any time it wishes unless the admin allowed this, and thus it needs to be the system that enforces the lifecycle; and if it is opened up for clients it must go through some authentication layer, such as PolicyKit, which it does here.
Lennart
On Mon, 2016-05-30 at 12:05 +0200, Lennart Poettering wrote:
The changed default here is really about defining the lifecycle of unprivileged code by privileged code, and thus about security.
Security against what? Who is the attacker? What is the threat model?
Bandying about the word "security" to justify a change that clearly angers a lot of people does not make for a strong argument. It is also not the case that Fedora puts security above usability or expected behavior in all cases. The default SELinux policy does not deny execmem/execstack/etc., even though there is a clear security story for doing so, because it would break various things (web browsers, some programming language runtimes, etc.) in ways that aggravate users.
An unprivileged user should not be able to run code at any time it wishes unless the admin allowed this,
Are we planning to disable cron? Is reconnecting to screen or tmux sessions suddenly out? VNC? There are literally hundreds of use-cases this kind of policy would break.
-- Ben
On Fri, 2016-05-27 at 11:51 +0200, Dominique Martinet wrote:
Hi,
Just noticed this change on rawhide... https://github.com/systemd/systemd/blob/master/NEWS#L29
systemd-logind will now by default terminate user processes that are part of the user session scope unit (session-XX.scope) when the user logs out. This behavior is controlled by the KillUserProcesses= setting in logind.conf, and the previous default of "no" is now changed to "yes". This means that user sessions will be properly cleaned up after, but additional steps are necessary to allow intentionally long-running processes to survive logout.
While the user is logged in at least once, user@.service is running, and any service that should survive the end of any individual login session can be started at a user service or scope using systemd-run. systemd-run(1) man page has been extended with an example which shows how to run screen in a scope unit underneath user@.service. The same command works for tmux.
After the user logs out of all sessions, user@.service will be terminated too, by default, unless the user has "lingering" enabled. To effectively allow users to run long-term tasks even if they are logged out, lingering must be enabled for them. See loginctl(1) for details. The default polkit policy was modified to allow users to set lingering for themselves without authentication.
Previous defaults can be restored at compile time by the --without-kill-user-processes option to "configure".
This made the press when it landed in Debian:
http://www.theregister.co.uk/2016/05/30/systemd_kills_deb_processes/
I'm not even going to bother clicking on the comment thread...
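For admins who would rather flip the runtime setting than rebuild, the NEWS text quoted above names KillUserProcesses= in logind.conf as the control. A minimal sketch of restoring the old behaviour with a drop-in file follows. The function name and the file name 10-kill-user-processes.conf are made up for illustration; the logind.conf.d drop-in directory and the [Login] section are as described in logind.conf(5). The directory is a parameter so the function can be tried against a scratch location first.

```shell
#!/bin/sh
# Hypothetical helper; writes a drop-in that sets KillUserProcesses=no.
# The drop-in file name is an arbitrary choice for this sketch.
write_kill_user_processes_dropin() {
    _dir=${1:-/etc/systemd/logind.conf.d}
    mkdir -p "$_dir" || return 1
    cat > "$_dir/10-kill-user-processes.conf" <<'EOF'
[Login]
KillUserProcesses=no
EOF
}

# Example call (needs root for the default directory; not executed here):
#   write_kill_user_processes_dropin
```

logind has to re-read its configuration before the change applies; a reboot is the conservative way to get there.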