Cross post from the ansible list, to see if anyone here has any clue
Ansible 2.0.0.2
Control host: Ubuntu 14.04
Controlled host: CentOS 6
So, I've been trying to set up FreeIPA on my CentOS host. I was getting really frustrated because, right after ipa-server-install completed successfully and I ran /etc/init.d/ipa start, subsequent commands failed. I finally realized that dirsrv (the 389 LDAP server) was stopping soon after starting.
Thinking there was something odd in the ipa startup, I started IPA, slept for 30 seconds, and then tried to start dirsrv. That reported that dirsrv was already running...but then it shut down right away. Logging in to the machine and starting dirsrv was fine. Starting dirsrv via
ssh <host> "/etc/init.d/dirsrv start"
also worked.
So, I put this in my Ansible command:
shell: /etc/init.d/dirsrv start && sleep 30
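Spelled out as a complete task, that was essentially the following (the task name and the sudo flag here are illustrative, not copied from the actual playbook):

# task name and sudo flag are illustrative, not the exact playbook
- name: Start dirsrv and hold the session open
  sudo: yes
  shell: /etc/init.d/dirsrv start && sleep 30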
The logs show dirsrv start. And stay started. As soon as that sleep 30 expires, however, and ansible "hangs up", the server immediately shuts down. Same if I try:
command: /etc/init.d/dirsrv start && sleep 30
Same problem if I wrap the thing in a "script:" command.
WHAT would cause a daemon, started by an init.d script, to shut down (cleanly, no less) when ansible is done with the command and disconnects? And the command to start IPA (or just the server, in the case of chasing down the bug) is followed by other ansible commands for that host, so it's not like ansible is done with the host when it "hangs up" after the given command.
I am at my wits' end. Does anyone have any ideas on how to fix or work around this? I even tried wrapping the init.d/ipa start in a
screen -d -m
session, but that shuts down right away.
Interestingly enough, if I put this in a script:
#!/bin/bash
screen -d -m /etc/init.d/ipa start
sleep 30
And then pass that to the "screen background" process. Even though it has gone into the background, Ansible won't continue until the script ends and the screen session terminates...but dirsrv does stop right away!
So, something truly weird is going on here. Clearly a bug on the dirsrv side, but a really weird interaction with ansible and its ssh sessions.
Ideas would be greatly appreciated!
j
On 02/11/2016 10:44 AM, Joshua J. Kugler wrote:
Joshua,
Can you post the Directory Server's errors log: /var/log/dirsrv/slapd-INSTANCE/errors
Also the version of the Directory Server: rpm -qa | grep 389-ds-base
If you see "Detected Disorderly Shutdown" in the Directory Server's errors log, then the server is crashing. To further troubleshoot the crash (if there is one) please follow:
http://www.port389.org/docs/389ds/FAQ/faq.html#sts=Debugging%C2%A0Crashes
Then reproduce the crash, and post some stacktraces from the core file to this list.
Thanks, Mark
Here is the entire log from 'errors'. It starts up just fine, then when Ansible disconnects, something shuts it down. Is the dirsrv startup script not properly applying NOHUP or some such? The REALLY weird thing is that this works properly with an interactive ssh login, or with
ssh <host> "/etc/init.d/ipa start"
j
[11/Feb/2016:12:09:50 -0900] - 389-Directory/1.2.11.15 B2016.040.940 starting up
[11/Feb/2016:12:09:50 -0900] schema-compat-plugin - warning: no entries set up under cn=computers, cn=compat,dc=kugler,dc=localdomain
[11/Feb/2016:12:09:50 -0900] schema-compat-plugin - warning: no entries set up under cn=ng, cn=compat,dc=kugler,dc=localdomain
[11/Feb/2016:12:09:50 -0900] schema-compat-plugin - warning: no entries set up under ou=sudoers,dc=kugler,dc=localdomain
[11/Feb/2016:12:09:50 -0900] - Skipping CoS Definition cn=Password Policy,cn=accounts,dc=kugler,dc=localdomain--no CoS Templates found, which should be added before the CoS Definition.
[11/Feb/2016:12:09:50 -0900] - Skipping CoS Definition cn=Password Policy,cn=accounts,dc=kugler,dc=localdomain--no CoS Templates found, which should be added before the CoS Definition.
[11/Feb/2016:12:09:50 -0900] - slapd started. Listening on All Interfaces port 389 for LDAP requests
[11/Feb/2016:12:09:50 -0900] - Listening on All Interfaces port 636 for LDAPS requests
[11/Feb/2016:12:09:50 -0900] - Listening on /var/run/slapd-KUGLER-LOCALDOMAIN.socket for LDAPI requests
[11/Feb/2016:12:10:30 -0900] - slapd shutting down - signaling operation threads
[11/Feb/2016:12:10:30 -0900] - slapd shutting down - waiting for 1 thread to terminate
[11/Feb/2016:12:10:30 -0900] - slapd shutting down - closing down internal subsystems and plugins
[11/Feb/2016:12:10:30 -0900] - Waiting for 4 database threads to stop
[11/Feb/2016:12:10:31 -0900] - All database threads now stopped
[11/Feb/2016:12:10:31 -0900] - slapd stopped.
On Thu, 2016-02-11 at 14:44 -0800, Joshua J. Kugler wrote:
Here is the entire log from 'errors'. It starts up just fine, then when Ansible disconnects, something shuts it down. Is the dirsrv startup script not properly applying NOHUP or some such? The REALLY weird thing is that this works properly with an interactive ssh login, or with
ssh <host> "/etc/init.d/ipa start"
Please do NOT use /etc/init.d/ scripts as they do not set the SELinux context correctly. You need to use "service <name> <action>" else you may be creating SELinux issues with your daemons.
Are you running EL6 or EL7?
Try:
sudo ipactl restart
That may give you a hint as to what service has the issue
For example, when I have IPA issues, it tends to be DNS, and it's only when the SMB process tries to start that it fails.
So when smb fails, then ipactl unwinds and shuts down all the OTHER services.
So you may find the issue isn't DS, but some other part of FreeIPA.
I hope that helps you solve the issue.
William -
Thanks for the tips.
For my debugging, SELinux is disabled, and this issue happens even if I leave IPA out of it: if I just do /etc/init.d/dirsrv start, it then shuts down when the ansible connection closes.
j
On Thu, 2016-02-11 at 17:25 -0800, Joshua J. Kugler wrote:
William -
Thanks for the tips.
For my debugging, SELinux is disabled, and this issue happens even if I leave IPA out of it: if I just do /etc/init.d/dirsrv start, it then shuts down when the ansible connection closes.
Are you using the ansible service module?
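i.e. a task roughly along these lines (the exact module arguments here are just an example, not taken from your playbook):

# example only -- service name and states are assumptions
- name: Ensure dirsrv is running
  sudo: yes
  service: name=dirsrv state=started enabled=yes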
On Friday, February 12, 2016 15:41:28 William Brown wrote:
Are you using the ansible service module?
I was. That's when I discovered the behavior. Also tried the ansible shell and command modules. Same thing.
j
On Thu, 2016-02-11 at 23:08 -0800, Joshua J. Kugler wrote:
I was. That's when I discovered the behavior. Also tried the ansible shell and command modules. Same thing.
Can you please post the output of sudo ipactl restart. I am suspicious it's not a DS issue, but another component of IPA is failing that causes the dirsrv to stop.
Can you post the yml of the service command you were using?
What version of EL are you running?
On Monday, February 15, 2016 07:51:11 William Brown wrote:
Can you please post the output of sudo ipactl restart. I am suspicious it's not a DS issue, but another component of IPA is failing that causes the dirsrv to stop.
Can you post the yml of the service command you were using?
What version of EL are you running?
This is CentOS 6.7. I could post the output of 'sudo ipactl restart' but that wouldn't show the problem.
1) If I'm logged in to a shell (SSH), it works
2) If I do it via "ssh host 'command'", it works
It is only when I invoke it via Ansible that it shows this behavior. Also noted: it behaves this way just standalone (see above). Using just /etc/init.d/dirsrv start, it will shut down as soon as the connection goes away. The attached log shows the entire process from startup to shutdown.
Invoking this (via an ansible shell command) fails to work correctly as well:
- name: Start Dirsrv
  shell: nohup screen -d -m /usr/sbin/start-dirsrv
So even trying to nohup + using screen to "background" it, it still shuts down immediately after that 'shell' stanza is done.
Even this fails: nohup screen -d -m /usr/sbin/start-dirsrv & disown
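Written out as complete tasks, those attempts look roughly like this (the task names and the sudo flag are mine, for illustration only):

# task names and sudo flag are illustrative
- name: Start Dirsrv via screen
  sudo: yes
  shell: nohup screen -d -m /usr/sbin/start-dirsrv

- name: Start Dirsrv via screen, backgrounded and disowned
  sudo: yes
  shell: nohup screen -d -m /usr/sbin/start-dirsrv & disown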
The ansible output from just invoking /usr/sbin/start-dirsrv is:
changed: [192.168.122.12] => {"changed": true, "cmd": "/usr/sbin/start-dirsrv", "delta": "0:00:02.760271", "end": "2016-02-11 12:40:15.909214", "rc": 0, "start": "2016-02-11 12:40:13.148943", "stderr": "", "stdout": "Starting instance \"KUGLER-LOCALDOMAIN\"\nStarting instance \"PKI-IPA\"", "stdout_lines": ["Starting instance \"KUGLER-LOCALDOMAIN\"", "Starting instance \"PKI-IPA\""], "warnings": []}
So it successfully starts. No errors. But then shuts down right away.
j
Can you please post the output of sudo ipactl restart. I am suspicious it's not a DS issue, but another component of IPA is failing that causes the dirsrv to stop.
Can you post the yml of the service command you were using?
What version of EL are you running?
This is CentOS 6.7. I could post the output of 'sudo ipactl restart' but that wouldn't show the problem.
- If I'm logged in to a shell (SSH) it works
- If I do it via "ssh host 'command'" it works
It is only when I invoke it via Ansible that it shows this behavior. Also noted: it behaves this way just standalone (see above). Using just /etc/init.d/dirsrv start, it will shut down as soon as the connection goes away. The attached log shows the entire process from startup to shutdown.
Please do *not* use /etc/init.d scripts. You *must* use "service <name> <action>"
Invoking this (via an ansible shell command) fails to work correctly as well:
- name: Start Dirsrv
shell: nohup screen -d -m /usr/sbin/start-dirsrv
So even trying to nohup + using screen to "background" it, it still shuts down immediately after that 'shell' stanza is done.
Even this fails: nohup screen -d -m /usr/sbin/start-dirsrv & disown
Yes, because this process forks into the background. You would expect it to go away.
The ansible output from just invoking /usr/sbin/start-dirsrv is:
changed: [192.168.122.12] => {"changed": true, "cmd": "/usr/sbin/start-dirsrv", "delta": "0:00:02.760271", "end": "2016-02-11 12:40:15.909214", "rc": 0, "start": "2016-02-11 12:40:13.148943", "stderr": "", "stdout": "Starting instance \"KUGLER-LOCALDOMAIN\"\nStarting instance \"PKI-IPA\"", "stdout_lines": ["Starting instance \"KUGLER-LOCALDOMAIN\"", "Starting instance \"PKI-IPA\""], "warnings": []}
So it successfully starts. No errors. But then shuts down right away.
That log shows a clean slapd shutdown, not a termination or crash.
What happens if you use the ansible service module with -vvvv, i.e.:
- name: Start dirsrv
  sudo: yes
  action: service enabled=yes state=restarted
Have you got any esoteric arguments in say /etc/sysconfig/dirsrv? Are you adding extra cli args like -d 0 to ns-slapd? (That would certainly break it ... ).
Are you running your ansible playbooks as sudo? Trying to start ns-slapd without privileges would cause issues.
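i.e. something like this at the play level (the group name and task here are made up for illustration):

# hosts group name and task are hypothetical
- hosts: ipa_servers
  sudo: yes
  tasks:
    - name: Start dirsrv
      service: name=dirsrv state=started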
Can you see anything in /var/log/messages?
When you use ansible to control ipa rather than dirsrv directly, does that have the same issue?
I think the issue is not with dirsrv at all, but with your ansible environment and how you are trying to start / stop the services....
On Wednesday, February 24, 2016 14:57:20 William Brown wrote:
Please do *not* use /etc/init.d scripts. You *must* use "service <name> <action>"
Invoking this (via an ansible shell command) fails to work correctly as well:
- name: Start Dirsrv
  shell: nohup screen -d -m /usr/sbin/start-dirsrv
So even trying to nohup + using screen to "background" it, it still shuts down immediately after that 'shell' stanza is done.
Even this fails: nohup screen -d -m /usr/sbin/start-dirsrv & disown
Yes, because this process forks into the background. You would expect it to go away.
Yes, I expect it to fork to the background, but thought that maybe it was still connected to the foreground...somehow.
That log shows a clean slapd shutdown, not a termination or crash.
Right.
What happens if you use the ansible service module with -vvvv, i.e.:
- name: Start dirsrv
  sudo: yes
  action: service enabled=yes state=restarted
I will set it up and run it again, but the same thing happened. That's how I originally found it. Actually, I did it with 'ipa' as the service, and the dirsrv part was stopping right away. The direct invocation of the startup scripts was subsequent troubleshooting.
Have you got any esoteric arguments in say /etc/sysconfig/dirsrv? Are you adding extra cli args like -d 0 to ns-slapd? (That would certainly break it ... ).
No. Just the defaults + whatever IPA installer adds.
Are you running your ansible playbooks as sudo? Trying to start ns-slapd without privileges would cause issues.
Yes, it's all as sudo (installation of packages and all other root-requiring stuff works).
Can you see anything in /var/log/messages?
I'll check, but if I remember correctly, no.
When you use ansible to control ipa rather than dirsrv directly, does that have the same issue?
Yes, that's how I originally found the issue.
I think the issue is not with dirsrv at all, but with your ansible environment and how you are trying to start / stop the services....
This is a bog-standard ansible environment. Not sure what I would be doing wrong. All other services started by Ansible remain running after it disconnects.
j
What happens if you use the ansible service module with -vvvv, i.e.:
- name: Start dirsrv
  sudo: yes
  action: service enabled=yes state=restarted
I will set it up and run it again, but the same thing happened. That's how I originally found it. Actually, I did it with 'ipa' as the service, and the dirsrv part was stopping right away. The direct invocation of the startup scripts was subsequent troubleshooting.
Can I see this with -vvvv on the ansible-playbook run, thanks? If you have output you don't want shared, feel free to send it directly to me off-list.
I think the issue is not with dirsrv at all, but with your ansible environment and how you are trying to start / stop the services....
This is a bog-standard ansible environment. Not sure what I would be doing wrong. All other services started by Ansible remain running after it disconnects.
Who knows, it could be anything.
For my sanity, if you ssh to the system and run:
sudo service dirsrv start
Does that work and persist?
On Thursday, February 25, 2016 14:50:58 William Brown wrote:
Can I see this with -vvvv on the ansible-playbook run, thanks? If you have output you don't want shared, feel free to send it directly to me off-list.
Yes, I'll work on getting that.
For my sanity, if you ssh to the system and run:
sudo service dirsrv start
Yes. ssh'ing to the host and starting it works. Also, it works if I do:
ssh hostname "command_to_start"
it works and persists.
j
On 2/11/2016 5:44 PM, Joshua J. Kugler wrote:
Cross post from the ansible list, to see if anyone here has any clue
Ansible 2.0.0.2
Control host: Ubuntu 14.04
Controlled host: CentOS 6
<cut>
Hello,
could be unrelated, but check if the server crashes from TLS 1.2 connections. I had this issue a month ago:
# This crashes the server
openssl s_client -connect ldap:636 -tls1_2

# This should be fine:
openssl s_client -connect ldap:636 -tls1    # or -tls1_1
If it crashes, put this line in /etc/sysconfig/dirsrv and restart the server:
export NSS_DISABLE_HW_GCM=1
Maybe there is a connection from a machine that tries to use TLS 1.2 by default and it crashes the server.
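Since you are already driving the host with Ansible, a rough sketch of putting that workaround in place (the task names and structure are just examples, same /etc/sysconfig/dirsrv path as above):

# task names are illustrative
- name: Work around NSS hardware GCM crash
  sudo: yes
  lineinfile: dest=/etc/sysconfig/dirsrv line="export NSS_DISABLE_HW_GCM=1"

- name: Restart dirsrv
  sudo: yes
  service: name=dirsrv state=restarted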