Still trying to figure this out. It looks like slapd is dying, I thought it was still running for some reason.
slapd is dying to segfault. strace of it happening doesn't seem to reveal much:
18:32:41.543717 (+ 0.000801) openat(AT_FDCWD, "/var/run/dirsrv/slapd-COMPANY-INTERNAL.pid", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 32 18:32:41.544907 (+ 0.001195) getpid() = 16014 18:32:41.545269 (+ 0.000329) fstat64(32, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 18:32:41.545799 (+ 0.000536) write(32, "16014\n", 6) = 6 18:32:41.546603 (+ 0.000818) close(32) = 0 18:32:41.547061 (+ 0.000448) chmod("/var/run/dirsrv/slapd-COMPANY-INTERNAL.pid", 0644) = 0 18:32:41.547741 (+ 0.000676) clock_gettime(CLOCK_REALTIME, {tv_sec=1525735961, tv_nsec=548030641}) = 0 18:32:41.548324 (+ 0.000587) write(2, "[07/May/2018:18:32:41.548030641 "..., 134) = 134 18:32:41.551096 (+ 0.002840) clock_gettime(CLOCK_REALTIME, {tv_sec=1525735961, tv_nsec=551287555}) = 0 18:32:41.551568 (+ 0.000406) write(4, "[07/May/2018:18:32:41.551287555 "..., 134) = 134 18:32:41.552360 (+ 0.000811) fsync(4) = 0 18:32:41.558499 (+ 0.006170) clock_gettime(CLOCK_REALTIME, {tv_sec=1525735961, tv_nsec=558678099}) = 0 18:32:41.558901 (+ 0.000350) write(2, "[07/May/2018:18:32:41.558678099 "..., 136) = 136 18:32:41.561537 (+ 0.002680) clock_gettime(CLOCK_REALTIME, {tv_sec=1525735961, tv_nsec=561718659}) = 0 18:32:41.562357 (+ 0.000793) write(4, "[07/May/2018:18:32:41.561718659 "..., 136) = 136 18:32:41.563293 (+ 0.001148) fsync(4) = 0 18:32:41.566928 (+ 0.003452) getpid() = 16014 18:32:41.567712 (+ 0.000752) socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 32 18:32:41.568628 (+ 0.000912) getsockopt(32, SOL_SOCKET, SO_SNDBUF, [163840], [4]) = 0 18:32:41.568972 (+ 0.000319) setsockopt(32, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted) 18:32:41.569548 (+ 0.000589) setsockopt(32, SOL_SOCKET, SO_SNDBUF, [8388608], 4) = 0 18:32:41.570064 (+ 0.000513) sendmsg(32, {msg_name={sa_family=AF_UNIX, sun_path="/run/systemd/notify"}, msg_namelen=21, msg_iov=[{iov_base="READY=1\nSTATUS=slapd started: Re"..., iov_len=69}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 69 18:32:41.570845 (+ 0.000789) close(32) = 0 18:32:41.576358 (+ 0.005575) futex(0xb30ed2e8, FUTEX_WAIT, 16016, NULL) = ? 18:33:01.730774 (+ 20.154428) +++ killed by SIGSEGV +++
It doesn't die immediately after hitting that final futex call - it's just over 20s later. There's still plenty of free memory at the time.
# free -m total used free shared buff/cache available Mem: 974 222 156 0 594 679 Swap: 2535 24 2511
The last entries in the error log is that it is listening on socket / port. No error logged for the segfault.
I did see that during previous setup steps (before it dies) there is apparently an error setting up replication:
[07/May/2018:18:17:46.245041807 -0500] - ERR - NSMMReplicationPlugin - acquire_replica - agmt="cn=meToipa-12.company.internal" (ipa-12:5): Unable to acquire replica: permission denied. The bind dn "" does not have permission to supply replication updates to the replica. Will retry later.
Possibly, the failure to configure replication is leading to the segfault? Any ideas why it would not have permission / what to check to ensure it has permission ?
On Thu, May 3, 2018 at 4:02 PM, Jonathan Vaughn jonathan@creatuity.com wrote:
It's still running.
Here's the error log from slapd during that run:
[01/May/2018:19:22:24.567650453 -0500] - ERR - NSMMReplicationPlugin - multimaster_extop_StartNSDS50ReplicationRequest - conn=42 op=5 replica="dc=company,dc=internal": Unable to acquire replica: error: permission denied [01/May/2018:19:22:24.613143629 -0500] - ERR - NSMMReplicationPlugin - multimaster_extop_StartNSDS50ReplicationRequest - conn=43 op=5 replica="dc=company,dc=internal": Unable to acquire replica: error: permission denied [01/May/2018:19:22:31.445153171 -0500] - INFO - NSMMReplicationPlugin - repl5_tot_run - Beginning total update of replica "agmt="cn=meToipa-11.company.internal" (ipa-11:389)". [01/May/2018:19:22:31.672485209 -0500] - NOTICE - NSMMReplicationPlugin - replica_subentry_check - Need to create replication keep alive entry <cn=repl keep alive 7,dc=company,dc=internal> [01/May/2018:19:22:31.674922201 -0500] - INFO - NSMMReplicationPlugin - replica_subentry_create - add dn: cn=repl keep alive 7,dc=company,dc=internal objectclass: top objectclass: ldapsubentry objectclass: extensibleObject cn: repl keep alive 7 [01/May/2018:19:22:36.125109819 -0500] - ERR - NSMMReplicationPlugin - multimaster_extop_StartNSDS50ReplicationRequest - conn=44 op=5 replica="dc=company,dc=internal": Unable to acquire replica: error: permission denied [01/May/2018:19:22:36.598178749 -0500] - INFO - NSMMReplicationPlugin - repl5_tot_run - Finished total update of replica "agmt="cn=meToipa-11.company.internal" (ipa-11:389)". Sent 503 entries. [01/May/2018:19:27:40.482836683 -0500] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meToipa-11.company.internal" (ipa-11:389) - Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
ipa-11 is the Pi I'm trying to set up replication on. Looks like there was a problem setting up replication due to permissions? But the other replica started up just fine ...
On Thu, May 3, 2018 at 3:13 PM, Rob Crittenden rcritten@redhat.com wrote:
Jonathan Vaughn wrote:
Here's the output from ipa-replica-install :
# ipa-replica-install WARNING: conflicting time&date synchronization service 'chronyd' will be disabled in favor of ntpd
Password for admin@COMPANY.INTERNAL: Run connection check to master Connection check OK Configuring NTP daemon (ntpd) [1/4]: stopping ntpd [2/4]: writing configuration [3/4]: configuring ntpd to start on boot [4/4]: starting ntpd Done configuring NTP daemon (ntpd). Configuring directory server (dirsrv). Estimated time: 30 seconds [1/41]: creating directory server instance [2/41]: enabling ldapi [3/41]: configure autobind for root [4/41]: stopping directory server [5/41]: updating configuration in dse.ldif [6/41]: starting directory server [7/41]: adding default schema [8/41]: enabling memberof plugin [9/41]: enabling winsync plugin [10/41]: configuring replication version plugin [11/41]: enabling IPA enrollment plugin [12/41]: configuring uniqueness plugin [13/41]: configuring uuid plugin [14/41]: configuring modrdn plugin [15/41]: configuring DNS plugin [16/41]: enabling entryUSN plugin [17/41]: configuring lockout plugin [18/41]: configuring topology plugin [19/41]: creating indices [20/41]: enabling referential integrity plugin [21/41]: configuring certmap.conf [22/41]: configure new location for managed entries [23/41]: configure dirsrv ccache [24/41]: enabling SASL mapping fallback [25/41]: restarting directory server [26/41]: creating DS keytab [27/41]: ignore time skew for initial replication [28/41]: setting up initial replication Starting replication, please wait until this has completed. Update in progress, 11 seconds elapsed Update succeeded
[29/41]: prevent time skew after initial replication [30/41]: adding sasl mappings to the directory [31/41]: updating schema [32/41]: setting Auto Member configuration [33/41]: enabling S4U2Proxy delegation [error] NetworkError: cannot connect to 'ldapi://%2Fvar%2Frun%2Fslapd- COMPANY-INTERNAL.socket': Your system may be partly configured. Run /usr/sbin/ipa-server-install --uninstall to clean up.
ipapython.admintool: ERROR cannot connect to 'ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket': ipapython.admintool: ERROR The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information
And here's the /var/log/ipareplica-install.log from just before it fails to the failure:
2018-05-02T00:22:44Z DEBUG [32/41]: setting Auto Member configuration 2018-05-02T00:22:44Z DEBUG Starting external process 2018-05-02T00:22:44Z DEBUG args=/usr/bin/ldapmodify -v -f /tmp/tmpt7bxf4x1 -H ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket -Y EXTERNAL 2018-05-02T00:22:45Z DEBUG Process finished, return code=0 2018-05-02T00:22:45Z DEBUG stdout=add nsslapd-pluginConfigArea: cn=automember,cn=etc,dc=company,dc=internal modifying entry "cn=Auto Membership Plugin,cn=plugins,cn=config" modify complete
2018-05-02T00:22:45Z DEBUG stderr=ldap_initialize( ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket/??base ) SASL/EXTERNAL authentication started SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth SASL SSF: 0
2018-05-02T00:22:45Z DEBUG duration: 0 seconds 2018-05-02T00:22:45Z DEBUG [33/41]: enabling S4U2Proxy delegation 2018-05-02T00:22:45Z DEBUG Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/ipapython/ipaldap.py", line 979, in error_handler yield File "/usr/lib/python3.6/site-packages/ipapython/ipaldap.py", line 1572, in update_entry self.conn.modify_s(str(entry.dn), modlist) File "/usr/lib/python3.6/site-packages/ldap/ldapobject.py", line 600, in modify_s return self.modify_ext_s(dn,modlist,None,None) File "/usr/lib/python3.6/site-packages/ldap/ldapobject.py", line 573, in modify_ext_s resp_type, resp_data, resp_msgid, resp_ctrls = self.result3(msgid,all=1,timeout=self.timeout) File "/usr/lib/python3.6/site-packages/ldap/ldapobject.py", line 714, in result3 resp_ctrl_classes=resp_ctrl_classes File "/usr/lib/python3.6/site-packages/ldap/ldapobject.py", line 721, in result4 ldap_result = self._ldap_call(self._l.result 4,msgid,all,timeout,add_ctrls,add_intermediates,add_extop) File "/usr/lib/python3.6/site-packages/ldap/ldapobject.py", line 294, in _ldap_call result = func(*args,**kwargs) ldap.SERVER_DOWN: {'desc': "Can't contact LDAP server"}
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 506, in start_creation run_step(full_msg, method) File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 496, in run_step method() File "/usr/lib/python3.6/site-packages/ipaserver/install/dsinstance.py", line 977, in __setup_s4u2proxy __add_principal('ipa-http-delegation', 'HTTP', self) File "/usr/lib/python3.6/site-packages/ipaserver/install/dsinstance.py", line 973, in __add_principal api.Backend.ldap2.update_entry(entry) File "/usr/lib/python3.6/site-packages/ipapython/ipaldap.py", line 1572, in update_entry self.conn.modify_s(str(entry.dn), modlist) File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__ self.gen.throw(type, value, traceback) File "/usr/lib/python3.6/site-packages/ipapython/ipaldap.py", line 1028, in error_handler error=info) ipalib.errors.NetworkError: cannot connect to 'ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket':
2018-05-02T00:22:45Z DEBUG [error] NetworkError: cannot connect to 'ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket': 2018-05-02T00:22:45Z DEBUG Destroyed connection context.ldap2_2991558640 2018-05-02T00:22:45Z DEBUG Backing up system configuration file '/etc/ipa/default.conf' 2018-05-02T00:22:45Z DEBUG Saving Index File to '/var/lib/ipa/sysrestore/sysrestore.index' 2018-05-02T00:22:45Z DEBUG File "/usr/lib/python3.6/site-packages/ipapython/admintool.py", line 174, in execute return_value = self.run() File "/usr/lib/python3.6/site-packages/ipapython/install/cli.py", line 319, in run cfgr.run() File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 364, in run self.execute() File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 388, in execute for _nothing in self._executor(): File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 430, in __runner exc_handler(exc_info) File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 459, in _handle_execute_exception self._handle_exception(exc_info) File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 449, in _handle_exception six.reraise(*exc_info) File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise raise value File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 420, in __runner step() File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 417, in <lambda> step = lambda: next(self.__gen) File "/usr/lib/python3.6/site-packages/ipapython/install/util.py", line 81, in run_generator_with_yield_from six.reraise(*exc_info) File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise raise value File "/usr/lib/python3.6/site-packages/ipapython/install/util.py", line 59, in run_generator_with_yield_from value = gen.send(prev_value) File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 654, in _configure next(executor) File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 430, in __runner exc_handler(exc_info) File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 459, in _handle_execute_exception self._handle_exception(exc_info) File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 517, in _handle_exception self.__parent._handle_exception(exc_info) File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 449, in _handle_exception six.reraise(*exc_info) File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise raise value File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 514, in _handle_exception super(ComponentBase, self)._handle_exception(exc_info) File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 449, in _handle_exception six.reraise(*exc_info) File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise raise value File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 420, in __runner step() File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 417, in <lambda> step = lambda: next(self.__gen) File "/usr/lib/python3.6/site-packages/ipapython/install/util.py", line 81, in run_generator_with_yield_from six.reraise(*exc_info) File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise raise value File "/usr/lib/python3.6/site-packages/ipapython/install/util.py", line 59, in run_generator_with_yield_from value = gen.send(prev_value) File "/usr/lib/python3.6/site-packages/ipapython/install/common.py", line 66, in _install for unused in self._installer(self.parent): File "/usr/lib/python3.6/site-packages/ipaserver/install/server/__init__.py", line 622, in main replica_install(self) File "/usr/lib/python3.6/site-packages/ipaserver/install/server/replicainstall.py", line 388, in decorated func(installer) File "/usr/lib/python3.6/site-packages/ipaserver/install/server/replicainstall.py", line 1407, in install pkcs12_info=dirsrv_pkcs12_info) File "/usr/lib/python3.6/site-packages/ipaserver/install/server/replicainstall.py", line 110, in install_replica_ds setup_pkinit=not options.no_pkinit, File "/usr/lib/python3.6/site-packages/ipaserver/install/dsinstance.py", line 419, in create_replica self.start_creation(runtime=30) File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 506, in start_creation run_step(full_msg, method) File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 496, in run_step method() File "/usr/lib/python3.6/site-packages/ipaserver/install/dsinstance.py", line 977, in __setup_s4u2proxy __add_principal('ipa-http-delegation', 'HTTP', self) File "/usr/lib/python3.6/site-packages/ipaserver/install/dsinstance.py", line 973, in __add_principal api.Backend.ldap2.update_entry(entry) File "/usr/lib/python3.6/site-packages/ipapython/ipaldap.py", line 1572, in update_entry self.conn.modify_s(str(entry.dn), modlist) File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__ self.gen.throw(type, value, traceback) File "/usr/lib/python3.6/site-packages/ipapython/ipaldap.py", line 1028, in error_handler error=info)
2018-05-02T00:22:45Z DEBUG The ipa-replica-install command failed, exception: NetworkError: cannot connect to 'ldapi://%2Fvar%2Frun%2Fslapd- COMPANY-INTERNAL.socket': 2018-05-02T00:22:45Z ERROR cannot connect to 'ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket': 2018-05-02T00:22:45Z ERROR The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information
Maybe it just is trying to soon and it hasn't fully started yet? Because now I see that the previous step ALSO shows the URL encoded socket path, and it worked fine - but it looks like the previous step used ldapmodify, but the failing step was accessing LDAP from Python? Perhaps it works fine one way and not the other ?
I won't rule it out but I doubt it. Is ns-slapd still running at this point? This could happen, for example, if ns-slapd crashed after enabling the automember plugin. The DS error log might have something to say as well.
rob
On Wed, May 2, 2018 at 3:29 PM, Rob Crittenden <rcritten@redhat.com mailto:rcritten@redhat.com> wrote:
Jonathan Vaughn via FreeIPA-users wrote: Yes, I know, not recommended etc, low performance. I'm not going to run the CA on it. I just want to have a backup LDAP/Kerberos server. Right now I'm just trying to test things out. I've got a master and a replica (so you could say two masters I suppose) running in Virtualbox VMs, and I'm trying to set up a 3rd replica on a Pi. All are Fedors 27. I had to downgrade httpd due to https://pagure.io/freeipa/issue/7493 <https://pagure.io/freeipa/issue/7493> to even set up the first VM replica, but this issue is separate. Currently, the problem is it can't connect to it's own LDAP instance due to some kind of error ... ipa-replica-install worked fine on the x86_64 VM but on the armv71 Pi 3B when it tries to connect to LDAPI instead of using 'ldapi:///var/run//slapd-COMPANY-INTERNAL.socket' it uses 'ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket'. So it seems there is yet another ARM (or non-x86_64) bug ... similar to the problem with httpd and passing the KRB5CCNAME properly https://pagure.io/freeipa/issue/7337 <https://pagure.io/freeipa/issue/7337> Any ideas on where to look to patch in a fix to this so it uses the correct filename? The socket file is there ... and (at the time it tries) LDAP is running. What makes you think the ldapi URI is the problem? Can you share the logs? rob