Hi,

just for info. We restart our setup on an other stage in our dc with same result, we run in timeouts if first installed ipa server not available. So we give it a try in a complete different environment, with successfully failover. It  seem's we have a problem in our dc and we will have a deeper look on our environment.

Thanks,

Michael


Am 24.08.2017 um 21:12 schrieb Jakub Hrozek via FreeIPA-users:
On Thu, Aug 24, 2017 at 10:12:55AM +0200, Michael Gusek via FreeIPA-users wrote:
Hello Jakub,

here the first lines of ldap_child.log

|(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [main] (0x0400):
ldap_child started. (Wed Aug 23 16:07:11 2017)
[[sssd[ldap_child[2104]]]] [main] (0x2000): context initialized (Wed Aug
23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer] (0x1000):
total buffer size: 81 (Wed Aug 23 16:07:11 2017)
[[sssd[ldap_child[2104]]]] [unpack_buffer] (0x1000): realm_str size: 16
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer]
(0x1000): got realm_str: IPA.EXAMPLE.COM (Wed Aug 23 16:07:11 2017)
[[sssd[ldap_child[2104]]]] [unpack_buffer] (0x1000): princ_str size: 41
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer]
(0x1000): got princ_str: host/ipa-lx-test-debian9.ípa.example.com (Wed
Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer]
(0x1000): keytab_name size: 0 (Wed Aug 23 16:07:11 2017)
[[sssd[ldap_child[2104]]]] [unpack_buffer] (0x1000): lifetime: 86400
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer]
(0x0200): Will run as [0][0]. (Wed Aug 23 16:07:11 2017)
[[sssd[ldap_child[2104]]]] [privileged_krb5_setup] (0x2000): Kerberos
context initialized (Wed Aug 23 16:07:11 2017)
[[sssd[ldap_child[2104]]]] [main] (0x2000): Kerberos context initialized
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [become_user]
(0x0200): Trying to become user [0][0]. (Wed Aug 23 16:07:11 2017)
[[sssd[ldap_child[2104]]]] [become_user] (0x0200): Already user [0].
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [main] (0x2000):
Running as [0][0]. (Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]]
[main] (0x2000): getting TGT sync (Wed Aug 23 16:07:11 2017)
[[sssd[ldap_child[2104]]]] [ldap_child_get_tgt_sync] (0x2000): got
realm_name: [IPA.EXAMPLE.COM] (Wed Aug 23 16:07:11 2017)
[[sssd[ldap_child[2104]]]] [ldap_child_get_tgt_sync] (0x0100): Principal
name is: [host/ipa-lx-test-debian9.ípa.example.com@IPA.EXAMPLE.COM] (Wed
Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]]
[ldap_child_get_tgt_sync] (0x0100): Using keytab
[MEMORY:/etc/krb5.keytab] (Wed Aug 23 16:07:11 2017)
[[sssd[ldap_child[2104]]]] [ldap_child_get_tgt_sync] (0x0100): Will
canonicalize principals (Wed Aug 23 16:07:11 2017)
[[sssd[ldap_child[2104]]]] [sss_child_krb5_trace_cb] (0x4000): [2104]
1503497231.122092: Getting initial credentials for
host/ipa-lx-test-debian9.ípa.example.com@IPA.EXAMPLE.COM (Wed Aug 23
16:07:11 2017) [[sssd[ldap_child[2104]]]] [sss_child_krb5_trace_cb]
(0x4000): [2104] 1503497231.122313: Looked up etypes in keytab:
aes256-cts, aes128-cts, aes256-sha2, aes128-sha2, des3-cbc-sha1,
rc4-hmac (Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]]
[sss_child_krb5_trace_cb] (0x4000): [2104] 1503497231.122451: Sending
request (218 bytes) to IPA.EXAMPLE.COM (Wed Aug 23 16:07:17 2017)
[[sssd[ldap_child[2104]]]] [sig_term_handler] (0x0010): Received signal
[Terminated] [15], shutting down (Wed Aug 23 16:07:17 2017)
[[sssd[ldap_child[2104]]]] [sig_term_handler] (0x0010): Unlink file
[/var/lib/sss/db/ccache_IPA.EXAMPLE.COM_TmKHkD] (Wed Aug 23 16:07:17
2017) [[sssd[ldap_child[2105]]]] [main] (0x0400): ldap_child started.
(Wed Aug 23 16:07:17 2017) [[sssd[ldap_child[2105]]]] [main] (0x2000):
context initialized |

We connect to IPA.EXAMPLE.COM, this is not helpfull. You can see, there
is a delay of 5 seconds. Later in this file, you can see, we try to
connect to second server ipa-lx-test-02.ípa.example.com.
Right, sssd says it does, but I really wonder why the ldap_child
timeouts even in that case. Are there any log entries in the
ipa-lx-test-02.ípa.example.com's log files around the time sssd connects
to it?

And also -- could you run a simple tcpdump (tcpdump -i eth1 -x "port
88 or port 53") to see what hosts does sssd talk to and what hosts does
it discover?

What I'm wondering is -- does SSSD really talk to the right server at
that time or does it keep trying the bad one even if it should be trying
the one that is up?

By the way, I'm suprised that the failover doesn't work for you. This is a
quite basic feature that had been developed years ago. Can you also specify
exactly how you bring one of the IDM servers down? Do you power it off,
run ipactl stop, ...?

|(Wed Aug 23 16:10:14 2017) [[sssd[ldap_child[2129]]]]
[sss_child_krb5_trace_cb] (0x4000): [2129] 1503497414.357486: Getting
initial credentials for
host/ipa-lx-test-debian9.ípa.example.com@IPA.EXAMPLE.COM (Wed Aug 23
16:10:14 2017) [[sssd[ldap_child[2129]]]] [sss_child_krb5_trace_cb]
(0x4000): [2129] 1503497414.357715: Looked up etypes in keytab:
aes256-cts, aes128-cts, aes256-sha2, aes128-sha2, des3-cbc-sha1,
rc4-hmac (Wed Aug 23 16:10:14 2017) [[sssd[ldap_child[2129]]]]
[sss_child_krb5_trace_cb] (0x4000): [2129] 1503497414.357842: Sending
request (218 bytes) to IPA.EXAMPLE.COM (Wed Aug 23 16:10:14 2017)
[[sssd[ldap_child[2129]]]] [sss_child_krb5_trace_cb] (0x4000): [2129]
1503497414.360574: Resolving hostname ipa-lx-test-02.ípa.example.com.
(Wed Aug 23 16:10:14 2017) [[sssd[ldap_child[2129]]]]
[sss_child_krb5_trace_cb] (0x4000): [2129] 1503497414.362141: Sending
initial UDP request to dgram <ip_of_ipa-lx-test-02>:88 (Wed Aug 23
16:10:14 2017) [[sssd[ldap_child[2129]]]] [sss_child_krb5_trace_cb]
(0x4000): [2129] 1503497414.367725: Received answer (346 bytes) from
dgram <ip_of_ipa-lx-test-02>:88 (Wed Aug 23 16:10:19 2017)
[[sssd[ldap_child[2129]]]] [sss_child_krb5_trace_cb] (0x4000): [2129]
1503497419.374601: Response was from master KDC |

I think this was my test with ‘kinit admin@IPA.EXAMPLE.COM’. You can
find whole log file attached. From my point of view, failover to second
KDC is not fast enough and we should set some timeouts in sssd.conf or
krb5.conf ?
Well, kinit doesn't talk to sssd..

sssd only has interfaces towards the NSS and PAM subsystems, so kinit
talks straight to libkrb5. What /KDCs/ does kinit talk to depends on
libkrb5 configuration. In your case, it doesn't contain any KDCs, so
it's either what SSSD tells libkrb5 to look at (through a kdcinfo file
interface) DNS (because dns_lookup_kdc = true) in this order. And the
kdcinfo files should not be generated unless sssd can switch to the
'online' mode here.

krb5.conf from ipa-lx-test-debian9:

|#File modified by ipa-client-install includedir
/var/lib/sss/pubconf/krb5.include.d/ [libdefaults] default_realm =
IPA.EXAMPLE.COM dns_lookup_realm = true dns_lookup_kdc = true rdns =
false ticket_lifetime = 24h forwardable = yes default_ccache_name =
KEYRING:persistent:%{uid} [realms] IPA.EXAMPLE.COM = { pkinit_anchors =
FILE:/etc/ipa/ca.crt } [domain_realm] .ipa.example.com = IPA.EXAMPLE.COM
ipa.example.com = IPA.EXAMPLE.COM |

Regards,

Michael

Am 23.08.2017 um 22:20 schrieb Jakub Hrozek via FreeIPA-users:

On Wed, Aug 23, 2017 at 05:13:13PM +0200, Michael Gusek via FreeIPA-users wrote:
Hi,

we are testing a FreeIPA trust to an Active Directory. Trust itself
works, we are happy. Now we tested a failure on FreeIPA site. We have
two instances, both with same roles. If we poweroff first installed
server, and clean sssd caches with restart of sssd on client side , sssd
service can’t establish a connection to second instance.

ipa-lx-test-01.ipa.example.com is the first installed FreeIPA with
ipa-server-4.4.0-14.el7.centos.7.x86_64 on latest CentOS7
ipa-lx-test-02.ipa.example.com is the second installed FreeIPA with
ipa-server-4.4.0-14.el7.centos.7.x86_64 on latest CentOS7
ipa-lx-test-debian9.ipa.example.com is a latest Debian 9.1 with sssd
1.15.0-3

For deeper inspection full log is attached. In logs we found something
like this:
OK, so as you say communication with the KDC failed:
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [be_resolve_server_process] (0x0200): Found address for server ipa-lx-test-02.ipa.example.com: [x.x.x.x] TTL 1200                                       
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [sdap_kinit_kdc_resolved] (0x1000): KDC resolved, attempting to get TGT...                                                                              
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [create_tgt_req_send_buffer] (0x0400): buffer size: 81                                                                                                  
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [child_handler_setup] (0x2000): Setting up signal handler up for pid [2104]                                                                             
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [child_handler_setup] (0x2000): Signal handler set up for pid [2104]                                                                                    
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [set_tgt_child_timeout] (0x0400): Setting 6 seconds timeout for tgt child                                                                               
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [sdap_process_result] (0x2000): Trace: sh[0x558252c35750], connected[1], ops[(nil)], ldap[0x558252c130c0]                                               
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [sdap_process_result] (0x2000): Trace: end of ldap_result list                                                                                          
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [write_pipe_handler] (0x0400): All data has been sent!                                                                                                  
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [get_tgt_timeout_handler] (0x4000): timeout for sending SIGTERM to tgt child [2104] reached.                                                            
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [get_tgt_timeout_handler] (0x0400): Setting 2 seconds timeout for sending SIGKILL to tgt child                                                          
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [read_pipe_handler] (0x0400): EOF received, client finished                                                                                             
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [child_sig_handler] (0x1000): Waiting for child [2104].                                                                                                 
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [child_sig_handler] (0x0020): child [2104] failed with status [7].                                                                                      
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [child_callback] (0x0020): LDAP child was terminated due to timeout                                                                                     
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [sdap_kinit_done] (0x0080): Communication with KDC timed out, trying the next one                                                                       
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [_be_fo_set_port_status] (0x8000): Setting status: PORT_NOT_WORKING. Called from: ../src/providers/ldap/sdap_async_connection.c: sdap_kinit_done: 1207  
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'ipa-lx-test-02.ipa.example.com' as 'not working'                                             
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [fo_set_port_status] (0x0400): Marking port 389 of duplicate server 'ipa-lx-test-02.ipa.example.com' as 'not working' 

Could you check in the ldap_child.log which KDC did SSSD try to talk to
and what takes so long?
_______________________________________________
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-leave@lists.fedorahosted.org
​
-- 

________________________________________________


*Michael**Gusek*| System Administrator| Webtrekk GmbH |
*t*+49 30 755 415 302| *f *+49 30 755 415 100 | *w *www.webtrekk.com
<https://www.webtrekk.com/?wt_mc=signature.-.-.-.homepageURL>
Amtsgericht/Local Court Berlin, HRB 93435 B | Geschäftsführer/CEO
Christian Sauer und Wolf Lichtenstein



      
_______________________________________________
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-leave@lists.fedorahosted.org
_______________________________________________
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-leave@lists.fedorahosted.org

--

________________________________________________


Michael Gusek | System Administrator | Webtrekk GmbH |
t +49 30 755 415 302 | f +49 30 755 415 100 | w www.webtrekk.com
Amtsgericht/Local Court Berlin, HRB 93435 B | Geschäftsführer/CEO Christian Sauer und Wolf Lichtenstein