Hi!
We are having problems with some of our 389-DS instances. They crash after receiving an update from the provider. The crash happened twice after about a week of running without problems. The crashes happened on two consumer servers but not at the same time. The servers are running CentOS 6x with the following 389DS packages installed: 389-ds-console-doc-1.2.6-1.el6.noarch 389-console-1.1.7-1.el6.noarch 389-adminutil-1.1.15-1.el6.x86_64 389-dsgw-1.1.10-1.el6.x86_64 389-ds-base-debuginfo-1.2.11.15-14.el6_4.x86_64 389-admin-1.1.29-1.el6.x86_64 389-ds-console-1.2.6-1.el6.noarch 389-admin-console-doc-1.1.8-1.el6.noarch 389-ds-1.2.2-1.el6.noarch 389-ds-base-1.2.11.15-14.el6_4.x86_64 389-ds-base-libs-1.2.11.15-14.el6_4.x86_64 389-admin-console-1.1.8-1.el6.noarch
We are in the process of replacing the CentOS 5x-based consumer+provider setup with a CentOS 6x-based one. For the time being, the CentOS 6 machines are acting as consumers for the old server. They run for a while and then the replicated instances crash, though not at the same time. One of the servers did not want to start after the crash, so I have run db2index on its database. It's been running for four days and it has still not finished. All I get from db2index now are these outputs: [09/Jul/2013:13:29:11 +0200] - reindex db: Processed 65095 entries (pass 1104) -- average rate 53686277.5/sec, recent rate 0.0/sec, hit ratio 0%
The other instance did start up, but the replication process did not work anymore. I disabled the replication to this host and set it up again. I chose "Initialize consumer now" and the consumer crashed every time. I have enabled full error logging and could find nothing. I have read a few threads (not all, I admit) on this list and http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes and tried to troubleshoot.
The crash produced the attached core dump, and I could use your help with understanding it, as well as any help with the crash itself. If more info is needed I will gladly provide it.
Regards, Mitja
On 07/09/2013 06:43 AM, Mitja Mihelič wrote:
Hi!
We are having problems with some of our 389-DS instances. They crash after receiving an update from the provider.
After looking at the stack trace, I think this is https://fedorahosted.org/389/ticket/47391
The crash happened twice after about a week of running without problems. The crashes happened on two consumer servers but not at the same time. The servers are running CentOS 6x with the following 389DS packages installed: 389-ds-console-doc-1.2.6-1.el6.noarch 389-console-1.1.7-1.el6.noarch 389-adminutil-1.1.15-1.el6.x86_64 389-dsgw-1.1.10-1.el6.x86_64 389-ds-base-debuginfo-1.2.11.15-14.el6_4.x86_64 389-admin-1.1.29-1.el6.x86_64 389-ds-console-1.2.6-1.el6.noarch 389-admin-console-doc-1.1.8-1.el6.noarch 389-ds-1.2.2-1.el6.noarch 389-ds-base-1.2.11.15-14.el6_4.x86_64 389-ds-base-libs-1.2.11.15-14.el6_4.x86_64 389-admin-console-1.1.8-1.el6.noarch
We are in the process of replacing the CentOS 5x-based consumer+provider setup with a CentOS 6x-based one. For the time being, the CentOS 6 machines are acting as consumers for the old server. They run for a while and then the replicated instances crash, though not at the same time. One of the servers did not want to start after the crash,
Can you provide the error messages from the errors log?
so I have run db2index on its database. It's been running for four days and it has still not finished.
Try exporting using db2ldif, then importing using ldif2db.
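Roughly, with the server stopped, the offline round trip looks something like this (the instance name "EXAMPLE" and backend name "userRoot" below are placeholders - substitute your own):
service dirsrv stop EXAMPLE
# export the backend to an LDIF file
/usr/lib64/dirsrv/slapd-EXAMPLE/db2ldif -n userRoot -a /tmp/userRoot.ldif
# re-import that LDIF into the same backend
/usr/lib64/dirsrv/slapd-EXAMPLE/ldif2db -n userRoot -i /tmp/userRoot.ldif
service dirsrv start EXAMPLE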
All I get from db2index now are these outputs: [09/Jul/2013:13:29:11 +0200] - reindex db: Processed 65095 entries (pass 1104) -- average rate 53686277.5/sec, recent rate 0.0/sec, hit ratio 0%
How many entries do you have in your database?
The other instance did start up, but the replication process did not work anymore. I disabled the replication to this host and set it up again. I chose "Initialize consumer now" and the consumer crashed every time.
Can you provide a stack trace of the core when the server crashes? This may be different from the stack trace below.
I have enabled full error logging and could find nothing. I have read a few threads (not all, I admit) on this list and http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes and tried to troubleshoot.
The crash produced the attached core dump, and I could use your help with understanding it, as well as any help with the crash itself. If more info is needed I will gladly provide it.
Regards, Mitja
On 07/09/2013 03:34 PM, Rich Megginson wrote:
On 07/09/2013 06:43 AM, Mitja Mihelič wrote:
Hi!
We are having problems with some of our 389-DS instances. They crash after receiving an update from the provider.
After looking at the stack trace, I think this is https://fedorahosted.org/389/ticket/47391
The crash happened twice after about a week of running without problems. The crashes happened on two consumer servers but not at the same time. The servers are running CentOS 6x with the following 389DS packages installed: 389-ds-console-doc-1.2.6-1.el6.noarch 389-console-1.1.7-1.el6.noarch 389-adminutil-1.1.15-1.el6.x86_64 389-dsgw-1.1.10-1.el6.x86_64 389-ds-base-debuginfo-1.2.11.15-14.el6_4.x86_64 389-admin-1.1.29-1.el6.x86_64 389-ds-console-1.2.6-1.el6.noarch 389-admin-console-doc-1.1.8-1.el6.noarch 389-ds-1.2.2-1.el6.noarch 389-ds-base-1.2.11.15-14.el6_4.x86_64 389-ds-base-libs-1.2.11.15-14.el6_4.x86_64 389-admin-console-1.1.8-1.el6.noarch
We are in the process of replacing the CentOS 5x-based consumer+provider setup with a CentOS 6x-based one. For the time being, the CentOS 6 machines are acting as consumers for the old server. They run for a while and then the replicated instances crash, though not at the same time. One of the servers did not want to start after the crash,
Can you provide the error messages from the errors log?
I have attached error logs from the provider (2013-06-27-provider_error) and the consumer (2013-06-27-server_two_error) in question.
so I have run db2index on its database. It's been running for four days and it has still not finished.
Try exporting using db2ldif, then importing using ldif2db.
The export process hangs. After an hour strace still shows: futex(0x7f5822670ed4, FUTEX_WAIT, 1, NULL The error log for this is attached as 2013-07-10-server_two-ldif_import_hangs.
All I get from db2index now are these outputs: [09/Jul/2013:13:29:11 +0200] - reindex db: Processed 65095 entries (pass 1104) -- average rate 53686277.5/sec, recent rate 0.0/sec, hit ratio 0%
How many entries do you have in your database?
The number hovers around 65400. It varies by perhaps 2 user del/add operations a month and 20 attribute changes per week, if that.
The other instance did start up, but the replication process did not work anymore. I disabled the replication to this host and set it up again. I chose "Initialize consumer now" and the consumer crashed every time.
Can you provide a stack trace of the core when the server crashes? This may be different from the stack trace below.
The last provided stack trace was produced at the last server crash. I will provide another stack trace when CONSUMER_ONE crashes again. Currently it refuses to crash at initialization time and keeps running.
I have enabled full error logging and could find nothing. I have read a few threads (not all, I admit) on this list and http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes and tried to troubleshoot.
The crash produced the attached core dump, and I could use your help with understanding it, as well as any help with the crash itself. If more info is needed I will gladly provide it.
Regards, Mitja
On 07/12/2013 08:22 AM, Mitja Mihelič wrote:
On 07/09/2013 03:34 PM, Rich Megginson wrote:
On 07/09/2013 06:43 AM, Mitja Mihelič wrote:
Hi!
We are having problems with some of our 389-DS instances. They crash after receiving an update from the provider.
After looking at the stack trace, I think this is https://fedorahosted.org/389/ticket/47391
The crash happened twice after about a week of running without problems. The crashes happened on two consumer servers but not at the same time. The servers are running CentOS 6x with the following 389DS packages installed: 389-ds-console-doc-1.2.6-1.el6.noarch 389-console-1.1.7-1.el6.noarch 389-adminutil-1.1.15-1.el6.x86_64 389-dsgw-1.1.10-1.el6.x86_64 389-ds-base-debuginfo-1.2.11.15-14.el6_4.x86_64 389-admin-1.1.29-1.el6.x86_64 389-ds-console-1.2.6-1.el6.noarch 389-admin-console-doc-1.1.8-1.el6.noarch 389-ds-1.2.2-1.el6.noarch 389-ds-base-1.2.11.15-14.el6_4.x86_64 389-ds-base-libs-1.2.11.15-14.el6_4.x86_64 389-admin-console-1.1.8-1.el6.noarch
We are in the process of replacing the CentOS 5x-based consumer+provider setup with a CentOS 6x-based one. For the time being, the CentOS 6 machines are acting as consumers for the old server. They run for a while and then the replicated instances crash, though not at the same time. One of the servers did not want to start after the crash,
Can you provide the error messages from the errors log?
I have attached error logs from the provider (2013-06-27-provider_error) and the consumer (2013-06-27-server_two_error) in question.
so I have run db2index on its database. It's been running for four days and it has still not finished.
Try exporting using db2ldif, then importing using ldif2db.
The export process hangs. After an hour strace still shows: futex(0x7f5822670ed4, FUTEX_WAIT, 1, NULL The error log for this is attached as 2013-07-10-server_two-ldif_import_hangs.
Are you using db2ldif or db2ldif.pl? If you are using db2ldif, is the server running? If not, please try first shutting down the server and use db2ldif.
If db2ldif still hangs, then please follow the instructions at http://port389.org/wiki/FAQ#Debugging_Hangs to get a stack trace of the hung process.
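For an offline export the hung process should be the ns-slapd child that db2ldif starts, so something along these lines should capture it (substitute the PID of the hung process):
gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd <pid-of-hung-process> > stacktrace.`date +%s`.txt 2>&1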
All I get from db2index now are these outputs: [09/Jul/2013:13:29:11 +0200] - reindex db: Processed 65095 entries (pass 1104) -- average rate 53686277.5/sec, recent rate 0.0/sec, hit ratio 0%
How many entries do you have in your database?
The number hovers around 65400. It varies by perhaps 2 user del/add operations a month and 20 attribute changes per week, if that.
The other instance did start up, but the replication process did not work anymore. I disabled the replication to this host and set it up again. I chose "Initialize consumer now" and the consumer crashed every time.
Can you provide a stack trace of the core when the server crashes? This may be different from the stack trace below.
The last provided stack trace was produced at the last server crash. I will provide another stack trace when CONSUMER_ONE crashes again. Currently it refuses to crash at initialization time and keeps running.
I have enabled full error logging and could find nothing. I have read a few threads (not all, I admit) on this list and http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes and tried to troubleshoot.
The crash produced the attached core dump, and I could use your help with understanding it, as well as any help with the crash itself. If more info is needed I will gladly provide it.
Regards, Mitja
On 07/12/2013 05:55 PM, Rich Megginson wrote:
On 07/12/2013 08:22 AM, Mitja Mihelič wrote:
On 07/09/2013 03:34 PM, Rich Megginson wrote:
On 07/09/2013 06:43 AM, Mitja Mihelič wrote:
Hi!
We are having problems with some of our 389-DS instances. They crash after receiving an update from the provider.
After looking at the stack trace, I think this is https://fedorahosted.org/389/ticket/47391
Yes, it looks like it might be it. When CONSUMER_ONE crashed for the first time, the last thing replicated was a password change. Do you perhaps know where I could get a 389DS version for CentOS 6 that has the patch? The ticket says it was pushed to 1.2.11, but it would seem that our 1.2.11.15-14 is still unpatched, and the repositories do not have any newer versions.
The crash happened twice after about a week of running without problems. The crashes happened on two consumer servers but not at the same time. The servers are running CentOS 6x with the following 389DS packages installed: 389-ds-console-doc-1.2.6-1.el6.noarch 389-console-1.1.7-1.el6.noarch 389-adminutil-1.1.15-1.el6.x86_64 389-dsgw-1.1.10-1.el6.x86_64 389-ds-base-debuginfo-1.2.11.15-14.el6_4.x86_64 389-admin-1.1.29-1.el6.x86_64 389-ds-console-1.2.6-1.el6.noarch 389-admin-console-doc-1.1.8-1.el6.noarch 389-ds-1.2.2-1.el6.noarch 389-ds-base-1.2.11.15-14.el6_4.x86_64 389-ds-base-libs-1.2.11.15-14.el6_4.x86_64 389-admin-console-1.1.8-1.el6.noarch
We are in the process of replacing the CentOS 5x-based consumer+provider setup with a CentOS 6x-based one. For the time being, the CentOS 6 machines are acting as consumers for the old server. They run for a while and then the replicated instances crash, though not at the same time. One of the servers did not want to start after the crash,
Can you provide the error messages from the errors log?
I have attached error logs from the provider (2013-06-27-provider_error) and the consumer (2013-06-27-server_two_error) in question.
so I have run db2index on its database. It's been running for four days and it has still not finished.
Try exporting using db2ldif, then importing using ldif2db.
The export process hangs. After an hour strace still shows: futex(0x7f5822670ed4, FUTEX_WAIT, 1, NULL The error log for this is attached as 2013-07-10-server_two-ldif_import_hangs.
Are you using db2ldif or db2ldif.pl? If you are using db2ldif, is the server running? If not, please try first shutting down the server and use db2ldif.
If db2ldif still hangs, then please follow the instructions at http://port389.org/wiki/FAQ#Debugging_Hangs to get a stack trace of the hung process.
I was using db2ldif with the server shut down. I tried it again and it hung. The LDIF file was created but its size was zero. The produced stack trace is attached as server_two-db2ldif_hang-stacktrace.1373877200.txt.
All I get from db2index now are these outputs: [09/Jul/2013:13:29:11 +0200] - reindex db: Processed 65095 entries (pass 1104) -- average rate 53686277.5/sec, recent rate 0.0/sec, hit ratio 0%
How many entries do you have in your database?
The number hovers around 65400. It varies by perhaps 2 user del/add operations a month and 20 attribute changes per week, if that.
The other instance did start up, but the replication process did not work anymore. I disabled the replication to this host and set it up again. I chose "Initialize consumer now" and the consumer crashed every time.
Can you provide a stack trace of the core when the server crashes? This may be different from the stack trace below.
The last provided stack trace was produced at the last server crash. I will provide another stack trace when CONSUMER_ONE crashes again. Currently it refuses to crash at initialization time and keeps running.
I have enabled full error logging and could find nothing. I have read a few threads (not all, I admit) on this list and http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes and tried to troubleshoot.
The crash produced the attached core dump, and I could use your help with understanding it, as well as any help with the crash itself. If more info is needed I will gladly provide it.
Regards, Mitja
On 07/15/2013 02:57 AM, Mitja Mihelič wrote:
On 07/12/2013 05:55 PM, Rich Megginson wrote:
On 07/12/2013 08:22 AM, Mitja Mihelič wrote:
On 07/09/2013 03:34 PM, Rich Megginson wrote:
On 07/09/2013 06:43 AM, Mitja Mihelič wrote:
Hi!
We are having problems with some of our 389-DS instances. They crash after receiving an update from the provider.
After looking at the stack trace, I think this is https://fedorahosted.org/389/ticket/47391
Yes, it looks like it might be it. When CONSUMER_ONE crashed for the first time, the last thing replicated was a password change. Do you perhaps know where I could get a 389DS version for CentOS 6 that has the patch? The ticket says it was pushed to 1.2.11, but it would seem that our 1.2.11.15-14 is still unpatched, and the repositories do not have any newer versions.
Is that the 389-ds-base that is included with CentOS6?
The crash happened twice after about a week of running without problems. The crashes happened on two consumer servers but not at the same time. The servers are running CentOS 6x with the following 389DS packages installed: 389-ds-console-doc-1.2.6-1.el6.noarch 389-console-1.1.7-1.el6.noarch 389-adminutil-1.1.15-1.el6.x86_64 389-dsgw-1.1.10-1.el6.x86_64 389-ds-base-debuginfo-1.2.11.15-14.el6_4.x86_64 389-admin-1.1.29-1.el6.x86_64 389-ds-console-1.2.6-1.el6.noarch 389-admin-console-doc-1.1.8-1.el6.noarch 389-ds-1.2.2-1.el6.noarch 389-ds-base-1.2.11.15-14.el6_4.x86_64 389-ds-base-libs-1.2.11.15-14.el6_4.x86_64 389-admin-console-1.1.8-1.el6.noarch
We are in the process of replacing the CentOS 5x-based consumer+provider setup with a CentOS 6x-based one. For the time being, the CentOS 6 machines are acting as consumers for the old server. They run for a while and then the replicated instances crash, though not at the same time. One of the servers did not want to start after the crash,
Can you provide the error messages from the errors log?
I have attached error logs from the provider (2013-06-27-provider_error) and the consumer (2013-06-27-server_two_error) in question.
so I have run db2index on its database. It's been running for four days and it has still not finished.
Try exporting using db2ldif, then importing using ldif2db.
The export process hangs. After an hour strace still shows: futex(0x7f5822670ed4, FUTEX_WAIT, 1, NULL The error log for this is attached as 2013-07-10-server_two-ldif_import_hangs.
Are you using db2ldif or db2ldif.pl? If you are using db2ldif, is the server running? If not, please try first shutting down the server and use db2ldif.
If db2ldif still hangs, then please follow the instructions at http://port389.org/wiki/FAQ#Debugging_Hangs to get a stack trace of the hung process.
I was using db2ldif with the server shut down. I tried it again and it hung. The LDIF file was created but its size was zero. The produced stack trace is attached as server_two-db2ldif_hang-stacktrace.1373877200.txt.
All I get from db2index now are these outputs: [09/Jul/2013:13:29:11 +0200] - reindex db: Processed 65095 entries (pass 1104) -- average rate 53686277.5/sec, recent rate 0.0/sec, hit ratio 0%
How many entries do you have in your database?
The number hovers around 65400. It varies by perhaps 2 user del/add operations a month and 20 attribute changes per week, if that.
The other instance did start up, but the replication process did not work anymore. I disabled the replication to this host and set it up again. I chose "Initialize consumer now" and the consumer crashed every time.
Can you provide a stack trace of the core when the server crashes? This may be different from the stack trace below.
The last provided stack trace was produced at the last server crash. I will provide another stack trace when CONSUMER_ONE crashes again. Currently it refuses to crash at initialization time and keeps running.
I have enabled full error logging and could find nothing. I have read a few threads (not all, I admit) on this list and http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes and tried to troubleshoot.
The crash produced the attached core dump, and I could use your help with understanding it, as well as any help with the crash itself. If more info is needed I will gladly provide it.
Regards, Mitja
On 07/15/2013 05:28 PM, Rich Megginson wrote:
On 07/15/2013 02:57 AM, Mitja Mihelič wrote:
On 07/12/2013 05:55 PM, Rich Megginson wrote:
On 07/12/2013 08:22 AM, Mitja Mihelič wrote:
On 07/09/2013 03:34 PM, Rich Megginson wrote:
On 07/09/2013 06:43 AM, Mitja Mihelič wrote:
Hi!
We are having problems with some of our 389-DS instances. They crash after receiving an update from the provider.
After looking at the stack trace, I think this is https://fedorahosted.org/389/ticket/47391
Yes, it looks like it might be it. When CONSUMER_ONE crashed for the first time, the last thing replicated was a password change. Do you perhaps know where I could get a 389DS version for CentOS 6 that has the patch? The ticket says it was pushed to 1.2.11, but it would seem that our 1.2.11.15-14 is still unpatched, and the repositories do not have any newer versions.
Is that the 389-ds-base that is included with CentOS6?
Yes, the 389-ds-base-1.2.11.15-14.el6_4.x86_64 and 389-ds-base-libs-1.2.11.15-14.el6_4.x86_64 are from the official CentOS 6 updates repository. 389-ds-base-debuginfo is from http://debuginfo.centos.org/6/ and the rest are from EPEL.
The crash happened twice after about a week of running without problems. The crashes happened on two consumer servers but not at the same time. The servers are running CentOS 6x with the following 389DS packages installed: 389-ds-console-doc-1.2.6-1.el6.noarch 389-console-1.1.7-1.el6.noarch 389-adminutil-1.1.15-1.el6.x86_64 389-dsgw-1.1.10-1.el6.x86_64 389-ds-base-debuginfo-1.2.11.15-14.el6_4.x86_64 389-admin-1.1.29-1.el6.x86_64 389-ds-console-1.2.6-1.el6.noarch 389-admin-console-doc-1.1.8-1.el6.noarch 389-ds-1.2.2-1.el6.noarch 389-ds-base-1.2.11.15-14.el6_4.x86_64 389-ds-base-libs-1.2.11.15-14.el6_4.x86_64 389-admin-console-1.1.8-1.el6.noarch
We are in the process of replacing the CentOS 5x-based consumer+provider setup with a CentOS 6x-based one. For the time being, the CentOS 6 machines are acting as consumers for the old server. They run for a while and then the replicated instances crash, though not at the same time. One of the servers did not want to start after the crash,
Can you provide the error messages from the errors log?
I have attached error logs from the provider (2013-06-27-provider_error) and the consumer (2013-06-27-server_two_error) in question.
so I have run db2index on its database. It's been running for four days and it has still not finished.
Try exporting using db2ldif, then importing using ldif2db.
The export process hangs. After an hour strace still shows: futex(0x7f5822670ed4, FUTEX_WAIT, 1, NULL The error log for this is attached as 2013-07-10-server_two-ldif_import_hangs.
Are you using db2ldif or db2ldif.pl? If you are using db2ldif, is the server running? If not, please try first shutting down the server and use db2ldif.
If db2ldif still hangs, then please follow the instructions at http://port389.org/wiki/FAQ#Debugging_Hangs to get a stack trace of the hung process.
I was using db2ldif with the server shut down. I tried it again and it hung. The LDIF file was created but its size was zero. The produced stack trace is attached as server_two-db2ldif_hang-stacktrace.1373877200.txt.
All I get from db2index now are these outputs: [09/Jul/2013:13:29:11 +0200] - reindex db: Processed 65095 entries (pass 1104) -- average rate 53686277.5/sec, recent rate 0.0/sec, hit ratio 0%
How many entries do you have in your database?
The number hovers around 65400. It varies by perhaps 2 user del/add operations a month and 20 attribute changes per week, if that.
The other instance did start up, but the replication process did not work anymore. I disabled the replication to this host and set it up again. I chose "Initialize consumer now" and the consumer crashed every time.
Can you provide a stack trace of the core when the server crashes? This may be different from the stack trace below.
The last provided stack trace was produced at the last server crash. I will provide another stack trace when CONSUMER_ONE crashes again. Currently it refuses to crash at initialization time and keeps running.
I have enabled full error logging and could find nothing. I have read a few threads (not all, I admit) on this list and http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes and tried to troubleshoot.
The crash produced the attached core dump, and I could use your help with understanding it, as well as any help with the crash itself. If more info is needed I will gladly provide it.
Regards, Mitja
On 07/16/2013 01:23 AM, Mitja Mihelič wrote:
On 07/15/2013 05:28 PM, Rich Megginson wrote:
On 07/15/2013 02:57 AM, Mitja Mihelič wrote:
On 07/12/2013 05:55 PM, Rich Megginson wrote:
On 07/12/2013 08:22 AM, Mitja Mihelič wrote:
On 07/09/2013 03:34 PM, Rich Megginson wrote:
On 07/09/2013 06:43 AM, Mitja Mihelič wrote:
Hi!
We are having problems with some of our 389-DS instances. They crash after receiving an update from the provider.
After looking at the stack trace, I think this is https://fedorahosted.org/389/ticket/47391
Yes, it looks like it might be it. When CONSUMER_ONE crashed for the first time, the last thing replicated was a password change. Do you perhaps know where I could get a 389DS version for CentOS 6 that has the patch? The ticket says it was pushed to 1.2.11, but it would seem that our 1.2.11.15-14 is still unpatched, and the repositories do not have any newer versions.
Is that the 389-ds-base that is included with CentOS6?
Yes, the 389-ds-base-1.2.11.15-14.el6_4.x86_64 and 389-ds-base-libs-1.2.11.15-14.el6_4.x86_64 are from the official CentOS 6 updates repository. 389-ds-base-debuginfo is from http://debuginfo.centos.org/6/ and the rest are from EPEL.
Looking at the stack trace you sent earlier - there is only 1 thread? You ran
gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `pidof ns-slapd` > stacktrace.`date +%s`.txt 2>&1
? If so, I have no idea what's going on - I've never seen the server deadlock itself with only 1 thread . . .
Can you provide the error messages from the errors log?
I have attached error logs from the provider (2013-06-27-provider_error) and the consumer (2013-06-27-server_two_error) in question.
so I have run db2index on its database. It's been running for four days and it has still not finished.
Try exporting using db2ldif, then importing using ldif2db.
The export process hangs. After an hour strace still shows: futex(0x7f5822670ed4, FUTEX_WAIT, 1, NULL The error log for this is attached as 2013-07-10-server_two-ldif_import_hangs.
Are you using db2ldif or db2ldif.pl? If you are using db2ldif, is the server running? If not, please try first shutting down the server and use db2ldif.
If db2ldif still hangs, then please follow the instructions at http://port389.org/wiki/FAQ#Debugging_Hangs to get a stack trace of the hung process.
I was using db2ldif with the server shut down. I tried it again and it hung. The LDIF file was created but its size was zero. The produced stack trace is attached as server_two-db2ldif_hang-stacktrace.1373877200.txt.
All I get from db2index now are these outputs: [09/Jul/2013:13:29:11 +0200] - reindex db: Processed 65095 entries (pass 1104) -- average rate 53686277.5/sec, recent rate 0.0/sec, hit ratio 0%
How many entries do you have in your database?
The number hovers around 65400. It varies by perhaps 2 user del/add operations a month and 20 attribute changes per week, if that.
The other instance did start up, but the replication process did not work anymore. I disabled the replication to this host and set it up again. I chose "Initialize consumer now" and the consumer crashed every time.
Can you provide a stack trace of the core when the server crashes? This may be different from the stack trace below.
The last provided stack trace was produced at the last server crash. I will provide another stack trace when CONSUMER_ONE crashes again. Currently it refuses to crash at initialization time and keeps running.
I have enabled full error logging and could find nothing. I have read a few threads (not all, I admit) on this list and http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes and tried to troubleshoot.
The crash produced the attached core dump, and I could use your help with understanding it, as well as any help with the crash itself. If more info is needed I will gladly provide it.
Regards, Mitja
On 07/16/2013 04:49 PM, Rich Megginson wrote:
On 07/16/2013 01:23 AM, Mitja Mihelič wrote:
On 07/15/2013 05:28 PM, Rich Megginson wrote:
On 07/15/2013 02:57 AM, Mitja Mihelič wrote:
On 07/12/2013 05:55 PM, Rich Megginson wrote:
On 07/12/2013 08:22 AM, Mitja Mihelič wrote:
On 07/09/2013 03:34 PM, Rich Megginson wrote:
On 07/09/2013 06:43 AM, Mitja Mihelič wrote:
Hi!
We are having problems with some of our 389-DS instances. They crash after receiving an update from the provider.
After looking at the stack trace, I think this is https://fedorahosted.org/389/ticket/47391
Yes, it looks like it might be it. When CONSUMER_ONE crashed for the first time, the last thing replicated was a password change. Do you perhaps know where I could get a 389DS version for CentOS 6 that has the patch? The ticket says it was pushed to 1.2.11, but it would seem that our 1.2.11.15-14 is still unpatched, and the repositories do not have any newer versions.
Is that the 389-ds-base that is included with CentOS6?
Yes, the 389-ds-base-1.2.11.15-14.el6_4.x86_64 and 389-ds-base-libs-1.2.11.15-14.el6_4.x86_64 are from the official CentOS 6 updates repository. 389-ds-base-debuginfo is from http://debuginfo.centos.org/6/ and the rest are from EPEL.
Looking at the stack trace you sent earlier - there is only 1 thread? You ran gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `pidof ns-slapd` > stacktrace.`date +%s`.txt 2>&1
? If so, I have no idea what's going on - I've never seen the server deadlock itself with only 1 thread . . .
I ran gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `pidof -o 49171 ns-slapd` > stacktrace.`date +%s`.txt 2>&1 The "-o 49171" is to exclude the pid of the config server instance, so only the problematic pid was looked at. If you get any more information regarding this crash it would be very much appreciated.
It may be best if I removed all 389DS related data from both of the consumer servers and start fresh. If they crash again I will send the relevant stack traces.
Are you using db2ldif or db2ldif.pl? If you are using db2ldif, is the server running? If not, please try first shutting down the server and use db2ldif.
If db2ldif still hangs, then please follow the instructions at http://port389.org/wiki/FAQ#Debugging_Hangs to get a stack trace of the hung process.
I was using db2ldif with the server shut down. I tried it again and it hung. The LDIF file was created but its size was zero. The produced stack trace is attached as server_two-db2ldif_hang-stacktrace.1373877200.txt.
On 07/17/2013 01:52 AM, Mitja Mihelič wrote:
On 07/16/2013 04:49 PM, Rich Megginson wrote:
On 07/16/2013 01:23 AM, Mitja Mihelič wrote:
On 07/15/2013 05:28 PM, Rich Megginson wrote:
On 07/15/2013 02:57 AM, Mitja Mihelič wrote:
On 07/12/2013 05:55 PM, Rich Megginson wrote:
On 07/12/2013 08:22 AM, Mitja Mihelič wrote:
On 07/09/2013 03:34 PM, Rich Megginson wrote:
On 07/09/2013 06:43 AM, Mitja Mihelič wrote:
Hi!
We are having problems with some of our 389-DS instances. They crash after receiving an update from the provider.
After looking at the stack trace, I think this is https://fedorahosted.org/389/ticket/47391
Yes, it looks like it might be it. When CONSUMER_ONE crashed for the first time, the last thing replicated was a password change. Do you perhaps know where I could get a 389DS version for CentOS 6 that has the patch? The ticket says it was pushed to 1.2.11, but it would seem that our 1.2.11.15-14 is still unpatched, and the repositories do not have any newer versions.
Is that the 389-ds-base that is included with CentOS6?
Yes, the 389-ds-base-1.2.11.15-14.el6_4.x86_64 and 389-ds-base-libs-1.2.11.15-14.el6_4.x86_64 are from the official CentOS 6 updates repository. 389-ds-base-debuginfo is from http://debuginfo.centos.org/6/ and the rest are from EPEL.
Looking at the stack trace you sent earlier - there is only 1 thread? You ran gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `pidof ns-slapd` > stacktrace.`date +%s`.txt 2>&1
? If so, I have no idea what's going on - I've never seen the server deadlock itself with only 1 thread . . .
I ran gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `pidof -o 49171 ns-slapd` > stacktrace.`date +%s`.txt 2>&1 The "-o 49171" is to exclude the pid of the config server instance, so only the problematic pid was looked at. If you get any more information regarding this crash it would be very much appreciated.
It may be best if I removed all 389DS related data from both of the consumer servers and start fresh. If they crash again I will send the relevant stack traces.
Yes, that sounds good.
Are you using db2ldif or db2ldif.pl? If you are using db2ldif, is the server running? If not, please try first shutting down the server and use db2ldif.
If db2ldif still hangs, then please follow the instructions at http://port389.org/wiki/FAQ#Debugging_Hangs to get a stack trace of the hung process.
I was using db2ldif with the server shut down. I tried it again and it hung. The LDIF file was created but its size was zero. The produced stack trace is attached as server_two-db2ldif_hang-stacktrace.1373877200.txt.
One of the consumers has crashed again and I have attached the stacktrace. Four hours later it crashed again.
I do hope there is something in the stacktraces, so that something can be done to prevent future crashes.
The last log message in errors log was both times: ber_flush skipped because the connection was marked to be closed or abandoned
The following versions 389ds packages were installed at the time: 389-admin-1.1.29-1.el6.x86_64 389-admin-console-1.1.8-1.el6.noarch 389-admin-console-doc-1.1.8-1.el6.noarch 389-adminutil-1.1.15-1.el6.x86_64 389-console-1.1.7-1.el6.noarch 389-ds-1.2.2-1.el6.noarch 389-ds-base-1.2.11.15-22.el6_4.x86_64 389-ds-base-libs-1.2.11.15-22.el6_4.x86_64 389-ds-console-1.2.6-1.el6.noarch 389-ds-console-doc-1.2.6-1.el6.noarch 389-dsgw-1.1.10-1.el6.x86_64
Regards, Mitja
On 17. 07. 2013 09:52, Mitja Mihelič wrote:
It may be best if I removed all 389DS related data from both of the consumer servers and start fresh. If they crash again I will send the relevant stack traces.
On 11/14/2013 08:50 AM, Mitja Mihelič wrote:
One of the consumers has crashed again and I have attached the stacktrace. Four hours later it crashed again.
I do hope there is something in the stacktraces, so that something can be done to prevent future crashes.
Unfortunately, not enough. Looks like there is still some mismatch between the version of the package and the version of the debuginfo package.
rpm -q 389-ds-base 389-ds-base-debuginfo openldap openldap-debuginfo db4 db4-debuginfo nss nss-debuginfo nspr nspr-debuginfo glibc glibc-debuginfo
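If any of those debuginfo packages are missing, or at a different version than the main package, they can usually be brought back in sync with yum-utils (assuming the CentOS debuginfo repository is configured), roughly:
yum install yum-utils
debuginfo-install 389-ds-base openldap db4 nss nspr glibc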
Also, if you are seeing the message: ber_flush skipped because the connection was marked to be closed or abandoned
This means you are running with the CONNS error log level, which means you may have a lot of useful information in your errors log. Would you be able to provide that?
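For reference, the error log level currently in effect can be read back from cn=config with something like this (the bind DN is just the usual Directory Manager):
ldapsearch -x -D "cn=Directory Manager" -W -b cn=config -s base nsslapd-errorlog-level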
The last log message in errors log was both times: ber_flush skipped because the connection was marked to be closed or abandoned
The following versions 389ds packages were installed at the time: 389-admin-1.1.29-1.el6.x86_64 389-admin-console-1.1.8-1.el6.noarch 389-admin-console-doc-1.1.8-1.el6.noarch 389-adminutil-1.1.15-1.el6.x86_64 389-console-1.1.7-1.el6.noarch 389-ds-1.2.2-1.el6.noarch 389-ds-base-1.2.11.15-22.el6_4.x86_64 389-ds-base-libs-1.2.11.15-22.el6_4.x86_64 389-ds-console-1.2.6-1.el6.noarch 389-ds-console-doc-1.2.6-1.el6.noarch 389-dsgw-1.1.10-1.el6.x86_64
Regards, Mitja
On 17. 07. 2013 09:52, Mitja Mihelič wrote:
It may be best if I removed all 389DS related data from both of the consumer servers and start fresh. If they crash again I will send the relevant stack traces.
On 14. 11. 2013 22:08, Rich Megginson wrote:
On 11/14/2013 08:50 AM, Mitja Mihelič wrote:
One of the consumers has crashed again and I have attached the stacktrace. Four hours later it crashed again.
I do hope there is something in the stacktraces, so that something can be done to prevent future crashes.
Unfortunately, not enough. Looks like there is still some mismatch between the version of the package and the version of the debuginfo package.
rpm -q 389-ds-base 389-ds-base-debuginfo openldap openldap-debuginfo db4 db4-debuginfo nss nss-debuginfo nspr nspr-debuginfo glibc glibc-debuginfo
The suggested debuginfo packages were not installed at the time when the stacktraces were made. They are installed now. I have recreated the stacktraces and attached them.
Also, if you are seeing the message: ber_flush skipped because the connection was marked to be closed or abandoned
This means you are running with the CONNS error log level, which means you may have a lot of useful information in your errors log. Would you be able to provide that?
I can provide the error logs, but will need to anonymize our user data. How long a time interval do you need?
The last log message in errors log was both times: ber_flush skipped because the connection was marked to be closed or abandoned
The following versions 389ds packages were installed at the time: 389-admin-1.1.29-1.el6.x86_64 389-admin-console-1.1.8-1.el6.noarch 389-admin-console-doc-1.1.8-1.el6.noarch 389-adminutil-1.1.15-1.el6.x86_64 389-console-1.1.7-1.el6.noarch 389-ds-1.2.2-1.el6.noarch 389-ds-base-1.2.11.15-22.el6_4.x86_64 389-ds-base-libs-1.2.11.15-22.el6_4.x86_64 389-ds-console-1.2.6-1.el6.noarch 389-ds-console-doc-1.2.6-1.el6.noarch 389-dsgw-1.1.10-1.el6.x86_64
Regards, Mitja
On 17. 07. 2013 09:52, Mitja Mihelič wrote:
It may be best if I removed all 389DS related data from both of the consumer servers and start fresh. If they crash again I will send the relevant stack traces.
On 11/15/2013 02:58 AM, Mitja Mihelič wrote:
On 14. 11. 2013 22:08, Rich Megginson wrote:
On 11/14/2013 08:50 AM, Mitja Mihelič wrote:
One of the consumers has crashed again and I have attached the stacktrace. Four hours later it crashed again.
I do hope there is something in the stacktraces, so that something can be done to prevent future crashes.
Unfortunately, not enough. Looks like there is still some mismatch between the version of the package and the version of the debuginfo package.
rpm -q 389-ds-base 389-ds-base-debuginfo openldap openldap-debuginfo db4 db4-debuginfo nss nss-debuginfo nspr nspr-debuginfo glibc glibc-debuginfo
The suggested debuginfo packages were not installed at the time when the stacktraces were made. They are installed now. I have recreated the stacktraces and attached them.
The crash looks related to paged searches. We have changed this code somewhat in the next version. Can you try the latest version in the EPEL6 testing repo? 389-ds-base-1.2.11.23-3 http://port389.org/wiki/Download
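Assuming the epel-testing repo definition from epel-release is already present on the box, the update should be something like:
yum --enablerepo=epel-testing update 389-ds-base 389-ds-base-libs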
Also, if you are seeing the message: ber_flush skipped because the connection was marked to be closed or abandoned
This means you are running with the CONNS error log level, which means you may have a lot of useful information in your errors log. Would you be able to provide that?
I can provide the error logs, but will need to anonymize our user data. How long a time interval do you need?
The last log message in errors log was both times: ber_flush skipped because the connection was marked to be closed or abandoned
The following versions 389ds packages were installed at the time: 389-admin-1.1.29-1.el6.x86_64 389-admin-console-1.1.8-1.el6.noarch 389-admin-console-doc-1.1.8-1.el6.noarch 389-adminutil-1.1.15-1.el6.x86_64 389-console-1.1.7-1.el6.noarch 389-ds-1.2.2-1.el6.noarch 389-ds-base-1.2.11.15-22.el6_4.x86_64 389-ds-base-libs-1.2.11.15-22.el6_4.x86_64 389-ds-console-1.2.6-1.el6.noarch 389-ds-console-doc-1.2.6-1.el6.noarch 389-dsgw-1.1.10-1.el6.x86_64
Regards, Mitja
On 17. 07. 2013 09:52, Mitja Mihelič wrote:
It may be best if I removed all 389DS related data from both of the consumer servers and start fresh. If they crash again I will send the relevant stack traces.
On 15. 11. 2013 21:46, Rich Megginson wrote:
On 11/15/2013 02:58 AM, Mitja Mihelič wrote:
On 14. 11. 2013 22:08, Rich Megginson wrote:
On 11/14/2013 08:50 AM, Mitja Mihelič wrote:
One of the consumers has crashed again and I have attached the stacktrace. Four hours later it crashed again.
I do hope there is something in the stacktraces, so that something can be done to prevent future crashes.
Unfortunately, not enough. Looks like there is still some mismatch between the version of the package and the version of the debuginfo package.
rpm -q 389-ds-base 389-ds-base-debuginfo openldap openldap-debuginfo db4 db4-debuginfo nss nss-debuginfo nspr nspr-debuginfo glibc glibc-debuginfo
The suggested debuginfo packages were not installed at the time when the stacktraces were made. They are installed now. I have recreated the stacktraces and attached them.
The crash looks related to paged searches. We have changed this code somewhat in the next version. Can you try the latest version in the EPEL6 testing repo? 389-ds-base-1.2.11.23-3 http://port389.org/wiki/Download
Before installing packages from the testing repo, are there any other changes I could do?
Since you mentioned a relation to paged searches, perhaps this might be related to our use of SSSD on a server that is querying the 389DS. Currently it uses paging of results, as it is enabled by default, and the page size is set to 1000 results. On the 389DS side, nsslapd-sizelimit is set to 2000.
Every 5 minutes SSSD issues this search query: SRCH base="dc=TIER2,dc=COMPANY,dc=si" scope=2 filter="(&(objectClass=posixAccount)(uid=*)(uidNumber=*)(gidNumber=*))" attrs="objectClass uid userPassword uidNumber gidNumber gecos homeDirectory loginShell krbprincipalname cn modifyTimestamp modifyTimestamp shadowLastChange shadowMin shadowMax shadowWarning shadowInactive shadowExpire shadowFlag krblastpwdchange krbpasswordexpiration pwdattribute authorizedService accountexpires useraccountcontrol nsAccountLock host logindisabled loginexpirationtime loginallowedtimemap"
The first 1000 entries are returned. conn=1276 op=3 RESULT err=0 tag=101 nentries=1000 etime=34.129000 notes=U,P
Then the exact same search is issued again, and 999 are returned. conn=1276 op=4 RESULT err=4 tag=101 nentries=999 etime=1.056000 notes=U,P
err=4 is understandable, since nsslapd-sizelimit = 2000.
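For what it's worth, the same paged search can be reproduced by hand with OpenLDAP's ldapsearch and its paged-results control, roughly like this (the host and the trimmed attribute list are placeholders):
ldapsearch -x -H ldap://consumer.example.si -b "dc=TIER2,dc=COMPANY,dc=si" -E pr=1000/noprompt "(&(objectClass=posixAccount)(uid=*)(uidNumber=*)(gidNumber=*))" uid uidNumber gidNumber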
Should I disable result paging for SSSD? Perhaps even set nsslapd-sizelimit to -1? (I would not like to do this)
Regards, Mitja
Also, if you are seeing the message: ber_flush skipped because the connection was marked to be closed or abandoned
This means you are running with the CONNS error log level, which means you may have a lot of useful information in your errors log. Would you be able to provide that?
I can provide the error logs, but will need to anonymize our user data. How long a time interval do you need?
The last log message in errors log was both times: ber_flush skipped because the connection was marked to be closed or abandoned
The following versions 389ds packages were installed at the time: 389-admin-1.1.29-1.el6.x86_64 389-admin-console-1.1.8-1.el6.noarch 389-admin-console-doc-1.1.8-1.el6.noarch 389-adminutil-1.1.15-1.el6.x86_64 389-console-1.1.7-1.el6.noarch 389-ds-1.2.2-1.el6.noarch 389-ds-base-1.2.11.15-22.el6_4.x86_64 389-ds-base-libs-1.2.11.15-22.el6_4.x86_64 389-ds-console-1.2.6-1.el6.noarch 389-ds-console-doc-1.2.6-1.el6.noarch 389-dsgw-1.1.10-1.el6.x86_64
Regards, Mitja
On 17. 07. 2013 09:52, Mitja Mihelič wrote:
It may be best if I removed all 389DS related data from both of the consumer servers and start fresh. If they crash again I will send the relevant stack traces.
On 11/18/2013 07:01 AM, Mitja Mihelič wrote:
On 15. 11. 2013 21:46, Rich Megginson wrote:
On 11/15/2013 02:58 AM, Mitja Mihelič wrote:
On 14. 11. 2013 22:08, Rich Megginson wrote:
On 11/14/2013 08:50 AM, Mitja Mihelič wrote:
One of the consumers has crashed again and I have attached the stacktrace. Four hours later it crashed again.
I do hope there is something in the stacktraces, so that something can be done to prevent future crashes.
Unfortunately, not enough. Looks like there is still some mismatch between the version of the package and the version of the debuginfo package.
rpm -q 389-ds-base 389-ds-base-debuginfo openldap openldap-debuginfo db4 db4-debuginfo nss nss-debuginfo nspr nspr-debuginfo glibc glibc-debuginfo
The suggested debuginfo packages were not installed at the time when the stacktraces were made. They are installed now. I have recreated the stacktraces and attached them.
The crash looks related to paged searches. We have changed this code somewhat in the next version. Can you try the latest version in the EPEL6 testing repo? 389-ds-base-1.2.11.23-3 http://port389.org/wiki/Download
Before installing packages from the testing repo, are there any other changes I could do?
Since you mentioned a relation to paged searches, perhaps this might be related to our use of SSSD on a server that is querying the 389DS. Currently it uses paging of results, as it is enabled by default, and the page size is set to 1000 results. On the 389DS side, nsslapd-sizelimit is set to 2000.
Every 5 minutes SSSD issues this search query: SRCH base="dc=TIER2,dc=COMPANY,dc=si" scope=2 filter="(&(objectClass=posixAccount)(uid=*)(uidNumber=*)(gidNumber=*))" attrs="objectClass uid userPassword uidNumber gidNumber gecos homeDirectory loginShell krbprincipalname cn modifyTimestamp modifyTimestamp shadowLastChange shadowMin shadowMax shadowWarning shadowInactive shadowExpire shadowFlag krblastpwdchange krbpasswordexpiration pwdattribute authorizedService accountexpires useraccountcontrol nsAccountLock host logindisabled loginexpirationtime loginallowedtimemap"
The first 1000 entries are returned. conn=1276 op=3 RESULT err=0 tag=101 nentries=1000 etime=34.129000 notes=U,P
Then the exact same search is issued again, and 999 are returned. conn=1276 op=4 RESULT err=4 tag=101 nentries=999 etime=1.056000 notes=U,P
err=4 is understandable, since nsslapd-sizelimit = 2000.
Should I disable result paging for SSSD?
You could try that, yes. The problem seems related to paging.
Perhaps even set nsslapd-sizelimit to -1? (I would not like to do this)
Regards, Mitja
Also, if you are seeing the message: ber_flush skipped because the connection was marked to be closed or abandoned
This means you are running with the CONNS error log level, which means you may have a lot of useful information in your errors log. Would you be able to provide that?
I can provide the error logs, but will need to anonymize our user data. How long a time interval do you need?
The last log message in errors log was both times: ber_flush skipped because the connection was marked to be closed or abandoned
The following versions 389ds packages were installed at the time: 389-admin-1.1.29-1.el6.x86_64 389-admin-console-1.1.8-1.el6.noarch 389-admin-console-doc-1.1.8-1.el6.noarch 389-adminutil-1.1.15-1.el6.x86_64 389-console-1.1.7-1.el6.noarch 389-ds-1.2.2-1.el6.noarch 389-ds-base-1.2.11.15-22.el6_4.x86_64 389-ds-base-libs-1.2.11.15-22.el6_4.x86_64 389-ds-console-1.2.6-1.el6.noarch 389-ds-console-doc-1.2.6-1.el6.noarch 389-dsgw-1.1.10-1.el6.x86_64
Regards, Mitja
On 17. 07. 2013 09:52, Mitja Mihelič wrote:
It may be best if I removed all 389DS related data from both of the consumer servers and start fresh. If they crash again I will send the relevant stack traces.
I disabled LDAP paging in sssd.conf and let the setup run for a while. No crashes since. It does worry me, though, that some other application could crash the server by using result paging.
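For the record, the change was just the paging switch in the LDAP domain section of /etc/sssd/sssd.conf (the domain name below is a placeholder), followed by an SSSD restart:
[domain/example.si]
ldap_disable_paging = True
service sssd restart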
On 18. 11. 2013 17:05, Rich Megginson wrote:
On 11/18/2013 07:01 AM, Mitja Mihelič wrote:
On 15. 11. 2013 21:46, Rich Megginson wrote:
On 11/15/2013 02:58 AM, Mitja Mihelič wrote:
On 14. 11. 2013 22:08, Rich Megginson wrote:
On 11/14/2013 08:50 AM, Mitja Mihelič wrote:
One of the consumers has crashed again and I have attached the stacktrace. Four hours later it crashed again.
I do hope there is something in the stacktraces, so that something can be done to prevent future crashes.
Unfortunately, not enough. Looks like there is still some mismatch between the version of the package and the version of the debuginfo package.
rpm -q 389-ds-base 389-ds-base-debuginfo openldap openldap-debuginfo db4 db4-debuginfo nss nss-debuginfo nspr nspr-debuginfo glibc glibc-debuginfo
The suggested debuginfo packages were not installed at the time when the stacktraces were made. They are installed now. I have recreated the stacktraces and attached them.
The crash looks related to paged searches. We have changed this code somewhat in the next version. Can you try the latest version in the EPEL6 testing repo? 389-ds-base-1.2.11.23-3 http://port389.org/wiki/Download
Before installing packages from the testing repo, are there any other changes I could do?
Since you mentioned a relation to paged searches, perhaps this might be related to our use of SSSD on a server that is querying the 389DS. Currently it uses paging of results, as it is enabled by default, and the page size is set to 1000 results. On the 389DS side, nsslapd-sizelimit is set to 2000.
Every 5 minutes SSSD issues this search query: SRCH base="dc=TIER2,dc=COMPANY,dc=si" scope=2 filter="(&(objectClass=posixAccount)(uid=*)(uidNumber=*)(gidNumber=*))" attrs="objectClass uid userPassword uidNumber gidNumber gecos homeDirectory loginShell krbprincipalname cn modifyTimestamp modifyTimestamp shadowLastChange shadowMin shadowMax shadowWarning shadowInactive shadowExpire shadowFlag krblastpwdchange krbpasswordexpiration pwdattribute authorizedService accountexpires useraccountcontrol nsAccountLock host logindisabled loginexpirationtime loginallowedtimemap"
The first 1000 entries are returned. conn=1276 op=3 RESULT err=0 tag=101 nentries=1000 etime=34.129000 notes=U,P
Then the exact same search is issued again, and 999 are returned. conn=1276 op=4 RESULT err=4 tag=101 nentries=999 etime=1.056000 notes=U,P
err=4 is understandable, since nsslapd-sizelimit = 2000.
Should I disable result paging for SSSD?
You could try that, yes. The problem seems related to paging.
Perhaps even set nsslapd-sizelimit to -1? (I would not like to do this)
Regards, Mitja
Also, if you are seeing the message: ber_flush skipped because the connection was marked to be closed or abandoned
This means you are running with the CONNS error log level, which means you may have a lot of useful information in your errors log. Would you be able to provide that?
I can provide the error logs, but will need to anonymize our user data. How long a time interval do you need?
The last log message in errors log was both times: ber_flush skipped because the connection was marked to be closed or abandoned
The following versions 389ds packages were installed at the time: 389-admin-1.1.29-1.el6.x86_64 389-admin-console-1.1.8-1.el6.noarch 389-admin-console-doc-1.1.8-1.el6.noarch 389-adminutil-1.1.15-1.el6.x86_64 389-console-1.1.7-1.el6.noarch 389-ds-1.2.2-1.el6.noarch 389-ds-base-1.2.11.15-22.el6_4.x86_64 389-ds-base-libs-1.2.11.15-22.el6_4.x86_64 389-ds-console-1.2.6-1.el6.noarch 389-ds-console-doc-1.2.6-1.el6.noarch 389-dsgw-1.1.10-1.el6.x86_64
Regards, Mitja
On 17. 07. 2013 09:52, Mitja Mihelič wrote:
It may be best if I removed all 389DS related data from both of the consumer servers and start fresh. If they crash again I will send the relevant stack traces.
On 11/21/2013 02:23 AM, Mitja Mihelič wrote:
I disabled LDAP paging in sssd.conf and let the setup run for a while. No crashes since. It does worry me, though, that some other application could crash the server by using result paging.
I am worried too. Please file a ticket.
On 18. 11. 2013 17:05, Rich Megginson wrote:
On 11/18/2013 07:01 AM, Mitja Mihelič wrote:
On 15. 11. 2013 21:46, Rich Megginson wrote:
On 11/15/2013 02:58 AM, Mitja Mihelič wrote:
On 14. 11. 2013 22:08, Rich Megginson wrote:
On 11/14/2013 08:50 AM, Mitja Mihelič wrote:
One of the consumers has crashed again and I have attached the stacktrace. Four hours later it crashed again.
I do hope there is something in the stacktraces, so that something can be done to prevent future crashes.
Unfortunately, not enough. Looks like there is still some mismatch between the version of the package and the version of the debuginfo package.
rpm -q 389-ds-base 389-ds-base-debuginfo openldap openldap-debuginfo db4 db4-debuginfo nss nss-debuginfo nspr nspr-debuginfo glibc glibc-debuginfo
The suggested debuginfo packages were not installed at the time when the stacktraces were made. They are installed now. I have recreated the stacktraces and attached them.
The crash looks related to paged searches. We have changed this code somewhat in the next version. Can you try the latest version in the EPEL6 testing repo? 389-ds-base-1.2.11.23-3 http://port389.org/wiki/Download
Before installing packages from the testing repo, are there any other changes I could do?
Since you mentioned a relation to paged searches, perhaps this might be related to our use of SSSD on a server that is querying the 389DS. Currently it uses paging of results, as it is enabled by default, and the page size is set to 1000 results. On the 389DS side, nsslapd-sizelimit is set to 2000.
Every 5 minutes SSSD issues this search query: SRCH base="dc=TIER2,dc=COMPANY,dc=si" scope=2 filter="(&(objectClass=posixAccount)(uid=*)(uidNumber=*)(gidNumber=*))" attrs="objectClass uid userPassword uidNumber gidNumber gecos homeDirectory loginShell krbprincipalname cn modifyTimestamp modifyTimestamp shadowLastChange shadowMin shadowMax shadowWarning shadowInactive shadowExpire shadowFlag krblastpwdchange krbpasswordexpiration pwdattribute authorizedService accountexpires useraccountcontrol nsAccountLock host logindisabled loginexpirationtime loginallowedtimemap"
The first 1000 entries are returned. conn=1276 op=3 RESULT err=0 tag=101 nentries=1000 etime=34.129000 notes=U,P
Then the exact same search is issued again, and 999 are returned. conn=1276 op=4 RESULT err=4 tag=101 nentries=999 etime=1.056000 notes=U,P
err=4 is understandable, since nsslapd-sizelimit = 2000.
Should I disable result paging for SSSD?
You could try that, yes. The problem seems related to paging.
Perhaps even set nsslapd-sizelimit to -1? (I would not like to do this)
Regards, Mitja
Also, if you are seeing the message: ber_flush skipped because the connection was marked to be closed or abandoned
This means you are running with the CONNS error log level, which means you may have a lot of useful information in your errors log. Would you be able to provide that?
I can provide the error logs, but will need to anonymize our user data. How long a time interval do you need?