[389-users] problem initializing replica

Thu Aug 16 14:19:17 UTC 2012

On 08/15/2012 11:22 PM, Vladimir Elisseev wrote:
> Rich,
>
> I don't have RHDS (389-ds-base-1.2.10.2-18.el6_3.x86_64) servers
> available for tests at the moment and I've tried to reproduce this issue
> on CentOS 6.3 (389-ds-base-1.2.9.14-1.el6.x86_64), but everything works
> fine without modifying nsslapd-maxbersize... I've checked the
> nsslapd-maxbersize value and it is "0" for CentOS as well as for RHDS. What
> does that mean?

I don't know, but the error message seems to indicate that the
incoming BER message was too big, so you need to increase
nsslapd-maxbersize.
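
For anyone following along, here is a minimal sketch (not a tested
procedure) of the two steps discussed in this thread: spotting the
connection that was closed with error 34 in the access log, and building
the LDIF that would raise nsslapd-maxbersize in cn=config. The function
names and the 10 MB value are hypothetical and chosen only for
illustration; pick a size that fits your largest entry and feed the LDIF
to ldapmodify yourself.

```python
import re

# Matches access-log lines such as:
# [...] conn=536 op=-1 fd=67 closed error 34 (Numerical result out of range) - B2
CLOSED_ERR = re.compile(r"conn=(\d+) op=-?\d+ fd=\d+ closed error (\d+)")


def connections_closed_with_error(lines, code=34):
    """Return the conn numbers whose connection was closed with the given error code."""
    found = []
    for line in lines:
        m = CLOSED_ERR.search(line)
        if m and int(m.group(2)) == code:
            found.append(int(m.group(1)))
    return found


def maxbersize_ldif(size_bytes):
    """Build LDIF text that replaces nsslapd-maxbersize in cn=config."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return (
        "dn: cn=config\n"
        "changetype: modify\n"
        "replace: nsslapd-maxbersize\n"
        f"nsslapd-maxbersize: {size_bytes}\n"
    )


if __name__ == "__main__":
    sample = [
        "[14/Aug/2012:13:26:04 +0200] conn=536 op=116 RESULT err=0 tag=120 nentries=0 etime=0",
        "[14/Aug/2012:13:26:04 +0200] conn=536 op=-1 fd=67 closed error 34 "
        "(Numerical result out of range) - B2",
    ]
    print(connections_closed_with_error(sample))
    # 10 MB is an arbitrary illustrative value, not a recommendation.
    print(maxbersize_ldif(10 * 1024 * 1024))
```

The change would need to be applied on the server that receives the
large entries, that is, the consumer being initialized.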

>
> Regards,
> Vlad.
>
> On Wed, 2012-08-15 at 09:04 -0600, Rich Megginson wrote:
>> On 08/15/2012 09:02 AM, Vladimir Elisseev wrote:
>>> Rich,
>>>
>>> I think this could be the case, thanks! It also explains why initializing
>>> the replica from an LDIF file succeeded. I've saved one of the entries
>>> with a lot of "member" attributes, and the size of just this entry is
>>> over 1 MB, but there may be bigger ones... I'll be able to test it only
>>> tomorrow. BTW, could it affect normal replication as well?
>> Yes.  You should increase the nsslapd-maxbersize
>>
>>> Regards,
>>> Vlad.
>>>
>>>
>>>
>>> On Wed, 2012-08-15 at 08:36 -0600, Rich Megginson wrote:
>>>> On 08/14/2012 03:10 PM, Vladimir Elisseev wrote:
>>>>> Rich,
>>>>>
>>>>> First of all, thanks for helping me. Below are the error log and the
>>>>> corresponding access log entries. The strange thing is that I've
>>>>> done the same for "o=netscaperoot" successfully, but for user data it fails.
>>>>>
>>>>> error log
>>>>> -----------------------------------------------------------------------------------------------------------------
>>>>> [14/Aug/2012:13:26:04 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=cids is going offline; disabling replication
>>>>> [14/Aug/2012:13:26:04 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
>>>>> [14/Aug/2012:13:26:04 +0200] - ERROR bulk import abandoned
>>>>> -----------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> access log
>>>>> -----------------------------------------------------------------------------------------------------------------
>>>>> [14/Aug/2012:13:26:01 +0200] conn=534 op=-1 fd=69 closed - Encountered end of file.
>>>>> [14/Aug/2012:13:26:01 +0200] conn=525 op=18 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
>>>>> [14/Aug/2012:13:26:01 +0200] conn=525 op=18 RESULT err=0 tag=120 nentries=0 etime=0
>>>>> [14/Aug/2012:13:26:01 +0200] conn=525 op=19 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
>>>>> [14/Aug/2012:13:26:01 +0200] conn=525 op=19 RESULT err=0 tag=120 nentries=0 etime=0
>>>>> [14/Aug/2012:13:26:03 +0200] conn=535 fd=69 slot=69 SSL connection from 10.233.128.3 to 10.233.128.217
>>>>> [14/Aug/2012:13:26:03 +0200] conn=535 op=-1 fd=69 closed - Encountered end of file.
>>>>> [14/Aug/2012:13:26:03 +0200] conn=520 op=21 UNBIND
>>>>> [14/Aug/2012:13:26:03 +0200] conn=520 op=21 fd=67 closed - U1
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 fd=67 slot=67 SSL connection from 10.233.128.216 to 10.233.128.217
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 SSL 256-bit AES
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 op=0 BIND dn="cn=replication manager,cn=config" method=128 version=3
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=replication manager,cn=config"
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 op=1 SRCH base="" scope=0 filter="(objectClass=*)" attrs="supportedControl supportedExtension"
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 op=1 RESULT err=0 tag=101 nentries=1 etime=0
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 op=2 SRCH base="" scope=0 filter="(objectClass=*)" attrs="supportedControl supportedExtension"
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 op=2 RESULT err=0 tag=101 nentries=1 etime=0
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 op=3 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 op=3 RESULT err=0 tag=120 nentries=0 etime=0
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 op=69 RESULT err=0 tag=120 nentries=0 etime=0
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 op=70 EXT oid="2.16.840.1.113730.3.5.6" name="Netscape Replication Total Update Entry"
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 op=70 RESULT err=0 tag=120 nentries=0 etime=0
>>>>> -----------------------------------------------------------------------------------------------------------------
>>>>> a lot of the same entries as these two above
>>>>> -----------------------------------------------------------------------------------------------------------------
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 op=71 EXT oid="2.16.840.1.113730.3.5.6" name="Netscape Replication Total Update Entry"
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 op=116 EXT oid="2.16.840.1.113730.3.5.6" name="Netscape Replication Total Update Entry"
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 op=116 RESULT err=0 tag=120 nentries=0 etime=0
>>>>> [14/Aug/2012:13:26:04 +0200] conn=536 op=-1 fd=67 closed error 34 (Numerical result out of range) - B2
>>>> Hmm - this is very interesting - err 34 - this means the directory
>>>> server received a packet that was too big.  What sorts of entries are
>>>> you replicating?  Do you have very large group entries?  Try increasing
>>>> the nsslapd-maxbersize in cn=config.
>>>>
>>>>> -----------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> Regards,
>>>>> Vlad
>>>>>
>>>>> On Tue, 2012-08-14 at 14:24 -0600, Rich Megginson wrote:
>>>>>> On 08/14/2012 11:26 AM, Vladimir Elisseev wrote:
>>>>>>> Version of 389-ds-base is 1.2.10.2.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Vlad.
>>>>>>>
>>>>>>> On Tue, 2012-08-14 at 11:21 -0600, Rich Megginson wrote:
>>>>>>>> On 08/14/2012 10:50 AM, Vladimir Elisseev wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I have problems initializing a replica from the Admin console or using
>>>>>>>>> ldapmodify, although I'm able to initialize the replica from an LDIF
>>>>>>>>> file successfully. Below is a snip from the error log:
>>>>>>>>>
>>>>>>>>> ************* snip start *****************
>>>>>>>>> [14/Aug/2012:15:09:04 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=cids is going offline; disabling replication
>>>>>>>>> [14/Aug/2012:15:09:04 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
>>>>>>>>> [14/Aug/2012:15:09:04 +0200] - ERROR bulk import abandoned
>>>>>> Can you paste excerpts of your access log from around this time?  And
>>>>>> also - when was the last EXT operation from the access log before
>>>>>> 14/Aug/2012:15:09:04 +0200?
>>>>>>
>>>>>>>>> [14/Aug/2012:15:09:04 +0200] - import userRoot: Aborting all Import threads...
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - import userRoot: Import threads aborted.
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - import userRoot: Closing files...
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - libdb: userRoot/cIDSMemberOf.db4: unable to flush: No such file or directory
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - libdb: userRoot/mail.db4: unable to flush: No such file or directory
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - libdb: userRoot/nsuniqueid.db4: unable to flush: No such file or directory
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - libdb: userRoot/id2entry.db4: unable to flush: No such file or directory
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - libdb: userRoot/sn.db4: unable to flush: No such file or directory
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - libdb: userRoot/objectclass.db4: unable to flush: No such file or directory
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - libdb: userRoot/ou.db4: unable to flush: No such file or directory
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - libdb: userRoot/aci.db4: unable to flush: No such file or directory
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - libdb: userRoot/cIDSEntityID.db4: unable to flush: No such file or directory
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - libdb: userRoot/cn.db4: unable to flush: No such file or directory
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - libdb: userRoot/entryrdn.db4: unable to flush: No such file or directory
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - libdb: userRoot/member.db4: unable to flush: No such file or directory
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - libdb: userRoot/telephoneNumber.db4: unable to flush: No such file or directory
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - libdb: userRoot/parentid.db4: unable to flush: No such file or directory
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - import userRoot: Import failed.
>>>>>>>>> [14/Aug/2012:15:09:11 +0200] - process_bulk_import_op: NULL target sdn
>>>>>>>>> ************* snip end *****************
>>>>>>>>>
>>>>>>>>> These databases are for custom indexes, but I have no clue why they
>>>>>>>>> aren't created automatically (all the indexes as well as the custom
>>>>>>>>> schema were defined before the initialization). I'd greatly appreciate
>>>>>>>>> any help/thoughts.
>>>>>> They weren't created because it appears the replica init (bulk import)
>>>>>> was aborted before it could be started.  I'd like to find out why that
>>>>>> happened.
>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>> Vlad.
>>>>>>>> What are your supplier and consumer platforms?  What versions of
>>>>>>>> 389-ds-base?
>>>>>>>>> --
>>>>>>>>> 389 users mailing list
>>>>>>>>> 389-users at lists.fedoraproject.org
>>>>>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users