[389-users] replication from 1.2.8.3 to 1.2.10.4

Robert Viduya robert at oit.gatech.edu
Fri Jul 13 14:02:10 UTC 2012


>> I've enabled the core dump stuff, but now I can't seem to get it to crash.  But I'm still getting the changelog messages in the error logs whenever I restart.  In addition, the hub server keeps running out of disk space.  I tracked it down to the access log filling up with MOD messages from replication.  It looks like changes are coming down from our 1.2.8 servers and being applied over and over again.  As an example, one of our entries was modified three times today, and on all our other machines I see the following in the access log file:
>> 
>> # egrep 78b8cc871a3cda9f352580e797b270bc access
>> [12/Jul/2012:11:00:59 -0400] conn=383671 op=3145 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:11:01:24 -0400] conn=383671 op=3153 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:11:01:38 -0400] conn=383671 op=3157 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> 
>> But on the problematic hub server, I see:
>> 
>> # egrep 78b8cc871a3cda9f352580e797b270bc access
>> [12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> [12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
>> ...
>> 
>> I truncated the output for brevity, but there's over 250 MODs to that one object.  It's as if the server isn't able to do the replication bookkeeping and is accepting changes over and over again.  Eventually the disk fills up.
> Do you see error messages from the supplier suggesting that it is attempting to send the operation but failing and retrying?

No, there's nothing in the error logs on the supplier side.

> Do all of these operations have the same CSN?  The csn will be logged with the RESULT line for the operation.  Also, what is the err=? for the MOD operations?  err=0?  Some other code?

Here's some sample out, again, limited for brevity.  Most of the RESULT lines don't have a CSN, just the first few.  All the err= codes are 0.  I've grepped out just the DN sample from my previous mail, again for brevity.  There's a lot more DNs being reported:

[12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=58 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff2000000000330000
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff200f000000330000
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff2027000000330000
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=170 MOD dn="gtdirguid=64898416edc9887656a2f933ae48a113,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=170 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b5000300330000
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=172 MOD dn="gtdirguid=e824607afc4eb02a105b633bcbf9e7c1,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=172 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b6000100330000
[12/Jul/2012:16:03:44 -0400] conn=3 op=172 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
[12/Jul/2012:16:03:44 -0400] conn=3 op=172 RESULT err=0 tag=120 nentries=0 etime=0
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=173 MOD dn="gtdirguid=427dd677597bb6143e227143e771b811,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=173 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b6000200330000
[12/Jul/2012:16:03:47 -0400] conn=3 op=173 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[12/Jul/2012:16:03:47 -0400] conn=3 op=173 RESULT err=0 tag=120 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 RESULT err=0 tag=103 nentries=0 etime=0

The upgrade to 1.2.10.12 seems to have fixed the issue however, I'm not seeing these repeated entries anymore nor am I seeing changelog error messages when I restart the server.  I know you're all working on 1.2.11, but are there any major problems with 1.2.10.12 that's keeping it from being pushed to stable?  1.2.10.4 definitely isn't working for us.


More information about the 389-users mailing list