Hello all,
We're running into this error message on a full update between two dirsrvs of version:
389 Project 389-Directory/1.2.6 B2010.238.2133
The error message is:
[26/Sep/2010:15:03:35 -0400] - sasl_io_start_packet: failed - read only 3 bytes of sasl packet length on connection 4
According to the source code:
/*
 * NOTE: A better way to do this would be to read the bytes and add them to
 * sp->encrypted_buffer - if offset < 4, tell caller we didn't read enough
 * bytes yet - if offset >= 4, decode the length and proceed.  However, it
 * is highly unlikely that a request to read 4 bytes will return < 4 bytes,
 * perhaps only in error conditions, in which case the ret < 0 case above
 * will run
 */
Uh. Maybe our network is strange, maybe we've run into a different error condition, but this seems quite poor...
Cheers, Edward
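For reference, the approach the source comment describes (accumulate bytes until all 4 bytes of the length prefix have arrived, then decode) might look roughly like the sketch below. This is not the 389-ds code: `packet_len_state` and `feed_length_bytes` are hypothetical names made up for illustration, and the only assumption carried over is that the SASL packet length is a 4-byte big-endian (network byte order) integer. A read on a TCP stream may legitimately return fewer bytes than requested, so the sketch treats a short read as "need more data" rather than as a failure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-connection state; NOT the real 389-ds structure. */
typedef struct {
    unsigned char len_buf[4]; /* length-prefix bytes received so far */
    size_t offset;            /* how many of the 4 bytes we hold */
} packet_len_state;

/*
 * Feed newly received bytes into the state.
 * Returns 1 and stores the decoded length in *out_len once all 4 prefix
 * bytes have arrived; returns 0 if more bytes are still needed.
 * *consumed is set to the number of input bytes used for the prefix, so
 * the caller knows where the packet body starts in its input.
 */
static int feed_length_bytes(packet_len_state *st,
                             const unsigned char *data, size_t n,
                             size_t *consumed, uint32_t *out_len)
{
    size_t want = sizeof(st->len_buf) - st->offset;
    size_t take = n < want ? n : want;

    if (take > 0) {
        memcpy(st->len_buf + st->offset, data, take);
        st->offset += take;
    }
    *consumed = take;

    if (st->offset < sizeof(st->len_buf)) {
        return 0; /* short read: keep what we have and wait for more */
    }

    /* Decode the 4-byte network-byte-order length. */
    *out_len = ((uint32_t)st->len_buf[0] << 24) |
               ((uint32_t)st->len_buf[1] << 16) |
               ((uint32_t)st->len_buf[2] << 8)  |
                (uint32_t)st->len_buf[3];
    return 1;
}
```

With this shape, a read that returns only 3 bytes (as in the log line above) leaves `offset` at 3, and the next read supplies the missing byte instead of aborting the connection.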
We also have some old servers lying around, and we've seen some interesting differences doing replication between them and the new servers (I didn't report them initially because I assume a heterogeneous LDAP master deployment is not... really supported.) But this information might be helpful for debugging.
New version is: 1.2.6 B2010.238.2133, old version is: 1.2.5 B2010.012.2024
* Incremental replication from new to old fails, with
[23/Sep/2010:18:42:57 -0400] NSMMReplicationPlugin - agmt="cn=GSSAPI Replication to real-mccoy.mit.edu" (real-mccoy:389): Unable to parse the response to the startReplication extended operation. Replication is aborting.
* Incremental replication from old to new, old to old and new to new succeeds
* Total update from old to new and old to old succeeds
* Total update from new to new fails with the aforementioned bug
* Total update from new to old is untested
If you would like us to rebuild fedora-389-ds with the corresponding source patched to read properly, we can do that.
Edward
Edward Z. Yang wrote:
We also have some old servers lying around, and we've seen some interesting differences doing replication between them and the new servers (I didn't report them initially because I assume a heterogeneous LDAP master deployment is not... really supported.) But this information might be helpful for debugging.
New version is: 1.2.6 B2010.238.2133, old version is: 1.2.5 B2010.012.2024
- Incremental replication from new to old fails, with
[23/Sep/2010:18:42:57 -0400] NSMMReplicationPlugin - agmt="cn=GSSAPI Replication to real-mccoy.mit.edu" (real-mccoy:389): Unable to parse the response to the startReplication extended operation. Replication is aborting.
Does this happen in conjunction with the sasl_io error? If not, how often does it happen? Is it easily reproducible? Does it appear to be the same as https://bugzilla.redhat.com/show_bug.cgi?id=547503 ? Are you using a firewall (hardware or software)?
-- 389 users mailing list 389-users@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/389-users
Excerpts from Rich Megginson's message of Mon Sep 27 10:35:44 -0400 2010:
Does this happen in conjunction with the sasl_io error?
I don't see any sasl_io errors. In fact, that seems like the only relevant log entry that shows up.
If not, how often does it happen?
It was happening fairly regularly yesterday, but it seems to have recovered today. We’ll keep an eye on it.
Is it easily reproducible?
Not obviously.
Does it appear to be the same as https://bugzilla.redhat.com/show_bug.cgi?id=547503 ?
That's a very complicated bug, but it appears to have a different character than ours.
Are you using a firewall (hardware or software)?
We turn off Fedora's built-in firewall.
Don't sweat too much about this one: it seems intermittent and it involves an old version of dirsrv that we won't be using shortly.
Cheers, Edward
Edward Z. Yang wrote:
Don't sweat too much about this one: it seems intermittent and it involves an old version of dirsrv that we won't be using shortly.
But you said you could reproduce on a 1.2.6 master and 1.2.6 consumer?
Excerpts from Rich Megginson's message of Mon Sep 27 11:37:27 -0400 2010:
But you said you could reproduce on a 1.2.6 master and 1.2.6 consumer?
That's a different, more serious bug, the full update one. [1] I assumed you were referring to the incremental update across 1.2.5-1.2.6.
Cheers, Edward
Excerpts from Rich Megginson's message of Mon Sep 27 10:35:44 -0400 2010:
Does this happen in conjunction with the sasl_io error? If not, how often does it happen? Is it easily reproducible? Does it appear to be the same as https://bugzilla.redhat.com/show_bug.cgi?id=547503 ? Are you using a firewall (hardware or software)?
I see it happening again (the new dirsrv -> old dirsrv parse error problem.) It seems to occur immediately after a full update, maybe? Any data you'd like me to collect?
Edward
Edward Z. Yang wrote:
I see it happening again (the new dirsrv -> old dirsrv parse error problem.) It seems to occur immediately after a full update, maybe? Any data you'd like me to collect?
Does it happen if the 1.2.6 server is the consumer? The problem I fixed is consumer related.
Edward
Edward Z. Yang wrote:
Excerpts from Rich Megginson's message of Tue Sep 28 15:31:55 -0400 2010:
Does it happen if the 1.2.6 server is the consumer? The problem I fixed is consumer related.
Yep. (Yeah, the arrows are ambiguous.)
Edward
Can you turn on Connection management logging to the error log? http://directory.fedoraproject.org/wiki/FAQ#Troubleshooting On the consumer, then reproduce the problem.
Edward Z. Yang wrote:
Uh. Maybe our network is strange, maybe we've run into a different error condition, but this seems quite poor...
Ok. Please file a bug for this issue. Are you using a load balancer or failover device?
Excerpts from Rich Megginson's message of Mon Sep 27 10:31:53 -0400 2010:
Ok. Please file a bug for this issue.
Will do.
Are you using a load balancer or failover device?
We are using LVS to load balance HTTP requests, but LDAP is not load balanced. However, we have an internal backend network which we use to service LDAP replication. Some characteristic configuration:
eth1      Link encap:Ethernet  HWaddr AC:DE:48:00:20:02
          inet addr:172.21.0.228  Bcast:172.21.255.255  Mask:255.255.0.0
          inet6 addr: fe80::aede:48ff:fe00:2002/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:82813649 errors:0 dropped:0 overruns:0 frame:0
          TX packets:63692738 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:78380787428 (72.9 GiB)  TX bytes:8426941129 (7.8 GiB)
          Interrupt:15
and
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
18.181.0.57     172.21.0.57     255.255.255.255 UGH   0      0        0 eth1
18.181.0.56     172.21.0.56     255.255.255.255 UGH   0      0        0 eth1
18.181.0.237    172.21.0.237    255.255.255.255 UGH   0      0        0 eth1
18.181.0.235    172.21.0.235    255.255.255.255 UGH   0      0        0 eth1
18.181.0.234    172.21.0.234    255.255.255.255 UGH   0      0        0 eth1
18.181.0.47     172.21.0.47     255.255.255.255 UGH   0      0        0 eth1
18.181.0.167    172.21.0.167    255.255.255.255 UGH   0      0        0 eth1
18.181.0.228    172.21.0.228    255.255.255.255 UGH   0      0        0 eth1
18.181.0.53     172.21.0.53     255.255.255.255 UGH   0      0        0 eth1
18.181.0.52     172.21.0.52     255.255.255.255 UGH   0      0        0 eth1
18.181.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 eth1
172.21.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth1
0.0.0.0         18.181.0.1      0.0.0.0         UG    0      0        0 eth0
So requests that are transmitted to, say, 'better-mousetrap.mit.edu' are routed over eth1 to gateway 172.21.0.57, which is better-mousetrap itself.
Cheers, Edward
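To make the routing behaviour described above concrete, here is a small sketch of the most-specific-match rule that selects the host route over the 18.181.0.0/16 network route. It is a simplified model, not the kernel's implementation: it ignores metrics, copies only four rows of the table, and assumes that better-mousetrap.mit.edu resolves to 18.181.0.57 (inferred from the matching host route, not stated in the thread). The `route` struct, `IP` macro, and `lookup` function are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack a dotted quad into a host-order 32-bit value. */
#define IP(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                        ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical, simplified route entry (no metric, no flags). */
typedef struct {
    uint32_t dest;
    uint32_t mask;
    uint32_t gateway;   /* 0 means directly connected, no gateway */
    const char *iface;
} route;

/* A subset of the routing table shown above. */
static const route table[] = {
    { IP(18, 181, 0, 57), IP(255, 255, 255, 255), IP(172, 21, 0, 57), "eth1" },
    { IP(18, 181, 0, 0),  IP(255, 255, 0, 0),     0,                  "eth0" },
    { IP(172, 21, 0, 0),  IP(255, 255, 0, 0),     0,                  "eth1" },
    { IP(0, 0, 0, 0),     IP(0, 0, 0, 0),         IP(18, 181, 0, 1),  "eth0" },
};

/*
 * Return the matching route with the longest netmask, or NULL if nothing
 * matches. For contiguous netmasks, a longer prefix is a numerically
 * larger mask, so comparing the mask values directly is sufficient.
 */
static const route *lookup(uint32_t addr)
{
    const route *best = NULL;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        const route *r = &table[i];
        if ((addr & r->mask) != r->dest) {
            continue;
        }
        if (best == NULL || r->mask > best->mask) {
            best = r;
        }
    }
    return best;
}
```

Under these assumptions, a lookup for 18.181.0.57 picks the host route via 172.21.0.57 on eth1, while another 18.181.x.x address without a host route falls through to the directly connected eth0 network route.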
Posted: https://bugzilla.redhat.com/show_bug.cgi?id=637852