Attached patch rewrites almost entire memberof plugin. It heavily utilizes hash tables instead of lists and arrays and it introduces concept of reference counting which should heavily optimize all operations when no loops are present in the user/group tree.
I tested the patch by running sysdb test suite, all tests passed. I also attach a document where basic concepts of the plugin are explained.
Please note that the patch is functional, although I don't consider it ready. I just want to get pre-ACK or some comments about the patch design. I have yet to implement the recompute task, I will work on that later.
Jan
On Tue, 2011-08-30 at 17:54 +0200, Jan Zelený wrote:
Attached patch rewrites almost entire memberof plugin. It heavily utilizes hash tables instead of lists and arrays and it introduces concept of reference counting which should heavily optimize all operations when no loops are present in the user/group tree.
I tested the patch by running sysdb test suite, all tests passed. I also attach a document where basic concepts of the plugin are explained.
Please note that the patch is functional, although I don't consider it ready. I just want to get pre-ACK or some comments about the patch design. I have yet to implement the recompute task, I will work on that later.
Jan, yesterday I an Stephen got together to start a review of the patch. Due to the complexity we tried to go thorugh the code together to make sure we understand what is going on.
Also due to the complexity and need to refresh memory we were only be able to go through most of the "add" code only, we were not able to review the update or delete code paths yet.
Attached find a diff file with comments in C++ style (so that it is clear they all need to be resolved before the patch can be considered ok, as they show as a big red line in our editors) that you can integrate in your tree and check one by one.
There are some questions in there, but in general we noticed a very annoying lack of comments in the code itself. The accompaning PDF is nice and all, but unless the logic is embedded and explained in the code (like it was with the previous code) it will be ablocker and will make review more difficult.
That said, so far so good, although we need to more carefully review the refcounting stuff, we got to it late and only marginally reviewed it. Need more time to fully grasp all details.
HTH, Simo.
On Thu, 2011-09-15 at 10:43 -0400, Simo Sorce wrote:
On Tue, 2011-08-30 at 17:54 +0200, Jan Zelený wrote:
Attached patch rewrites almost entire memberof plugin. It heavily utilizes hash tables instead of lists and arrays and it introduces concept of reference counting which should heavily optimize all operations when no loops are present in the user/group tree.
I tested the patch by running sysdb test suite, all tests passed. I also attach a document where basic concepts of the plugin are explained.
Please note that the patch is functional, although I don't consider it ready. I just want to get pre-ACK or some comments about the patch design. I have yet to implement the recompute task, I will work on that later.
Jan, yesterday I an Stephen got together to start a review of the patch. Due to the complexity we tried to go thorugh the code together to make sure we understand what is going on.
Also due to the complexity and need to refresh memory we were only be able to go through most of the "add" code only, we were not able to review the update or delete code paths yet.
Attached find a diff file with comments in C++ style (so that it is clear they all need to be resolved before the patch can be considered ok, as they show as a big red line in our editors) that you can integrate in your tree and check one by one.
There are some questions in there, but in general we noticed a very annoying lack of comments in the code itself. The accompaning PDF is nice and all, but unless the logic is embedded and explained in the code (like it was with the previous code) it will be ablocker and will make review more difficult.
That said, so far so good, although we need to more carefully review the refcounting stuff, we got to it late and only marginally reviewed it. Need more time to fully grasp all details.
There was one other thing that we noticed that Simo forgot to mention. We realized that the original design was flawed in one major way: we should not handle "missing" members at all. If we encounter one, we should consider it a critical error and abort.
The original reasoning was that we were using the memberOf plugin to handle unknown members, but as the SSSD evolved, we're now handling this properly in the data providers and making sure the membership data is ready before committing it. We now have a three-pass mechanism of saving data in the SSSD: 1) Save users 2) Save all groups (empty) 3) Add members to groups
Doing it this way, we no longer need the checks for missing members in the memberOf plugin, which should simplify implementation a moderate amount.
On Thu, 2011-09-15 at 10:53 -0400, Stephen Gallagher wrote:
On Thu, 2011-09-15 at 10:43 -0400, Simo Sorce wrote:
On Tue, 2011-08-30 at 17:54 +0200, Jan Zelený wrote:
Attached patch rewrites almost entire memberof plugin. It heavily utilizes hash tables instead of lists and arrays and it introduces concept of reference counting which should heavily optimize all operations when no loops are present in the user/group tree.
I tested the patch by running sysdb test suite, all tests passed. I also attach a document where basic concepts of the plugin are explained.
Please note that the patch is functional, although I don't consider it ready. I just want to get pre-ACK or some comments about the patch design. I have yet to implement the recompute task, I will work on that later.
Jan, yesterday I an Stephen got together to start a review of the patch. Due to the complexity we tried to go thorugh the code together to make sure we understand what is going on.
Also due to the complexity and need to refresh memory we were only be able to go through most of the "add" code only, we were not able to review the update or delete code paths yet.
Attached find a diff file with comments in C++ style (so that it is clear they all need to be resolved before the patch can be considered ok, as they show as a big red line in our editors) that you can integrate in your tree and check one by one.
There are some questions in there, but in general we noticed a very annoying lack of comments in the code itself. The accompaning PDF is nice and all, but unless the logic is embedded and explained in the code (like it was with the previous code) it will be ablocker and will make review more difficult.
That said, so far so good, although we need to more carefully review the refcounting stuff, we got to it late and only marginally reviewed it. Need more time to fully grasp all details.
There was one other thing that we noticed that Simo forgot to mention. We realized that the original design was flawed in one major way: we should not handle "missing" members at all. If we encounter one, we should consider it a critical error and abort.
The original reasoning was that we were using the memberOf plugin to handle unknown members, but as the SSSD evolved, we're now handling this properly in the data providers and making sure the membership data is ready before committing it. We now have a three-pass mechanism of saving data in the SSSD:
- Save users
- Save all groups (empty)
- Add members to groups
Doing it this way, we no longer need the checks for missing members in the memberOf plugin, which should simplify implementation a moderate amount.
I think we added a comment about this in the diff, but yeah a proper explanation was missing (pun :). Thanks for the writeup :)
Simo.
On Thu, 2011-09-15 at 10:43 -0400, Simo Sorce wrote:
On Tue, 2011-08-30 at 17:54 +0200, Jan Zelený wrote:
Attached patch rewrites almost entire memberof plugin. It heavily utilizes hash tables instead of lists and arrays and it introduces concept of reference counting which should heavily optimize all operations when no loops are present in the user/group tree.
I tested the patch by running sysdb test suite, all tests passed. I also attach a document where basic concepts of the plugin are explained.
Please note that the patch is functional, although I don't consider it ready. I just want to get pre-ACK or some comments about the patch design. I have yet to implement the recompute task, I will work on that later.
Jan, yesterday I an Stephen got together to start a review of the patch. Due to the complexity we tried to go thorugh the code together to make sure we understand what is going on.
Also due to the complexity and need to refresh memory we were only be able to go through most of the "add" code only, we were not able to review the update or delete code paths yet.
Attached find a diff file with comments in C++ style (so that it is clear they all need to be resolved before the patch can be considered ok, as they show as a big red line in our editors) that you can integrate in your tree and check one by one.
There are some questions in there, but in general we noticed a very annoying lack of comments in the code itself. The accompaning PDF is nice and all, but unless the logic is embedded and explained in the code (like it was with the previous code) it will be ablocker and will make review more difficult.
That said, so far so good, although we need to more carefully review the refcounting stuff, we got to it late and only marginally reviewed it. Need more time to fully grasp all details.
There was one other thing that we noticed that Simo forgot to mention. We realized that the original design was flawed in one major way: we should not handle "missing" members at all. If we encounter one, we should consider it a critical error and abort.
The original reasoning was that we were using the memberOf plugin to handle unknown members, but as the SSSD evolved, we're now handling this properly in the data providers and making sure the membership data is ready before committing it. We now have a three-pass mechanism of saving data in the SSSD:
- Save users
- Save all groups (empty)
- Add members to groups
Doing it this way, we no longer need the checks for missing members in the memberOf plugin, which should simplify implementation a moderate amount.
Yes, I suspected that, but I left it there since it was in the original code. Thanks for noticing this, I will certainly modify the behavior. Just to make sure - I should test if the member object exists but if it doesn't, I should just exit with an error, right?
Thanks Jan
On Thu, 2011-09-15 at 17:14 +0200, Jan Zelený wrote:
On Thu, 2011-09-15 at 10:43 -0400, Simo Sorce wrote:
On Tue, 2011-08-30 at 17:54 +0200, Jan Zelený wrote:
Attached patch rewrites almost entire memberof plugin. It heavily utilizes hash tables instead of lists and arrays and it introduces concept of reference counting which should heavily optimize all operations when no loops are present in the user/group tree.
I tested the patch by running sysdb test suite, all tests passed. I also attach a document where basic concepts of the plugin are explained.
Please note that the patch is functional, although I don't consider it ready. I just want to get pre-ACK or some comments about the patch design. I have yet to implement the recompute task, I will work on that later.
Jan, yesterday I an Stephen got together to start a review of the patch. Due to the complexity we tried to go thorugh the code together to make sure we understand what is going on.
Also due to the complexity and need to refresh memory we were only be able to go through most of the "add" code only, we were not able to review the update or delete code paths yet.
Attached find a diff file with comments in C++ style (so that it is clear they all need to be resolved before the patch can be considered ok, as they show as a big red line in our editors) that you can integrate in your tree and check one by one.
There are some questions in there, but in general we noticed a very annoying lack of comments in the code itself. The accompaning PDF is nice and all, but unless the logic is embedded and explained in the code (like it was with the previous code) it will be ablocker and will make review more difficult.
That said, so far so good, although we need to more carefully review the refcounting stuff, we got to it late and only marginally reviewed it. Need more time to fully grasp all details.
There was one other thing that we noticed that Simo forgot to mention. We realized that the original design was flawed in one major way: we should not handle "missing" members at all. If we encounter one, we should consider it a critical error and abort.
The original reasoning was that we were using the memberOf plugin to handle unknown members, but as the SSSD evolved, we're now handling this properly in the data providers and making sure the membership data is ready before committing it. We now have a three-pass mechanism of saving data in the SSSD:
- Save users
- Save all groups (empty)
- Add members to groups
Doing it this way, we no longer need the checks for missing members in the memberOf plugin, which should simplify implementation a moderate amount.
Yes, I suspected that, but I left it there since it was in the original code. Thanks for noticing this, I will certainly modify the behavior. Just to make sure - I should test if the member object exists but if it doesn't, I should just exit with an error, right?
Yes, exactly. We feel that it should be the responsibility of the consumer of the sysdb to ensure that the objects are in place before attempting to link them together.
On Tue, 2011-08-30 at 17:54 +0200, Jan Zelený wrote:
Attached patch rewrites almost entire memberof plugin. It heavily utilizes hash tables instead of lists and arrays and it introduces concept of reference counting which should heavily optimize all operations when no loops are present in the user/group tree.
I tested the patch by running sysdb test suite, all tests passed. I also attach a document where basic concepts of the plugin are explained.
Please note that the patch is functional, although I don't consider it ready. I just want to get pre-ACK or some comments about the patch design. I have yet to implement the recompute task, I will work on that later.
Jan, yesterday I an Stephen got together to start a review of the patch. Due to the complexity we tried to go thorugh the code together to make sure we understand what is going on.
Also due to the complexity and need to refresh memory we were only be able to go through most of the "add" code only, we were not able to review the update or delete code paths yet.
Attached find a diff file with comments in C++ style (so that it is clear they all need to be resolved before the patch can be considered ok, as they show as a big red line in our editors) that you can integrate in your tree and check one by one.
There are some questions in there, but in general we noticed a very annoying lack of comments in the code itself. The accompaning PDF is nice and all, but unless the logic is embedded and explained in the code (like it was with the previous code) it will be ablocker and will make review more difficult.
That said, so far so good, although we need to more carefully review the refcounting stuff, we got to it late and only marginally reviewed it. Need more time to fully grasp all details.
HTH, Simo.
Simo, first I'd like to thank you both for the review. I realize it is quite difficult, I wouldn't want to do it myself.
I will look at the diff you wrote about ASAP. That said, I already have a modification which resolves some pretty big issues which I somehow managed to miss at first. It also modifies the full update code path so it can be used by the recompute task. I will certainly add some more comments, up until now I've been focusing on the code itself. Hopefully I'll have the patch complete by the end of the week (i.e. tomorrow) and I'll send it. If you wish, we can schedule a phone call to discuss some places which are the most difficult to comprehend.
Thanks Jan
On Thu, 2011-09-15 at 17:00 +0200, Jan Zelený wrote:
On Tue, 2011-08-30 at 17:54 +0200, Jan Zelený wrote:
Attached patch rewrites almost entire memberof plugin. It heavily utilizes hash tables instead of lists and arrays and it introduces concept of reference counting which should heavily optimize all operations when no loops are present in the user/group tree.
I tested the patch by running sysdb test suite, all tests passed. I also attach a document where basic concepts of the plugin are explained.
Please note that the patch is functional, although I don't consider it ready. I just want to get pre-ACK or some comments about the patch design. I have yet to implement the recompute task, I will work on that later.
Jan, yesterday I an Stephen got together to start a review of the patch. Due to the complexity we tried to go thorugh the code together to make sure we understand what is going on.
Also due to the complexity and need to refresh memory we were only be able to go through most of the "add" code only, we were not able to review the update or delete code paths yet.
Attached find a diff file with comments in C++ style (so that it is clear they all need to be resolved before the patch can be considered ok, as they show as a big red line in our editors) that you can integrate in your tree and check one by one.
There are some questions in there, but in general we noticed a very annoying lack of comments in the code itself. The accompaning PDF is nice and all, but unless the logic is embedded and explained in the code (like it was with the previous code) it will be ablocker and will make review more difficult.
That said, so far so good, although we need to more carefully review the refcounting stuff, we got to it late and only marginally reviewed it. Need more time to fully grasp all details.
HTH, Simo.
Simo, first I'd like to thank you both for the review. I realize it is quite difficult, I wouldn't want to do it myself.
I will look at the diff you wrote about ASAP. That said, I already have a modification which resolves some pretty big issues which I somehow managed to miss at first. It also modifies the full update code path so it can be used by the recompute task. I will certainly add some more comments, up until now I've been focusing on the code itself. Hopefully I'll have the patch complete by the end of the week (i.e. tomorrow) and I'll send it. If you wish, we can schedule a phone call to discuss some places which are the most difficult to comprehend.
Yes we should set up a call between me you and Stephen next week. so we can start a through review.
Also one other thing we forgot to mention is that we were wondering if you made any performance measurements.
Perf. was the main driver for this work, so we would like to see some numbers to make sure we did actually achieved perf. gains with the new patchset. It would be pretty lame to switch a very complex system with another very complex (and largely untested) system and find out we got little to no gain :-)
Simo.
On Thu, 2011-09-15 at 17:00 +0200, Jan Zelený wrote:
On Tue, 2011-08-30 at 17:54 +0200, Jan Zelený wrote:
Attached patch rewrites almost entire memberof plugin. It heavily utilizes hash tables instead of lists and arrays and it introduces concept of reference counting which should heavily optimize all operations when no loops are present in the user/group tree.
I tested the patch by running sysdb test suite, all tests passed. I also attach a document where basic concepts of the plugin are explained.
Please note that the patch is functional, although I don't consider it ready. I just want to get pre-ACK or some comments about the patch design. I have yet to implement the recompute task, I will work on that later.
Jan, yesterday I an Stephen got together to start a review of the patch. Due to the complexity we tried to go thorugh the code together to make sure we understand what is going on.
Also due to the complexity and need to refresh memory we were only be able to go through most of the "add" code only, we were not able to review the update or delete code paths yet.
Attached find a diff file with comments in C++ style (so that it is clear they all need to be resolved before the patch can be considered ok, as they show as a big red line in our editors) that you can integrate in your tree and check one by one.
There are some questions in there, but in general we noticed a very annoying lack of comments in the code itself. The accompaning PDF is nice and all, but unless the logic is embedded and explained in the code (like it was with the previous code) it will be ablocker and will make review more difficult.
That said, so far so good, although we need to more carefully review the refcounting stuff, we got to it late and only marginally reviewed it. Need more time to fully grasp all details.
HTH, Simo.
Simo, first I'd like to thank you both for the review. I realize it is quite difficult, I wouldn't want to do it myself.
I will look at the diff you wrote about ASAP. That said, I already have a modification which resolves some pretty big issues which I somehow managed to miss at first. It also modifies the full update code path so it can be used by the recompute task. I will certainly add some more comments, up until now I've been focusing on the code itself. Hopefully I'll have the patch complete by the end of the week (i.e. tomorrow) and I'll send it. If you wish, we can schedule a phone call to discuss some places which are the most difficult to comprehend.
Yes we should set up a call between me you and Stephen next week. so we can start a through review.
Also one other thing we forgot to mention is that we were wondering if you made any performance measurements.
Perf. was the main driver for this work, so we would like to see some numbers to make sure we did actually achieved perf. gains with the new patchset. It would be pretty lame to switch a very complex system with another very complex (and largely untested) system and find out we got little to no gain :-)
Simo.
I did some testing and I was disappointed, the improvement was very small. But on the other hand, I was just running the sysdb test suite and the main difference should be seen when using complex rfc2307bis membership structures. I will set up a testing environment during the next week, hopefully I'll get some satisfying results.
Jan
On Thu, 2011-09-15 at 17:29 +0200, Jan Zelený wrote:
On Thu, 2011-09-15 at 17:00 +0200, Jan Zelený wrote:
On Tue, 2011-08-30 at 17:54 +0200, Jan Zelený wrote:
Attached patch rewrites almost entire memberof plugin. It heavily utilizes hash tables instead of lists and arrays and it introduces concept of reference counting which should heavily optimize all operations when no loops are present in the user/group tree.
I tested the patch by running sysdb test suite, all tests passed. I also attach a document where basic concepts of the plugin are explained.
Please note that the patch is functional, although I don't consider it ready. I just want to get pre-ACK or some comments about the patch design. I have yet to implement the recompute task, I will work on that later.
Jan, yesterday I an Stephen got together to start a review of the patch. Due to the complexity we tried to go thorugh the code together to make sure we understand what is going on.
Also due to the complexity and need to refresh memory we were only be able to go through most of the "add" code only, we were not able to review the update or delete code paths yet.
Attached find a diff file with comments in C++ style (so that it is clear they all need to be resolved before the patch can be considered ok, as they show as a big red line in our editors) that you can integrate in your tree and check one by one.
There are some questions in there, but in general we noticed a very annoying lack of comments in the code itself. The accompaning PDF is nice and all, but unless the logic is embedded and explained in the code (like it was with the previous code) it will be ablocker and will make review more difficult.
That said, so far so good, although we need to more carefully review the refcounting stuff, we got to it late and only marginally reviewed it. Need more time to fully grasp all details.
HTH, Simo.
Simo, first I'd like to thank you both for the review. I realize it is quite difficult, I wouldn't want to do it myself.
I will look at the diff you wrote about ASAP. That said, I already have a modification which resolves some pretty big issues which I somehow managed to miss at first. It also modifies the full update code path so it can be used by the recompute task. I will certainly add some more comments, up until now I've been focusing on the code itself. Hopefully I'll have the patch complete by the end of the week (i.e. tomorrow) and I'll send it. If you wish, we can schedule a phone call to discuss some places which are the most difficult to comprehend.
Yes we should set up a call between me you and Stephen next week. so we can start a through review.
Also one other thing we forgot to mention is that we were wondering if you made any performance measurements.
Perf. was the main driver for this work, so we would like to see some numbers to make sure we did actually achieved perf. gains with the new patchset. It would be pretty lame to switch a very complex system with another very complex (and largely untested) system and find out we got little to no gain :-)
Simo.
I did some testing and I was disappointed, the improvement was very small. But on the other hand, I was just running the sysdb test suite and the main difference should be seen when using complex rfc2307bis membership structures. I will set up a testing environment during the next week, hopefully I'll get some satisfying results.
The sysdb test suite isn't a good example. (Though it would be best if you could expand it to do so).
The biggest performance gains SHOULD be when removing a member attribute from a complicated relationship. Please measure this and let us know.
On Tue, 2011-08-30 at 17:54 +0200, Jan Zelený wrote:
Attached patch rewrites almost entire memberof plugin. It heavily utilizes hash tables instead of lists and arrays and it introduces concept of reference counting which should heavily optimize all operations when no loops are present in the user/group tree.
I tested the patch by running sysdb test suite, all tests passed. I also attach a document where basic concepts of the plugin are explained.
Please note that the patch is functional, although I don't consider it ready. I just want to get pre-ACK or some comments about the patch design. I have yet to implement the recompute task, I will work on that later.
Jan, yesterday I an Stephen got together to start a review of the patch. Due to the complexity we tried to go thorugh the code together to make sure we understand what is going on.
Also due to the complexity and need to refresh memory we were only be able to go through most of the "add" code only, we were not able to review the update or delete code paths yet.
Attached find a diff file with comments in C++ style (so that it is clear they all need to be resolved before the patch can be considered ok, as they show as a big red line in our editors) that you can integrate in your tree and check one by one.
There are some questions in there, but in general we noticed a very annoying lack of comments in the code itself. The accompaning PDF is nice and all, but unless the logic is embedded and explained in the code (like it was with the previous code) it will be ablocker and will make review more difficult.
That said, so far so good, although we need to more carefully review the refcounting stuff, we got to it late and only marginally reviewed it. Need more time to fully grasp all details.
HTH, Simo.
Simo, as promised before, I have a new version of the patch. This time it's complete, including the rewritten recompute task. I also took almost all comments you and Stephen gave me into account. The only thing I left unresolved for the moment is the last comment in the diff you sent. The change shouldn't be difficult, but it will need changes in many parts of the code, therefore I'll rather send it in the next patch.
Jan
On Wed, 2011-09-21 at 13:24 +0200, Jan Zelený wrote:
On Tue, 2011-08-30 at 17:54 +0200, Jan Zelený wrote:
Attached patch rewrites almost entire memberof plugin. It heavily utilizes hash tables instead of lists and arrays and it introduces concept of reference counting which should heavily optimize all operations when no loops are present in the user/group tree.
I tested the patch by running sysdb test suite, all tests passed. I also attach a document where basic concepts of the plugin are explained.
Please note that the patch is functional, although I don't consider it ready. I just want to get pre-ACK or some comments about the patch design. I have yet to implement the recompute task, I will work on that later.
Jan, yesterday I an Stephen got together to start a review of the patch. Due to the complexity we tried to go thorugh the code together to make sure we understand what is going on.
Also due to the complexity and need to refresh memory we were only be able to go through most of the "add" code only, we were not able to review the update or delete code paths yet.
Attached find a diff file with comments in C++ style (so that it is clear they all need to be resolved before the patch can be considered ok, as they show as a big red line in our editors) that you can integrate in your tree and check one by one.
There are some questions in there, but in general we noticed a very annoying lack of comments in the code itself. The accompaning PDF is nice and all, but unless the logic is embedded and explained in the code (like it was with the previous code) it will be ablocker and will make review more difficult.
That said, so far so good, although we need to more carefully review the refcounting stuff, we got to it late and only marginally reviewed it. Need more time to fully grasp all details.
HTH, Simo.
Simo, as promised before, I have a new version of the patch. This time it's complete, including the rewritten recompute task. I also took almost all comments you and Stephen gave me into account. The only thing I left unresolved for the moment is the last comment in the diff you sent. The change shouldn't be difficult, but it will need changes in many parts of the code, therefore I'll rather send it in the next patch.
That's the change to suppress memberuid for non-posixGroup entries, correct?
I'm going to say that this is an optimization that we don't need right now. Please open an RFE for it and we'll schedule it for later inclusion.
Right now the SSSD only ever uses the member/memberOf relationship for groups, so the only advantage of this would be that we wouldn't add memberUid attributes to non-POSIX groups in a nested chain, but I think the gain there is not worth the effort at this time.
Actual review of the patch will be forthcoming, but I didn't want you wasting time on this for now.
On Wed, 2011-09-21 at 07:42 -0400, Stephen Gallagher wrote:
On Wed, 2011-09-21 at 13:24 +0200, Jan Zelený wrote:
Simo, as promised before, I have a new version of the patch. This time it's complete, including the rewritten recompute task. I also took almost all comments you and Stephen gave me into account. The only thing I left unresolved for the moment is the last comment in the diff you sent. The change shouldn't be difficult, but it will need changes in many parts of the code, therefore I'll rather send it in the next patch.
That's the change to suppress memberuid for non-posixGroup entries, correct?
I'm going to say that this is an optimization that we don't need right now. Please open an RFE for it and we'll schedule it for later inclusion.
Right now the SSSD only ever uses the member/memberOf relationship for groups, so the only advantage of this would be that we wouldn't add memberUid attributes to non-POSIX groups in a nested chain, but I think the gain there is not worth the effort at this time.
Actual review of the patch will be forthcoming, but I didn't want you wasting time on this for now.
Actually we use memberUid to build the replies when group information is requested. But I agree this is a further optimization and does not impact current behavior.
Simo.
On Wed, 2011-09-21 at 13:24 +0200, Jan Zelený wrote:
On Tue, 2011-08-30 at 17:54 +0200, Jan Zelený wrote:
Attached patch rewrites almost entire memberof plugin. It heavily utilizes hash tables instead of lists and arrays and it introduces concept of reference counting which should heavily optimize all operations when no loops are present in the user/group tree.
I tested the patch by running sysdb test suite, all tests passed. I also attach a document where basic concepts of the plugin are explained.
Please note that the patch is functional, although I don't consider it ready. I just want to get pre-ACK or some comments about the patch design. I have yet to implement the recompute task, I will work on that later.
Jan, yesterday I an Stephen got together to start a review of the patch. Due to the complexity we tried to go thorugh the code together to make sure we understand what is going on.
Also due to the complexity and need to refresh memory we were only be able to go through most of the "add" code only, we were not able to review the update or delete code paths yet.
Attached find a diff file with comments in C++ style (so that it is clear they all need to be resolved before the patch can be considered ok, as they show as a big red line in our editors) that you can integrate in your tree and check one by one.
There are some questions in there, but in general we noticed a very annoying lack of comments in the code itself. The accompaning PDF is nice and all, but unless the logic is embedded and explained in the code (like it was with the previous code) it will be ablocker and will make review more difficult.
That said, so far so good, although we need to more carefully review the refcounting stuff, we got to it late and only marginally reviewed it. Need more time to fully grasp all details.
HTH, Simo.
Simo, as promised before, I have a new version of the patch. This time it's complete, including the rewritten recompute task. I also took almost all comments you and Stephen gave me into account. The only thing I left unresolved for the moment is the last comment in the diff you sent. The change shouldn't be difficult, but it will need changes in many parts of the code, therefore I'll rather send it in the next patch.
Nack
First, the third patch in this set does not compile. You used sysdb_memberof_recompute() in the test, but the function is sysdb_memberof_rebuild().
With that typo fixed, I attempted to run with this patch. Any attempt to run getgrnam() or getgrgid() results in a hang in the LDB until eventually the monitor issues a SIGTERM to the backend (which doesn't die, because it never gets back to the mainloop to process the signal).
Hi,
In an effort to test the new memberof plugin I came across something that probably is a bug. I used sssd 1.5.12 and applied the memberof patch. The patch applied cleanly and compiled fine but 'make check' failed on the sysdb test:
Running the sysdb check through gdb revealed a bit more:
My reason for choosing 1.5.12 was to get it working on RHEL6 with the least amount of effort. The patch didn't apply to 1.5.1 from RHEL6, so I used an SRPM from F14 to create RPMs, adding just the memberof patch to the mix. Building worked (except I had to disable the check part), but after installing the RPMs sssd did not. While 'getent passwd user' worked as expected, 'id user' didn't. The command would hang forever.
The provider used is plain OpenLDAP, RFC2307.
I reported issue #883 and I'm happy to test the improved memberof in a real environment, if needed.
Cheers,
Hi,
In an effort to test the new memberof plugin I came across something that probably is a bug. I used sssd 1.5.12 and applied the memberof patch. The patch applied cleanly and compiled fine but 'make check' failed on the sysdb test:
sssd_sysdb-tests.txt
Running the sysdb check through gdb revealed a bit more:
sssd_sysdb-tests_gdb.txt
My reason for choosing 1.5.12 was to get it working on RHEL6 with the least amount of effort. The patch didn't apply to 1.5.1 from RHEL6, so I used an SRPM from F14 to create RPMs, adding just the memberof patch to the mix. Building worked (except I had to disable the check part), but after installing the RPMs sssd did not. While 'getent passwd user' worked as expected, 'id user' didn't. The command would hang forever.
The provider used is plain OpenLDAP, RFC2307.
I reported issue #883 and I'm happy to test the improved memberof in a real environment, if needed.
Cheers,
Trond H. Amundsen t.h.amundsen@usit.uio.no Center for Information Technology Services, University of Oslo
Hi, judging from the backtrace you provided (function mbof_mod_del_next_child_cb in particular), I think you don't have the latest version of the patch. Please try the patch I sent yesterday, perhaps it will improve things.
On Wed, 2011-09-21 at 13:24 +0200, Jan Zelený wrote:
On Tue, 2011-08-30 at 17:54 +0200, Jan Zelený wrote:
Attached patch rewrites almost entire memberof plugin. It heavily utilizes hash tables instead of lists and arrays and it introduces concept of reference counting which should heavily optimize all operations when no loops are present in the user/group tree.
I tested the patch by running sysdb test suite, all tests passed. I also attach a document where basic concepts of the plugin are explained.
Please note that the patch is functional, although I don't consider it ready. I just want to get pre-ACK or some comments about the patch design. I have yet to implement the recompute task, I will work on that later.
Jan, yesterday I an Stephen got together to start a review of the patch. Due to the complexity we tried to go thorugh the code together to make sure we understand what is going on.
Also due to the complexity and need to refresh memory we were only be able to go through most of the "add" code only, we were not able to review the update or delete code paths yet.
Attached find a diff file with comments in C++ style (so that it is clear they all need to be resolved before the patch can be considered ok, as they show as a big red line in our editors) that you can integrate in your tree and check one by one.
There are some questions in there, but in general we noticed a very annoying lack of comments in the code itself. The accompaning PDF is nice and all, but unless the logic is embedded and explained in the code (like it was with the previous code) it will be ablocker and will make review more difficult.
That said, so far so good, although we need to more carefully review the refcounting stuff, we got to it late and only marginally reviewed it. Need more time to fully grasp all details.
HTH, Simo.
Simo, as promised before, I have a new version of the patch. This time it's complete, including the rewritten recompute task. I also took almost all comments you and Stephen gave me into account. The only thing I left unresolved for the moment is the last comment in the diff you sent. The change shouldn't be difficult, but it will need changes in many parts of the code, therefore I'll rather send it in the next patch.
Nack
First, the third patch in this set does not compile. You used sysdb_memberof_recompute() in the test, but the function is sysdb_memberof_rebuild().
With that typo fixed, I attempted to run with this patch. Any attempt to run getgrnam() or getgrgid() results in a hang in the LDB until eventually the monitor issues a SIGTERM to the backend (which doesn't die, because it never gets back to the mainloop to process the signal).
Could you please send me steps how to reproduce?
Thanks Jan
Purge or expire the cache getent group groupthatexistsandhasmembers
I tested on an RFC2307 server.
On Sep 22, 2011, at 5:10 AM, Jan Zelený jzeleny@redhat.com wrote:
On Wed, 2011-09-21 at 13:24 +0200, Jan Zelený wrote:
On Tue, 2011-08-30 at 17:54 +0200, Jan Zelený wrote:
Attached patch rewrites almost entire memberof plugin. It heavily utilizes hash tables instead of lists and arrays and it introduces concept of reference counting which should heavily optimize all operations when no loops are present in the user/group tree.
I tested the patch by running sysdb test suite, all tests passed. I also attach a document where basic concepts of the plugin are explained.
Please note that the patch is functional, although I don't consider it ready. I just want to get pre-ACK or some comments about the patch design. I have yet to implement the recompute task, I will work on that later.
Jan, yesterday I an Stephen got together to start a review of the patch. Due to the complexity we tried to go thorugh the code together to make sure we understand what is going on.
Also due to the complexity and need to refresh memory we were only be able to go through most of the "add" code only, we were not able to review the update or delete code paths yet.
Attached find a diff file with comments in C++ style (so that it is clear they all need to be resolved before the patch can be considered ok, as they show as a big red line in our editors) that you can integrate in your tree and check one by one.
There are some questions in there, but in general we noticed a very annoying lack of comments in the code itself. The accompaning PDF is nice and all, but unless the logic is embedded and explained in the code (like it was with the previous code) it will be ablocker and will make review more difficult.
That said, so far so good, although we need to more carefully review the refcounting stuff, we got to it late and only marginally reviewed it. Need more time to fully grasp all details.
HTH, Simo.
Simo, as promised before, I have a new version of the patch. This time it's complete, including the rewritten recompute task. I also took almost all comments you and Stephen gave me into account. The only thing I left unresolved for the moment is the last comment in the diff you sent. The change shouldn't be difficult, but it will need changes in many parts of the code, therefore I'll rather send it in the next patch.
Nack
First, the third patch in this set does not compile. You used sysdb_memberof_recompute() in the test, but the function is sysdb_memberof_rebuild().
With that typo fixed, I attempted to run with this patch. Any attempt to run getgrnam() or getgrgid() results in a hang in the LDB until eventually the monitor issues a SIGTERM to the backend (which doesn't die, because it never gets back to the mainloop to process the signal).
Could you please send me steps how to reproduce?
Thanks Jan
Purge or expire the cache getent group groupthatexistsandhasmembers
I tested on an RFC2307 server.
On Sep 22, 2011, at 5:10 AM, Jan Zelený jzeleny@redhat.com wrote:
On Wed, 2011-09-21 at 13:24 +0200, Jan Zelený wrote:
On Tue, 2011-08-30 at 17:54 +0200, Jan Zelený wrote:
Attached patch rewrites almost entire memberof plugin. It heavily utilizes hash tables instead of lists and arrays and it introduces concept of reference counting which should heavily optimize all operations when no loops are present in the user/group tree.
I tested the patch by running sysdb test suite, all tests passed. I also attach a document where basic concepts of the plugin are explained.
Please note that the patch is functional, although I don't consider it ready. I just want to get pre-ACK or some comments about the patch design. I have yet to implement the recompute task, I will work on that later.
Jan, yesterday I an Stephen got together to start a review of the patch. Due to the complexity we tried to go thorugh the code together to make sure we understand what is going on.
Also due to the complexity and need to refresh memory we were only be able to go through most of the "add" code only, we were not able to review the update or delete code paths yet.
Attached find a diff file with comments in C++ style (so that it is clear they all need to be resolved before the patch can be considered ok, as they show as a big red line in our editors) that you can integrate in your tree and check one by one.
There are some questions in there, but in general we noticed a very annoying lack of comments in the code itself. The accompaning PDF is nice and all, but unless the logic is embedded and explained in the code (like it was with the previous code) it will be ablocker and will make review more difficult.
That said, so far so good, although we need to more carefully review the refcounting stuff, we got to it late and only marginally reviewed it. Need more time to fully grasp all details.
HTH, Simo.
Simo, as promised before, I have a new version of the patch. This time it's complete, including the rewritten recompute task. I also took almost all comments you and Stephen gave me into account. The only thing I left unresolved for the moment is the last comment in the diff you sent. The change shouldn't be difficult, but it will need changes in many parts of the code, therefore I'll rather send it in the next patch.
Nack
First, the third patch in this set does not compile. You used sysdb_memberof_recompute() in the test, but the function is sysdb_memberof_rebuild().
With that typo fixed, I attempted to run with this patch. Any attempt to run getgrnam() or getgrgid() results in a hang in the LDB until eventually the monitor issues a SIGTERM to the backend (which doesn't die, because it never gets back to the mainloop to process the signal).
Could you please send me steps how to reproduce?
Thanks Jan
Strange. I tested this exact scenario and it succeeded. I retested it a moment ago to be sure and it still goes fine. Could you send me some more info like a core dump or something?
Thanks Jan
Purge or expire the cache getent group groupthatexistsandhasmembers
I tested on an RFC2307 server.
On Sep 22, 2011, at 5:10 AM, Jan Zelený jzeleny@redhat.com wrote:
On Wed, 2011-09-21 at 13:24 +0200, Jan Zelený wrote:
On Tue, 2011-08-30 at 17:54 +0200, Jan Zelený wrote: > Attached patch rewrites almost entire memberof plugin. It heavily > utilizes hash tables instead of lists and arrays and it introduces > concept of reference counting which should heavily optimize all > operations when no loops are present in the user/group tree. > > I tested the patch by running sysdb test suite, all tests passed. I > also attach a document where basic concepts of the plugin are > explained. > > Please note that the patch is functional, although I don't consider > it ready. I just want to get pre-ACK or some comments about the > patch design. I have yet to implement the recompute task, I will > work on that later.
Jan, yesterday I an Stephen got together to start a review of the patch. Due to the complexity we tried to go thorugh the code together to make sure we understand what is going on.
Also due to the complexity and need to refresh memory we were only be able to go through most of the "add" code only, we were not able to review the update or delete code paths yet.
Attached find a diff file with comments in C++ style (so that it is clear they all need to be resolved before the patch can be considered ok, as they show as a big red line in our editors) that you can integrate in your tree and check one by one.
There are some questions in there, but in general we noticed a very annoying lack of comments in the code itself. The accompaning PDF is nice and all, but unless the logic is embedded and explained in the code (like it was with the previous code) it will be ablocker and will make review more difficult.
That said, so far so good, although we need to more carefully review the refcounting stuff, we got to it late and only marginally reviewed it. Need more time to fully grasp all details.
HTH, Simo.
Simo, as promised before, I have a new version of the patch. This time it's complete, including the rewritten recompute task. I also took almost all comments you and Stephen gave me into account. The only thing I left unresolved for the moment is the last comment in the diff you sent. The change shouldn't be difficult, but it will need changes in many parts of the code, therefore I'll rather send it in the next patch.
Nack
First, the third patch in this set does not compile. You used sysdb_memberof_recompute() in the test, but the function is sysdb_memberof_rebuild().
With that typo fixed, I attempted to run with this patch. Any attempt to run getgrnam() or getgrgid() results in a hang in the LDB until eventually the monitor issues a SIGTERM to the backend (which doesn't die, because it never gets back to the mainloop to process the signal).
Could you please send me steps how to reproduce?
Thanks Jan
Strange. I tested this exact scenario and it succeeded. I retested it a moment ago to be sure and it still goes fine. Could you send me some more info like a core dump or something?
After further consulation on IRC, I finally found the bug and fixed it. Sending patches in attachment.
Jan
Jan Zelený jzeleny@redhat.com writes:
After further consulation on IRC, I finally found the bug and fixed it. Sending patches in attachment.
Hello Jan,
I'm happy to report that this updated version of the memberof patch passes all tests during 'make check'.
However, when used in a real environment, it works with small groups but has problems with larger groups. I tried running 'getent group somegroup' for different groups, and found that groups with more than ~1000 members aren't handled properly:
* 'getent group somegroup' never returns anything
* 'id someuser' hangs indefinately for some users. This seems to happen if the user is a member of one or more large groups. The limit for "large group" seems to be higher in this case though.
* 'id -G someuser' works for any user.
Another observation: sssd_be can't be killed by anything but a SIGKILL, if any of the above problems have occured. When in this state, sssd_be doesn't consume CPU. When issuing 'getent group largegroup' sssd_be starts consuming CPU as expected, then abruptly stops using CPU and enters the unkillable state.
Regards,
Jan Zelený jzeleny@redhat.com writes:
After further consulation on IRC, I finally found the bug and fixed it. Sending patches in attachment.
Hello Jan,
I'm happy to report that this updated version of the memberof patch passes all tests during 'make check'.
Yes, that was my original reference test.
However, when used in a real environment, it works with small groups but has problems with larger groups. I tried running 'getent group somegroup' for different groups, and found that groups with more than ~1000 members aren't handled properly:
'getent group somegroup' never returns anything
'id someuser' hangs indefinately for some users. This seems to happen if the user is a member of one or more large groups. The limit for "large group" seems to be higher in this case though.
'id -G someuser' works for any user.
Another observation: sssd_be can't be killed by anything but a SIGKILL, if any of the above problems have occured. When in this state, sssd_be doesn't consume CPU. When issuing 'getent group largegroup' sssd_be starts consuming CPU as expected, then abruptly stops using CPU and enters the unkillable state.
As you described it, it looks like issue similar to what Stephen found earlier and is now fixed. It probably isn't caused by the size of groups but rather membership structure. Could you be more specific how the structure is organized? Do you use rfc2307 or rfc2307bis? If the latter is the case, how deep your membership structure goes and how does it approximately look like?
Anyway I will run some tests and see if I can reproduce the issue somehow.
Jan
Jan Zelený jzeleny@redhat.com writes:
As you described it, it looks like issue similar to what Stephen found earlier and is now fixed. It probably isn't caused by the size of groups but rather membership structure. Could you be more specific how the structure is organized? Do you use rfc2307 or rfc2307bis? If the latter is the case, how deep your membership structure goes and how does it approximately look like?
This is rfc2307.
Regards,
Jan Zelený jzeleny@redhat.com writes:
As you described it, it looks like issue similar to what Stephen found earlier and is now fixed. It probably isn't caused by the size of groups but rather membership structure. Could you be more specific how the structure is organized? Do you use rfc2307 or rfc2307bis? If the latter is the case, how deep your membership structure goes and how does it approximately look like?
This is rfc2307.
Regards,
There must be something else going on. I tested 2307 on a group with ~1000 users and I found no problem.
Could you provide me a backtrace of the place where the process hangs? Any other information you might have will be useful as well.
Thanks Jan
Jan Zelený jzeleny@redhat.com writes:
As you described it, it looks like issue similar to what Stephen found earlier and is now fixed. It probably isn't caused by the size of groups but rather membership structure. Could you be more specific how the structure is organized? Do you use rfc2307 or rfc2307bis? If the latter is the case, how deep your membership structure goes and how does it approximately look like?
This is rfc2307.
There must be something else going on. I tested 2307 on a group with ~1000 users and I found no problem.
Could you provide me a backtrace of the place where the process hangs? Any other information you might have will be useful as well.
I don't have a backtrace, since I compiled this as an rpm. However, an strace of sssd_be hopefully holds a clue or two:
... lots of uninteresing bits ... 24469 fcntl(20, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=48, len=1}) = 0 24469 fcntl(20, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=48, len=1}) = 0 24469 fcntl(20, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=48, len=1}) = 0 24469 fcntl(20, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=48, len=1}) = 0 24469 fcntl(20, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=48, len=1}) = 0 24469 fcntl(20, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=48, len=1}) = 0 24469 fcntl(20, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=48, len=1}) = 0 24469 fcntl(20, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=48, len=1}) = 0 24469 brk(0x3f70000) = 0x3f70000 24469 brk(0x3fcc000) = 0x3fcc000 24469 brk(0x4192000) = 0x4192000 24469 brk(0x43ca000) = 0x43ca000 24469 mmap(NULL, 3641344, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f81a7aed000 24469 mremap(0x7f81a7aed000, 3641344, 4550656, MREMAP_MAYMOVE) = 0x7f81a7696000 24469 brk(0x43eb000) = 0x43eb000 24469 brk(0x440c000) = 0x440c000 24469 brk(0x442d000) = 0x442d000 24469 brk(0x444e000) = 0x444e000 24469 brk(0x446f000) = 0x446f000 24469 brk(0x4490000) = 0x4490000 24469 brk(0x44b1000) = 0x44b1000 24469 brk(0x44d2000) = 0x44d2000 24469 brk(0x44f3000) = 0x44f3000 24469 brk(0x4514000) = 0x4514000 24469 brk(0x4629000) = 0x4629000 24469 fcntl(20, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=48, len=1}) = 0 24469 fcntl(20, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=48, len=1}) = 0 24469 brk(0x4731000) = 0x4731000 24469 munmap(0x7f81a7e66000, 11153408) = 0 24469 brk(0x4752000) = 0x4752000 24469 brk(0x4778000) = 0x4778000 24469 brk(0x4799000) = 0x4799000 24469 brk(0x47ba000) = 0x47ba000 24469 brk(0x47db000) = 0x47db000 24469 brk(0x47fc000) = 0x47fc000 24469 brk(0x481d000) = 0x481d000 24469 brk(0x483f000) = 0x483f000 24469 brk(0x4861000) = 0x4861000 24469 brk(0x4884000) = 0x4884000 24469 brk(0x48a6000) = 0x48a6000 24469 brk(0x48c9000) = 0x48c9000 24469 brk(0x48ea000) = 0x48ea000 24469 brk(0x490b000) = 0x490b000 24469 brk(0x492c000) = 0x492c000 24469 brk(0x494d000) = 0x494d000 24469 brk(0x496e000) = 0x496e000 24469 brk(0x498f000) = 0x498f000 24469 brk(0x49b1000) = 0x49b1000 24469 brk(0x49d3000) = 0x49d3000 24469 brk(0x49f6000) = 0x49f6000 24469 mmap(NULL, 13942784, PROT_READ|PROT_WRITE, MAP_SHARED, 20, 0) = 0x7f81a7bbd000 24469 fcntl(20, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=48, len=1}) = 0 24469 fcntl(20, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=48, len=1}) = 0 24469 fcntl(20, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=48, len=1}) = 0 24469 fcntl(20, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=48, len=1}) = 0 24469 open("/dev/tty", O_RDWR|O_NOCTTY|O_NONBLOCK) = -1 ENXIO (No such device or address) 24469 writev(2, [{"*** glibc detected *** ", 23}, {"/usr/libexec/sssd/sssd_be", 25}, {": ", 2}, {"corrupted double-linked list", 28}, {": 0x", 4}, {"00000000048d98f0", 16}, {" ***\n", 5}], 7) = 103 24469 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f81af93d000 24469 open("/etc/ld.so.cache", O_RDONLY) = 25 24469 fstat(25, {st_mode=S_IFREG|0644, st_size=114674, ...}) = 0 24469 mmap(NULL, 114674, PROT_READ, MAP_PRIVATE, 25, 0) = 0x7f81a8b40000 24469 close(25) = 0 24469 open("/lib64/libgcc_s.so.1", O_RDONLY) = 25 24469 read(25, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20)\300\220;\0\0\0"..., 832) = 832 24469 futex(0x3b8a78ae80, FUTEX_WAIT_PRIVATE, 2, NULL .... and here it hangs ...
The "corrupted double-linked list" looks interesting...
PS. I forgot to mention that I'm using sssd 1.5.12 + memberof patch on rhel6 x86_64, since it required the least amount of effort to compile and install.
Regards,
Jan Zelený jzeleny@redhat.com writes:
As you described it, it looks like issue similar to what Stephen found earlier and is now fixed. It probably isn't caused by the size of groups but rather membership structure. Could you be more specific how the structure is organized? Do you use rfc2307 or rfc2307bis? If the latter is the case, how deep your membership structure goes and how does it approximately look like?
This is rfc2307.
There must be something else going on. I tested 2307 on a group with ~1000 users and I found no problem.
Could you provide me a backtrace of the place where the process hangs? Any other information you might have will be useful as well.
I don't have a backtrace, since I compiled this as an rpm. However, an strace of sssd_be hopefully holds a clue or two: PS. I forgot to mention that I'm using sssd 1.5.12 + memberof patch on rhel6 x86_64, since it required the least amount of effort to compile and install.
Regards,
I managed to reproduce this issue, however I'm not sure it was related to the plugin itself. What seemed to resolve it was update to latest packages in F15 and restart. However I did a small memory optimization, so I'm sending the corrected patch. Please let me know if everything works for you now. In not, try attaching gdb to sssd_be process *after* it freezes and send me a backtrace.
Thanks Jan
On Thu, 2011-09-29 at 16:56 +0200, Jan Zelený wrote:
Jan Zelený jzeleny@redhat.com writes:
As you described it, it looks like issue similar to what Stephen found earlier and is now fixed. It probably isn't caused by the size of groups but rather membership structure. Could you be more specific how the structure is organized? Do you use rfc2307 or rfc2307bis? If the latter is the case, how deep your membership structure goes and how does it approximately look like?
This is rfc2307.
There must be something else going on. I tested 2307 on a group with ~1000 users and I found no problem.
Could you provide me a backtrace of the place where the process hangs? Any other information you might have will be useful as well.
I don't have a backtrace, since I compiled this as an rpm. However, an strace of sssd_be hopefully holds a clue or two: PS. I forgot to mention that I'm using sssd 1.5.12 + memberof patch on rhel6 x86_64, since it required the least amount of effort to compile and install.
Regards,
I managed to reproduce this issue, however I'm not sure it was related to the plugin itself. What seemed to resolve it was update to latest packages in F15 and restart. However I did a small memory optimization, so I'm sending the corrected patch. Please let me know if everything works for you now. In not, try attaching gdb to sssd_be process *after* it freezes and send me a backtrace.
Still seeing issues with this latest round of patches (by the way, please always send all patches in a series when updating, it makes it easier to keep track of them).
I had JR try out this patch on his setup, and he hit a crash while saving about 4100 memberships to a small set of groups.
Program received signal SIGSEGV, Segmentation fault. 0x0000003e01003277 in _talloc_free () from /usr/lib64/libtalloc.so.2 (gdb) bt full #0 0x0000003e01003277 in _talloc_free () from /usr/lib64/libtalloc.so.2 No symbol table info available. #1 0x00002af7396056fb in mbof_update (update_ctx=0x23c7a9d0) at src/ldb_modules/memberof.c:1456 ldb = 0x1b267890 msg = 0x1 member = 0x21e6dab0 el = 0x237ac5c0 rc_el = 0xa5 new_el = <value optimized out> new_rc_el = <value optimized out> valdn = 0x54 i = 0 rc = 588275056 first_rc = 588275056 key = {type = 152, {str = 0x237ac5c0 "\220x&\033", ul = 595248576}} value = {type = 584457008, {ptr = 0x3e01005952, i = 16800082, ui = 16800082, l = 266304772434, ul = 266304772434, f = 2.35739712e-38, d = 1.315720393832104e-312}} process = 0x0 ret = 1 hret = 0 is_user = false member_name = 0x0 mod_req = 0x23c7a9d0 flags = 0 del_ctx = <value optimized out> #2 0x00002af739605e1c in mbof_add_next_member_cb (pvt=<value optimized out>, members=0x229eefb0, num=1) at src/ldb_modules/memberof.c:1908 add_ctx = 0x23844380 update_ctx = 0x1 ldb = 0x1b267890 el = 0x23105d70 valdn = <value optimized out> ret = 455506064 key = {type = HASH_KEY_STRING, {str = 0x222a0dd0 "", ul = 573181392}} value = {type = HASH_VALUE_LONG, {ptr = 0x3df9874e2e, i = -108573138, ui = 4186394158, l = 266179399214, ul = 266179399214, f = -8.78182185e+34, d = 1.315100967822999e-312}} #3 0x00002af7396033e2 in mbof_membertree_get_cb (req=<value optimized out>, ares=0x230fe300) at src/ldb_modules/memberof.c:1613 membertree_ctx = 0x0 ctx = 0x2311aba0 ret = <value optimized out> #4 0x0000003e02c1a0cc in ?? () from /usr/lib64/libldb.so.0 No symbol table info available. #5 0x0000003e014034d3 in ?? () from /usr/lib64/libtevent.so.0 No symbol table info available. #6 0x0000003e0140509b in ?? () from /usr/lib64/libtevent.so.0 No symbol table info available. #7 0x0000003e01402690 in _tevent_loop_once () from /usr/lib64/libtevent.so.0 No symbol table info available. #8 0x0000003e02c09a0f in ldb_wait () from /usr/lib64/libldb.so.0 No symbol table info available. #9 0x0000003e02c0b13c in ?? () from /usr/lib64/libldb.so.0 No symbol table info available. #10 0x0000003e02c0b2c0 in ldb_modify () from /usr/lib64/libldb.so.0 No symbol table info available. #11 0x00002af73982f070 in ipa_hbac_sysdb_save (sysdb=0x1b268660, domain=0x1b269130, primary_subdir=0x2af739881f12 "hbac_hosts", attr_name=<value optimized out>, primary_count=<value optimized out>, primary=<value optimized out>, group_subdir=0x2af739881f1d "hbac_hostgroups", groupattr_name=0x2af739880833 "cn", group_count=62, groups=0x1dc50610) at src/providers/ipa/ipa_hbac_common.c:268 lret = <value optimized out> ret = 0 orig_member_dns = 0x2352ebb0 i = 0 member_count = 1 members = 0x230060c0 tmp_ctx = 0x238900d0 member_dn = <value optimized out> ---Type <return> to continue, or q <return> to quit--- group_id = 0x1db230f0 "eng-zenoss" msg = 0x2340f170 member_filter = 0x21490870 "originalDN=fqdn=zoentdev1.ops.expertcity.com,cn=computers,cn=accounts,dc=expertcity,dc=com" __FUNCTION__ = "ipa_hbac_sysdb_save" #12 0x00002af73982394e in hbac_sysdb_save (req=<value optimized out>) at src/providers/ipa/ipa_access.c:484 ret = <value optimized out> hbac_ctx = 0x1b291a70 domain = 0x1b269130 sysdb = 0x1b268660 base_dn = <value optimized out> be_ctx = <value optimized out> tmp_ctx = <value optimized out> __FUNCTION__ = "hbac_sysdb_save" #13 0x00002af7398296a2 in ipa_hbac_rule_info_done (subreq=<value optimized out>) at src/providers/ipa/ipa_hbac_rules.c:213 ret = 0 req = 0x1dc60b20 state = 0x1c1ee730 __FUNCTION__ = "ipa_hbac_rule_info_done" #14 0x00002af73983ed39 in sdap_get_generic_done (op=<value optimized out>, reply=0x0, error=<value optimized out>, pvt=<value optimized out>) at src/providers/ldap/sdap_async.c:1022 req = 0x1b299120 state = <value optimized out> attrs = <value optimized out> errmsg = 0x0 result = 0 ret = <value optimized out> lret = 1 total_count = 0 cookie = {bv_len = 0, bv_val = 0x1ccfab10 "\260\263\367\034"} returned_controls = 0x1d8fcc20 page_control = <value optimized out> __FUNCTION__ = "sdap_get_generic_done" #15 0x00002af739840c8a in sdap_process_message (ev=<value optimized out>, pvt=<value optimized out>) at src/providers/ldap/sdap_async.c:307 msgtype = 101 ret = 0 reply = 0x1dc69460 op = 0x1dc62cf0 msgid = <value optimized out> __FUNCTION__ = "sdap_process_message" #16 sdap_process_result (ev=<value optimized out>, pvt=<value optimized out>) at src/providers/ldap/sdap_async.c:207 sh = <value optimized out> no_timeout = {tv_sec = 0, tv_usec = 0} te = <value optimized out> msg = 0x1c8adf20 ret = <value optimized out> __FUNCTION__ = "sdap_process_result" #17 0x0000003e014034d3 in ?? () from /usr/lib64/libtevent.so.0 No symbol table info available. #18 0x0000003e0140509b in ?? () from /usr/lib64/libtevent.so.0 No symbol table info available. #19 0x0000003e01402690 in _tevent_loop_once () from /usr/lib64/libtevent.so.0 No symbol table info available. #20 0x0000003e014026fb in ?? () from /usr/lib64/libtevent.so.0 No symbol table info available. #21 0x0000000000433db1 in server_loop (main_ctx=0x1b252b10) at src/util/server.c:526 No locals. #22 0x000000000040ee7f in main (argc=6, argv=0x7fff460731c8) at src/providers/data_provider_be.c:1333 opt = <value optimized out> pc = 0x1b2505e0 be_domain = 0x1b2509c0 "expertcity.com" srv_name = <value optimized out> conf_entry = <value optimized out> ---Type <return> to continue, or q <return> to quit--- main_ctx = 0x1b252b10 ret = <value optimized out> long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x647760, val = 0, descrip = 0x438df2 "Help options:", argDescrip = 0x0}, {longName = 0x438e00 "debug-level", shortName = 100 'd', argInfo = 2, arg = 0x647838, val = 0, descrip = 0x438dd1 "Debug level", argDescrip = 0x0}, {longName = 0x438e0c "debug-to-files", shortName = 102 'f', argInfo = 0, arg = 0x64783c, val = 0, descrip = 0x439a68 "Send the debug output to files instead of stderr", argDescrip = 0x0}, { longName = 0x438e1b "debug-timestamps", shortName = 0 '\000', argInfo = 2, arg = 0x647740, val = 0, descrip = 0x438ddd "Add debug timestamps", argDescrip = 0x0}, {longName = 0x43a408 "domain", shortName = 0 '\000', argInfo = 1, arg = 0x7fff46073088, val = 0, descrip = 0x439aa0 "Domain of the information provider (mandatory)", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 0, descrip = 0x0, argDescrip = 0x0}} __FUNCTION__ = "main"
Please note that the issue is that we're calling talloc_free(msg) on msg at a time when msg = 0x1. (This can only happen in two error conditions, 1) member_name = ldb_msg_find_attr_as_string(member, DB_NAME, NULL); has returned NULL for member_name, or
2) entry_is_user_object() returned something other than LDB_SUCCESS or LDB_ERR_NO_SUCH_ATTRIBUTE
So there are two bugs here. 1) We need to initialize msg to NULL. 2) We need to figure out which of the above two failures is happening, and why.
Thanks Jan _______________________________________________ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/sssd-devel
On Thu, 2011-09-29 at 16:56 +0200, Jan Zelený wrote:
Jan Zelený jzeleny@redhat.com writes:
As you described it, it looks like issue similar to what Stephen found earlier and is now fixed. It probably isn't caused by the size of groups but rather membership structure. Could you be more specific how the structure is organized? Do you use rfc2307 or rfc2307bis? If the latter is the case, how deep your membership structure goes and how does it approximately look like?
This is rfc2307.
There must be something else going on. I tested 2307 on a group with ~1000 users and I found no problem.
Could you provide me a backtrace of the place where the process hangs? Any other information you might have will be useful as well.
I don't have a backtrace, since I compiled this as an rpm. However, an strace of sssd_be hopefully holds a clue or two: PS. I forgot to mention that I'm using sssd 1.5.12 + memberof patch on rhel6 x86_64, since it required the least amount of effort to compile and install.
Regards,
I managed to reproduce this issue, however I'm not sure it was related to the plugin itself. What seemed to resolve it was update to latest packages in F15 and restart. However I did a small memory optimization, so I'm sending the corrected patch. Please let me know if everything works for you now. In not, try attaching gdb to sssd_be process *after* it freezes and send me a backtrace.
Still seeing issues with this latest round of patches (by the way, please always send all patches in a series when updating, it makes it easier to keep track of them).
Sorry about that. I meant to send all of them, but I somehow forgot.
Please note that the issue is that we're calling talloc_free(msg) on msg at a time when msg = 0x1. (This can only happen in two error conditions,
- member_name = ldb_msg_find_attr_as_string(member, DB_NAME, NULL);
has returned NULL for member_name, or
- entry_is_user_object() returned something other than LDB_SUCCESS or
LDB_ERR_NO_SUCH_ATTRIBUTE So there are two bugs here. 1) We need to initialize msg to NULL. 2) We need to figure out which of the above two failures is happening, and why.
I'm sending corrected set of patches, both issues have been addressed. However I was unable to test them properly, I'm seeing a strange error on my machine which simply can't happen. I suspect that something is wrong with my machine, but I'm unable to troubleshoot it. Please let me know it these patches work for you.
Thanks Jan
On Mon, 2011-10-03 at 14:33 +0200, Jan Zelený wrote:
On Thu, 2011-09-29 at 16:56 +0200, Jan Zelený wrote:
Jan Zelený jzeleny@redhat.com writes:
> As you described it, it looks like issue similar to what Stephen > found earlier and is now fixed. It probably isn't caused by the > size of groups but rather membership structure. Could you be more > specific how the structure is organized? Do you use rfc2307 or > rfc2307bis? If the latter is the case, how deep your membership > structure goes and how does it approximately look like?
This is rfc2307.
There must be something else going on. I tested 2307 on a group with ~1000 users and I found no problem.
Could you provide me a backtrace of the place where the process hangs? Any other information you might have will be useful as well.
I don't have a backtrace, since I compiled this as an rpm. However, an strace of sssd_be hopefully holds a clue or two: PS. I forgot to mention that I'm using sssd 1.5.12 + memberof patch on rhel6 x86_64, since it required the least amount of effort to compile and install.
Regards,
I managed to reproduce this issue, however I'm not sure it was related to the plugin itself. What seemed to resolve it was update to latest packages in F15 and restart. However I did a small memory optimization, so I'm sending the corrected patch. Please let me know if everything works for you now. In not, try attaching gdb to sssd_be process *after* it freezes and send me a backtrace.
Still seeing issues with this latest round of patches (by the way, please always send all patches in a series when updating, it makes it easier to keep track of them).
Sorry about that. I meant to send all of them, but I somehow forgot.
Please note that the issue is that we're calling talloc_free(msg) on msg at a time when msg = 0x1. (This can only happen in two error conditions,
- member_name = ldb_msg_find_attr_as_string(member, DB_NAME, NULL);
has returned NULL for member_name, or
- entry_is_user_object() returned something other than LDB_SUCCESS or
LDB_ERR_NO_SUCH_ATTRIBUTE So there are two bugs here. 1) We need to initialize msg to NULL. 2) We need to figure out which of the above two failures is happening, and why.
I'm sending corrected set of patches, both issues have been addressed. However I was unable to test them properly, I'm seeing a strange error on my machine which simply can't happen. I suspect that something is wrong with my machine, but I'm unable to troubleshoot it. Please let me know it these patches work for you.
Thanks Jan
Testing is ongoing, but something else occurred to me. We need to handle upgrades to this new approach properly. Please add a new sysdb_upgrade_NN() function to src/db/sysdb.c and bump the internal database version. When we upgrade to this version of the DB, we need to ensure that we run a full recompute across the whole sysdb so that all of the memberships are updated using the new mechanism.
On Mon, 2011-10-03 at 10:08 -0400, Stephen Gallagher wrote:
On Mon, 2011-10-03 at 14:33 +0200, Jan Zelený wrote:
On Thu, 2011-09-29 at 16:56 +0200, Jan Zelený wrote:
Jan Zelený jzeleny@redhat.com writes:
> > As you described it, it looks like issue similar to what Stephen > > found earlier and is now fixed. It probably isn't caused by the > > size of groups but rather membership structure. Could you be more > > specific how the structure is organized? Do you use rfc2307 or > > rfc2307bis? If the latter is the case, how deep your membership > > structure goes and how does it approximately look like? > > This is rfc2307.
There must be something else going on. I tested 2307 on a group with ~1000 users and I found no problem.
Could you provide me a backtrace of the place where the process hangs? Any other information you might have will be useful as well.
I don't have a backtrace, since I compiled this as an rpm. However, an strace of sssd_be hopefully holds a clue or two: PS. I forgot to mention that I'm using sssd 1.5.12 + memberof patch on rhel6 x86_64, since it required the least amount of effort to compile and install.
Regards,
I managed to reproduce this issue, however I'm not sure it was related to the plugin itself. What seemed to resolve it was update to latest packages in F15 and restart. However I did a small memory optimization, so I'm sending the corrected patch. Please let me know if everything works for you now. In not, try attaching gdb to sssd_be process *after* it freezes and send me a backtrace.
Still seeing issues with this latest round of patches (by the way, please always send all patches in a series when updating, it makes it easier to keep track of them).
Sorry about that. I meant to send all of them, but I somehow forgot.
Please note that the issue is that we're calling talloc_free(msg) on msg at a time when msg = 0x1. (This can only happen in two error conditions,
- member_name = ldb_msg_find_attr_as_string(member, DB_NAME, NULL);
has returned NULL for member_name, or
- entry_is_user_object() returned something other than LDB_SUCCESS or
LDB_ERR_NO_SUCH_ATTRIBUTE So there are two bugs here. 1) We need to initialize msg to NULL. 2) We need to figure out which of the above two failures is happening, and why.
I'm sending corrected set of patches, both issues have been addressed. However I was unable to test them properly, I'm seeing a strange error on my machine which simply can't happen. I suspect that something is wrong with my machine, but I'm unable to troubleshoot it. Please let me know it these patches work for you.
Thanks Jan
Testing is ongoing, but something else occurred to me. We need to handle upgrades to this new approach properly. Please add a new sysdb_upgrade_NN() function to src/db/sysdb.c and bump the internal database version. When we upgrade to this version of the DB, we need to ensure that we run a full recompute across the whole sysdb so that all of the memberships are updated using the new mechanism.
Another question: why are you using hash_create() directly instead of sss_hash_create()? The latter has the added bonus of being able to maintain memory with talloc (so talloc_free(table) will work cleanly).
On Fri, 2011-09-30 at 14:55 -0400, Stephen Gallagher wrote:
On Thu, 2011-09-29 at 16:56 +0200, Jan Zelený wrote:
Jan Zelený jzeleny@redhat.com writes:
As you described it, it looks like issue similar to what Stephen found earlier and is now fixed. It probably isn't caused by the size of groups but rather membership structure. Could you be more specific how the structure is organized? Do you use rfc2307 or rfc2307bis? If the latter is the case, how deep your membership structure goes and how does it approximately look like?
This is rfc2307.
There must be something else going on. I tested 2307 on a group with ~1000 users and I found no problem.
Could you provide me a backtrace of the place where the process hangs? Any other information you might have will be useful as well.
I don't have a backtrace, since I compiled this as an rpm. However, an strace of sssd_be hopefully holds a clue or two: PS. I forgot to mention that I'm using sssd 1.5.12 + memberof patch on rhel6 x86_64, since it required the least amount of effort to compile and install.
Regards,
I managed to reproduce this issue, however I'm not sure it was related to the plugin itself. What seemed to resolve it was update to latest packages in F15 and restart. However I did a small memory optimization, so I'm sending the corrected patch. Please let me know if everything works for you now. In not, try attaching gdb to sssd_be process *after* it freezes and send me a backtrace.
Still seeing issues with this latest round of patches (by the way, please always send all patches in a series when updating, it makes it easier to keep track of them).
I had JR try out this patch on his setup, and he hit a crash while saving about 4100 memberships to a small set of groups.
Program received signal SIGSEGV, Segmentation fault. 0x0000003e01003277 in _talloc_free () from /usr/lib64/libtalloc.so.2 (gdb) bt full #0 0x0000003e01003277 in _talloc_free () from /usr/lib64/libtalloc.so.2 No symbol table info available. #1 0x00002af7396056fb in mbof_update (update_ctx=0x23c7a9d0) at src/ldb_modules/memberof.c:1456 ldb = 0x1b267890 msg = 0x1 member = 0x21e6dab0 el = 0x237ac5c0 rc_el = 0xa5 new_el = <value optimized out> new_rc_el = <value optimized out> valdn = 0x54 i = 0 rc = 588275056 first_rc = 588275056 key = {type = 152, {str = 0x237ac5c0 "\220x&\033", ul = 595248576}} value = {type = 584457008, {ptr = 0x3e01005952, i = 16800082, ui = 16800082, l = 266304772434, ul = 266304772434, f = 2.35739712e-38, d = 1.315720393832104e-312}} process = 0x0 ret = 1 hret = 0 is_user = false member_name = 0x0 mod_req = 0x23c7a9d0 flags = 0 del_ctx = <value optimized out> #2 0x00002af739605e1c in mbof_add_next_member_cb (pvt=<value optimized out>, members=0x229eefb0, num=1) at src/ldb_modules/memberof.c:1908 add_ctx = 0x23844380 update_ctx = 0x1 ldb = 0x1b267890 el = 0x23105d70 valdn = <value optimized out> ret = 455506064 key = {type = HASH_KEY_STRING, {str = 0x222a0dd0 "", ul = 573181392}} value = {type = HASH_VALUE_LONG, {ptr = 0x3df9874e2e, i = -108573138, ui = 4186394158, l = 266179399214, ul = 266179399214, f = -8.78182185e+34, d = 1.315100967822999e-312}} #3 0x00002af7396033e2 in mbof_membertree_get_cb (req=<value optimized out>, ares=0x230fe300) at src/ldb_modules/memberof.c:1613 membertree_ctx = 0x0 ctx = 0x2311aba0 ret = <value optimized out> #4 0x0000003e02c1a0cc in ?? () from /usr/lib64/libldb.so.0 No symbol table info available. #5 0x0000003e014034d3 in ?? () from /usr/lib64/libtevent.so.0 No symbol table info available. #6 0x0000003e0140509b in ?? () from /usr/lib64/libtevent.so.0 No symbol table info available. #7 0x0000003e01402690 in _tevent_loop_once () from /usr/lib64/libtevent.so.0 No symbol table info available. #8 0x0000003e02c09a0f in ldb_wait () from /usr/lib64/libldb.so.0 No symbol table info available. #9 0x0000003e02c0b13c in ?? () from /usr/lib64/libldb.so.0 No symbol table info available. #10 0x0000003e02c0b2c0 in ldb_modify () from /usr/lib64/libldb.so.0 No symbol table info available. #11 0x00002af73982f070 in ipa_hbac_sysdb_save (sysdb=0x1b268660, domain=0x1b269130, primary_subdir=0x2af739881f12 "hbac_hosts", attr_name=<value optimized out>, primary_count=<value optimized out>, primary=<value optimized out>, group_subdir=0x2af739881f1d "hbac_hostgroups", groupattr_name=0x2af739880833 "cn", group_count=62, groups=0x1dc50610) at src/providers/ipa/ipa_hbac_common.c:268 lret = <value optimized out> ret = 0 orig_member_dns = 0x2352ebb0 i = 0 member_count = 1 members = 0x230060c0 tmp_ctx = 0x238900d0 member_dn = <value optimized out> ---Type <return> to continue, or q <return> to quit--- group_id = 0x1db230f0 "eng-zenoss" msg = 0x2340f170 member_filter = 0x21490870 "originalDN=fqdn=zoentdev1.ops.expertcity.com,cn=computers,cn=accounts,dc=expertcity,dc=com" __FUNCTION__ = "ipa_hbac_sysdb_save" #12 0x00002af73982394e in hbac_sysdb_save (req=<value optimized out>) at src/providers/ipa/ipa_access.c:484 ret = <value optimized out> hbac_ctx = 0x1b291a70 domain = 0x1b269130 sysdb = 0x1b268660 base_dn = <value optimized out> be_ctx = <value optimized out> tmp_ctx = <value optimized out> __FUNCTION__ = "hbac_sysdb_save" #13 0x00002af7398296a2 in ipa_hbac_rule_info_done (subreq=<value optimized out>) at src/providers/ipa/ipa_hbac_rules.c:213 ret = 0 req = 0x1dc60b20 state = 0x1c1ee730 __FUNCTION__ = "ipa_hbac_rule_info_done" #14 0x00002af73983ed39 in sdap_get_generic_done (op=<value optimized out>, reply=0x0, error=<value optimized out>, pvt=<value optimized out>) at src/providers/ldap/sdap_async.c:1022 req = 0x1b299120 state = <value optimized out> attrs = <value optimized out> errmsg = 0x0 result = 0 ret = <value optimized out> lret = 1 total_count = 0 cookie = {bv_len = 0, bv_val = 0x1ccfab10 "\260\263\367\034"} returned_controls = 0x1d8fcc20 page_control = <value optimized out> __FUNCTION__ = "sdap_get_generic_done" #15 0x00002af739840c8a in sdap_process_message (ev=<value optimized out>, pvt=<value optimized out>) at src/providers/ldap/sdap_async.c:307 msgtype = 101 ret = 0 reply = 0x1dc69460 op = 0x1dc62cf0 msgid = <value optimized out> __FUNCTION__ = "sdap_process_message" #16 sdap_process_result (ev=<value optimized out>, pvt=<value optimized out>) at src/providers/ldap/sdap_async.c:207 sh = <value optimized out> no_timeout = {tv_sec = 0, tv_usec = 0} te = <value optimized out> msg = 0x1c8adf20 ret = <value optimized out> __FUNCTION__ = "sdap_process_result" #17 0x0000003e014034d3 in ?? () from /usr/lib64/libtevent.so.0 No symbol table info available. #18 0x0000003e0140509b in ?? () from /usr/lib64/libtevent.so.0 No symbol table info available. #19 0x0000003e01402690 in _tevent_loop_once () from /usr/lib64/libtevent.so.0 No symbol table info available. #20 0x0000003e014026fb in ?? () from /usr/lib64/libtevent.so.0 No symbol table info available. #21 0x0000000000433db1 in server_loop (main_ctx=0x1b252b10) at src/util/server.c:526 No locals. #22 0x000000000040ee7f in main (argc=6, argv=0x7fff460731c8) at src/providers/data_provider_be.c:1333 opt = <value optimized out> pc = 0x1b2505e0 be_domain = 0x1b2509c0 "expertcity.com" srv_name = <value optimized out> conf_entry = <value optimized out> ---Type <return> to continue, or q <return> to quit--- main_ctx = 0x1b252b10 ret = <value optimized out> long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x647760, val = 0, descrip = 0x438df2 "Help options:", argDescrip = 0x0}, {longName = 0x438e00 "debug-level", shortName = 100 'd', argInfo = 2, arg = 0x647838, val = 0, descrip = 0x438dd1 "Debug level", argDescrip = 0x0}, {longName = 0x438e0c "debug-to-files", shortName = 102 'f', argInfo = 0, arg = 0x64783c, val = 0, descrip = 0x439a68 "Send the debug output to files instead of stderr", argDescrip = 0x0}, { longName = 0x438e1b "debug-timestamps", shortName = 0 '\000', argInfo = 2, arg = 0x647740, val = 0, descrip = 0x438ddd "Add debug timestamps", argDescrip = 0x0}, {longName = 0x43a408 "domain", shortName = 0 '\000', argInfo = 1, arg = 0x7fff46073088, val = 0, descrip = 0x439aa0 "Domain of the information provider (mandatory)", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 0, descrip = 0x0, argDescrip = 0x0}} __FUNCTION__ = "main"
Please note that the issue is that we're calling talloc_free(msg) on msg at a time when msg = 0x1. (This can only happen in two error conditions,
- member_name = ldb_msg_find_attr_as_string(member, DB_NAME, NULL);
has returned NULL for member_name, or
- entry_is_user_object() returned something other than LDB_SUCCESS or
LDB_ERR_NO_SUCH_ATTRIBUTE
So there are two bugs here. 1) We need to initialize msg to NULL. 2) We need to figure out which of the above two failures is happening, and why.
Ok, I did a little more digging. Aside from the initialization error, the real problem here is that the memberUID code is breaking when it's dealing with non-posixGroups.
In this case, we have a bunch of hosts being added to hostgroups. They don't use a 'name=' attribute, so they're failing the memberuid linkage. We need to make the memberUID feature only happen for posixGroups. If this is too much work for the immediate future, we should just skip memberUID if the name attribute is missing, rather than fail.
Jan Zelený jzeleny@redhat.com writes:
There must be something else going on. I tested 2307 on a group with ~1000 users and I found no problem.
Could you provide me a backtrace of the place where the process hangs? Any other information you might have will be useful as well.
I don't have a backtrace, since I compiled this as an rpm. However, an strace of sssd_be hopefully holds a clue or two: PS. I forgot to mention that I'm using sssd 1.5.12 + memberof patch on rhel6 x86_64, since it required the least amount of effort to compile and install.
I managed to reproduce this issue, however I'm not sure it was related to the plugin itself. What seemed to resolve it was update to latest packages in F15 and restart. However I did a small memory optimization, so I'm sending the corrected patch. Please let me know if everything works for you now. In not, try attaching gdb to sssd_be process *after* it freezes and send me a backtrace.
Sorry about the delay, I've been crazy busy the last few days. The updated patch didn't resolve the issue. A backtrace of sssd_be is attached, captured while sssd_be was in the "unkillable" state.
#0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97 No locals. #1 0x0000003b8a47bb68 in _L_lock_9159 () at malloc.c:3503 No symbol table info available. #2 0x0000003b8a479502 in __libc_malloc (bytes=255726235264) at malloc.c:3657 ignore1 = 128 ignore2 = -1971802496 ignore3 = -512 ar_ptr = <error reading variable ar_ptr (Asked for position 0 of stack, stack only has 0 elements on it.)> victim = <value optimized out> hook = <value optimized out> #3 0x0000003b8a004c62 in local_strdup ( s=0x7f68a48257bc "/lib64/libgcc_s.so.1") at dl-load.c:162 len = 21 new = <value optimized out> #4 0x0000003b8a0085f6 in _dl_map_object (loader=0x3b8a221188, name=0x3b8a5516de "libgcc_s.so.1", type=2, trace_mode=0, mode=<value optimized out>, nsid=<value optimized out>) at dl-load.c:2162 cached = 0x7f68a48257bc "/lib64/libgcc_s.so.1" namelen = 14 fd = <value optimized out> realname = <value optimized out> name_copy = <value optimized out> l = <value optimized out> fb = {len = 832, buf = "\177ELF\002\001\001\000\000\000\000\000\000\000\000\000\003\000>\000\001\000\000\000\020)\300\220;\000\000\000@\000\000\000\000\000\000\000hd\001\000\000\000\000\000\000\000\000\000@\000\070\000\006\000@\000\037\000\036\000\001\000\000\000\005", '\000' <repeats 13 times>"\300, \220;\000\000\000\000\000\300\220;\000\000\000\324V\001\000\000\000\000\000\324V\001\000\000\000\000\000\000\000 \000\000\000\000\000\001\000\000\000\006\000\000\000\330V\001\000\000\000\000\000\330V\341\220;\000\000\000\330V\341\220;\000\000\000\240\003\000\000\000\000\000\000 \006\000\000\000\000\000\000\000\000 \000\000\000\000\000\002\000\000\000\006\000\000\000\bW\001\000\000\000\000\000\bW\341\220;\000\000\000\bW\341\220;\000\000\000\260\001\000\000\000\000\000\000\260\001\000\000\000\000\000\000\b"...} found_other_class = false stack_end = <value optimized out> #5 0x0000003b8a0128d2 in dl_open_worker (a=0x7fff7b826a90) at dl-open.c:226 args = 0x7fff7b826a90 file = 0x3b8a5516de "libgcc_s.so.1" mode = -2147483647 call_map = <value optimized out> dst = <value optimized out> new = <value optimized out> r = <value optimized out> reloc_mode = <value optimized out> l = <value optimized out> any_tls = <value optimized out> #6 0x0000003b8a00e0a6 in _dl_catch_error (objname=0x7fff7b826ae0, errstring=0x7fff7b826ad8, mallocedp=0x7fff7b826aef, operate=0x3b8a0127b0 <dl_open_worker>, args=0x7fff7b826a90) at dl-error.c:178 errcode = 0 old = 0x7fff7b826b80 c = {objname = 0x7f68aa7048c8 "\237-", errstring = 0x0, malloced = 200, env = {{__jmpbuf = {2147483649, 6337619303767480908, -2, 0, 4, 255723902686, 6337619303828298316, -6306626998392448436}, __mask_was_saved = 44158276, __saved_mask = {__val = {32975296, 32961840, 32977600, 223390941224, 0, 115984560, 140735265532912, 0, 17212846784, 115984560, 140735265532984, 0, 33850008, 140735265532968, 140735265532512, 129195568}}}}} catchp = 0x7f68aa6db6f8 #7 0x0000003b8a01238a in _dl_open (file=0x3b8a5516de "libgcc_s.so.1", mode=-2147483647, caller_dlopen=0x0, nsid=-2, argc=4, argv=<value optimized out>, env=0x1f03360) at dl-open.c:555 args = {file = 0x3b8a5516de "libgcc_s.so.1", mode = -2147483647, caller_dlopen = 0x0, caller_dl_open = 0x3b8a522f20, map = 0x0, nsid = 0, argc = 4, argv = 0x7fff7b828158, env = 0x1f03360} objname = <value optimized out> errstring = <value optimized out> malloced = <value optimized out> errcode = <value optimized out> #8 0x0000003b8a522f20 in do_dlopen (ptr=<value optimized out>) at dl-libc.c:86 args = 0x7fff7b826c80 #9 0x0000003b8a00e0a6 in _dl_catch_error (objname=0x7fff7b826ca0, errstring=0x7fff7b826c98, mallocedp=0x7fff7b826caf, operate=0x3b8a522ee0 <do_dlopen>, args=0x7fff7b826c80) at dl-error.c:178 errcode = 0 old = 0x0 c = {objname = 0x2e5c8ba "\263\a", errstring = 0x0, malloced = false, env = {{__jmpbuf = {64, 6337619303679400524, 103, 140735265533328, 103, 7, 6337619303761189452, -6306626998392448436}, __mask_was_saved = 1, __saved_mask = {__val = {140735265534816, 140735265534348, 140735265533168, 140735265533168, 4294967297, 253403070465, 140735265534816, 140735265534704, 0, 140735265533144, 140735265533148, 140735265533148, 140735265533300, 140735265533300, 4294967297, 140733193388033}}}}} catchp = 0x7f68aa6db6f8 #10 0x0000003b8a523077 in dlerror_run (name=<value optimized out>, mode=<value optimized out>) at dl-libc.c:47 objname = 0x7fff7b826d58 "\034" last_errstring = 0x0 malloced = <value optimized out> result = <value optimized out> #11 __libc_dlopen_mode (name=<value optimized out>, mode=<value optimized out>) at dl-libc.c:160 args = {name = 0x3b8a5516de "libgcc_s.so.1", mode = -2147483647, map = 0x0} #12 0x0000003b8a4fb6e5 in init () at ../sysdeps/ia64/backtrace.c:41 No locals. #13 0x0000003b8b00cab3 in pthread_once () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:104 No locals. #14 0x0000003b8a4fb7e4 in __backtrace (array=<value optimized out>, size=64) at ../sysdeps/ia64/backtrace.c:85 __p = <value optimized out> arg = {array = 0x7fff7b827310, cnt = -1, size = 64} once = 1 #15 0x0000003b8a46f83b in __libc_message (do_abort=2, fmt=0x3b8a554840 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:178 addrs = {0x3000000018, 0x0, 0x7c8efd0, 0xd, 0x2244ed0, 0xd, 0x3, 0x7fff7b827430, 0x1ee7990, 0x5704afe4, 0xd, 0x3ba28021e2, 0x18, 0x7fff7b8273f0, 0x7fff00000018, 0x7fff7b827400, 0x7fff7b8273b0, 0x3ba2803c1f, 0x55, 0x206fd30, 0x7fff7b827410, 0x3ba2804dcf, 0x7fff7b827420, 0x3522d4, 0x3, 0x1ee7990, 0x3ba2804c60, 0x1ee7990, 0x2, 0x3ba28080a2, 0x3, 0x3ba2806937, 0x7fff7b82744c, 0x1, 0x2, 0x30, 0x1, 0x3b00000000, 0xd, 0x0, 0xd, 0x55, 0x7c8efd0, 0x0, 0x3, 0x3ba2802d2a, 0x206fd30, 0x7eb0730, 0x7fff7b8274d0, 0x70, 0x206fd30, 0x55, 0x7eb0730, 0x55, 0x7c8efd0, 0xd, 0x8000000000, 0x550000000d, 0x260119995704afe4, 0x3b8a47950d, 0x1364, 0x1ee7990, 0x1, 0x3ba2807e4a} n = <value optimized out> ap = {{gp_offset = 40, fp_offset = 48, overflow_arg_area = 0x7fff7b827640, reg_save_area = 0x7fff7b827550}} ap_copy = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7fff7b827640, reg_save_area = 0x7fff7b827550}} fd = 2 on_2 = <value optimized out> list = <value optimized out> nlist = <value optimized out> cp = <value optimized out> written = <value optimized out> #16 0x0000003b8a475146 in malloc_printerr (action=3, str=0x3b8a552822 "corrupted double-linked list", ptr=<value optimized out>) at malloc.c:6283 buf = "0000000007ebca30" cp = <value optimized out> #17 0x0000003b8a478bb4 in _int_malloc (av=0x3b8a78ae80, bytes=<value optimized out>) at malloc.c:4636 p = <value optimized out> iters = <value optimized out> nb = 144 idx = <value optimized out> bin = <value optimized out> victim = 0x7ebca30 size = 130985616 victim_index = <value optimized out> remainder = <value optimized out> remainder_size = 130985472 block = <value optimized out> bit = <value optimized out> map = <value optimized out> fwd = <value optimized out> bck = <value optimized out> errstr = 0x0 #18 0x0000003b8a47950d in __libc_malloc (bytes=122) at malloc.c:3660 ar_ptr = 0x3b8a78ae80 victim = <value optimized out> hook = <value optimized out> #19 0x0000003b8d00335f in __talloc (ctx=<value optimized out>, el_size=<value optimized out>, count=<value optimized out>, name=<value optimized out>) at talloc.c:405 tc = 0x0 #20 _talloc_named_const (ctx=<value optimized out>, el_size=<value optimized out>, count=<value optimized out>, name=<value optimized out>) at talloc.c:516 ptr = 0x2a #21 _talloc_array (ctx=<value optimized out>, el_size=<value optimized out>, count=<value optimized out>, name=<value optimized out>) at talloc.c:1849 No locals. #22 0x0000003403a12af9 in ldb_dn_explode (dn=0x7db2860) at common/ldb_dn.c:348 p = <value optimized out> ex_name = <value optimized out> ex_value = <value optimized out> data = <value optimized out> d = <value optimized out> dt = <value optimized out> t = <value optimized out> trim = false in_extended = false in_ex_name = false in_ex_value = false in_attr = false in_value = false in_quote = false is_oid = false escape = false x = <value optimized out> l = <value optimized out> ret = <value optimized out> parse_dn = <value optimized out> is_index = false #23 0x00007f68a408307d in mbof_mod_process (req=<value optimized out>, ares=0x7c70aa0) at src/ldb_modules/memberof.c:3033 rc_el = <value optimized out> ctx = <value optimized out> added = 0x494b770 removed = 0x494a5f0 key = {type = HASH_KEY_STRING, { str = 0x7e983f0 "name=joakidy,cn=users,cn=default,cn=sysdb", ul = 132744176}} value = {type = HASH_VALUE_LONG, {ptr = 0x1, i = 1, ui = 1, l = 1, ul = 1, f = 1.40129846e-45, d = 4.9406564584124654e-324}} add_member = false ldb = 0x1ed3bc0 valdn = 0x7db2860 ret = <value optimized out> el = 0x6e7b140 i = <value optimized out> #24 mbof_mod_execute_cb (req=<value optimized out>, ares=0x7c70aa0) at src/ldb_modules/memberof.c:2925 ldb = <value optimized out> mod_ctx = 0x6e9e830 ctx = 0x1f01de0 ret = 0 #25 0x0000003403a1b7ed in ltdb_callback (ev=<value optimized out>, te=<value optimized out>, t=..., private_data=<value optimized out>) at ldb_tdb/ldb_tdb.c:1222 ctx = 0x49574c0 ret = <value optimized out> #26 0x0000003b8bc034e5 in tevent_common_loop_timer_delay (ev=0x1ed3620) at tevent_timed.c:254 current_time = {tv_sec = 0, tv_usec = 0} te = 0x4953310 #27 0x0000003b8bc0531b in std_event_loop_once (ev=<value optimized out>, location=<value optimized out>) at tevent_standard.c:537 std_ev = 0x1ee69b0 tval = {tv_sec = 0, tv_usec = 0} #28 0x0000003b8bc026d0 in _tevent_loop_once (ev=0x1ed3620, location=0x3403a27303 "common/ldb.c:578") at tevent.c:490 ret = <value optimized out> nesting_stack_ptr = 0x0 #29 0x0000003403a0a464 in ldb_wait (handle=0x22962d0, type=LDB_WAIT_ALL) at common/ldb.c:578 ev = 0x1ed3620 ret = <value optimized out> #30 0x0000003403a0b7ec in ldb_autotransaction_request (ldb=0x1ed3bc0, req=0x6e9f960) at common/ldb.c:530 ret = 0 #31 0x0000003403a0b971 in ldb_modify (ldb=0x1ed3bc0, message=0x218ce70) at common/ldb.c:1423 req = 0x6e9f960 ret = 0 #32 0x00007f68a3e3cdbc in sysdb_set_entry_attr (mem_ctx=<value optimized out>, ctx=0x1ed27f0, entry_dn=0x6ea13b0, attrs=0x1ef7620, mod_op=2) at src/db/sysdb_ops.c:466 msg = 0x218ce70 i = <value optimized out> ret = <value optimized out> __FUNCTION__ = "sysdb_set_entry_attr" #33 0x00007f68a3e45b49 in sysdb_store_group (mem_ctx=<value optimized out>, ctx=0x1ed27f0, domain=0x1ed12e0, name=0x203a330 "spss-usr", gid=70212, attrs=0x1ef7620, cache_timeout=5400) at src/db/sysdb_ops.c:1591 tmpctx = 0x206e970 src_attrs = {0x7f68a3e5d675 "name", 0x7f68a3e59abe "gidNumber", 0x7f68a3e5d951 "originalModifyTimestamp", 0x0} msg = 0x20a7a00 new_group = <value optimized out> now = <value optimized out> ret = 0 __FUNCTION__ = "sysdb_store_group" #34 0x00007f68a3e125f9 in sdap_store_group_with_gid (memctx=0x1ef7b10, sysdb=0x1ed27f0, dom=0x1ed12e0, opts=0x1eece60, groups=<value optimized out>, num_groups=1, populate_members=true, _usn_value=0x1ef7b48) at src/providers/ldap/sdap_async_accounts.c:688 ret = <value optimized out> #35 sdap_save_group (memctx=0x1ef7b10, sysdb=0x1ed27f0, dom=0x1ed12e0, opts=0x1eece60, groups=<value optimized out>, num_groups=1, populate_members=true, _usn_value=0x1ef7b48) at src/providers/ldap/sdap_async_accounts.c:860 ret = <value optimized out> tmpctx = 0x2177e10 posix_group = true el = 0x6e8df30 group_attrs = 0x1ef7620 name = 0x203a330 "spss-usr" gid = 70212 usn_value = 0x0 #36 sdap_save_groups (memctx=0x1ef7b10, sysdb=0x1ed27f0, dom=0x1ed12e0, opts=0x1eece60, groups=<value optimized out>, num_groups=1, populate_members=true, _usn_value=0x1ef7b48) at src/providers/ldap/sdap_async_accounts.c:1004 tmpctx = 0x218f540 higher_usn = <value optimized out> usn_value = 0x0 twopass = false ret = <value optimized out> i = <value optimized out> saved_groups = 0x0 nsaved_groups = <value optimized out> __FUNCTION__ = "sdap_save_groups" #37 0x00007f68a3e12dd7 in sdap_get_groups_done (subreq=0x0) at src/providers/ldap/sdap_async_accounts.c:1837 req = 0x1ef6dc0 state = 0x1ef7b10 ret = <value optimized out> sysret = <value optimized out> __FUNCTION__ = "sdap_get_groups_done" #38 0x0000003b8bc03707 in tevent_common_loop_immediate (ev=0x1ecf490) at tevent_immediate.c:135 im = 0x1ef7bd0 handler = 0x3b8bc046d0 <tevent_req_trigger> private_data = 0x1ef7740 #39 0x0000003b8bc0530a in std_event_loop_once (ev=0x1ecf490, location=<value optimized out>) at tevent_standard.c:532 std_ev = 0x1ecf550 tval = {tv_sec = 0, tv_usec = 0} #40 0x0000003b8bc026d0 in _tevent_loop_once (ev=0x1ecf490, location=0x435373 "src/util/server.c:550") at tevent.c:490 ret = <value optimized out> nesting_stack_ptr = 0x0 #41 0x0000003b8bc0273b in tevent_common_loop_wait (ev=0x1ecf490, location=0x435373 "src/util/server.c:550") at tevent.c:591 ret = <value optimized out> #42 0x0000000000429493 in server_loop (main_ctx=0x1ed0590) at src/util/server.c:550 No locals. #43 0x000000000040d1e7 in main (argc=<value optimized out>, argv=<value optimized out>) at src/providers/data_provider_be.c:1254 opt = <value optimized out> pc = <value optimized out> be_domain = 0x1ece400 "default" srv_name = <value optimized out> main_ctx = 0x1ed0590 confdb_path = <value optimized out> ret = 0 long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x63ad40, val = 0, descrip = 0x42e452 "Help options:", argDescrip = 0x0}, {longName = 0x42e460 "debug-level", shortName = 100 'd', argInfo = 2, arg = 0x63ad14, val = 0, descrip = 0x42e431 "Debug level", argDescrip = 0x0}, { longName = 0x42e46c "debug-to-files", shortName = 102 'f', argInfo = 0, arg = 0x63ae20, val = 0, descrip = 0x42f048 "Send the debug output to files instead of stderr", argDescrip = 0x0}, {longName = 0x42e47b "debug-timestamps", shortName = 0 '\000', argInfo = 2, arg = 0x63ad18, val = 0, descrip = 0x42e43d "Add debug timestamps", argDescrip = 0x0}, { longName = 0x42f9ac "domain", shortName = 0 '\000', argInfo = 1, arg = 0x7fff7b828018, val = 0, descrip = 0x42f080 "Domain of the information provider (mandatory)", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 0, descrip = 0x0, argDescrip = 0x0}} __FUNCTION__ = "main" A debugging session is active.
Inferior 1 [process 11655] will be detached.
Quit anyway? (y or n) Detaching from program: /usr/libexec/sssd/sssd_be, process 11655
Regards,
sssd-devel@lists.fedorahosted.org