On 28 Jan 2016, at 20:42, James Ralston
<ralston(a)pobox.com> wrote:
On Thu, Jan 28, 2016 at 8:18 AM, Bolke de Bruin <bdbruin(a)gmail.com> wrote:
> As mentioned in another thread one of the Hadoop components (Ranger)
> syncs all users and groups (including GIDs) on a regular basis to
> provide authorization.
Unfortunately, that is the problem. :-(
Apache Ranger assumes that the back-end database for the passwd/group
services is capable of enumeration. That is true for the "files"
database, but is not guaranteed to be true for other databases.
More simply put: there is no guarantee that getpwent()/getgrent() will
enumerate all users/groups (respectively) known to the passwd/group
services.
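To make the enumeration-versus-lookup distinction concrete, here is a minimal Python sketch. It is only an illustration, not Ranger's code: the helper names are made up for this example, and it assumes a Unix host where the standard pwd module is available. pwd.getpwall() walks the enumeration interface, while pwd.getpwnam() does a direct lookup by name.

```python
import pwd


def enumerated_users():
    """Names returned by enumeration (the getpwent() path)."""
    return {entry.pw_name for entry in pwd.getpwall()}


def user_resolves(name):
    """True if a direct lookup by name (the getpwnam() path) finds the user."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        # pwd.getpwnam() raises KeyError when no such entry exists.
        return False


def hidden_from_enumeration(names):
    """Users that resolve by name but do not show up when enumerating."""
    listed = enumerated_users()
    return [n for n in names if user_resolves(n) and n not in listed]
```

On a host where the back end does not enumerate, a directory user passed to hidden_from_enumeration() would come back in the result, which is exactly the case a sync based on enumeration misses. The same comparison can be made for groups with grp.getgrall() and grp.getgrnam().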
At our site, we have a team that uses Hadoop, and they encountered
this issue when we first deployed sssd. Their work-around was to
manually create local passwd/group entries for the users/groups they
wanted to be visible within Hadoop. That worked for them, because
their Hadoop cluster was for only a handful of users, but that
solution isn't going to work for a production Hadoop cluster of any size.
I asked the developers on our Hadoop team to file a bug against Apache
Ranger, but I don't know if they ever did.
Ranger is actually even worse. It currently reads /etc/passwd and /etc/group
directly, so it bypasses NSS entirely. I have a patch in the works that addresses this by using getent
Moreover, I am adding some config parameters that allow syncing/enumerating
specific groups that Ranger otherwise doesn't see. It might help your guys in the
Still, I think Ranger is a load of crap: enumerating all users is no fun when there are over 50,000
in our corp directory. I am just trying to make it a little more manageable.
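A rough sketch of the getent idea, for illustration only. This is not the actual patch, and the function names and the list of group names are hypothetical. It assumes a glibc-style getent binary on PATH and resolves an explicit list of groups one by one, so it does not depend on the back end supporting enumeration.

```python
import subprocess


def parse_group_line(line):
    """Parse one group(5)-style line: name:password:gid:member1,member2."""
    name, _password, gid, members = line.rstrip("\n").split(":", 3)
    return {
        "name": name,
        "gid": int(gid),
        "members": [m for m in members.split(",") if m],
    }


def lookup_groups(names):
    """Resolve each named group through NSS via `getent group <name>`."""
    found = []
    for name in names:
        result = subprocess.run(
            ["getent", "group", name],
            capture_output=True, text=True, check=False,
        )
        # getent exits non-zero when the key is not found; skip those names.
        if result.returncode == 0 and result.stdout.strip():
            found.append(parse_group_line(result.stdout.splitlines()[0]))
    return found
```

Calling lookup_groups() with the handful of groups that matter keeps the sync bounded by that list instead of by the size of the whole directory.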