We are seeing random EIO errors when opening files on workstation clients that, so far, can only be resolved with a reboot of the client.
Environment:
2x replicated IPA servers, CentOS 7.5 w/ freeipa 4.5.4-10.el7
NFS server: CentOS 7.5
Clients: Mostly Fedora 28, but we've had the same error on CentOS 7.5 systems as well. User home directories are automounted with nfs4+krb5
Scenario:
Upon login, the home directory is always mounted cleanly and its subdirectories can be traversed successfully (cd, ls, etc.). On an affected system, any attempt to open or read a file results in "Input/output error". An strace shows the EIO at the openat syscall, after stat has already succeeded:
    stat(".bash_profile", {st_mode=S_IFREG|0644, st_size=193, ...}) = 0
    openat(AT_FDCWD, ".bash_profile", O_RDONLY) = -1 EIO (Input/output error)
This then happens for every user who tries to use the workstation. We've tried restarting every service on the client to reset it (i.e. sssd, gssd, etc.), but only a reboot restores NFS functionality. The one other symptom I noticed is that on an affected workstation, the klist output no longer contains the nfs/ principal for the NFS server, i.e.:
Good workstation:
    gary#> klist -a
    Ticket cache: KEYRING:persistent:410400721:krb_ccache_xyvXIky
    Default principal: gary@<MYDOMAIN>

    Valid starting       Expires              Service principal
    2018-09-18 15:24:44  2018-09-19 15:24:44  nfs/fileserver@<MYDOMAIN>
    2018-09-18 15:24:44  2018-09-19 15:24:44  nfs/fileserver@
    2018-09-18 15:24:44  2018-09-19 15:24:44  krbtgt/<MYDOMAIN>@<MYDOMAIN>

Bad workstation:
    gary#> klist
    Ticket cache: KEYRING:persistent:410400721:krb_ccache_nY9m2vU
    Default principal: gary@<MYDOMAIN>

    Valid starting       Expires              Service principal
    2018-09-19 07:25:38  2018-09-20 07:25:38  krbtgt/<MYDOMAIN>@<MYDOMAIN>
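The good/bad comparison above can be turned into a quick check for whether the current cache holds an nfs/ service ticket. This is only a minimal sketch: the helper name has_nfs_ticket is made up for illustration, and it assumes klist prints one ticket per line with the service principal in the last column. A missing ticket only matches the symptom described here; it is not proof by itself that the mount is broken (a user who has not touched NFS yet would also have none).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if klist-style output on stdin
# contains a service principal that starts with "nfs/".
has_nfs_ticket() {
    grep -q '[[:space:]]nfs/'
}

# Check the current user's credential cache.
if klist 2>/dev/null | has_nfs_ticket; then
    echo "nfs/ service ticket present"
else
    echo "no nfs/ service ticket in cache"
fi
```

Run as the affected user right after an open fails, this gives the same information as reading the klist listing by eye.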
Any help or pointers would be appreciated.
Thanks
Gary.
Gary Molenkamp via FreeIPA-users <freeipa-users@lists.fedorahosted.org> writes:

> We are seeing random EIO errors when opening files on workstation clients that, so far, can only be resolved with a reboot of the client.
> [...]
> Any help, pointers would be appreciated.

I'd look in the gssproxy logs (crank up the debugging first). It seems like something has failed at or before credential acquisition.
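The reply does not say how to raise the debug level. As a sketch, assuming the stock configuration file at /etc/gssproxy/gssproxy.conf, the debug_level option in the [gssproxy] section controls verbosity (see gssproxy.conf(5)); the value 2 here is only an example:

```ini
# /etc/gssproxy/gssproxy.conf (path assumed; check your distribution)
[gssproxy]
  # 0 = no debug logging, higher values are more verbose.
  debug_level = 2
```

gssproxy has to be restarted to pick up the change. On a systemd-based client the messages should then show up in the journal for the gssproxy unit.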
Thanks,
--Robbie