Hi,
The error:
[17/Sep/2007:20:52:06 -0500] - libdb: Ignoring log file:
/opt/fedora-ds/slapd-isec-file/db/log.0000000206: magic number 0, not 40988
indicates that the Berkeley DB backend could not use the log file
log.0000000206 because it is not a valid Berkeley DB log file. Since you
mentioned that you had to shut down the system manually and run fsck when it
came back up, one possibility is that log.0000000206 (and maybe other files)
was corrupted. Have you checked the lost+found directory for any recovered
files?
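To get a rough idea of how many transaction logs were damaged, you could look for log files that are empty or entirely zero-filled, which would be consistent with the reported magic number of 0. This is only a sketch: the function name is made up for illustration, it is not the check that libdb itself performs, and it only reads the files.

```python
from pathlib import Path


def find_suspect_logs(db_dir):
    """Return names of log.* files that are empty or contain only zero bytes.

    Illustrative helper (not part of Fedora DS or Berkeley DB); read-only.
    """
    suspects = []
    for path in sorted(Path(db_dir).glob("log.*")):
        if not path.is_file():
            continue
        all_zero = True
        with path.open("rb") as fh:
            # Read in 1 MiB chunks so large logs are not loaded at once.
            while chunk := fh.read(1 << 20):
                if chunk.strip(b"\x00"):
                    all_zero = False
                    break
        if all_zero:
            suspects.append(path.name)
    return suspects
```

Calling find_suspect_logs("/opt/fedora-ds/slapd-isec-file/db") would list the candidates; a file that is not listed may still be corrupt in other ways.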
In any case, I would recommend that before you do any more troubleshooting
with the server, you take a snapshot (tarball) of the affected directory
tree (/opt/fedora-ds and any other directories you can think of as belonging
to the directory server) and store the tarball separately (in another
directory or, better, on another machine). This will be useful if you need
to go back and start over with a different troubleshooting approach. Of
course, if the files are already corrupt, I am not sure how useful the
snapshot will be.
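The snapshot itself can be taken with plain tar; if you would rather script it, a minimal sketch might look like this. The function name and the destination path are assumptions for illustration, and it assumes the server is not running while the copy is made.

```python
import tarfile
from pathlib import Path


def snapshot(src_dir, dest_tar):
    """Archive src_dir into a gzip-compressed tarball at dest_tar.

    Illustrative helper; dest_tar should live outside src_dir,
    ideally on another disk or machine.
    """
    src = Path(src_dir)
    with tarfile.open(str(dest_tar), "w:gz") as tar:
        # arcname stores paths relative to the top-level directory name
        # instead of the absolute path.
        tar.add(str(src), arcname=src.name)
    return Path(dest_tar)
```

For example, snapshot("/opt/fedora-ds", "/some/other/disk/fedora-ds-snapshot.tar.gz"), where the second path is hypothetical.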
Check whether everything is fine at the system level. Look back in the
directory server error log file to see what types of errors showed up (when
the directory server tried to start the first time after the system reboot).
Check in the system log to make sure that things are fine.
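If the error log is long, a small filter can pull out the entries worth reading first. The sketch below assumes the line format shown in the log excerpt quoted in this thread (timestamp in square brackets, then the message); the keyword list and the function name are my own choices, not anything defined by the server.

```python
import re

# Matches lines such as:
# [17/Sep/2007:20:52:06 -0500] - libdb: PANIC: Invalid argument
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?:-\s+)?(?P<msg>.*)$")

# Assumed keywords of interest; adjust as needed.
KEYWORDS = ("libdb", "PANIC", "FAILED", "Error")


def interesting_lines(lines):
    """Return (timestamp, message) pairs for entries containing a keyword."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and any(k in m.group("msg") for k in KEYWORDS):
            hits.append((m.group("ts"), m.group("msg")))
    return hits
```

You would pass it an open file object for the instance's error log; where that file lives depends on your installation.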
Finally, check whether you happen to have taken any LDIF dumps of the
directory server data at any point in the past, or whether the file system
(or the whole system) was backed up for some other purpose. Do you have just
one directory server instance running (i.e., only one master and no
replicas/consumers)?
PS: A couple of things that would have helped in this scenario are regular
backups of the system and regular backups of the directory server data
(db2ldif.pl<http://www.redhat.com/docs/manuals/dir-server/cli/scripts.h...).
Also, a second system (or a virtual machine) in a development or test
environment, similar to this production server in setup and operation, is
useful to have so that changes can be tested on it before being deployed
into production.
-=Venkat=-
gvenkat(a)gmail.com
On 9/17/07, Steven Jones <Steven.Jones(a)vuw.ac.nz> wrote:
Not knowing a huge amount about FDS/LDAP, I'd start by checking the OS.
E.g.,
[17/Sep/2007:20:52:06 -0500] - Please make sure there is enough disk space
for dbcache (10485760 bytes) and db region files
suggests to me that you should check the filesystem with df -h to make sure
there is space left. Possibly there is a core dump or something else that
needs deleting (rare on Linux but not unknown on Solaris).
Or maybe some mount point failed to mount because the OS considered it too
damaged. Make sure all the filesystems are mounted.
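The same two checks (free space and mounted filesystems) can be scripted; here is a rough sketch. The function names are made up for illustration, the 10485760 figure is the dbcache size from the error message, and the list of mount points to verify is something you would supply yourself.

```python
import os
import shutil

DBCACHE_BYTES = 10485760  # dbcache size quoted in the error message


def has_room(path, needed_bytes=DBCACHE_BYTES):
    """Return True if the filesystem holding path has needed_bytes free."""
    return shutil.disk_usage(path).free >= needed_bytes


def unmounted(mount_points):
    """Return the entries of mount_points that are not currently mounted."""
    return [p for p in mount_points if not os.path.ismount(p)]
```

For example, has_room("/opt/fedora-ds/slapd-isec-file/db") checks the database directory from the log, and unmounted([...]) takes whatever mount points your fstab says should be present.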
Beyond this I cannot help, sorry.
Making no backups, or at least not exporting the database, is hopefully
something you will not do again.
regards
Steven Jones
Senior Linux/Unix/San/Vmware System Administrator
APG -Technology Integration Team
Victoria University of Wellington
Phone: +64 4 463 6272
------------------------------
*From:* fedora-directory-users-bounces(a)redhat.com [mailto:
fedora-directory-users-bounces(a)redhat.com] *On Behalf Of *bikas gurung
*Sent:* Tuesday, 18 September 2007 3:50 p.m.
*To:* fedora-directory-users(a)redhat.com
*Subject:* [Fedora-directory-users] help....unable to start fedora server
Hi all,
I'm certainly in deep s*&#t now. I just updated my file server with new
updates and patches and tried to reboot it, but it hung with a kernel
panic. So I had to shut down the system manually and run 'fsck' by hand
afterwards. Everything seemed to run well after that. But this evening I
found that I could not connect my PC to the file server. When I checked, it
turned out that the 'slapd' daemon had not started at all. I tried to start
the server manually using the scripts (in /rc.d/init.d) but got an error.
Here is what was logged in the log file:
Fedora-Directory/1.0.2 B2006.060.1928
isec-file:636 (/opt/fedora-ds/slapd-isec-file)
[17/Sep/2007:20:52:06 -0500] - Fedora-Directory/1.0.2 B2006.060.1928 starting up
[17/Sep/2007:20:52:06 -0500] - Detected Disorderly Shutdown last time
Directory Server was running, recovering database.
[17/Sep/2007:20:52:06 -0500] - libdb: Ignoring log file:
/opt/fedora-ds/slapd-isec-file/db/log.0000000206: magic number 0, not 40988
[17/Sep/2007:20:52:06 -0500] - libdb: Invalid log file: log.0000000206:
Invalid argument
[17/Sep/2007:20:52:06 -0500] - libdb: PANIC: Invalid argument
[17/Sep/2007:20:52:06 -0500] - libdb: PANIC: DB_RUNRECOVERY: Fatal error,
run database recovery
[17/Sep/2007:20:52:06 -0500] - Database Recovery Process FAILED. The
database is not recoverable. err=-30978: DB_RUNRECOVERY: Fatal error, run
database recovery
[17/Sep/2007:20:52:06 -0500] - Please make sure there is enough disk space
for dbcache (10485760 bytes) and db region files
[17/Sep/2007:20:52:06 -0500] - start: Failed to init database, err=-30978
DB_RUNRECOVERY: Fatal error, run database recovery
[17/Sep/2007:20:52:06 -0500] - Failed to start database plugin ldbm
database
[17/Sep/2007:20:52:06 -0500] - WARNING: ldbm instance userRoot already
exists
[17/Sep/2007:20:52:06 -0500] - WARNING: ldbm instance NetscapeRoot already
exists
[17/Sep/2007:20:52:06 -0500] binder-based resource limits -
nsLookThroughLimit: parameter error (slapi_reslimit_register() already
registered)
[17/Sep/2007:20:52:06 -0500] - start: Resource limit registration failed
[17/Sep/2007:20:52:06 -0500] - Failed to start database plugin ldbm
database
[17/Sep/2007:20:52:06 -0500] - Error: Failed to resolve plugin
dependencies
[17/Sep/2007:20:52:06 -0500] - Error: preoperation plugin 7-bit check is
not started
[17/Sep/2007:20:52:06 -0500] - Error: accesscontrol plugin ACL Plugin is
not started
[17/Sep/2007:20:52:06 -0500] - Error: preoperation plugin ACL preoperation
is not started
[17/Sep/2007:20:52:06 -0500] - Error: postoperation plugin Class of
Service is not started
[17/Sep/2007:20:52:06 -0500] - Error: preoperation plugin HTTP Client is
not started
[17/Sep/2007:20:52:06 -0500] - Error: database plugin ldbm database is not
started
[17/Sep/2007:20:52:06 -0500] - Error: object plugin Legacy Replication
Plugin is not started
[17/Sep/2007:20:52:06 -0500] - Error: object plugin Multimaster
Replication Plugin is not started
[17/Sep/2007:20:52:06 -0500] - Error: postoperation plugin Roles Plugin is
not started
[17/Sep/2007:20:52:06 -0500] - Error: object plugin Views is not started
As all the client machines depend on this server for authentication, and
as the weekend is still far away, I'm in big trouble now. I'm quite clueless
about what to do and would really appreciate any kind of help. And no,
unfortunately I don't have a backup to fall back on.
Thanking you in advance
bikas