Hi,

The error:

[17/Sep/2007:20:52:06 -0500] - libdb: Ignoring log file: /opt/fedora-ds/slapd-isec-file/db/log.0000000206: magic number 0, not 40988

indicates that the Berkeley DB backend could not use the log file log.0000000206 because it is not a valid Berkeley DB log file. Since you mentioned that you had to shut down the system manually and run fsck when it came back up, one possibility is that log.0000000206 (and maybe other files) was corrupted. Have you checked the lost+found directory for any recovered files?
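
A "magic number 0" makes me suspect that the start of the file was zeroed out. If you want a quick way to confirm that, here is a rough Python sketch that reports whether the first few KB of a file are all zero bytes. The function name and the 4096-byte probe size are my own choices, and it deliberately does not try to parse the Berkeley DB log header, so treat it as a hint rather than a verdict:

```python
def leading_zero_bytes(path, probe=4096):
    """Return (bytes_read, all_zero) for the first `probe` bytes of a file.

    all_zero is True only if at least one byte was read and every byte is 0.
    """
    with open(path, "rb") as fh:
        head = fh.read(probe)
    return len(head), len(head) > 0 and head.count(0) == len(head)


# Example call (path taken from the error message above):
# print(leading_zero_bytes("/opt/fedora-ds/slapd-isec-file/db/log.0000000206"))
```

If it reports all zeros for log.0000000206 but not for the neighbouring log files, that would fit the fsck theory.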

In any case, before you do any more troubleshooting with the server, I would recommend taking a snapshot (tar ball) of the affected directory tree (/opt/fedora-ds and any other directories you can think of as belonging to the directory server) and storing it separately (in another directory or, better, on another machine). That way you can return to the current state and start over if a troubleshooting step makes things worse. Of course, if the files are already corrupt, I am not sure how useful the snapshot will be.
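
For the snapshot, plain tar is all you need. If you would rather script it, something like the following Python sketch should do. The function name and the timestamped file name are just my conventions, and the destination must be outside the tree you are archiving:

```python
import os
import tarfile
import time


def snapshot(src_dir, dest_dir):
    """Write a gzip-compressed tar ball of src_dir into dest_dir; return its path.

    dest_dir must not live inside src_dir, or the archive would try to
    include itself.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = os.path.basename(os.path.normpath(src_dir))
    dest = os.path.join(dest_dir, f"{base}-{stamp}.tar.gz")
    with tarfile.open(dest, "w:gz") as tar:
        # arcname keeps paths in the archive relative (fedora-ds/...).
        tar.add(src_dir, arcname=base)
    return dest


# Example call (the destination directory is a placeholder):
# print(snapshot("/opt/fedora-ds", "/some/other/disk"))
```

Copy the resulting file to another machine before touching anything under the db directory.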

Check whether everything is fine at the system level. Look back in the directory server error log to see what errors showed up the first time the directory server tried to start after the reboot. Also check the system log for disk or filesystem errors from around the time of the crash and the fsck.
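
To pick the relevant lines out of a long error log, a small filter like this might save some scrolling. It is only a sketch: the list of keywords is my guess at what matters here (taken from the messages you posted), and the log path in the comment is an assumption about where your instance keeps its error log:

```python
NEEDLES = ("libdb", "PANIC", "Disorderly Shutdown", "FAILED")


def interesting(lines, needles=NEEDLES):
    """Keep only the log lines that mention one of the needles."""
    return [ln.rstrip("\n") for ln in lines if any(n in ln for n in needles)]


# Example use (path is an assumption; adjust to your instance):
# with open("/opt/fedora-ds/slapd-isec-file/logs/errors", errors="replace") as fh:
#     print("\n".join(interesting(fh)))
```

The earliest matching lines after the reboot are the ones worth reading closely.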

Finally, check whether you happen to have taken any LDIF dumps of the directory server data at some point in the past, or whether the file system (or the whole system) was backed up for some other purpose. Do you have just one directory server instance running (i.e., only one master and no replicas/consumers)?

PS: A couple of things that would have helped in this scenario are regular backups of the system and regular backups of the directory server data (db2ldif.pl). It is also useful to have another system (or a virtual machine) in a development or test environment, similar to this production server in setup and operation, so that changes can be tested there before being deployed into production.

-=Venkat=-
gvenkat@gmail.com

On 9/17/07, Steven Jones <Steven.Jones@vuw.ac.nz> wrote:

Not knowing a huge amount about FDS/LDAP… I'd start with checking the OS. E.g.,

[17/Sep/2007:20:52:06 -0500] - Please make sure there is enough disk space for dbcache (10485760 bytes) and db region files

Suggests to me to check the filesystem with df -h to make sure there is space left… possibly there is a core dump or something that needs deleting… rare on Linux but not unknown on Solaris…
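
If you want to compare the free space against the figure in that message directly, a rough Python sketch along these lines would do it. The 10485760 comes from the log line; the function name is mine, and note the db region files need room on top of the dbcache, so "enough" here is only a lower bound:

```python
import shutil

DBCACHE_BYTES = 10485760  # dbcache size quoted in the error log


def has_room(path, needed=DBCACHE_BYTES):
    """Return (free_bytes, enough) for the filesystem that holds path."""
    free = shutil.disk_usage(path).free
    return free, free >= needed


# Example call (db directory taken from the log above):
# print(has_room("/opt/fedora-ds/slapd-isec-file/db"))
```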

Or maybe some mount point failed to mount as the OS considered it too damaged… make sure all the filesystems are mounted…
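
A quick way to check the mounts from a script, as a sketch only: you have to supply the list of directories that are supposed to be separate mount points on your box (compare with /etc/fstab). The paths in the comment are purely hypothetical examples, not something I know about your system:

```python
import os


def not_mounted(expected_mount_points):
    """Return the entries that are currently NOT mount points."""
    return [p for p in expected_mount_points if not os.path.ismount(p)]


# Example call (hypothetical list; use the mount points from your own fstab):
# print(not_mounted(["/", "/opt", "/var"]))
```

An empty list means everything you expected is mounted.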

Beyond this I cannot help, sorry.

Making no backups or at least not exporting the database is hopefully something you will not do again….

regards

Steven Jones
Senior  Linux/Unix/San/Vmware System Administrator
APG -Technology Integration Team
Victoria University of Wellington
Phone: +64 4 463 6272


From: fedora-directory-users-bounces@redhat.com [mailto:fedora-directory-users-bounces@redhat.com] On Behalf Of bikas gurung
Sent: Tuesday, 18 September 2007 3:50 p.m.
To: fedora-directory-users@redhat.com
Subject: [Fedora-directory-users] help....unable to start fedora server

Hi all,
I'm certainly in deep s*&#t now. I just updated my file-server with new updates and patches and tried to reboot it, but it hung with a kernel panic. So I had to shut down the system manually and run 'fsck' by hand afterwards. Everything seemed to run well after that. But this evening I found that I could not connect my PC to the file-server. When I checked, it turned out that the 'slapd' daemon had not started at all. I tried to start the server manually using the scripts (in /rc.d/init.d) but got an error. Here's the error logged in the log file:

Fedora-Directory/1.0.2 B2006.060.1928
        isec-file:636 (/opt/fedora-ds/slapd-isec-file)

[17/Sep/2007:20:52:06 -0500] - Fedora-Directory/1.0.2 B2006.060.1928 starting up
[17/Sep/2007:20:52:06 -0500] - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
[17/Sep/2007:20:52:06 -0500] - libdb: Ignoring log file: /opt/fedora-ds/slapd-isec-file/db/log.0000000206: magic number 0, not 40988
[17/Sep/2007:20:52:06 -0500] - libdb: Invalid log file: log.0000000206: Invalid argument
[17/Sep/2007:20:52:06 -0500] - libdb: PANIC: Invalid argument
[17/Sep/2007:20:52:06 -0500] - libdb: PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
[17/Sep/2007:20:52:06 -0500] - Database Recovery Process FAILED. The database is not recoverable. err=-30978: DB_RUNRECOVERY: Fatal error, run database recovery
[17/Sep/2007:20:52:06 -0500] - Please make sure there is enough disk space for dbcache (10485760 bytes) and db region files
[17/Sep/2007:20:52:06 -0500] - start: Failed to init database, err=-30978 DB_RUNRECOVERY: Fatal error, run database recovery
[17/Sep/2007:20:52:06 -0500] - Failed to start database plugin ldbm database
[17/Sep/2007:20:52:06 -0500] - WARNING: ldbm instance userRoot already exists
[17/Sep/2007:20:52:06 -0500] - WARNING: ldbm instance NetscapeRoot already exists
[17/Sep/2007:20:52:06 -0500] binder-based resource limits - nsLookThroughLimit: parameter error (slapi_reslimit_register() already registered)
[17/Sep/2007:20:52:06 -0500] - start: Resource limit registration failed
[17/Sep/2007:20:52:06 -0500] - Failed to start database plugin ldbm database
[17/Sep/2007:20:52:06 -0500] - Error: Failed to resolve plugin dependencies
[17/Sep/2007:20:52:06 -0500] - Error: preoperation plugin 7-bit check is not started
[17/Sep/2007:20:52:06 -0500] - Error: accesscontrol plugin ACL Plugin is not started
[17/Sep/2007:20:52:06 -0500] - Error: preoperation plugin ACL preoperation is not started
[17/Sep/2007:20:52:06 -0500] - Error: postoperation plugin Class of Service is not started
[17/Sep/2007:20:52:06 -0500] - Error: preoperation plugin HTTP Client is not started
[17/Sep/2007:20:52:06 -0500] - Error: database plugin ldbm database is not started
[17/Sep/2007:20:52:06 -0500] - Error: object plugin Legacy Replication Plugin is not started
[17/Sep/2007:20:52:06 -0500] - Error: object plugin Multimaster Replication Plugin is not started
[17/Sep/2007:20:52:06 -0500] - Error: postoperation plugin Roles Plugin is not started
[17/Sep/2007:20:52:06 -0500] - Error: object plugin Views is not started

As all the client machines depend on this server for authentication and the weekend is still far away, I'm in big trouble now. I'm quite clueless about what to do and would really appreciate any kind of help. And no, unfortunately I don't have a backup to fall back on.

Thanking you in advance
bikas