Active Directory Database Corruption/Recovery

Is your Active Directory database corrupt, and you don’t know what to do?

Are you stuck with a domain controller that shows the message “Directory is Rebuilding Indices” at startup and then, after a long time, fails?

Did you find corruption messages like these in the event log?
NTDS ISAM Event ID: 467 Database Corruption Error
NTDS Replication Event ID: 1084 Replication Error
NTDS Replication Event ID: 2108 Replication Error
NTDS General Internal Event ID: 1173 Processing Warning
…and others.

What about error descriptions like these?
8451 The replication operation encountered a database error.
-1414 JET_errSecondaryIndexCorrupted, Secondary index is corrupt. The database must be defragmented.

Yep, these are some of the corruption errors that you may find if your Active Directory (AD) database (DB) is “dead”.

1 – Before proceeding, let me tell you that in scenarios like this one you should always try to get the best help possible. Best help means calling Microsoft PSS. They have the necessary experience and documentation to help you with these problems.

2 – The recovery solutions posted below do not guarantee that your problem will be fixed. They are general recommendations that you may decide to follow or not. Use the information in this post at your own RISK!!! And remember to ALWAYS test in lab environments before going to production.

That said, what options do you have when this happens?

Note: You should always try to find the root cause of these types of problems. They are normally related to hardware problems, antivirus configurations, viruses, power outages, etc. If you don’t identify the root cause, there’s a good chance you will end up right where you started.

Now it’s time to recover…

******************************************************************
Scenario 1: Recover From Backup *******************************
******************************************************************
1. Restore the DC from its latest backup. At a minimum you need the System State backup to recover the AD DB. To restore the System State data on a domain controller, you must first start the computer in Directory Services Restore Mode (DSRM). This allows you to restore the SYSVOL directory and the Active Directory database.

2. To access Directory Services Restore Mode, reboot the server, press F8 during startup, and select it from the list of startup options. If you’re using a third-party backup solution, consult the vendor documentation for domain controller backup and recovery scenarios. If you’re using NTBackup from Microsoft Windows, review the Microsoft documentation on backing up and restoring System State data.

Note: Your DC backup is only valid if it’s within the forest tombstone lifetime.
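
The tombstone check in the note above is simple date arithmetic, so here is a minimal Python sketch of it. The function name and the 60-day value are my own assumptions for illustration; the tombstone lifetime is commonly 60 or 180 days depending on how the forest was created, so read the real value from your forest before relying on it.

```python
from datetime import datetime, timedelta

def backup_is_usable(backup_time, now, tombstone_lifetime_days):
    """Return True if the backup is younger than the tombstone lifetime."""
    age = now - backup_time
    # A backup dated in the future is treated as unusable as well.
    return timedelta(0) <= age < timedelta(days=tombstone_lifetime_days)

# Hypothetical dates and lifetime, for illustration only.
now = datetime(2024, 6, 1)
print(backup_is_usable(datetime(2024, 5, 1), now, 60))  # backup is 31 days old
print(backup_is_usable(datetime(2024, 1, 1), now, 60))  # backup is 152 days old
```

The comparison is deliberately strict: a backup that is exactly as old as the tombstone lifetime is rejected rather than risked.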

3. After the system restore, reboot the server. If everything is OK, find the root cause of the problem and fix it.

******************************************************************
Scenario 2: Rebuild the Domain controller **********************
******************************************************************

If you have more than one domain controller, you can rebuild the DC that is having problems and then promote it again.

1. Remove Active Directory from the DC. Normally this is done with dcpromo /forceremoval, but in corruption scenarios that is unlikely to work. The alternative is to format the hard drive or replace it with a new one (back up the files that you need first). Just MAKE SURE that the old Active Directory configuration IS GONE from the DC and that the corrupted installation is NEVER AGAIN CONNECTED to the network where the ORIGINAL HEALTHY DCs are. It is very important to guarantee this, or you may end up in a complete forest corruption scenario. Perhaps formatting the drive is the best option here… Just in case 🙂

2. The second step is seizing the FSMO roles. No, it’s NOT a “transfer”, it’s a SEIZE. Transfers are only possible when the DC that holds the FSMO roles is online, and that’s not the case because we formatted the drive, right?

If your “formatted” DC held any FSMO roles, you must seize them on another online DC. To find out whether it held any, open a command prompt on a healthy DC and type the following (first install the Support Tools from the \Support directory of your Windows CD):

netdom query fsmo

This command returns the FSMO owners for the forest and for the domain where you’re performing the query. A forest has at least five FSMO roles, assigned to one or more domain controllers: 2 are forest-wide and 3 exist in each domain in the forest.

The five FSMO roles are:
Schema Master (Forest)
Domain naming master (Forest)
Infrastructure Master (Domain)
Relative ID (RID) Master (Domain)
PDC Emulator (Domain)

If the command returns the “formatted” DC as the owner of any of these FSMO roles, you need to seize them on a different, online domain controller. To learn more about that process, check:
Using Ntdsutil.exe to transfer or seize FSMO roles to a domain controller
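
If you would rather not eyeball the netdom output, the ownership check can be sketched in a few lines of Python. The sample text, the host names, and the function name below are assumptions for illustration; the exact role labels that netdom prints can differ between Windows versions, so treat this as a sketch and not as a parser for every release.

```python
def roles_held_by(netdom_output, dc_name):
    """Return the role labels whose owner matches dc_name (case-insensitive)."""
    roles = []
    for line in netdom_output.splitlines():
        # The owner is assumed to be the last whitespace-separated token.
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue
        role, owner = parts
        if owner.lower() == dc_name.lower():
            roles.append(role.strip())
    return roles

# Hypothetical output of "netdom query fsmo" (layout and names are assumed).
SAMPLE = """\
Schema master               dc1.example.local
Domain naming master        dc1.example.local
PDC                         dc2.example.local
RID pool manager            dc1.example.local
Infrastructure master       dc2.example.local
The command completed successfully.
"""

print(roles_held_by(SAMPLE, "DC1.example.local"))
```

Any role this returns for the “formatted” DC is one that has to be seized before you move on to metadata cleanup.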

3. The next step is to perform metadata cleanup. You need to manually remove all remaining entries for the corrupted DC from the AD database. To do that, follow:
How to remove data in Active Directory after an unsuccessful domain controller demotion
You should also remove any DNS entries related to that DC.

4. Okay, take a deep breath… Wait or force replication so every DC knows about the changes that you made.

5. Once you know that the changes have replicated successfully to all existing DCs, it should be safe to promote the server back to a domain controller. But wait!!! Have you determined the root cause of the problem yet? No!!! Fix that first. You don’t want to end up back in the initial scenario, right?!

******************************************************************
Scenario 3: Manually FIX AD Database **************************
******************************************************************

This option should be used as a last resort. As I said at the beginning of this post, follow it or not at your own RISK!!! Remember to ALWAYS test in lab environments before going to production.

To manually fix the AD DB, go through ALL of the following steps:

1. Reboot the server and press F8. Choose Directory Services Restore Mode from the Menu.
2. Check the physical location of the NTDS folder (Normally at %WINDIR%\NTDS\).
3. Back up the NTDS folder (copy the folder to a different drive, or to the same drive under a different name, e.g. NTDSBK). If something goes wrong, you can always replace the original files with this copy.
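
A plain copy from Explorer or the command prompt is all step 3 needs. If a Python interpreter happens to be available on the server (an assumption, it is not part of Windows), the copy plus a basic size check could be sketched like this. The function name and the paths in the comment are hypothetical.

```python
import shutil
from pathlib import Path

def backup_folder(source, destination):
    """Copy source to destination, refusing to overwrite an existing backup."""
    source, destination = Path(source), Path(destination)
    if not source.is_dir():
        raise FileNotFoundError(f"source folder not found: {source}")
    if destination.exists():
        raise FileExistsError(f"backup already exists: {destination}")
    shutil.copytree(source, destination)
    # Compare file sizes so a truncated copy is noticed right away.
    for src_file in source.rglob("*"):
        if src_file.is_file():
            copy = destination / src_file.relative_to(source)
            if not copy.is_file() or copy.stat().st_size != src_file.stat().st_size:
                raise OSError(f"copy mismatch: {copy}")
    return destination

# Hypothetical usage, run from Directory Services Restore Mode:
# backup_folder(r"C:\WINDOWS\NTDS", r"D:\NTDSBK")
```

Refusing to overwrite an existing destination is a deliberate choice: the first backup is the one taken before any repair attempt, and it is the one you want to keep.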

4. Check the permissions for the “NTDS” folder.

Windows Server 2003 default permissions:

Account         Permission                     Applies to
System          Full Control                   This folder, subfolders and files
Administrators  Full Control                   This folder, subfolders and files
Creator Owner   Full Control                   Subfolders and files only
Local Service   Create Folders / Append Data   This folder and subfolders

5. Check the %WINDIR%\Sysvol\Sysvol folder to make sure it is shared.

6. Check the permissions on the %WINDIR%\Sysvol\Sysvol share. Compare them with those on another online DC.

Note: You may not be able to change the permissions on these folders if the Active Directory database is unavailable because it is damaged. It is still best to know whether the permissions are set correctly before you start the recovery process, because the database may not be the problem at all.

7. Make sure there is a folder in the Sysvol share labeled with the correct name for the domain.

8. Open a command prompt and run NTDSUTIL to verify the paths for the NTDS.dit file. They should match the physical structure from step 2.
From the command prompt, type:

ntdsutil files info

Output that is similar to the following appears:
C:\ NTFS (Fixed Drive) free(850.4 Mb) total(10.2Gb)
DS Path Information:
  Database   : C:\WINDOWS\NTDS\ntds.dit - 10.1 Mb
  Backup dir : C:\WINDOWS\NTDS\dsadata.bak
  Working dir: C:\WINDOWS\NTDS
  Log dir    : C:\WINDOWS\NTDS - 20.2 Mb total
               temp.edb - 1.1 Mb
               res2.log - 10.0 Mb
               res1.log - 10.0 Mb
               edb00001.log - 10.0 Mb
               edb.log - 10.0 Mb

This information is pulled directly from the registry subkey “HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NTDS\Parameters”. Incorrect paths can prevent Active Directory from starting correctly. If that is your case, check KB240362.
Type quit to end the NTDSUTIL session.
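
Comparing the reported paths with the folder you found in step 2 is easy to get wrong by eye, so here is a small Python sketch of that comparison. It assumes the output layout shown in this post; the function names are mine, and the sample is trimmed to the four path lines.

```python
import re

LABELS = re.compile(r"^\s*(Database|Backup dir|Working dir|Log dir)\s*:\s*(.+)$")
SIZE_NOTE = re.compile(r"\s+[-–]\s+[\d.]+\s*Mb.*$", re.IGNORECASE)

def parse_ds_paths(info_output):
    """Extract the four DS paths from 'ntdsutil files info' style output."""
    paths = {}
    for line in info_output.splitlines():
        match = LABELS.match(line)
        if match:
            label, value = match.groups()
            # Drop the trailing size note such as " - 10.1 Mb".
            paths[label] = SIZE_NOTE.sub("", value).strip()
    return paths

def paths_outside(paths, expected_folder):
    """Return the entries that do not live in or under expected_folder."""
    expected = expected_folder.lower().rstrip("\\")
    return {
        label: value
        for label, value in paths.items()
        if value.lower() != expected and not value.lower().startswith(expected + "\\")
    }

# Sample taken from the listing in this post (trimmed to the path lines).
SAMPLE = r"""
Database : C:\WINDOWS\NTDS\ntds.dit - 10.1 Mb
Backup dir : C:\WINDOWS\NTDS\dsadata.bak
Working dir: C:\WINDOWS\NTDS
Log dir : C:\WINDOWS\NTDS - 20.2 Mb total
"""

paths = parse_ds_paths(SAMPLE)
print(paths)
print(paths_outside(paths, r"C:\WINDOWS\NTDS"))
```

An empty result from paths_outside means every path points into the folder you expected; anything else is worth checking against the registry subkey before you continue.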

9. Rename the edb.chk file and try to boot to Normal mode. If that fails, proceed with the next steps.

10. Reboot the server and press F8. Choose Directory Services Restore Mode from the menu. From the command prompt, use ESENTUTL to check the integrity of the database. You can also use NTDSUTIL to check the integrity, but ESENTUTL is usually more reliable.

To perform the integrity check, open a command prompt and type the following command, replacing path with the folder that holds ntds.dit:

esentutl /g "path\ntds.dit" /!10240 /8 /o

The output tells you whether the database is inconsistent, and may report Jet error -1206 (JET_errDatabaseCorrupted), which states that the database is corrupt. If the database is inconsistent or corrupt, it needs to be recovered or repaired. To recover the database, type the following at the command prompt:
NTDSUTIL
Files
Recover

If this fails with an error, type quit until you are back at the command prompt, then repair the database using ESENTUTL. Type the following command:

esentutl /p "path\ntds.dit" /!10240 /8 /o

Note: If you leave out the switches at the end of the command, you will most likely get Jet error -1213, “Page size mismatch”.

11. Delete the log files inside NTDS directory, but do not delete or move the ntds.dit file.
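
Before deleting anything in step 11, it helps to see exactly which files would go. The sketch below only lists candidates and deletes nothing. It assumes that “log files” means the files with a .log extension in the NTDS folder, and the function name is hypothetical.

```python
from pathlib import Path

def log_files_to_delete(ntds_folder):
    """List the *.log files in the folder; ntds.dit and all other files are skipped."""
    folder = Path(ntds_folder)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".log"
    )

# Hypothetical usage (dry run only, nothing is removed):
# for path in log_files_to_delete(r"C:\WINDOWS\NTDS"):
#     print(path.name)
```

Keeping this as a dry run is intentional: review the list, confirm that ntds.dit is not on it, and only then delete the files by hand.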

12. The NTDSUTIL tool needs to be run again to check the Integrity of the database and to perform Semantic Database analysis. To check the integrity, at the command prompt type:

NTDSUTIL
Files
Integrity

The output should tell you that the integrity check completed successfully and prompt that you should perform a Semantic Database Analysis.
Type quit.

To perform the Semantic Database Analysis, type the following at the NTDSUTIL prompt:
Semantic Database Analysis
Go

The output will tell you that the Analysis completed successfully.
Type quit and close the command prompt.

NOTE: If you get errors running the analysis, type the following at the semantic checker prompt:
semantic checker: go fixup

This puts the checker in Fixup mode, which should fix whatever errors there were.

13. Okay, take a deep breath… Review all steps…

14. Reboot the server to Normal Mode.

Hopefully one of these options will fix your problem 🙂

Additional Information:
Complete a Semantic Database Analysis for the Active Directory
Error Message: Lsass.exe – System Error : Security Accounts Manager

Windows 2000 DCs Unable to Boot into Active Directory
Use Ntdsutil to Manage Active Directory Files from the Command

This posting is provided “AS IS” with no warranties, and confers no rights.
