Active Directory Troubleshooting
The Active Directory database consists of database files and log files, and the overall health of these files is key to Active Directory’s stability. In this section, we discuss problems related to these files, such as corrupted files or inconsistent data due to replication problems.
Active Directory Files
The key to a successful Active Directory backup is the system state. The Active Directory file system is built to handle full and complete restoration even when time has elapsed since the backup occurred.
The following Active Directory files are included in the system state:
- NTDS.DIT : This file is the Active Directory database file.
- EDB.LOG : This log file contains the transactions that have occurred since the last backup.
- EDB001.LOG, EDB002.LOG, etc. : These log files are similar to the EDB.LOG file; they are created when the EDB.LOG is full.
- EDB.CHK : This checkpoint file records which transactions in the EDB log files have already been committed to the database.
- RESx.LOG : These files are created when the EDB.LOG file creation fails. When the NTDS.DIT file is backed up, the current EDB.LOG file is deleted and a blank 10 MB EDB.LOG file is created. If there is not enough disk space for the new file creation, RESx.LOG (where x is a number, assigned in sequence to each file) files are created. One file is created per transaction. This allows more transactions to be recorded.
- *.PAT : These files are created when a transaction is split between log files. During a restore, these files are used to patch transactions that span more than one log file.
Troubleshooting Active Directory Replication
For an administrator, there is nothing more stressful than having domain controllers that are not replicating. By their very nature, domain controllers are multi-master replicas of one another, and all domain controllers within a domain should hold identical information. However, a number of issues can cause replication to fail, and that is when troubleshooting begins.
Understanding how something works is the first step in knowing how to troubleshoot it. Although a troubleshooting methodology will help you pin down the problem, you will have a better “feel” for where to start looking if you understand how the system works and behaves.
Active Directory replication is probably one of the most obscure topics you will have to learn. Most administrators know why they need replication, but they don’t know how it moves objects from one domain controller to another.
If your DNS infrastructure has issues, Active Directory replication may not work. Verify that your DNS infrastructure is stable and that name resolution is working correctly.
If DNS is working correctly, domain controllers will have a better chance of replicating the objects between one another. The first thing a domain controller does when replicating objects is examine the connection objects to other domain controllers. The domain controller will not be concerned about domain controllers other than those to which it has a connection.
Within the configuration partition, the domain controller finds the domain controllers to which it is connected. Active Directory returns the GUID associated with the domain controller defined on the connection object. Each domain controller registers service locator (SRV) records for the Active Directory services it supports, as well as an alias (CNAME) record named after its GUID. If you open the _msdcs zone for the domain, you will find the domain controller GUIDs. The following figure shows the GUIDs for the domain controllers within the techtutsonline.com domain. The GUIDs appear as the last two lines within the details pane.
After the domain controller has obtained the GUID for the partner domain controller, it sends a query to DNS to locate the host name; then, using the host name, it queries for the IP address. Once the domain controller has the partner domain controller’s IP address, it can initiate a Remote Procedure Call (RPC) connection to the partner and begin the replication process.
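To make the lookup chain concrete, here is a minimal Python sketch. It assumes the partner's DSA object GUID and the forest root domain name are already known (the GUID-based name format matches the From/To names that repadmin prints later in this section), and it leans on the operating system's resolver through socket.getaddrinfo, which follows the alias to the host record and returns its addresses. The function names are hypothetical and not part of any Windows tool.

```python
from __future__ import annotations

import socket
import uuid


def dsa_alias(dsa_guid: str, forest_root: str) -> str:
    """Build the <GUID>._msdcs.<forest root> name a DC registers in DNS."""
    # uuid.UUID accepts braced or upper-case input and normalises it.
    guid = str(uuid.UUID(dsa_guid))
    return f"{guid}._msdcs.{forest_root.rstrip('.').lower()}"


def resolve_partner(dsa_guid: str, forest_root: str) -> list[str]:
    """Resolve a partner DC's GUID name to its IP addresses via the system resolver."""
    alias = dsa_alias(dsa_guid, forest_root)
    # getaddrinfo follows the alias to the host record; a lookup failure
    # raises socket.gaierror, which points at a DNS registration problem.
    infos = socket.getaddrinfo(alias, None)
    # Keep each address once, in the order the resolver returned them.
    return list(dict.fromkeys(info[4][0] for info in infos))


if __name__ == "__main__":
    print(dsa_alias("16df425c-10b7-4d05-bea5-297741076ba1", "techtutsonline.com"))
```

If resolve_partner raises socket.gaierror for a partner's GUID, that is a strong hint the partner never registered (or has lost) its _msdcs record.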
Determining DNS Problems
When domain controllers are brought online, they register records on the DNS server they are configured to use. This registration includes the GUID records within the _msdcs zone. If a domain controller fails to register its GUID record, other domain controllers will not be able to locate it to replicate Active Directory objects. You can diagnose these DNS and replication problems with the RepAdmin and DCDiag utilities.
The RepAdmin utility can assist you when you are trying to determine the cause of replication problems. Some of its most popular options include checking the status of the Knowledge Consistency Checker (KCC), viewing the replication partners for domain controllers, and viewing which domain controllers have not replicated. If you want to view the KCC status, enter the repadmin /kcc command. If you want to view the replication status of the last replication attempt from a domain controller’s replication partners, enter the repadmin /showrepl command. The result of both commands is shown below:
C:\Windows\System32>repadmin /kcc
Repadmin: running command /kcc against full DC localhost
Default-First-Site-Name
Current Site Options: (none)
Consistency check on localhost successful.

C:\Windows\System32>repadmin /showrepl
Repadmin: running command /showrepl against full DC localhost
Default-First-Site-Name\DC1
DSA Options: IS_GC
Site Options: (none)
DSA object GUID: a58e2a76-5db1-4e6e-af13-0ab21c22519c
DSA invocationID: a58e2a76-5db1-4e6e-af13-0ab21c22519c

==== INBOUND NEIGHBORS ======================================

DC=techtutsonline,DC=com
    Default-First-Site-Name\DC2 via RPC
        DSA object GUID: 16df425c-10b7-4d05-bea5-297741076ba1
        Last attempt @ 2015-09-23 15:47:43 was successful.

CN=Configuration,DC=techtutsonline,DC=com
    Default-First-Site-Name\DC2 via RPC
        DSA object GUID: 16df425c-10b7-4d05-bea5-297741076ba1
        Last attempt @ 2015-09-23 15:32:06 was successful.

CN=Schema,CN=Configuration,DC=techtutsonline,DC=com
    Default-First-Site-Name\DC2 via RPC
        DSA object GUID: 16df425c-10b7-4d05-bea5-297741076ba1
        Last attempt @ 2015-09-23 15:32:06 was successful.

DC=DomainDnsZones,DC=techtutsonline,DC=com
    Default-First-Site-Name\DC2 via RPC
        DSA object GUID: 16df425c-10b7-4d05-bea5-297741076ba1
        Last attempt @ 2015-09-23 15:32:06 was successful.

DC=ForestDnsZones,DC=techtutsonline,DC=com
    Default-First-Site-Name\DC2 via RPC
        DSA object GUID: 16df425c-10b7-4d05-bea5-297741076ba1
        Last attempt @ 2015-09-23 15:32:06 was successful.
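If you save repadmin /showrepl output to a file, a short script can pick out the partitions whose last replication attempt did not succeed. The Python sketch below assumes the output is laid out one item per line the way repadmin prints it on screen (a naming-context line, a "<site>\<DC> via RPC" partner line, then a "Last attempt @ ..." line). Failure wording varies, so any "Last attempt" line that does not end in "was successful." is treated as a failure. The function name is hypothetical.

```python
from __future__ import annotations

import re

# A partner line looks like "Default-First-Site-Name\DC2 via RPC".
PARTNER_RE = re.compile(r"^(?P<partner>\S+\\\S+) via (?P<transport>\S+)$")


def inbound_status(showrepl_text: str) -> list[tuple[str, str, bool]]:
    """Return (naming context, partner, succeeded) for each inbound replication link."""
    results = []
    context = None
    partner = None
    in_inbound = False
    for raw in showrepl_text.splitlines():
        line = raw.strip()
        if line.startswith("===="):
            # Section header: only the INBOUND NEIGHBORS section is of interest.
            in_inbound = "INBOUND NEIGHBORS" in line
            continue
        if not in_inbound or not line:
            continue
        match = PARTNER_RE.match(line)
        if match:
            partner = match.group("partner")
        elif line.startswith("Last attempt @"):
            if context and partner:
                # Anything other than "... was successful." counts as a failure.
                results.append((context, partner, line.endswith("was successful.")))
        elif line.startswith(("DC=", "CN=")):
            # A naming context (partition), e.g. "DC=techtutsonline,DC=com".
            context, partner = line, None
    return results
```

Filtering the result for entries whose third field is False gives you the partitions and partners to investigate first.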
To view the replication summary, use the repadmin /replsum command.
C:\Windows\System32>repadmin /replsum
Replication Summary Start Time: 2015-09-23 15:50:49

Beginning data collection for replication summary, this may take awhile:
  .....

Source DSA          largest delta    fails/total %%   error
 DC1                       18m:43s    0 /   5    0
 DC2                       18m:43s    0 /   5    0

Destination DSA     largest delta    fails/total %%   error
 DC1                       18m:43s    0 /   5    0
 DC2                       18m:43s    0 /   5    0
Let’s discuss the output of the repadmin /replsum command in detail:
- By default, Active Directory replication is pull-based, so you should focus on the destination domain controllers first.
- In the above output, each dot after the first three represents a domain controller.
- Largest delta is the longest replication gap among all replication links for a particular domain controller.
- Total is the number of replica links for a particular domain controller (one for each naming context on each replication partner).
- Fails is the number of replica links that are failing to replicate for one reason or another. It can never be greater than the Total field.
- The %% column is the percentage of failing replica links relative to the total replica links on the domain controller.
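These columns are easy to misread on a large forest. The Python sketch below parses captured repadmin /replsum output into rows and recomputes the failure percentage from fails/total. It assumes the table layout repadmin prints (a "Source DSA" header, then a "Destination DSA" header, each followed by one row per domain controller) and delta values written in d/h/m/s units such as 18m:43s; other delta formats are returned as None. The class and function names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# A data row looks like " DC1   18m:43s   0 /   5   0".
ROW_RE = re.compile(r"^\s*(?P<dsa>\S+)\s+(?P<delta>\S+)\s+(?P<fails>\d+)\s*/\s*(?P<total>\d+)")
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


@dataclass
class ReplsumRow:
    dsa: str
    largest_delta_s: int | None  # None when the delta is not in d/h/m/s form
    fails: int
    total: int

    @property
    def percent(self) -> float:
        # Share of this DC's replica links that are failing.
        return 100.0 * self.fails / self.total if self.total else 0.0


def delta_seconds(text: str) -> int | None:
    """Convert a delta such as '18m:43s' or '01h:05m:10s' to seconds."""
    parts = re.findall(r"(\d+)([dhms])", text)
    if not parts:
        return None
    return sum(int(n) * UNIT_SECONDS[unit] for n, unit in parts)


def parse_replsum(output: str) -> dict[str, list[ReplsumRow]]:
    """Split 'repadmin /replsum' output into source and destination rows."""
    sections = {"source": [], "destination": []}
    current = None
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Source DSA"):
            current = "source"
        elif stripped.startswith("Destination DSA"):
            current = "destination"
        elif current:
            match = ROW_RE.match(line)
            if match:
                sections[current].append(ReplsumRow(
                    dsa=match["dsa"],
                    largest_delta_s=delta_seconds(match["delta"]),
                    fails=int(match["fails"]),
                    total=int(match["total"]),
                ))
    return sections
```

Because replication is pull-based, you would typically scan parse_replsum(text)["destination"] first, looking for rows with a non-zero fails count or an unusually large largest_delta_s.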
You can force synchronization of any Active Directory partition with the repadmin /sync command, which forces replication of a specific partition from the replication partner you name in the command. If you want to force a domain controller to synchronize with all of its replication partners, use the /syncall option instead. Because Active Directory replication is pull replication by default, the domain controller requests the data from its partners. You can change that behavior with the /P switch, which makes the domain controller push its changes to its partner domain controllers. The command looks like this:
repadmin /syncall dc_FQDN directory_partition /P
C:\Windows\System32>repadmin /syncall dc1 /P
CALLBACK MESSAGE: The following replication is in progress:
    From: a58e2a76-5db1-4e6e-af13-0ab21c22519c._msdcs.techtutsonline.com
    To  : 16df425c-10b7-4d05-bea5-297741076ba1._msdcs.techtutsonline.com
CALLBACK MESSAGE: The following replication completed successfully:
    From: a58e2a76-5db1-4e6e-af13-0ab21c22519c._msdcs.techtutsonline.com
    To  : 16df425c-10b7-4d05-bea5-297741076ba1._msdcs.techtutsonline.com
CALLBACK MESSAGE: SyncAll Finished.
SyncAll terminated with no errors.
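When you script forced synchronization, building the argument list explicitly keeps partition distinguished names (which contain commas and equals signs) intact. The Python sketch below is a hypothetical wrapper: it only uses the arguments shown in this section (/syncall, the DC name, an optional partition, and /P), runs repadmin through subprocess (so it only does anything useful on a Windows machine with the AD DS tools installed), and treats the "SyncAll terminated with no errors." line as the success signal rather than relying on repadmin's exit code.

```python
from __future__ import annotations

import subprocess


def syncall_args(dc: str, partition: str | None = None, push: bool = False) -> list[str]:
    """Build the argument list for 'repadmin /syncall dc [partition] [/P]'."""
    args = ["repadmin", "/syncall", dc]
    if partition:
        # One list element per argument, so the DN's commas and '=' survive as-is.
        args.append(partition)
    if push:
        # Push changes out to partners instead of the default pull.
        args.append("/P")
    return args


def run_syncall(dc: str, partition: str | None = None, push: bool = False) -> bool:
    """Run repadmin (Windows only) and report whether SyncAll finished cleanly."""
    completed = subprocess.run(
        syncall_args(dc, partition, push), capture_output=True, text=True
    )
    print(completed.stdout)
    return "terminated with no errors" in completed.stdout


if __name__ == "__main__":
    print(" ".join(syncall_args("dc1", push=True)))
```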
If you are not comfortable with command-line tools, you can open Active Directory Sites and Services under Administrative Tools, right-click the NTDS Settings object, and select All Tasks > Check Replication Topology. If you still receive errors because the KCC did not create the appropriate connection objects, manually create a connection object between the domain controllers. (If you use the Check Replication Topology option on the domain controller that is the Inter-Site Topology Generator (ISTG), you will recalculate both the intersite and the intrasite replication topology. If you run it from any other domain controller, only the intrasite topology is recalculated.)
DCDiag is a command-line utility that will run diagnostic tests against the domain controller. It runs several tests, and the output can span many screens. If you want to perform specific tests against the domain controller, use the /test: switch. For instance, if you want to make sure that the replication topology is fully interconnected, issue the dcdiag /test:topology command:
C:\Windows\System32>dcdiag /test:topology

Directory Server Diagnosis

Performing initial setup:
   Trying to find home server...
   Home Server = DC1
   * Identified AD Forest.
   Done gathering initial info.

Doing initial required tests

   Testing server: Default-First-Site-Name\DC1
      Starting test: Connectivity
         ......................... DC1 passed test Connectivity

Doing primary tests

   Testing server: Default-First-Site-Name\DC1
      Starting test: Topology
         ......................... DC1 passed test Topology

   Running partition tests on : ForestDnsZones
   Running partition tests on : DomainDnsZones
   Running partition tests on : Schema
   Running partition tests on : Configuration
   Running partition tests on : techtutsonline
   Running enterprise tests on : techtutsonline.com
To test that replication is functioning properly, issue the dcdiag /test:replications command.
C:\Windows\System32>dcdiag /test:replications

Directory Server Diagnosis

Performing initial setup:
   Trying to find home server...
   Home Server = DC1
   * Identified AD Forest.
   Done gathering initial info.

Doing initial required tests

   Testing server: Default-First-Site-Name\DC1
      Starting test: Connectivity
         ......................... DC1 passed test Connectivity

Doing primary tests

   Testing server: Default-First-Site-Name\DC1
      Starting test: Replications
         ......................... DC1 passed test Replications

   Running partition tests on : ForestDnsZones
   Running partition tests on : DomainDnsZones
   Running partition tests on : Schema
   Running partition tests on : Configuration
   Running partition tests on : techtutsonline
   Running enterprise tests on : techtutsonline.com
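Because dcdiag output can span many screens, it helps to reduce it to its pass/fail lines. The following Python sketch is a hypothetical helper that scans captured output for the "<server> passed test <name>" and "<server> failed test <name>" lines that dcdiag prints after its row of dots. It assumes dcdiag keeps that wording and ignores everything else, and it works whether or not the captured text still has its line breaks.

```python
from __future__ import annotations

import re

# Result lines look like "......................... DC1 passed test Connectivity".
RESULT_RE = re.compile(
    r"\.{3,}\s*(?P<server>\S+)\s+(?P<result>passed|failed)\s+test\s+(?P<test>\S+)"
)


def dcdiag_results(output: str) -> dict[str, dict[str, str]]:
    """Map server -> {test name: 'passed' or 'failed'} from dcdiag output."""
    results = {}
    for match in RESULT_RE.finditer(output):
        results.setdefault(match["server"], {})[match["test"]] = match["result"]
    return results


def failed_tests(output: str) -> list[tuple[str, str]]:
    """List (server, test) pairs that dcdiag reported as failed."""
    return [
        (server, test)
        for server, tests in dcdiag_results(output).items()
        for test, result in tests.items()
        if result == "failed"
    ]
```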
To run dcdiag in verbose mode against a specific domain controller and view its complete diagnostic status, use the /v and /s: switches:
C:\Windows\System32>dcdiag /v /s:dc2

Directory Server Diagnosis

Performing initial setup:
   * Connecting to directory service on server dc2.
   * Identified AD Forest.
   Collecting AD specific global data
   * Collecting site info.
   Calling ldap_search_init_page(hld,CN=Sites,CN=Configuration,DC=techtutsonline,DC=com,LDAP_SCOPE_SUBTREE,(objectCategory=ntDSSiteSettings),.......
   The previous call succeeded
   Iterating through the sites
   Looking at base site object: CN=NTDS Site Settings,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=techtutsonline,DC=com
   Getting ISTG and options for the site
   * Identifying all servers.
   Calling ldap_search_init_page(hld,CN=Sites,CN=Configuration,DC=techtutsonline,DC=com,LDAP_SCOPE_SUBTREE,(objectClass=ntDSDsa),.......
   The previous call succeeded....
   The previous call succeeded
   Iterating through the list of servers
   Getting information for the server CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=techtutsonline,DC=com
   objectGuid obtained
   InvocationID obtained
   dnsHostname obtained
   site info obtained
   All the info for the server collected
   Getting information for the server CN=NTDS Settings,CN=DC2,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=techtutsonline,DC=com
   objectGuid obtained
   InvocationID obtained
   dnsHostname obtained
   site info obtained
   All the info for the server collected
   * Identifying all NC cross-refs.
   * Found 2 DC(s). Testing 1 of them.
   Done gathering initial info.

Doing initial required tests

   Testing server: Default-First-Site-Name\DC2
      Starting test: Connectivity
         * Active Directory LDAP Services Check
         Determining IP4 connectivity
         * Active Directory RPC Services Check
         ......................... DC2 passed test Connectivity

Doing primary tests

   Testing server: Default-First-Site-Name\DC2
      Starting test: Advertising
         The DC DC2 is advertising itself as a DC and having a DS.
         The DC DC2 is advertising as an LDAP server
         The DC DC2 is advertising as having a writeable directory
         The DC DC2 is advertising as a Key Distribution Center
         The DC DC2 is advertising as a time server
         The DS DC2 is advertising as a GC.
         ......................... DC2 passed test Advertising
      Test omitted by user request: CheckSecurityError
[output cut]
Best Practices for Troubleshooting AD Replication
If your domain controllers are not replicating objects correctly, users will not be able to access the objects they need, and they may be unable to log on.
Keep the following tips in mind when troubleshooting active directory replication issues:
- Use the tools you are most familiar with when troubleshooting replication problems, but also become familiar with the tools you use less often.
- Verify the replication topology to make sure all of the domain controllers from all sites are interconnected.
- Urgent replication, such as replication of account lockouts, occurs immediately within a site, but the change is not replicated to other sites until the site link schedule allows it. Use the repadmin utility to force replication of the change.
- Create connection objects between domain controllers that hold FSMO roles and the servers that will act as their backup if the FSMO role holder fails. Make sure replication is occurring between the two servers.
Troubleshooting FSMO Roles
Flexible Single Master Operations (FSMO) roles are specialized services within Active Directory Domain Services that should be performed by only one domain controller at a time. We have discussed FSMO roles previously.
Because there can be only one domain controller holding each of the roles, you need to make sure you keep them operational. Of course, with some of these roles, getting them up and operational is more important than it is with others; however, you should still know what is required to get them into an operational state.
Importance of FSMO Roles
Each FSMO role is important within the forest. Without them, you would have no means of identifying objects correctly, and data corruption could occur if two or more administrators made conflicting changes to objects within the forest. If the current role holder is still running (for example, before you take it offline for maintenance or demote it), you can transfer the role to any functional domain controller. If the role holder has failed and you are sure it will not be able to come back online, you can seize the FSMO role on any functional domain controller. The whole procedure for transferring and seizing FSMO roles is discussed separately.
Best Practices for Troubleshooting FSMO Roles
Remember a few points about FSMO roles:
- Do not seize a role unless you are sure you will never reintroduce the original role holder to the network.
- If demoting a role holder, transfer the role to another domain controller first.
- Keep documentation that identifies the role holders and the domain controllers that are designated as the standby servers.
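The transfer-versus-seize rules above boil down to a small decision. The Python sketch below encodes them as a hypothetical helper for a runbook script; the two inputs (whether the current role holder is reachable, and whether you are certain it will never return) are judgments you supply, and the sketch does not move any roles itself.

```python
from enum import Enum


class FsmoAction(Enum):
    TRANSFER = "transfer the role to another DC"
    SEIZE = "seize the role on another DC"
    RESTORE_HOLDER = "bring the original holder back instead of seizing"


def fsmo_action(holder_online: bool, holder_never_returning: bool) -> FsmoAction:
    """Pick transfer, seize, or neither, following the rules of thumb above."""
    if holder_online:
        # A reachable holder (including one you are about to demote)
        # hands the role over gracefully.
        return FsmoAction.TRANSFER
    if holder_never_returning:
        # Seize only when the original holder will never rejoin the network.
        return FsmoAction.SEIZE
    # Holder is down but might come back: do not seize.
    return FsmoAction.RESTORE_HOLDER
```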
Troubleshooting Logon Failures
Nothing is more frustrating for users than receiving an error when they try to log on first thing in the morning. Immediately, their day is off to a bad start. Troubleshooting is your realm, so it’s up to you to determine what is causing the problem and to set things back on track. Within an Active Directory domain, several things could be at fault.
If you are using a Windows Server 2008 or Windows Server 2012 based domain, the default password policy requires complex passwords. Although this is a good policy from the standpoint of most security auditors, you will find that it can cause additional headaches. Complex passwords, while more secure, are also more difficult for users to remember. You will probably end up unlocking users’ accounts and resetting their passwords more often than you would with simpler passwords. You will also run into the problem of users writing down their passwords and leaving them near their systems. Controlling passwords, monitoring authentication, and maintaining a sensible password and lockout policy will help you minimize logon problems, but you will still have to troubleshoot these issues.
Auditing for Logon Problems
As with any troubleshooting, you should start by checking the event logs on the client system and on the domain controllers in its site. Although many administrators criticize the event logs, they contain some interesting and useful information. If you have enabled auditing of account logon and logon events, you will receive events in the security log as accounts authenticate or fail to authenticate. To watch for failures, you must enable failure auditing for authentication in an audit policy. Once you do that, you can search the security log for specific entries if users start having difficulty authenticating.
The following figure shows an example of a GPO that is being used to implement auditing for a domain. To capture information about authentication, set the options marked with a green box.
Once these settings are in place, you can review the security logs for common events. On domain controllers, look for information about account lockouts and changes to accounts. Event ID 675 shows the IP address of the client computer from which the bad password originated. If this IP address is not the computer from which the user normally authenticates, it may indicate an attack on the account. Event ID 644 appears if the Audit Account Management option is set for Success; this event is generated any time an account is locked out because of incorrect credentials. (These are Windows Server 2003-era event IDs; on Windows Server 2008 and later, the corresponding events are 4771 and 4740.)
You can also view the security logs on client systems and search for common event IDs. Event ID 529 is recorded if the system does not have the user account that is attempting to log on. This could be due to the user accidentally pressing the wrong keys when logging on, but it could also indicate someone trying to break into a system by guessing an account name. Event ID 531 indicates that the account attempting to authenticate is locked out or has been disabled by an administrator. (On Windows Vista, Windows Server 2008, and later, both conditions are logged as event ID 4625.)
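The reasoning about event 675 (a bad password arriving from an IP address the user does not normally use) is easy to automate once the events are exported. The Python sketch below assumes the events have already been pulled into simple records by whatever collection method you use; the SecurityEvent structure and the usual_ips mapping are hypothetical, and the descriptions simply restate this section's wording for the legacy event IDs.

```python
from __future__ import annotations

from dataclasses import dataclass

# Legacy security event IDs discussed in this section, in this section's wording.
EVENT_MEANING = {
    675: "bad password; records the client IP address it came from",
    644: "account locked out",
    529: "logon attempted with an account the system does not have",
    531: "account is locked out or disabled",
}


@dataclass
class SecurityEvent:
    event_id: int
    account: str
    client_ip: str = ""


def describe(event_id: int) -> str:
    """Return this section's description of an event ID."""
    return EVENT_MEANING.get(event_id, "not covered in this section")


def unexpected_bad_passwords(
    events: list[SecurityEvent], usual_ips: dict[str, set[str]]
) -> list[SecurityEvent]:
    """Return 675 events whose client IP is not one the account normally uses."""
    return [
        ev for ev in events
        if ev.event_id == 675 and ev.client_ip not in usual_ips.get(ev.account, set())
    ]
```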
By turning on Kerberos logging, you can have the system record more detailed information about authentication. To do so, you edit the registry manually. If you plan to edit the registry on a domain controller to enable Kerberos logging, open regedit and navigate to the following registry key:
You must create the REG_DWORD entry LogLevel. If you set the value of this entry to 1, you will be able to monitor the system event log for Event ID 4. If Event ID 4 appears in the log, it will indicate that a bad password was sent to the Kerberos service for authentication or that the account was locked out. If the error code within this event specifies 0x18 KDC_ERR_PREAUTH_FAILED, the password was incorrect. An error code of 0x12 KDC_ERR_CLIENT_REVOKED indicates that the account was locked out.
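When reviewing event ID 4 entries, a small lookup table keeps the two error codes above straight. This Python sketch only knows the two codes named in this section; the function name is hypothetical, and any other code is reported as not covered here rather than guessed at.

```python
from __future__ import annotations

# Kerberos error codes discussed in this section (from System log event ID 4).
KERBEROS_ERRORS = {
    0x18: ("KDC_ERR_PREAUTH_FAILED", "the password was incorrect"),
    0x12: ("KDC_ERR_CLIENT_REVOKED", "the account was locked out"),
}


def explain_kerberos_error(code: int | str) -> str:
    """Describe an error code given as an int or a string such as '0x18'."""
    # int(text, 0) honours the 0x prefix, so '0x18' and '24' both parse to 24.
    value = int(code, 0) if isinstance(code, str) else code
    name, meaning = KERBEROS_ERRORS.get(value, ("UNKNOWN", "not covered in this section"))
    return f"0x{value:X} {name}: {meaning}"
```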
Best Practices for Logon and Account-Lockout Troubleshooting
Logon issues frustrate administrators and users alike. The calls that flood in right after a mandatory password change can be overwhelming, but if you follow these tips, you may be able to reduce the resulting headaches:
- Only enable universal group membership caching if you want to reduce the replication across a WAN link and you have a small number of users who will be affected.
- Do not turn off universal group membership enumeration in a native-mode domain unless you are not using universal security groups.
- Turn on auditing for account logon and account management so that you can identify logon failures and can determine the causes.
- Take advantage of the new Account Lockout and Management Tools to aid in troubleshooting account lockout.
- Monitor the PDC Emulator for authentication attempts. All attempts with a bad password are forwarded to the PDC Emulator.
- Turn off logging when it is not necessary so it does not consume additional resources.