Active Directory Troubleshooting

Publish Date: September 23, 2015


The Active Directory database consists of database files and log files, and the overall health of these files is key to Active Directory’s stability. In this section, we discuss problems related to these files, such as corrupted files or inconsistent data due to replication problems.

Active Directory Files

The key to a successful Active Directory backup is the system state, which on a domain controller includes the Active Directory database and its log files. The Active Directory database is built to support a full and complete restoration even when some time has elapsed since the backup was taken.

The following files make up the Active Directory database portion of the system state:

  • NTDS.DIT : This file is the Active Directory database file.
  • EDB.LOG : This log file contains the transactions that have occurred since the last backup.
  • EDB001.LOG, EDB002.LOG, etc. : These log files are similar to the EDB.LOG file; they are created when the EDB.LOG is full.
  • EDB.CHK : This checkpoint file tracks which transactions in the EDB log files have already been written to the database, so that during recovery only the remaining transactions need to be replayed.
  • RESx.LOG : These reserve log files are used when a new EDB.LOG file cannot be created. When the NTDS.DIT file is backed up, the current EDB.LOG file is deleted and a blank 10 MB EDB.LOG file is created. If there is not enough disk space to create the new file, the RESx.LOG files (where x is a sequential number) are used instead, so that pending transactions can still be recorded.
  • *.PAT : These files are created when a transaction is split between log files. During a restore, these files are used to patch transactions that span more than one log file.
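The roles of these files can be captured in a small lookup. The following Python sketch classifies a file name from the NTDS folder according to the list above. The function name `classify_ntds_file` and the name patterns (for example, treating `EDB` followed by a hexadecimal sequence number as an older log) are illustrative assumptions, not an official naming specification.

```python
import re

# Hypothetical helper: maps a file name to the role described in the list above.
def classify_ntds_file(name: str) -> str:
    upper = name.upper()  # Windows file names are case-insensitive
    if upper == "NTDS.DIT":
        return "database"
    if upper == "EDB.LOG":
        return "current log"
    # Assumption: older logs are "EDB" followed by a (hexadecimal) sequence number.
    if re.fullmatch(r"EDB[0-9A-F]+\.LOG", upper):
        return "older log"
    if upper == "EDB.CHK":
        return "checkpoint"
    # Assumption: reserve logs are "RES" followed by a decimal number.
    if re.fullmatch(r"RES\d+\.LOG", upper):
        return "reserve log"
    if upper.endswith(".PAT"):
        return "patch"
    return "unknown"


if __name__ == "__main__":
    for f in ["NTDS.DIT", "EDB.LOG", "EDB001.LOG", "EDB.CHK", "RES1.LOG", "EDB.PAT"]:
        print(f, "->", classify_ntds_file(f))
```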

Troubleshooting Active Directory Replication

For an administrator, there is nothing more stressful than having domain controllers that are not replicating. By their very nature, domain controllers are multi-master replicas of one another, and all domain controllers within a domain should eventually hold identical information. However, a number of issues can interrupt replication, and that is when troubleshooting begins.

Understanding Replication

Understanding how something works is the first step in knowing how to troubleshoot it. While a troubleshooting methodology will help you pinpoint the problem, understanding how a component works and behaves gives you a better “feel” for where to start looking.

Active Directory replication is probably one of the more obscure topics you will have to learn. Most administrators know why they need replication, but many don’t know how it actually gets objects from one domain controller to another.

If your DNS infrastructure has issues, Active Directory replication may not work. Verify that your DNS infrastructure is stable and that name resolution is working correctly.

If DNS is working correctly, domain controllers have a much better chance of replicating objects with one another. When replicating, the first thing a domain controller does is examine its connection objects to other domain controllers; it is not concerned with any domain controller for which it has no connection object.

Within the configuration partition, the domain controller finds the domain controllers to which it is connected. Active Directory returns the GUID associated with the domain controller defined on the connection object. Each domain controller registers service locator (SRV) records for the Active Directory services it supports, as well as an alias (CNAME) record named after its GUID. If you open the _msdcs zone for the domain, you will find the domain controller GUIDs. The following figure shows the GUIDs for the domain controllers within the techtutsonline.com domain; they appear as the last two lines in the details pane.

Domain Controller GUIDs registered in DNS

After the domain controller has obtained the GUID for the partner domain controller, it sends a query to DNS to locate the host name; then, using the host name, it queries for the IP address. Once the domain controller has the partner domain controller’s IP address, it can initiate a Remote Procedure Call (RPC) connection to the partner and begin the replication process.
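The name-resolution chain described above can be sketched in Python. The alias format `<DSA GUID>._msdcs.<domain>` matches the names that appear in the repadmin /syncall output in this article; the helper names `dsa_alias` and `resolve_partner` are hypothetical. The lookup uses only the standard `socket` module, which follows the alias to a host name and its IPv4 addresses. A domain controller performs these steps internally; this is just an illustration you could run from a domain-joined machine to check that a partner's GUID record resolves.

```python
import socket


def dsa_alias(dsa_guid: str, dns_domain: str) -> str:
    """Build the DNS alias a domain controller registers for its DSA GUID."""
    return f"{dsa_guid.strip().lower()}._msdcs.{dns_domain.strip().lower()}"


def resolve_partner(dsa_guid: str, dns_domain: str):
    """Follow the GUID alias to the partner's host name, then to its IPv4 addresses.

    Returns (host_name, [ip, ...]) or None when the name does not resolve,
    which is the symptom of a missing or stale GUID registration.
    """
    alias = dsa_alias(dsa_guid, dns_domain)
    try:
        host_name, _aliases, addresses = socket.gethostbyname_ex(alias)
    except OSError:  # socket.gaierror and socket.herror both derive from OSError
        return None
    return host_name, addresses


if __name__ == "__main__":
    print(dsa_alias("16df425c-10b7-4d05-bea5-297741076ba1", "techtutsonline.com"))
    # 16df425c-10b7-4d05-bea5-297741076ba1._msdcs.techtutsonline.com
```

If `resolve_partner` returns None for a partner's GUID, replication to that partner will fail at the very first step, before any RPC connection is attempted.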

Determining DNS Problems

When domain controllers are brought online, they register records on the DNS server they are configured to use. This registration includes the GUIDs that are registered within the _msdcs zone. If a domain controller fails to register its GUID, other domain controllers will not be able to locate it to replicate Active Directory objects. You can diagnose these DNS problems by using the RepAdmin and DCDiag utilities.

Using RepAdmin

The RepAdmin utility can help you determine the cause of replication problems. Some of its most popular options let you run the Knowledge Consistency Checker (KCC), view the replication partners of a domain controller, and see which domain controllers have not replicated. To force the KCC to recalculate the inbound replication topology and report whether the consistency check succeeded, enter the repadmin /kcc command. To view the status of the last replication attempt from each of a domain controller’s replication partners, enter the repadmin /showrepl command. The output of both commands is shown below:

C:\Windows\System32>repadmin /kcc

Repadmin: running command /kcc against full DC localhost
Default-First-Site-Name
Current Site Options: (none)
Consistency check on localhost successful.

C:\Windows\System32>repadmin /showrepl

Repadmin: running command /showrepl against full DC localhost
Default-First-Site-Name\DC1
DSA Options: IS_GC
Site Options: (none)
DSA object GUID: a58e2a76-5db1-4e6e-af13-0ab21c22519c
DSA invocationID: a58e2a76-5db1-4e6e-af13-0ab21c22519c

==== INBOUND NEIGHBORS ======================================

DC=techtutsonline,DC=com
    Default-First-Site-Name\DC2 via RPC
        DSA object GUID: 16df425c-10b7-4d05-bea5-297741076ba1
        Last attempt @ 2015-09-23 15:47:43 was successful.

CN=Configuration,DC=techtutsonline,DC=com
    Default-First-Site-Name\DC2 via RPC
        DSA object GUID: 16df425c-10b7-4d05-bea5-297741076ba1
        Last attempt @ 2015-09-23 15:32:06 was successful.

CN=Schema,CN=Configuration,DC=techtutsonline,DC=com
    Default-First-Site-Name\DC2 via RPC
        DSA object GUID: 16df425c-10b7-4d05-bea5-297741076ba1
        Last attempt @ 2015-09-23 15:32:06 was successful.

DC=DomainDnsZones,DC=techtutsonline,DC=com
    Default-First-Site-Name\DC2 via RPC
        DSA object GUID: 16df425c-10b7-4d05-bea5-297741076ba1
        Last attempt @ 2015-09-23 15:32:06 was successful.

DC=ForestDnsZones,DC=techtutsonline,DC=com
    Default-First-Site-Name\DC2 via RPC
        DSA object GUID: 16df425c-10b7-4d05-bea5-297741076ba1
        Last attempt @ 2015-09-23 15:32:06 was successful.
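When a domain controller has many partners and partitions, scanning /showrepl output by eye gets tedious. The Python sketch below walks output in the format shown above and reports every partition and neighbor pair whose last attempt was not reported as successful. It assumes that a failed attempt still begins with `Last attempt @` (repadmin typically continues such a line with something like `failed, result ...`); the function name `failing_links` is hypothetical.

```python
import re

# Neighbor lines look like: Default-First-Site-Name\DC2 via RPC
NEIGHBOR = re.compile(r"^(.*\S) via \S+$")


def failing_links(showrepl_output: str):
    """Return (partition, neighbor) pairs whose last replication attempt did not succeed."""
    failures = []
    partition = neighbor = None
    for line in showrepl_output.splitlines():
        stripped = line.strip()
        if line.startswith(("DC=", "CN=")):  # partition names start in column 0
            partition, neighbor = stripped, None
        elif (match := NEIGHBOR.match(stripped)):
            neighbor = match.group(1)
        elif stripped.startswith("Last attempt @") and "was successful" not in stripped:
            failures.append((partition, neighbor))
    return failures
```

For example, you could save the output of repadmin /showrepl to a text file, read it in Python, and pass it to `failing_links`; an empty list means every inbound link reported success.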

To view the replication summary, use the repadmin /replsum command.

C:\Windows\System32>repadmin /replsum
Replication Summary Start Time: 2015-09-23 15:50:49

Beginning data collection for replication summary, this may take awhile:
  .....

Source DSA          largest delta    fails/total %%   error
 DC1                       18m:43s    0 /   5    0
 DC2                       18m:43s    0 /   5    0

Destination DSA     largest delta    fails/total %%   error
 DC1                       18m:43s    0 /   5    0
 DC2                       18m:43s    0 /   5    0

Let’s discuss the output of the repadmin /replsum command in detail:

Active Directory replication is pull-based by default, so you should focus on the destination domain controllers first.

In the above output, each dot after the first three represents a domain controller.

Largest delta denotes the longest replication gap among all replication links for a particular domain controller.

Total is the number of replica links for a particular domain controller (one for each naming context on each replication partner).

Fails is the number of replica links that are failing to replicate for one reason or another. It can never be greater than Total.

The %% column shows the percentage of failed replica links relative to the total replica links on the domain controller.
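The relationship between these columns is simple arithmetic: the percentage is Fails divided by Total, times 100. The Python sketch below parses one data row in the layout shown above and recomputes that figure. The function name `parse_replsum_row` is hypothetical, and the regular expression is an assumption based on the sample output; it also tolerates a multi-word delta such as `>60 days` and a trailing error column.

```python
import re

# name, largest delta, fails / total, percent, optional error text
ROW = re.compile(r"^\s*(\S+)\s+(.+?)\s+(\d+)\s*/\s*(\d+)\s+(\d+)(?:\s+(.*))?$")


def parse_replsum_row(line: str) -> dict:
    """Parse one 'repadmin /replsum' data row and recompute the failure percentage."""
    match = ROW.match(line)
    if match is None:
        raise ValueError(f"not a replsum data row: {line!r}")
    name, delta, fails, total, pct, error = match.groups()
    fails, total = int(fails), int(total)
    if fails > total:  # Fails can never exceed Total
        raise ValueError("fails cannot exceed total")
    return {
        "dsa": name,
        "largest_delta": delta,
        "fails": fails,
        "total": total,
        "reported_pct": int(pct),
        "computed_pct": 100 * fails / total if total else 0.0,
        "error": error or "",
    }
```

Applied to the DC1 row above, the sketch returns 0 failures out of 5 links, that is, 0.0 percent; the header lines do not match the pattern and raise ValueError, so they are easy to skip.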

You can force synchronization of any Active Directory partition with the repadmin /sync command, which replicates a specific partition from the replication partner you name in the command. To synchronize a domain controller with all of its replication partners, use the /syncall option instead. Active Directory replication is pull replication by default, meaning that a domain controller requests data from its partners. You can change that behavior with the /P switch, which makes the domain controller push its changes out to its partner domain controllers. The command looks like this:

repadmin /syncall dc_FQDN directory_partition /P

C:\Windows\System32>repadmin /syncall dc1 /P
CALLBACK MESSAGE: The following replication is in progress:
    From: a58e2a76-5db1-4e6e-af13-0ab21c22519c._msdcs.techtutsonline.com
    To  : 16df425c-10b7-4d05-bea5-297741076ba1._msdcs.techtutsonline.com
CALLBACK MESSAGE: The following replication completed successfully:
    From: a58e2a76-5db1-4e6e-af13-0ab21c22519c._msdcs.techtutsonline.com
    To  : 16df425c-10b7-4d05-bea5-297741076ba1._msdcs.techtutsonline.com
CALLBACK MESSAGE: SyncAll Finished.
SyncAll terminated with no errors.
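If you script forced synchronization, for example as part of a scheduled health check, it helps to build the repadmin command line in one place. The Python sketch below assembles the arguments for repadmin /syncall from the options discussed here (/P to push) plus two switches commonly combined with it: /A (all partitions held by the domain controller) and /e (enterprise, including partners in other sites). The function name `syncall_args` is hypothetical.

```python
def syncall_args(dc, partition=None, push=False, all_partitions=False, enterprise=False):
    """Build the argument list for 'repadmin /syncall' (hypothetical helper)."""
    args = ["repadmin", "/syncall", dc]
    if partition:
        args.append(partition)  # e.g. "DC=techtutsonline,DC=com"
    flags = ""
    if all_partitions:
        flags += "A"  # synchronize every partition held by the DC
    if enterprise:
        flags += "e"  # include partners in other sites
    if push:
        flags += "P"  # push changes outward instead of pulling them in
    if flags:
        args.append("/" + flags)  # repadmin accepts combined flags such as /AeP
    return args


if __name__ == "__main__":
    print(" ".join(syncall_args("dc1", push=True)))  # repadmin /syncall dc1 /P
```

On a domain controller, you could pass the returned list to `subprocess.run` to execute it; building the list separately keeps the command easy to log and review.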

If you are not comfortable with command-line tools, open Active Directory Sites and Services under Administrative Tools, right-click the NTDS Settings object, and select All Tasks > Check Replication Topology. Running Check Replication Topology on the domain controller that is the Inter-Site Topology Generator (ISTG) recalculates both the intersite and intrasite replication topology; running it on any other domain controller recalculates only the intrasite topology. If you still receive errors because the KCC did not create the appropriate connection objects, manually create a connection object between the domain controllers.

Using DCDiag

DCDiag is a command-line utility that runs diagnostic tests against a domain controller. It runs many tests, and the output can span several screens. To run only specific tests, use the /test: switch. For instance, to make sure that the replication topology is fully interconnected, issue the dcdiag /test:topology command:

C:\Windows\System32>dcdiag /test:topology
Directory Server Diagnosis

Performing initial setup:
   Trying to find home server...
   Home Server = DC1
   * Identified AD Forest.
   Done gathering initial info.

Doing initial required tests

   Testing server: Default-First-Site-Name\DC1
      Starting test: Connectivity
         ......................... DC1 passed test Connectivity

Doing primary tests

   Testing server: Default-First-Site-Name\DC1
      Starting test: Topology
         ......................... DC1 passed test Topology

   Running partition tests on : ForestDnsZones

   Running partition tests on : DomainDnsZones

   Running partition tests on : Schema

   Running partition tests on : Configuration

   Running partition tests on : techtutsonline

   Running enterprise tests on : techtutsonline.com

To test that replication is functioning properly, issue the dcdiag /test:replications command.

C:\Windows\System32>dcdiag /test:replications
Directory Server Diagnosis

Performing initial setup:
   Trying to find home server...
   Home Server = DC1
   * Identified AD Forest.
   Done gathering initial info.

Doing initial required tests

   Testing server: Default-First-Site-Name\DC1
      Starting test: Connectivity
         ......................... DC1 passed test Connectivity

Doing primary tests

   Testing server: Default-First-Site-Name\DC1
      Starting test: Replications
         ......................... DC1 passed test Replications

   Running partition tests on : ForestDnsZones

   Running partition tests on : DomainDnsZones

   Running partition tests on : Schema

   Running partition tests on : Configuration

   Running partition tests on : techtutsonline

   Running enterprise tests on : techtutsonline.com

To run all of the default tests, including the replication checks, in verbose mode against a specific domain controller, use the /v and /s: switches:

C:\Windows\System32>dcdiag /v /s:dc2
Directory Server Diagnosis

Performing initial setup:
   * Connecting to directory service on server dc2.
   * Identified AD Forest.
   Collecting AD specific global data
   * Collecting site info.
   Calling ldap_search_init_page(hld,CN=Sites,CN=Configuration,DC=techtutsonline,DC=com,LDAP_SCOPE_SUBTREE,(objectCategory=ntDSSiteSettings),.......
   The previous call succeeded
   Iterating through the sites
   Looking at base site object: CN=NTDS Site Settings,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=techtutsonline,DC=com
   Getting ISTG and options for the site
   * Identifying all servers.
   Calling ldap_search_init_page(hld,CN=Sites,CN=Configuration,DC=techtutsonline,DC=com,LDAP_SCOPE_SUBTREE,(objectClass=ntDSDsa),.......
   The previous call succeeded....
   The previous call succeeded
   Iterating through the list of servers
   Getting information for the server CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=techtutsonline,DC=com
   objectGuid obtained
   InvocationID obtained
   dnsHostname obtained
   site info obtained
   All the info for the server collected
   Getting information for the server CN=NTDS Settings,CN=DC2,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=techtutsonline,DC=com
   objectGuid obtained
   InvocationID obtained
   dnsHostname obtained
   site info obtained
   All the info for the server collected
   * Identifying all NC cross-refs.
   * Found 2 DC(s). Testing 1 of them.
   Done gathering initial info.

Doing initial required tests

   Testing server: Default-First-Site-Name\DC2
      Starting test: Connectivity
         * Active Directory LDAP Services Check
         Determining IP4 connectivity
         * Active Directory RPC Services Check
         ......................... DC2 passed test Connectivity

Doing primary tests

   Testing server: Default-First-Site-Name\DC2
      Starting test: Advertising
         The DC DC2 is advertising itself as a DC and having a DS.
         The DC DC2 is advertising as an LDAP server
         The DC DC2 is advertising as having a writeable directory
         The DC DC2 is advertising as a Key Distribution Center
         The DC DC2 is advertising as a time server
         The DS DC2 is advertising as a GC.
         ......................... DC2 passed test Advertising
      Test omitted by user request: CheckSecurityError

[output cut]
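Because dcdiag output can run to many screens, it can help to reduce it to a list of results. The Python sketch below picks out the result lines (the ones ending in `passed test X` or `failed test X`, as in the output above) and reports the failures. The function names are hypothetical, and line formats other than the one shown are not handled.

```python
import re

# Matches result lines such as "......................... DC1 passed test Connectivity".
RESULT = re.compile(r"\.{3,}\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def dcdiag_results(output: str) -> dict:
    """Map (server, test) to 'passed' or 'failed' for every result line found."""
    results = {}
    for line in output.splitlines():
        match = RESULT.search(line)
        if match:
            server, outcome, test = match.groups()
            results[(server, test)] = outcome
    return results


def failed_tests(output: str) -> list:
    """Return the (server, test) pairs that failed, sorted for stable reporting."""
    return sorted(key for key, outcome in dcdiag_results(output).items() if outcome == "failed")
```

As an alternative that needs no script, dcdiag /q prints only error messages.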

Best Practices for Troubleshooting AD Replication

If your domain controllers are not replicating objects correctly, users may not be able to access the objects they need and may even fail to log on.

Keep the following tips in mind when troubleshooting Active Directory replication issues:

  • Use the tools you are most familiar with when troubleshooting replication problems, but also take the time to learn the ones you know less well.
  • Verify the replication topology to make sure all of the domain controllers from all sites are interconnected.
  • Urgent replication, such as an account lockout, occurs immediately within the site, but the change is not replicated to other sites until the site link schedule allows it. Use the repadmin utility to force replication of the change.
  • Create connection objects between domain controllers that hold FSMO roles and the servers that will act as their backup if the FSMO role holder fails. Make sure replication is occurring between the two servers.

Troubleshooting FSMO Roles

Flexible Single Master Operations (FSMO) roles are specialized services within Active Directory Domain Services that should be performed by only one domain controller at a time. We have already discussed FSMO roles in a separate article.

Because there can be only one domain controller holding each of the roles, you need to make sure you keep them operational. Of course, with some of these roles, getting them up and operational is more important than it is with others; however, you should still know what is required to get them into an operational state.

Importance of FSMO Roles

Each FSMO role is important within the forest. Without them, you will not have a means of identifying objects correctly, and data corruption can occur if two or more administrators make conflicting changes to objects within the forest. If your current role holder needs to go offline, you can transfer the role to any functional domain controller; a transfer requires the current role holder to be online. If the role holder has failed and you are sure it will not be able to come back online, you can seize the FSMO role onto any functional domain controller. The whole procedure of transferring and seizing the FSMO roles is discussed in a separate article.

Best Practices for Troubleshooting FSMO Roles

Remember a few points about FSMO roles:

  • Do not seize a role unless you are sure you will never reintroduce the original role holder to the network.
  • If demoting a role holder, transfer the role to another domain controller first.
  • Keep documentation that identifies the role holders and the domain controllers that are designated as the standby servers.
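The FSMO guidance above boils down to a small decision. The Python sketch below encodes it: a transfer is a cooperative hand-off that needs the current role holder to be online, whereas seizing is reserved for a holder that will never return. The names `FsmoAction` and `choose_fsmo_action` are hypothetical, and real decisions also depend on which role is affected and how long you can do without it.

```python
from enum import Enum


class FsmoAction(Enum):
    TRANSFER = "transfer"
    SEIZE = "seize"
    WAIT = "restore the role holder instead"


def choose_fsmo_action(holder_online: bool, holder_will_return: bool) -> FsmoAction:
    """Pick transfer, seize, or wait according to the rules above."""
    if holder_online:
        # Also covers demotion: transfer the role before demoting the holder.
        return FsmoAction.TRANSFER
    if holder_will_return:
        # Never seize a role from a DC that will be reintroduced to the network.
        return FsmoAction.WAIT
    return FsmoAction.SEIZE
```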

Troubleshooting Logon Failures

Few things frustrate users more than getting an error when they try to log on first thing in the morning; their day is immediately off to a bad start. Troubleshooting is your realm, so it’s up to you to determine what is causing the problem and to set things back on track. Within an Active Directory domain, several things could be at fault.

If you are using a Windows Server 2008 or Windows Server 2012 based domain, the default password policy uses complex passwords. Although this is a good policy to use from the standpoint of most of your security auditors, you will find that it can cause additional headaches. Complex passwords, while more secure, are also more difficult for users to remember. You will probably end up unlocking users’ accounts and changing their passwords for them more often than you would with simpler passwords. You will also run into the problem of users writing down their passwords and leaving them close to their systems. Controlling passwords, monitoring authentication, and maintaining a sensible password and lockout policy will help you minimize logon problems, but you will still be forced into troubleshooting these issues.
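To make “complex” concrete, the Python sketch below approximates the Windows “Password must meet complexity requirements” setting: characters from at least three of four categories (uppercase letters, lowercase letters, digits, and symbols) and no copy of the account name. The seven-character minimum mirrors the default minimum password length in a new domain’s Default Domain Policy. Treat that default, the function name `meets_complexity`, and the simplified name check as assumptions rather than an exact reproduction of the policy engine.

```python
import string


def meets_complexity(password: str, account_name: str, min_length: int = 7) -> bool:
    """Approximate check of a Windows-style complex password (simplified)."""
    if len(password) < min_length:
        return False
    # Simplification: an account name of three or more characters must not appear.
    if len(account_name) >= 3 and account_name.lower() in password.lower():
        return False
    categories = [
        any(c.isupper() for c in password),
        any(c.islower() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),  # ASCII symbols only
    ]
    return sum(categories) >= 3
```

Note that a password such as `Summer2015` passes this check even though it is easy to guess, which is one reason complexity rules alone do not eliminate lockout and reset calls.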

Auditing for Logon Problems

As with any troubleshooting, you should start by checking the event logs on the client system and on the domain controllers within its site. Although many administrators criticize the event logs, you can find some interesting and useful information in them. If you have enabled auditing of account logon events and logon events, the security log will record events for accounts as they authenticate or fail to authenticate. To watch for failures, your audit policy must audit failed authentication attempts. Once it does, you can search the security log for specific entries if users start having difficulty authenticating.

The following figure shows an example of a GPO that is being used to implement auditing for a domain. To see information about authentication, set the audit options highlighted in the figure.

Audit Policy in Default Domain Policy GPO

Once these settings are in place, you can review the security logs for common events. On domain controllers, you can look for information about account lockouts and changes to accounts. Event ID : 675 shows the IP address of the client computer from which the bad password originated. If this IP address is not the computer from which the user normally authenticates, it may indicate an attack on the account. Event ID : 644 appears if the Audit Account Management auditing option is set for Success; this event is generated any time an account is locked out because of incorrect credentials. These are the Windows Server 2003 event IDs; on Windows Server 2008 and later, the equivalent events are 4771 and 4740.

You can also view the security logs on client systems and search for common Event IDs. Event ID : 529 is recorded when a logon fails because of an unknown user name or a bad password. This could be due to the user accidentally pressing the wrong keys when logging on, but it could also indicate someone trying to break into a system by guessing an account name. Event ID : 531 indicates that the account attempting to authenticate has been disabled by an administrator (a locked-out account is recorded as Event ID : 539). On Windows Vista, Windows Server 2008, and later, both kinds of failure are logged as Event ID 4625.
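The event IDs 675, 644, 529, and 531 use Windows Server 2003 numbering. On Windows Server 2008 and later, the same conditions are logged under different IDs, and the single logon-failure event (4625) carries a Sub Status code that tells you why the logon failed. The Python lookup below is a sketch of that mapping; the table and function names are hypothetical, and only a few common Sub Status codes are listed.

```python
# Legacy (Windows Server 2003) security event IDs and the IDs used for the
# same conditions on Windows Server 2008 and later.
LEGACY_TO_MODERN = {
    675: 4771,  # Kerberos pre-authentication failed (bad password), includes client address
    644: 4740,  # user account locked out
    529: 4625,  # logon failure: unknown user name or bad password
    531: 4625,  # logon failure: account disabled
}

# On event 4625, the Sub Status field (an NTSTATUS code) distinguishes the cause.
LOGON_FAILURE_SUBSTATUS = {
    0xC0000064: "user name does not exist",
    0xC000006A: "user name is correct but the password is wrong",
    0xC0000072: "account is disabled",
    0xC0000234: "account is locked out",
}


def events_to_watch(server_2008_or_later: bool) -> list:
    """Return the event IDs to search for, depending on the OS generation."""
    ids = LEGACY_TO_MODERN.values() if server_2008_or_later else LEGACY_TO_MODERN.keys()
    return sorted(set(ids))


def explain_substatus(code) -> str:
    """Accept an int or a hex string such as '0xC000006A'."""
    if isinstance(code, str):
        code = int(code, 16)
    return LOGON_FAILURE_SUBSTATUS.get(code, "other or unlisted cause")
```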

Kerberos Logging

By turning on Kerberos logging, you can have the system record more detailed information about authentication. To enable it on a domain controller, edit the registry manually: open regedit and navigate to the following registry key:

HKLM\System\CurrentControlSet\Control\LSA\Kerberos\Parameters

You must create the REG_DWORD entry LogLevel. If you set the value of this entry to 1, you will be able to monitor the system event log for Event ID 4. If Event ID 4 appears in the log, it will indicate that a bad password was sent to the Kerberos service for authentication or that the account was locked out. If the error code within this event specifies 0x18 KDC_ERR_PREAUTH_FAILED, the password was incorrect. An error code of 0x12 KDC_ERR_CLIENT_REVOKED indicates that the account was locked out.
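Two small helpers capture this section in code. The first builds the `reg add` command line (to be run from an elevated prompt on the domain controller) that creates the LogLevel value under the key above; setting it back to 0 turns the extra logging off. The second decodes the two error codes this section mentions, numbered as in RFC 4120 (0x12 = 18, 0x18 = 24). Both function names are hypothetical, and the decoder deliberately knows only those two codes.

```python
KERBEROS_PARAMETERS_KEY = r"HKLM\System\CurrentControlSet\Control\LSA\Kerberos\Parameters"


def kerberos_logging_command(level: int = 1) -> list:
    """Argument list for reg.exe that sets the LogLevel REG_DWORD (1 = on, 0 = off)."""
    return ["reg", "add", KERBEROS_PARAMETERS_KEY,
            "/v", "LogLevel", "/t", "REG_DWORD", "/d", str(level), "/f"]


KERBEROS_ERRORS = {
    0x18: ("KDC_ERR_PREAUTH_FAILED", "the password was incorrect"),
    0x12: ("KDC_ERR_CLIENT_REVOKED", "the account was locked out (credentials revoked)"),
}


def decode_kerberos_error(code):
    """Accept 0x18, 24, or "0x18" and return (name, meaning)."""
    if isinstance(code, str):
        code = int(code, 0)  # base 0 honours the 0x prefix
    return KERBEROS_ERRORS.get(code, ("UNKNOWN", "not covered in this article"))


if __name__ == "__main__":
    print(" ".join(kerberos_logging_command()))
    print(decode_kerberos_error("0x18"))
```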

Best Practices for Logon and Account-Lockout Troubleshooting

Logon issues frustrate administrators and users alike. The calls that flood in right after a mandatory password change can be frustrating, but if you follow these tips, you may be able to reduce the resulting headache:

  • Only enable universal group membership caching if you want to reduce the replication across a WAN link and you have a small number of users who will be affected.
  • In a native-mode domain, turn off universal group membership enumeration only if you are not using universal security groups.
  • Turn on auditing for account logon and account management so that you can identify logon failures and can determine the causes.
  • Take advantage of the new Account Lockout and Management Tools to aid in troubleshooting account lockout.
  • Monitor the PDC Emulator for authentication attempts. All attempts with a bad password are forwarded to the PDC Emulator.
  • Turn off logging when it is not necessary so it does not consume additional resources.
