Event Tracing for Windows: A fresh look at an old tool

Event Tracing for Windows: A fresh look at an old tool

While most Windows developers know of Event Tracing for Windows (ETW) as a logging and tracing mechanism, many administrators have never heard of it. Simply put, ETW includes event logging and tracing capabilities provided by the operating system. Implemented in the kernel, it traces events in user mode applications, the operating system kernel and kernel-mode device drivers.

Event Tracing for Windows is used by a number of core OS components and some third-party applications to provide event logging and tracing. Although it required access to a checked build of Windows to gather ETW information when first released with Windows 2000, more recent versions provide built-in tools with normal (free) Windows builds.

Getting started with Event Tracing for Windows
When diagnosing and troubleshooting Windows Server issues, it seems there is never too much data. The admin is always looking for additional details on what is going on with various components to pinpoint the problem. As such, there are a number of tools like Process Monitor,Process Explorer, Performance Monitor (Perfmon) and Performance Analysis for Logs (PAL) that dig considerably deeper than the event log, but there are times when we need to dig even further down than that.

ETW allows additional instrumentation for gathering data that would not otherwise be available and has a number of advantages. For example:

  • it uses per-processor kernel buffers from a non-paged pool that are not impacted by application crashes or hangs

  • it uses very low CPU overhead

  • it’s available for x86, x64 and IA64 architectures

  • it can enable and disable tracing without rebooting or restarting applications

Event Tracing for Windows may seem like a great tool, but using it is another issue since there is no GUI or user guide. It also requires a few preliminary steps just to produce output that can be used for analysis.

In order to provide useful output you need a consumer. The consumer built in to Windows Server is Tracerpt.exe. As you can imagine, there are a number of flags for Tracerpt to provide specific output formats, so it’s important to become familiar with the Tracerpt and Logman utilities that are native in Windows Server 2003 and up, as well as Windows 7 and Vista.

It’s also important to understand the architecture for ETW. As you can see in Figure 1, the controllers are used to start and stop a tracing session. The tool used to do this in Windows Server 2003 and 2008 is Logman.exe.

Figure 1. The ETW architecture
The ETW architecture
       Image credit: Microsoft Corp.

Windows Server 2003 also contains a handful of event providers that return specific events, including the following Active Directory-related providers:

  • Active Directory: Core
  • Active Directory: Kerberos
  • Active Directory: SAM
  • Active Directory: NetLogon

For instance, specifying Active Directory: Kerberos as a provider will only return Kerberos-specific events.

Event providers differ between Windows versions, however. For example, Windows Server 2003 has 22 providers, while Windows 2008 has 387. This gives more power to the trace and offers more granularities. Yet when it comes to LDAP traffic, the Active Directory: Core provider appears to give the same detail for either Windows version.

You can also combine event providers into a single trace. Since Kerberos authentication was involved in the example above, I used the Active Directory: Kerberos and the Active Directory: Core providers and applied the Logman option-pf, as shown in the following example:

Logman Create Trace CoreKerb –pf c:\etw\input.txt –o c:\etw\coreKerb

The –pf option reads an input text file (input.txt in this case). The format of the input file is shown in Figure 2.

Figure 2. Input text file format
Input text file format

Putting Event Tracing for Windows to work
The best way to explain ETW is with a case study. Recently, I was contacted by an engineer who needed information about how Active Directory was responding to an LDAP request for a Unix client authenticating against an AD domain controller. He used a Unix command to see the bind request/response on the Unix side and wanted to see similar output on the Windows side. The output looked something like this:

[23/Sep/2010:15:04:44 +0200] conn=31 fd=65 slot=65 connection from 10.50.20.173 to 10.50.12.119

[23/Sep/2010:15:04:44 +0200] conn=31 op=0 BIND dn="uid=dorsa,ou=people,o=Corp.net" method=128 version=3

[23/Sep/2010:15:04:44 +0200] conn=31 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=dorsa,ou=people,o=corp.net"

[23/Sep/2010:15:04:44 +0200] conn=31 op=1 SRCH base="ou=people,o=hp.com" scope=2 filter="(|(uid=dorsa)(cn=mdilln.dodgcty))" attrs=ALL

[23/Sep/2010:15:04:44 +0200] conn=31 op=1 RESULT err=0 tag=101 nentries=2 etime=0

[23/Sep/2010:15:04:44 +0200] conn=31 op=2 UNBIND

[23/Sep/2010:15:04:44 +0200] conn=31 op=2 fd=65 closed - U1

[23/Sep/2010:15:04:44 +0200] conn=29 op=-1 fd=64 closed error 11 (Resource temporarily unavailable) -

To work through the output, I used the NTDS Diagnostics registry key atHKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics and defined the LDAP Interface for a value of 5. It only provided the elapsed time of the operation and wasn’t close to the Unix output, so I decided to try Event Tracing for Windows. Since this was on Windows Server 2003, I used the Active Directory: Core provider, which gave LDAP details.

Below are the steps and commands I used to create an ETW trace and generate a log. The commands were used to dump LDAP data during authentication for the Unix engineer. I also created a C:\ETW directory to store all my data.

  1. C:>Logman Query Providers — This command lists all available providers. Note that the provider we are interested in for LDAP information is the Active Directory: Core provider.

  2. Logman create trace “LDAP1” –p “Active Directory: core” –o c:\etw\LDAP1 LDAP1 — This is the name of the trace (we’ll see it when we look at the list of traces).
    • -identifies identifies Active Directory: Core as the provider we want to use.
    • -o specifies the path for the output (.etl) file as C:\etw\ldap1 . The output file will be saved as LDAP1_000001.etl. Note that when the trace runs a second time the output file will be named LDAP1_000002.etl, etc.
  3. Once the trace is executed successfully with the Logman create trace command it can be seen in the queue with the command: C:>Logman Query. A sample output is shown in Figure 3. The LDAP1 trace is shown in the red box outline in the figure. Note that there are a number of traces I defined which can be reused simply by starting and stopping them.

    Figure 3. ETW trace (click to enlarge)
    ETW trace

  4. The following command starts the trace:

    Logman Start LDAP1

    Issuing Logman Query at this point would show LDAP1 as “Running”.

  5. Reproduction operations are then needed to reproduce the problem or situation you want to trace. In this case, I performed a logon and ran some LDIFDE commands to perform LDAP searches. Having these commands ready as soon as the trace starts will minimize the noise in the trace and make it easier to read.

  6. Next, stop the trace: Logman Stop LDAP1

  7. The C:\ETW directory now shows that the LDAP1 trace file LDAP1_000002.etl was created:

    C:\ETW>dir ldap1*
    Volume in drive C has no label.
    Volume Serial Number is 309D-BA04

    Directory of C:\ETW

    10/13/2010 04:22 PM     1,015 ldap1
    10/13/2010 04:20 PM     262,144 LDAP1_000001.etl
    01/21/2011 02:12 AM     262,144 LDAP1_000002.etl

  8. Because this is the second time running that trace, the file name was bumped to 000002.

Since the .etl log is unreadable we can use Tracerpt to give us some useful data. The command for this example would be:

TRACERPT LDAP1_000001.etl -o Ldap1.csv

    • -of sets the file type (default CSV) (See online help for more formats.)
    • -o represents the output file name default, which is dumpfile.csv and produces the most interesting dump of LDAP activity
    • -Summary, -Report represents statistical data (not used in this example)

Opening the LDAP1.csv file in Excel (or Notepad) will allow a look at the data. Figure 4 shows part of my output file with the LDAP requests and responses highlighted. As you can see, the search and bind requests from the text are in column A, while in column B you can see the start and end of the requests, which can be paired up. Further to the right you can see the user data, the filter and scope of the LDAP request, and so on.

Figure 4. View of LDAP1.csv data (click to enlarge)
View of LDAP1.csv data

The exciting thing about Event Tracing for Windows is that the opportunities with providers seem endless. Providers for Group Policy, Kerberos, LDAP clients, Netlogon, FSRM, IIS and many more are all available in Windows Server 2008.

While I used to rely exclusively on event logs and similar log files, I can now go a level deeper with Event Tracing for Windows and get a lot more verbose data to help me solve whatever problem I’m troubleshooting. The commands to produce the traces and reports are very easy to use as well. Of course, you can find more command options and details online.

You can follow SearchWindowsServer.com on Twitter @WindowsTT.

ABOUT THE AUTHOR
Gary Olsen is a systems software engineer for Hewlett-Packard in Global Solutions Engineering. He authored Windows 2000: Active Directory Design and Deployment and co-authored Windows Server 2003 on HP ProLiant Servers. Gary is a Microsoft MVP for Directory Services and formerly for Windows File Systems.

Auditing AD administrators with Windows 2008 R2’s Event Viewer

Auditing AD administrators with Windows 2008 R2’s Event Viewer

Any discussion with Microsoft about how to limit or manage administrator rights usually results in a lecture on how you should not give admin rights to people you don’t trust. This makes sense, but how do you know an admin can’t be trusted if there is no evidence they did something wrong? And further, how do you prove it?

There’s a long list of actions that you simply can’t lock a domain admin out of. Limiting admin rights and delegation is sometimes difficult to accomplish, especially in a multiple domain environment that requires admins in each domain. Support personnel usually need admin rights as well, and sometimes political requirements will dictate even more admins. So the realquestion is, how do you audit an administrator?

While the answer is to simply enable auditing, this doesn’t catch everything. For example, I recently worked on a large Active Directory deployment with a number of admins. They had an application that used certain user object attributes to provide hooks to the app. From a security standpoint, they found that an admin could disable auditing, modify those key attributes and do bad things with the application. The admin could then re-enable auditing without detection — even with Windows Server 2008 R2’s attribute auditing features. With auditing enabled, sufficient events are logged to show who made changes to the object, including the attributes. But with auditing disabled, all this evidence was missing.

It turns out that Event ID 4907 (Figure 1) is logged when auditing of non-directory objects is enabled, but no such event is logged for directory objects.

Figure 1. Event ID 4907 (click to enlarge)
Event ID 4907

The event clearly showed that the audit policy was changed and who did it, but I needed to be satisfied that we could not get the needed information some other way before logging a case with Microsoft. This is important, as it allows me to demonstrate the powerful Event Viewer features like custom views and sorting/saving filters for Windows Server 2008 R2. Verbose auditing dumps an incredible number of events to the security log with object auditing enabled. With the old Event Viewer, it would be very difficult to sort through these events to get what you want.

In order to audit directory objects, the Group Policy Object (GPO) setting “Audit Directory Service Access” (Figure 2) must be enabled on a GPO that applies to the object to be audited.

Figure 2. The Audit Directory Service Access GPO (click to enlarge)
The Audit Directory Service Access GPO

In addition, auditing must be enabled on the object itself. For example, to configure the audit settings on a user object, do the following:

  • Locate the desired object in the Active Directory Users and Computers (ADUC) snap-in.
  • Open the object Properties and select the Security tab. Be sure to go to the View menu and enable Advanced Features.
  • In the Security tab, select the Advanced button.
  • In the Advanced Properties screen, select the Auditing tab. Choose Add to add a user or group to audit, as shown in Figure 3. You can specify the security options in the following screen. Click OK to exit out of all open screens.
Test the auditing by logging on as the admin specified in the audit properties (in my example it is JrAdmin). Modify the object the auditing was defined on and check the security event log. For instance, you can delete the user object or modify an attribute.

Figure 3. Advanced security settings in ADUC (click to enlarge)
Advanced security settings in ADUC

With auditing enabled, the result is a plethora of events in the security log, most notably:

  • Event ID 4662 — A number of these events are logged with various bits of information (Figure 4). In my case 25 of these were generated for a single object modification.

Figure 4. Properties for Event ID 4662 (click to enlarge)
Properties for Event ID 4662

 

  • Event 5136 — this provides more detail about the modification like the one shown here. Note that we can see the DN of the user making the change to the directory object as well as the DN of the object. We can also see that the description attribute was modified, as we are shown the old value and the value that was deleted (note that some fields were deleted for brevity here):

    Log Name:   Security
    Event ID:   5136
    Task Category:   Directory Service Changes
    Computer:   w2k8r2-dc1.w2k8r2.Wtec.adapps.hp.com
    Description:   A directory service object was modified.

    Subject:
             Security ID: W2K8R2\JrAdmin
             Account Name: JrAdmin
             Account Domain: W2K8R2
    Directory Service:
             Name: w2k8r2.Wtec.adapps.hp.com
    Object:
             DN:
             CN=AdmUser401,OU=Atlanta,DC=w2k8r2,DC=Wtec,DC=adapps,DC          =hp,DC=com
             GUID:
             CN=AdmUser401,OU=Atlanta,DC=w2k8r2,DC=Wtec,DC=adapps,DC          =hp,DC=com
             Class: user
    Attribute:
             LDAP Dsplay Name: description
             Syntax (OID): 2.5.5.12
             Value: This is a test
    Operation:
             Type: Value Deleted
             Correlation ID: {1e193e1d-537f-49c7-bb69-bb50aa4542c2}
             Application Correlation ID: -

Now let’s consider a couple of options and the results.

  • GPO Auditing (directory access) is enabled for success but object auditing is disabled.
    – Result: Event ID 4662 logged when user is removed from object audit list.
  • – Result: Event ID 4738 logged when change to the object is made.
  • GPO Auditing (directory access) is disabled and object auditing is enabled.
    -*#160Result: Event IDs 4662, 4738 and 5136 are all logged.

It’s easy to see the difference in the number of events with full auditing in comparison to having GPO disabled and object auditing enabled. Note that even with GPO auditing disabled the important Event ID 5136 is logged, showing details of the attribute that was changed and who changed it. If both the GPO and object auditing are disabled, only one Event ID 4738 is logged, which has no useful information:

Log Name: Security
Event ID: 4738
Computer: w2k8r2-dc1.w2k8r2.Wtec.adapps.hp.com
Description: A user account was changed.

Subject:
         Security ID: W2K8R2\JrAdmin
     Account Name: JrAdmin
     Account Domain: W2K8R2

Target Account:
         Security ID: W2K8R2\AdmUser400
     Account Name: AdmUser400
     Account Domain: W2K8R2

Note that while various combinations of auditing can produce some events, there is still no event logged, specifying that the audit policy changed for directory objects. Non-directory objects (files, folders, etc.) log Event ID 4907.

So what’s the solution?
The answer is to use a third-party product to audit this activity. Quest Software and Symantec have tools that will do this, for example. It is unknown if Microsoft will change this in the next version of Windows.

Using the Event Viewer

In resolving this issue, the features in Windows Server 2008’s Event Viewer were critical to the process. The security log is famous for its size — especially with auditing. When auditing was enabled at the GPO and object level, 20 to 30 events would be logged for a single attribute change. The advanced filtering in Event Viewer allowed me to build several filters and simply refresh them when a change was made to the policy or object, allowing me to see only the important events.

The filters were built in the Custom Views folder as shown in Figure 5. In this example I was able to identify the event level, one or more ID numbers and one or more event logs (note that even though I only needed the security log here, I could have added additional logs if I wanted to). I also specified a limit of “Last 12 hours” to further limit it, and I saved it to a logical name. If I decided later that I wanted to add or remove an event ID, for example, I could edit the filter, save it, and then refresh the display to get a new data set of events.

Figure 5. The Custom View folder (click to enlarge)
The Custom View folder

Attempting to sort in the full security log took an incredibly long time; the Custom View filter took only a second or two. Of course the danger is that if you fail to include a necessary event in the filter, it will not show up in the filtered view. In my case I started with a filter for the last hour to limit the events, then found the events that related to my audit and added them to the Event ID field in the filter. As you can see in Figure 5, I have defined a number of custom views for various purposes and they are always available for use.

Another feature I used was the Copy Details to Text feature. To get the details for Event ID 4738 (shown in text above), I would have had to take several screen shots as the information scrolled in the event. In the Windows Server 2008 Event Viewer, just right-click on the event in the list, select Copy > Copy Details as Text and paste it into something like Notepad. This will display all the information for documentation purposes.

Now suppose you wanted to examine all the events for a time period — say from 8 a.m. to 5 p.m. — and needed to send those events to a support engineer or just wanted to work on a smaller file. You could simply select the desired events in the Event Viewer, right-click and select Save Selected Events and specify where you wanted it saved (Figure 6). This will make a small event log of just those events, making troubleshooting much simpler and easily transportable.

Figure 6. Saving selected events (click to enlarge)
Saving selected events

Finally, Figure 7 shows the Saved Logs feature. In the old Event Viewer, if you loaded saved event logs they would disappear after Event Viewer was closed. Now they stay until you delete them. Also, they have the names they were saved as, rather than the generic “Saved Application Log” names that were provided in the old Event Viewer. Windows Server 2008’s Event Viewer can also tell what kind of event log it is (system, application, etc.) so you don’t have to specify the log type, which is much easier to use.

Figure 7. The Saved Logs feature (click to enlarge)
The Saved Logs feature

So let’s quickly summarize what we’ve gone over. While the auditing of attributes is a powerful feature in Windows Server 2008 R2, it lacks functionality to audit changes to the audit policy, which in turn allows untrustworthy domain administrators to make destructive changes in Active Directory. It can be difficult to tell if an admin is trustworthy when you have no way of checking things like this. While third-party tools can help, this is still a weakness in Windows auditing.

The new features in the Windows Server 2008 Event Viewer provides great flexibility and powerful filtering not available in previous versions. This allows for excellent data reports to aid in the troubleshooting process. It also helps administrators quickly identify crucial events without wading through a sea of logs to find the ones that are related to the problem.

For more information about resolving issues with AD, visit our Active directory troubleshootingtopic page.

ABOUT THE AUTHOR
Gary Olsen is a systems software engineer for Hewlett-Packard in Global Solutions Engineering. He authored Windows 2000: Active Directory Design and Deployment and co-authored Windows Server 2003 on HP ProLiant Servers. Gary is a Microsoft MVP for Directory Services and formerly for Windows File Systems.