.NET Data Directories¶
Note
The documentation has a new home: Check it out!
Managed executables (applications written using a .NET language) contain an extra data directory in the optional header of the PE file format. This small data directory contains a header which is also known as the CLR 2.0 header, and references other structures such as the metadata directory, raw data for manifest resources and sometimes an extra native header in the case of mixed mode applications or zapped (ngen’ed) applications.
.NET directory / CLR 2.0 header¶
The .NET data directory can be accessed by the IPEImage.DotNetDirectory
property.
IPEImage peImage = ...
Console.WriteLine("Managed entry point: {0:X8}", peImage.DotNetDirectory.EntryPoint);
Metadata directory¶
The metadata data directory is perhaps the most important data directory that is referenced by the .NET directory. It contains the metadata streams, such as the table and the blob stream, which play a key role in the execution of a .NET binary.
To access the metadata directory, access the IDotNetDirectory.Metadata
property, which will provide you an instance of the IMetadata
interface:
var metadata = peImage.DotNetDirectory.Metadata;
Console.WriteLine("Metadata file format version: {0}.{1}", metadata.MajorVersion, metadata.MinorVersion);
Console.WriteLine("Target .NET runtime version: " + metadata.VersionString);
Metadata streams¶
The IMetadata
interface also exposes the Streams
property, a list of IMetadataStream
instances.
foreach (var stream in metadata.Streams)
Console.WriteLine("Name: " + stream.Name);
Alternatively, it is possible to get a stream by its name using the GetStream(string)
shortcut:
var stringsStream = metadata.GetStream("#Strings");
Or grab the stream by its type:
var stringsStream = metadata.GetStream<StringsStream>();
AsmResolver supports parsing streams using the names in the table below. Any stream with a different name will be converted to a CustomMetadataStream
.
Name | Class |
---|---|
#~ #- #Schema |
TablesStream |
#Blob |
BlobStream |
#GUID |
GuidStream |
#Strings |
StringsStream |
#US |
UserStringsStream |
Some streams support reading the raw contents using a BinaryStreamReader
. Effectively, every stream that was read from the disk is readable in this way. Below is an example of a program that dumps for each readable stream the contents to a file on the disk:
// Iterate over all readable streams.
foreach (var stream in metadata.Streams.Where(s => s.CanRead))
{
// Create a reader that reads the raw contents of the stream.
var reader = stream.CreateReader();
// Write the contents to the disk.
File.WriteAllBytes(stream.Name + ".bin", reader.ReadToEnd());
}
The Streams
property is mutable. You can add new streams, or remove existing streams:
// Create a new stream with the contents 1, 2, 3, 4.
var data = new byte[] {1, 2, 3, 4};
var newStream = new CustomMetadataStream("#Custom", data);
// Add the stream to the metadata directory.
metadata.Streams.Add(newStream);
// Remove it again.
metadata.Streams.RemoveAt(metadata.Streams.Count - 1);
Blob, Strings, US and GUID streams¶
The blob, strings, user-strings and GUID streams are all very similar in the sense that they all provide a storage for data referenced by the tables stream. Each of these streams has a very similar API in AsmResolver.
Class | Method |
---|---|
BlobStream |
GetBlobByIndex |
GuidStream |
GetGuidByIndex |
StringsStream |
GetStringByIndex |
UserStringsStream |
GetStringByIndex |
Example:
var stringsStream = metadata.GetStream<StringsStream>();
string value = stringsStream.GetStringByIndex(0x1234);
Since blobs in the blob stream have a specific format, just obtaining the byte[]
of a blob might not be all that useful. Therefore, the BlobStream
has an extra GetBlobReaderByIndex
method, that allows for parsing each blob using an BinaryStreamReader
object instead. If performance is critical, the GetBlobReaderByIndex
method is preferred over GetBlobByIndex
, as this method also avoids an allocation of a temporary buffer as well.
var blobStream = metadata.GetStream<BlobStream>();
if (blobStream.TryGetBlobReaderByIndex(0x1234, out var reader))
{
// Use reader to parse the blob signature ...
}
Tables stream¶
The tables stream (#~
, #-
or #Schema
) is the main stream stored in the .NET binary. It provides tables for all members defined in the assembly, as well as all references that the assembly uses. The tables stream is represented by the TablesStream
class and can be obtained in the same way as any other metadata stream:
TablesStream tablesStream = metadata.GetStream<TablesStream>();
Metadata tables are represented by the IMetadataTable
interface. Individal tables can be accessed using the ``GetTable` method:
IMetadataTable typeDefTable = tablesStream.GetTable(TableIndex.TypeDef);
Tables can also be obtained by their row type:
MetadataTable<TypeDefinitionRow> typeDefTable = tablesStream.GetTable<TypeDefinitionRow>();
The latter option is the preferred option, as it allows for a more type-safe interaction with the table as well and avoids boxing of each row in the table. Each metadata table is associated with its own row structure. Below a table of all row definitions:
Table index | Name (as per specification) | AsmResolver row structure name |
---|---|---|
0 | Module | ModuleDefinitionRow |
1 | TypeRef | TypeReferenceRow |
2 | TypeDef | TypeDefinitionRow |
3 | FieldPtr | FieldPointerRow |
4 | Field | FieldDefinitionRow |
5 | MethodPtr | MethodPointerRow |
6 | Method | MethodDefinitionRow |
7 | ParamPtr | ParameterPointerRow |
8 | Param | ParameterDefinitionRow |
9 | InterfaceImpl | InterfaceImplementationRow |
10 | MemberRef | MemberReferenceRow |
11 | Constant | ConstantRow |
12 | CustomAttribute | CustomAttributeRow |
13 | FieldMarshal | FieldMarshalRow |
14 | DeclSecurity | SecurityDeclarationRow |
15 | ClassLayout | ClassLayoutRow |
16 | FieldLayout | FieldLayoutRow |
17 | StandAloneSig | StandAloneSignatureRow |
18 | EventMap | EventMapRow |
19 | EventPtr | EventPointerRow |
20 | Event | EventDefinitionRow |
21 | PropertyMap | PropertyMapRow |
22 | PropertyPtr | PropertyPointerRow |
23 | Property | PropertyDefinitionRow |
24 | MethodSemantics | MethodSemanticsRow |
25 | MethodImpl | MethodImplementationRow |
26 | ModuleRef | ModuleReferenceRow |
27 | TypeSpec | TypeSpecificationRow |
28 | ImplMap | ImplementatinoMappingRow |
29 | FieldRva | FieldRvaRow |
30 | EncLog | EncLogRow |
31 | EncMap | EncMapRow |
32 | Assembly | AssemblyDefinitionRow |
33 | AssemblyProcessor | AssemblyProcessorRow |
34 | AssemblyOS | AssemblyOSRow |
35 | AssemblyRef | AssemblyReferenceRow |
36 | AssemblyRefProcessor | AssemblyRefProcessorRow |
37 | AssemblyRefOS | AssemblyRefOSRow |
38 | File | FileReferenceRow |
39 | ExportedType | ExportedTypeRow |
40 | ManifestResource | ManifestResourceRow |
41 | NestedClass | NestedClassRow |
42 | GenericParam | GenericParamRow |
43 | MethodSpec | MethodSpecificationRow |
44 | GenericParamConstraint | GenericParamConstraintRow |
Metadata tables are similar to normal ICollection<T>
instances. They provide enumerators, indexers and methods to add or remove rows from the table.
Console.WriteLine("Number of types: " + typeDefTable.Count);
// Get a single row.
TypeDefinitionRow firstTypeRow = typeDefTable[0];
// Iterate over all rows:
foreach (var typeRow in typeDefTable)
{
// ...
}
Members can also be accessed by their RID using the GetByRid
or TryGetByRid
helper functions:
TypeDefinitionRow thirdTypeRow = typeDefTable.GetByRid(3);
Using the other metadata streams, it is possible to resolve all columns. Below an example that prints the name and namespace of each type row in the type definition table in a file.
// Load PE image.
var peImage = PEImage.FromFile(@"C:\file.exe");
// Obtain relevant streams.
var metadata = peImage.DotNetDirectory.Metadata;
var tablesStream = metadata.GetStream<TablesStream>();
var stringsStream = metadata.GetStream<StringsStream>();
// Go over each type definition in the file.
var typeDefTable = tablesStream.GetTable<TypeDefinitionRow>();
foreach (var typeRow in typeDefTable)
{
// Resolve name and namespace columns using the #Strings stream.
string ns = stringsStream.GetStringByIndex(typeRow.Namespace);
string name = stringsStream.GetStringByIndex(typeRow.Name);
// Print name and namespace:
Console.WriteLine(string.IsNullOrEmpty(ns) ? name : $"{ns}.{name}");
}
Method and FieldRVA¶
Every row structure defined in AsmResolver respects the specification described by the CLR itself. However, there are two exceptions to this rule, and those are the Method and FieldRVA rows. According to the specification, both of these rows have an RVA column that references a segment in the original PE file. Since this second layer of abstraction attempts to abstract away any file offset or virtual address, these columns are replaced with properties called Body
and Data
respectively, both of type ISegmentReference
instead.
ISegmentReference
exposes a method CreateReader()
, which automatically resolves the RVA that was stored in the row, and creates a new input stream that can be used to parse e.g. method bodies or field data.
Reading method bodies:¶
Reading a managed CIL method body can be done using CilRawMethodBody.FromReader
method:
var methodTable = tablesStream.GetTable<MethodDefinitionRow>();
var firstMethod = methodTable[0];
var methodBody = CilRawMethodBody.FromReader(firstMethod.Body.CreateReader());
It is important to note that the user is not bound to use CilRawMethodBody
. In the case that the Native
(0x0001
) flag is set in MethodDefinitionRow.ImplAttributes
, the implementation of the method body is not written in CIL, but using native code that uses an instruction set dependent on the platform that this application is targeting. Since the bounds of such a method body is not always well-defined, AsmResolver does not do any parsing on its own. However, using the CreateReader()
method, it is still possible to decode instructions from this method body, using a custom instruction decoder.
Reading field data:¶
Reading field data can be done in a similar fashion as reading method bodies. Again use the CreateReader()
method to gain access to the raw data of the initial value of the field referenced by a FieldRVA row.
var fieldRvaTable = tablesStream.GetTable<FieldRvaRow>();
var firstRva = fieldRvaTable[0];
var reader = firstRva.Data.CreateReader();
Creating new segment references:¶
Creating new segment references not present in the current PE image yet can be done using the ISegment.ToReference()
extension method:
var myData = new DataSegment(new byte[] {1, 2, 3, 4});
var fieldRva = new FieldRvaRow(myData.ToReference(), 0);
TypeReference Hash (TRH)¶
Similar to the Import Hash, the TypeReference Hash (TRH) can be used to help identify malware families written in a .NET language. However, unlike the Import Hash, the TRH is based on the names of all imported type references instead of the symbols specified in the imports directory of the PE. This is a more accurate representation for .NET images, as virtually every .NET image only uses one native symbol (either mscoree.dll!_CorExeMain
or mscoree.dll!_CorDllMain
).
AsmResolver includes a built-in implementation for this that is based on the reference implementation provided by GData. The hash can be obtained using the GetTypeReferenceHash
extension method on IPEImage
or on IMetadata
:
IPEImage image = ...
byte[] hash = image.GetTypeReferenceHash();
IMetadata metadata = ...
byte[] hash = metadata.GetTypeReferenceHash();