Wednesday, August 01, 2007

Programming assemblies and modules in MSIL

Introduction

This article is about building assemblies and modules in the MSIL language. To get a reasonable insight into the internals of the .NET Framework, an understanding of IL is a must. There are many articles that describe how the CLR loader loads an assembly, how the JIT compiler compiles a method and points it to the newly created memory block holding the compiled code, and so on. But I couldn't find a simple guide to building the basic units of execution using IL, such as
  1. a net module

  2. a multi file library

  3. an assembly with library reference

This article describes the process step by step. We are going to create two net modules, link them together to form a library, and then build an application assembly that refers to methods defined in the net modules of the library. Before going there, I would like to cover the theoretical side of the subject in one paragraph. If you can't bear it, just skip the following section. You can always come back for a peek if needed.

Theoretical background

CLR, CTS, CLS: a quick focus. In .NET's context IL, MSIL and CIL are synonyms: Intermediate Language (IL) ~= Microsoft Intermediate Language (MSIL) ~= Common Intermediate Language (CIL). The CLS defines the minimum set of rules needed for interoperability among .NET applications written and compiled in different languages and targeting the CLR. CLS compliance ensures interoperability in terms of naming conventions, data types, function types and so on. CLS compliance is a recommendation, not a requirement: the CLR doesn't impose any restrictions on an application even if it is not CLS-compliant, but in that case interoperability with modules/assemblies developed in other languages cannot be guaranteed. The Common Type System, on the other hand, enforces its rules on each and every construct of any language that targets the CLR. Managed code is packaged in assemblies and modules: managed .NET applications are called assemblies, and the managed executable files they are made of are referred to as modules. An assembly can contain many modules. Each module contains metadata and IL, and an assembly contains a manifest too. The CLR offers a safe execution environment on top of the operating system. Safety is achieved through type control, structured exception handling, garbage collection and so on.

With that, let us jump to files and code. Recall the plan: two net modules, linked into a library, and an application assembly that calls methods defined in them. Copy the code for the three source files nm01.il, nm02.il and hello.il. We will build mylib.dll by linking the net modules nm01 and nm02.

Files and Source

net module 01 - [nm01.il]

I could have used several IL files to build this net module, but one is enough here.
It has just a constructor and two static methods.
The methods don't need to be static; there is no strict reason for it.

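.assembly extern mscorlib{}
.class public mymath01 extends [mscorlib]System.Object
{
  // instance constructor: just chains to System.Object::.ctor
  .method public specialname rtspecialname instance void .ctor() cil managed
  {
    .maxstack 1
    ldarg.0 //push "this" instance onto the stack
    call instance void [mscorlib]System.Object::.ctor()
    ret
  }
  // static methods have no "this", so their arguments start at slot 0
  .method static public int32 mysum(int32 i1,int32 i2) cil managed
  {
    .maxstack 2
    ldarg.0 //push i1
    ldarg.1 //push i2
    add
    ret
  }
  .method static public int32 mysub(int32 i1,int32 i2) cil managed
  {
    .maxstack 2
    ldarg.0 //push i1
    ldarg.1 //push i2
    sub
    ret
  }
}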
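
To illustrate the remark that the methods need not be static, here is a minimal sketch of an instance version. The names mymath03 and instdemo are made up for this example and are not part of the library built in this article. The point to notice is that in an instance method slot 0 holds "this", so the declared arguments start at ldarg.1.

```il
.assembly extern mscorlib {}
.assembly instdemo {}            // hypothetical assembly name

.class public mymath03 extends [mscorlib]System.Object
{
  .method public specialname rtspecialname instance void .ctor() cil managed
  {
    .maxstack 1
    ldarg.0                      // "this"
    call instance void [mscorlib]System.Object::.ctor()
    ret
  }

  .method public instance int32 mysub(int32 i1, int32 i2) cil managed
  {
    .maxstack 2
    ldarg.1                      // i1 (slot 0 is "this")
    ldarg.2                      // i2
    sub
    ret
  }
}

.method static public void main() cil managed
{
  .entrypoint
  .maxstack 3
  newobj instance void mymath03::.ctor()   // object reference for "this"
  ldc.i4 10
  ldc.i4.3
  call instance int32 mymath03::mysub(int32,int32)
  call void [mscorlib]System.Console::WriteLine(int32)   // 10 - 3
  ret
}
```

Saved as instdemo.il, this should build with "ilasm instdemo.il" in the same way as hello.il.
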

net module 02 - [nm02.il]

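.assembly extern mscorlib{}
.class public mymath02 extends [mscorlib]System.Object
{
  // instance constructor: just chains to System.Object::.ctor
  .method public specialname rtspecialname instance void .ctor() cil managed
  {
    .maxstack 1
    ldarg.0 //push "this" instance onto the stack
    call instance void [mscorlib]System.Object::.ctor()
    ret
  }

  // static methods have no "this", so their arguments start at slot 0
  .method static public int32 mymul(int32 i1,int32 i2) cil managed
  {
    .maxstack 2
    ldarg.0 //push i1
    ldarg.1 //push i2
    mul
    ret
  }

  .method static public int32 mydiv(int32 i1,int32 i2) cil managed
  {
    .maxstack 2
    ldarg.0 //push i1
    ldarg.1 //push i2
    div
    ret
  }
}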

Application - [hello.il]

.assembly extern mscorlib {}
.assembly extern mylib {}   // reference to the library built from the net modules
.assembly hello {}
.method static public void main() cil managed
{
  .entrypoint
  .maxstack 4
  .locals init (int32 first,
                int32 second,
                int32 result)

  // read the first number
  ldstr "First number:"
  call void [mscorlib]System.Console::WriteLine(string)
  call string [mscorlib]System.Console::ReadLine()
  call int32 [mscorlib]System.Int32::Parse(string)
  stloc first

  // read the second number
  ldstr "Second number:"
  call void [mscorlib]System.Console::WriteLine(string)
  call string [mscorlib]System.Console::ReadLine()
  call int32 [mscorlib]System.Int32::Parse(string)
  stloc second

  // result = mymath01::mysum(first, second)
  ldloc first
  ldloc second
  call int32 [mylib]mymath01::mysum(int32,int32)
  stloc result

  // box the three int32 values so they can be passed as object
  ldstr "{0} + {1} = {2}"
  ldloc first
  box int32
  ldloc second
  box int32
  ldloc result
  box int32
  call void [mscorlib]System.Console::WriteLine(string,object,object,object)

  ldstr "Hello, World!"
  call void [mscorlib]System.Console::WriteLine(string)
  ret
}

I guess most of this stuff is pretty obvious except for maxstack.

maxstack

At the IL level, the CLR needs to know the maximum stack depth of each method. This is not plain arithmetic: it depends on the code flow and on the maximum number of stack slots required by any operation performed within the method. Though it is not simple, it is not dynamic either, so compilers targeting the CLR can calculate it before producing executables. Compilers of high-level languages, like csc (C#) and vbc (VB), calculate this number and embed it in the executable code they produce. But for some reason ILAsm doesn't do this. If you don't specify .maxstack, the stack depth defaults to 8. If a method needs more stack slots than that, the program will fail at run time.
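
To make the counting concrete, here is a small sketch that annotates the stack depth after each instruction of a straight-line method. The assembly name stackdemo is made up for this example. It mirrors the formatted WriteLine call in hello.il, which is the deepest point of that method and the reason it declares .maxstack 4.

```il
.assembly extern mscorlib {}
.assembly stackdemo {}           // hypothetical assembly name

.method static public void main() cil managed
{
  .entrypoint
  .maxstack 4                    // deepest point below is 4 slots
  ldstr "{0} + {1} = {2}"        // depth 1
  ldc.i4.3                       // depth 2
  box [mscorlib]System.Int32     // depth 2 (value replaced by object reference)
  ldc.i4.7                       // depth 3
  box [mscorlib]System.Int32     // depth 3
  ldc.i4.s 10                    // depth 4
  box [mscorlib]System.Int32     // depth 4
  // the call pops all four arguments and pushes nothing (void)
  call void [mscorlib]System.Console::WriteLine(string,object,object,object)
  ret                            // depth 0
}
```

With branches the count has to be followed along every path, which is why the text above says it is not plain arithmetic.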

Partial Classes

Some might wonder why I created a new class in each net module rather than using a partial class. That would make sense, because I am actually building one arithmetic class. But it is not possible here, because IL doesn't know anything about partial classes. The feature is syntactic sugar offered by high-level language compilers. This also makes it clear why a partial class cannot span assemblies: the compiler merges the parts of a partial class into a single class while building, so all references to that class have to be resolved before the assembly is built.

Build commands

I assume you have the .NET Framework and SDK (or Visual Studio 2003/2005) installed on your machine. The commands described here build for the local host. If you are building for targets other than the build machine, specify the flags for the respective targets. Now start the Visual Studio Command Prompt or the SDK Command Prompt and navigate to the location of the source files.

  1. Building net modules
    ilasm /dll /output=nm01.netmodule nm01.il
    ilasm /dll /output=nm02.netmodule nm02.il

  2. Building library - we use assembly linker for this purpose
    al /t:lib /out:mylib.dll nm01.netmodule nm02.netmodule

  3. Building application
    ilasm hello.il

  4. IL dis-assembler - ildasm
    For the purpose of this article, use the command line options of ildasm instead of the GUI.
    ildasm /All hello.exe /out=hello.dil
    This produces the file hello.dil, which contains the complete information.

  5. dumpbin
    You may also want to use dumpbin if you are interested in seeing the headers and sections of the PE files.

    Please note that there is no mention of linking mylib.dll while building hello.il. The reference is actually specified in the source: ".assembly extern mylib {}".
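
The same reference reaches both net modules of the library. As a rough sketch, a second application (hello2 is a name made up for this example) can call into nm02 through [mylib] without mentioning the net module file at all. It assumes that mymul reads its two arguments from slots 0 and 1, as a static method should.

```il
.assembly extern mscorlib {}
.assembly extern mylib {}        // same library reference as in hello.il
.assembly hello2 {}              // hypothetical assembly name

.method static public void main() cil managed
{
  .entrypoint
  .maxstack 2
  ldc.i4.6                       // first argument
  ldc.i4.7                       // second argument
  // mymath02 lives in nm02.netmodule, but is resolved through mylib
  call int32 [mylib]mymath02::mymul(int32,int32)
  call void [mscorlib]System.Console::WriteLine(int32)
  ret
}
```

It builds with "ilasm hello2.il". At run time mylib.dll and both .netmodule files need to sit next to the executable.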

Now run hello and check that the results look like this:
$hello
First number:
3
Second number:
7
3 + 7 = 10
Hello, World!

How to Debug

Debugging managed code is a separate topic in itself. This is because the CLR doesn't execute IL code directly; it executes the native (machine) code compiled by the jitter, and any loss of fidelity from IL to native code will confuse the debugger. This entry ran quite long, so I will keep debugging for another entry.
