Wednesday, February 19, 2014

Keeping clean code principles for esoteric classes...?

Let's face it: most of our classes will never be used by more than one lonely class. If that is the case, it seems that many of the clean code principles are meaningless for those classes... Or are they?

SRP
One principle to consider is the Single Responsibility Principle. In his book Principles, Patterns and Practices (PPP), "Uncle Bob" gives the example of the Rectangle class. This class has two responsibilities: the first is to provide a mathematical model of the geometry of a rectangle, and the other is to provide GUI services such as drawing it on the screen. Two different applications use this class: GraphicalApplication and ComputationalGeometryApplication. If a change to the GraphicalApplication causes the Rectangle class to change for some reason, the other application, ComputationalGeometryApplication, needs to be rebuilt, retested, and redeployed.
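A minimal sketch of the kind of separation the SRP calls for here; the class and interface names below are illustrative, not the book's code:

```csharp
// Pure geometry - the only part ComputationalGeometryApplication needs.
public class GeometricRectangle
{
    public double Width { get; private set; }
    public double Height { get; private set; }

    public GeometricRectangle(double width, double height)
    {
        Width = width;
        Height = height;
    }

    public double Area()
    {
        return Width * Height;
    }
}

// Hypothetical drawing abstraction, just to keep the sketch self-contained.
public interface IGraphicsDevice
{
    void DrawRect(double x, double y, double width, double height);
}

// GUI concerns - used by GraphicalApplication only.
public class RectangleRenderer
{
    public void Draw(GeometricRectangle rectangle, IGraphicsDevice device)
    {
        device.DrawRect(0, 0, rectangle.Width, rectangle.Height);
    }
}
```

With this split, a GUI-driven change touches only RectangleRenderer, and the computational application never needs to be rebuilt because of it.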
That really sounds bad, but as I stated at the beginning, most of our classes are never used by more than one class, not to mention more than one application...

High Cohesion
Another principle is the High Cohesion Principle. This principle states that most of a class's members should be used by most of the functionality it exposes. Classes that violate this principle usually expose many functionalities and therefore have many private members that need to be instantiated. When you need only some functionality of such a class, you must go through a painful instantiation process - you must provide all the state it needs, even if most of that state is irrelevant to your needs.
And yet again, if a class is only used by one lonely class, then probably all the functionalities and all the required variables are truly relevant to that lonely class.

Encapsulation
The last principle I would like to consider is encapsulation. In his other book, Clean Code, "Uncle Bob" says about encapsulation:
"There is a reason that we keep our variables private. We don’t want anyone else to depend on them. We want to keep the freedom to change their type or implementation on a whim or an impulse. Why, then, do so many programmers automatically add getters and setters to their objects, exposing their private variables as if they were public?"
But if a class of mine is only used by one lonely class, what's the big deal? If I ever need to change the type or the implementation of a variable my class exposes, I will only have to change two classes - my class and the class that uses it. Not such a big deal...

But wait...

Having said all that, I still encourage you to keep those principles for ALL your classes, even those that are only used by a lonely class.

It's true, most of your classes will never be used by more than one class, not to mention more than one application.
And yet, sometime in the future, some of them will! You never know which classes those are; you never know when they will start to be used in other places; you never know how much they are going to be used.

Conclusion
Therefore, treat each and every class as if it is going to be used by many classes - there is a chance it will...

Thursday, October 31, 2013

Mockist TDD vs Classic TDD


If you want to practice TDD there are two main approaches to choose from: mockist TDD or classic TDD (Martin Fowler). In this post I would like to compare the two.
First, I'll describe the two approaches, and then I will list the pros and cons of the mockist approach.

Mockist TDD
With this approach, you work at a high granularity, meaning every class has its own test fixture. As a result, each test fixture should only test one CUT (Class Under Test) and none of the classes the CUT depends on, assuming they all have their own test fixtures.

Suppose we have a class A that uses class B. To achieve the high granularity we’ve talked about, TestFixtureA must use a Test Double of B, as shown in figure 1:

Figure 1


Of course, our design must support dependency injection to achieve that: class A must work against an interface of B, and there must be a way to inject a concrete instance of B into class A (via constructor/setters/IoC etc.).
This approach is called Mockist TDD because of its extensive use of Mocks (Test Doubles). It is also called Isolated Testing, since each class is tested in isolation.

NOTE: we isolate class A from class B even if class B is a regular business class that has no interaction with any external resource such as a DB, web service, file system etc.
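A minimal sketch of TestFixtureA in the mockist style, assuming NUnit and Moq (the post doesn't mandate any particular frameworks); A, B and IB are the illustrative names from figure 1:

```csharp
using Moq;
using NUnit.Framework;

public interface IB
{
    int Calculate(int input);
}

public class A
{
    private readonly IB _b;

    public A(IB b)              // constructor injection of the dependency
    {
        _b = b;
    }

    public int DoWork(int input)
    {
        return _b.Calculate(input) + 1;
    }
}

[TestFixture]
public class TestFixtureA
{
    [Test]
    public void DoWork_DelegatesToB_AndAddsOne()
    {
        // Arrange: B is replaced by a test double, so only A is under test.
        var bMock = new Mock<IB>();
        bMock.Setup(b => b.Calculate(2)).Returns(10);
        var a = new A(bMock.Object);

        // Act
        int result = a.DoWork(2);

        // Assert: state plus behavior verification against the mock.
        Assert.AreEqual(11, result);
        bMock.Verify(b => b.Calculate(2), Times.Once());
    }
}
```

Note that a real implementation of B is never exercised here; the fixture would pass even if B didn't exist yet.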

Classic TDD
With this approach, you work at a low granularity, meaning every graph of classes has its own test fixture. As a result, each test fixture covers a graph of classes implicitly by testing the graph's root.

Figure 2

Usually you don't test the inner classes of the graph explicitly, since they are already tested implicitly by the tests of their root; thus, you avoid coverage duplication. This lets you keep the inner classes with an internal access modifier, unless they are used by other projects.
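For contrast, a sketch of a classic-style fixture, again assuming NUnit: the root A is tested with a real B behind it, so B is covered implicitly and can stay internal:

```csharp
using NUnit.Framework;

// B has no interface and no fixture of its own; it can stay internal.
internal class B
{
    public int Calculate(int input)
    {
        return input * 5;
    }
}

public class A
{
    private readonly B _b = new B();   // A news up its collaborator directly

    public int DoWork(int input)
    {
        return _b.Calculate(input) + 1;
    }
}

[TestFixture]
public class TestFixtureA
{
    [Test]
    public void DoWork_UsesTheRealGraph()
    {
        var a = new A();

        int result = a.DoWork(2);

        // B's behavior is verified implicitly through A's observable result.
        Assert.AreEqual(11, result);
    }
}
```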

Pros & Cons
Let’s describe the pros and cons of the mockist approach.
Pros
1. More TDD'ish - since all the classes the CUT depends on are mocked, you can start testing the CUT without implementing the classes it depends on. Think about the classic approach: when you come to test some CUT, you must first implement its dependencies, but before that you must implement their dependencies, and so forth.
2. High granularity - this means:
a. Smaller test fixtures - one per class, unlike one per graph of classes in the classic approach.
b. Smaller test setups - take a look at TestFixtureA in figure 2: the Arrange phase (Arrange-Act-Assert) of tests like that is quite large, since it has to set up state for many classes in the graph. This is quite a crucial issue - the bigger the test, the bigger the risk of having bugs in the test itself.
c. Frequent checkins/commits - think about it: with the classic approach, your tests won't pass before all the classes the CUT depends on are implemented correctly; thus, the frequency of your checkins (commits) drops dramatically (you don't want to commit red tests).
3. More alternatives for doing DI - take a look at figure 3, and suppose you need to inject different concretes of interface I into class C. With the mockist approach, which relies heavily on injections, the code that initializes class A also initializes class B and injects it into class A, and likewise initializes class C and injects it into class B. Therefore, it can easily inject a concrete class of interface I into class C (see figure 4 and the sketch below the figures). With the classic approach, on the other hand, the code that initializes class A doesn't have access to the inner classes of the graph (B, C and I), so its only way to inject a concrete class of interface I into class C is to use some IoC framework.
Figure 3

Figure 4

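A rough sketch of the composition code described in point 3, using the illustrative names A, B, C and I from the figures; it shows why no IoC container is needed to hand C a concrete I:

```csharp
public interface I
{
    void Execute();
}

public class ConcreteI : I
{
    public void Execute() { /* ... */ }
}

public class C
{
    private readonly I _i;
    public C(I i) { _i = i; }
}

public class B
{
    private readonly C _c;
    public B(C c) { _c = c; }
}

public class A
{
    private readonly B _b;
    public A(B b) { _b = b; }
}

public static class CompositionRoot
{
    public static A BuildA()
    {
        // The code that builds A also builds B and C, so swapping the
        // concrete I handed to C is a one-line change - no IoC container needed.
        var c = new C(new ConcreteI());
        var b = new B(c);
        return new A(b);
    }
}
```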
Cons
1. Many more interfaces and injections - with the mockist approach you have at least one interface for almost every class. In addition, there is a kind of injection inflation.
2. Weaker encapsulation - each class exposes its relations with the classes it depends on, both so that they can be injected into it and to allow behavior verification; this partly weakens the encapsulation.
3. High vulnerability to refactoring - with the mockist approach, every change in the interaction between two classes requires changes in some tests, since the tests are usually aware of the interactions between classes (see behavior verification). With the classic approach, on the other hand, you usually do state verification, and thus the tests are not aware of the interactions between classes.

Conclusions:
I personally definitely prefer the mockist approach, for one main reason - I cannot see how true TDD is possible without it.


Monday, October 28, 2013

Red-Green-Refactor

The most recommended way to implement TDD is to follow the Red-Green-Refactor path. In this post I would like to talk about the importance of the Red-Green steps.

Good tests
A good test is green whenever the UUT (Unit Under Test) is correct and is red whenever the UUT is incorrect.

Bad tests
There are 3 types of bad tests:
  1. Tests that are red when the UUT is correct and are also red when the UUT is incorrect. Obviously, tests of this type will be discovered immediately.
  2. Tests that are red when the UUT is correct and green when the UUT is incorrect. This type is worse than the previous one since it's not always detectable, and if the UUT is incorrect (and thus the test is green...), it gives you false confidence that everything is OK.
  3. Tests that are green when the UUT is correct and green when the UUT is incorrect. This type is at least as bad as the previous one. Tests like this are worthless.

The Red-Green-Refactor (RGR) path will most probably lead you to good tests in most cases. Why? If you follow that path, the first step is the Red step. In that step you write your test before the UUT is implemented (and thus incorrect). This almost guarantees that your test will be red when the UUT is incorrect. The second step is the Green step, in which you implement your UUT correctly and expect your test to become green. This almost guarantees that your test will be green when the UUT is correct.
Eventually this leads you, with a high degree of certainty, to a 'good test' as described above (red when the UUT is incorrect and green when the UUT is correct).
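A tiny illustration of the Red step, assuming NUnit; the test is written against a UUT that is intentionally not implemented yet, so it starts out red:

```csharp
using System;
using NUnit.Framework;

public class PriceCalculator
{
    public decimal ApplyDiscount(decimal price, decimal percent)
    {
        // Red step: the UUT is intentionally not implemented yet.
        throw new NotImplementedException();
    }
}

[TestFixture]
public class PriceCalculatorTests
{
    [Test]
    public void ApplyDiscount_TenPercentOff_ReducesThePrice()
    {
        var calculator = new PriceCalculator();

        decimal result = calculator.ApplyDiscount(100m, 10m);

        // Fails (red) until ApplyDiscount is implemented in the Green step.
        Assert.AreEqual(90m, result);
    }
}
```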


Remember: we're not talking about pure mathematics here; there will be times when you follow the RGR path and still end up with 'bad tests'. Yet, following this path will enhance the robustness of your tests dramatically.

Conclusions:
Many times, people tend to write their tests only after they have completed the UUT, thus skipping the Red step. This might lead them to bad tests of types 2 and 3 as mentioned above.
My conclusion: follow the RGR path.

Monday, April 22, 2013

Referencing the internal members of an aggregate.

There is a lot of confusion around the Aggregate pattern [DDD], especially around the question: is it OK for an external object to reference an internal member of an aggregate?

First of all, let's see what the DDD book has to say about it:

"Choose one Entity to be the root of each Aggregate, and control all access to the objects inside the boundary through the root. Allow external objects to hold references to the root only. Transient references to internal members can be passed out for use within a single operation only. Because the root controls access, it can not be blindsided by changes to the internals."

This is a little bit confusing. On the one hand, the root controls all access to the internal members and cannot be blindsided by changes to the internals; on the other hand, transient references to internal members may be passed out. This means that an external object can mess around with the state of the internals and thus blindside the AR (Aggregate Root)...
Sounds like a paradox, or is it?

Consider the following example: suppose we have a Travel Agency web site in which users can make flight reservations. A reasonable invariant would be: the total number of reserved seats for a flight cannot exceed the total number of seats on the plane. This rule is a true invariant, since breaking it will cause total chaos on the day of the flight. Imagine 60 people claiming their seats on a plane with only 50 seats...

To enforce this invariant we will probably have the following objects in one aggregate:

Here Flight is the AR

Our mission is to make sure that the state in the DB NEVER(!) violates this invariant.
There are several techniques to achieve that.

Tell, don't ask.

The first technique is to let the AR encapsulate all the internal members, so that every change to their state is done through it. In our example, Flight (which is the AR) will encapsulate Reservations. Flight will expose methods like ReserveSeats, UpdateReservation, CancelReservation, and will thus be able to enforce the invariant. This technique might work, but only if all the internal members of the aggregate are fully encapsulated. Unfortunately, it breaks the rule of: "Transient references to internal members can be passed out for use within a single operation only".

It used to be my favorite technique, but it's quite a pain in the ***. What if the aggregate consists of a deeper graph of objects? Eventually you will end up with an AR that has an endless list of methods whose whole purpose is to encapsulate every action on the internal members.
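A minimal sketch of the tell-don't-ask version, with illustrative names; the Flight root keeps its Reservations to itself and enforces the invariant on every change:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Reservation
{
    public Guid Id { get; private set; }
    public int Seats { get; private set; }

    internal Reservation(int seats)
    {
        Id = Guid.NewGuid();
        Seats = seats;
    }

    internal void ChangeSeats(int seats)
    {
        Seats = seats;
    }
}

public class Flight
{
    private readonly List<Reservation> _reservations = new List<Reservation>();
    public int TotalSeatsOnPlane { get; private set; }

    public Flight(int totalSeatsOnPlane)
    {
        TotalSeatsOnPlane = totalSeatsOnPlane;
    }

    public Reservation ReserveSeats(int seats)
    {
        EnsureCapacity(extraSeats: seats);
        var reservation = new Reservation(seats);
        _reservations.Add(reservation);
        return reservation;
    }

    public void UpdateReservation(Guid reservationId, int seats)
    {
        var reservation = _reservations.Single(r => r.Id == reservationId);
        EnsureCapacity(extraSeats: seats - reservation.Seats);
        reservation.ChangeSeats(seats);
    }

    public void CancelReservation(Guid reservationId)
    {
        _reservations.RemoveAll(r => r.Id == reservationId);
    }

    private void EnsureCapacity(int extraSeats)
    {
        int reserved = _reservations.Sum(r => r.Seats);
        if (reserved + extraSeats > TotalSeatsOnPlane)
            throw new InvalidOperationException("Total reserved seats cannot exceed the seats on the plane.");
    }
}
```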

Brute force validation.

We need a different technique, one that allows external objects to hold transient references to internal members and at the same time does not blindside the AR. The one I prefer is what I call "brute force validation" (BFV). With this technique, you ask the AR to check all its invariants each time you are about to change its state in the DB. You will probably have a method on the AR called CheckInvariants or something like that.

There are a few issues to consider with BFV. First of all, you MUST not forget to call CheckInvariants every time you save the AR to the DB. This means you need to find all the places in the code where you save the AR and invoke this method there. Ouch...
And what if some developer adds a new place in the code that saves the AR to the DB? If that developer forgets to call the CheckInvariants method - your DB will be corrupted...

Fortunately, the Repository pattern comes to the rescue. According to this pattern, each aggregate should have its own repository (usually with the name of the AR as a prefix, e.g. FlightRepository). Each AR repository should have a method that saves the AR to the DB along with all its internals - and only them. According to this pattern, the AR repository should be the only place that saves the AR to the DB. This sounds like a good place to call the CheckInvariants method - inside the repository itself, right before the saving action.
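A sketch of where the brute-force check could live, assuming NHibernate for persistence (the blog uses it in other posts); FlightRepository and CheckInvariants follow the names in the text, everything else is illustrative:

```csharp
using NHibernate;

public class FlightRepository
{
    private readonly ISession _session;

    public FlightRepository(ISession session)
    {
        _session = session;
    }

    public void Save(Flight flight)
    {
        // Brute force validation: the AR re-checks all of its invariants
        // right before it (and only it) is persisted.
        // CheckInvariants() is assumed to be defined on the Flight root,
        // as described in the text.
        flight.CheckInvariants();

        using (var tx = _session.BeginTransaction())
        {
            _session.SaveOrUpdate(flight);
            tx.Commit();
        }
    }
}
```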

But there is another issue: what happens if an external object modifies one of the internals and then tries to save that internal directly to the DB? This would bypass the CheckInvariants method, which is located only on the AR. Actually, if you are following the Repository pattern correctly, you don't have to worry about it - repositories should only expose methods that save ARs, not regular entities. Therefore this scenario is not possible.

One last issue to consider. Imagine the following scenario: some AR holds a reference to an internal member of another AR. In our example, let's say that another AR, the User class, holds a list of all the user's Reservations. Those Reservations are also internal members of some Flights.


Is this possible?

What if some User object modifies a Reservation and then this User object is sent to the UserRepository to be saved? According to the Repository pattern this is perfectly legal - a User is an AR and hence should have a repository to save it.
But we still have a problem here: the modification to the Reservation object may violate some of the invariants of a Flight, and that Flight won't even know about it. Don't worry - if you're following the Aggregate pattern correctly, this scenario is not possible either. It's true that at some point a User object may hold a reference to some Reservation of some Flight, but this reference is transient, meaning the Reservation is not a member of User. Therefore, even if some User object modifies a Reservation and this User object is then saved to the DB, the Reservation will not be saved along with it.

An entity can be a member of only one AR

Conclusion

Brute force validation allows you to expose the AR's internal members (if needed) and still be confident that even if some of the invariants are violated, these violations will be discovered before the AR is saved to the DB.

Wednesday, June 6, 2012

Asynchronous Programming in .Net 4.5

In this post I would like to talk about .NET 4.5 and the enhancements made there for asynchronous programming.

Introduction
Threads are very expensive - they consume memory (about 1 MB per thread) and they consume time during their initialization, finalization and context switching. Therefore, threads should be managed carefully by the thread pool. The goal is to create no more threads than needed.

When a thread is executing an I/O operation such as networking, file system access, etc., the thread is blocked by Windows while the hardware device performs the I/O operation. The thread will continue to run when the device finishes its operation. So far so good, since a waiting thread should not waste precious CPU time.
But there is a problem: a blocked thread does not return to the thread pool, thus forcing the thread pool to create new threads for incoming requests, or even worse - reject incoming requests.
Not only I/O operations block threads. For example, SqlConnection.Open can block a thread until it has an available connection to supply.

Consider the following piece of code:
figure 1

Here we have 4 blocking operations (highlighted in yellow in figure 1).
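To make the discussion concrete, here is a minimal sketch of synchronous data-access code along the lines of figure 1, with the four blocking calls marked in comments; the connection string and query are illustrative:

```csharp
using System.Data.SqlClient;

public static class ProductsDal
{
    public static int GetTotalStock(string connectionString)
    {
        using (var con = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT Stock FROM Products", con))
        {
            con.Open();                                          // blocking: waits for a pooled connection

            using (SqlDataReader reader = cmd.ExecuteReader())   // blocking: waits for the query
            {
                int total = 0;
                while (reader.Read())                            // blocking: waits for the next row
                {
                    total += reader.GetInt32(0);                 // blocking: may wait for data from the wire
                }
                return total;
            }
        }
    }
}
```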
This code is quite problematic for a scalable server. Imagine a server that serves tens, hundreds or even thousands of concurrent requests: while waiting for these operations to finish and finally release the thread, new requests keep coming in, and since threads are blocked and not released back to the thread pool, the thread pool has to produce more and more threads to handle the incoming requests. At best, the thread pool will manage to handle all the incoming requests by producing more and more threads, which will eventually decrease performance dramatically (as stated above). At worst, the number of threads will reach the thread pool's limit, and incoming requests will be queued.

Asynchronous Programming
.NET versions prior to 4.5 already had a (partial) solution to the problem stated above.
The solution came in the form of asynchronous programming with all the Begin_xxx and End_xxx methods, for example SqlCommand.BeginExecuteReader, WebRequest.BeginGetResponse and so forth.

Below is an example of such code:

figure 2
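A sketch in the spirit of the pre-4.5 Begin/End (APM) style that figure 2 shows; the query, connection string handling and helper names are illustrative:

```csharp
using System;
using System.Data.SqlClient;

public static class ProductsDal
{
    public static void GetTotalStockAsync(string connectionString, Action<int> onCompleted)
    {
        // Note: no 'using' blocks - the connection must stay open until the callback runs.
        // (Pre-4.5, SqlCommand async calls also typically required
        // "Asynchronous Processing=true" in the connection string.)
        var con = new SqlConnection(connectionString);
        var cmd = new SqlCommand("SELECT Stock FROM Products", con);

        con.Open();   // still blocking - there is no async Open in the old API

        cmd.BeginExecuteReader(asyncResult =>
        {
            // Runs on some thread-pool thread, not necessarily the caller's thread.
            SqlDataReader reader = cmd.EndExecuteReader(asyncResult);
            int total = 0;
            while (reader.Read())            // still blocking per row
            {
                total += reader.GetInt32(0); // still blocking
            }
            reader.Close();
            con.Close();
            onCompleted(total);
        },
        null);
    }
}
```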

Async programming in general is not the objective of this post, so I'm not going to explain in detail how SqlCommand.BeginExecuteReader works. But I will say that when a thread invokes BeginExecuteReader it is not blocked and is free to continue to the next line of execution. When the SqlCommand finishes executing the query against the DB, the inline method supplied to BeginExecuteReader is invoked by some available thread, not necessarily the one that called BeginExecuteReader. Thus, no threads are blocked the way they would be if they invoked SqlCommand.ExecuteReader.

As mentioned, this code belongs to .NET versions prior to 4.5 and it has some major drawbacks:
1 - using statements cannot be used.
2 - Inline methods are not so intuitive.
3 - There are no async counterparts for DataReader.GetInt32, DataReader.Read and SqlConnection.Open. Therefore, blocking operations are not fully avoided.

Actually, SqlConnection.Open can be a real bottleneck. I made an experiment: I created an ASP.NET application with 2 pages: light.aspx and heavy.aspx. The light one performs a quick and simple operation, say, some simple calculation. The heavy one performs a heavy and long operation, say, executing some heavy query against the DB which might take a few seconds.
The heavy page uses a connection pool (not a thread pool!) of 30 connections.
I implemented the heavy page in 2 versions: a synchronous version which implements the code from figure 1 and an asynchronous version which implements the code from figure 2.
For both versions, a simple client application that I built sent 1000 requests for the heavy page and then 1000 requests for the light page.
What I'm interested to see in such an experiment is how the light pages respond when there are many requests for heavy pages that are consuming threads from the thread pool.

I expected the results of the asynchronous version to be better, since fewer threads would be blocked by the heavy page and thus fewer threads would be created by the thread pool.
I was wrong - in both versions the server became unresponsive at some point, for both light and heavy pages.
I expected this for the synchronous version, but why did it also happen for the asynchronous version?
The answer is this: the first 30 requests were not blocked, since SqlConnection.Open had available connections to supply them. But from the 31st request onward, SqlConnection.Open blocked all threads until some of the first 30 threads finished their job and released their connections. Thus, more and more threads became blocked, increasing the load on the thread pool. At some point, new incoming requests, whether for heavy pages or light pages, could not be handled and were queued.

Now we'll see how .net 4.5 can help us solve this problem.

.Net 4.5 - Asynchronous Programming
In the code below you can see the new way to implement async operations with .net 4.5:

figure 3
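A sketch along the lines of figure 3, using the .NET 4.5 async ADO.NET methods; the method names SqlConnect and Foo follow the text, while the query and everything else is illustrative:

```csharp
using System.Data.SqlClient;
using System.Threading.Tasks;

public static class ProductsDal
{
    public static async Task<int> SqlConnect(string connectionString)
    {
        using (var con = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT Stock FROM Products", con))
        {
            await con.OpenAsync();            // does not block while waiting for a pooled connection

            using (var reader = await cmd.ExecuteReaderAsync())     // does not block on the query
            {
                int total = 0;
                while (await reader.ReadAsync())                    // does not block per row
                {
                    total += await reader.GetFieldValueAsync<int>(0); // async replacement for GetInt32
                }
                return total;
            }
        }
    }

    public static async Task Foo(string connectionString)
    {
        int totalStock = await SqlConnect(connectionString);
        // ... continue with totalStock ...
    }
}
```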


The first thing to notice is the two new keywords: await and async. To support these 2 new keywords you have 2 options: upgrade VS 2010 by installing this and then this, or start working with a higher version of VS: 2011 or 2012.
But since we are using features of the async ADO.NET, which is part of .NET 4.5, we cannot use VS 2010, which doesn't support them anyway (as far as I know).

OK, now let's analyze it.
In figure 4 you can see the control flow of the thread that executes the method shown in figure 3. There you can see that thread t1 is the executing thread and it is the one that calls SqlConnect from within Foo.
When the thread executes the line await con.OpenAsync();, ADO.NET tries to allocate a free and available connection from the connection pool. If all connections are taken (and the limit is reached), the thread will skip all the code below that line, return to the point at which it entered SqlConnect, and continue to execute the Foo method. The code below await con.OpenAsync(); will be executed when some connection becomes available, and it will be executed by a thread which most likely won't be t1 (t2 in figure 4).

figure 4


Of course, the same goes for all the other lines involving the await keyword, meaning that when the new thread (t2), which executes the rest of the code, reaches line 99 in figure 3, it will skip all the lines below it and return to the thread pool.

This ensures that SqlConnect does not involve any blocking points, so no thread will be blocked by SqlConnect, which increases the overall availability of the thread pool's threads.

I went back to my experiment and changed the heavy page to implement the code from figure 3.
Just to remind you, what I'm interested to see in such an experiment is how the light pages respond when there are many requests for heavy pages that are consuming threads from the thread pool.
I ran my test again and... good news! For the new version, the responsiveness of the server for light pages was the same whether or not heavy pages were running in the background. It means that the heavy pages did not add any significant load on the thread pool.
Monitoring the thread pool confirmed this - the thread pool hardly needed any new threads to handle the requests for both heavy and light pages.

Thursday, March 8, 2012

Aggregate [DDD] - boosting the performance.

In this post I would like to tell you about a little experiment I've made - putting the Aggregate pattern [DDD] to the test.

From Eric Evan's book:
It is difficult to guarantee the consistency of changes to objects in a model with complex associations. Invariants need to be maintained that apply to closely related groups of objects, not just discrete objects. Yet cautious locking schemes cause multiple users to interfere pointlessly with each other and make a system unusable. [DDD, p. 126] 

According to this, the Aggregate pattern should provide a more efficient way to enforce invariants in a multi-user environment by significantly reducing DB locks.
OK, so I've put this to the test.

I've simulated an eCommerce store with 80,000 concurrent users who try to add/edit different OrderLines of 1000 different Orders. One or more users can work simultaneously on the same Order.
Invariant: each Order has a MaximumTotal that cannot be exceeded by the sum of the Amounts of all of its OrderLines.
I've used SQL Server 2005 + NHibernate 3.1.0.

So first I tried to enforce this invariant without the Aggregate pattern. I've created an OrderLineService with 2 methods:

This method gets an orderId and an amount. It fetches the Order eagerly with its OrderLines, finds the first OrderLine and tries to update its Amount with the amount passed as a parameter. Before updating the amount, we must check that the invariant is not going to be violated, so we calculate the sum of all of the OrderLines of the given Order.
But what if right after we found out that the invariant is not going to be violated and right before committing the changes - a second concurrent user has added a new OrderLine that might violate the invariant?
For example:
Order #123 has a MaxTotal of 100$. It has 2 OrderLines with 40$ each.
Two requests (r1 and r2) arrive at the server simultaneously - r1 wants to update the first OrderLine to 50$ and r2 wants to add a new OrderLine with an amount of 20$. If both succeed, the invariant will be violated.
r1 fetches Order #123.
r2 fetches Order #123.
r1 checks the invariant, finds it to be OK and updates the amount (but has yet to commit).
r2 checks the invariant, finds it to be OK and adds a new OrderLine.
r2 commits.
Now in the DB we have 3 OrderLines for Order #123: 40$, 40$, 20$ - the invariant is kept.
r1 commits.
Now in the DB we have 3 OrderLines for Order #123: 50$, 40$, 20$ - the invariant is violated and we don't even know about it.

To prevent this we must lock the table with RepeatableRead isolation (in MSSQL 2005 this isolation level also prevents phantom reads). This means that until that transaction is committed - NO OTHER USER CAN INSERT A NEW ORDERLINE, NOT EVEN IF THE NEW ORDERLINE BELONGS TO ANOTHER ORDER!

The next method:


This method gets an orderId and an amount. It fetches the Order eagerly with its OrderLines and tries to add a new OrderLine with the amount passed as a parameter. Before adding the new OrderLine, we must check that the invariant is not going to be violated, so we calculate the sum of all of the OrderLines of the given Order. 
Again, for the same reason as stated above, we must lock the table with RepeatableRead isolation.
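A rough sketch of what the two service methods might look like without the Aggregate pattern, assuming NHibernate with eager loading and cascades configured in the mapping; the method names are inferred from the *WithAggr names used later, and Order/OrderLine are the post's domain entities (Order has MaxTotal and OrderLines, OrderLine has Amount):

```csharp
using System;
using System.Data;
using System.Linq;
using NHibernate;

public class OrderLineService
{
    private readonly ISessionFactory _sessionFactory;

    public OrderLineService(ISessionFactory sessionFactory)
    {
        _sessionFactory = sessionFactory;
    }

    public void UpdateFirstOrderLine(int orderId, decimal amount)
    {
        using (var session = _sessionFactory.OpenSession())
        // RepeatableRead so no other transaction can touch OrderLines
        // between the invariant check and the commit.
        using (var tx = session.BeginTransaction(IsolationLevel.RepeatableRead))
        {
            var order = session.Get<Order>(orderId);   // OrderLines fetched eagerly via the mapping
            var firstLine = order.OrderLines.First();

            decimal newTotal = order.OrderLines.Sum(l => l.Amount) - firstLine.Amount + amount;
            if (newTotal > order.MaxTotal)
                throw new InvalidOperationException("Invariant violated: MaxTotal exceeded.");

            firstLine.Amount = amount;
            tx.Commit();
        }
    }

    public void InsertNewOrderLine(int orderId, decimal amount)
    {
        using (var session = _sessionFactory.OpenSession())
        using (var tx = session.BeginTransaction(IsolationLevel.RepeatableRead))
        {
            var order = session.Get<Order>(orderId);

            decimal newTotal = order.OrderLines.Sum(l => l.Amount) + amount;
            if (newTotal > order.MaxTotal)
                throw new InvalidOperationException("Invariant violated: MaxTotal exceeded.");

            order.OrderLines.Add(new OrderLine { Amount = amount });  // cascade assumed in the mapping
            tx.Commit();
        }
    }
}
```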

Here is the code that simulates 80,000 concurrent users, all trying to add/edit different OrderLines with different amounts.

Without the Aggregate pattern, it took ~200,000 milliseconds to process the 80,000 concurrent requests.

Now let's simulate a scenario in which we do use the Aggregate pattern.
First I've added two new methods to class Order: UpdateFirstOrderLine and AddOrderLine.

Next I've also added 2 new methods to OrderLineService: UpdateFirstOrderLineWithAggr and InsertNewOrderLineWithAggr:

As you can see, in these 2 methods I haven't used the RepeatableRead isolation level for any of the transactions, meaning 2 concurrent users can simultaneously add 2 different OrderLines to the same Order and potentially violate the invariant. So how can we tolerate this?
Let's look at the pattern's definition again:

Choose one Entity to be the root of each Aggregate, and control all access to the objects inside the boundary through the root.

Any change to the Order.OrderLines list, whether it's adding a new OrderLine or modifying an existing one, is done through the root (Order) and increases the root's Version (optimistic lock). Therefore, if two concurrent users try to add two different OrderLines to the same Order - one will succeed and the other will fail for trying to update an out-of-date instance of Order.
With this mechanism, I don't have to lock the whole OrderLines table any more - I can prevent simultaneous modifications through the root object.
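A sketch of the two root methods from the aggregate version, assuming NHibernate's <version> optimistic locking on Order; the Touch/LastModifiedUtc trick is just one way (not necessarily the author's) to make sure a child-only edit also dirties the root so its Version is bumped:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public virtual int Id { get; protected set; }
    public virtual int Version { get; protected set; }          // mapped with <version> - optimistic lock
    public virtual DateTime LastModifiedUtc { get; protected set; }
    public virtual decimal MaxTotal { get; set; }
    public virtual IList<OrderLine> OrderLines { get; protected set; }

    public Order()
    {
        OrderLines = new List<OrderLine>();
    }

    public virtual void UpdateFirstOrderLine(decimal amount)
    {
        var first = OrderLines.First();
        EnsureInvariant(OrderLines.Sum(l => l.Amount) - first.Amount + amount);
        first.Amount = amount;
        Touch();
    }

    public virtual void AddOrderLine(decimal amount)
    {
        EnsureInvariant(OrderLines.Sum(l => l.Amount) + amount);
        OrderLines.Add(new OrderLine { Amount = amount });
        Touch();
    }

    private void EnsureInvariant(decimal newTotal)
    {
        if (newTotal > MaxTotal)
            throw new InvalidOperationException("Invariant violated: MaxTotal exceeded.");
    }

    private void Touch()
    {
        // Dirtying a field on the root guarantees NHibernate increments the Version,
        // so a concurrent change through the same root fails on commit.
        LastModifiedUtc = DateTime.UtcNow;
    }
}

public class OrderLine
{
    public virtual int Id { get; protected set; }
    public virtual decimal Amount { get; set; }
}
```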

Let's simulate this.
Order #123 has a MaxTotal of 100$ and Version 2. It has 2 OrderLines with 40$ each.
Two requests (r1 and r2) arrive at the server simultaneously - r1 wants to update the first OrderLine to 50$ and r2 wants to add a new OrderLine with an amount of 20$. If both succeed, the invariant will be violated.
r1 fetches Order #123. Version is 2.
r2 fetches Order #123. Version is 2.
r1 checks the invariant, finds it to be OK and updates the amount (but has yet to commit).
r2 checks the invariant, finds it to be OK and adds a new OrderLine.
r2 commits.
Now in the DB we have 3 OrderLines for Order #123: 40$, 40$, 20$ - the invariant is kept. Also, the Version is now 3.
r1 tries to commit with Version 2 and fails.

For that mechanism to work properly, we need to be 100% sure that any change to one of the Aggregate's members increases the root's version. As Evans says:

Because the root controls access, it cannot be blindsided by changes to the internals  (see also)

Here is the code that simulates 80,000 concurrent users, all trying to add/edit different OrderLines with different amounts:


By using the Aggregate pattern it took ~54,000 milliseconds to process 80,000 concurrent requests.

Around 4 times faster!!!

All source files can be found here

Tuesday, January 17, 2012

hbm2net - c# instead of T4

I'm still mapping my entities to NHibernate using hbm files. I still do this for several reasons, but I won't detail them now.

Since I'm using hbm files, I want to exploit one of their huge advantages - auto-generating lots of code that can be derived from the hbm files, using the hbm2net tool.

For example:

- Auto-generate my POCO entities.
- Auto-generate my DTO entities.
- Auto-generate my server-side validations and their equivalent client-side validations.
- Implement cross-cutting behaviors like overriding GetHashCode(), Equals(), or invoking NotifyPropertyChanged etc.

In its earliest versions, hbm2net expected a Velocity script to auto-generate code. Recent versions can also work with T4 scripts.
hbm2net is great and it's hard to imagine working without it. Unfortunately, Velocity is not that user-friendly, and neither is T4.

My preferred way is to implement my own generator written in C#, which can be plugged into hbm2net instead of working with T4 or Velocity.

And why would I prefer C#...? Well, I guess that's obvious...

So, let's get to work.

First of all, download the latest version of hbm2net from here and extract it to wherever you like.

Next, create a Class Library project in Visual Studio and call it MyHbm2NetGenerator. 
Add a reference to NHibernate.Tool.hbm2net.dll (should be located where you've extracted the zip file).
Add a class to this project and call it POCOGenerator. This class should be derived from NHibernate.Tool.hbm2net.AbstractRenderer and implement NHibernate.Tool.hbm2net.ICanProvideStream:


hbm2net will create a single instance of this class and use it to generate the derived code for all hbm files.

Next, implement  ICanProvideStream.CheckIfSourceIsNewer:


hbm2net will invoke CheckIfSourceIsNewer for each hbm file. The source parameter will be the LastWriteTimeUtc of the current hbm file. The directory parameter will be the path of the output directory in which the generated files will be stored. This method should return true if source is greater than the LastWriteTimeUtc of the generated file, meaning - if there were changes in the hbm file since the last generation of the POCO file.

The GetFileName method receives the clazz parameter, which holds almost all the details about the POCO entity that is going to be generated. I'll give more details about this class soon, but for now, all we need in this method is the POCO entity name, which can be found at clazz.GeneratedName.


Next, implement  ICanProvideStream.GetStream:

hbm2net will invoke this method to get a stream to which the content of the currently generated POCO entity is flushed.

Next, you need to override the Render method. This is actually the main method, where you generate the content of the POCO entity and flush it (save it).


Now, implement a method that will generate the POCO's content (GeneratePOCO is the name I gave it).
Of course, you should use the ClassMapping object to get all the POCO's details, e.g. class name, class modifiers, base class, properties, fields etc.
In the next post I will show you in more detail what can be done with ClassMapping in order to generate the desired POCO content.
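As a taste of what GeneratePOCO could do, here is a very reduced sketch that only uses clazz.GeneratedName (mentioned above) and the savedToPackage namespace; the namespace of ClassMapping and everything else about the real mapping API is an assumption here, so treat it as pseudocode-with-types rather than the actual generator:

```csharp
using System.Text;
using NHibernate.Tool.hbm2net;   // assumption: ClassMapping lives in this namespace

public static class PocoTemplate
{
    // A real generator would also walk the mapping's properties, fields,
    // base class and modifiers; here only the class name is used.
    public static string GeneratePOCO(ClassMapping clazz, string savedToPackage)
    {
        var sb = new StringBuilder();
        sb.AppendLine("namespace " + savedToPackage);
        sb.AppendLine("{");
        sb.AppendLine("    public partial class " + clazz.GeneratedName);
        sb.AppendLine("    {");
        sb.AppendLine("        // properties derived from the hbm mapping go here");
        sb.AppendLine("    }");
        sb.AppendLine("}");
        return sb.ToString();
    }
}
```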

OK, we're getting there: compile your MyHbm2NetGenerator project and then copy MyHbm2NetGenerator.dll to the directory where you've extracted hbm2net.

Next, create an xml file, call it config.xml (or whatever...) and put it wherever you like. config.xml should look like this:


renderer is the FullyQualifiedName of your POCOGenerator class. package is the namespace for your POCO entities - you will receive it in the POCOGenerator.Render as the savedToPackage parameter.

Now, execute the following command in the command shell:

<hbm2net dir>\hbm2net.exe --config=<config dir>\config.xml --output=<output dir> <hbm files dir>\*.hbm.xml


And that's it! Go to <output dir> to see your generated files.


To make hbm2net auto-generate your code on every build of your domain/DTO/validations project, you can add a pre/post-build event in your project settings with the command line I just showed you.

download code example