C# and GC.Collect()

It’s been said before but that doesn’t mean it’s said to often:
Don’t use GC.Collect()
So first, if you just can’t resist doing a GC.Collect(), let’s talk about your
compulsion. Then we’ll go into a little more about why calling GC.Collect() is
bad.

If you must

Of course, there are exceptions. Check out a
Rico Mariani’s
posting for some examples. If you are going to do it, you should document
exactly why you think it is relevant preferably with some actual measurements:

// We finished handling an event that generated a lot of data.  We're
// going wait for input, usually more then 30 seconds, and this
// GC.Collect() reduced our size by 120MBytes and saves time
// on the next request when a GC automatically happens -- pete, 2007/12/03
GC.Collect();

I also prefer to own up to doing something questionable so I include my name and
when. Odds are in a couple years it’s not relevant anymore and someone can come
ask me.
I think when you are reading code and see a GC.Collect() without such
documentation, then the appropriate comment is:

// GC.Collect();

I’m a coward. If something does go wrong, it’s nice to know a GC.Collect() was there.

Why you shouldn’t

Why is GC.Collect() bad? Professionally, the first thing that comes to mind
when I see it is this person doesn’t know what they’re doing. They probably
generated 8 copies of a 100MByte buffer and rather then put the thought into how
to use a single buffer, they just called GC.Collect() to keep the application
from looking like its really huge. Of course, they are still copying that data
around but who cares about performance?
The bigger issue is that calling GC.Collect() is not harmless! There are real
performance consequences.

Review: Garbage Collection

C# uses a generation based stop-and-copy style garbage collection algorithm.
The runtime waits until you’ve reached a certain amount of memory allocation and
then automatically invokes this algorithm. Please note, the runtime doesn’t run
out of memory — it detects such problems and automatically starts doing the GC
so you explicitly calling GC.Collect() doesn’t help with out-of-memory
situations.
Let’s ignore the generation part and look at the stop-and-copy part (but in a
simple way). Picture a big array. Each element of that array is an object that
your application has allocated. So maybe there are a million elements in this
array. The first step is to allocate a new array (let’s just assume it’s the
same size). The runtime starts with a “root” node and copies that into this new
array. It then copies every object that root node references into the new
array. It then moves to the second object in the new array (that’d be the first
object the “root” node referenced) and copies every object the second object
references. It then moves to the third object in the new array (that’d be the
second object the “root” node referenced) and copies everything it references
into the new array.
After a while, every object that can referenced any other object is in this new
array. Everything in the original array is now garbage; let’s pretend that 1/2
or 500,000 objects are now deallocated. The original array is freed and your
are good to go.
There a few details: it doesn’t copy anything twice and it updates references to
go to the location in the new array. There are bookkeeping details I’m glossing
over. Convince yourself the above algorithm works. If you don’t, you’ll never
be happy about GC.
So next time you call GC.Collect() in some key function, think about all the
extra work that is going on. Normally, the reason there is not much of a
performance penalty is you typically only have to do garbage collection on a
periodic basis. I.e. if your application generates an average of say a 10,000
objects a second then maybe only after 50 seconds is there a need to do garbage
collection.
The generational part is a wonderful optimization and represents another reason
why you should not call GC.Collect(). Most applications have some global
variables and some data structures (e.g. objects) that stick around for the life
of the application. Why bother checking and copying all those objects? These
are “old” objects that’ve stuck around for a while and are likely to stay
around.
Just the opposite of the “old” objects are those objects you allocate in a for()
loop or other local variables in a function. The runtime should try and forget
about those as soon as possible. These representation generations: locally
allocated and quickly forgotten about objects are generation 0 and “old” objects
that aren’t going to be going away are generation N.
So now imagine we are low on memory and need to do garbage collection. Start by
doing the stop-and-copy on generation 0. With luck, that frees up enough memory
and only checks 10,000 objects (made up number). That means the runtime can
stop doing the garbage collection. Why? Because most of these objects are
local values that are temporary in nature.
Anything in generation 0 that survives is promoted to generation 1. Why? Well,
if it last this long, it’s probably going to last longer so there’s no need to
try and collect it again in generation 0.
What happens if we don’t free up enough memory while collecting generation 0?
Well, we repeat the process with generation 1. Anything left over is promoted
to generation 2. What happens if we still don’t have enough memory? In C#, it
now does the final generation 2. Which means we’ve now checked every object in
the system. Pretty nifty.
Unfortunately, if you call GC.Collect() that moves everything up one generation.
After two GC.Collect() calls, all the local and temporary objects are promoted
to generation 2.

Conclusion

Trust your garbage collector. It works. Calling GC.Collect() is going to hurt
your performance more then it helps.