Technology

C# and GC.Collect()

It’s been said before but that doesn’t mean it’s said too often:
Don’t use GC.Collect()
So first, if you just can’t resist doing a GC.Collect(), let’s talk about your
compulsion. Then we’ll go into a little more about why calling GC.Collect() is
bad.

If you must

Of course, there are exceptions. Check out Rico Mariani’s posting for some
examples. If you are going to do it, you should document exactly why you think
it is relevant, preferably with some actual measurements:

// We finished handling an event that generated a lot of data.  We're
// going to wait for input, usually more than 30 seconds, and this
// GC.Collect() reduced our size by 120MBytes and saves time
// on the next request when a GC automatically happens -- pete, 2007/12/03
GC.Collect();

I also prefer to own up to doing something questionable so I include my name and
when. Odds are in a couple years it’s not relevant anymore and someone can come
ask me.
I think when you are reading code and see a GC.Collect() without such
documentation, then the appropriate comment is:

// GC.Collect();

I’m a coward. If something does go wrong, it’s nice to know a GC.Collect() was there.
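
The numbers in a comment like the one above have to come from somewhere. Here is a minimal sketch of one way to get them, using GC.GetTotalMemory and Stopwatch. The class and method names are made up for illustration, and GC.GetTotalMemory is only an estimate of the managed heap, so treat the result as a rough figure rather than an exact measurement.

```csharp
using System;
using System.Diagnostics;

public static class GcMeasurement
{
    // Hypothetical helper: forces one collection and reports roughly how many
    // bytes of managed memory it reclaimed and how long the call took.
    public static (long BytesFreed, long ElapsedMs) MeasureForcedCollect()
    {
        long before = GC.GetTotalMemory(false); // false: don't collect just to measure
        var timer = Stopwatch.StartNew();
        GC.Collect();
        timer.Stop();
        long after = GC.GetTotalMemory(false);
        return (before - after, timer.ElapsedMilliseconds);
    }
}
```

Calling GcMeasurement.MeasureForcedCollect() at the spot where you are tempted to put the GC.Collect() gives you something concrete to write in the comment. If the bytes freed are small or the elapsed time is large, that is a good hint the call doesn't belong there.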

Why you shouldn’t

Why is GC.Collect() bad? Professionally, the first thing that comes to mind
when I see it is that this person doesn’t know what they’re doing. They probably
generated 8 copies of a 100MByte buffer and, rather than putting the thought into
how to use a single buffer, they just called GC.Collect() to keep the application
from looking like it’s really huge. Of course, they are still copying that data
around but who cares about performance?
The bigger issue is that calling GC.Collect() is not harmless! There are real
performance consequences.

Review: Garbage Collection

C# (strictly speaking, the .NET runtime underneath it) uses a generational
garbage collector that is easiest to picture as a stop-and-copy algorithm.
The runtime waits until you’ve reached a certain amount of memory allocation and
then automatically invokes this algorithm. Please note, the runtime doesn’t
simply run out of memory: it detects that memory is getting tight and
automatically starts a GC, so explicitly calling GC.Collect() doesn’t help with
out-of-memory situations.
Let’s ignore the generation part and look at the stop-and-copy part (but in a
simple way). Picture a big array. Each element of that array is an object that
your application has allocated. So maybe there are a million elements in this
array. The first step is to allocate a new array (let’s just assume it’s the
same size). The runtime starts with a “root” node and copies that into this new
array. It then copies every object that root node references into the new
array. It then moves to the second object in the new array (that’d be the first
object the “root” node referenced) and copies every object the second object
references. It then moves to the third object in the new array (that’d be the
second object the “root” node referenced) and copies everything it references
into the new array.
After a while, every object that can be reached from the “root” node is in this
new array. Everything left in the original array is now garbage; let’s pretend
that 1/2 of the objects, or 500,000, are garbage and are now deallocated. The
original array is freed and you are good to go.
There are a few details: it doesn’t copy anything twice and it updates references to
go to the location in the new array. There are bookkeeping details I’m glossing
over. Convince yourself the above algorithm works. If you don’t, you’ll never
be happy about GC.
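
If it helps to convince yourself, the walk described above can be sketched as a toy simulation. This is not how the runtime is implemented: the "heap" here is just a list, references are list indexes, and ToyObject and ToyCopyingCollector are names invented for this example.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: an "object" is a name plus indexes of the objects it references.
public sealed class ToyObject
{
    public string Name;
    public List<int> References = new List<int>();

    public ToyObject(string name, params int[] references)
    {
        Name = name;
        References.AddRange(references);
    }
}

public static class ToyCopyingCollector
{
    // Copies everything reachable from oldHeap[rootIndex] into a new list and
    // returns it. Whatever was not copied is garbage.
    public static List<ToyObject> Collect(List<ToyObject> oldHeap, int rootIndex)
    {
        var newHeap = new List<ToyObject>();
        var forwarded = new Dictionary<int, int>(); // old index -> new index

        int Copy(int oldIndex)
        {
            if (forwarded.TryGetValue(oldIndex, out int existing))
                return existing;                    // never copy anything twice
            int newIndex = newHeap.Count;
            forwarded[oldIndex] = newIndex;
            ToyObject source = oldHeap[oldIndex];
            // The copy still holds OLD indexes; they are fixed when it is scanned.
            newHeap.Add(new ToyObject(source.Name, source.References.ToArray()));
            return newIndex;
        }

        Copy(rootIndex);
        // Walk the new heap in order: the root first, then what it referenced, etc.
        for (int scan = 0; scan < newHeap.Count; scan++)
        {
            List<int> refs = newHeap[scan].References;
            for (int i = 0; i < refs.Count; i++)
                refs[i] = Copy(refs[i]);            // copy the target, update the reference
        }
        return newHeap;
    }
}
```

For example, with an old heap of five objects where only three are reachable from the root:

```csharp
var oldHeap = new List<ToyObject>
{
    new ToyObject("garbage1"),        // index 0, nothing live points here
    new ToyObject("root", 3, 4),      // index 1
    new ToyObject("garbage2", 1),     // index 2, points at root but is unreachable
    new ToyObject("a", 4),            // index 3
    new ToyObject("b", 1),            // index 4, points back at root (a cycle)
};
List<ToyObject> newHeap = ToyCopyingCollector.Collect(oldHeap, 1);
// newHeap holds root, a, b in that order; the two garbage objects were never copied.
```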
So next time you call GC.Collect() in some key function, think about all the
extra work that is going on. Normally, the reason there is not much of a
performance penalty is you typically only have to do garbage collection on a
periodic basis. I.e. if your application generates an average of, say, 10,000
objects a second then maybe only after 50 seconds is there a need to do garbage
collection.
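
You can watch the runtime do this on its own schedule. The sketch below, with an invented class name and an arbitrary allocation size, allocates a pile of short-lived arrays and uses GC.CollectionCount to see how many generation 0 collections happened along the way, without ever calling GC.Collect(). How many you see depends on the runtime, its configuration, and the machine.

```csharp
using System;

public static class AutomaticCollections
{
    // Storing each array here keeps the allocation from being optimized away.
    private static byte[] _last;

    // Allocates many short-lived 1 KB arrays and returns how many generation 0
    // collections the runtime decided to run by itself during the loop.
    public static int CountGen0CollectionsWhileAllocating(int allocations)
    {
        int before = GC.CollectionCount(0);
        for (int i = 0; i < allocations; i++)
        {
            _last = new byte[1024]; // the previous array becomes garbage here
        }
        return GC.CollectionCount(0) - before;
    }
}
```

Try AutomaticCollections.CountGen0CollectionsWhileAllocating(1000000) and compare the count with the number of allocations: the collector runs far less often than you allocate.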
The generational part is a wonderful optimization and represents another reason
why you should not call GC.Collect(). Most applications have some global
variables and some data structures (e.g. objects) that stick around for the life
of the application. Why bother checking and copying all those objects? These
are “old” objects that’ve stuck around for a while and are likely to stay
around.
Just the opposite of the “old” objects are those objects you allocate in a for()
loop or other local variables in a function. The runtime should try and forget
about those as soon as possible. These represent generations: locally
allocated and quickly forgotten about objects are generation 0 and “old” objects
that aren’t going to be going away are generation N.
So now imagine we are low on memory and need to do garbage collection. Start by
doing the stop-and-copy on generation 0. With luck, that frees up enough memory
and only checks 10,000 objects (made up number). That means the runtime can
stop doing the garbage collection. Why? Because most of these objects are
local values that are temporary in nature.
Anything in generation 0 that survives is promoted to generation 1. Why? Well,
if it lasted this long, it’s probably going to last longer so there’s no need to
try and collect it again in generation 0.
What happens if we don’t free up enough memory while collecting generation 0?
Well, we repeat the process with generation 1. Anything left over is promoted
to generation 2. What happens if we still don’t have enough memory? In C#, it
now collects the final generation, generation 2, which means we’ve now checked
every object in the system. Pretty nifty.
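Unfortunately, if you call GC.Collect(), every object that survives moves up one
generation. After two GC.Collect() calls, all the local and temporary objects
that are still referenced have been promoted to generation 2, where they are
only looked at again during the most expensive kind of collection.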
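
The promotion is easy to observe with GC.GetGeneration. This is a small sketch with a made-up class name. On the runtimes I would expect most readers to use, the three values tend to come out as 0, 1, 2, but the exact numbers are the runtime's business, so check on your own setup.

```csharp
using System;

public static class PromotionDemo
{
    // Returns the generation of one live object before any forced collection,
    // after one GC.Collect(), and after a second GC.Collect().
    public static int[] GenerationsAcrossForcedCollections()
    {
        var survivor = new object();
        int initial = GC.GetGeneration(survivor);
        GC.Collect();
        int afterOne = GC.GetGeneration(survivor);
        GC.Collect();
        int afterTwo = GC.GetGeneration(survivor);
        GC.KeepAlive(survivor); // keep it reachable through both collections
        return new[] { initial, afterOne, afterTwo };
    }
}
```

Printing string.Join(", ", PromotionDemo.GenerationsAcrossForcedCollections()) shows a perfectly ordinary temporary object being aged into the oldest generation by nothing more than two manual collections.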

Conclusion

Trust your garbage collector. It works. Calling GC.Collect() is going to hurt
your performance more than it helps.

C# and LOH

LOH (Large Object Heap) is used by objects larger than 85K. Conceptually, such an object is identical to any other GC’d object except for two items:

  • It’s immediately put into Gen2.

  • It’s not compacted (i.e. copied to a new location).
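
The first of those two items can be checked with GC.GetGeneration. A minimal sketch, assuming the default large-object threshold; the class name and the array sizes are just for illustration.

```csharp
using System;

public static class LargeObjectHeapDemo
{
    // Allocates a byte array of the given length and reports which generation
    // the runtime says it lives in immediately after allocation.
    public static int GenerationOfNewArray(int lengthInBytes)
    {
        var buffer = new byte[lengthInBytes];
        int generation = GC.GetGeneration(buffer);
        GC.KeepAlive(buffer);
        return generation;
    }
}
```

Compare LargeObjectHeapDemo.GenerationOfNewArray(1000) with LargeObjectHeapDemo.GenerationOfNewArray(1000000). With default settings I would expect the small array to report generation 0 and the large one to report generation 2 straight away.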

Trying out ecko

I was dissatisfied with MarsEdit primarily because the photo editing was non-existent.
Of course, now I can’t get Picasa integration!

Installing apache on Mac OS X

I’ve got Leopard (10.5.1) with the developer tools installed and want to do some
more development in Django on my laptop. I first want to install MacPorts to make it
easier to download and install various open source projects. Using MacPorts, I’ll
get Apache installed and configured. Finally, I’ll get the basic Django working.
1. Installing MacPorts is painless. Just follow the directions at the web site. I preferred to download and install the disk image.
2. You need to add port to your path. I use a ~/.bashrc:

PATH=/usr/local/bin:/usr/X11R6/bin:/opt/local/bin:/opt/local/sbin:$PATH
    

and make sure it’s available (create a new shell, a new terminal window, or “. ~/.bashrc”). The command

$ port -h
Usage: port
        [-bcdfiknopqRstuvx] [-D portdir] [-F cmdfile] action [privopts] [actionflags]
        [[portname|pseudo-portname|port-url] [@version] [+-variant]... [option=value]...]...
"port help" or "man 1 port" for more information.
    

should print a help message.
3. I checked which versions of Apache are available and which variants I could choose from (the “sudo” is so I can run the command as root, the administrative user; I use it every time so I have a log of my actions):

$ sudo port selfupdate
MacPorts base version 1.600 installed
Downloaded MacPorts base version 1.600
The MacPorts installation is not outdated and so was not updated
selfupdate done!
$ sudo port list apache
apache                         @1.3.37         www/apache
$ sudo port list apache2
apache2                        @2.2.6          www/apache2
$ sudo port variants apache
apache has the variants:
	universal
	darwin
	apache_layout

And finally, let’s install it:

$ sudo port install apache2

Shockingly, there was an error:

...
checking for mawk... (cached) no
checking for gawk... (cached) no
checking for nawk... (cached) no
checking for awk... (cached) no
configure: error: No awk program found
  

I think there was a path problem involved, because gawk was already installed and simply restarting the install worked:

$ sudo port install gawk
Skipping org.macports.activate (gawk ) since this port is already active
$ sudo port install apache2
  

I now need to tell OS X to start Apache at system boot:

$ sudo launchctl load -w /Library/LaunchDaemons/org.macports.apache2.plist
  

And now it’s time to configure apache:

$ cd /opt/local/apache2/conf
$ cp httpd.conf.sample httpd.conf
   

Trying to start Apache gave me this error:

$ sudo /opt/local/apache2/bin/apachectl start
httpd: Syntax error on line 96 of /opt/local/apache2/conf/httpd.conf: Cannot load /opt/local/apache2/modules/mod_ssl.so into server: dlopen(/opt/local/apache2/modules/mod_ssl.so, 10): Symbol not found: _ssl_cmd_SSLCACertificateFile
  Referenced from: /opt/local/apache2/modules/mod_ssl.so
  Expected in: flat namespace
   

Use your favorite editor to comment out SSL (I don’t need it yet):

$ sudo vi httpd.conf
...
#LoadModule ssl_module modules/mod_ssl.so
...

4. Start it up:

$ sudo /opt/local/apache2/bin/apachectl start
 

5. And prove it works by connecting to http://localhost/

Apple Developer Center

So I was checking out Apple Developer Connection, which is informative. But why would I pay $499 for a Select membership?
The ADC Compatibility Labs seem interesting: you can test software on multiple configurations twice per month. Too bad I’m in NY.