On Web Gardens, ASP.NET, and IIS 6.0

So, having worked with a Web site deployed using Web gardens over the past two years, I’ve learned some painful quirks that come along with them.

What is a Web garden?

Think of a web garden as a web farm, but all in the context of a single machine. You have multiple worker processes running your application, preferably each running on a different core in your multicore computer.

Available by right-clicking an application pool and choosing Properties in the IIS Management snap-in.

Why is this a good thing? Is it for performance? Well, not performance alone and not necessarily at all. Using a Web garden will only be good for performance if your application does some unusual blocking and locking while processing a request such that all the little threads within an individual worker process get tied up waiting for another thread to release the lock. We had this for a while because our Web framework had a nasty threading bug when rendering templates, and running in a Web garden helped because while separate threads within a single process could clobber one another, separate threads in separate processes obviously couldn’t.

The bigger gain from Web gardens is not performance but robustness. If one of the worker processes hosting your application goes ape, say it gets stuck in a loop and runs the CPU at full tilt, then you’ve just stopped serving all requests from that process until health monitoring kills it. If that’s the only process, then you’ve stopped serving requests for your Web site entirely. But if you have a Web garden with, say, 3 worker processes and one of them goes AWOL, then at least you’re serving requests from the remaining 2 until the bananas worker process is shut down and spun up again.

What are the caveats?

There are several caveats to consider, but they’re all things you’d have to address anyway from a scalability perspective. In fact, enabling Web gardening can be a good way to see if your application will be able to function reasonably well in a Web farm scenario on multiple servers. If you can deal with a Web garden now, you’ll have a far easier time scaling out to multiple servers in the future than if you’d stuck with an application that assumes everything lives within the same worker process.

No InProc session storage

One of the caveats of using a Web garden is that if you’re using session state in your application, then you need to use an out-of-process session store, such as the ASP.NET State Service or sessions stored in SQL Server. InProc session management won’t work because each worker process will be maintaining its own session state. So if a customer is browsing your Web site and you have 3 worker processes, then your customer has only a 1 in 3 chance of landing back on the worker process that holds his session state as IIS round-robins his subsequent requests among the 3 available worker processes. If you use an out-of-process session state, then you can be sure that all 3 worker processes are consulting the same single resource as the place to store and retrieve session data.
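The switch itself is a small web.config change. Here’s a minimal sketch with placeholder connection values (my full configuration appears later in this post):

    <!-- Minimal sketch: use the out-of-process ASP.NET State Service
         instead of the default InProc store. Values are placeholders. -->
    <sessionState
      mode="StateServer"
      stateConnectionString="tcpip=127.0.0.1:42424"
      timeout="20" />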

With InProc storage and Web gardens, Bob the Blue Dot only has a 1 in 3 chance of finding his original session on any given request.

No built-in caching mechanism

Similarly, if you’re using System.Web.Caching, you need to remember that each individual worker process is going to maintain its own cache. So if you cache a big list of products on your catalog listing page and you have three worker processes, then the database is going to get hit at least three times and that page is going to get cached in three separate places in RAM, one place for each process. This can make an operation such as clearing the cache difficult.

Suppose you have a protected HTTP handler, say, ClearCaches.axd, which if pinged by some administrative IP address tells the System.Web.Caching cache to clear itself. You’ll only have cleared out the cache for the worker process that happened to serve that particular request. To clear them all, you’d have to recycle all of the worker processes through the IIS Management snap-in or just keep pinging your ClearCaches.axd until you were confident that IIS had round-robined you across to all of the worker processes. Neither is really ideal.
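To make the per-process behavior concrete, here’s a rough sketch of what such a handler might look like; the class name, the administrative IP address, and the key-collection approach are all hypothetical, and you’d still need to map ClearCaches.axd to the handler in web.config:

    // Hypothetical sketch of a ClearCaches.axd-style handler. It can only clear
    // the cache of the worker process that happens to serve this request.
    using System.Collections;
    using System.Collections.Generic;
    using System.Web;

    public class ClearCachesHandler : IHttpHandler
    {
        public bool IsReusable { get { return true; } }

        public void ProcessRequest(HttpContext context)
        {
            // Only allow the (hypothetical) administrative address to trigger a clear.
            if (context.Request.UserHostAddress != "10.0.0.5")
            {
                context.Response.StatusCode = 403;
                return;
            }

            // System.Web.Caching has no single "clear everything" call,
            // so collect the keys first and then remove each entry.
            List<string> keys = new List<string>();
            foreach (DictionaryEntry entry in HttpRuntime.Cache)
            {
                keys.Add((string)entry.Key);
            }
            foreach (string key in keys)
            {
                HttpRuntime.Cache.Remove(key);
            }

            context.Response.Write("Cache cleared for this worker process only.");
        }
    }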

A better solution would be to move to an out-of-process cache like memcached. Then you only have one service that you need to clear the cache for, and that cache clearing action will be observed by all three worker processes.
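As a rough sketch of what that might look like, assuming the Enyim.Caching memcached client (an assumption on my part; any memcached client library would do, and the class and key names here are made up):

    // Rough sketch using the Enyim.Caching memcached client. Because every
    // worker process talks to the same memcached service, one FlushAll empties
    // the cache for all of them at once.
    using Enyim.Caching;
    using Enyim.Caching.Memcached;

    public static class ProductCache
    {
        // Reads its server list from the enyim.com/memcached config section.
        private static readonly MemcachedClient Client = new MemcachedClient();

        public static void StoreProducts(object products)
        {
            Client.Store(StoreMode.Set, "catalog-products", products);
        }

        public static object GetProducts()
        {
            return Client.Get("catalog-products");
        }

        public static void ClearAll()
        {
            Client.FlushAll();
        }
    }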

These are the two things that you’d naturally expect to have to deal with when running Web gardens. Now let’s talk about two that you normally wouldn’t expect. (That’s a nice way of saying that I learned them the hard way.)

Poor App_Offline.htm integration

So ASP.NET 2.0 has this nifty little feature that revolves around a file called App_Offline.htm. You dump a file with that name in the root of your ASP.NET Web application. Upon the next request to any page on the site served through the ASP.NET ISAPI handler, the worker process will spin down, and the contents of the App_Offline.htm file will be served instead. (Stupidly, ASP.NET will send a 404 header for this, which is plain wrong, but this is hard-coded.) Spinning down unlocks the DLLs in your site’s bin directory. So if you filled that file with a message like “We’re down for maintenance; check back soon,” you’d be able to xcopy the new binaries over to the bin directory, delete the App_Offline.htm file when you’re done, and you’d have provided a simple and graceful site maintenance message to your users in the process.

But this doesn’t work well for Web gardens. The problem is that each worker process spins down and releases its lock on your application binaries only after it serves its next request following the creation of the App_Offline.htm file. So if I have a Web garden with 3 worker processes A, B, and C, and I plunk my App_Offline.htm file down and visit the home page of my site, then something like this might happen: Worker process B decides to process my request, notes that the App_Offline.htm file exists, serves me the contents of the file, and then spins down and releases its lock on the application binaries. But processes A and C are still churning along and locking those files. I’d either have to refresh the home page a whole bunch of times to be sure I had been round-robined across all of the worker processes and gotten them all to spin down (less than ideal), or I’d have to kill all the worker processes by stopping the application pool in the IIS Management snap-in. In the end, though, it’s easiest to just pretend this feature doesn’t exist for you and find some alternative update mechanism, such as deactivating your main site and activating a completely separate Web site in IIS that simply displays the site maintenance message.

Bug in .NET Framework installer can cause a strange scenario

I lost some hair over this one because it’s unexpected and only appears under a very specific scenario. Here we go:

Take a server running .NET 3.0 and an ASP.NET Web site running in an application pool that has Web gardens enabled (number of processes: 3). The web.config configuration is something like the following:

    <sessionState
      cookieless="UseCookies"
      cookieName=".authz"
      mode="StateServer"
      regenerateExpiredSessionId="true"
      stateConnectionString="tcpip=127.0.0.1:42424"
      timeout="60"
      useHostingIdentity="true" />

Now upgrade the machine to .NET 3.5 SP1. Reboot the server. Stand back in shock and horror to find that sessions are no longer maintained across the worker processes, as if all of them have reverted to InProc session storage. You see that the configuration obviously still says StateServer, but it Simply Does Not Work. You quickly reduce the pool to 1 worker process as a stopgap workaround.

The problem appears to occur when all of the following conditions are true:

  • You are running Windows Server 2003 (IIS 6.0) and an ASP.NET 2.0 web site.
  • The Web site is configured to use Web Gardens, where the maximum number of worker processes is greater than 1. Because of this, you have configured your application to use an out-of-process session storage; in this scenario, the ASP.NET State Service running on the local machine.
  • The application pool identity is set not to NETWORK SERVICE but to a custom, low-privileged user account that you created per a deployment best practice.
  • You run an installer that updates the .NET framework; in my case, this was an update from .NET 3.0 to .NET 3.5 SP1.

When the upgrade finishes and you reboot the server, you find that your session variables are frequently lost upon refreshing a page, since there is only a 1 in 3 chance of getting the original worker process that served your original request. But this shouldn’t matter, since you’re using the ASP.NET State Service. What broke?

When using the ASP.NET state service, ASP.NET uses a value called the machineKey to encrypt and/or hash all session data to be stored (I don’t know if it’s encrypting or hashing or both, but it’s not an important distinction for this discussion). This is so that when any worker process asks for data from the service using a session identifier, it can be sure that the data was not tampered with while it was being stored in the external data source.

If you are on a web farm, then you probably have a static machineKey defined in your web.config file, and this issue does not occur. But for a single-server web garden scenario, you probably rely on the default machineKey setting, which is set to AutoGenerate,IsolateApps for ASP.NET 2.0 applications. This means that ASP.NET automatically generates a machine key that is unique to your application pool. It regenerates this key according to some algorithm, but that is not important for this discussion.

The generated value is normally stored in the registry under HKLM\SOFTWARE\Microsoft\ASP.NET\2.0.50727.0\AutoGenKeys\{SID of the Application Pool Identity}. But the .NET Framework installer incorrectly (I do believe this is a bug) destroys this registry key and, to add insult to injury, resets the permissions on this key such that your custom application pool identity cannot write to the registry entry when it goes to create its new machine key.

Without being able to store this key, each process will think session data inserted by other processes has been tampered with or is otherwise invalid. Yikes!

The result is that each worker process that spins up in the web garden is using its own in-memory copy of a machine key that it generated just in time, effectively creating a web farm scenario by accident. For example, worker process A spins up, sees that no AutoGenKey entry exists (indeed, it cannot even read it), generates its own and begins using that to hash data sent to the ASP.NET State Service. It tries to save this new machine key to the registry entry, but fails silently. Worker process B spins up, sees that no AutoGenKey entry exists, generates its own and begins using that to hash data…you see where this is going.

Bad permissions on this one registry key can cause some real headache-inducing behavior as everything seems to behave as if it were InProc.

The result is that you now have session data hashed with three different machine keys. Even though the data for a given session identifier exists in the state service, two out of the three worker processes will reject it as invalid or tampered with, because each is validating it with its own key.

You could get around this by explicitly setting a custom machineKey in your web.config file.
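A sketch of what that looks like; the values below are placeholders, so generate your own long, random hexadecimal keys rather than copying anything here:

    <!-- Sketch only: replace the placeholders with your own randomly generated
         hexadecimal keys so every worker process (and, later, every server in a
         farm) validates session data with the same key. -->
    <machineKey
      validationKey="[128 hexadecimal characters]"
      decryptionKey="[64 hexadecimal characters]"
      validation="SHA1"
      decryption="AES" />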

Or you could re-run aspnet_regiis.exe -ga MachineName\ApplicationPoolUserName at a Command Prompt to fix up the broken permissions.

Conclusions and Delusions

There you have it! Web gardens are a really neat feature for increasing the robustness of your application as well as testing the general suitability of your application for migration to a Web farm scenario. But documentation out there is pretty scarce, and I get the feeling that people don’t use Web gardens all that often. Hopefully, these little issues that I’ve run across will save somebody some heartache in the future. Good luck!