May 8 09

Connecting to a MySQL database on the .NET Compact Framework

by Nicholas Piasecki

Our mobile picking application at work runs on the .NET Compact Framework and typically connects to a SQL Server database. This is all nifty and supported with a vanilla install of NETCF and is as such uninteresting. But recently I’ve been tasked with an integration scenario where an alternative persistence implementation for the mobile picking application must communicate with a MySQL 5.1 database.

MySQL offers an ADO.NET driver called Connector/Net 6.0 at its Web site. Sweet. So you reference Mysql.Data.CF in your application and you’re good to go. But as soon as you call Open() on a MysqlConnection instance, you’ll get slammed with a System.IndexOutOfRangeException deep down in the innards of the of the Connector/Net library. If you take a look at the stack trace, you’ll see that it’s occurring in MemoryStream of all places. Apparently, upon calling Open(), the driver immediately issues a command to the database to get some metadata about its capabilities, and we blow up at trying to build that parameterized query.

Why is this broken?

The MySQL Connector/Net library is using the same code for the full framework as well as the compact framework, with little ifdefs for CF-specific or CF-omitted behavior sprinkled throughout the codebase. This isn’t a bad thing; in fact, it’s quite normal, and I do it for one of my common libraries that’s shared between our internal processing application and the mobile picker. The problem stems from an actual behavior difference between the MemoryStream implementations of the full and the compact frameworks. Here’s some sample code that demonstrates the problem:

MemoryStream ms = new MemoryStream();
ms.Position = 4;
ms.WriteByte((byte)0);
Console.WriteLine(ms.Capacity);
Console.ReadLine();

What we are doing is instancing a new MemoryStream via its default constructor; its internal byte[] buffer will be empty (capacity zero). We set its internal Position to somewhere and then try to write a byte.

In the full framework, the MemoryStream auto-expands the capacity by checking to see if the Position is greater than or equal to the Length. So after the above code executes, you end up with a MemoryStream instance whose internal byte[] buffer contains 5 bytes (all zeroes).

In the compact framework, the MemoryStream also auto-expands, but it only does it if the Position equals the Length–not if the Position is greater than the Length. So the above code results in a System.IndexOutOfRangeException. It’s not a documented exception for the WriteByte() method, so I would imagine it is an oversight in the compact framework. (Although the MySQL use case is a little strange; I would never use the MemoryStream class this way.)

The work around is to download the source distribution for Connector/Net 6.0 and edit the MySqlPacket.cs file. Around line 34, change

private MemoryStream buffer = new MemoryStream();

to

private MemoryStream buffer = new MemoryStream(5);

All we’re doing is starting the internal byte[] buffer with a capacity of 5, since the MySqlPacket.cs constructor sets the Position to 4. It shouldn’t affect actual behavior of the MySqlPacket since we’re only adding capacity and not length to the MemoryStream instance that it uses.

Other unrelated problems

If you’re upgrading the Connector/Net 6.0 source to NETCF v35 (just right-click the project in Visual Studio 2008 and choose “Upgrade Project,” you’ll need to delete the CompilerGeneratedAttribute.cs file. NETCF v35 already has one, but NETCF v20 didn’t; I guess that’s why the MySQL guys had created their own.

Finally, you may be getting a “Missing Manifest Exception” whenever Connector/Net 6.0 attempts to access its Resources object for error messages. That’s because they’re trying to share the same Resources.designer.cs file for the CF assembly that was created for the desktop assembly. The problem is that in line 42 of the generated Resources.designer.cs file, it’s trying to load the embedded resx file from the assembly–but it’s searching for “MySql.Data.MySqlClient.Properties.Resources”, which is the namespace for the desktop DLL. So it doesn’t find it in the NETCF assembly on your device and tosses the missing manifest error. The solution is to either fix up the name by editing it directly (e.g., “MySql.Data.CF.Properties.Resources”) or by deleting the file and letting Visual Studio regenerate it properly on the next recompile.

Conclusions and delusions

I’ve filed two bugs, one with MySQL and one on Connect, who knows what will happen with them:

The CF version of Connector/Net seems pretty alpha, but it’s what we’ve got to work with. Good luck!

Update 7/6/2009: Download my custom version of the file.

May 8 09

On ADO.NET Connection Pooling and SQL Server Error 10054

by Nicholas Piasecki

So our internal order processing application at work securely connects to our SQL Server database via an ADO.NET connection. And if you leave the application up and running for a long time period–say, an entire weekend–you sometimes come to work on Monday morning to see an error message like the following:

Something Didn't Work

Something Didn't Work

A transport-level error has occurred while sending the request to the server. (provider: TCP Provider, error: 0 – An existing connection was forcibly closed by the remote host.) (Microsoft SQL Server, Error: 10054)

(Think that’s a sexy error dialog? It’s from the SQL Server 2005 SDK. Just Google it; MS lets you use it in your own applications. Pretty cool for displaying an exception and all of the inner exceptions that it wraps.)

If you dismiss the dialog and retry the command that failed, it works fine. What’s up?

The ADO.NET connection pool is great…

Well, we’re using the connection pooling feature of ADO.NET. Usually, connection pooling is a behind-the-scenes feature that makes our database connection code run faster. Instead of setting up and tearing down the connection to the database server with each call to Open() and Close() on a SqlConnection instance, ADO.NET is instead acquiring and releasing connections to and from a pool of connections. If, while executing some command against a connection and a fault occurs, ADO.NET will automatically remove the faulting connection from the connection pool and spin up a new connection. Sounds nifty, right?

…until the network farts

Well, sometimes the connections fault in a way that ADO.NET isn’t aware of until the next time you try to use that connection. In my case, the ADO.NET connection pool is happily diddling along when, for example, our office’s craptastic Comcast SMC unholy-marriage-of-router-and-modem device hiccups and reboots itself, severing any sockets that were once open.

My code then tries to do something innocuous like call Open() on a SqlConnection, execute a command, and then Close() the connection, as it has done a million times before. ADO.NET pulls a connection from the connection pool, notes that the socket is broken, clears out the connection pool, creates a new connection, and executes my command successfully, and I and my users are none the wiser of the unimportant detail that this connection pool repair job just happened.

Ha ha, no, that’s not what happens; that would make sense. What actually happens is that ADO.NET does all of those things but not before throwing a SqlException as described above instead of executing my command. Why I would care about this error, I’m not sure–my code isn’t even aware that the connection pool exists, its calling of Open() and Close() is quite explicit in terms of the behavior that it is expecting, and ADO.NET throwing error 10054 is, in my opinion, leaking implementation details that should be hidden beneath the SqlConnection abstraction. But this is what we’ve got.

Extension methods are great

When C# 3.0 introduced extension methods, I really thought they were a silly feature that would lead to developer confusion. I was instantly reminded of my PHP days where everyone in the world had devised their own unique set of HTML escape functions, and moving to any new Web site involved learning a whole new suite of strange extensions or redefinitions of built-in functions. But, in .NET, I’ve found myself using extension functions more and more often, especially since their usage is neatly held in check through the use of namespaces. It is through some extension methods that I work around the above error in my code.

Essentially, instead of calling command.ExecuteReader() or command.ExecuteScalar() on my SqlCommand instances, I call command.SafeExecuteReader() or command.SafeExecuteScalar() instead, with the SafeExecute* functions simply being extension methods that I defined as similar to the following, which I lifted with some modifications off some developer’s forum somewhere some time ago:

namespace Skiviez.Porcupine.Model
{
    using System.Data;
    using System.Data.SqlClient;
 
    /// <summary>
    /// Contains extensions to the SqlCommand class.
    /// </summary>
    internal static class SqlCommandExtensions
    {
        /// <summary>
        /// The maximum failed attempts that can occur before a standard
        /// SqlException is thrown.
        /// </summary>
        private const int MaxFailedAttempts = 3;
 
        /// <summary>
        /// Obtains a reader safely by automatically retrying the command a
        /// number of times in case an invalid connection in the connection pool
        /// is encountered.
        /// </summary>
        /// <param name="command">The command.</param>
        /// <param name="commandBehavior">The command behavior.</param>
        /// <returns>the data reader</returns>
        /// <exception cref="T:System.Data.SqlClient.SqlException">if a non-
        /// recoverable error is encountered</exception>
        internal static SqlDataReader SafeExecuteReader(
            this SqlCommand command, 
            CommandBehavior commandBehavior)
        {
            SqlDataReader reader;
            bool retry;
            int failedAttemptsCount;
 
            reader = null;
            retry = false;
            failedAttemptsCount = 0;
 
            do
            {
                try
                {
                    reader = command.ExecuteReader(commandBehavior);
                }
                catch (SqlException e)
                {
                    if (e.ErrorCode == 10054)
                    {
                        // Transport error occurred, meaning the connection pool
                        // had a bad entry. We'll probably be okay if we retry
                        // one more time.
                        retry = true;
                        failedAttemptsCount++;
                    }
                    else
                    {
                        throw;
                    }
                }
            } 
            while (retry && failedAttemptsCount <= MaxFailedAttempts);
 
            return reader;
        }
}

Essentially, if executing the command results in a SqlException, I check to see if it is this stupid 10054 error. If it is, I’ll retry execution of the command up to 3 times, even though a single retry almost always fixes the problem. If it still doesn’t work after that–or if the error wasn’t 10054 to begin with–I assume that something really is hosed and throw the exception anyway.

Conclusions and delusions

Thanks to an extension method, it’s pretty easy to safeguard my users against an error that is not really an error without resorting to lots of boilerplate code around each ExecuteReader() call. It’s one more step of making sure that my software simply works (most of the time!).

Apr 14 09

File Sizes for Exporting SWF from Adobe InDesign CS4 are Huge Because Some Settings Simply Do Not Work

by Nicholas Piasecki

Update 5/5/2009: This is resolved in update 6.0.2. Original post follows.

My boss at work used Adobe InDesign CS4 to create a new, 20-page printed catalog for our customers. He discovered that InDesign has this nifty “Export to SWF” feature whereby, insto-presto, the same document can be converted into a snazzy Flash version, complete with super fancy page-turning effects. Neat!

Unfortunately, the file size of the generated Flash file is enormous–at 7.4 megabytes, it definitely wasn’t something that we could realistically put on our Web site.

Seems easy enough to fix! Let’s look at the options dialog that InDesign pops up during the export process:

Words, words, words...

Words, words, words...

Awesome! “JPEG Quality” sounds like just what we need, so let’s set it to “Minimum” and export the Flash–oh, wait, it’s the same size: 7.4 MB. Go back and set it to “Maximum”–7.4 MB. Set it to “Low”–7.4 MB.

Um.

A few wasted minutes of Googling gives us this gem on Adobe’s Forums from the purported Product Manager for InDesign:

First and foremost, unfortunately there is a bug in the JPEG Quality setting of the InDesign CS4 SWF Export dialog. The bug is that no matter what value you choose from the pop-up menu (Minimum, Low, Medium, High, Maximum), they will all end up as Medium quality. The bad news is that this bug was not discovered (internally) or reported (by pre-release and CS4 customers) before we shipped CS4 (6.0.0), or the 6.0.1 dot release. The good news is that it will be fixed in the next dot release (6.0.2). I can’t provide a specific date for the next dot release yet.

In other words, the feature Simply Does Not Work. This is the type of programming mistake that I make in our backend applications at a small business, sure: but it is not the type of mistake that you would expect from a multi-million dollar software development powerhouse who has been doing this for decades and charges over $600 for the product. Ridiculous.

Try exporting to a source file

Great. So the fix is not available, and I have to get this stupid Flash file up before I can go home. Maybe I can use the other InDesign option to export to XFL. Then we can open up this file in Adobe Flash proper and sort out the images there.

That works great, but when exporting to XFL, InDesign doesn’t export any of the hyperlinks or page animations, which is kind of half the point of the exporting feature. The Help manual even documents this as if this were useful behavior. Obviously InDesign had to generate some source file to generate the enormous SWF; why can’t I have that file, and not this half-baked XFL thing?

Now I’m getting irritated

Okay. Now what? Perhaps I can decompile the enormous SWF file so that I can access the image resources and lower the quality of the JPEGs. If you Google this long enough, you will find that generally all third-party Flash products (especially those relating to decompilation) fall into three categories:

  • outdated open source software that probably never worked;
  • poorly written shareware; and
  • Trojan horses.

I finally downloaded a demo of Eltima’s Flash Decompiler Trillix. (What the hell does “Trillix” mean?) This application looks pretty, and it does let me see the image resources on the SWF and adjust their quality. But saving is disabled in the demo.

After roughly calculating out my hourly pay, I’d by that point determined that I had wasted much more of the company’s money than the $67 licensing fee, so I broke down and paid three tanks of gas for a magic license key number. I finally degraded the quality of all of the images and got the file size from 7.4 MB to 1.0 MB, which is good enough for us. Hopefully by the time the next catalog rolls around, Adobe will have released the 6.0.2 update.

The pièce de résistance

My boss asked me if I could use the new tool to just convert the SWF to a plain old FLA source file, the idea being that in the future we’d use this file as a template and bypass InDesign entirely.

Yep, you guessed it–the Eltiva software crashes with a bad pointer reference whenever it tries to decompile the scripts in the InDesign-generated Flash file. Why? Who knows–I don’t.

Conclusions and Delusions

This rant has some purpose: hopefully, it will help prevent some future poor sap from spending nearly half a day chasing his tale. The lessons learned: avoid the Export to SWF feature in Adobe InDesign CS4, and, if you must use it, use a decompiler to manually fix up the image resources.

This is why we drink.

Apr 5 09

Installing the Hand Held D7600 .NET SDK Under Windows Vista

by Nicholas Piasecki
The tried-and-true D7600.

The tried-and-true D7600.

So at work, I’ve written a customs Windows Forms application for our hand held mobile picker (a Hand Held Dolphin 7600 running Windows CE 5.0). I originally developed it under Visual Studio 2005 on XP and migrated my project to Visual Studio 2008 without a hitch.

When I tried to install the Hand Held D7600 SDK on my home machine running Windows Vista and Visual Studio 2008, everything seemed to install properly, but the D7600 was not registered as an available target platform in Visual Studio.

There are two problems at work here:

  • Even if you’re developing just a .NET application, you need to install both the C++ SDK and the .NET SDK from Hand Held’s D7600 Web site. (Hand Held was recently acquired by Honeywell, and as is typical for large company mergers, the Web site and branding are both terribly confused at the moment.) It is the C++ SDK that actually performs the task of registering the target device with Visual Studio; the .NET installer just adds a few extra assemblies.
  • The C++ SDK installer performs this registration via a Windows Installer custom action. The custom action appears to function properly but silently fails under Windows Vista unless the entire installer was launched with administrative privileges. The solution is to launch an elevated Command Prompt (right-click Command Prompt in the Start menu and choose “Run as administrator”) and execute the installer that way; e.g., msiexec /i "Windows CE C SDK for Visual Studio 2005 - Rev. 1.09.msi". Make sure you don’t have any instances of Visual Studio open while the installer is running. Then the D7600 should be available as a target platform the next time you start Visual Studio.

Hope this helps someone out there!

Mar 24 09

On TextWriterTraceListener, Inheritance, InitializeData, ASP.NET, and Paths

by Nicholas Piasecki

So you know that .NET has this nifty tracing framework built in; you just plug a few lines into your system.diagnostics section of your app.config or web.config file and now your Trace statements are being output to the listener that you specify. Neat.

Let’s say that you’ve gone the extra mile and you’ve implemented your own custom trace listener. Even better, let’s say that you’ve created a trace listener that extends from TextWriterTraceListener. After all, you’re probably logging to a text file, but perhaps you wanted to change the format around a little bit.

Recall that the way for us to specify the location of our trace logging file via configuration is to use the initializeData attribute as in the following example:

<system.diagnostics>
  <sharedListeners>
    <add
      name="Listener:ApplicationText"
      type="Skiviez.Hedgehog.Model.AlignedTextWriterTraceListener, Skiviez.Hedgehog.Model"
      initializeData="Media\Logs\Hedgehog.txt"
      traceOutputOptions="ThreadId, DateTime, ProcessId" />
  </sharedListeners>
</system.diagnostics>

Right. Looks pretty sane: I’ve got my custom trace listener type (we’ll assume that AlignedTextWriterTraceListener simply extends the built-in TextWriterTraceListener), I’m telling it to log to Media\Logs\Hedgehog.txt (relative to my working directory, I would presume), and I’m passing in some output options and giving it a name.

Nothing we have done would here would leave us to believe that we have broken the way TextWriterTraceListener works. We have, but it won’t be apparent until we try to run this in a ASP.NET Web site.

What have you got against Web sites?

So when we run it an ASP.NET Web site, we will note that while we receive no error, we also see no log file sitting in the Media\Logs directory as we specified. I can understand how the tracing code probably swallows exceptions–so as to not take down your entire Web site or application because some hoo hah misconfigured a tracing statement in the configuration file–so let’s assume that we’ve screwed something up. Check the permissions on the Media\Logs directory? Check. Check the working directory? Hmm.

The working directory of ASP.NET applications is usually strange, somewhere in the %SYSTEMDIR% area–that’s because your code is usually running from some temporary location where your ASP.NET Web site was compiled just-in-time to serve the first request. So our trace listener is trying to be relative to a highly privileged directory–not relative to our Web application’s root directory–and obviously the log file can’t be created there, failing silently.

Okay, that makes sense. But that seems astonishing somehow.

You’re not going crazy

The reason why this seems astonishing is that if you replace your configuration to use the standard TextWriterTraceListener instead of your custom type that extends from it, then the log file will magically appear in the expected location, relative to your Web application’s root directory, and not relative to the system directory.

<system.diagnostics>
  <sharedListeners>
    <add
      name="Listener:ApplicationText"
      type="System.Diagnostics.TextWriterTraceListener"
      initializeData="Media\Logs\Hedgehog.txt"
      traceOutputOptions="ThreadId, DateTime, ProcessId" />
  </sharedListeners>
</system.diagnostics>

Okay. So you double-check your custom type to make sure you’re really not doing anything strange with configuration. Which of course you aren’t–you’re just forwarding constructors to the base TextWriterTraceListener‘s constructors. You’re not supposed to care how they actually work.

So why is the path that the initializeData attribute seems to be relative to changing wildly between the two types?

Enter the Reflector

So after spelunking through Reflector for a little while, we stumble across this little gem in the TraceUtil‘s class GetRuntimeObject() method:

internal static object GetRuntimeObject(
	string className, 
	Type baseType, 
	string initializeData)
{
	// ... snip ...
 
	if (string.IsNullOrEmpty(initializeData))
	{
		if (IsOwnedTextWriterTL(c))
		{
			throw new ConfigurationErrorsException(
				SR.GetString(
					"TextWriterTL_DefaultConstructor_NotSupported"));
		}
		ConstructorInfo constructor = c.GetConstructor(new Type[0]);
		if (constructor == null)
		{
			throw new ConfigurationErrorsException(
				SR.GetString(
					"Could_not_get_constructor", 
					new object[] { className }));
		}
		obj2 = constructor.Invoke(new object[0]);
	}
	else
	{
		ConstructorInfo info2 = 
			c.GetConstructor(new Type[] { typeof(string) });
		if (info2 != null)
		{
			if ((IsOwnedTextWriterTL(c) && 
				(initializeData[0] != Path.DirectorySeparatorChar)) && 
				((initializeData[0] != Path.AltDirectorySeparatorChar) && 
				!Path.IsPathRooted(initializeData)))
			{
				string configFilePath = DiagnosticsConfiguration.ConfigFilePath;
				if (!string.IsNullOrEmpty(configFilePath))
				{
					string directoryName = Path.GetDirectoryName(configFilePath);
					if (directoryName != null)
					{
						initializeData = Path.Combine(
							directoryName, 
							initializeData);
					}
				}
			}
			obj2 = info2.Invoke(new object[] { initializeData });
		}
		// ... snip ...
	}
		// ... snip ...
}
 
internal static bool IsOwnedTextWriterTL(Type type)
{
	if ((typeof(XmlWriterTraceListener) != type) && 
		(typeof(DelimitedListTraceListener) != type))
	{
		return (typeof(TextWriterTraceListener) == type);
	}
	return true;
}

Now, I know that I for one have written some strange and hackish code in my time, but whoever wrote this function was obviously wearing his silly pants that day. Here’s what the function does:

If the following are ALL true:

  • The trace output listener type is one of the three concrete types explicitly provided by Microsoft (and not an inheritor of that type)
  • The initializeData attribute is provided
  • The initializeData path seems to be relative

Then the function mangles the initializeData attribute by combining it with the path of the configuration file that contained the attribute. It then instances the trace listener by providing this faked-just-in-time full path as the fileName argument.

So if I had configured a TextWriterTraceListener in my configuration file, and my configuration file were in C:\Foo, and I specified Media\Logs\Hedgehog.txt as the initializeData parameter, then the TraceUtils class would give the TextWriterTraceListener instance the value of C:\Foo\Media\Logs\Hedgehog.txt as the fileName argument.

But if I had configured anything but TextWriterTraceListener, XmlWriterTraceListener, or DelimitedListTraceListener as my output listener in my configuration file, and my configuration file were in C:\Foo, and I specified Media\Logs\Hedgehog.txt as the initializeData parameter, then the TraceUtils would kindly tell me to screw myself, pass the TextWriterTraceListener instance the value of Media\Logs\Hedgehog.txt as the fileName argument, which the instance would then match with the current working directory, which is something insane like C:\Windows\System32\Temporary ASP.NET Files if we’re talking about an ASP.NET Web site, and so I would end up with C:\Windows\System32\Temporary ASP.NET Files\Media\Logs\Hedgehog.txt, which is about as useful as sticking my thumb up my butt.

So just by extending a class, we have completely broken the way one of its configuration attributes works. That’s pretty astonishing when it comes to object-oriented design.

An ugly work-around

So I wasn’t about to specify a full path in initializeData because I don’t enjoy munging a configuration value on every single machine that I happen to build and deploy my solution to. As this is written in the .NET Framework’s internal TraceUtils class, I simply cannot avoid this behavior when extending a Microsoft-provided type.

So if I want to be able to specify a relative path in initializeData with a type that extends from TextWriterTraceListener and have it mimic the TextWriterTraceListener‘s initializeData relative-path-resolution behavior, my only recourse is to copy TextWriterTraceListener out of Reflector and make it my own type with my own relative-path munging semantics.

I change the constructors of my new type, which I’ve christened TextWriterTraceListener in my own namespace, as follows:

/// <summary>
/// Initializes a new instance of the TextWriterTraceListener class, 
/// using the file as the recipient of the debugging and tracing output. 
/// </summary>
/// <param name="fileName">The name of the file the 
/// TextWriterTraceListener writes to. </param>
public TextWriterTraceListener(string fileName)
{
	this.fileName = MungeFileName(fileName);
}
 
/// <summary>
/// Initializes a new instance of the TextWriterTraceListener class 
/// with the specified name, using the file as the recipient of the 
/// debugging and tracing output. 
/// </summary>
/// <param name="fileName">the fileName to output to</param>
/// <param name="name">the name of the trace listener</param>
public TextWriterTraceListener(string fileName, string name) : base(name)
{
	this.fileName = MungeFileName(fileName);
}

and the gloriously-named MungeFileName() method is

/// <summary>
/// Munges the file name such that if it is a relative path, we go
/// relative from the configuration file and not from the current working
/// directory. This makes things work as expected on ASP.NET sites and
/// makes other applications similarly work with non-astonishing
/// behavior.
/// </summary>
/// <param name="fileName">the file name to munge</param>
/// <returns>the munged filename</returns>
private static string MungeFileName(string fileName)
{
	string configPath;
	string mungedFileName;
 
	mungedFileName = fileName;
 
	if (fileName[0] != Path.DirectorySeparatorChar &&
		fileName[0] != Path.AltDirectorySeparatorChar &&
		!Path.IsPathRooted(fileName))
	{
		ConfigurationSection configSection;
 
		configSection = (ConfigurationSection)ConfigurationManager
			.GetSection("system.diagnostics");
		if (configSection != null)
		{
			configPath = configSection.ElementInformation.Source;
 
			if (!string.IsNullOrEmpty(configPath))
			{
				string directoryName;
 
				directoryName = Path.GetDirectoryName(configPath);
 
				if (directoryName != null)
				{
					mungedFileName = Path.Combine(
						directoryName, 
						fileName);
				}
			}
		}
	}
 
	return mungedFileName;
}

What I’m doing here is saying, “Is the path relative? Then figure out where my system.diagnostics configuration came from, and make myself relative to THAT directory.” This way, I don’t have to have different code paths for this class depending on whether it’s running in a traditional Windows application or an ASP.NET Web site–the initializeData is always relative to the configuration file and not the current working directory, which is what I want 99% of the time, and it’s probably what most people want, too.

Conclusions and Delusions

I think the primary lesson to take away from this is to avoid doing magic configuration and baby-sitting for a class’s constructors. I have a feeling that TextWriterTraceListener was implemented by somebody at Microsoft and sometime much later in the development process they realized that specifying a relative initializeData was difficult for ASP.NET applications. Being unable to alter TextWriterTraceListener without breaking compatibility for non-ASP.NET applications that may have already been built, they devised an unusual TraceUtils class to munge the parameters read from configuration before they were passed to a new instance’s constructor.

The result? It worked well if you stuck to the built-in classes, but sure was damn surprising if you created a custom type and eventually used it on an ASP.NET project.

Hope this helps someone out there. You’re not insane, after all.

Here’s the implementation for your convenience.

Mar 16 09

Sending Raw EPL2 Directly to a Zebra LP2844 via C#

by Nicholas Piasecki

Sometimes, our vendors send us products with incorrect barcodes or without UPCs. We have a requirement such that every item in our warehouse must have a barcode for inventory control purposes. So when these barcode-less items arrive, we need to print out UPC labels for them. We have a thermal label printer, and we know that it can print out barcodes, so let’s figure out a way to have our application create these barcode labels automatically.

Neat! We can generate this from C# with a text file.

Neat! We can generate this from C# with a text file.

The printer you’ve probably seen before

This little thermal label printer is seemingly ubiquitous in the retail and warehouse worlds:

The good ol' Zebra LP2844.

The good ol' Zebra LP2844.

Everyone seems to have standardized on this little guy–you can rent one from UPS or FedEx to print shipping labels, PayPal can print to it, and so can Stamps.com and Endicia. They retail for nearly $400, but nobody pays that; you can pick up some battered beauties for about $70 on eBay. And the model hasn’t really changed much in over a decade, with the only major revision being the addition of USB support. There are some slight differences in the UPS and FedEx-supplied models because they install a custom firmware, but for our purposes, they’re all the same printer.

The native language of the printer isn’t PostScript or PCL but instead a proprietary language called EPL2 (most commonly just called EPL, without the “2″). It’s short for Eltron Printer Language. Eltron was the company that originally designed this model of printer, and Zebra bought them up a decade ago. Zebra has their own printer language–ZPL, or Zebra Printer Language–that’s similar to ZPL and is used on the TLP2844Z model, but the languages aren’t compatible with one another. Since there was already a lot of legacy code out there working with EPL, and since EPL itself isn’t broken, Zebra continues to offer printers that support EPL. This is the language that UPS WorldShip and FedEx Ship Manager use to print out UPS shipping labels from these devices.

It turns out that EPL is conceptually quite simple, and we can relatively easily add native support for it in our .NET applications. But first, let’s consider why we don’t just take the GDI route.

Why not print using PrintDocument and GDI+?

Zebra provides an advanced printer driver for the LP2844 that allows Windows to see it as any old GDI-based printer. If you open up a Word document, type some stuff into it, and print it to your LP2844, you’ll indeed get your document printed out as you expect. But there are a few disadvantages to this approach:

  • The GDI conversion is much slower than when sending commands with EPL.
  • The printer driver has a few bugs when it comes to determining the label size. If you’ve encountered a scenario where you specify landscape but it insists on printing in portrait mode or vice versa, then you’ve run into this problem.
  • The fidelity of text is generally poor. The LP2844 only supports a resolution of 203dpi (actually, there’s a 300dpi version out there, but it’s pretty uncommon), so small text can come out blocky, jagged, and hard to read. This isn’t true for the printer’s native fonts, which are optimized for this resolution.
  • GDI has no built-in support for rendering barcodes. Sure, you can write a barcode renderer yourself in GDI (I’ve done this before, but it’s not a trivial task), but why not save some time and use your printer’s built-in functionality?

Grabbing the EPL documentation and Zebra Firmware Downloader

One of the hardest parts of using EPL is simply finding the documentation for it. Zebra provides a very well-written manual, but it’s buried deep on their Web site: here’s a link to a mirrored copy of the EPL programming documentation. While you’re at it, download the Zebra Firmware Downloader, which will help you send your test EPL text files directly to the printer.

Learning about the EPL

An example is usually the easiest way to learn. Here’s the EPL commands necessary to generate the UPC label shown at the top of this blog post:

 
N
q609
Q203,26
B26,26,0,UA0,2,2,152,B,"603679025109"
A253,26,0,3,1,1,N,"SKU 6205518 MFG 6354"
A253,56,0,3,1,1,N,"2XIST TROPICAL BEACH"
A253,86,0,3,1,1,N,"STRIPE SQUARE CUT TRUNK"
A253,116,0,3,1,1,N,"BRICK"
A253,146,0,3,1,1,N,"X-LARGE"
P1,1

EPL is one command per line. A command starts out with a command identifier, typically a letter, followed by a comma-separated list of parameters specific to that command. You can look up each of these commands in the EPL2 programming documentation. Here’s an English-language version of the commands in the above example.

0. Sending an initial newline guarantees that any previous borked 
      command is submitted.
1. [N] Clear the image buffer. This is an important step and 
      generally should be the first command in any EPL document; 
      who knows what state the previous job left the printer in.
2. [q] Set the label width to 609 dots (3 inch label x 203 dpi 
      = 609 dots wide).
3. [Q] Set the label height to 203 dots (1 inch label) with a 26 
      dot gap between the labels. (The printer will probably auto-
      sense, but this doesn't hurt.)
4. [B] Draw a UPC-A barcode with value "603679025109" at 
      x = 26 dots (1/8 in), y = 26 dots (1/8 in) with a narrow bar 
      width of 2 dots and make it 152 dots (3/4 in) high. (The 
      origin of the label coordinate system is the top left corner 
      of the label.)
5. [A] Draw the text "SKU 6205518 MFG 6354" at
      x = 253 dots (3/4 in), y = 26 dots (1/8 in) in
      printer font "3", normal horizontal and vertical scaling,
      and no fancy white-on-black effect.
(6 through 9 are similar to line 4.)
10. [P] Print one copy of one label.

In a way, EPL is quite similar to GDI. You’ve got an image buffer in memory, you issue a batch of commands to doodle and write text and barcodes on that buffer, and then you release the buffer to the printer, telling it to print it.

When designing a label, I find that it’s essential to have a ruler and a calculator handy. You’ll be converting between dots and inches a lot as you design your label. If you find that you send a command (such as a barcode command) and nothing prints out, that usually means that the command was invalid or that some parameter was out of range (for example, the barcode height wasn’t tall enough).

It’d be nice if we could print out these EPL text files that we’ve created in Notepad without writing any code. The Zebra Firmware Downloader (which certainly has a scary name) can do this for us. Once you’ve installed the application, start it up and hunt for the “Auto Detect” button on the toolbar. Once it finds your printer, right-click it in the list, click Select Firmware File…, and browse to the text file containing your EPL commands. Then right-click the printer and choose Download to Selected. The printer will print out your test label. You can now easily be editing your EPL document in Notepad and keep choosing Download to Selected as you iterate through and refine your design.

Printing the label directly from C#

Now that we’ve got the EPL code written, we need to figure out a way for our application to send this file directly to the printer. The easiest way is to use the RawPrinterHelper sample class provided by Microsoft, but we’ll need to fix a bug in it first.

Here’s what a class that prints to the printer might look like:

using System;
using System.Collections.Generic;
using System.Text;
using Skiviez.UndiesClient.Domain;
using Skiviez.Commons.WinForms;
using Skiviez.Commons.Core;
using System.Globalization;
 
namespace BlahBlahBlah
{
    public class UpcLabel
    {
 
        private string upc;
 
        public UpcLabel(string upc)
        {
            if (upc== null)
            {
                throw new ArgumentNullException("upc");
            }
 
            this.upc = upc;
        }
 
        public void Print(string printerName)
        {
            StringBuilder sb;
 
            if (printerName == null)
            {
                throw new ArgumentNullException("printerName");
            }
 
            sb = new StringBuilder();
            sb.AppendLine();
            sb.AppendLine("N");
            sb.AppendLine("q609");
            sb.AppendLine("Q203,26");
            sb.AppendLine(string.Format(
                CultureInfo.InvariantCulture,
                "B26,26,0,UA0,2,2,152,B,\"{0}\"",
                this.upc));
            sb.AppendLine("P1,1");
 
            RawPrinterHelper.SendStringToPrinter(printerName, sb.ToString());
        }
    }
}

This is pretty straight forward. We’re just building our EPL document in a StringBuilder, and we could easily customize the document on the fly with some string.Format calls. Here’s how we might invoke this code:

UpcLabel label = new UpcLabel("603679025109");
label.Print("Zebra  LP2844");

The string that we pass in in the Print function is just the name of the printer as it appears in the Printers list in the Windows control panel. You could pop up a PrintDialog here and ask the user for a printer, but I’ve just hard-coded the name.

The last piece of the puzzle is that RawPrinterHelper.SendStringToPrinter call. Well, it’s the exact same code copy and pasted from the Microsoft support article above. But there’s one bug that we need to fix. In the SendStringToPrinter method, there’s a line that looks like this:

    dwCount = szString.Length;

Change it to look like this:

    dwCount = (szString.Length + 1) * Marshal.SystemMaxDBCSCharSize;

(The main problem is that it wasn’t leaving room for the null character at the end of the unmanaged string, which can cause some mysterious problems with your last command not getting interpreted directly, depending on the length of the raw document. The SystemMaxDBCSCharSize nonsense is for versions of Windows where an ANSI codepage with double byte characters is loaded.)

Conclusions and Delusions

And there you have it! EPL is a fun and simple printer command language, and with a little bit of interop, we can send these commands directly from our C# application. Happy barcoding!

Feb 26 09

Distinguishing Barcode Scanners from the Keyboard in WinForms

by Nicholas Piasecki

Like any business, small or otherwise, we use barcodes for lots of things–for picking orders, for looking up totes, for processing returns, and for checking in inventory, to name but a few–and so our internal Windows Forms-based application asks for barcodes a lot:

To scan a barcode, just focus that textbox and use the scan gun. Easy!

To scan a barcode, just focus that textbox and use the scan gun. Easy!

We have three different types of barcode scan guns at work, but they all work in more or less the same way: they show up as a USB HID device in Windows, and when you scan a barcode, Windows and WinForms applications simply see it as if someone were typing a bunch of numbers on the keyboard really quickly. This makes integration easy: wherever you have a Textbox, you can click on it and scan a barcode and it just works.

By scanning barcodes everywhere, we reduce miscounts and improve our accuracy. But there is one part of this select-the-textbox-and-scan model that is incredibly irritating.

I was first alerted to the problem by my highly irritated coworker storming into my office. He was annoyed that he had been checking in inventory for a particularly large shipment and hadn’t noticed until the end of his work that he or the scan gun’s cord had jiggled the keyboard and caused the textbox to lose focus. The scan gun beeped happily when it read a barcode, but the warehouse computers lack sound and so there was no obvious way to alert him that, um, some of those last 100 barcode scans had been sent to an hwnd that doesn’t give a flying fart about them. The result? He had to start over.

The typical programmer response was “Well, make sure that the textbox actually has focus.” And this policy worked for a short while until I was the one checking in a particularly large shipment one day and happened to do the exact same thing. I was irritated, to say the least. Time to solve the problem.

Developing a strategy

So how would I like this to work? Well, I envision creating a class called BarcodeScannerListener. This class would expose a BarcodeScanned event that would, you know, be fired when a barcode was scanned no matter which control or form currently had focus; the event arguments would be something like BarcodeScannedEventArgs that contained the complete scanned barcode and, if we wanted to get real fancy, perhaps some additional information about the device. Then, all I have to do is pass this instance of BarcodeScannerListener around my application to interested parties (through dependency injection or some nonsense, or by making it a singleton), and I’d have cut out a lot of textbox-related scanning code in one fell swoop.

So we have three hurdles to overcome:

  1. We need to be able to listen to keyboard strokes no matter which form or control currently has focus.
  2. Since we’re listening to this cacophony of keyboard strokes from all controls in the application, we need to be able to distinguish between strokes that come from the keyboard and strokes that come from the barcode scan gun.
  3. Since it’s conceivable that a control that can’t receive input or shouldn’t receive input might be focused when a barcode is scanned, we need to suppress the Windows messages related to the keyboard strokes generated by the scan gun. (How annoying would it be to hear 12 Windows “ding” sounds upon scanning a barcode, or how annoying would it be to have a UPC show up in that e-mail reply you were typing to a customer when you scanned it look up inventory?)

Our ideal high-level overview would something like this:

Easier said than done.

Easier said than done.

So let’s take a stab at this.

Serial emulation mode?

Some barcode scanners, in addition to supporting USB HID information mode in which they appear as a barcode scanner, also support a serial port emulation mode. In this case, we would communicate with the scanner by sending and receiving commands through the SerialPort class as if it were any other serial device, and you usually tell the scanner to go into this mode by scanning a special “configuration barcode” that its firmware recognizes, found in the owner’s manual. This would be great if

  • the scanner supported such a serial emulation mode;
  • the scanner was meant to be used by our application and our application alone; and
  • we didn’t want the scanner to generate keystrokes anywhere in the operating system.

Unfortunately, not all of our scanners at work support this mode, and even if they did, the scanner is intended to be used by multiple applications (such as FedEx Ship Manager, which requires it to operate as a keyboard wedge). So it seems like the easy solution is out.

Two steps forward, one step back

The first thing that we need to do is to figure out how to get access to the keystroke information no matter which form or control has focus. There are two ways to go about doing this: one is to use the Raw Input API and the other is to use low-level keyboard hook.

But since we also need to know which device the keyboard stroke came from, it looks like the Raw Input API will be the winner; the KBDLLHOOKSTRUCT structure that’s provided by SetWindowsHookEx() doesn’t tell us any device information. So let’s put keyboard hooks on the back burner for now and learn about the Raw Input API.

Diddling around with the Raw Input API

By default, no applications in Windows receive raw input. You have to register for it. And when you’ve registered an hwnd to receive raw input, that hwnd will start getting WM_INPUT messages pumped to it. The WM_INPUT message contains a RAWINPUTHEADER structure that can be passed to various functions in the Raw Input API that can tell you scads of information about the input event that just occurred. What’s important to note is that, with the appropriate flags, these WM_INPUT messages get sent to the hwnd that you register with the API for every input event regardless of whether or not your hwnd currently has focus. This satisfies our first requirement of needing to be able to see incoming keyboard stroke events regardless of whether or not some stupid textbox has focus. The regular device-independent input events like WM_CHAR and WM_KEYDOWN will still get generated for the hwnd; all you’re saying is that “hey, I want to hear about WM_INPUT and get all the gory details, too.”

Using the Raw Input API itself is kind of weird, and the mystical incantations of P/Invoke only make it weirder. This article on CodeProject helped me a lot in this regard, but I still had to drudge through the MSDN documentation and, of course, pinvoke.net.

Time to get started on BarcodeScannerListener. Since we’ll be piggybacking on an hwnd to receive the WM_INPUT messages, we might as well use one that’s going to be around for the lifetime of our application. In this case, this means my main form. But it’d be nice to design this as an independent class so that any consumers don’t need to know about the MainForm. What do you do when you need to override a WndProc without touching the class itself? You got it, it’s NativeWindow to the rescue again.

    public class BarcodeScannerListener : NativeWindow
    {
        /// <summary>
        /// Initializes a new instance of the BarcodeScannerListener
        /// class. The raw input devices that this class will listen to are
        /// registered with the given window handle.
        /// </summary>
        /// <param name="form">the form that should listen for
        /// barcode scans</param>
        /// <exception cref="ArgumentNullException">if the form is null</exception>
        /// <exception cref="ApplicationException">if we are unable to register
        /// for raw input devices</exception>
        /// <exception cref="ConfigurationErrorsException">if an error occurs
        /// during configuration</exception>
        public BarcodeScannerListener(Form form)
        {
            IntPtr hwnd;
 
            if (form == null)
            {
                throw new ArgumentNullException("form");
            }
 
            hwnd = form.Handle;
 
            HookRawInput(hwnd); // We'll take a look at this in a minute
            this.HookHandleEvents(form);
 
            this.AssignHandle(hwnd);
        }
 
        /// <summary>
        /// Hooks into the form's HandleCreated and HandleDestoryed events
        /// to ensure that we start and stop listening at appropriate times.
        /// </summary>
        /// <param name="form">the form to listen to</param>
        private void HookHandleEvents(Form form)
        {
            form.HandleCreated += this.OnHandleCreated;
            form.HandleDestroyed += this.OnHandleDestroyed;
        }
 
        /// <summary>
        /// When the form's handle is created, let's hook into it so we can see
        /// the WM_INPUT event.
        /// </summary>
        /// <param name="sender">the form whose handle was created</param>
        /// <param name="e">the event arguments</param>
        private void OnHandleCreated(object sender, EventArgs e)
        {
            this.AssignHandle(((Form)sender).Handle);
        }
 
        /// <summary>
        /// When the form's handle is destroyed, let's unhook from it so we stop
        /// listening and allow the OS to free up its resources.
        /// </summary>
        /// <param name="sender">the form whose handle was destroyed</param>
        /// <param name="e">the event arguments</param>
        private void OnHandleDestroyed(object sender, EventArgs e)
        {
            this.ReleaseHandle();
        }
    }

Most of the code above is just the kind of bullcrap that we need to do to make sure that we release resources properly when the form’s hwnd gets destroyed. The real meat we want to figure out is that HookRawInput() function in the constructor. This is what does the dirty work of registering our hwnd with the Raw Input API:

/// <summary>
/// Registers ourselves to listen to raw input from keyboard-like devices.
/// </summary>
/// <param name="hwnd">the handle of the form that will receive the raw
/// input messages</param>
/// <exception cref="InvalidOperationException">if the call to register with the
/// raw input API fails for some reason</exception>
private static void HookRawInput(IntPtr hwnd)
{
	NativeMethods.RAWINPUTDEVICE[] rid;
 
	rid = new NativeMethods.RAWINPUTDEVICE[1];
 
	rid[0].usUsagePage = 0x01;      // USB HID Generic Desktop Page
	rid[0].usUsage = 0x06;          // Keyboard Usage ID
	rid[0].dwFlags = NativeMethods.RawInputDeviceFlags.RIDEV_INPUTSINK;
	rid[0].hwndTarget = hwnd;
 
	if (!NativeMethods.RegisterRawInputDevices(rid, (uint)rid.Length, (uint)Marshal.SizeOf(rid[0])))
	{
		InvalidOperationException e;
 
		e = new InvalidOperationException(
			"The barcode scanner listener could not register for raw input devices.",
			new Win32Exception());
		throw e;
	}
}

Let’s stop for a minute and take a look at what the hell this is doing. The goal here is to P/Invoke the Win32 method called RegisterRawInputDevices(). This takes a variable length array of RAWINPUTDEVICE structures as a parameter. These structures tell us what classes of devices that we want to receive WM_INPUT messages for. Note that we can’t ask for a particular device, and this is what makes things so interestingly complicated–we’re going to get all of the notifications for all keyboard-like devices and we’re going to have to filter out the ones from the barcode scan gun ourselves. But that comes much later.

So what we do here is register ourselves with one class of devices, those devices that are keyboard-like. (Unfortunately, there is no HID usage page for barcode scan guns yet.) The parameters in usUsagePage and usUsage are magic numbers. There is a whole slew of these magic numbers on the USB consortium’s Web site. Since all of desktop scan guns act as keyboard wedges, I went with the usage page and usage (really, could they have chosen terms that were any more confusingly similar?) for a keyboard device and it just worked.

The RIDEV_INPUTSINK flag is what tells Windows to send us WM_INPUT messages regardless of the hwnd that they’re being directed to. This means that even if our main form is minimized and the user has, say, Notepad focused and scans a barcode, we’ll still see the event. I’m not sure if that’s really cool or really spooky.

The actual extern declaration is sitting in a class called NativeMethods. The declaration really isn’t all that interesting and if you want to look at it, you can take a look at the sample code.

Great! So now we’re all set up to get WM_INPUT messages. Let’s check this by actually implementing our version of WndProc in our BarcodeScannerListener class:

/// <summary>
/// Hook into the form's WndProc message. We listen for WM_INPUT and do
/// special processing on the raw data.
/// </summary>
/// <param name="m">the message</param>
[SecurityPermission(
   SecurityAction.LinkDemand, 
   Flags = SecurityPermissionFlag.UnmanagedCode)]
protected override void WndProc(ref Message m)
{
	switch (m.Msg)
	{
		case NativeMethods.WM_INPUT:
			this.ProcessRawInputMessage(m.LParam);
			// There's more work to do here, but we'll get to that in a minute
			break;
	}
 
	base.WndProc(ref m);
}

So, this is pretty simple. If we get a WM_INPUT, we call our ProcessRawInputMessage() function on it. Otherwise, we don’t really care about the message. So let’s take a look at that beast of a function. Get your scroll wheel finger ready because it’s a doozy:

/// <summary>
/// Process the given WM_INPUT message.
/// </summary>
/// <param name="rawInputHeader">the rawInputHeader of the message</param>
/// <returns>whether or not the keystroke was handled</returns>
private bool ProcessRawInputMessage(IntPtr rawInputHeader)
{
	bool handled;
	uint size;
 
	handled = false;
	size = 0;
 
	// First we call GetRawInputData() to set the value of size, which
	// we will the nuse to allocate the appropriate amount of memory in
	// the buffer.
	if (NativeMethods.GetRawInputData(
			rawInputHeader,
			NativeMethods.RawInputCommandFlag.RID_INPUT,
			IntPtr.Zero,
			ref size,
			(uint)Marshal.SizeOf(typeof(NativeMethods.RAWINPUTHEADER))) == 0)
	{
		IntPtr buffer;
		BarcodeScannerDeviceInfo deviceInfo; // I'll get to this later
		NativeMethods.RAWINPUT raw;
 
		buffer = Marshal.AllocHGlobal((int)size);
 
		try
		{
			if (NativeMethods.GetRawInputData(
					rawInputHeader,
					NativeMethods.RawInputCommandFlag.RID_INPUT,
					buffer,
					ref size,
					(uint)Marshal.SizeOf(typeof(NativeMethods.RAWINPUTHEADER))) == size)
			{
				raw = (NativeMethods.RAWINPUT)Marshal.PtrToStructure(buffer, typeof(NativeMethods.RAWINPUT));
 
				// This is what filters out for the right device we're looking for.
				// How'd I get that hDevice magic number? Keep reading.
				if (this.devices.TryGetValue(raw.header.hDevice, out deviceInfo))
				{
					handled = true;
 
					if (raw.header.dwType == NativeMethods.RawInputType.RIM_TYPEKEYBOARD)
					{
						if (raw.keyboard.Message == NativeMethods.WM_KEYDOWN)
						{
							StringBuilder localBuffer;
							byte[] state;
 
							localBuffer = new StringBuilder();
							state = new byte[256];
 
							if (NativeMethods.GetKeyboardState(state))
							{
								if (NativeMethods.ToUnicode(
										raw.keyboard.VKey,
										raw.keyboard.MakeCode,
										state,
										localBuffer,
										64,
										0) > 0)
								{
									if (localBuffer.Length == 1 && localBuffer[0] == 0x4)
									{
										this.FireBarcodeScanned(deviceInfo);
									}
									else
									{
										this.keystrokeBuffer.Append(localBuffer.ToString());
									}
								}
							}
						}
					}
				}
			}
		}
		finally
		{
			Marshal.FreeHGlobal(buffer);
		}
	}
 
	return handled;
}

After working with Java and C# for a few years, this kind of API is a reminder of how boring and tedious manual memory management can be. To get details about the WM_INPUT message, including important things like the device it’s coming from, we need to call a method in the Win32 API called GetRawInputHeader(). Since the size of the data can vary, we have to call the method once passing a null reference to a buffer (IntPtr.Zero) and it will populate the size parameter for us. Then we can allocate memory for a buffer of that size and call the function again. Then we marshal the data in that buffer to a RAWINPUT struct, which is the bad boy that tells us what device the keyboard stroke is originating from.

The rest of the method is now jumping ahead: if the device handle (hDevice) is one that we’re watching for (namely, the device handle of the barcode scan gun) and the message is a WM_KEYDOWN event, then we get the current keyboard state and call ToUnicode() to convert the current “keyboard” state into an actual usable character. If a character is returned, we’ll add it to a StringBuilder called keystrokeBuffer. Since all of our scan guns are configured to send an EOT character (ASCII 0×4) at the end of a barcode, we’ll watch for that and if we see it we’ll clear out our buffer and fire the BarcodeScanned event.

Why go to the trouble of GetKeyboardState() and ToUnicode()? Well, remember that the barcode scan gun really is just simulating a keyboard. So when it tries to send our EOT character at the end of the barcode, it doesn’t actually press an EOT key–there’s no such key on your keyboard. Instead it presses Control and then it presses D. These two keyboard strokes get sent in two different WM_INPUTs and two different calls to this method. If we’d been just looking at the virtual keycode on each method call, we’d end up with a lot of bogus characters. Let’s let ToUnicode() do the hard work of figuring out whether a letter is lowercase or uppercase, or if it’s trying to output a $ instead of a 3.

Blocking the keyboard strokes

So our third requirement was to block the keyboard strokes from actually reaching the control if they originated from our scan gun. We might be tempted to revisit our old friend the low-level keyboard hook for this. Indeed, we might spend three hours pursuing this option only to realize something quite unfortunate: the low-level keyboard hook, which provides no device-specific information, runs before any WM_INPUT messages get sent to us by the Raw Input API. That means that by the time that we have enough information to decide whether or not we want to swallow the keyboard stroke, it’s already too late–we’ve missed our chance to swallow it because the keyboard hook callback is long gone.

A less elegant, possibly race-condition prone condition is to just zap all of the WM_KEYDOWN events in the WndProc method if we handled the keyboard stroke:

protected override void WndProc(ref Message m)
{
	switch (m.Msg)
	{
		case NativeMethods.WM_INPUT:
			if (this.ProcessRawInputMessage(m.LParam))
			{
				// This is the new stuff
				NativeMethods.MSG message;
				NativeMethods.PeekMessage(
					out message,
					IntPtr.Zero,
					NativeMethods.WM_KEYDOWN,
					NativeMethods.WM_KEYDOWN,
					NativeMethods.PeekMessageRemoveFlag.PM_REMOVE);
			}
 
			break;
	}
 
	base.WndProc(ref m);
}

We P/Invoke a call to PeekMessage() with a parameter of PM_REMOVE to do this. By passing in a null pointer as the hwnd parameter, we will zap all of the WM_KEYDOWN messages for handles that are running on the current thread. Since all of app’s UI is running in a single-thread, we can be sure we’re zapping the events regardless of which of our controls had focus. This seems to work well enough, especially as I can’t think of a scenario where a user would be typing on the keyboard and scanning a barcode at the same time. But I can envision a scenario where we try to clear the WM_KEYDOWN but it hasn’t be posted yet, allowing a stray character to get through. I haven’t seen it in practice, but it’s something to keep in mind.

Figuring out the device handle

So the last little bit is to figure out the device handle of a barcode scan gun. This is so that we can compare it against the hDevice handle that comes back in the RAWINPUT structure in the Raw Input API. To our constructor we’ll add a method invocation for InitializeBarcodeScannerDeviceHandles() and define that function thusly:

/// <summary>
/// Enumerates devices provided by GetRawInputDeviceList. We'll only listen
/// to these devices.
/// </summary>
/// <exception cref="ConfigurationErrorsException">if an error occurs
/// during configuration</exception>
private void InitializeBarcodeScannerDeviceHandles()
{
	BarcodeScannerListenerConfigurationSection config;
	BarcodeScannerListenerConfigurationElementCollection hardwareIdsConfig;
	List<string> hardwareIds;
	uint numDevices;
	uint size;
 
	config = BarcodeScannerListenerConfigurationSection.GetConfiguration();
	hardwareIdsConfig = config.HardwareIds;
	hardwareIds = new List<string>();
	numDevices = 0;
	size = (uint)Marshal.SizeOf(typeof(NativeMethods.RAWINPUTDEVICELIST));
 
	foreach (BarcodeScannerListenerConfigurationElement hardwareId in hardwareIdsConfig)
	{
		hardwareIds.Add(hardwareId.Id);
	}
 
	// First, we get the number of raw input devices in the list by passing
	// in IntPtr.Zero. Then we allocate sufficient memory and retrieve the
	// entire list.
	if (NativeMethods.GetRawInputDeviceList(IntPtr.Zero, ref numDevices, size) == 0)
	{
		IntPtr rawInputDeviceList;
 
		rawInputDeviceList = Marshal.AllocHGlobal((int)(size * numDevices));
		if (NativeMethods.GetRawInputDeviceList(
			rawInputDeviceList,
			ref numDevices,
			size) != uint.MaxValue)
		{
			// Next, we iterate through the list, discarding undesired items
			// and retrieving further information on the barcode scanner devices
			for (int i = 0; i < numDevices; ++i)
			{
				uint pcbSize;
				NativeMethods.RAWINPUTDEVICELIST rid;
 
				pcbSize = 0;
				rid = (NativeMethods.RAWINPUTDEVICELIST)Marshal.PtrToStructure(
					new IntPtr((rawInputDeviceList.ToInt32() + (size * i))),
					typeof(NativeMethods.RAWINPUTDEVICELIST));
 
				if (NativeMethods.GetRawInputDeviceInfo(
					rid.hDevice,
					NativeMethods.RawInputDeviceInfoCommand.RIDI_DEVICENAME,
					IntPtr.Zero,
					ref pcbSize) >= 0)
				{
					if (pcbSize > 0)
					{
						string deviceName;
						string friendlyName;
						BarcodeScannerDeviceInfo info;
						IntPtr data;
 
						data = Marshal.AllocHGlobal((int)pcbSize);
						if (NativeMethods.GetRawInputDeviceInfo(
							rid.hDevice,
							NativeMethods.RawInputDeviceInfoCommand.RIDI_DEVICENAME,
							data,
							ref pcbSize) >= 0)
						{
							deviceName = (string)Marshal.PtrToStringAnsi(data);
 
							if ((from hardwareId in hardwareIds
								 where deviceName.Contains(hardwareId)
								 select hardwareId).Count() > 0)
							{
								friendlyName = GetDeviceFriendlyName(deviceName);
 
								info = new BarcodeScannerDeviceInfo(
									deviceName,
									GetBarcodeScannerDeviceType(rid.dwType),
									rid.hDevice,
									friendlyName);
 
								this.devices.Add(rid.hDevice, info);
							}
						}
 
						Marshal.FreeHGlobal(data);
					}
				}
			}
		}
 
		Marshal.FreeHGlobal(rawInputDeviceList);
	}
}

This mess of interop code reads our configuration file for a list of hardware IDs to look for. For example, one of the strings I’m looking for in the device name is HID#Vid_05e0&Pid_028a, which I got from Device Manager, and this is enough to identify our Hand Held scan gun. Then we call GetRawInputDeviceList() (twice, once to get the size of the buffer that we need to allocate, and once again to actually fill that buffer once we’ve allocated it) to get a list of the devices that we’re actually listening to. For each device, a call to GetRawInputDeviceInfo() gets us the device name, which is this really horrific looking string, but my hardware ID is in there somewhere. So if I have a match, I grab some information about the device, wrap it up in a simple BarcodeScannerDeviceInfo object, and add the device handle and this struct to a dictionary called devices. Our ProcessRawInputMessage() function can now look at this dictionary to determine which WM_INPUT messages to listen to and which ones to ignore.

Now, you may be thinking, why do all that to get the device handle? Why not just store that in your config file? Well, the problem with USB devices is that device handle changes every time you unplug and replug the barcode scan gun. The hardware ID is the only reliable identifier. I don’t understand what most of the huge device name string actually contains, but at this point, I’m not sure if I really care to become a USB expert–right now, things Simply Work, and I’m inclined to leave them that way.

Putting it all together

With all this in place, I can run the app, scan the UPC symbol of the ridiculously American-sized Diet Mountain Dew sitting on my desk, and squeal for joy as (A) the textbox does NOT show any characters, (B) the status bar listens to the BarcodeScanned event and displays the UPC data, and (C) I type on the keyboard and data is correctly passed through to the textbox and ignored by the BarcodeScannerListener. If I scan the barcode in another application, my application still sees it, but the application still gets the text passed through–that’s not a big deal for me. (The reason for this is our call to PeekMessage() only zaps messages for my current application thread, not for the entire OS.)

I’ll clean up the code and see if I can post something up soon. And if you know of a way where this could have been done in three lines, by all means, let me know.

At the very least, it’s nice to scan away on those huge inventory shipments and not worry about some stupid textbox having focus.

Update: I’ve put together a bare-bones version of the code. Sample Application Code

Update 5/15/2010: One Year Later

So it’s been about 14 months since I put the above code into production in my application. Thanks to some of the helpful comments passers-by have left, and thanks to some rare crash reports that I was able to debug, I’ve made some changes to the code that I use in production and would like to make them known, to save you some headaches.

PeekMessage() was a bad idea

The first is that I removed the call to PeekMessage() and instead used an IMessageFilter implementation. The reason that PeekMessage() was problematic is that it removed all of the WM_KEYDOWN messages in the window’s message queue.

Under normal conditions, this worked just fine, since there was normally only one WM_INPUT (the one being processed) and one WM_KEYDOWN (the one that corresponds to the WM_INPUT message) in the queue. But if the system was “busy” (say, for example, you start up FedEx Ship Manager in the background and scan something), then some messages–particularly, the SHIFT key, as noted in the comments below–would be “missed.” That’s because the tax on OS resources would slow down the rate at which the application could process messages while the barcode scanner continues to dish out messages at a constant rate.

So what would happen is that, say, three WM_KEYDOWN messages would appear on the queue: let’s use “b”, SHIFT, and “c” as an example set. By the time the application got around to processing the first WM_INPUT (the one for the “b” keystroke), it would call PeekMessage() and inadvertently zap all three WM_KEYDOWN messages.

And normally even that wouldn’t be a big deal, because we’re looking at WM_INPUT messages, after all, not WM_KEYDOWN messages, to determine what to put in our StringBuilder. But note that we call GetKeyboardState() to determine the state of the keyboard and pass that byte array into the ToUnicode() call to get the actual character back. It turns out that GetKeyboardState() works by examining the WM_KEYDOWN messages that are sitting in the message queue–and guess what, we zapped the SHIFT keypress right out of the queue by calling PeekMessage(). As a result, we’ll fill our StringBuilder with “bc” instead of “bC”, and we might be left scratching our heads as to why the SHIFT, CTRL, or ALT keys aren’t always “making it through.”

So my workaround is simple:

    /// <summary>
    /// Filters WM_KEYDOWN messages.
    /// </summary>
    public class BarcodeScannerKeyDownMessageFilter : IMessageFilter
    {
        /// <summary>
        /// Gets or sets a value indicating whether or not the next keydown message
        /// should be filtered.
        /// </summary>
        public bool FilterNext
        {
            get;
            set;
        }
 
        /// <summary>
        /// Filters out a message before it is dispatched.
        /// </summary>
        /// <param name="m">The message to be dispatched. You cannot modify 
        /// this message.</param>
        /// <returns>
        /// true to filter the message and stop it from being dispatched; false
        /// to allow the message to continue to the next filter or control.
        /// </returns>
        [SecurityPermission(SecurityAction.LinkDemand, Flags = SecurityPermissionFlag.UnmanagedCode)]
        public bool PreFilterMessage(ref Message m)
        {
            bool filter = false;
 
            if (this.FilterNext && m.Msg == NativeMethods.WM_KEYDOWN)
            {
                filter = true;
                this.FilterNext = false;
            }
 
            return filter;
        }
    }

If the FilterNext property is set, then we filter out the next WM_KEYDOWN that we see and then set the property back to false. Otherwise, we let the message through.

The BarcodeScannerListener now instances the BarcodeScannerKeyDownMessageFilter (what a mouthful) in its constructor and registers it with the Windows Forms-provided Application class:

            this.filter = new BarcodeScannerKeyDownMessageFilter();
            Application.AddMessageFilter(this.filter);

And the WndProc message sets the FilterNext property of the filter as it sees appropriate:

        protected override void WndProc(ref Message m)
        {
            switch (m.Msg)
            {
                case NativeMethods.WM_INPUT:
                    if (this.ProcessRawInputMessage(m.LParam))
                    {
                        this.filter.FilterNext = true;
                    }
 
                    break;
            }
 
            base.WndProc(ref m);
        }

This has proven to be much more reliable in practice than the old PeekMessage() method. I haven’t had problems with “missed keystrokes” since.

GetKeyboardState() only works as expected when you have focus

By design, Windows keyboard messages go straight to the window that has focus. When I originally wrote the article, I had the HookRawInput() method of the BarcodeScannerListener specify RIDEV_INPUTSINK. This allowed me to receive WM_INPUT messages even when my application did not have focus.

Well, that meant I still received the WM_INPUT messages, but I didn’t receive the WM_KEYDOWN messages. And normally I wouldn’t care about that, but that also means that I wasn’t receiving WM_KEYDOWN messages for things like SHIFT, CTRL and ALT. The net result? Barcodes would be scanned with intended case if the application had focus, but they would be scanned as all lowercase if the application did not have focus. That’s because GetKeyboardState() assesses the keyboard state by looking at the WM_KEY* messages in the message queue, as I mentioned above in the discussion about the problems with the PeekMessage() call.

I didn’t come up with a workaround for this other than to say that I now only support scans when the application has focus. As a result, I used 0 instead of specifying any flags in HookRawInput().

(Thanks to commenter Matt for pointing this out.)

A rare memory corruption bug caused great hair loss

If there’s one thing that I hate, it’s bugs that I fix but never truly understood the original cause. And on this one I’m not alone.

Every once in 10,000 or so scans, the application would crash with a memory corruption error in msvcr80.dll, and it always occurred when marshalling data into the RAWINPUT structure in the ProcessRawInputMessage() function of the BarcodeScannerListener:

raw = (NativeMethods.RAWINPUT)Marshal.PtrToStructure(buffer, typeof(NativeMethods.RAWINPUT));

(DrkNess, one of the commenters on this article, also encountered the error.)

Now, as we know with access violations, the memory corruption could have occurred at some other point in the program: this is just an unmanaged call that gets called exceptionally frequently and so is likely to stumble across the corrupted memory boundary.

I read all up on pinning memory with the garbage collector (shouldn’t have applied), double- and triple-checked the P/Invoke declarations (they seemed to be correct), and even found several references to a similar problem in the article on CodeProject that became an inspiration for this one, but damned if I couldn’t find a reason for the error. Obviously I had screwed something up in a subtle way that is fantastically hard to reproduce.

So I ripped out all of the P/Invoke declarations and created a C++/CLI assembly that contained one class, the help-me-I’m-running-out-of-names-for-these-things BarcodeScannerListenerInteropHelper. Its sole purpose is to be used by the BarcodeScannerListener and bridge the gap between the managed world and the unmanaged Windows platform calls without the need for P/Invoke. With P/Invoke out of the way, so too has the memory corruption bug disappeared. A nice side effect is that the BarcodeScannerListener class has become much smaller and easier to understand.

For the really hardcore, write an upper device driver filter

Throughout this process I also learned that one sure-fire way to implement all of this barcode scanning functionality is to write an upper device driver filter. But I still stick to my method because (a) it works well enough, (b) doesn’t require special configuration on the client machine, and (c) is a hell of a lot easier than mucking with drivers.

Such a filter would enable us to receive input regardless of which application had focus, and it would enable us to control which applications received the input, but it does look like it would be a lot of work.

New Conclusions and Delusions

The code has held up pretty well, and I’m glad that I finally found the time to write a post-mortem analysis of the issues that I’ve encountered.

Here’s an updated version of the sample application that incorporates the changes that I’ve made. Good luck!

Feb 16 09

Fun with Deadlocks in SQL Server 2005

by Nicholas Piasecki

So we used to have a batch script that would run every evening to synchronize inventory between two different systems. And while this batch script was running, there’d be a high probability of having requests served by the Web site fail. The database that the Web site was accessing would toss an error like “Process Foo deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction” for some of the requests.

The error also occurred sporadically in other scenarios, but it most often happened when the batch script was running. This helped me deduce that any long running transaction occurring against the [orders] table in our database while the Web site was being accessed greatly increased the chances of the error occurring.

So how did I resolve the deadlocks such that I haven’t seen one since? Well, the answer was to put on my detective hat and hold a stakeout, waiting for the error to occur again after I had enabled some tracing features of SQL Server.

Capturing Data

By default, SQL Server does not log any particularly useful information about the deadlock. To tell it to do so, I executed the following command:

DBCC TRACEON(1204, 3605, -1)

This command tells SQL Server to send debugging information (identified by trace type 1204) directly to the error log (identified by parameter 3605). In my system, the error log was stored at C:Program FilesMicrosoft SQL ServerMSSQL.1MSSQLLOGERRORLOG.

Three days later, a deadlock occurred. When I checked the log, a wealth of cryptic information awaited:

	Deadlock encountered .... Printing deadlock information
	Wait-for graph
 
	Node:1
 
	KEY: 5:72057594071154688 (5200690f2156) CleanCnt:3 Mode:S Flags: 0x0
	       Grant List 2:
	         Owner:0x2FF3C660 Mode: S        Flg:0x0 Ref:1 Life:00000000 SPID:54 ECID:0 XactLockInfo: 0x2B5DCEDC
	         SPID: 54 ECID: 0 Statement Type: SELECT Line #: 1
	         Input Buf: Language Event: (@param0 nvarchar(7),@param1 int)SELECT TOP 5         COALESCE((           SELECT            SUM(oi.QtyShipped)           FROM dbo.orderItems AS oi           INNER JOIN dbo.[order] AS o ON o.ID = oi.OrderID           INNER JOIN dbo.productItem AS inner_pi
	       Requested By: 
	         ResType:LockOwner Stype:'OR'Xdes:0x03977770 Mode: X SPID:60 BatchID:0 ECID:0 TaskProxy:(0x3C718378) Value:0x96fe8a0 Cost:(0/1704)
 
	Node:2
 
	RID: 5:1:11345:3               CleanCnt:2 Mode:X Flags: 0x2
	       Grant List 3:
	         Owner:0x1549C0A0 Mode: X        Flg:0x0 Ref:0 Life:02000000 SPID:60 ECID:0 XactLockInfo: 0x03977794
	         SPID: 60 ECID: 0 Statement Type: UPDATE Line #: 1
	         Input Buf: Language Event: (@param0 int,@param1 int,@param2 int,@param3 int)UPDATE dbo.orderItems SET          Qty =  @param0 ,          Backorder =  @param1 ,          QtyShipped =  @param2         WHERE ID =  @param3
	       Requested By: 
	         ResType:LockOwner Stype:'OR'Xdes:0x2B5DCEB8 Mode: S SPID:54 BatchID:0 ECID:0 TaskProxy:(0x2D6D0378) Value:0x2a504c60 Cost:(0/0)
 
	      Victim Resource Owner:
	       ResType:LockOwner Stype:'OR'Xdes:0x2B5DCEB8 Mode: S SPID:54 BatchID:0 ECID:0 TaskProxy:(0x2D6D0378) Value:0x2a504c60 Cost:(0/0)

Analyzing Data

At first glance, this looked like a bunch of unhelpful gibberish. After grabbing a magnifying glass and some painkillers, though, it started to make some sense.

In Node:1, the Web site was trying to generate a list of similar products for a particular product page. In Node:2, an order was being marked as shipped by our internal order processing application. SQL Server killed Node:1, since it by default sees killing a SELECT statement as a less potentially damaging operation than killing an UPDATE. I figured this much out by recognizing the snippets of the SQL queries under Language Event. The log only contained about the first 100 characters of the offending SQL statements, but this was good enough.

Next, I used the trace output above to determine some more information about the locked resources.

Node:1 was the resource identified by KEY: 5:72057594071154688 (520060f2156), and the process SPID 54, which was running the similar products SELECT statement, had a shared lock on it. I figured out what this key referred to by referencing the system partitions table:

SELECT * FROM sys.partitions WHERE hobt_id = 72057594071154688

(I just stripped off the 5: part of the KEY and omitted the numbers in parentheses.) This command returned some gibberish, but the important values returned in it were the object_id (798625888) and the index_id (1). I used this information in another query:

SELECT * FROM sys.indexes WHERE object_id = 798625888 AND index_id = 1
SELECT OBJECT_NAME(798625888)

These queries told me that the KEY resource was actually the clustered index named PK_OrderItems on our orderitems table. Good–I figured out what the first locked resource was.

Now I had to figure out Node:2. It was the resource identified by RID: 5:1:11345:3, and the process SPID 60, which was running an UPDATE against the orderitems table, had an exclusive lock on it. I figured out what this RID referred to by using the DBCC tool:

DBCC TRACEON(3604) -- Enables output to be sent to the Query Analyzer window
DBCC PAGE(5,1,11345,0)

It returned a lot of information, but the real gem was m_objId (1531152500). I used that next:

SELECT OBJECT_NAME(1531152500)

This informed me that the RID was a particular page (for our purposes, a particular row) on the orders table.

Next, I looked at the queries each of these processes was running.

The similar products query was intensive (it has since be rewritten, but that’s besides the point). It tried to find popular products (based on the QtyShipped in the orderitems table) that match the given style. “Popular products” was defined as products that have had the most shipped in the past 7 days. It looked something like this:

	      SELECT TOP 5
	       COALESCE((
	         SELECT
	          SUM(oi.QtyShipped)
	         FROM dbo.orderItems AS oi
	         INNER JOIN dbo.[ORDER] AS o ON o.ID = oi.OrderID
	         INNER JOIN dbo.productItem AS inner_pi ON inner_pi.Upc = oi.Upc
	         WHERE inner_pi.Sku = p.Sku
	         AND o.ShipDate >= DATEADD(DAY, -7, GETDATE())
	       ), 0) AS ProductSalesVolume,
	       -- snipped various other columns
	      WHERE -- snipped some where conditions
	      ORDER BY ProductSalesVolume DESC

The update query, on the other hand, was rather simple:

	      UPDATE dbo.orderItems SET
	        Qty = #Quantity#,
	        Backorder = #QuantityBackordered#,
	        QtyShipped = #QuantityShipped#
	      WHERE ID = #Id#

Now let’s remember what the log data indicated:

  • RID: 5:1:11345:3 is a single row within the order table.
  • KEY: 5:72057594071154688 (5200690f2156) is the PK_OrderItems clustered index on the orderItems table.

SPID 54, the SELECT, had a shared lock on the PK_OrderItems index of the orderItems table. It would have liked to acquire a shared lock on a particular row of the order table. SPID 60, the UPDATE, had an exclusive lock on the desired row/page in the order table. It would have liked to acquire an exclusive lock on the PK_OrderItems index of the orderItems table.

Why was this so?

The simple UPDATE wasn’t actually touching the order table, which was confusing. What was happening was that the internal order processing application was saving the order and the order items in one transaction. Since the order table was updated during this transaction, the process that was running the transaction (SPID 60) had an exclusive lock on that row in the order table. This clue led me to the sequence events that occurred for the deadlock to happen:

Time SPID 54 Log SPID 60 Log
T0 BEGIN TRANSACTION; BEGIN TRANSACTION;
T1 [requests shared lock on PK_OrderItems index so it can perform the INNER JOIN in the upcoming SELECT] [requests exclusive lock on particular [order] row so it can perform an UPDATE on that row]
T2 [shared lock on PK_OrderItems index is granted] [exclusive lock on particular [order] row is granted]
T3 SELECT SUM(oi.QtyShipped) … UPDATE [order] SET Flag = ‘P’ WHERE ID = ‘Blah’
T4 (while continuing processing query from T3)
[requests shared lock on particular [order] row so it can read its ShipDate]
[requests exclusive lock on PK_OrderItems index so it can perform an UPDATE on a particular row]
T5 [request is blocked by SPID 60's exclusive lock on particular [order] row] [request is blocked by SPID 54's shared lock on PK_OrderItems index]
T6 [deadlock detected; this process is killed as the deadlock victim] [exclusive lock on PK_OrderItems index is granted]
T7 UPDATE [orderItems] SET QtyShipped = 3 WHERE ID = Blah;
T8 COMMIT TRANSACTION;

Resolution

One option to resolve this deadlock was to remove the inner SELECT that was looking at the QtyShipped when determining the most popular products for the similar products list, perhaps by caching this value and updating it periodically in a column called CachedSalesVolume that was updated infrequently. Or, it could have been broken out into a separate query that could be cached independently by the Web site.

Another option was to change the SELECT query to use the WITH (NOLOCK) query optimizer hint, which would tell SQL Server to run the query at the READ UNCOMMITTED isolation level. This would have enabled the query to read data that was locked by another user (in our case SPID 60) without requiring a shared lock on that data. This is referred to as a “dirty read” because you can read values that are not yet committed and are subject to change. Even more unfortunately, in the extremely rare case where a query that is using NOLOCK tries to read data that is actually being moved or changed by another user in that instant of time, a 605 error occurs, and the SELECT would still fail anyway.

The final and best option was to explore whether or not adding an additional index could alleviate the issue. The queries run by SPID 60 were always going to acquire an exclusive lock on the PK_OrderItems and a particular row in the order table. There really isn’t a way around this; it’s just the way the UPDATE process works. The only option, then, was to provide an index that the query used by SPID 54 could use instead of the PK_OrderItems index.

The SPID 54 query was using the PK_OrderItems index as part of a Key Lookup operation. Here’s why: when performing the first INNER JOIN, the query used the IX_OrderItems_OrderId index on the orderItems table to obtain a set of orderItems for a given order number. The next INNER JOIN needed to look at the UPC, and the final SELECT needed to look at the QtyShipped column. Unfortunately, the query was now only holding the OrderId and ID columns, so it needed to look up each order item and fetch its UPC value and its QtyShipped value. Looking up these additional columns for each row, which are not included in the index, was what caused the shared lock on the PK_OrderItems clustered index. By adding the UPC and QtyShipped columns as included columns in the IX_OrderItems_OrderId index, I could avoid the clustered index seek on the PK_OrderItems index entirely.

Here’s an abbreviated query plan before adding these included columns:

Before adding the included columns.

Before adding the included columns.

The disaster happened in row 13. The SELECT acquired a shared lock on PK_OrderItems because it needed to look up each order item in order to grab the UPC and QtyShipped fields.

Here is the abbreviated query plan after adding these included columns:

After adding the included columns.

After adding the included columns.

You can see that two steps have been removed from the query plan. Importantly, the Clustered Index Seek was now gone.

Now when the SELECT was scanning through the index, it already had the UPC and QtyShipped information it needed to complete its query. It no longer needed to lock and go rummaging through the row for these values. Since the lock was completely eliminated, the deadlock was eliminated, too.

So the moral of the story? If you have deadlocks, the first thing you should look at is your indexes. Try adding or removing indexes, or try adding included columns to indexes, to eliminate the shared lock/exclusive lock scenario from occurring.

Have fun making things work!

Feb 12 09

On Web Gardens, ASP.NET, and IIS 6.0

by Nicholas Piasecki

So as I’ve been working with a Web site deployed using Web gardens over the past two years, I’ve learned some painful quirks that come along with them.

What is a Web garden?

Think of a web garden as a web farm, but all in the context of a single machine. You have multiple worker processes running your application, preferably each running on a different core in your multicore computer.

Available right-clicking an application pool and choosing Properties in the IIS Management snap-in.

Available right-clicking an application pool and choosing Properties in the IIS Management snap-in.

Why is this a good thing? Is it for performance? Well, not performance alone and not necessarily at all. Using a Web garden will only be good for performance if your application does some unusual blocking and locking while processing a request such that all the little threads within an individual worker process get tied up waiting for another thread to release the lock. We had this for a while because our Web framework had a nasty threading bug when rendering templates, and running in a Web garden helped because while separate threads within a single process could clobber one another, separate threads in separate processes obviously couldn’t.

The bigger gain for Web gardens is not performance but robustness. If one of the worker processes hosting your application goes ape in that you get stuck in a loop and run at full tilt CPU until health monitoring kills you, then you’ve just stopped serving all requests from that process until it’s killed. If that’s the only process, then you’ve stopped serving requests for your Web site entirely. But if you have a Web garden with, say, 3 worker processes and one of them goes AWOL, then at least you’re serving requests from the remaining 2 until the bananas worker process is shut down and spun up again.

What are the caveats?

There are several caveats to consider, but they’re all important from a scalability perspective. In fact, enabling Web gardening can be a good way to see if your application will be able to function in a Web farm scenario on multiple servers reasonably well. If you can deal with a Web garden now, you’ll have a far easier time scaling out to multiple servers in the future than if you’d stuck with an application that assumes everything lives within the same worker process.

No InProc session storage

One of the caveats about using a Web garden is that if you’re using session state in your application, then you need to use an out-of-process session state, such as the ASP.NET State Service or sessions stored in SQL Server. InProc session management won’t work because each worker process will be maintaining its own session state. So if a customer is browsing your Web site and you have 3 worker processors, then your customer has a 1 in 3 chance of losing his session state as IIS round-robins his subsequent requests among the 3 available worker processes. If you use an out-of-process session state, then you can be sure that all 3 worker processes are consulting the same single resource as the place to store and retrieve session data.

With InProc storage and Web gardens, Bob the Blue Dot only has a 1 in 3 chance of finding his original session on any given request.

With InProc storage and Web gardens, Bob the Blue Dot only has a 1 in 3 chance of finding his original session on any given request.

No built-in caching mechanism

Similarly, if you’re using System.Web.Caching, you need to remember that each individual worker process is going to maintain its own cache. So if you cache a big list of products on your catalog listing page and you have three worker processes, then the database is going to get hit at least three times and that page is going to get cached in three separate places in RAM–one place for each process. This can make an operation such as clearing the cache difficult.

Suppose you have a protected HTTP handler, say, ClearCaches.axd, which if pinged by some administrative IP address tells the System.Web.Caching cache to clear itself. You’ll only have cleared out the cache for the worker process that happened to serve that particular request. To clear them all, you’d have to recycle all of the worker processes through the IIS Management snap-in or just keep pinging your ClearCaches.axd until you were confident that IIS had round-robined you across to all of the worker processes. Neither is really ideal.

A better solution would be to move to an out-of-process cache like memcached. Then you only have one service that you need to clear the cache for, and that cache clearing action will be observed by all three worker processes.

These are the two things that you’d naturally expect to have to deal with when running Web gardens. Now let’s talk about two that you normally wouldn’t expect. (That’s a nice way of saying that I learned them the hard way.)

Poor App_Offline.htm integration

So ASP.NET 2.0 has this little nifty feature that revolves around a file called App_Offline.htm. You dump a file named like this in the root of your ASP.NET Web application. Upon the next request to any page on the site served through the ASP.NET ISAPI handler, the worker process will spin down, and the contents of the App_Offline.htm file will be served instead. (Stupidly, ASP.NET will send a 404 header for this, which is plain wrong, but this is hard-coded.) This unlocks the DLLs in your site’s bin directory. So if you filled that file with a message like “We’re down for maintenance; check back soon,” you’d be able to xcopy over the new binaries to the bin directory, delete the App_Offline.htm file when you’re done, and you’d have provided a simple and graceful site maintenance message to your users in the process.

But this doesn’t work well for Web gardens. The problem is that the worker process spins down and releases its lock on your application binaries only after the next request to that individual process after you created the App_Offline.htm file. So if I have a Web garden with 3 worker processes A, B, and C, and I plunk my App_Offline.htm file down, and I visit the home page of my site, then something like this might happen: Worker process B decides to process my request, notes that the App_Offline.htm file exists, serves me the contents of the file, and then spins down and releases its lock on the application binaries. But processes A and C are still churning along and locking those files. I’d either have to refresh the homepage a whole bunch of times to be sure I had round-robined across all of the binaries to get them to all spin down (less than ideal), or I’d have to kill all the worker processes by stopping the application pool in the IIS Management snap-in. In the end, though, it’s easiest to just pretend this feature doesn’t exist for you and find some alternative update mechanism, such as deactivating your main site and activating a completely separate Web site in IIS that simply displays the site maintenance message.

Bug in .NET Framework installer can cause a strange scenario

I lost some hair over this one because it’s unexpected and only appears under a very specific scenario. Here we go:

Take a server running .NET 3.0 and an ASP.NET Web site running in an application pool that has Web gardens enabled (number of processes: 3). The web.config configuration is something like the following:

    <sessionState
      cookieless="UseCookies"
      cookieName=".authz"
      mode="StateServer"
      regenerateExpiredSessionId="true"
      stateConnectionString="tcpip=127.0.0.1:42424"
      timeout="60"
      useHostingIdentity="true" />

Now upgrade the machine to .NET 3.5 SP1. Reboot the server. Stand back in shock and horror to find that sessions are no longer maintained across the worker processes, as if all of them have reverted to InProc session storage. You see that the configuration obviously still says StateServer, but it Simply Does Not Work. You quickly reduce to 1 worker process as the current workaround.

The problem appears to occur when all of the following conditions are true:

  • You are running Windows Server 2003 (IIS 6.0) and an ASP.NET 2.0 web site.
  • The Web site is configured to use Web Gardens, where the maximum number of worker processes is greater than 1. Because of this, you have configured your application to use an out-of-process session storage; in this scenario, the ASP.NET State Service running on the local machine.
  • The application pool identity is set not to NETWORK SERVICE but to a custom, low-privileged user account that you created per a deployment best practice.
  • You run an installer that updates the .NET framework; in my case, this was an update from .NET 3.0 to .NET 3.5 SP1.
  • When the upgrade finishes and you reboot the server, you find that your session variables are frequently lost upon refreshing a page, since there is only a 1 in 3 chance of you getting the original worker process that served your original request. But this shouldn’t matter, since you’re using the ASP.NET state service. What broke?

When using the ASP.NET state service, ASP.NET uses a value called the machineKey to encrypt and/or hash all session data to be stored (I don’t know if it’s encrypting or hashing or both, but it’s not an important distinction for this discussion). This is so that when any worker process asks for data from the service using a session identifier, it can be sure that the data was not tampered with while it was being stored in the external data source.

If you are on a web farm, then you probably have a static machineKey defined in your web.config file, and this issue does not occur. But for a single-server web garden scenario, you probably rely on the default machineKey setting, which is set to AutoGenerate,IsolateApps for ASP.NET 2.0 applications. This means that ASP.NET automatically generates a machine key that is unique to your application pool. It regenerates this key according to some algorithm, but that is not important for this discussion.

The generated value is normally stored in the registry under HKLMSOFTWAREMicrosoftASP.NET2.0.50727.0AutoGenKeys{SID of the Application Pool Identity}. But the .NET Framework installer incorrectly (I do believe this is a bug) destroys this registry key and, to add insult to injury, resets the permissions on this key such that your custom application pool identity cannot write to the registry entry when it goes to create its new machine key.

Without being able to store this key, each process will think session data inserted by other processes has been tampered with or is otherwise invalid. Yikes!

Without being able to store this key, each process will think session data inserted by other processes has been tampered with or is otherwise invalid. Yikes!

The result is that each worker process that spins up in the web garden is using its own in-memory copy of a machine key that it generated just in time, effectively creating a web farm scenario by accident. For example, worker process A spins up, sees that no AutoGenKey entry exists (indeed, it cannot even read it), generates its own and begins using that to hash data sent to the ASP.NET State Service. It tries to save this new machine key to the registry entry, but fails silently. Worker process B spins up, sees that no AutoGenKey entry exists, generates its own and begins using that to hash data…you see where this is going.

Bad permissions on this one registry key can cause some real headache-inducing behavior as everything seems to behave as if it were InProc.

Bad permissions on this one registry key can cause some real headache-inducing behavior as everything seems to behave as if it were InProc.

The result is now you have session data hashed with three different machine keys. Though the data for the session identifier exists, two out of three of the worker processes will reject it as invalid/tampered because it is using its own key.

You could get around this by explicitly setting a custom machineKey in your web.config file.

Or you could re-run aspnet_regiis.exe -ga MachineNameApplicationPoolUserName at a Command Prompt to fix up the broken permissions.

Conclusions and Delusions

There you have it! Web gardens are a really neat feature for increasing the robustness of your application as well as testing the general suitability of your application for migration to a Web farm scenario. But documentation out there is pretty scarce, and I get the feeling that people don’t use Web gardens all that often. Hopefully, these little issues that I’ve run across will save somebody some heartache in the future. Good luck!

Dec 5 08

On IIS 6, Content-Location, Internal IP Addresses, and URL Redirection

by Nicholas Piasecki

One of our automated security scans of our Web site turned up a low priority notice that our server was revealing its internal IP address on some redirect requests. It’s a “feature” of IIS that occurs when a client requests a resource via HTTP 1.0 (or without the Host header) that results in a redirection. The output of a telnet session to such a resource looks something like this:

[npiaseck ~]$ telnet www.example.com 80
Trying 10.1.10.38...
Connected to www.example.com (10.1.10.38).
Escape character is '^]'.
GET /Media HTTP/1.0
 
HTTP/1.1 301 Moved Permanently
Content-Length: 152
Content-Type: text/html
Location: http://10.1.10.38/Media/
Server: Microsoft-IIS/6.0
Date: Fri, 05 Dec 2008 13:50:00 GMT
Connection: close
 
<head><title>Document Moved</title></head>
<body><h1>Object Moved</h1>
This document may be found <a HREF="http://10.1.10.38/Media/">here</a>
</body>
 
Connection closed by foreign host.

I’ve made up an internal IP address, but the point is that IIS is inserting the internal IP address in the body of the document as well as the Location header. If you make the same request with the Host header supplied, such as in Host: www.example.com, then IIS would use www.example.com instead of supplying the IP address.

The recommended Microsoft solution is detailed in KB834141. Namely, you edit the SetHostName metabase entry for the Web site to www.example.com, and now IIS will use that instead of the IP address in the above redirection scenario. Easy as pie, right?

Well, that change broke some code. For reasons that are not clear to me, here’s the mechanism by which IIS 6 determines what to use in these redirect requests:

  1. Is the UseHostName property set to true? If so, use the machine’s host name.
  2. Is the SetHostName property set to some value? If so, use that value.
  3. Is a Host header supplied in the HTTP request? If so, use that value.
  4. If all of the above fail, just use the IP address, which is probably internal.

For search engine optimization reasons, all of the requests on our Web site redirect all requests that come in without the www subdomain to ones with the www subdomain. This prevents Google from dinging us for having duplicated content. The code was implemented as an ASP.NET HTTP module that ran before every request and inspected the incoming URL by looking at the HttpContext.Current.Request.Url.OriginalString property. If the domain didn’t match, then it issued a permanent redirect.

The problem with the above precedence order is that IIS will substitute the value of HTTP_HOST and other server variables passed into ASP.NET with the SetHostName value, not the Host header if so supplied. The net effect is that our redirection method suddenly stopped working because the code was always seeing the SetHostName value, www.example.com, regardless of the URL that was actually used to make the request. To make matters worse, if we were to set the UseHostName property, the code would always see a mismatch and enter a redirect loop. Neither are desirable scenarios.

The solution is to change the code to inspect the Host header directly, if present, instead of looking at the URL as passed into the server variables array. It ends up looking something like this:

private void OnContextBeginRequest(object sender, EventArgs e)
{
	HttpContext context;
	string originalRequestHost;
	int portIndex;
	HttpRequest request;
	string redirectUri;
 
	context = HttpContext.Current;
	request = context.Request;
 
	originalRequestHost = request.Headers["Host"];
 
	// Some clients will pass in the port number in the Host header if
	// it's not going over port 80
	portIndex = originalRequestHost.LastIndexOf(':');
	if (portIndex != -1)
	{
		originalRequestHost = 
			originalRequestHost.Substring(0, portIndex);
	}
 
	if (originalRequestHost != null && originalRequestHost != mPreferredSubdomain)
	{
		redirectUri = string.Format(
			"http://{0}{1}{2}",
			mPreferredSubdomain, 
			request.Url.Port != 80 ? ":" + request.Url.Port.ToString() : string.Empty,
			request.RawUrl);
		cLog.InfoFormat("Redirecting from to {0} (domain: {1}, preferred: {2}).",
			redirectUri, originalRequestHost, mPreferredSubdomain);
		context.Response.Redirect(redirectUri, true);
	}
}

The code looks at the host header. If it’s set, and doesn’t match, then it redirects. (It does a little bit of finagling to strip off the port number if a non-standard port is used in the request.)

With this code in place, we can now use the SetHostName property and still have our URL redirection work properly for HTTP/1.1 requests. We’re out of luck when it comes to HTTP/1.0 requests, but this is something that we can live with.