Dear Authorize.Net

I walked in this morning to find multiple Authorize.net accounts failing all transactions with a General Error. I called Authorize.net around 8 am Eastern and got connected to a woman who said I was breaking up. Then I launched a Live Chat, and Brandon S. told me that I needed to contact my bank. He confirmed that lots of other people were having the errors, but that I needed to contact my bank for resolution. So I called my bank (Wells Fargo), who had no idea what was going on and said to try back at 9 am. When I called again, they admitted they were getting a lot of calls about this but weren’t sure what was going on. Meanwhile, Twitter started lighting up with people reporting similar problems. All for an outage that had been going on since around 4 am Eastern according to our logs.

So now it looks like a First Data change broke Authorize.Net, or the other way around, or in some other way in which I shouldn’t have to care: it still took you half a day to admit that you had a problem and then get it resolved. That’s what happens when you play the blame game instead of investigating issues yourself.

Here’s what you should have done:

  • You should have noticed a high percentage of these errors occurring during the night.

  • You should have noticed that they were all related to a particular processor.

  • You should have alerted that processor, posted a status message to your home page, and alerted your customer service staff.

Instead, you told individual customers to call their banks and fend for themselves. You said, “it’s not my problem.”

This is typical Authorize.Net; time and time again, I see you passing the blame onto someone else:

  • Instead of implementing two-factor security, you require us to change inane security questions and passwords constantly.

  • Instead of improving account security through alerts for suspicious behavior, you make merchants sign a release form for ECC because it’s always easier to pass the buck than it is to make your system better.

  • Instead of improving the very real use case problems with CIM, you let developer complaints on the forums go without action for years.

  • Instead of making refunds easy to handle like YOUR PARENT COMPANY CyberSource, you document that each individual merchant needs to void and recapture instead of implementing this functionality once, for everybody, yourselves in the API. Which means storing the card details (hi PCI DSS!) or battling CIM.

I’m beginning to think that nobody actually works on customer feedback or new features at Authorize.Net. Instead, it’s all maintenance on a decade-old platform and API that, because it’s been running for this long, is assumed to have nothing wrong with it.

You are setting your company up to be disrupted by a new player. You need to act now by not being a payment gateway but by being a company that simply helps customers accept credit cards, and that means figuring out what’s wrong any time, every time.

Otherwise, you’re just another middleman passing the buck, and you will be replaced.

Thanks for listening.

Using the EPL2 GW command to send an image to a Zebra thermal printer

How’s that for a title? While that’s gibberish for most, if you’re the unlucky soul who’s been tasked with dumping an image to your Zebra thermal printer, it could be just what you’re looking for.

The ubiquitous Zebra thermal printer.

A little background

These thermal printers are commonly used in shipping environments to print out USPS, FedEx, and UPS labels. The printers generally speak two languages natively: EPL2 and ZPLII. The EPL (Eltron Programming Language) language is older than ZPL (Zebra Programming Language) but is also a bit simpler.

Zebra bought Eltron and has kept EPL around for backward compatibility reasons; the tired LP2844 only speaks EPL, but newer printers can speak both EPL and ZPL. While ZPL has advanced drawing features and proportional fonts, I tend to favor EPL just because the command set is simpler and it ends up getting the job done.

The EPL language consists of printer commands, one per line, in an ASCII text file. Each of the commands is described in the EPL Manual on Zebra’s site. For example, here’s a quick-and-dirty document that prints out some text on a 3″ x 1″ thermal label:

 
N
q609
Q203,26
A26,26,0,5,1,2,N,"HI, MOM!"
P1,1

Each command generally starts with a letter and is followed by comma-separated parameters, each of which is described in the manual. There are commands for drawing text, drawing lines, and drawing barcodes: the basic kind of stuff that you need to do in a warehouse environment.

OK. But what about images?

These printers come with a Windows printer driver that lets them work with Windows like any other GDI-based printer. You can print a Word document, the driver translates it into a bitmap, and then the driver dishes out the image to the printer.

Sometimes, though, you want to use the EPL language (after all, your printer has tons of barcode formatting built into it already, so you might as well use that instead of buying a third-party library) while also dishing out a bitmap (such as your company’s logo). You look in the handy-dandy manual and see that you need to use the GW command to send out an image, and you start by …

The GW Command

… scratching your head. Ugh, looks like we’ll have to do some math.

But not yet

First, though, let’s pound and poke the image that we want to draw into a format that we can work with. Since we’re dealing with a thermal printer, there is no concept of grayscale here: we either burn a dot (represented by bit 0) or don’t burn one (represented by bit 1). (If you’ve seen my post about printing images to an ESC/POS receipt printer, you’ll note that Epson’s convention is conveniently the direct opposite of this.)

There’s a good chance that the bitmap that we’re trying to draw is not monochrome. In a monochrome image, each pixel is either 100% black (burn a dot) or 100% white (don’t burn a dot). Our image is probably grayscale or even in color, so we need to figure out how to snap each pixel to either being pure black or pure white.

The way one figures out how to do this is to search for it on the Internet and hope that some graphics nerd has done this for you. And, happily, there’s apparently this thing called luma that will serve this purpose nicely:

/// <summary>
/// Gets a <see cref="BitmapData"/> instance for a given image.
/// </summary>
/// <param name="bytes">The image data.</param>
/// <returns>The <see cref="BitmapData"/> instance.</returns>
private static BitmapData GetBitmapData(byte[] bytes)
{
    using (var ms = new MemoryStream(bytes))
    using (var bitmap = (Bitmap)Bitmap.FromStream(ms))
    {
        var threshold = 127;
        var index = 0;
        var dimensions = bitmap.Width * bitmap.Height;
        var dots = new BitArray(dimensions);
 
        for (var y = 0; y < bitmap.Height; y++)
        {
            for (var x = 0; x < bitmap.Width; x++)
            {
                var color = bitmap.GetPixel(x, y);
                var luminance = (int)((color.R * 0.3) + (color.G * 0.59) + (color.B * 0.11));
                dots[index] = luminance < threshold;
                index++;
            }
        }
 
        return new BitmapData()
        {
            Dots = dots,
            Height = bitmap.Height,
            Width = bitmap.Width
        };
    }
}

Given our bitmap as an array of bytes, we load it into a GDI+ bitmap using .NET’s System.Drawing namespace. Then we apply our luminance formula to determine how “bright” each pixel is, snapping it into a binary value. Then, we return a BitmapData struct that just contains the properties that you see here: the bitmap height, the bitmap width, and the now-binary pixels of our image strung out in one long array.
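
That BitmapData struct is just a little container of my own, not the System.Drawing.Imaging.BitmapData class. A minimal sketch of it, assuming nothing more than the three properties used above (the real one is in the downloadable code), looks like this:

// A minimal sketch of the BitmapData container used above; not to be
// confused with System.Drawing.Imaging.BitmapData.
internal struct BitmapData
{
    // The monochrome pixels, row by row; true means "burn a dot" (black).
    public BitArray Dots { get; set; }

    // The height of the source bitmap, in pixels.
    public int Height { get; set; }

    // The width of the source bitmap, in pixels.
    public int Width { get; set; }
}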

That’s nice, but you’re avoiding the math

So now we need to generate the actual GW command. The code here is remarkably similar to that of the article I wrote about sending images to ESC/POS printers. Here we go:

/// <summary>
/// Inserts a GW command image.
/// </summary>
/// <param name="bw">The binary writer.</param>
/// <param name="top">The top location.</param>
/// <param name="left">The left location.</param>
/// <param name="image">The image bytes.</param>
private static void InsertImage(BinaryWriter bw, int top, int left, byte[] image)
{
    var encoding = Encoding.ASCII;
    var data = GetBitmapData(image);
    var dots = data.Dots;
    var bytes = (int)Math.Ceiling((double)data.Width / 8);
 
    bw.Write(encoding.GetBytes(string.Format("GW{0},{1},{2},{3},", top, left, bytes, data.Height)));
 
    var imageWidth = data.Width;
    var canvasWidth = bytes * 8;
 
    for (int y = 0; y < data.Height; ++y)
    {
        for (int x = 0; x < canvasWidth; )
        {
            byte s = 0;
 
            for (int b = 0; b < 8; ++b, ++x)
            {
                bool v = false;
 
                if (x < imageWidth)
                {
                    int i = (y * data.Width) + x;
                    v = data.Dots[i];
                }
 
                s |= (byte)((v ? 0 : 1) << (7 - b));
            }
 
            bw.Write(s);
        }
    }
 
    bw.WriteNewLine();
}

So what the hell is going on here?

The first thing of note is that we calculate the p3 parameter by converting our bitmap width (in pixels) into a number of bytes. Since each pixel is represented by one bit (a 1 or 0), each byte represents 8 pixels. That means the width of the data we send must be a multiple of 8 pixels. We handle this by using Math.Ceiling in the conversion, so if our bitmap width is not a multiple of 8, we just pad the remainder of the last byte with extra white space.

The second thing of note is that we calculate the p4 parameter. This is referred to in the documentation as the “print length” which is just a confusing way of asking “how many dots tall is it”. This is just the height of our bitmap.

Finally, we need to dump out our pixel data. I’m using .NET’s BinaryWriter class, so I have to write out data to the stream in bytes. The outer loop loops through each horizontal stripe of the bitmap, starting from the top. And the next loop draws each dot in that line, starting from the left. And the innermost loop fills up a byte, since we have to “gather” 8 pixels at once to write out as a byte to the BinaryWriter. There’s an extra if check there to account for the case where our bitmap image width is not a multiple of 8; if so, we need to make sure to pad the extra space instead of marching off to the next line in our dots array.

The s |= (byte)((v ? 0 : 1) << (7 - b)); line looks terrifying but is really just working to build up the byte. I discussed the mechanics of this in detail in my post about printing images to an ESC/POS receipt printer.

If you open the file in Notepad, you’ll see gibberish. That’s because the image data you just encoded isn’t going to map neatly onto ASCII characters, and this is where the design of the EPL language starts to break down. It’s nice that most of the language can be expressed in simple ASCII characters and edited in a text editor, but this isn’t one of those cases. If you open the file in a text editor and save it, you might “bake” the binary image data portion in the wrong encoding and end up with gibberish on your printer. Be sure to send the document directly to the printer as is, without loading it into a string or StringBuilder!
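
One more housekeeping note: the WriteNewLine() call in the InsertImage() code above is an extension method of my own on BinaryWriter, not something in the framework. It’s in the downloadable code, but a minimal sketch looks something like this (assuming a bare LF terminator, which my printers have been happy with):

// A minimal sketch of the WriteNewLine() extension used above; EPL commands
// are terminated by a newline character.
public static class BinaryWriterExtensions
{
    public static void WriteNewLine(this BinaryWriter bw)
    {
        bw.Write(Encoding.ASCII.GetBytes("\n"));
    }
}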

Putting it all together

In my case, I was sick of a certain provider constantly breaking their EPL label with every software update and instead wanted to dump their PNG version of the label (which they seem to actually test) directly to the printer on 4″ x 6″ stock. Given the bitmap as a byte[] array, here’s a quick-and-dirty function to dump it out:

/// <summary>
/// Converts an image, represented by the given binary payload, into an EPL document.
/// </summary>
/// <param name="source">The image to convert.</param>
/// <returns>The EPL document payload.</returns>
internal static byte[] AsEplImageDocument(this byte[] source)
{
    using (var ms = new MemoryStream())
    using (var bw = new BinaryWriter(ms, Encoding.ASCII))
    {
        // Clear out any bogus commands
        bw.WriteNewLine();
 
        // Start a new document
        bw.Write(Encoding.ASCII.GetBytes("N"));
        bw.WriteNewLine();
 
        // Label width is 4"
        bw.Write(Encoding.ASCII.GetBytes("q812"));
        bw.WriteNewLine();
 
        // Label height is 6" ... only important in ZB mode
        bw.Write(Encoding.ASCII.GetBytes("Q1218,20"));
        bw.WriteNewLine();
 
        // From earlier in the article
        InsertImage(bw, 0, 0, source);
 
        // Print one copy of the label
        bw.Write(Encoding.ASCII.GetBytes("P1,1\n"));
        bw.WriteNewLine();
 
        bw.Flush();
 
        return ms.ToArray();
    }    
}

You can dish the resulting document out directly to the printer using the RawPrinterHelper class from MSDN.
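
If you use that MSDN sample as-is, its SendBytesToPrinter overload takes an unmanaged pointer and a byte count, so you’ll need to marshal the byte array across yourself. Here’s a hedged sketch; the exact signature is assumed from the MSDN sample and may differ depending on which version of it you grab:

// A sketch of sending the raw EPL document to the printer via the MSDN
// RawPrinterHelper sample; its SendBytesToPrinter(name, pointer, count)
// signature is assumed from that sample and may differ in your copy.
public static void SendToPrinter(string printerName, byte[] document)
{
    var unmanaged = Marshal.AllocCoTaskMem(document.Length);

    try
    {
        Marshal.Copy(document, 0, unmanaged, document.Length);
        RawPrinterHelper.SendBytesToPrinter(printerName, unmanaged, document.Length);
    }
    finally
    {
        Marshal.FreeCoTaskMem(unmanaged);
    }
}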

Hope this helps someone!

A brief rant on credit card security

I wish that the Address Verification System (AVS) worked well, that it worked against shipping instead of billing addresses (i.e., cardholders could maintain a list of valid shipping addresses online with their bank), and that it was required to process all card-not-present transactions (because this might give banks an incentive to make AVS actually work well all of the time, once their cardholders start complaining that e-commerce stores are flagging all of their transactions, and give consumers an incentive to maintain updated address information at their banks).

To stop fraud, we need to stop allowing purchases to be made solely with the information that appears on the card because this is not secret information; it is basic information about an account. This is not something that merchants can fix alone. Merchants already have to foot the bill for chargebacks, and PCI standards are really just an attempt to make merchants responsible for maintaining the security of what is, at its core, a payment system that broke in the 1990s. CVN didn’t work–10 years later, we still have to explain to consumers where the number is because the card associations have not spent enough effort educating the public and, quite frankly, they don’t care. To use a credit card, anywhere, you should have to swipe it (PCI is still relevant in this scenario). Period.

Two-factor authentication, in the traditional sense, is not a solution. 3-D Secure is confusing for both merchants and consumers, expensive, and little used: for the program to work, it needs to be required. A better solution would be programs like Discover’s Secure Online Account Number (where you can generate a temporary account number), but they fucked that up by making the temporary account numbers expire on the same day as the real account number (to be effective, they should expire after a certain number of transactions or be able to be locked down to a specific merchant). And for such a program to work, this also needs to be required, too.

It’s time for consumers and banks to grow up and deal with credit card fraud, because the problems with it today are largely ones that merchants cannot fix and that no amount of obfuscating cardholder data can fix. Unfortunately, at the end of the day, it’s the merchant who usually ends up paying for the fraudulent charges when the dime finally drops, so we’ll continue to see little help from the card associations and the issuing banks: no one has the balls to make sweeping consumer- and bank-initiated security changes to the industry and require those changes for future transactions, unless they’re pointing their fingers at the merchants.

I guess we’ll just have to keep bending over and taking it–the responsibility, the consequences of fraud, and all of the blame when the system doesn’t work.

DKIM Signing Outbound Messages in Exchange Server 2007

DomainKeys Identified Mail is a proposed standard that, in its own words, allows a domain (such as skiviez.com) to assert responsibility for the contents of an e-mail message.

The outgoing mail transport agent (“mail server”) computes a hash of the body of the mail message and adds a special, cryptographically signed DKIM-Signature header to the mail message that contains this hash as well as some other information about the message. The public part of the key, used by receiving mail transport agents to verify the signature on incoming messages, is stored as a TXT record in the DNS for the signing domain.

The result of all of this is that the recipient can now know that the body and certain headers of the e-mail message were not tampered with or changed by a third party while the message was in transit. If the signature contained within the DKIM-Signature header doesn’t verify, then it’s possible that the message is a phishing message, spam, or some other false representation of members of the signing domain. (IMHO, however, it is more likely to just indicate a broken DKIM implementation, as there are several obnoxious corner cases that we will see.)

Where is the Exchange support?

No currently released version of Microsoft Exchange supports DKIM natively, not even 2010. This is purportedly because Microsoft threw its weight behind another standard called SPF, or Sender Policy Framework. SPF is much simpler to implement because it does not make a statement about the integrity of the message contents; that is, there is no signing step that requires processing each outbound message and stamping it with a special header. Instead, a TXT record in the DNS for the sending domain lists the IP addresses from which e-mail messages from that domain are allowed to originate.

For example, let’s say that a store has a mail transport agent at mail.example.com and a Web site at www.example.com. An SPF record for example.com might make a statement like the following:

“E-mail messages purporting to be from @example.com should only originate from either mail.example.com or www.example.com. All other originating IP addresses should be treated with suspicion.”
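
In actual SPF syntax, that policy boils down to a TXT record along these lines (a sketch for the hypothetical example.com; the a: mechanisms resolve those host names to their IP addresses):

example.com.    IN  TXT    "v=spf1 a:mail.example.com a:www.example.com -all"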

You can see why this is easier to implement; you just add an entry to DNS, and you don’t have to configure the outbound mail transport agent to do anything. The only reason a mail transport agent would need to change is for it to gain the ability to inspect incoming messages and validate the MAIL FROM part of the message headers against the corresponding SPF TXT records in DNS. And that is what Exchange does today.

While SPF does not validate the integrity of the message (a hacked, legitimate company mail server that spews out spam unwittingly will still pass an SPF check), it is a simple, convenient way to cast suspicion on phishing and spoofing attempts originating from other locations, like a random house in Wisconsin that is unwittingly part of a botnet.

But, then again, a hacked mail transport agent using DKIM that spews out spam unwittingly will still pass a DKIM verification. Spam can be signed just like regular mail. So the advantage of DKIM over SPF is that DKIM provides the ability to prove that the contents of the message were not modified in transit, and if an organization so chose, they could sign different messages with different keys or decide to not sign certain kinds of messages at all, allowing the recipient to interpret these different kinds of mail in different ways.

In reality, most e-mail doesn’t need to be that secure, so the utility-to-difficulty ratio may be a little out-of-whack here, especially when other standards for signing messages already exist.

So DKIM is more complicated and not supported by Microsoft. Why implement DKIM on Exchange?

Because fuck Yahoo!, that’s why.

Actually, let’s step back a minute.

Yahoo! invented something called Domain Keys, which predates DKIM and which DKIM is largely based on, in an attempt to combat spam in the ways described previously. The aim was for Yahoo! mail servers to be able to easily identify entire classes of suspect e-mail and send them to people’s spam folders (which, in Yahoo! weasel words, is referred to as the “bulk mail folder”). Then they integrated it into the Yahoo! mail system.

If your business sends mail of any kind of volume to Yahoo! mail servers, such as an e-mail newsletter that users subscribe to, you’ll quickly see a message along the lines of the following:

421 4.16.55 [TS01] Messages from x.x.x.x temporarily deferred due to excessive user complaints

And you might also notice your e-mail messages going straight to the spam folder on Yahoo! mail accounts by default, with a X-Yahoo-Filtered-Bulk: x.x.x.x header appearing in the message headers. If this has happened, it is because Yahoo! has decided that you are naughty and placed the IP address of your mail transport agent on an internal blacklist. If a Yahoo! user has previously marked e-mail from your domain as “Not Spam,” then that user will continue to receive your e-mail, but any other Yahoo! users will find your e-mail going to the spam folder by default until they either click the “Not Spam” button on one of them or add your from address to their contact list.

“But we’re a legitimate small business,” you say. “Only about 5,000 of the e-mails on our mailing list are actual Yahoo! addresses, the newsletter list is opt-in, and there are ‘Unsubscribe’ links in every e-mail. It’s not like we’re spamming users out of the blue,” you continue. “These are users who have asked for this service and can cancel at any time. So why has Yahoo! blacklisted us and treated us like spammers?”

The answer, unfortunately, is that Yahoo! is incompetent and can’t build a mail service that can handle spam like Gmail can handle spam. Instead, their e-mail system depends on the following metrics:

  • If a large burst of e-mail comes from the same IP address within a certain time period, temporarily treat it as spam and defer the requests. The mail will still arrive, but the “fat pipe” will be squashed down so that overall delivery takes longer to complete. The idea is that if the mail is originating from a spammer, they are unlikely to try again.
  • If a “significant” number of Yahoo! users mark mail messages originating from the same IP address as spam by clicking on the “Spam” button in their account, add that IP address to the X-Yahoo-Filtered-Bulk blacklist.

In other words, if enough people mark your e-mail as spam, your IP address is fucked, and you’ll soon be fielding complaints from customers who insist that they aren’t receiving your Web site’s transactional e-mails, such as a “forgot my password” or “shipping confirmation” e-mail, because Yahoo! has now decided to deliver all e-mails that originate from your IP address to the spam folder, which many users do not think to check or cannot find. So they blame you.

If you think that Yahoo! has blacklisted you in error, or if you have improved your e-mail sending practices and wish to have them consider you for removal, you can fill out this form (complete with its broken JavaScript required-field validation) to request that the Yahoo! postmaster take you off the naughty list. About 27 hours later, a Yahoo! representative with a sub-room temperature IQ will e-mail you back a long, canned reply that generally amounts to “No.”

But in that “no” message, they offer the possibility of signing up to the Complaint Feedback Loop. When you participate in this program, if a Yahoo! user presses the “Spam” button on one of your e-mails, then Yahoo! will e-mail you the e-mail address of that user along with a copy of the e-mail that you sent. You can then unsubscribe that user from your mailing list yourself (since they neglected to use the unsubscribe link in your e-mail and will continue to damage the reputation of your IP address if you continue to send mail to them), or you can see which kinds of e-mails you are sending are creating the most problems.

But participation in their feedback loop requires that you use DKIM to sign all of your outbound mail messages. The reasons for this are not clear, but I assume it has to do with ensuring that you are only notified about the problematic e-mails that legitimately came from your organization, and not ones generated by a spammer. Or, it could simply be a way to drive adoption of Yahoo!’s DKIM standard.

At any rate, Yahoo! mail is the most widely used e-mail service on the planet, so when the idiots in the higher echelons of Yahoo! say “Jump!”, we must ask “How high?” When they say “implement DKIM”, we implement DKIM.

So what are options for implementing DKIM on Exchange?

One option is to simply not do it in Exchange and set up a relaying mail server that has DKIM support, like hMailServer or Postfix with dkim-milter. But if you’re at a small business, the idea of maintaining yet another server for what is conceptually a simple task is not a pleasant thought.

Another option is to use a dedicated device like a Barracuda or an IronPort. This device would sit in front of Exchange and rewrite the mail headers in transit, adding the DKIM-Signature header as it flies out of your office. But these devices are not cheap and are out of reach of many small businesses. And the thought of acquiring a specialized device for doing something any mail transport agent should be able to do natively is not a pleasant thought.

You can buy an off-the-shelf plug-in for Exchange like this one from a company in Hong Kong. The reviews on the Internet do generally seem to indicate that it does work. But do we really want to spend $300 – $800 on a component from a company without a reputation and give that component read access to all of our organization’s outbound mail messages? Trust is certainly a driving factor here.

A fourth option, and the option I pursued since I considered giving up on DKIM if Yahoo! didn’t change its ways after our implementation, is to take advantage of Exchange 2007’s new Transport Agents functionality. This allows you to write custom, managed code that runs on the .NET Framework and integrates with the Exchange message processing pipeline. In our case, we could write a custom transport agent that appends a DKIM-Signature header to outgoing MIME messages.

Setting up the project

In Visual Studio, we just need to use a plain old “C# Class Library” project. The project must target version 2.0 of the CLR (that’s .NET Framework 2.0, .NET Framework 3.0, or .NET Framework 3.5, thanks to the dipshits in Microsoft’s marketing department) and not the .NET Framework 1.x or, maddeningly, the .NET Framework 4.x. This is because the transport agent process provided by Exchange Server 2007 that will load our transport agent is running on CLR 2.0, and a process that’s been loaded with an earlier version of the CLR can’t load a later version.

We also need to reference two assemblies provided by the Exchange Server, Microsoft.Exchange.Data.Common.dll and Microsoft.Exchange.Data.Transport.dll. You would think that you would be able to find these in, say, the Exchange Server SDK that’s available from Microsoft Downloads. But you’d be wrong. Microsoft keeps diddling with the version of these assemblies with various update rollups for Exchange, and the only way to get a copy is to pull them off an actual Exchange Server 2007 installation. Mine were located in C:\Program Files\Microsoft\Exchange Server\Public; I just copied them to my local computer and referenced them in my new class library assembly for local development.

To start, we just need to create two classes. The first is our actual agent, which derives from the RoutingAgent class defined in the DLLs we just copied:

public sealed class DkimSigningRoutingAgent : RoutingAgent
{
	public DkimSigningRoutingAgent()
	{
		// What "Categorized" means in this sense,
		// only the Exchange team knows
		this.OnCategorizedMessage += this.WhenMessageCategorized;
	}
 
	private void WhenMessageCategorized(
		CategorizedMessageEventSource source,
		QueuedMessageEventArgs e)
	{
		// This is where the magic will happen
	}
}

The second is a factory class that the Exchange server’s little plug-in agent looks for; it’ll instance new copies of our agent:

public sealed class DkimSigningRoutingAgentFactory : RoutingAgentFactory
{
	public override RoutingAgent CreateAgent(SmtpServer server)
	{
		return new DkimSigningRoutingAgent();
	}
}

You’re probably thinking, “Wow! The documentation provided by the Exchange team sure blows donkey chunks, how did you ever figure out that that was what you needed to do?” The answer is by spelunking through the Exchange Server SDK examples and by religiously following this guy’s blog.

Now we’re ready to start reading mail messages and “canonicalizing” them into the format that the DKIM spec expects.

Exchange and “Internet Mail”

If there is one thing that is annoying about Exchange, it is that 20-some years after the fact it seems skeptical that this whole “Internet Mail” thing is going to catch on. Getting Exchange to give you the raw message in MIME format is not as simple as one might think.

You see, to be able to hash the body of the message and sign certain headers in the message, we need to know exactly how Exchange is going to format it when it is sent. Even a single space or newline that is added after we compute the hash can throw the whole thing off.

In our routing agent’s OnCategorizedMessage event listener, the event arguments give us access to an instance of the MailItem class. It has a boatload of properties for accessing the body and headers of the message programmatically. Unfortunately, we can’t use these properties because they represent the semantic values, not the raw ones. Instead, we’ll need to use the GetMimeReadStream() and GetMimeWriteStream() methods to read the raw mail message and write out the modified version, respectively.

Implementing the routing agent

Let’s start by completing the Routing Agent implementation. We’ll keep it simple by moving all of the hard signing stuff into an IDkimSigner interface, which we’ll worry about implementing later:

public sealed class DkimSigningRoutingAgent : RoutingAgent
{
    private static ILog log = LogManager.GetLogger(
        MethodBase.GetCurrentMethod().DeclaringType);
 
    private IDkimSigner dkimSigner;
 
    public DkimSigningRoutingAgent(IDkimSigner dkimSigner)
    {
        if (dkimSigner == null)
        {
            throw new ArgumentNullException("dkimSigner");
        }
 
        this.dkimSigner = dkimSigner;
 
        this.OnCategorizedMessage += this.WhenMessageCategorized;
    }
 
    private void WhenMessageCategorized(
        CategorizedMessageEventSource source,
        QueuedMessageEventArgs e)
    {
        try
        {
            this.SignMailItem(e.MailItem);
        }
        catch (Exception ex)
        {
            log.Error(
                Resources.DkimSigningRoutingAgent_SignFailed,
                ex);
        }
    }
 
    private void SignMailItem(MailItem mailItem)
    {
        if (!mailItem.Message.IsSystemMessage &&
            mailItem.Message.TnefPart == null)
        {
            using (var inputStream = mailItem.GetMimeReadStream())
            {
                if (this.dkimSigner.CanSign(inputStream))
                {
                    using (var outputStream = mailItem.GetMimeWriteStream())
                    {
                        this.dkimSigner.Sign(inputStream, outputStream);
                    }
                }
            }
        }
    }
}

The only real quirk is the if statement in the SignMailItem() function, which I mostly discovered through trial and error. If the mail item is a “system message” (whatever that means), then all of the mailItem’s methods will be read-only (throwing exceptions if we try to mutate), so we shouldn’t even bother. And if the mail item has a TNEF part, then it’s in a bizarro proprietary Microsoft format, and the DKIM spec just isn’t going to work with that. Finally, if something blows up, we catch the exception and log it: better to send a message without a signature than not to send it at all.

Defining an interface for DKIM signing

So the next step is to make up that IDkimSigner implementation and make it do the dirty work. You can see that I’ve made it simple in that we only need to write two methods:

public interface IDkimSigner : IDisposable
{
    bool CanSign(Stream inputStream);
 
    void Sign(Stream inputStream, Stream outputStream);
}

A method for sanity checking

The first method will scan our mail item’s content stream and do a sanity check and ensure that we can actually sign the message. For example, if our IDkimSigner implementation is configured to sign messages originating from warehouse1.example.com and we pass CanSign() a message from warehouse2.example.com, then we can return false to indicate that we just don’t know what to do with the message. Let’s implement that method.

private string domain;
 
public bool CanSign(Stream inputStream)
{
    bool canSign;
    string line;
    StreamReader reader;
 
    if (this.disposed)
    {
        throw new ObjectDisposedException("DomainKeysSigner");
    }
 
    if (inputStream == null)
    {
        throw new ArgumentNullException("inputStream");
    }
 
    canSign = false;
    reader = new StreamReader(inputStream);
 
    inputStream.Seek(0, SeekOrigin.Begin);
 
    line = reader.ReadLine();
    while (line != null)
    {
        string header;
        string[] headerParts;
 
        // We've reached the end of the headers (headers are
        // separated from the body by a blank line).
        if (line.Length == 0)
        {
            break;
        }
 
        // Read a line. Because a header can be continued onto
        // subsequent lines, we have to keep reading lines until we
        // run into the end-of-headers marker (an empty line) or another
        // line that doesn't begin with a whitespace character.
        header = line + "\r\n";
        line = reader.ReadLine();
        while (!string.IsNullOrEmpty(line) &&
            (line.StartsWith("\t", StringComparison.Ordinal) ||
            line.StartsWith(" ", StringComparison.Ordinal)))
        {
            header += line + "\r\n";
            line = reader.ReadLine();
        }
 
        // Extract the name of the header. We keep scanning instead of
        // stopping at the first match because DKIM mandates that we
        // only sign the LAST instance of any header that occurs.
        headerParts = header.Split(new char[] { ':' }, 2);
        if (headerParts.Length == 2)
        {
            string headerName;
 
            headerName = headerParts[0];
 
            if (headerName.Equals("From", StringComparison.OrdinalIgnoreCase))
            {
                // We don't break here because we want to read the bottom-most
                // instance of the From: header (there should be only one, but
                // if there are multiple, it's the last one that matters).
                canSign = header
                    .ToUpperInvariant()
                    .Contains("@" + this.domain.ToUpperInvariant());
            }
        }
    }
 
    inputStream.Seek(0, SeekOrigin.Begin);
 
    return canSign;
}

Barf. But we have to do this style of ghetto parsing because, after all, we’re dealing with the raw e-mail message format. All we’re doing is scanning through the headers until we reach the last From: header, and then we make sure that the From: e-mail address belongs to the domain that our instance knows how to sign. Then we seek back to the beginning of the stream to be polite.

A method for signing

The second method that we have to implement is the one that actually does all of the dirty work. And in DKIM signing, we can break it down into five steps:

  1. Compute a hash of the body of the message.
  2. Create an unsigned version of the DKIM-Signature header that contains that body hash value and some other information, but has the signature component set to an empty string.
  3. “Canonicalize” the headers that we are going to sign. By “canonicalize”, we mean “standardize capitalization, whitespace, and newlines into a format required by the spec, since other mail transport agents who get their grubby paws on this message might reformat the headers”.
  4. Slap our unsigned version of the DKIM-Signature header to the end of our “canonicalized” headers, sign that data, and slap the resulting signature to the end of the DKIM-Signature header.
  5. Write this signed DKIM-Signature into the headers of the mail message, and send it on its merry way.

Divide and conquer!

Implementing the Sign() method

Our implementation for the Sign() method will tackle each step in turn:

public void Sign(Stream inputStream, Stream outputStream)
{
    if (this.disposed)
    {
        throw new ObjectDisposedException("DomainKeysSigner");
    }
 
    if (inputStream == null)
    {
        throw new ArgumentNullException("inputStream");
    }
 
    if (outputStream == null)
    {
        throw new ArgumentNullException("outputStream");
    }
 
    var bodyHash = this.GetBodyHash(inputStream);
    var unsignedDkimHeader = this.GetUnsignedDkimHeader(bodyHash);
    var canonicalizedHeaders = this.GetCanonicalizedHeaders(inputStream);
    var signedDkimHeader = this.GetSignedDkimHeader(unsignedDkimHeader, canonicalizedHeaders);
 
    WriteSignedMimeMessage(inputStream, outputStream, signedDkimHeader);
}

Computing the body hash

The first step, computing the hash of the body, is actually pretty easy. The only quirk is that the DKIM spec says that if the body ends with multiple empty lines, the body should be normalized to just one terminating newline for the purposes of computing the hash. The code is not exciting, and you can download it at the end of this article.
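
Still, here’s a minimal sketch of what that GetBodyHash() call does (the real version is in the download; this sketch assumes the “simple” body canonicalization described below and reuses the hashAlgorithmCryptoCode hash-name string that the signing code uses later):

private string GetBodyHash(Stream inputStream)
{
    inputStream.Seek(0, SeekOrigin.Begin);

    // Skip past the headers; the body starts after the first blank line.
    var reader = new StreamReader(inputStream);
    string line;
    while ((line = reader.ReadLine()) != null && line.Length > 0)
    {
    }

    // "Simple" body canonicalization: strip any trailing empty lines, then
    // make sure the body ends with exactly one CRLF.
    var body = reader.ReadToEnd().TrimEnd('\r', '\n') + "\r\n";

    using (var hasher = HashAlgorithm.Create(this.hashAlgorithmCryptoCode))
    {
        var hash = hasher.ComputeHash(Encoding.ASCII.GetBytes(body));

        inputStream.Seek(0, SeekOrigin.Begin);

        return Convert.ToBase64String(hash);
    }
}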

Creating the “unsigned” header

The next step is to create the “unsigned” DKIM-Signature header. This is where the DKIM spec is just weird. The DKIM-Signature header contains a lot of information in it, such as the selector, domain, and the hashing algorithm (SHA1 or SHA256) being used. Since that information is vital to ensuring the integrity of the signature, it’s important that that information be a part of the DKIM signature.

If I were designing this, I would append two headers to e-mail messages: a DKIM-Information header that contained all of the above information and is part of the data that is signed, and a DKIM-Signature header that contains just the signature data. But the DKIM spec makes use of only the one DKIM-Signature header, and for the purposes of signing, we treat the “signature part” of the header (b=) as an empty string:

private string GetUnsignedDkimHeader(string bodyHash)
{
    return string.Format(
        CultureInfo.InvariantCulture,
        "DKIM-Signature: v=1; a={0}; s={1}; d={2}; c=simple/simple; q=dns/txt; h={3}; bh={4}; b=;",
        this.hashAlgorithmDkimCode,
        this.selector,
        this.domain,
        string.Join(" : ", this.eligibleHeaders.OrderBy(x => x, StringComparer.Ordinal).ToArray()),
        bodyHash);
}

You can see here that I’ve got some instance variables that were set in our IDkimSigner implementation’s constructor, such as the hash algorithm to use, the selector, domain, headers to include in the signature, and so on. We also insert our recently-computed hash of the body here.

You can also see that I’m using “simple” body canonicalization and “simple” header canonicalization. The DKIM spec gives us a few options in determining how the message is represented for signing and verification purposes. For the “simple” body canonicalization, it means “exactly as written, except for the weird rule about multiple newlines at the end of the body”. For the “simple” header canonicalization, it means “exactly as written, whitespace, newlines, and everything”.

There is a “relaxed” canonicalization method, but it’s more work, since you have to munge the headers and body into a very particular format, and I didn’t feel like writing a MIME parser.
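
For a concrete idea of the difference, here is roughly what the two methods do to the same header: “simple” signs it byte for byte as written, while “relaxed” lowercases the header name, unfolds it, collapses the internal whitespace, and drops the whitespace around the colon:

Subject:   Hello,    World        <- as written, and what "simple" signs
subject:Hello, World              <- what "relaxed" would sign instead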

Extracting “canonicalized” headers

The third step is to get a list of the canonicalized headers. In the constructor, I accept a list of headers to sign: From, To, Message-ID, and so on. (From is always required to be signed.) Then I use parsing code similar to that used in the CanSign() method and build a list of the raw headers. The only real gotcha to watch out for is that headers can be wrapped onto more than one line, and since we’re using the “simple” canonicalization algorithm, we’ll need to preserve that whitespace and those newlines exactly as we extract them from the stream. Then I sort the headers alphabetically, since that’s the order I used in the GetUnsignedDkimHeader() method above.

Signing the message

The logic behind signing the message is not that difficult. We smash all of the canonicalized headers together, add our unsigned DKIM-Signature header to the end, and compute our signature on this. Then we append the signature to the b= element, previously empty, of our DKIM-Signature header:

private string GetSignedDkimHeader(
    string unsignedDkimHeader,
    IEnumerable<string> canonicalizedHeaders)
{
    byte[] signatureBytes;
    string signatureText;
    StringBuilder signedDkimHeader;
 
    using (var stream = new MemoryStream())
    {
        using (var writer = new StreamWriter(stream))
        {
            foreach (var canonicalizedHeader in canonicalizedHeaders)
            {
                writer.Write(canonicalizedHeader);
            }
 
            writer.Write(unsignedDkimHeader);
            writer.Flush();
 
            stream.Seek(0, SeekOrigin.Begin);
 
            signatureBytes = this.cryptoProvider.SignData(stream, this.hashAlgorithmCryptoCode);
        }
    }
 
    signatureText = Convert.ToBase64String(signatureBytes);
    signedDkimHeader = new StringBuilder(unsignedDkimHeader.Substring(0, unsignedDkimHeader.Length - 1));
 
    signedDkimHeader.Append(signatureText);
    signedDkimHeader.Append(";\r\n");
 
    return signedDkimHeader.ToString();
}

The only gotcha here, which I lost a few hours to, is a weird quirk of the .NET Framework 3.5 implementation of the SignData() function of the RSACryptoServiceProvider class. One of the overloads of the SignData() function accepts an instance of a HashAlgorithm to specify the kind of hash to use. The SHA-256 implementation was added in .NET 3.5 SP1, but it was done in such a way that an internal mapping used by the .NET crypto classes wasn’t updated until .NET 4.0 to recognize the new SHA256CryptoServiceProvider type. Some guy blogs about why this is, but what it essentially means is that if you pass a SHA256CryptoServiceProvider instance to the SignData() method on .NET 2.0/3.0/3.5/3.5SP1, you get an exception, and on .NET 4.0 you don’t. Since Exchange 2007 uses .NET 3.5 SP1, we have to use the recommended workaround of using the overload that accepts a string representation of the hash algorithm.

Writing out the message

The last step is to write out the message with our newly created DKIM-Signature header. This really is as simple as taking the output stream, writing the DKIM-Signature header, and then dumping in the entire contents of the input stream.
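
That’s roughly all the WriteSignedMimeMessage() call from the Sign() method above does. A sketch (the real version is in the download):

private static void WriteSignedMimeMessage(
    Stream inputStream,
    Stream outputStream,
    string signedDkimHeader)
{
    // Prepend the freshly signed DKIM-Signature header (it already ends
    // with a CRLF), then copy the original message through untouched.
    var headerBytes = Encoding.ASCII.GetBytes(signedDkimHeader);
    outputStream.Write(headerBytes, 0, headerBytes.Length);

    inputStream.Seek(0, SeekOrigin.Begin);

    var buffer = new byte[4096];
    int read;

    while ((read = inputStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        outputStream.Write(buffer, 0, read);
    }
}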

Getting a key to sign messages with

Let us take a brief interlude from our DKIM circles of hell and obtain a key with which we will actually sign the DKIM-Signature header we’ve worked so hard to create.

We need to generate an RSA public/private key pair: a public key to store in DNS in the format required by the DKIM spec, and a private key to actually sign the messages with. The nice folks over at Port25 have a DKIM wizard that does exactly that.

It’s smashingly simple–just enter your domain name (say, “example.com“), a “selector” (say, “key1“), and select a key size (bigger is better, right?). The “selector” is a part of the DKIM spec that allows a single signing domain to use multiple keys. For example, you could use a key with selector name “newsletters” to sign all of the crap newsletter e-mails that you send out, and another key with selector name “tx” to sign all of the transactional e-mails that you send out.

It then spits out the syntax of the TXT records that you need to add to DNS for that selector:

key1._domainkey.example.com IN  TXT     "k=rsa\; p={BIG HONKING PUBLIC KEY HERE}"
_domainkey.example.com      IN  TXT     "t=y; o=~;"

The first record is where the public part of the key is stored. Whenever a mail transport agent sees one of our DKIM-Signature headers with a selector of key1, it’ll know to go hunting in DNS for a TXT record named key1._domainkey.example.com and pull the public key for verification from there.
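
Once the record has propagated, a quick way to sanity-check that it’s published is to query it yourself (using the hypothetical selector and domain from above):

nslookup -type=TXT key1._domainkey.example.com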

The second record is part of the older DomainKeys specification and it is not strictly necessary. As written here, it means that we’re in testing mode (“t=y”)–that is, don’t freak out if you see a bad signature because we’re still dicking around with the setup of our implementation–and that not all messages originating from this domain will be signed (“o=~”)–maybe we won’t bother signing our newsletter e-mails, for example.

We’ll also have the private key specified in a format similar to the one below:

-----BEGIN RSA PRIVATE KEY-----
MIIBOwIBAAJBANXBbZybdmjKDTONFVqAWXmGzR6GSZX5LV3OF//1jRz7dzGWTCKK
jembqBxqhr0Y2ua2l4D4EZi6FwDmdqgLS6MCAwEAAQJAD4qhypovEM1oClB+tfbR
Cpn3ffmrjgDxAHoEmrKi0PGBn8fumW22bad2tmrAjWWTVmeXJvQyEy1awq0M2PMR
0QIhAPEnqivb5dKZbTeKhiF4c6IUHfwEq8wNf2LWZvdH3ROrAiEA4un604mDss4Q
qAVEx686pUttfWyJrYkcZ/tx7kOoL+kCICEysqyDAypw0KY6vahR6qk/V7lf8z6O
BSFYHqigDgEtAiEAsK9r5UcQSyv1AD+J/MpOqeJ/kMfwtDUs7zJ01gfMb/ECIQDg
8d/XVJDi4Cqbt4wfcHZxADAgqyK8Z5M69fBecnExVg==
-----END RSA PRIVATE KEY-----

One thing I have glossed over in the code discussion until now is how the this.cryptoProvider instance that actually computes the signature got created.

We’ll need to read this key and load it into the cryptography classes used by the .NET Framework and by Windows to actually sign mail messages and get that this.cryptoProvider instance. Surely there is a simple API for this, yes?

Instancing a CryptoProvider

One problem is that the documentation in MSDN for the CryptoAPI is bad. I say “bad” because it certainly seems like .NET and Windows don’t expose native support for processing a PEM-encoded key, and if it does, well, I couldn’t find the documentation for it. Instead, the RSACryptoServiceProvider prefers to store its keys in an XML format that nothing else in the world seems to use.

This means that our implementation is so close to being finished that we can almost taste it, but now we have to complete a side quest to actually read our damn key and get an instance of the RSACryptoServiceProvider. Or, we could generate a certificate ourselves and store it in the Certificates MMC snap-in, but why should we have to do that? I’d rather just plop the damn key in the application configuration file like the rest of the goddamned world does it, “secure container” my ass.

We can thank the moon and the stars that some guy has written a PEM reader for us. How does it work? I have no idea, but I tested it on several keys and it seemed to work fine, which is good enough for me. I tossed this code into a static CryptHelper class, and now getting an instance of the RSACryptoServiceProvider is as simple as

this.cryptoProvider = CryptHelper
     .GetProviderFromPemEncodedRsaPrivateKey(encodedKey);

Loading the routing agent into Exchange Server 2007

I took all of this code and then added boring administrative stuff like logging and moving some hardcoded values (such as the PEM-encoded key, the selector, and the domain) into the usual .NET App.config file mechanism.

Installing the agent on the Exchange Server is surprisingly simple. After compiling the project and futzing with the configuration file, we just copy the DLLs and configuration file to a folder on the Exchange Server, say, C:\Program Files\Skiviez\Wolverine\DKIM Signer for Exchange\.

Then we launch the Exchange Management Shell (remember to right-click it and “Run as Administrator”) and execute a command to tell Exchange to actually register our agent:

Install-TransportAgent
     -Name "DKIM Signer for Exchange"
     -TransportAgentFactory "Skiviez.Wolverine.Exchange.DkimSigner.DkimSigningRoutingAgentFactory"
     -AssemblyPath "C:\Program Files\Skiviez\Wolverine\DKIM Signer for Exchange\Skiviez.Wolverine.Exchange.DkimSigner.dll"

followed by

Enable-TransportAgent -Name "DKIM Signer for Exchange"

Interestingly, there will be a note telling you to close the Powershell window. It is not kidding. For some reason, the Install-TransportAgent cmdlet will keep a file handle open on our DLL, preventing Exchange from actually loading it until we close the Powershell window.

To make it actually work, we need to restart the Microsoft Exchange Transport service. I’ve found that restarting the Microsoft Exchange Mail Submission service right after that is a good idea; otherwise, there can be a short delay of about 15 minutes before people’s Outlooks attempt to send outbound mail again.
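
From that same elevated Exchange Management Shell, the restarts look something like this (those are the service names as they appear on my Exchange 2007 box; double-check yours with Get-Service):

Restart-Service MSExchangeTransport
Restart-Service MSExchangeMailSubmission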

Testing the implementation

To make sure things are actually working, Port25 comes to the rescue with their verification tool. You just send an e-mail to check-auth@verifier.port25.com and within a few minutes, they’ll send you an e-mail back with a boatload of debugging information. If it’s all good, you’ll see a result like the following:

Summary of Results
SPF check:          pass
DomainKeys check:   neutral
DKIM check:         pass
Sender-ID check:    pass
SpamAssassin check: ham

(What you’re looking for is the “pass” next to the DKIM check. The DomainKeys part being neutral is OK, since DomainKeys is the older standard and we’re choosing not to implement it.)

Conclusions and Delusions

I’ve been using this code for a few weeks now and it seems to work fine; the messages that I’ve sent through the server to Port25 and my Yahoo! test account all end up showing the DKIM check passing. The usual “it works on my machine!” disclaimers apply, however, as I’m sure there are myriad configuration differences in Exchange that could cause this not to work. Bug fixes are welcome, but don’t come crying if it sends all of your e-mail to that big junk folder in the sky.

And thanks to some blowhards at Yahoo!, the world now has a public domain implementation of DKIM signing for Exchange to play with.

And in case you’re curious: after doing all this work to set up DKIM and participating in the Complaint Feedback Loop at their suggestion, their answer is still “no,” without elaboration. When Yahoo! finally goes under, I won’t be the one shedding a nostalgic tear.

There are some unit tests, but they do have our private key in them, and I couldn’t be bothered to siphon those out. The code below is just the bits that do the actual signing.

Download the code used in this article.

Inside Skiviez: Mailing Addresses

Skiviez ships to about two dozen countries around the world. This means that our e-commerce system has to be able to accept and deal with international addresses.

For a long time, Skiviez dealt with this problem in the way that I would say the vast majority of US-based e-commerce retailers dealt with it: they added a country dropdown, turned the State field into a textbox, and made the State field no longer required. And if you are not obsessive-compulsive like I am, then this works reasonably well: international customers have to shoehorn their address into the format presented, but the data can still get in there.

However, after 10 years in operation, this approach was beginning to show some deficiencies:

Magnetic attraction. There is, for lack of a better term, a magnetic attraction between users and text boxes. Even if the field is not required, users have a tendency to enter “N/A”, “.”, “?”, “none”, or some other placeholder value in fields that do not apply to them.

Address quality problems. With the above method, the bumpers are off–a user could type “cheese” into the postal code field, and we’d just have to accept it on their word that “cheese” is a valid postal code for the United Kingdom because the site wasn’t doing any sort of validation except for United States-based addresses. Shipping a parcel to another country only to have it returned or destroyed as a result of an invalid or insufficient address is an expensive issue.

Integration issues. To actually ship out international packages, we have to deal with third-party carrier APIs (such as Endicia, FedEx, and UPS) to purchase postage and print shipping labels. Some of these APIs do perform more advanced validation on international addresses. This could cause problems because a poorly formatted address would make it nearly all of the way through our fulfillment process (order acceptance, picking, processing, and packaging) only to blow up with an error at the shipping desk. It’d be nice if we could catch problems earlier to avoid disrupting the shipping flow and eliminate the potential for misrouting the package.

A smarter system

With the October 2010 release of the Skiviez Web site (which was a pretty huge update), the Web site is now much, much smarter in how it asks for address information. For example, here’s the default address form for the United States, which is about what you’d expect:

But if you choose, say, Ireland in the countries drop down, the page refreshes and displays a new form:

The state drop down is gone (since Ireland doesn’t have states), the ZIP/postal code text box is gone (since Ireland does not use postal codes), and a new County text box is available.

Once an address is entered, it is formatted in the way a person native to that country would expect. For example, in a German address, the postal code comes before the city:

Taking it even further, if you select, say, Finland from the drop down, you get a form back that looks like this:

The postal code validates against the postal code format used in Finland, and that field appears before the Municipality field, since the convention throughout most of the EU is to write “{Postal Code} {City}” instead of the American “{City} {State} {Postal Code}” way.

Clearly, we’ve made a lot of improvements to address handling at Skiviez, and this simple input format actually has a lot of complexity behind it. So how does it all work?

Tweaking the existing database schema

At Skiviez, we were clearly working from a legacy database schema. The customer addresses table has been sitting there for 10 years, and we’re not about to change it any time soon. And while there is some internal cruft, the fields in the database more or less looked like the following:

This is a normal, relational schema. So how do we enhance this table so that it can store addresses from the dozens of countries that we ship to? That sounds like a tall order: surely the addresses of the world are all wildly different from one another, and we’ll have to go entity-attribute-value, throwing away the relational bit of our relational data store, to be able to accurately store this diverse information, right?

Well, it turns out that you can take these existing fields, relax away any NOT NULL constraints, and rename them to field names that are operating at a higher level of abstraction. (We didn’t actually rename the fields in the database, as this would have broken old code, but the new field names do exist in the object model, described later.)

In other words, you can shoehorn most of the addresses in the world (most importantly, the ones that Skiviez ships to), by just thinking of the existing fields in these terms:

Nothing has really changed. We’re just making the concession that it’s OK that the meaning of the data within these fields will change slightly depending on the country we’re talking about. In the United States, the “AdministrativeArea” is the state, territory, or military post office. In Ireland, the “AdministrativeArea” is the county. And in Finland, the “AdministrativeArea” is NULL because it simply isn’t used at all.
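
To make that concrete, here’s a sketch of how the generalized names sit on top of the legacy columns. AdministrativeArea and PrimaryStreetLine come from the actual model; the legacy column names in the comments and the remaining property names are illustrative guesses:

// A sketch of the generalized field names layered over the legacy columns.
// The "was ..." column names in the comments are illustrative, not the real schema.
public sealed class Address
{
    public string PrimaryStreetLine { get; private set; }    // was something like Address1
    public string SecondaryStreetLine { get; private set; }  // was something like Address2
    public string Locality { get; private set; }             // was City
    public string AdministrativeArea { get; private set; }   // was State; a county in Ireland, null in Finland
    public string PostalCode { get; private set; }           // was Zip; null where the country doesn't use one
    public string CountryCode { get; private set; }          // was Country
}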

Creating a sane object model

There is something wrong with the database schema, though, and for the sake of preserving legacy code and data, I worked around it in the new object model that is persisted to this schema. The problem is as follows:

A mailing address is not the same as a contact.

Frequently while coding over the years, I found myself wanting to deal with a physical address when I didn’t have any related contact information. That is, I had a group of data for 123 Main St; Richmond, VA 23221; United States, but I didn’t know who lived there or what the phone number was. The only object I had in my model was an Address object that was a 1-1 mapping with the old database schema, so this meant having Address instances with a null name and a null phone number.

A more natural model is to split the fields into three objects and create a fourth object that is composed of all of them:

So instead of one big honking Address object, we have

  • an Address object that just contains physical address information;
  • a Country object (not shown in the above diagram) that contains information necessary to format and validate an address (more on this later);
  • a PersonalName object that binds all of the name fields together;
  • a Contact object that groups together a PersonalName, an optional company name string, and a PhoneNumberBase object; and
  • a MailingAddress object that pairs a Contact with an Address.

By splitting the old Address object into these individual components, I can now freely re-use these components elsewhere in the application. For example, the PersonalName object also gets used in the Customer object. The return form generator expects a MailingAddress object, but a shipping rate calculator can deal with an Address object directly, since the latter has no need for the Contact information. In other words, the object model went from coarse-grained to fine-grained.
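
For illustration, here is a minimal sketch of the composition, assuming constructors that just take the two halves (the null checks and member names are my own, not the actual implementation):

public class MailingAddress
{
    private readonly Contact contact;
    private readonly Address address;

    public MailingAddress(Contact contact, Address address)
    {
        // Both halves are required; a MailingAddress is just the pairing.
        if (contact == null) { throw new ArgumentNullException("contact"); }
        if (address == null) { throw new ArgumentNullException("address"); }

        this.contact = contact;
        this.address = address;
    }

    public Contact Contact { get { return this.contact; } }
    public Address Address { get { return this.address; } }
}

(Note the readonly fields: they hint at the next point.)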

But perhaps the most important aspect of this model is that each of the four above classes is immutable.

Why are immutable addressing objects better?

When an object is immutable, it can’t be altered after it is first instantiated. In other words, everything about the object is permanently set at construction time; its instance methods won’t mutate the object, and its properties don’t have public setters. In domain-driven design, this is often referred to as a value object, but I don’t like that terminology as it can cause some confusion with value-vs-reference types in the .NET Framework.

This means that if I have an instance of an Address and want to edit the PostalCode property, then I will need to create a brand new instance of an Address, copy most of the properties over, and set my new postal code at construction time.

There are a few advantages to this approach:

  • The instance can validate itself once at construction time. This can be a big win if validation is particularly complex and dependent on the values of other properties (which is exactly what happens in an address, since the country determines whether or not a field is required and what format it needs to be in). If anything else is referencing that Address instance, then it can assume that that instance won’t get put into an invalid state. For example, a SalesOrder has a ShippingAddress property that takes a MailingAddress instance. The SalesOrder doesn’t have to worry about someone coming along and saying ShippingAddress.PostalCode = null at some point, bypassing any validation checks. The whole ShippingAddress instance would have to be replaced, which means building a new Address instance, which means running the validation logic built into the address builder.
  • The hash code of the instance is consistent throughout the object’s lifetime. This is useful for modeling an address property as a component in NHibernate, which I won’t get into here.

But a big disadvantage to this approach is that it can be tedious to deal with in code. At face value, this means that our Address object would need a gigantic constructor or an IFreezable-style implementation. A cleaner way is to use the builder pattern, which lets us set the fields in any order and validate the Address in one shot:

var country = CountryMother.UnitedStates();
 
return new Address.Builder()
    .WithPrimaryStreetLine("3005 SOME ST")
    .WithMunicipality("RICHMOND")
    .WithAdministrativeArea("VIRGINIA")
    .WithPostalCode("23221")
    .WithCountry(country)
    .Build(); // exception thrown here if validation errors

If we want to get a list of detailed validation problems instead of throwing an exception, the builder provides a mechanism for that:

var country = CountryMother.UnitedStates();
var builder = new Address.Builder()
    .WithPrimaryStreetLine("3005 SOME ST")
    .WithMunicipality("RICHMOND")
    .WithAdministrativeArea("VIRGINIA")
    .WithPostalCode("23221")
    .WithCountry(country);
var results = builder.Validate();
 
if (results.Any(x => !x.Success))
{
    foreach (var error in results.Where(x => !x.Success))
    {
        Console.WriteLine(
            "Field {0}, Error {1}",
            error.Field,
            error.Message);
    }
}
else
{
    var instance = builder.Build();
}

This allows the user interface to get at detailed error messages pretty easily. But how does that call to Validate() work? How does the builder really know what format an address should be in?

Validation

We need to define a way for a Country to know how its mailing addresses need to be formatted. Since we have all countries of the world in our database, but we don’t ship to all countries in the world, I created a nullable property on the Country object called Scheme. The Scheme is an instance of the AddressScheme class and it’s persisted to the database along with the country.

If we try to build an address without a country specified or with a country that doesn’t have an AddressScheme, then we throw an exception saying that we just can’t do that. Otherwise, our Validate() method will run through a list of the AddressFieldRules that are contained within the AddressScheme instance, calling the Validate(string) method on the rule for a particular field.
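
To make that concrete, here is a sketch of what the AddressScheme might look like. The dictionary-backed shape and member bodies are guesses based on how the scheme is called from the builder later in this post; only HasRuleFor() and ValidateField() are taken from the actual code.

public class AddressScheme
{
    // Rules keyed by the field that they describe.
    private readonly IDictionary<AddressField, AddressFieldRule> rules =
        new Dictionary<AddressField, AddressFieldRule>();

    public virtual bool HasRuleFor(AddressField field)
    {
        return this.rules.ContainsKey(field);
    }

    public virtual AddressFieldRuleValidationResult ValidateField(AddressField field, string value)
    {
        // Callers are expected to check HasRuleFor() first; the scheme
        // just delegates to the rule for that field.
        return this.rules[field].Validate(value);
    }
}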

To be able to easily persist these rules in the database, I created an enum that parallels the generic properties of the Address class that we defined earlier:

public enum AddressField : int
{
    Unknown = 0,
    AdministrativeArea = 1,
    Country = 2,
    Municipality = 3,
    PostalCode = 4,
    PrimaryStreetLine = 5,
    SecondaryStreetLine = 6,
    SubAdministrativeArea = 7
}

Internally, the Address object maintains its state via a Hashtable of fields, where the key is the AddressField enumerated value and the value is an object, usually string but sometimes a Country. The properties of the Address object are really just strongly-typed convenience mappings onto that Hashtable as in the following example:

public virtual string AdministrativeArea
{
	get
	{
		return this[AddressField.AdministrativeArea] as string;
	}
 
	private set
	{
		this[AddressField.AdministrativeArea] = value;
	}
}

The implementation of the AddressFieldRule is fairly straightforward. I provide the capability to validate against a regular expression (such as might happen with a postal code field), to simply consider the field required (such as might happen with a city field), and to ensure that the value is a member of a predefined list of values (such as might happen with a province field). The Validate(string) method figures out which rules to apply based on the properties that are set and returns an AddressFieldRuleValidationResult that describes what went right and what went wrong.

/// <summary>
/// Describes a requirement for an address field for a particular address.
/// </summary>
public class AddressFieldRule
{
    /// <summary>
    /// The list of allowed values; this may be null if no particular set of
    /// values is required.
    /// </summary>
    private IEnumerable<AddressFieldAllowedValue> allowedValues;
 
    /// <summary>
    /// The description of the field and its format.
    /// </summary>
    private string description;
 
    /// <summary>
    /// The address field.
    /// </summary>
    private AddressField field;
 
    /// <summary>
    /// Whether or not the field is required.
    /// </summary>
    private bool isRequired;
 
    /// <summary>
    /// The localized name.
    /// </summary>
    private string name;
 
    /// <summary>
    /// An optional validation regular expression template.
    /// </summary>
    private string validationRegex;
 
    /// <summary>
    /// Initializes a new instance of the <see cref="AddressFieldRule"/> class.
    /// </summary>
    /// <param name="field">The field.</param>
    /// <param name="isRequired">if set to <c>true</c> [is required].</param>
    /// <param name="name">English, localized name for the field. Required.</param>
    /// <param name="description">The english description.</param>
    /// <param name="validationRegex">The validation regex. Optional.</param>
    /// <param name="allowedValues">The allowed values. Optional; if null, then
    /// no particular set of values is required.</param>
    /// <exception cref="T:System.ArgumentNullException">if the englishName is null
    /// or empty</exception>
    /// <exception cref="T:System.ArgumentOutOfRangeException">if the address field
    /// is not in a valid range</exception>
    public AddressFieldRule(
        AddressField field,
        bool isRequired,
        string name,
        string description,
        string validationRegex,
        IEnumerable<AddressFieldAllowedValue> allowedValues)
    {
        if (string.IsNullOrEmpty(name))
        {
            throw new ArgumentNullException("name");
        }
 
        if (!Enum.IsDefined(typeof(AddressField), field))
        {
            throw new ArgumentOutOfRangeException("field");
        }
 
        if (!string.IsNullOrEmpty(validationRegex) &&
            allowedValues != null &&
            allowedValues.Count() > 0)
        {
            throw new ArgumentException(
                "The ValidationRegex and the AllowedValues options are mutually exclusive.");
        }
 
        this.field = field;
        this.description = description;
        this.isRequired = isRequired;
        this.name = name;
        this.allowedValues = allowedValues ?? new List<AddressFieldAllowedValue>();
        this.validationRegex = validationRegex;
    }
 
    /// <summary>
    /// Initializes a new instance of the <see cref="AddressFieldRule"/> class.
    /// </summary>
    protected AddressFieldRule()
    {
    }
 
    /// <summary>
    /// Gets the allowed values. If this collection is not empty, then the
    /// address field value should match the Preferred property of one of its
    /// members. The Alternative property, if set, is an alternative name that
    /// could be displayed in a drop-down to the user.
    /// </summary>
    public virtual IEnumerable<AddressFieldAllowedValue> AllowedValues
    {
        get
        {
            return this.allowedValues;
        }
 
        private set
        {
            this.allowedValues = value;
        }
    }
 
    /// <summary>
    /// Gets a human-readable string that describes the required format for the
    /// field. It may give an example, such as "JYX 938".
    /// </summary>
    public virtual string Description
    {
        get { return this.description; }
        private set { this.description = value; }
    }
 
    /// <summary>
    /// Gets the field that this metadata describes.
    /// </summary>
    public virtual AddressField Field
    {
        get { return this.field; }
        private set { this.field = value; }
    }
 
    /// <summary>
    /// Gets a value indicating whether or not the field must be
    /// included in the address.
    /// </summary>
    public virtual bool IsRequired
    {
        get
        {
            return this.isRequired;
        }
 
        private set
        {
            this.isRequired = value;
        }
    }
 
    /// <summary>
    /// Gets an English, localized name for the field. For example, the
    /// AdministrativeArea field has a name of "Province or Territory" for
    /// Canadian addresses.
    /// </summary>
    public virtual string Name
    {
        get { return this.name; }
        private set { this.name = value; }
    }
 
    /// <summary>
    /// Gets an optional validation regular expression template, which may
    /// be null.
    /// </summary>
    public virtual string ValidationRegex
    {
        get
        {
            return this.validationRegex;
        }
 
        private set
        {
            this.validationRegex = value;
        }
    }
 
    /// <summary>
    /// Gets or sets the data store identifier for the rule.
    /// </summary>
    private int? Identity
    {
        get;
        set;
    }
 
    /// <summary>
    /// Validates the value.
    /// </summary>
    /// <param name="value">The value to validate.</param>
    /// <returns>The validation result.</returns>
    public virtual AddressFieldRuleValidationResult Validate(string value)
    {
        if ((value == null || value.Trim().Length == 0) && this.IsRequired)
        {
            var message = string.Format(
                CultureInfo.CurrentCulture,
                "The field '{0}' is required, but the value was empty or blank.",
                this.Name);
 
            return new AddressFieldRuleValidationResult()
            {
                Field = this.Field,
                Original = value,
                Message = message
            };
        }
 
        if (value != null)
        {
            var input = value.Trim();
 
            if (input.Length > 0)
            {
                if (!string.IsNullOrEmpty(this.ValidationRegex) &&
                    !Regex.IsMatch(input, this.ValidationRegex))
                {
                    var message = string.Format(
                        CultureInfo.CurrentCulture,
                        "The field '{0}' has an unexpected or incorrectly formatted value.",
                        this.Name);
 
                    return new AddressFieldRuleValidationResult()
                    {
                        Field = this.Field,
                        Original = value,
                        Message = message
                    };
                }
 
                if (this.AllowedValues.Count() > 0)
                {
                    string preferred;
 
                    // If an alternative value is detected, select the preferred.
                    var alternativeAllowedValue =
                        (from av in this.AllowedValues
                            where !string.IsNullOrEmpty(av.Alternative) &&
                            av.Alternative.Equals(input, StringComparison.OrdinalIgnoreCase)
                            select av).SingleOrDefault();
                    if (alternativeAllowedValue != null)
                    {
                        // We matched an alternative value, so let's use the
                        // corresponding primary value
                        preferred = alternativeAllowedValue.Preferred;
 
                        return new AddressFieldRuleValidationResult()
                        {
                            Field = this.Field,
                            Original = value,
                            Preferred = preferred,
                            Success = true
                        };
                    }
                    else
                    {
                        // Search on the primary value, then.
                        var primaryAllowedValue =
                            (from av in this.AllowedValues
                                where av.Preferred.Equals(input, StringComparison.OrdinalIgnoreCase)
                                select av).SingleOrDefault();
                        if (primaryAllowedValue != null)
                        {
                            // We matched the primary value, let's re-assign
                            // it to get the correct casing
                            preferred = primaryAllowedValue.Preferred;
 
                            return new AddressFieldRuleValidationResult()
                            {
                                Field = this.Field,
                                Original = value,
                                Preferred = preferred,
                                Success = true
                            };
                        }
                        else
                        {
                            var message = string.Format(
                                CultureInfo.CurrentCulture,
                                "The field '{0}' was not in the list of expected values.",
                                this.Name);
 
                            return new AddressFieldRuleValidationResult()
                            {
                                Field = this.Field,
                                Original = value,
                                Message = message
                            };
                        }
                    }
                }
            }
        }
 
        return new AddressFieldRuleValidationResult()
        {
            Field = this.Field,
            Original = value,
            Preferred = value,
            Success = true
        };
    }
}

(In the case of a state or province field, where we ensure that the value belongs to a predefined list of AddressFieldAllowedValues, we also let the rule tell the consumer which value is “preferred.” For example, a user might enter “Virginia”, but the system will spit back “VA” as the preferred formatting for that value.)
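
For example, a hypothetical rule for the US state field might look like the snippet below. The AddressFieldRule constructor matches the class above, but the AddressFieldAllowedValue constructor shown here is an assumption; only its Preferred and Alternative properties appear in the real code.

var stateRule = new AddressFieldRule(
    AddressField.AdministrativeArea,
    true,                            // required
    "State",
    "The two-letter state abbreviation, such as \"VA\".",
    null,                            // no regex; mutually exclusive with allowed values
    new List<AddressFieldAllowedValue>
    {
        new AddressFieldAllowedValue("VA", "Virginia")
    });

var result = stateRule.Validate("Virginia");
// result.Success == true; result.Preferred == "VA"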

As a result, that Validate() method on the builder doesn’t have to do much heavy lifting:

public IEnumerable<AddressFieldRuleValidationResult> Validate()
{
    var failures = new List<AddressFieldRuleValidationResult>();
 
    if (this.address.Country == null)
    {
        var result = new AddressFieldRuleValidationResult()
        {
            Field = AddressField.Country,
            Message = "The country is required."
        };
 
        failures.Add(result);
    }
    else if (this.address.Country.AddressScheme == null)
    {
        var result = new AddressFieldRuleValidationResult()
        {
            Field = AddressField.Country,
            Message = "The country must have an address scheme."
        };
 
        failures.Add(result);
    }
    else
    {
        var scheme = this.address.Country.AddressScheme;
 
        foreach (AddressField field in new ArrayList(this.address.fields.Keys))
        {
            if (field != AddressField.Country &&
                field != AddressField.Unknown)
            {
                // If the field is not Country or Unknown, let's
                // see if the scheme has a rule defined for it.
                if (scheme.HasRuleFor(field))
                {
                    // Since a rule is defined, let's validate the field.
                    var result = scheme.ValidateField(field, this.address[field] as string);
                    if (result.Success)
                    {
                        this.address[field] = result.Preferred;
                    }
                    else
                    {
                        failures.Add(result);
                    }
                }
                else
                {
                    // No rule is defined, so strip the data out of the
                    // address.
                    this.address[field] = null;
                }
            }
        }
    }
 
    return failures;
}

Where do all of these address rules come from? There is nothing more valuable than Frank’s Compulsive Guide to Postal Addresses.

Formatting

Another important property on the AddressScheme is the Formatter. This is an object that knows how to format an address for a particular country. The ToString() implementation on the Address can then just call this.Country.Formatter.Format("M", this) to get a string back that contains the correctly formatted address, where “M” is our IFormattable code for an address format that includes newlines.

My AddressFormatter object works by taking a template string that is loaded from the database. The template format is pretty simple: it’s a combination of the AddressField enumerated values (delimited by braces) and the semicolon (which indicates separate lines of the address). For example, the template for a German address is specified as:

{PrimaryStreetLine};{SecondaryStreetLine};DE-{PostalCode}  {Municipality};{Country}

Given an “S” format code (for “single-line output”), we’ll substitute the values and leave the semicolons. Given an “M” format code (for “mailing output”), we’ll substitute the values and swap the semicolons with newlines. We also remove any contiguous separators that might happen, such as when the SecondaryStreetLine is blank, to avoid having a blank line in the address.
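
Here is a simplified sketch of that substitution logic. The real AddressFormatter is loaded from the database and implements the IFormattable-style format codes, so treat this as an illustration; the method name and the fieldValue delegate (which stands in for looking up a value on the Address) are my own.

public static string FormatAddress(string template, Func<string, string> fieldValue, string formatCode)
{
    // Replace each {FieldName} token with the corresponding address value.
    var expanded = Regex.Replace(
        template,
        @"\{(\w+)\}",
        m => fieldValue(m.Groups[1].Value) ?? string.Empty);

    // Drop empty segments so a blank SecondaryStreetLine doesn't produce
    // a blank line or a dangling separator.
    var segments = expanded
        .Split(';')
        .Select(s => s.Trim())
        .Where(s => s.Length > 0)
        .ToArray();

    // "M" is the mailing format (newlines); "S" keeps everything on one line.
    return string.Join(formatCode == "M" ? Environment.NewLine : ";", segments);
}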

So given an address that was built as

var country = CountryMother.Germany();
 
return new Address.Builder()
    .WithPrimaryStreetLine("42 Kaiserstrasse")
    .WithMunicipality("Berlin")
    .WithPostalCode("12345")
    .WithCountry(country)
    .Build();

our formatter, given a format code of “M”, will spit out

42 Kaiserstrasse
DE-12345 Berlin
Germany

Since the address object is immutable, we know it’s valid, so the formatter doesn’t have to worry about handling a potentially invalid address object.

The user interface of the Web site also uses the template to figure out what input fields to display and what order to display them in.
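
That reuse is cheap because the field order falls straight out of the template. A hypothetical sketch of the extraction (the method name is made up):

public static IList<AddressField> GetFieldsInDisplayOrder(string template)
{
    // Each {FieldName} token matches an AddressField member, and the
    // tokens appear in the order the fields should be rendered.
    return Regex.Matches(template, @"\{(\w+)\}")
        .Cast<Match>()
        .Select(m => (AddressField)Enum.Parse(typeof(AddressField), m.Groups[1].Value))
        .ToList();
}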

Determining address kind

When shipping, it’s often important to know the “kind” of an address, as the following enumeration demonstrates:

public enum AddressKind : int
{
    Unknown = 0,
    Commercial = 1,
    PostOfficeBox = 2,
    Residential = 3,
    Military = 4
}

Why is this important? For example, you can only use USPS to ship to a PO box or a military (APO/FPO/DPO) address, and you can’t use FedEx Home Service on a commercial address.

One mistake that I made over the years was trying to store the AddressKind right in the Address object. This was a mistake for a few reasons:

  • I don’t always care about the address kind, such as for an international address, so having it appear on every Address instance is wasteful.
  • Computing the address kind can be expensive. While figuring out a PO box or a military address just involves scanning the address fields, determining commercial or residential status means calling a Web service supplied by FedEx, which can take seconds to complete. By having the Kind property on the address, I’m left with a bad situation: I can either keep the immutability of my address object by determining the address kind for every address when I instantiate them, or I can break immutability, which negates the advantage of not having to worry about change tracking on these objects.

As a result, our new Web site simply doesn’t store the address kind at all. The kind, if needed, is determined on the fly by calling the DetermineAddressKind() method of an IAddressKindDeterminer instance.
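
A sketch of what that might look like follows. The interface name and DetermineAddressKind() come from the description above, but the detection heuristics here are illustrative assumptions; the real commercial/residential check calls out to the FedEx Web service and is omitted.

public interface IAddressKindDeterminer
{
    AddressKind DetermineAddressKind(Address address);
}

public class HeuristicAddressKindDeterminer : IAddressKindDeterminer
{
    public AddressKind DetermineAddressKind(Address address)
    {
        var street = address.PrimaryStreetLine ?? string.Empty;
        var city = address.Municipality ?? string.Empty;

        // PO boxes and military addresses can be spotted by scanning the
        // address fields themselves, which is cheap.
        if (Regex.IsMatch(street, @"^\s*P\.?\s*O\.?\s*BOX\b", RegexOptions.IgnoreCase))
        {
            return AddressKind.PostOfficeBox;
        }

        if (Regex.IsMatch(city, @"^(APO|FPO|DPO)$", RegexOptions.IgnoreCase))
        {
            return AddressKind.Military;
        }

        // Commercial vs. residential requires the (slow) FedEx address
        // classification service, so this sketch doesn't attempt it.
        return AddressKind.Unknown;
    }
}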

Being flexible: Legacy data and third-party integration points

Finally, there’s one last point to be made. This new address validation and formatting is great and all, but there are two potential problems:

  • There is 10 years of less-than-satisfactory address data already in the database, and it could be in an “invalid” format according to these new guidelines. Obviously, we want to maintain backward compatibility here, as we don’t want to force our customers to re-type their address data, and we don’t want to blow up when simply trying to load the address from the database.
  • Addresses sometimes get force-fed to the system via third-party integrations such as PayPal Express Checkout or Google Checkout. I can’t really reject addresses coming from these sources, so I have to be able to “take it like it is,” even if it means it doesn’t quite adhere to my preferred format.

As a result, the new address system has to be able to grandfather in old addresses and graciously accept other addresses from integration. For accepting addresses from integration, I simply added an overload to the address builder’s Build() method that does not perform any validation–and trusted myself to use that overload only in the proper scenarios.

For handling legacy address data, I simply let NHibernate set the address fields directly when loading objects from the database. That is, since the address isn’t constructed via the address builder but is instead built up via reflection, no validation is run on the legacy address and so the legacy address will still load.

One advantage of this model comes from the Address object’s immutability. Since the address can’t be changed, if the user wishes to edit an old address, they’ll be forced to bring the whole address up to snuff since a new Address instance will need to be created.

Conclusions and delusions

At the end of the day, we’re left with a robust implementation of address storage and address validation:

International address validation, input, and handling has never been better at Skiviez, and for the first time in years, I feel like it’s finally under control–thanks to a clean object design, immutability, and a few days of thinking really hard about the problem. I hope this post gives someone some insight on their own project out there. Good luck!

Windows Mobile Device Center hangs on splash screen

At Skiviez, we use two HandHeld Honeywell Dolphin D7600 Mobile Computers to pick orders. They’re devices with a barcode scanner, a touch screen, and WiFi capability. They run a Platform Builder variant of Windows CE 5.0. And in Windows XP, they used ActiveSync to connect to Windows and provide the basic service of being able to access the file system of the device (the other services provided by ActiveSync, like syncing mail, contacts, and calendars, don’t really make sense for an industrial device like this).

ActiveSync is a pretty wonky and ugly looking program, but it worked.

In Windows Vista and Windows 7, ActiveSync has been replaced–though the tradition of fragility and wonkiness continued–by an abomination called the Windows Mobile Device Center. (Indeed, it is telling that the new Windows Phone 7 drops ActiveSync/Windows Mobile Device Center completely and instead uses its own synchronization mechanism through the Zune software.) Which, like its predecessor, still worked for providing the basic service of accessing the file system and allowing Visual Studio to connect a debugger.

Except, one day, it stopped working. And I had no idea why.

Suddenly, plunking the device into its cradle would have the following behavior:

  • The device would authenticate and think it is connected.
  • The PC’s USB subsystem would recognize the device and think it is connected.
  • Windows Mobile Device Center would permanently hang at the splash screen:

Which is just awesome.

If I re-launched Windows Mobile Device Center via the Start menu, the program would open, but it would insist that the device is “Not Connected.”

So, I began to troubleshoot.

  • I looked in the Event Log. (My first mistake. I’m renaming the Event Log the “Red Herrings Viewer”.)
  • I tried enabling verbose logging for Windows Mobile Device Center (which reports nothing useful).
  • I tried uninstalling the device and letting Windows reinstall its drivers.
  • I tried uninstalling Windows Mobile Device Center and reinstalling it.
  • I tried soft resetting the device.
  • I tried hard resetting the device.
  • I made sure that the “Windows Mobile *” firewall rules existed in Windows Firewall.
  • I tried a different USB port.
  • I tried a different USB cable.
  • I tried a different D7600 device.
  • I tried a different D7600 cradle.
  • I tried merging the registry settings from another Windows 7 machine that the device would successfully connect to.
  • I tried switching to the RNDIS connectivity model.
  • I tried granting additional permissions on the “Windows CE Services” registry keys.
  • I tried diddling with the various “Windows CE Services” registry keys.

You might think that any one of these things would contain the solution. But you’d be wrong.

The problem was that I had FileZilla FTP Server installed on my machine, configured to allow FTPS connections. (We use an FTP server to manage images and files on the Skiviez Web site, and I had a local copy on my machine from when I was testing the configuration.)

Now, some people might ask “Why the hell would an FTP server break Windows Mobile device connectivity?” Apparently, Windows Mobile Device Center uses port 990 to orchestrate the connection.

Port 990 just so happens to be the standard control port for FTPS connections. If anything else is consuming port 990, then Windows Mobile Device Center either hangs, reports that the device is not connected, or stupidly keeps trying to connect to it. (A message like “whoops port 990 is in use or does not seem to be a mobile device” would go a long way.)

So make sure nothing else is listening on port 990 (netstat -ano | findstr :990 will tell you); then go pour yourself a g&t.

Inside Skiviez: Catalog Listings and Search Queries

Last Tuesday, I rolled out a new release of the Skiviez Web site. It is the most significant technical change at Skiviez since 2007 and brings with it a lot of improvements in how it operates against our company’s 10-year-old database.

One of these new features is a fast, faceted product catalog. Faceting is the technical name for something you’ve seen in many e-commerce stores like NewEgg, Zappo’s, and Lowe’s: a list of links that quickly allow you to drill down to the content that you’d like to see.

Trying to do this kind of thing efficiently in a traditional relational database is not going to happen unless you want to start adding and removing indexes on the fly or go full on Entity-Attribute-Value, which somewhat defeats the purpose of using a relational database. You lose efficient indexing, you lose type information, you lose, well, the relations.

Instead, Skiviez now uses Solr–an open source search platform that is based on Lucene, a text search engine library–for managing catalog listing pages and search pages.

Solr from 10,000 feet

Solr is not relational. It doesn’t have tables. The schema is flat; all “documents” in the Solr store share the same schema. From a relational perspective, you could think of it as one table with many columns, except that not every row in the table uses all of the columns. (This isn’t how it internally works at all, but it’s a useful metaphor.)

Some fields are “stored,” which means you can get back the data that you put in. Some fields are “indexed,” which means that you can query on them directly and use them for faceting. And some fields are “multivalued”, which means that they can hold more than one value at a time. For example, our <sizeInStock> field can hold S, M, and L values all at once for a particular product, and just L for another.

Why would you ever want a value to not be “stored”? Why add data that you can’t get back? Well, in Solr, storing and indexing are distinct concepts, so you can have a value that is not stored, just indexed. A good example from the Skiviez Solr schema would be the salesVolume field, which indicates how many units of a particular item were sold in the past week:

		<field
			name="salesVolume"
			type="sint"
			indexed="true"
			stored="false" />

This is just an integer, a number of units sold. In my UI, I’m not going to ever actually display that number–that would reveal a little too much information about our business. But I still want our users to be able to sort catalog listings by “Top Selling” products. By creating an index, we enable sorting; by disallowing storage, we save some space.

The Skiviez Solr schema defines a “type” field, some fields that are shared among all document types, and a few type-specific fields. This works well because the only intended use of Solr is catalog listings and searching; if I had to store more disparate kinds of data, I might create several Solr schemas.

<!--
	This field is used to indicate the type of data that is stored within
	the document. Since the Skiviez Solr instance represents a variety of
	different types, this field helps to indicate what kind of data you are
	looking at.
 
	Types that are currently understood by the Web site include:
 
		BLOGARTICLE
		BRAND
		LINE
		PRODUCT
		PRODUCTSUMMARY
		PROMOTION
		STYLE
		WEBPAGE
-->
<field
	name="type"
	type="string"
	indexed="true"
	stored="true"
	required="true" />

Why is this useful? Well, the search portion of the Skiviez Web site can use the <type> field and the fields that are used by all of the different document types to quickly search a wide variety of sources:

In this case, we’re looking at our <type>, <urlKey>, <name>, and <image> fields. The Web site can use this to quickly sort information and build links to content.

The PRODUCT type uses a wealth of product-specific fields, however, and when we query to display catalog pages, we tell Solr to only return documents of type PRODUCT. I could have created a second Solr schema for this purpose, but why bother?

Getting data into Solr

The traditional SQL database is still the “authoritative” data store, and the Solr indexes are read-only snapshots of that data. This means that the data coming from Solr is always slightly stale, so I had to ask myself:

  • How stale is too stale?
  • When do I value speed of querying over freshness?

Part of the new Skiviez Web site is a Windows service that I call the “Worker.” It uses Quartz.NET to execute C# IJob implementations periodically. You can think of them as traditional scheduled tasks in Windows; the only difference is that I am explicitly managing them in code, using the same object-oriented model of our domain in those jobs, and, as long as the service is installed, I don’t need to worry about configuring scheduled tasks.

Every three hours, one of those IJobs that gets executed is the RefreshSolrIndexesJob, and all that job does is ping an HttpWebRequest over to http://solr.example.com/dataimport?command=full-import, where solr.example.com is replaced with the FQDN of our internal Solr server. This is because I use Solr’s built-in DataImportHandler to actually suck in the data from the SQL database; the job just has to “touch” that URL periodically to make the sync work. Because the DataImportHandler commits the changes periodically, this is all effectively running in the background, transparent to the users of the Web site. And because the Skiviez product catalog is reasonably small (a few thousand items), we can blow away the whole Solr index and re-build it in fewer than two minutes.
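
The job body is nearly trivial; here is a sketch of it. The Quartz.NET IJob plumbing is omitted because its signature varies between versions, and solr.example.com stands in for the real host name just as it does above.

public class RefreshSolrIndexesJob // implements Quartz's IJob in the real Worker service
{
    public void Execute()
    {
        // "Touching" the DataImportHandler URL kicks off a full rebuild.
        // Solr commits the new documents periodically, so the Web site
        // keeps serving the old index until the import finishes.
        var request = (HttpWebRequest)WebRequest.Create(
            "http://solr.example.com/dataimport?command=full-import");

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            // A successful response just means the import was started;
            // there's nothing useful to read back here.
        }
    }
}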

There’s also a function in our backend application that allows employees to trigger the index rebuilding immediately; this can happen when new product arrives in the warehouse and we want to get it up on the Web site right away.

The DataImportHandler is built into Solr, and configuring it is a little confusing because it uses some strange terminology. It just takes an XML configuration file, and whenever you ping its request handler, it performs synchronization tasks based on what has been specified in its configuration.

      <entity
        name="brands"
        dataSource="undiesDatabase"
        transformer="TemplateTransformer"
        query="
            SELECT
                b.ID AS BrandId,
                b.[Name] AS BrandName,
                b.Description AS BrandDescription,
                b.UrlKey AS BrandUrlKey,
                CASE b.Active
                  WHEN 'Y' THEN 'True'
                  ELSE 'False'
                END AS BrandIsActive
            FROM na.brands AS b
            ORDER BY b.[Name];">
        <field column="id" template="BRAND-${brands.BrandId}" />
        <field column="type" template="BRAND" />
        <field name="identity" column="BrandId" />
        <field name="name" column="BrandName" />
        <field name="description" column="BrandDescription" />
        <field name="isActive" column="BrandIsActive" />
        <field name="urlKey" column="BrandUrlKey" />
      </entity>

I say the terminology is confusing because in <field> elements, the @column attribute is the name of the Solr field and the name of the column in the SQL result set. But if the @name attribute is specified, then the @column attribute is the name of the column in the result set and the @name attribute is the name of the Solr field. It’s confusing because some <field> elements don’t pull directly from the result set, instead relying on a “transformer.” You would expect to just specify a @name for those fields, but specifying just @column is correct. In the above example, the “Template Transformer” is used on the id field to format Solr identities in the format of BRAND-127, where 127 is the SQL database’s primary key for the brand.

(And yes, attributes can have newlines in them. I’m not sure how I went through years of programming without realizing this, but once I realized that I could do it, it made reading those SQL queries much easier! This example is by far the shortest query for pulling Solr data from the SQL database; the one for products is over 200 lines long.)

Querying data: how stale is too stale?

This does mean that information in the Skiviez product catalog can be up to three hours stale. A user might click a link for “Medium In Stock (3)” on the catalog page (since this kind of faceted data is generated by querying Solr) but then see on the product detail page that no mediums are in stock (since on this page, the quantity information is one of the few things not cached and queried directly against the database). This is annoying, but generally rare in our particular scenario (we are a reasonably small business and not that high traffic), and it will be fixed up in 3 hours anyway when we rebuild the whole index again from scratch, so I have accepted this as a reasonable trade-off.

(If I really wanted to solve this problem, there are a few approaches that I could take. I could use domain events to fire off partial updates to the Solr index whenever a Save/Update operation occurred on an Item or ItemGroup in the Skiviez domain model, or I could insert a record into a table named, say, dbo.IdentitiesOfStuffThatNeedsUpdatingInSolr, and have an IJob that reads that list and executes partial updates every minute. And even if I did these things, I’d still probably want to do a periodic “blow it all away and refresh” in case one of those partial updates failed in the background.)

As for querying this data from Solr, there are a few approaches that I could have taken. One is to hide the fact that Solr exists entirely via the methods of a repository-like class: the Get*() methods would access Solr, and the Create/Update/Delete methods would access the database. I didn’t like this approach because my Solr schema is already shamelessly tailored to the UI that will be accessing that data, as it should be–I’ve already made the decision to use Solr to provide easy faceting, sorting, and fast display of information, so I might as well use it to its fullest extent. This means making it explicit in code as to when I mean to access Solr and when I mean to access the up-to-date, non-cached database object.

In my case, I ended up using NHibernate to do the CRUD access (e.g., loading an ItemGroup, futzing with its pricing rules, and then saving it back), forgoing the repository pattern because I don’t typically see its value when NHibernate and its mappings are already abstracting the database. (Sometimes abstracting out NHibernate is useful, however, as we’ll see below.)

When querying for data, I know pretty well if I’m using it for catalog-oriented purposes (in which I care about speed and querying features) or for displaying in a table on a back-end administrative application (I care about currency). For querying on the Web site, I have an interface called IListingQuery. It has a Search() method that accepts a ListingRequest where I define some parameters–selected facets, search terms, page number, number of items per page, etc.–and gives back a ListingResponse–remaining facets, number of results, the results on this page, etc. This interface is pretty boring stuff.
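
To give a feel for its shape, here is a sketch. The member names are guesses based on the description above rather than the actual Skiviez contracts, and ProductSummary is a placeholder for whatever display model the catalog pages use.

public interface IListingQuery
{
    ListingResponse Search(ListingRequest request);
}

public class ListingRequest
{
    public string SearchTerms { get; set; }
    public IDictionary<string, string> SelectedFacets { get; set; } // e.g. "sizeInStock" -> "M"
    public int PageNumber { get; set; }
    public int ItemsPerPage { get; set; }
}

public class ListingResponse
{
    public int TotalResults { get; set; }
    public IList<ProductSummary> Results { get; set; }              // the results on this page
    public IDictionary<string, IList<string>> RemainingFacets { get; set; }
}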

Where it gets interesting is that the implementation of IListingQuery that gets dependency injected into my Web site’s ProductsController uses a list of IListingQueryStrategy implementations underneath. The default strategy, the SolrListingQueryStrategy, hits Solr directly via a plain old-fashioned HttpWebRequest and parses the XML in the HttpWebResponse (which, in my humble opinion, is much easier than using some of the Solr client libraries).

If the Solr-based strategy throws an exception or vomits for some reason, then the DatabaseListingQueryStrategy runs next and hits the database directly–although it ignores some parameters of the ListingRequest, like faceting or advanced text searching, since that is inefficient to do there and is the whole reason I am using Solr in the first place. The idea is that usually Solr is answering my search requests quickly in their full-featured glory, but if something blows up and Solr goes down, then the catalog pages of the site can still function in “reduced-functionality mode” by hitting the database with a limited feature set directly. (As implemented on the Skiviez site today, you would realize that this happened if the filters list on the left-hand side becomes empty and search parameters are ignored.)
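
In other words, the composite implementation looks roughly like the sketch below. The class name and the exact strategy interface are assumptions on my part; only the strategy names come from the actual code.

public interface IListingQueryStrategy
{
    ListingResponse Search(ListingRequest request);
}

public class StrategyChainListingQuery : IListingQuery
{
    private readonly IEnumerable<IListingQueryStrategy> strategies;

    public StrategyChainListingQuery(IEnumerable<IListingQueryStrategy> strategies)
    {
        // e.g. new SolrListingQueryStrategy(), then new DatabaseListingQueryStrategy()
        this.strategies = strategies;
    }

    public ListingResponse Search(ListingRequest request)
    {
        Exception lastError = null;

        foreach (var strategy in this.strategies)
        {
            try
            {
                // Solr answers with the full feature set; the database
                // fallback ignores faceting and advanced text search.
                return strategy.Search(request);
            }
            catch (Exception ex)
            {
                lastError = ex;
            }
        }

        throw new InvalidOperationException("No listing strategy could satisfy the request.", lastError);
    }
}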

(The explosion of classes that you see in the screenshot above is mostly due to testability. For example, querying Solr means sending it long, complicated GET requests, so I’ve pulled the URI-building functionality out into its own class; it serves a single purpose and can be tested as such. The actual SolrListingQueryStrategy class implementation is very short, with most of the work delegated to other classes within that folder.)

Conclusions and delusions

What is important is that I have made explicit in code that this is a search–by using IListingQuery instead of NHibernate–so the database-based strategy can take some liberties in ignoring some of the search parameters without worrying about affecting some of its down-level callers too severely. The decision to perform a query against a possibly-stale data store versus the authoritative data store has been made explicit–if I want fast, possibly stale data with advanced search features, I use IListingQuery. If I want slow, up-to-date data with insert/update/delete capability, I use NHibernate’s named queries. And if I make a change in the SQL database, I know that the out-of-process Worker service will update Solr eventually, making things eventually consistent.

The end result? Fast catalog pages for our customers that gracefully fall back to the old behavior when something doesn’t work.

SBS 2008 restarts unexpectedly when backup starts

Today, our SBS 2008 server restarted itself at 5:00 p.m. sharp. I mean on the dot.

That was disturbing enough in itself. When the system came back up, it helpfully asked me to type in “Why did the system shut down unexpectedly?” and I enthusiastically typed in “Fuck if I know you jackass.” Then, I headed straight for the event log.

The event log was full of terrifying messages such as

The system failed to flush data to the transaction log. Corruption may occur.

or

An error was detected on device \Device\Harddisk1\DR2  during a paging operation.

Hmm. There was no blue screen, no bug check, no minidump. It was as if the power had been cut.

I looked accusingly at the UPS since I have had problems with bad UPSs interrupting the power supply in the past. I held down its self-test button, it made that satisfying buzzing noise, and … everything stayed up.

But while crouched down next to the UPS, I heard an odd swishing noise, like a tiny man was running his finger across a sheet of Saran Wrap. Then I noticed that the external Western Digital hard drive that we use for SBS 2008 backup was doing its swooshing-lights mode, not its solid-lights mode, and I knew from previous experience that it only did that when it was starting up or shutting down.

I had a hunch–in SBS 2008, backup uses Volume Shadow Copy, and I had seen similar disk errors when another of our external hard drives cooked itself (though instead of rebooting, that server became unresponsive). I unplugged the external drive and the event log messages stopped.

I then promptly threw the external hard drive into the trash, drove straight to Best Buy and bought a new external hard drive with the company credit card. (Aside: Why do 90% of external hard drives come with craptastic backup software or “one-touch” buttons? I just want a drive in a box. I finally found one in the “Seagate Expansion” line.)

Then I plugged in the new external hard drive and re-ran the “Configure server backup” wizard from the SBS 2008 console. I unchecked the old, now non-existent drive, checked the new one, and off it went. And all seems happy now. (I ran chkdsk for good measure on the system and data drives and they checked out OK, so it does all seem related to the external backup drive cooking itself.)

Should SBS 2008 be capable of handling faulty backup hardware more gracefully? Sure. And I wish that it had the option to use the old ntbackup utility, because then at least you could back up to network-attached storage. It’s been my experience that external hard drives really are not that reliable and have an average lifespan of only about two years, but maybe I have just been glaring at them the wrong way.

I Hate Software

Today, I realized that I hate software.

First, it was FedEx

It has been 45 days since I reported to FedEx that the new SmartPost integration offered by their Web services simply does not work when an EPL2 label is requested:

I Hate Software, Part 1

The problem is that the USPS Delivery Confirmation barcode does not print when an EPL2 label is requested. That’s because in the EPL2 document that FedEx sends back, quotation marks in the barcode command are not properly escaped:

A90,473,0,3,1,2,N,"ZIP - USPS DELIVERY CONFIRMATION e-VS"
B50,535,0,1,3,4,175,N,""F1"42023221"F1"0000000000000000000000"

That should read "\"F1\"420..." for it to print correctly. (The ZPLII version of the label works fine.)

I understand that larger corporations have fixed software release cycles and different bug triage mechanisms, but having to go through three levels of support and waiting 45+ days simply to say “this simply doesn’t work, please add a backslash” is somewhat frustrating.

Then, there was UPS

Similarly, I’ve discovered a recently-introduced error with the UPS Web services. If you request a 4″ x 6″ EPL shipping label, then the UPS Web service will happily ignore the request and send back a 4″ x 8″ label instead. That’s because they’re sending back the wrong width and height settings in the label response:

q795
Q1600,24

That would be 795 dots / 203 dpi == 3.91″ wide (OK) and 1600 dots / 203 dpi == 7.88″ high (hmm, not what I asked for). (The ZPLII version of the label works fine.)

Finally, SmartFTP sent me over the edge

Since updating to the latest version of SmartFTP, I found myself frequently being unable to connect to the Skiviez private FTP server that is used to manage software updates, e-mails, and product catalog images. It would fail about 80% of the time with

[13:51:38] 234 Using authentication type TLS
[13:51:38] SSL: Error (Error=0x80090308).

I’m sure that this means something. And even if I knew what it was, sometimes it would just work without me changing anything (about 10 percent of the time). Further still, I hadn’t touched the FTP server for some time.

That’s when I noticed in the SmartFTP change log:

FTP: Completely rewrote SSL layer

Sigh. Downgraded to the version prior to that changelog entry, and it works fine.

Conclusions and delusions

I have certainly done my part to contribute buggy, crappy software to the world. I continue to spew out more buggy, crappy software with each passing day. But it is extra depressing and disheartening to know that I, some idiot working at a small company, can run across such simply-does-not-work bugs (and, in FedEx’s case, never-actually-worked-ever bugs) in software produced by large corporations and used by presumably hundreds to thousands of people around the world.

Quick Tip: Sharing a FedEx ZP 500 printer attached to a Windows XP computer to a Windows Vista/7 machine

At Skiviez/WFS, we have a FedEx ZP 500 ZPL printer on the shipping desk. This is what FedEx is migrating everyone to now that the tried-and-true Zebra/Eltron LP2844* series is getting a little long in the tooth. (Along with a gradual migration to ZPL over EPL2, but that’s a rant for another day.)

FedEx ZP 500

The FedEx ZP 500 is a bit of a white elephant in that Zebra doesn’t mention it on their Web site; it’s some sort of special contract job with FedEx to produce and jointly brand these devices. It’s probably just a re-branded version of the Zebra GK420d, but in reality we have the printer manufacturer pretending that they don’t make the printer (e.g., “call FedEx for support”) and a shipping company who has no idea how to support the printer (e.g., “call Zebra for the printer driver”). But I’m getting distracted.

The real issue was that the shipping desk is running Windows XP and shares the printer via the native Windows printer sharing mechanism so that it’s listed in the Active Directory. I do this so that I can run integration tests from the workstation in my office and test the label generation functions of our software without needing to have a thermal label printer hooked up to my workstation solely for this purpose. These printers aren’t cheap, you know.

New operating system, new drivers required

I recently upgraded my workstation to Windows 7. When I tried to add the shared FedEx thermal printer, I was greeted with error code 0x00000007a along with an error message that generally amounted to “something didn’t work.” I suspected a driver problem since Vista is when Microsoft locked down on the mandate that printer drivers run in user mode, not kernel mode–which is a good thing in terms of system stability, since a poorly-written printer driver can no longer trigger a BSOD and a reboot, but a bad thing in terms of backwards compatibility.

The problem is that

  • the Windows XP machine is offering the Windows XP drivers to my Windows 7 install;
  • the Windows 7 printer wizard doesn’t give me a chance to supply my own printer drivers, and instead happily installs the XP ones, which don’t work;
  • the FedEx-supplied Vista drivers are mutually exclusive in terms of compatibility with the XP drivers, so I can’t install them on the XP machine via the Server Properties thingie; and
  • even if I could do that, I am hesitant to dick around with the printer drivers on a critical machine.

Adding the printer

The solution was to add a printer in a different, counter-intuitive way. Here’s what I did:

  1. From the Windows 7 Control Panel, I went to “View devices and printers” and then “Add a printer”.
  2. When asked “What type of printer do you want to install?” I chose Add a local printer, even though I know full damn well that I’m not actually adding a local printer.
  3. For “Choose a printer port,” I chose “Create a new port” with “Type of port” set to “Local Port”.
  4. In the “Enter a port name” dialog, I entered the UNC share name for the printer, which looks like \\{MACHINE-NAME}\{PRINTER-SHARE-NAME}. In my case, it was \\ASHWHWS003\FedEx ZP 500 Plus.
  5. When asked for a driver, I chose “Have Disk” and navigated to the *.inf file in the ZD directory of the Zebra Designer drivers available from the FedEx Web site.

This allowed me to use locally-available printer drivers on a printer attached to another machine. Good luck!