Sunday, July 12, 2009

The address's State field may contain more than just the 50 states

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/the_addresss_state_field_may_contain_more_than_just_the_50_.htm]

Most business applications eventually ask the user to enter an address. There's the user's address, shipping address, their company's address, maybe an emergency contact's address, address history, travel-related addresses, financial addresses, etc... Most addresses have a city, state, and zip. While city and zip seem simple (more on that later), many devs initially expect the "State" field to be simple - perhaps a two-character column that can store the 50 US states. However, it can quickly balloon to something much more complicated (especially if you're troubleshooting some legacy app). Besides the standard states, it could contain:

Military codes (reference)

Armed Forces AfricaAE
Armed Forces Americas (except Canada)AA
Armed Forces CanadaAE
Armed Forces EuropeAE
Armed Forces Middle EastAE
Armed Forces PacificAP

US Possessions (reference)

AMERICAN SAMOAAS
DISTRICT OF COLUMBIADC
FEDERATED STATES OF MICRONESIAFM
GUAMGU
MARSHALL ISLANDSMH
NORTHERN MARIANA ISLANDSMP
PALAUPW
PUERTO RICOPR
VIRGIN ISLANDSVI

Perhaps Canadian provinces? (reference)

AlbertaAB 
British ColumbiaBC 
ManitobaMB 
New BrunswickNB 
Newfoundland and LabradorNL 
Northwest TerritoriesNT 
Nova Scotia NS
NunavutNU 
OntarioON 
Prince Edward IslandPE 
QuebecQC 
SaskatchewanSK 
Yukon YT

Generic codes to indicate international use?

Foreign CountryFC
Out of CountryOC
Not ApplicableNA

Or, specific applications may try their own proprietary international mapping, like "RS" = Russia. This might work if you're only doing business with a handful of countries, but it doesn't scale well to the 200+ (?) existing other countries (i.e. I would not recommend this. Use a "Country" field instead is feasible).

Special codes to indicate an unknown, or empty state?

XX
  
*
..

Perhaps, for some reason, the application developer isn't storing just 2-char codes, but rather integer ids that map to another "States" table, so you see numbers like "32" instead of "NY" ("New York")?

Or, even worse, they're shoving non-state related information into the state column as a hack that "made something else easier".

How many distinct entries could you have?

  • With 26 letters, you've only got 26 ^ 2 = 676 options.
  • If you use numbers too, you've got (26 + 10) ^ 2 = 1296 options.
  • If you start using lower case letters (SQL is case-insensitive, but maybe this impacts managed code), then you've got (2 * 26 + 10) ^ 2 = 3844 options.
  • Add in some special characters (such as spaces, periods, asterisks, hyphen, underscores, etc...), maybe 10 of then (if the column isn't validated on strictly alpha-numeric), and you've got (2 * 26 + 10 + 10) ^2 = 5184 options.

That's potentially 100 time more than just the 50 US states. Of course, for new development, we'd all prefer some clearly-defined schema with referential integrity and a business-sensible range of values. However the real world of enterprise applications is messy, and you have to be prepared to see messy things.

It sounds simple, but that innocent "state" field can quickly get very complex.

No comments:

Post a Comment