[This was originally posted at
http://timstall.dotnetdevelopersjournal.com/the_addresss_state_field_may_contain_more_than_just_the_50_.htm]
Most business applications eventually ask the user to enter an address. There's the user's address, shipping address, their company's address, maybe an emergency contact's address, address history, travel-related addresses, financial addresses, etc... Most addresses have a city, state, and zip. While city and zip seem simple (more on that later), many devs initially expect the "State" field to be simple - perhaps a two-character column that can store the 50 US states. However, it can quickly balloon to something much more complicated (especially if you're troubleshooting some legacy app). Besides the standard states, it could contain:
Military codes (reference)
Armed Forces Africa | AE |
Armed Forces Americas (except Canada) | AA |
Armed Forces Canada | AE |
Armed Forces Europe | AE |
Armed Forces Middle East | AE |
Armed Forces Pacific | AP |
US Possessions (reference)
AMERICAN SAMOA | AS |
DISTRICT OF COLUMBIA | DC |
FEDERATED STATES OF MICRONESIA | FM |
GUAM | GU |
MARSHALL ISLANDS | MH |
NORTHERN MARIANA ISLANDS | MP |
PALAU | PW |
PUERTO RICO | PR |
VIRGIN ISLANDS | VI |
Perhaps Canadian provinces? (reference)
Alberta | AB |
British Columbia | BC |
Manitoba | MB |
New Brunswick | NB |
Newfoundland and Labrador | NL |
Northwest Territories | NT |
Nova Scotia | NS |
Nunavut | NU |
Ontario | ON |
Prince Edward Island | PE |
Quebec | QC |
Saskatchewan | SK |
Yukon | YT |
Generic codes to indicate international use?
Foreign Country | FC |
Out of Country | OC |
Not Applicable | NA |
Or, specific applications may try their own proprietary international mapping, like "RS" = Russia. This might work if you're only doing business with a handful of countries, but it doesn't scale well to the 200+ (?) existing other countries (i.e. I would not recommend this. Use a "Country" field instead is feasible).
Special codes to indicate an unknown, or empty state?
Perhaps, for some reason, the application developer isn't storing just 2-char codes, but rather integer ids that map to another "States" table, so you see numbers like "32" instead of "NY" ("New York")?
Or, even worse, they're shoving non-state related information into the state column as a hack that "made something else easier".
How many distinct entries could you have?
- With 26 letters, you've only got 26 ^ 2 = 676 options.
- If you use numbers too, you've got (26 + 10) ^ 2 = 1296 options.
- If you start using lower case letters (SQL is case-insensitive, but maybe this impacts managed code), then you've got (2 * 26 + 10) ^ 2 = 3844 options.
- Add in some special characters (such as spaces, periods, asterisks, hyphen, underscores, etc...), maybe 10 of then (if the column isn't validated on strictly alpha-numeric), and you've got (2 * 26 + 10 + 10) ^2 = 5184 options.
That's potentially 100 time more than just the 50 US states. Of course, for new development, we'd all prefer some clearly-defined schema with referential integrity and a business-sensible range of values. However the real world of enterprise applications is messy, and you have to be prepared to see messy things.
It sounds simple, but that innocent "state" field can quickly get very complex.