Sunday, March 13, 2005

Understanding the Number of Bytes in Integral Types

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/understanding_the_number_of_bytes_in_integral_types.htm]

As I prepare to teach about simple types in C#, one of the questions people will ask is "Why are there so many different integer types?"

C# supports nine integral types: sbyte, byte, short, ushort, int, uint, long, ulong, and char. Ignoring char, the table below shows the .NET type that each keyword is an alias for, along with its range of allowed values:

Type      Alias for        Allowed Values

sbyte     System.SByte     Integer between -128 and 127.
byte      System.Byte      Integer between 0 and 255.
short     System.Int16     Integer between -32768 and 32767.
ushort    System.UInt16    Integer between 0 and 65535.
int       System.Int32     Integer between -2147483648 and 2147483647.
uint      System.UInt32    Integer between 0 and 4294967295.
long      System.Int64     Integer between -9223372036854775808 and 9223372036854775807.
ulong     System.UInt64    Integer between 0 and 18446744073709551615.

The basic concept is that the larger the range of allowed values, the more memory the type requires to store it. Range covers both magnitude and sign (+ or -). .NET therefore provides several types so that you can pick the smallest one that fits your data. For example, there is no need to spend the 8 bytes of a ulong when you only need to store the values 1 through 10; a 1-byte byte would do.
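To make the link between size and range concrete, here is a minimal sketch that prints the size in bytes of a few unsigned types next to their maximum values. The class name IntegralSizes and the helper MaxUnsigned are hypothetical names made up for this illustration; sizeof and the MaxValue constants are part of C# and .NET.

```csharp
using System;

static class IntegralSizes
{
    // Hypothetical helper: the largest value an unsigned integer of the
    // given width (1 to 8 bytes) can hold, i.e. 2^(8 * bytes) - 1.
    public static ulong MaxUnsigned(int bytes)
    {
        // Shifting all 64 one-bits to the right leaves exactly 8 * bytes of them.
        return ulong.MaxValue >> (64 - 8 * bytes);
    }

    static void Main()
    {
        // sizeof on the built-in integral types is a compile-time constant.
        Console.WriteLine("byte:   {0} byte(s), max {1}", sizeof(byte), byte.MaxValue);
        Console.WriteLine("ushort: {0} byte(s), max {1}", sizeof(ushort), ushort.MaxValue);
        Console.WriteLine("uint:   {0} byte(s), max {1}", sizeof(uint), uint.MaxValue);
        Console.WriteLine("ulong:  {0} byte(s), max {1}", sizeof(ulong), ulong.MaxValue);

        // The maximum follows directly from the number of bytes.
        Console.WriteLine(MaxUnsigned(sizeof(ushort)) == ushort.MaxValue); // True
    }
}
```

Each signed type occupies the same number of bytes as its unsigned counterpart; only the way the bits are interpreted differs.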

Ultimately the types are based on binary storage. So the value 25 is really stored as 11001, or: (1*2^4) + (1*2^3) + (0*2^2) + (0*2^1) + (1*2^0), or, more simply: 16 + 8 + 0 + 0 + 1. The following Excel spreadsheet demonstrates these calculations.
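The same arithmetic can be sketched in C#. Convert.ToString(value, 2) is the framework call that produces the binary digits; the class BinaryDemo and the method FromBinary are hypothetical names used only for this example, and FromBinary assumes its input contains nothing but the characters 0 and 1.

```csharp
using System;

static class BinaryDemo
{
    // Rebuilds a number from its binary digits by summing digit * 2^position,
    // exactly as in the expansion of 25 above.
    public static int FromBinary(string bits)
    {
        int total = 0;
        for (int i = 0; i < bits.Length; i++)
        {
            int digit = bits[i] - '0';        // '0' -> 0, '1' -> 1
            int power = bits.Length - 1 - i;  // the leftmost digit has the highest power
            total += digit * (1 << power);    // 1 << power is 2^power
        }
        return total;
    }

    static void Main()
    {
        string bits = Convert.ToString(25, 2);
        Console.WriteLine(bits);              // 11001
        Console.WriteLine(FromBinary(bits));  // 25
    }
}
```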

Notice that the range for unsigned integers starts at 0. So byte (which has 8 bits) can store 2^8, or 256, distinct values. These are spent covering 0 through 255. In order to make the type signed, or capable of storing negative values, there are two approaches:

  1. Use an extra bit to indicate the sign.
  2. Shift the entire range from 0 to some arbitrary negative number.

The second approach actually gives the larger range. For example, if sbyte used the first approach, it would use 1 bit for the sign, leaving only 7 bits for the magnitude. This would have a range of 0 to 127 (note: 2^7 - 1 = 127) in the positive direction, and 0 to 127 in the negative direction, for a total range of -127 to +127. However, both the positive and negative directions cover 0, wasting a value. Therefore the datatype gets a larger range by simply offsetting the positive range: -128 to +127 instead of just -127 to +127. (In practice, .NET stores signed integers in two's complement form, which produces exactly this -128 to +127 range for sbyte.)
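A short sketch can confirm the counting argument. SignedRangeDemo, SignMagnitudeDistinctValues, and Reinterpret are hypothetical names invented for this illustration; the sign-and-magnitude count is computed by hand, since C# does not offer such a type, while the sbyte figures come straight from the framework.

```csharp
using System;

static class SignedRangeDemo
{
    // Number of distinct values a sign-and-magnitude layout could represent
    // with the given number of bits. +0 and -0 are the same number, so one
    // bit pattern is wasted.
    public static int SignMagnitudeDistinctValues(int bits)
    {
        int maxMagnitude = (1 << (bits - 1)) - 1; // 127 for 8 bits
        return 2 * maxMagnitude + 1;              // -127..+127 is 255 values
    }

    // Reads the 8 bits of a byte as an sbyte without changing them.
    public static sbyte Reinterpret(byte raw)
    {
        return unchecked((sbyte)raw);
    }

    static void Main()
    {
        Console.WriteLine(SignMagnitudeDistinctValues(8));      // 255
        Console.WriteLine(sbyte.MaxValue - sbyte.MinValue + 1); // 256
        Console.WriteLine(Reinterpret(0x80));                   // -128
        Console.WriteLine(Reinterpret(0xFF));                   // -1
    }
}
```

The bit pattern 10000000, which sign-and-magnitude would spend on a second zero, is what sbyte uses for -128.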

Given the power of today's computers, along with the simplicity of most applications, many developers can get away with just using the default int type (System.Int32) for all their integer needs. However, it is still good to know what's going on behind the scenes because:

  • It is a common Computer Science principle and transcends merely the C# language.
  • You may work on an application where it does matter.
  • It will help you understand other applications that use these types.
