IEEE Standard 754 floating point is the most common representation today
for real numbers on computers, including Intel-based PCs, Macintoshes,
and most Unix platforms.
There are several ways to represent real numbers on computers. Fixed point places a radix point somewhere in the middle of the digits, and is equivalent to using integers that represent portions of some unit. For example, one might represent 1/100ths of a unit; with four decimal digits, you could represent 10.82 or 00.01. Another approach is to use rationals, representing every number as the ratio of two integers.

Floating-point representation, the most common solution, essentially represents reals in scientific notation. Scientific notation represents numbers as a base number and an exponent. For example, 123.456 could be represented as 1.23456 x 10^2. In hexadecimal, the number 123.abc might be represented as 1.23abc x 16^2.

Floating-point solves a number of representation problems. Fixed-point has a fixed window of representation, which prevents it from representing very large or very small numbers. Fixed-point is also prone to a loss of precision when two large numbers are divided. Floating-point, on the other hand, employs a sort of "sliding window" of precision appropriate to the scale of the number. This allows it to represent numbers from 1,000,000,000,000 to 0.0000000000000001 with ease.

IEEE floating point numbers have three basic components: the sign, the exponent, and the mantissa. The exponent base (2) is implicit and need not be stored. The following table shows the layout for single (32-bit) and double (64-bit) precision floating-point values. The number of bits for each field is shown (bit ranges are in square brackets):

                   Sign     Exponent     Mantissa
Single precision   1 [31]   8 [30-23]    23 [22-0]
Double precision   1 [63]   11 [62-52]   52 [51-0]
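To make the three fields concrete, here is a minimal Python sketch that splits a single-precision value into sign, exponent, and mantissa. The function name `decompose_single` is hypothetical, and the sketch relies on the standard single-precision layout (1 sign bit, 8 exponent bits, 23 mantissa bits). The stored exponent carries a bias of 127 in single precision, a detail the text above does not cover.

```python
import struct

def decompose_single(x):
    """Split x, encoded as a 32-bit IEEE 754 single, into its three fields."""
    # Pack as a big-endian single, then reinterpret the same 4 bytes
    # as an unsigned 32-bit integer so the bits can be masked.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # bit 31
    exponent = (bits >> 23) & 0xFF    # bits 30-23 (stored with a bias of 127)
    mantissa = bits & 0x7FFFFF        # bits 22-0
    return sign, exponent, mantissa

# -6.5 is -1.101 (binary) x 2^2, so the stored exponent is 2 + 127 = 129.
sign, exponent, mantissa = decompose_single(-6.5)
print(sign, exponent, hex(mantissa))  # 1 129 0x500000
```

The same approach should work for doubles by switching the format characters to `">Q"` and `">d"` and adjusting the shifts and masks to the 11-bit and 52-bit field widths.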
The sign bit
is as simple as it gets. Zero denotes a positive number; one denotes a
negative number. Flipping the value of this bit flips the sign of the number.
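As a small illustration of the sign bit, the following sketch negates a double by toggling only bit 63 of its encoding. The helper name `flip_sign` is an assumption made for this example.

```python
import struct

def flip_sign(x):
    """Negate a double by toggling only bit 63 of its 64-bit encoding."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    bits ^= 1 << 63  # flip the sign bit, leave exponent and mantissa alone
    # Reinterpret the modified integer as a double again.
    (result,) = struct.unpack(">d", struct.pack(">Q", bits))
    return result

print(flip_sign(10.82))  # -10.82
print(flip_sign(-1.5))   # 1.5
```

Because only one bit changes, the magnitude of the number is preserved exactly.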
The picture in the upper right corner is a block diagram of an arithmetic unit dedicated to floating-point addition. The corresponding steps of the algorithm are presented in the picture above. First, the exponent of one operand is subtracted from the other using the small ALU to determine which is larger and by how much. This difference controls the three multiplexers; from left to right, they select the larger exponent, the significand of the smaller number, and the significand of the larger number. The smaller significand is shifted right, and then the significands are added together using the big ALU. The normalization step then shifts the sum left or right and increments or decrements the exponent. Rounding then creates the final result.
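The addition steps can be sketched in software. The version below is a simplified model, not the hardware itself. It assumes both operands are positive and normalized, represents each as a hypothetical (exponent, significand) pair with a 24-bit significand, and rounds to nearest with ties to even using three extra low-order bits. With positive operands the sum never needs a left shift during normalization; that case arises only with subtraction, which is omitted here.

```python
SIG_BITS = 24  # significand width, including the leading 1 (assumed)
GUARD = 3      # extra low-order bits kept for rounding (assumed)

def shift_right_sticky(value, shift):
    """Shift right; if any 1 bits fall off, record that in the lowest bit."""
    shift = min(shift, SIG_BITS + GUARD)  # larger shifts lose everything anyway
    lost = value & ((1 << shift) - 1)
    return (value >> shift) | (1 if lost else 0)

def fp_add(a, b):
    """Add two positive numbers given as (exponent, significand) pairs,
    where value = significand * 2**(exponent - (SIG_BITS - 1))."""
    (exp_a, sig_a), (exp_b, sig_b) = a, b

    # Small ALU: subtract the exponents to find the larger one and the gap.
    diff = exp_a - exp_b
    if diff >= 0:
        exp, big_sig, small_sig, shift = exp_a, sig_a, sig_b, diff
    else:
        exp, big_sig, small_sig, shift = exp_b, sig_b, sig_a, -diff

    # Shift the smaller significand right so both share one exponent.
    big = big_sig << GUARD
    small = shift_right_sticky(small_sig << GUARD, shift)

    # Big ALU: add the aligned significands.
    total = big + small

    # Normalize: a carry out of the top bit means shift right, bump exponent.
    if total >= 1 << (SIG_BITS + GUARD):
        total = shift_right_sticky(total, 1)
        exp += 1

    # Round to nearest, ties to even, using the extra low-order bits.
    low = total & ((1 << GUARD) - 1)
    sig = total >> GUARD
    half = 1 << (GUARD - 1)
    if low > half or (low == half and sig & 1):
        sig += 1
        if sig == 1 << SIG_BITS:  # rounding overflowed; renormalize
            sig >>= 1
            exp += 1
    return exp, sig

def to_value(exp, sig):
    """Convert an (exponent, significand) pair back to a Python float."""
    return sig * 2.0 ** (exp - (SIG_BITS - 1))

# 1.5 + 0.375: both have significand 1.1 (binary), exponents 0 and -2.
print(to_value(*fp_add((0, 0xC00000), (-2, 0xC00000))))  # 1.875
```

Special values, negative operands, and exponent overflow are deliberately left out to keep the sketch close to the five steps described in the text.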