Computer Architecture (SoSe 2000) Project

Floating-Point Java Simulation (according to the IEEE 754 standard)


           IEEE Standard 754 floating point is the most common representation today for real numbers on computers, including Intel-based PCs, Macintoshes, and most Unix platforms.
           There are several ways to represent real numbers on computers. Fixed point places a radix point somewhere in the middle of the digits and is equivalent to using integers that represent portions of some unit. For example, one might represent hundredths of a unit; with four decimal digits, you could represent 10.82 or 00.01. Another approach is to use rationals and represent every number as the ratio of two integers. Floating-point representation - the most common solution - basically represents reals in scientific notation. Scientific notation represents numbers as a base number and an exponent. For example, 123.456 could be represented as 1.23456 x 10^2. In hexadecimal, the number 123.abc might be represented as 1.23abc x 16^2.
           Floating point solves a number of representation problems. Fixed point has a fixed window of representation, which limits it from representing very large or very small numbers. Also, fixed point is prone to a loss of precision when two large numbers are divided. Floating point, on the other hand, employs a sort of "sliding window" of precision appropriate to the scale of the number. This allows it to represent numbers from 1,000,000,000,000 to 0.0000000000000001 with ease.
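           To make the fixed-point idea concrete, here is a minimal Java sketch (illustrative only, not part of the project code) that stores amounts as integer hundredths of a unit:

    // Minimal sketch: fixed-point amounts stored as integer hundredths.
    // With four decimal digits we can represent 00.01 up to 99.99, nothing beyond.
    public class FixedPointDemo {
        public static void main(String[] args) {
            int a = 1082; // represents 10.82 (hundredths of a unit)
            int b = 1;    // represents  0.01
            int sum = a + b; // exact integer addition: 10.83
            System.out.printf("%d.%02d%n", sum / 100, sum % 100);
        }
    }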
           IEEE floating point numbers have three basic components: the sign, the exponent, and the mantissa. The exponent base (2) is implicit and need not be stored. The following table shows the layout for single (32-bit) and double (64-bit) precision floating-point values. The number of bits for each field is shown (bit ranges are in square brackets):
 
                    Sign    Exponent    Mantissa    Bias
  Single Precision  1 [31]  8  [30-23]  23 [22-00]  127
  Double Precision  1 [63]  11 [62-52]  52 [51-00]  1023

          The sign bit is as simple as it gets. Zero denotes a positive number; one denotes a negative number. Flipping the value of this bit flips the sign of the number. 
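          In code, flipping the sign amounts to toggling bit 31. A small Java illustration (not taken from the applet):

    public class SignFlipDemo {
        public static void main(String[] args) {
            float x = 3.5f;
            // XOR with the sign-bit mask (bit 31) negates the number.
            float y = Float.intBitsToFloat(Float.floatToIntBits(x) ^ 0x80000000);
            System.out.println(y); // prints "-3.5"
        }
    }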
          The exponent field needs to represent both positive and negative exponents. To do this, a bias is added to the actual exponent in order to get the stored exponent. For IEEE single-precision floats, this value is 127. Thus, an exponent of zero means that 127 is stored in the exponent field. A stored value of 200 indicates an exponent of (200-127), or 73. For reasons discussed later, exponents of -127 (all zeros) and +128 (all ones) are reserved for special numbers. For double precision, the exponent field is 11 bits, and has a bias of 1023. 
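          The bias arithmetic can be checked directly in Java. The following small sketch (illustrative, not the applet source) reads the stored exponent of a float via Float.floatToIntBits and removes the bias of 127:

    public class BiasDemo {
        public static void main(String[] args) {
            float x = 8.0f;                     // 8.0 = 1.0 x 2^3
            int bits = Float.floatToIntBits(x); // raw IEEE 754 bit pattern
            int stored = (bits >> 23) & 0xFF;   // exponent field, bits [30-23]
            int actual = stored - 127;          // remove the bias
            System.out.println(stored + " -> " + actual); // prints "130 -> 3"
        }
    }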
          The mantissa, also known as the significand, represents the precision bits of the number. Any number can be expressed in scientific notation in many different ways. For example, the number five can be represented as any of these: 
                  5.00 x 10^0   or   0.05 x 10^2   or   5000 x 10^-3
          Because of this, floating-point numbers are stored in normalized form. This basically puts the radix point after the first non-zero digit. In normalized form, five is represented as 5.0 x 10^0. A nice little optimization is available to us with a base of two, since the only non-zero digit possible is one. We can toss away the one and just assume that it exists, giving us one extra bit of precision for free. Thus, the mantissa has effectively 24 bits of resolution.
             So, to sum up:
             1. The sign bit is 0 for positive, 1 for negative.
             2. The exponent's base is two.
             3. The exponent field contains 127 plus the true exponent for single precision, or 1023 plus the true exponent for double precision.
             4. The first bit of the mantissa is assumed to be 1 and is not stored explicitly (a decoding sketch follows below).
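
          As a check on points 1 through 4, the following Java sketch (illustrative only, not the applet source) takes a single-precision float apart into its three fields and restores the hidden leading 1:

    public class DecodeFloat {
        public static void main(String[] args) {
            float x = -6.25f;
            int bits = Float.floatToIntBits(x);
            int sign     = (bits >>> 31) & 0x1;  // bit [31]
            int exponent = (bits >>> 23) & 0xFF; // bits [30-23], biased by 127
            int mantissa = bits & 0x7FFFFF;      // bits [22-00]
            // Restore the implicit leading 1 (valid for normalized numbers only).
            double significand = 1.0 + mantissa / (double) (1 << 23);
            double value = (sign == 0 ? 1 : -1) * significand
                         * Math.pow(2, exponent - 127);
            System.out.println(value); // prints "-6.25"
        }
    }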

          The picture in the upper right corner is a block diagram of an arithmetic unit dedicated to floating-point addition. The corresponding steps of the algorithm are presented in the picture above. First, the exponent of one operand is subtracted from the other using the small ALU to determine which is larger and by how much. This difference controls the three multiplexors; from left to right, they select the larger exponent, the significand of the smaller number, and the significand of the larger number. The significand of the smaller number is shifted right, and then the two significands are added together using the big ALU. The normalization step then shifts the sum left or right and decrements or increments the exponent. Rounding then creates the final result.
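          The same align/add/normalize sequence can be sketched in Java on the raw bit patterns. This is a deliberately simplified illustration, assuming both inputs are positive and normalized with a small exponent difference; rounding is replaced by truncation, whereas the real unit (and the applet) rounds properly:

    public class FloatAddSketch {
        // Simplified: adds two positive, normalized floats, truncating instead of rounding.
        static float add(float a, float b) {
            int ba = Float.floatToIntBits(a), bb = Float.floatToIntBits(b);
            int ea = (ba >>> 23) & 0xFF,   eb = (bb >>> 23) & 0xFF;
            long ma = (ba & 0x7FFFFF) | 0x800000; // restore the hidden leading 1
            long mb = (bb & 0x7FFFFF) | 0x800000;
            // Step 1: the small ALU compares exponents; shift the smaller
            // significand right (shift counts > 63 are not handled here).
            int diff = ea - eb;
            int e = Math.max(ea, eb);
            if (diff > 0) mb >>= diff; else ma >>= -diff;
            // Step 2: the big ALU adds the aligned significands.
            long m = ma + mb;
            // Step 3: normalize - shift right / increment the exponent on overflow...
            while (m >= (1L << 24)) { m >>= 1; e++; }
            // ...or shift left / decrement the exponent if the leading 1 was lost.
            while (m < (1L << 23) && m != 0) { m <<= 1; e--; }
            // Step 4 (rounding) is omitted; truncation stands in for it here.
            return Float.intBitsToFloat((e << 23) | (int) (m & 0x7FFFFF));
        }

        public static void main(String[] args) {
            System.out.println(add(1.5f, 2.25f)); // prints "3.75"
        }
    }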

References: 
           IEEE Computer Society (1985), IEEE Standard for Binary Floating-Point Arithmetic, IEEE Std 754-1985. 
           http://stdsbbs.IEEE.org


Here you can also find the source code as a zip file.

If you have trouble seeing this applet, try a newer version of Netscape or Internet Explorer. This applet was compiled with Java 1.2.1.

If you have problems running the applet in your browser, please try the original page in your appletviewer.


Author: Bogdan Stanca