MASTER THESIS
DESIGN OF SINGLE PRECISION FLOAT ADDER (32-BIT NUMBERS)
ACCORDING TO IEEE 754 STANDARD USING VHDL
Arturo Barrabés Castillo
Bratislava, April 25th 2012
Supervisors: Dr. Roman Zálusky
Prof. Viera Stopjaková
Fakulta Elektrotechniky a Informatiky
Slovenská Technická Univerzita v Bratislave
INDEX
Index
Resum
Zhrnutie
Abstract
Chapter 1: Introduction
1.1. Floating Point Numbers
1.2. The Standard IEEE 754
1.2.1. Overview
1.2.2. Binary Interchange Format Encodings
1.2.3. Precision and Rounding
Chapter 2: Code Development
2.1. 32-bits Floating Point Adder Design
2.1.1. Addition/Subtraction Steps
2.1.2. Block Diagram
2.2. Blocks Design
2.2.1. Pre-Adder Design
2.2.2. Adder Design
2.2.3. Standardizing Design
Chapter 3: Pre-Adder
3.1. Special Cases
3.1.1. n_case Block
3.2. Subnormal Numbers
3.2.1. n_subn Block
3.3. Mixed Numbers
3.3.1. comp Block
3.3.2. zero Block
3.3.3. shift_left/shift Block
3.3.4. norm Block
3.4. Normal Numbers
3.4.1. comp_exp Block
3.4.2. shift Block
3.4.3. n_normal Block
3.5. Pre-Adder
3.5.1. selector Block
3.5.2. MUX/DEMUX Blocks
3.5.3. preadder Block
Chapter 4: Adder
4.1. Adder
4.1.1. Signout Block
4.1.2. Adder Block
4.1.3. Block_Adder Block
4.2. Standardizing Block
4.2.1. round Block
4.2.2. shift_left/zero Block
4.2.3. block_norm Block
4.2.4. vector Block
Chapter 5: 32-Bits Floating Point Adder
5.1. Floating Point Adder
5.1.1. Mux_fpadder Block
5.1.2. fpadder Block
5.2. Simulations
5.2.1. Special Cases
5.2.2. Normal Numbers
5.2.3. Subnormal Numbers
5.2.4. Mixed Numbers
Chapter 6: Results
6.1. Errors
6.1.1. Gap between Numbers
6.1.2. Rounding or Truncation
6.1.3. Floating Point Addition
6.2. Results analysis
6.2.1. Subnormal Numbers
6.2.2. Mixed Numbers
6.2.3. Normal Numbers
6.3. Conclusions
Chapter 7: Bibliography
Annex: VHDL Code
RESUM
Floating point arithmetic is, by far, the most widely used method of
approximating real number arithmetic for performing numerical calculations by
computer.
For a long time each machine had its own arithmetic: bases, significand and
exponent sizes, formats, etc. Each manufacturer implemented its own model, which
hindered portability between different machines, until the IEEE 754 standard
appeared and defined a single standard for everyone.
The aim of this project is, starting from the IEEE 754 standard, to implement a
32-bit binary floating point adder/subtractor using the hardware programming
language VHDL.
ZHRNUTIE
Working with floating point numbers is the most widely used way of performing
arithmetic calculations with real numbers on modern computers. Until recently,
each computer used different types of formats: base, sign, exponent size, etc.
Each company implemented its own format, which prevented its transfer to other
platforms until the unified IEEE 754 standard was defined. The aim of this work
is to implement a 32-bit adder/subtractor operating on floating point numbers
according to the IEEE 754 standard, using the hardware description language
VHDL.
ABSTRACT
Floating point arithmetic is by far the most widely used way of approximating
real number arithmetic for performing numerical calculations on modern computers.
For a long time, each computer had a different arithmetic: bases, significand
and exponent sizes, formats, etc. Each company implemented its own model, which
hindered portability between different machines until the IEEE 754 standard
appeared, defining a single, universal standard.
The aim of this project is to implement a 32-bit binary floating point
adder/subtractor according to the IEEE 754 standard, using the hardware
description language VHDL.
CHAPTER 1:
INTRODUCTION
Many fields of science, engineering and finance require manipulating real
numbers efficiently. Since the first computers appeared, many different ways of
approximating real numbers on them have been introduced.
One of them, floating point arithmetic, is clearly the most efficient way of
representing real numbers in computers. Representing an infinite, continuous set
(the real numbers) with a finite set (the machine numbers) is not an easy task:
compromises must be found between speed, accuracy, ease of use and
implementation, and memory cost.
Floating point arithmetic represents a very good compromise for most numerical
applications.
1.1. Floating Point Numbers
The floating point representation is based on scientific notation: the radix
point is not set in a fixed position in the bit sequence; instead, its position
is indicated by a power of the base.
Every floating point number is described by three stored fields, plus a base
that is implied:
• Sign: it indicates the sign of the number (0 for positive, 1 for negative)
• Mantissa: it holds the significant digits of the number
• Exponent: it contains the power to which the base is raised, stored in
biased form
• Base: the base (or radix) is implied and is common to all the numbers (2
for binary numbers)
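The components listed above can be made concrete with a small sketch. It assumes the single-precision layout of IEEE 754 (1 sign bit, 8 exponent bits with a bias of 127, and 23 mantissa bits) and uses Python only for illustration. The helper names `decompose` and `rebuild` are hypothetical and are not part of the design described in this thesis.

```python
import struct

def decompose(x: float):
    """Round x to 32-bit single precision and split it into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF        # 23 bits, fraction of the significand
    return sign, exponent, mantissa

def rebuild(sign: int, exponent: int, mantissa: int) -> float:
    """Value of a normal number: (-1)^sign * 1.mantissa * 2^(exponent - 127).

    Assumption: the fields describe a normal number, so the implicit
    leading 1 applies (zero, subnormals, infinities and NaN are excluded).
    """
    return (-1.0) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)

fields = decompose(-6.5)
print(fields)            # (1, 129, 5242880)
print(rebuild(*fields))  # -6.5
```

Here -6.5 equals -1.625 × 2^2, so the sign is 1, the stored exponent is 2 + 127 = 129 and the mantissa holds the fraction 0.625.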
Because nothing constrained how these components were chosen, each designer
built their own floating point system. For example, Konrad Zuse made the first
modern implementation of floating point arithmetic in a computer he had built
(the Z3), using a radix-2 number system with 14-bit significands, 7-bit
exponents and a 1-bit sign. On the other hand, the PDP-10 and the Burroughs 570
used radix-8, and the IBM 360 had radix-16 floating point arithmetic.
This led to the need for a standard that would define a clear and concise
format to be used by all developers.
1.2. The Standard IEEE 754
The first question that comes to mind is “What is the IEEE?”. The Institute of
Electrical and Electronics Engineers (IEEE) is a non-profit professional
association dedicated to advancing technological innovation and excellence.
Its origins go back to 1884, when the AIEE (American Institute of Electrical
Engineers) was founded. The IEEE itself was formed in 1963, when the AIEE merged
with the IRE (Institute of Radio Engineers).
One of its many functions is to act as a leading standards development
organization, producing industrial standards in a broad range of disciplines
such as telecommunications, consumer electronics and nanotechnology.
IEEE 754 is one of these standards.
1.2.1. Overview
The IEEE 754 standard specifies formats and methods for floating point
arithmetic.
These methods for computing with floating point numbers yield the same result
regardless of whether the processing is done in hardware, in software or in a
combination of the two.
The standard specifies:
• Formats for binary and decimal floating point data, for computation and
data interchange
• Operations such as addition, subtraction, multiplication and others
• Conversions between integer and floating point formats, in both
directions
• Properties to be satisfied when rounding numbers during arithmetic and
conversions
• Floating point exceptions and their handling, including special values
(NaN, ±∞ or zero)
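As a brief illustration of the last item, the sketch below shows how the special values behave under addition in an IEEE 754 environment. It is written in Python, whose floats are double precision rather than the 32-bit format treated in this thesis, on the assumption that the rules for these special values are the same in both formats. The helper `describe` is a hypothetical name used only to label the results.

```python
import math

inf = float("inf")
nan = float("nan")

def describe(x: float) -> str:
    """Label a result as NaN, a signed infinity, a signed zero or a plain number."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "+inf" if x > 0 else "-inf"
    if x == 0.0:
        # copysign exposes the sign bit that the == comparison ignores
        return "-0" if math.copysign(1.0, x) < 0 else "+0"
    return repr(x)

print(describe(inf + 1.0))     # +inf : infinity absorbs finite operands
print(describe(inf + (-inf)))  # NaN  : the result is not defined
print(describe(nan + 1.0))     # NaN  : NaN propagates through the operation
print(describe(-0.0 + 0.0))    # +0   : zeros of opposite sign (round to nearest)
print(describe(-0.0 + -0.0))   # -0   : the common sign is kept
```

An adder has to detect these operand combinations before the ordinary addition path is taken, because none of them follows the usual rule of aligning the exponents and adding the mantissas.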