SLIDE Week 6 – Lecture 6 – Arithmetic processing unit implementation Kiến Trúc Máy Tính | Trường Đại học Công nghệ, Đại học Quốc gia Hà Nội

SLIDE Week 6 – Lecture 6 – Arithmetic processing unit implementation Kiến Trúc Máy Tính | Trường Đại học Công nghệ, Đại học Quốc gia Hà Nội . Tài liệu được sưu tầm và biên soạn dưới dạng PDF gồm 32 trang giúp bạn tham khảo, củng cố kiến thức và ôn tập đạt kết quả cao trong kỳ thi sắp tới. Mời bạn đọc đón xem! 

Môn:
Thông tin:
32 trang 4 tuần trước

Bình luận

Vui lòng đăng nhập hoặc đăng ký để gửi bình luận.

SLIDE Week 6 – Lecture 6 – Arithmetic processing unit implementation Kiến Trúc Máy Tính | Trường Đại học Công nghệ, Đại học Quốc gia Hà Nội

SLIDE Week 6 – Lecture 6 – Arithmetic processing unit implementation Kiến Trúc Máy Tính | Trường Đại học Công nghệ, Đại học Quốc gia Hà Nội . Tài liệu được sưu tầm và biên soạn dưới dạng PDF gồm 32 trang giúp bạn tham khảo, củng cố kiến thức và ôn tập đạt kết quả cao trong kỳ thi sắp tới. Mời bạn đọc đón xem! 

10 5 lượt tải Tải xuống
ELT3047 Computer Architecture
Hoang Gia Hung
Faculty of Electronics and Telecommunications
University of Engineering and Technology, VNU Hanoi
Lecture 6: Arithmetic processing unit
implementation
Last lecture review
Demonstration of design principles for MIPS ISA
Storage model
Addressing mode
Instruction encoding
Compiler support* (self-study)
MIPS instruction set
Arithmetic & logical instructions
Data transfer instructions
Control instructions
if-then, loops
Procedure call
Brief introduction to compiler technology
Todays lecture: implementation of MIPS ALU
First part of the MIPS datapath in incremental featurism approach.
Operations on integers
Addition and subtraction
Multiplication and division
Dealing with overflow
Operations on real numbers
Floating point operations
Rounding error
Arithmetic for multimedia (not covered in this course)
Performing simultaneous operations on short vectors, e.g. 128-bit adder
= four 32-bit adds. Also called data-level parallelism (DLD), vector
parallelism, or Single Instruction, Multiple Data (SIMD)
Saturating operations (e.g. clipping in audio, saturation in video)
Arithmetic for Computers
Digits are added bit by bit from right to left, with carries
passed to the next digit to the left
Example: 7 + 6
Subtraction: the appropriate operand is simply negated before
being added, any carry-out is discarded.
Example: 7 6 = 7 + (6)
+7: 0111
6: 1010
+1: 1 0001 (discard carry-out)
Integer Addition and Subtraction
Occurs if result out of signed integer range
Example: 7 - (-6) = 7 + 6
+7: 0111
+6: 0110
+13: 1101 = -8 + 4 + 1 = -3 (two’s complement)
Occurs when adding operands with the same sign, or subtracting
operands with different signs.
Detection: checking the MSB of two operands and the answer (3-bit
comp.) or checking Carry-in and Carry-Out from MSB’s (2-bit comp.).
Dealing with Overflow.
Some languages (e.g., C) ignore overflow
Other languages (e.g., Ada, Fortran) require raising an exception the
computer jumps to a predefined address to invoke the appropriate routine.
Overflow
Half adder has 2 inputs, in
principle HA is same as
FA, with C
in
set to 0.
Each FA has a finite delay
= 2 gate delays ()
1-Bit Adders
A B
0 0
0 1
1 0
1 1
0
1
1
0
SC
out
0
0
0
1
S = A B
C
out
= AB
Half
Adder
A B
S
C
out
+
A B
0 0
0 1
1 0
1 1
0
1
1
0
SC
out
0
0
0
1
S = A B C
in
C
out
= AB + AC
in
+ BC
in
Full
Adder
C
in
0 0
0 1
1 0
1 1
0
0
0
0
1
1
1
1
1
0
0
1
0
1
1
1
A B
S
C
out
C
in
+
A
B
C
in
C
out
S

Types of carry propagate adders (CPAs):
Ripple-carry (slow)
Carry-select (fast)
Carry-lookahead (faster)
Trade-off: faster adders require more hardware
Multibit Adders
A B
S
C
out
C
in
+
N
N
N
Chain 1-bit adders together
All input bits available at the same time
Size/complexity: O(n) ( 

)
Carry ripples through entire chain
Carry ripple through all FAs from right to left
Critical Path Delay: O(n) (

)
Ripple-Carry Adder
S
31
A
30
B
30
S
30
A
1
B
1
S
1
A
0
B
0
S
0
C
30
C
29
C
1
C
0
C
out
+
+
+
+
A
31
B
31
C
in
Principle: speculative C
in
compute both, select one
3 adders operate in parallel
Size/complexity: O(n) (example: 



)
Critical Path Delay: O(n
0.5
) (example: 



n = 16: )
Carry-Select Adder
Principles
Generate a carry out if and are both 1,

does not matter.
Propagate a carry in to the carry out if or is 1.
Local decisions based on and only: generate , propagate
.
Carry Generate and Propagate
Compute carry out (C
out
) for -bit blocks in parallel using
generate and propagate signals:








We can compute 
in O(log n) gate delay and O(n
2
) size only
manageable for small n
Given 
we can compute
for a constant additional delay.
Carry Look Ahead Adder
C0
A
3
B
3
CLL (carry look-ahead logic)
C3
C4
p
3
g
3
A
2
B
2
C2
p
2
g
2
A
1
B
1
C1
p
1
g
1
A
0
B
0
p
0
g
0
S
3
S
2
S
1
S
0
Unsigned integer multiplication
Paper and pencil example
Multiplicand 1000 = 8
Multiplier x 1001 = 9
1000
0000
0000
1000
Product 1001000 = 72
Observations
m-bit multiplicand x n-bit multiplier = (m+n)-bit product
Accomplished via shifting and addition.
Consume more time and more chip area than addition
Binary multiplication is easy
0 x multiplicand = 0
1 x multiplicand = multiplicand
Sequential Unsigned Multiplication
Initially 0
If each step took a clock cycle, this algorithm would require almost 100 clock
cycles to multiply two 32-bit numbers.
1000
× 1001
1000
+ 0
1000
10000
× 100
1000
100000
× 10
1000
1000000
× 1
1000000
+ 1000
1001000
iteration
Refined Algorithm & Hardware
HI LO
shift right
write
LO[0]
carry
add
Yes
No
= 0= 1
Start
HI = 0, LO = Multiplier
LO[0]?
HI = HI + Multiplicand
Shift right (Carry, HI, LO) 1 bit
32
nd
repetition?
Done
Perform shifts in parallel
ALU produces 64-bit result + Carry bit
Final Product = HI and LO registers
Initialize LO = Multiplier
Refined algorithm: Example
Consider: 1100
2
x 1101
2
4-bit multiplicand and multiplier 4-bit adder produces a 5-bit sum (with
carry)
Product = 10011100
2
Signed Integer Multiplication (p & p)
Case 1: Positive Multiplier
Multiplicand 1100 = -4
Multiplier x 0101 = +5
11111100
111100
Product 11101100 = -20
Case 2: Negative Multiplier
Multiplicand 1100 = -4
Multiplier x 1101 = -3
11111100
111100
00100 (2's complement of 1100)
Product 00001100 = +12
Sign-extension
Sign-extension
Signed Integer Multiplier Hardware
Yes
No
= 0= 1
Start
HI = 0, LO = Multiplier
LO[0]?
First 31 iterations: HI = HI + Multiplicand
Last iteration: HI = HI - Multiplicand
Shift right (sign, HI, LO) 1 bit
32
nd
repetition?
Done
HI LO
shift right
write
LO[0]
sign
Add/sub
Perform shifts in parallel
ALU produces 64-bit result + Sign bit
Sign bit set as follows
No overflow Extend sign-bit of result
Overflow Invert sign bit of result
Signed Multiplication: Example
Consider: 1010
2
x 1111
2
Iteration 1: No overflow Extend sign-bit (1 1)
Iteration 2,3: Overflow Invert sign bit (0 1)
Last iteration: No overflow (why?), add 2's complement of Multiplicand
Product = 00001100
2
Faster Integer Multiplier
Uses multiple adders
Cost/performance tradeoff
Can be pipelined
Several multiplication performed in parallel
MIPS multiplication
Two 32-bit registers for product
HI: most-significant 32 bits
LO: least-significant 32-bits
Instructions
mult rs, rt / multu rs, rt
64-bit product in HI/LO
mfhi rd / mflo rd
Move from HI/LO to rd
Can test HI value to see if product overflows 32 bits
mul rd, rs, rt
Least-significant 32 bits of product > rd
| 1/32

Preview text:

ELT3047 Computer Architecture
Lecture 6: Arithmetic processing unit implementation Hoang Gia Hung
Faculty of Electronics and Telecommunications
University of Engineering and Technology, VNU Hanoi Last lecture review
❑ Demonstration of design principles for MIPS ISA ➢ Storage model ➢ Addressing mode ➢ Instruction encoding
➢ Compiler support* (self-study) ❑ MIPS instruction set
➢ Arithmetic & logical instructions ➢ Data transfer instructions ➢ Control instructions ➢ if-then, loops ➢ Procedure call
❑ Brief introduction to compiler technology
Today’s lecture: implementation of MIPS ALU
➢ First part of the MIPS datapath in incremental featurism approach. Arithmetic for Computers ❑ Operations on integers ➢ Addition and subtraction
➢ Multiplication and division ➢ Dealing with overflow ❑ Operations on real numbers ➢ Floating point operations ➢ Rounding error
❑ Arithmetic for multimedia (not covered in this course)
➢ Performing simultaneous operations on short vectors, e.g. 128-bit adder
= four 32-bit adds. Also called data-level parallelism (DLD), vector
parallelism, or Single Instruction, Multiple Data (SIMD)
Saturating operations (e.g. clipping in audio, saturation in video)
Integer Addition and Subtraction
❑ Digits are added bit by bit from right to left, with carries
passed to the next digit to the left ➢ Example: 7 + 6
❑ Subtraction: the appropriate operand is simply negated before
being added, any carry-out is discarded.
Example: 7 – 6 = 7 + (–6) +7: 0111 –6: 1010 +1: 1 0001 (discard carry-out) Overflow
❑ Occurs if result out of signed integer range
Example: 7 - (-6) = 7 + 6 +7: 0111 +6: 0110
+13: 1101 = -8 + 4 + 1 = -3 (two’s complement)
➢ Occurs when adding operands with the same sign, or subtracting
operands with different signs.
Detection: checking the MSB of two operands and the answer (3-bit
comp.) or checking Carry-in and Carry-Out from MSB’s (2-bit comp.). ❑ Dealing with Overflow.
➢ Some languages (e.g., C) ignore overflow
➢ Other languages (e.g., Ada, Fortran) require raising an exception → the
computer jumps to a predefined address to invoke the appropriate routine. 1-Bit Adders Half Full Cin 𝜏 2𝜏 Adder Adder A B A B A C C C C out out + out in + ⚫ S S B A B C S C A B C S out in out S 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 1 1 0 0 1 0 1 0 0 1 1 1 1 0 0 1 1 1 0 1 0 0 0 1 S = A  B 1 0 1 1 0 ❖ Half adder has 2 inputs, in C = AB 1 1 0 1 0 out 1 1 1 1 1 principle HA is same as FA, with C set to 0. in S = A  B  Cin ❖ Each FA has a finite delay C = AB + AC + BC out in in = 2 gate delays (2𝜏) Multibit Adders A B N N C C out in + N S
❑ Types of carry propagate adders (CPAs): ➢ Ripple-carry (slow) ➢ Carry-select (fast) ➢ Carry-lookahead (faster)
❑ Trade-off: faster adders require more hardware Ripple-Carry Adder A B A B A B A B 31 31 30 30 1 1 0 0 C C out + C + C C + C + in 30 29 1 0 S S S S 31 30 1 0
❑ Chain 1-bit adders together
➢ All input bits available at the same time
➢ Size/complexity: O(n) (= 𝑛𝐹𝐴𝑠𝑖𝑧𝑒)
❑ Carry ripples through entire chain
➢ Carry ripple through all FAs from right to left
➢ Critical Path Delay: O(n) (= 𝑛𝑡𝐹𝐴 = 2𝑛𝜏) Carry-Select Adder
❑ Principle: speculative C – compute both, select one in
➢ 3 adders operate in parallel
➢ Size/complexity: O(n) (example: 1.5𝑛𝐹𝐴𝑠𝑖𝑧𝑒 + mux𝑠𝑖𝑧𝑒)
➢ Critical Path Delay: O(n0.5) (example: 0.5𝑛𝑡𝐹𝐴 + mux𝑑𝑒𝑙𝑎𝑦 → n = 16: ≈ 16𝜏) Carry Generate and Propagate ❑ Principles
Generate a carry out if 𝑎 and 𝑏 are both 1, 𝐶𝑖𝑛 does not matter.
Propagate a carry in to the carry out if 𝑎 or 𝑏 is 1.
➢ Local decisions based on 𝑎 and 𝑏 only: generate 𝐺 = 𝑎𝑏, propagate 𝑃 = 𝑎 + 𝑏. Carry Look Ahead Adder A B A B A B 3 3 2 2 1 1 A B 0 0 S C3 S C2 C1 S 3 2 S1 0 C0 p g p g p g 3 3 2 2 1 1 p g 0 0 CLL (carry look-ahead logic) C4
❑ Compute carry out (C ) for 𝑘-bit blocks in parallel using out
generate and propagate signals: 𝑐1 = 𝐺0 + 𝑐0𝑃0
𝑐2 = 𝐺1 + 𝐺0𝑃1 + 𝑐0𝑃0𝑃1
𝑐3 = 𝐺2 + 𝐺1𝑃2 + 𝐺0𝑃1𝑃2 + 𝑐0𝑃0𝑃1𝑃2
𝑐4 = 𝐺3 + 𝐺2𝑃3 + 𝐺1𝑃2𝑃3 + 𝐺1𝑃1𝑃2𝑃3 + 𝑐0𝑃0𝑃1𝑃2𝑃3
➢ We can compute 𝑐𝑛 in O(log n) gate delay and O(n2) size → only manageable for small n
➢ Given 𝑐𝑛 we can compute 𝑠𝑛 for a constant additional delay.
Unsigned integer multiplication ❑ Paper and pencil example Multiplicand 1000 = 8 Multiplier x 1001 = 9 1000 Binary multiplication is easy 0000 0 x multiplicand = 0 0000
1 x multiplicand = multiplicand 1000 Product 1001000 = 72 ❑ Observations
➢ m-bit multiplicand x n-bit multiplier = (m+n)-bit product
➢ Accomplished via shifting and addition.
➢ Consume more time and more chip area than addition
Sequential Unsigned Multiplication 1000 10000 100000 1000000 × 1001 × 100 × 10 × 1 1000 1000000 + 0 + 1000 1000 1000 1000 1001000 iteration Initially 0 ❖
If each step took a clock cycle, this algorithm would require almost 100 clock
cycles to multiply two 32-bit numbers.
Refined Algorithm & Hardware Start ❑ Perform shifts in parallel HI = 0, LO = Multiplier
➢ ALU produces 64-bit result + Carry bit
➢ Final Product = HI and LO registers = 1 = 0
➢ Initialize LO = Multiplier LO[0]? HI = HI + Multiplicand
Shift right (Carry, HI, LO) 1 bit add No 32nd repetition? carry shift right Yes HI LO Done write LO[0] Refined algorithm: Example ❑ Consider: 1100 x 1101 2 2
➢ 4-bit multiplicand and multiplier → 4-bit adder produces a 5-bit sum (with carry) ➢ Product = 100111002
Signed Integer Multiplication (p & p)
❑ Case 1: Positive Multiplier Multiplicand 1100 = -4 Multiplier x 0101 = +5 11111100 Sign-extension 111100 Product 11101100 = -20
❑ Case 2: Negative Multiplier Multiplicand 1100 = -4 Multiplier x 1101 = -3 11111100 Sign-extension 111100 00100 (2's complement of 1100) Product 00001100 = +12
Signed Integer Multiplier Hardware Start ❑ Perform shifts in parallel HI = 0, LO = Multiplier
➢ ALU produces 64-bit result + Sign bit
Sign bit set as follows = 1 = 0 LO[0]?
No overflow Extend sign-bit of result
Overflow Invert sign bit of result
First 31 iterations: HI = HI + Multiplicand
Last iteration: HI = HI - Multiplicand
Shift right (sign, HI, LO) 1 bit Add/sub No 32nd repetition? sign shift right Yes HI LO Done write LO[0] Signed Multiplication: Example ❑ Consider: 1010 x 1111 2 2
➢ Iteration 1: No overflow Extend sign-bit (1 → 1)
➢ Iteration 2,3: Overflow Invert sign bit (0 → 1)
➢ Last iteration: No overflow (why?), add 2's complement of Multiplicand ➢ Product = 000011002 Faster Integer Multiplier ❑ Uses multiple adders ➢ Cost/performance tradeoff ❑ Can be pipelined
➢ Several multiplication performed in parallel MIPS multiplication
❑ Two 32-bit registers for product
➢ HI: most-significant 32 bits
➢ LO: least-significant 32-bits ❑ Instructions ➢ mult rs, rt / multu rs, rt ▪ 64-bit product in HI/LO ➢ mfhi rd / mflo rd ▪ Move from HI/LO to rd
▪ Can test HI value to see if product overflows 32 bits ➢ mul rd, rs, rt
▪ Least-significant 32 bits of product –> rd