Fiscal Year 2018



Course number: CSC.T433 School of Computing, Graduate major in Computer Science

# Advanced Computer Architecture

# 9. Instruction Level Parallelism: Exploiting ILP Using Multiple Issue and Speculation

www.arch.cs.titech.ac.jp/lecture/ACA/ Room No.W936 Mon 13:20-14:50, Thu 13:20-14:50

Kenji Kise, Department of Computer Science kise \_at\_ c.titech.ac.jp

CSC.T433 Advanced Computer Architecture, Department of Computer Science, TOKYO TECH

# Hardware register renaming

- Logical registers (architectural registers) are the ones defined by the ISA
  - \$0, \$1, ... \$31
- Physical registers
  - Assuming plenty of registers are available, p0, p1, p2, ...
- A processor renames (converts) each logical register to a unique physical register dynamically
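The renaming step above can be sketched as follows. This is a minimal illustration, not the mechanism of any particular processor: the function name `rename`, the tuple layout, and the sequential allocation starting at p0 are all assumptions made for the example.

```python
from itertools import count


def rename(instructions):
    """Rename logical registers to physical registers.

    Each instruction is (op, dest, src1, src2) with logical names such as "$1".
    Assumption: physical registers are plentiful and handed out as p0, p1, ...
    """
    next_phys = count(0)
    mapping = {}      # logical register -> physical register holding its latest value
    renamed = []

    def lookup(reg):
        # A source that has not been written yet gets a register on first use.
        if reg not in mapping:
            mapping[reg] = f"p{next(next_phys)}"
        return mapping[reg]

    for op, dest, src1, src2 in instructions:
        # Read the sources BEFORE remapping the destination,
        # so "add $9,$9,$3" reads the old $9 and writes a new one.
        s1, s2 = lookup(src1), lookup(src2)
        mapping[dest] = f"p{next(next_phys)}"
        renamed.append((op, mapping[dest], s1, s2))
    return renamed


program = [
    ("sub", "$9", "$1", "$2"),
    ("add", "$9", "$9", "$3"),   # writes $9 again
    ("or",  "$1", "$4", "$5"),   # overwrites $1, which the sub reads
]
renamed_program = rename(program)
```

Every destination receives a fresh physical register, so the second write to `$9` and the overwrite of `$1` no longer conflict with earlier instructions.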

Typical instruction pipeline of scalar processor

| IF | ID | EX | MEM | WB |
|----|----|----|-----|----|

Typical instruction pipeline of high-performance superscalar processor

| IF | ID | Renaming | Dispatch | Issue | Execute | Commit | Retire |
|----|----|----------|----------|-------|---------|--------|--------|

# Out-of-order execution

- In the in-order execution model, all instructions are executed in the order in which they appear. This can lead to unnecessary stalls.
  - Instruction (3) stalls waiting for insn (2) to go first, even though it has no data dependence on it.
- Register renaming eliminates output dependences and antidependences, leaving only true data dependences
- With out-of-order execution, insn (3) is allowed to execute before insn (2)
  - Scoreboarding (CDC 6600 in 1964)
  - Tomasulo algorithm (IBM System/360 Model 91 in 1967)
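The three kinds of dependence mentioned above can be told apart with a few comparisons. The sketch below is illustrative only: the function name `dependences`, the `(dest, sources)` tuple layout, and the register numbers are hypothetical.

```python
def dependences(earlier, later):
    """Return the kinds of dependence from `earlier` to `later`.

    Each instruction is (dest, sources), where sources is a tuple of registers.
    """
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    kinds = set()
    if e_dest in l_srcs:
        kinds.add("true (RAW)")      # later reads what earlier writes
    if l_dest in e_srcs:
        kinds.add("anti (WAR)")      # later overwrites what earlier reads
    if l_dest == e_dest:
        kinds.add("output (WAW)")    # both write the same register
    return kinds


# Before renaming: the second insn overwrites $2, which the first one reads.
before = [("$1", ("$2", "$3")), ("$2", ("$4", "$5"))]
# After renaming: each destination has its own physical register.
after = [("p8", ("p2", "p3")), ("p9", ("p4", "p5"))]

deps_before = dependences(before[0], before[1])
deps_after = dependences(after[0], after[1])
```

In this example the antidependence disappears after renaming, so the two instructions may execute in either order.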

Figure: data flow graph of insns (2), (3), and (4)

# The key idea for OoO execution (1/3)

- In-order front-end, OoO execution core, in-order retirement using instruction window and reorder buffer (ROB)




- `I1: sub p9,p1,p2`
- `I2: add p10,p9,p3`
- `I3: or p11,p4,p5`
- `I4: and p12,p10,p11`
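For the four instructions I1 to I4, the order allowed by true data dependences alone can be computed as below. This is a sketch under stated assumptions: every instruction takes one cycle, functional units are unlimited, and all source registers not produced by I1 to I4 are ready in cycle 1. The name `dataflow_schedule` is hypothetical.

```python
def dataflow_schedule(instructions, latency=1):
    """Earliest execution cycle of each instruction, limited only by true dependences.

    Each instruction is (name, dest, sources). Assumes unlimited functional units.
    """
    ready_at = {}   # register -> first cycle in which its value can be used
    start = {}
    for name, dest, srcs in instructions:
        # Registers with no in-flight producer are assumed ready in cycle 1.
        start[name] = max((ready_at.get(s, 1) for s in srcs), default=1)
        ready_at[dest] = start[name] + latency
    return start


window = [
    ("I1", "p9",  ("p1", "p2")),
    ("I2", "p10", ("p9", "p3")),
    ("I3", "p11", ("p4", "p5")),
    ("I4", "p12", ("p10", "p11")),
]
schedule = dataflow_schedule(window)
```

Under these assumptions I3 can execute in the same cycle as I1, ahead of I2, because it depends on neither of them.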





# The key idea for OoO execution (2/3)

- In-order front-end, OoO execution core, in-order retirement using instruction window and reorder buffer (ROB)



# The key idea for OoO execution (3/3)

- In-order front-end, OoO execution core, in-order retirement using instruction window and reorder buffer (ROB)



# Instruction pipeline of OoO execution processor

- Allocating instructions to the instruction window is called dispatch
- Issue (or fire) wakes up instructions, and their execution begins
- In the commit stage, the computed values are written back to the ROB
- The last stage is called retire or graduate. The result is written back to the register file (architectural register file) using a logical register number.
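The commit and retire stages described above can be sketched with a small reorder buffer model. It follows the terminology of this lecture (commit writes to the ROB, retire writes to the register file). The class `ReorderBuffer`, its method names, and the dictionary-based entries are assumptions made for illustration.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()     # program order, oldest entry on the left
        self.register_file = {}    # architectural state, keyed by logical register

    def dispatch(self, name, logical_dest):
        """Allocate an entry in program order."""
        entry = {"name": name, "dest": logical_dest, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def commit(self, entry, value):
        """Write a computed value back to the ROB entry (may happen out of order)."""
        entry["value"] = value
        entry["done"] = True

    def retire(self):
        """Retire completed instructions from the head, strictly in program order."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.register_file[entry["dest"]] = entry["value"]
            retired.append(entry["name"])
        return retired


rob = ReorderBuffer()
e1 = rob.dispatch("I1", "$9")
e2 = rob.dispatch("I2", "$10")
rob.commit(e2, 7)              # I2 finishes first
first = rob.retire()           # nothing retires: I1 is still the oldest and not done
rob.commit(e1, 3)
second = rob.retire()          # now both retire, in program order
```

Even though I2 completes before I1 here, the register file is updated in program order.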

### In-order front-end



# Exercise: OoO execution

- Draw the cycle-by-cycle processing behavior of these 12 instructions
  - wakeup
  - select
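Wakeup and select can be sketched as a loop over the instruction window. The 12 instructions of the exercise are not reproduced here, so the example reuses the four-instruction sequence I1 to I4 from the earlier slide. The function name `simulate`, the one-cycle latency, the oldest-first select policy, and the `issue_width` parameter are assumptions.

```python
def simulate(window, ready_regs, issue_width=2):
    """Return {cycle: [names of instructions issued in that cycle]}.

    Each instruction is (name, dest, sources), listed oldest first.
    Assumes one-cycle latency: a result is usable in the following cycle.
    """
    ready = set(ready_regs)
    waiting = list(window)
    schedule = {}
    cycle = 1
    while waiting:
        # Wakeup: an instruction is awake when all of its sources are ready.
        awake = [insn for insn in waiting if all(s in ready for s in insn[2])]
        # Select: choose up to issue_width awake instructions, oldest first.
        selected = awake[:issue_width]
        if not selected:
            raise ValueError("no instruction can wake up")
        schedule[cycle] = [insn[0] for insn in selected]
        for insn in selected:
            waiting.remove(insn)
        # Destinations become ready for wakeup in the next cycle.
        ready.update(insn[1] for insn in selected)
        cycle += 1
    return schedule


window = [
    ("I1", "p9",  ("p1", "p2")),
    ("I2", "p10", ("p9", "p3")),
    ("I3", "p11", ("p4", "p5")),
    ("I4", "p12", ("p10", "p11")),
]
initially_ready = {"p1", "p2", "p3", "p4", "p5"}
two_wide = simulate(window, initially_ready, issue_width=2)
one_wide = simulate(window, initially_ready, issue_width=1)
```

With an issue width of 2, I3 is selected together with I1 in the first cycle. With an issue width of 1, the oldest-first policy happens to issue the four instructions in program order.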



#### Cycle 1



#### Cycle 2

| Instruction window | Issue | Execute | Commit | Retire |
|--------------------|-------|---------|--------|--------|
|                    |       |         | ROB    |        |

#### Cycle 3

| Instruction window | Issue | Execute | Commit | Retire |
|--------------------|-------|---------|--------|--------|
|                    |       |         |        |        |
|                    |       |         |        |        |
|                    |       |         | ROB    |        |

#### Cycle 4

| Instruction window | Issue | Execute | Commit | Retire |
|--------------------|-------|---------|--------|--------|
|                    |       |         | ROB    |        |

#### Cycle 5

| Instruction window | Issue | Execute | Commit | Retire |
|--------------------|-------|---------|--------|--------|
|                    |       |         | ROB    |        |

#### Cycle 6



#### Cycle 7



#### Cycle 8



#### Cycle 9

| Instruction window | Issue | Execute | Commit | Retire |
|--------------------|-------|---------|--------|--------|
|                    |       |         | ROB    |        |

#### Cycle 10


| Instruction window | Issue | Execute | Commit | Retire |
|--------------------|-------|---------|--------|--------|
|                    |       |         | ROB    |        |


# Prediction miss and recovery

- Assume that instruction 3 is a mispredicted branch and its target is insn 20
- The register file (and PC) holds the architectural state after insn 3 is executed
- When insn 3 is retired, recover by flushing all in-flight instructions and restarting from the target



# MIPS R3000 Instruction Set Architecture (ISA)




# Branch prediction miss and aggressive recovery

- Instruction 3 is a mispredicted branch and its target is insn 20
- The register file (and PC) holds the architectural state after insn 3 is executed
- When insn 3 is executed, recover by flushing the instructions after insn 3 and restarting from the target
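The two recovery policies (flush when the branch retires, versus flush as soon as the branch executes) can be contrasted with a small sketch. The function names, the list of instruction numbers, and the returned dictionary are hypothetical. The model ignores timing and shows only which instructions are flushed and which remain in flight at the moment of recovery.

```python
def recover_at_retire(rob, branch, target):
    """Recover when the branch retires.

    By then every older instruction has retired, so everything still in
    flight is younger than the branch and is flushed.
    """
    idx = rob.index(branch)
    return {
        "retired": rob[:idx + 1],
        "flushed": rob[idx + 1:],
        "in_flight": [],
        "next_pc": target,
    }


def recover_at_execute(rob, branch, target):
    """Aggressive recovery: flush only the instructions after the branch.

    The branch and the older instructions stay in flight while fetch restarts.
    """
    idx = rob.index(branch)
    return {
        "retired": [],
        "flushed": rob[idx + 1:],
        "in_flight": rob[:idx + 1],
        "next_pc": target,
    }


in_flight = [1, 2, 3, 4, 5, 6]    # instruction numbers in program order
late = recover_at_retire(in_flight, branch=3, target=20)
early = recover_at_execute(in_flight, branch=3, target=20)
```

Both policies flush the same instructions. The aggressive policy redirects fetch to insn 20 without waiting for insns 1 to 3 to retire.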



# Aside: What is a window?

A window is a space in the wall of a building or in the side of a vehicle,
 which has glass in it so that light can come in and you can see out. (Collins)



Figure: the instructions to be executed for an application, seen through an instruction window and through a large instruction window


# Register dataflow

- In-flight instructions are those currently being processed in the processor



# Case 1: Register dataflow from a much earlier insn

- One source operand of insn I2 comes from a retired instruction, Ia.
- Because Ia is retired, its destination register has no renamed tag. The source register tag therefore cannot be renamed at the renaming stage and keeps the logical register tag \$3.
- Where does the operand \$3 come from?

```
Ia: add $3,$0,$0
I1: sub p9,$1,$2
I2: add p10,p9,$3
I3: or p11,$4,$5
I4: and p12,p10,p11
```



# Register renaming again

- A processor remembers a set of renamed logical registers.
- If \$1 and \$2 are not renamed in in-flight instructions, it uses \$1 and \$2 instead of p1 and p2.
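This refined renaming can be sketched with a table that holds only the logical registers written by in-flight instructions. The logical form of the instructions and the convention that `$n` is renamed to `pn` are assumptions chosen so that the output matches the tags used on these slides. The function name `rename_in_flight` is hypothetical.

```python
def rename_in_flight(instructions):
    """Rename destinations. A source is renamed only if an in-flight insn writes it.

    Each instruction is (op, dest, sources) using logical register names.
    """
    rename_table = {}   # logical register -> physical tag of its in-flight producer
    renamed = []
    for op, dest, srcs in instructions:
        # Sources without an in-flight producer keep their logical tag.
        new_srcs = tuple(rename_table.get(s, s) for s in srcs)
        # Assumption for illustration: $n is renamed to pn.
        phys = "p" + dest.lstrip("$")
        rename_table[dest] = phys
        renamed.append((op, phys, new_srcs))
    return renamed


logical = [
    ("sub", "$9",  ("$1", "$2")),
    ("add", "$10", ("$9", "$3")),
    ("or",  "$11", ("$4", "$5")),
    ("and", "$12", ("$10", "$11")),
]
result = rename_in_flight(logical)
```

A fuller model would also remove an entry from `rename_table` when its producer retires. That step is omitted here.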




# Case 2: Register dataflow from ROB

- Assume that one source operand of insn I5 comes from I2, which is not yet retired. The operand was generated a few clock cycles (sometimes tens of cycles) earlier.
- Because I2 is not retired, the RF does not have the operand. I2 is committed, so the operand is stored in the ROB.
- Where does the operand come from?



# Case 3: Register dataflow from ALUs

- Assume that one source operand of insn I5 comes from I4, which is not yet retired. The operand was generated in the previous clock cycle.
- Because I4 is not retired, the RF does not have the operand. Because I4 is not committed, the ROB does not have the operand either.
- Where does the operand come from?
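The three cases can be summarized as a lookup that checks the most recent source first. This is a sketch only: the function name `read_operand`, the dictionaries standing in for the ALU outputs, the ROB, and the register file, and the example values are all hypothetical.

```python
def read_operand(tag, alu_outputs, rob, register_file):
    """Return (source, value) for a source operand tag.

    alu_outputs:   values produced in the previous cycle, not yet committed (Case 3)
    rob:           values committed to the ROB but not yet retired (Case 2)
    register_file: architectural values written by retired instructions (Case 1)
    """
    if tag in alu_outputs:
        return "ALU", alu_outputs[tag]
    if tag in rob:
        return "ROB", rob[tag]
    return "RF", register_file[tag]


register_file = {"$3": 5}   # Case 1: the producer has retired
rob = {"p10": 8}            # Case 2: the producer is committed but not retired
alu_outputs = {"p12": 2}    # Case 3: the producer executed in the previous cycle

case1 = read_operand("$3", alu_outputs, rob, register_file)
case2 = read_operand("p10", alu_outputs, rob, register_file)
case3 = read_operand("p12", alu_outputs, rob, register_file)
```

In hardware these would be datapaths selected by a multiplexer rather than a sequence of lookups.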



# Datapath of OoO execution processor



# Pollack's Rule

 Pollack's Rule states that microprocessor "performance increase due to microarchitecture advances is roughly proportional to the square root of the increase in complexity". Complexity in this context means processor logic, i.e. its area.



(Wikipedia)
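As a rough worked form of the rule, write $P$ for performance and $A$ for the area spent on processor logic. These symbols are introduced here for illustration and do not appear in the quoted statement.

$$
\frac{P_{\text{new}}}{P_{\text{old}}} \approx \sqrt{\frac{A_{\text{new}}}{A_{\text{old}}}}
$$

Under this approximation, doubling the area gives about $\sqrt{2} \approx 1.41$ times the performance, and quadrupling the area gives about twice the performance.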