Fiscal Year 2018



Course number: CSC.T433 School of Computing, Graduate major in Computer Science

# Advanced Computer Architecture

### 15. Final report

www.arch.cs.titech.ac.jp/lecture/ACA/ Room No.W936 Mon 13:20-14:50, Thr 13:20-14:50

Kenji Kise, Department of Computer Science kise \_at\_ c.titech.ac.jp

CSC.T433 Advanced Computer Architecture, Department of Computer Science, TOK TECH



- 1. Submit your final report in a PDF file via E-mail by February 17, 2019
- 2. Enjoy!

# 1. OoO execution and dynamic scheduling

- Draw the cycle by cycle processing behavior of these 10 instructions
- Modify this dataflow graph and draw another cycle by cycle processing behavior of the graph having 10 instructions



#### Cycle 1



#### Cycle 2

| Instruction window | Issue | Execute | Commit | Retire |
|--------------------|-------|---------|--------|--------|
|                    |       |         |        |        |
|                    |       |         |        |        |
|                    |       |         |        |        |
|                    |       |         | ROB    |        |

#### Cycle 3

| Instruction window | Issue | Execute | Commit | Retire |
|--------------------|-------|---------|--------|--------|
|                    |       |         |        |        |
|                    |       |         | ROB    |        |

#### Cycle 4

| Instruction window | Issue | Execute | Commit | Retire |
|--------------------|-------|---------|--------|--------|
|                    |       |         | ROB    |        |

#### Cycle 5

T

| Instruction window | Issue | Execute | Commit | Retire |
|--------------------|-------|---------|--------|--------|
|                    |       |         | ROB    |        |
|                    |       |         |        |        |

#### Cycle 6



#### Cycle 7

ī



#### Cycle 8

| Instruction window | Issue | Execute | Commit | Retire |
|--------------------|-------|---------|--------|--------|
|                    |       |         | ROB    |        |

#### Cycle 9

| Instruction window | Issue | Execute | Commit | Retire |
|--------------------|-------|---------|--------|--------|
|                    |       |         | ROB    |        |

#### Cycle 10

| Instruction window | Issue | Execute | Commit | Retire |
|--------------------|-------|---------|--------|--------|
|                    |       |         | ROB    |        |

CSC.T433 Advanced Computer Architecture, Department of Computer Science, TOKYO TECH

# 2. Register renaming of multi-instructions per cycle

- Design a register renaming unit which renames two instructions per cycle in Verilog HDL. Please download rename01.v and rename02.v from the support page and refer them.
- 2. Design a register renaming unit which renames three instructions per cycle in Verilog HDL.
- Please modify a module RENAME in rename02.v referring the design which renames one instruction per cycle in rename01.v.
- The renamed instruction sequences by rename01.v and rename02.v must be the same.
- The report should include a block diagram, a source code in Verilog HDL, and obtained waveforms of your design.

### Renaming two instructions per cycle for superscalar

Renaming instruction IO and I1



CSC.T433 Advanced Computer Architecture, Department of Computer Science, TOKYO TECH

# 3. Parallel programming (The free lunch is over)

- Describe an efficient parallel program for the following sequential program using LOCK(), UNLOCK() and BARRIER().
- Explain why your cord runs correctly and why your code is efficient.

```
/* the number of grids */
#define N 8
#define TOL 15.0 /* tolerance parameter */
float A[N+2], B[N+2];
void solve () {
    int i, done = 0;
    while (!done) {
        float diff = 0;
        for (i=1; i<=N; i++) { /* use A as input */</pre>
            B[i] = 0.333 * (A[i-1] + A[i] + A[i+1]);
        }
        for (i=1; i<=N; i++) { /* use B as input */</pre>
            A[i] = 0.333 * (B[i-1] + B[i] + B[i+1]);
            diff = diff + fabsf(B[i] - A[i]);
        }
        if (diff <TOL) done = 1;
        for (i=0; i<=N+1; i++) printf("%6.2f ", B[i]);</pre>
        printf("| diff=%6.2f¥n", diff); /* for debug */
    }
}
int main() {
    int i;
    for (i=1; i<N-1; i++) A[i] = B[i] = 100+i*i;
    solve();
```

main02.c

# 4. Building blocks for synchronization

- Fetch-and-increment reads an original value from memory and increments (adds one to) it in memory atomically
- Implement fetch-and-increment (FAI) using the load-linked/storeconditional instruction pair
  - Refer the discussion of implementing an atomic exchange EXCH
- Implement BARRIER() using FAI

# 5. Cache coherence protocols

- Select your favorite commercial multi-core processor
  - Describe the memory organization including caches and main memory
    - cache line size, write policy, write allocate/no-allocate, directmapped/set-associative, the number of caches (L1, L2, and L3?)
  - Describe the cache coherence protocol used there

# 6. Academic paper reading

- Select a paper from the list below and
  - Describe the problem that the authors try to solve,
  - Describe the key idea of the proposal,
  - Describe your opinion why the authors could solve the problem although there may be many researchers try to solve similar problems.
- List
  - Emulating Optimal Replacement with a Shepherd Cache, MICRO-40, 2008
  - The V-Way Cache: Demand Based Associativity via Global Replacement, ISCA'05, 2005
  - Skewed Compressed Caches, MICRO-47, 2014
  - Prophet/critic hybrid branch prediction, ISCA'04, 2004
  - A new case for the TAGE branch predictor, MICRO-44, 2011