

Ver. 2018-10-06a

Course number: CSC.T363

# Computer Architecture

### 3. Semiconductor Memory Technologies

www.arch.cs.titech.ac.jp/lecture/CA/, Room No. W321, Tue 13:20-16:20, Fri 13:20-14:50

CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH

Kenji Kise (吉瀬 謙二), Department of Computer Science, Kise \_at\_ c.titech.ac.jp



### DRAM (dynamic random access memory)







### Processor-Memory (DRAM) Performance Gap



### The Memory System's Fact and Goal

**Fact:** Large memories are slow, and fast memories are small.

**Goal:** How do we create a memory that gives the illusion of being large, cheap, and fast?

- With hierarchy
- With parallelism

# A Typical Memory Hierarchy

By taking advantage of the principle of locality, present much memory in the cheapest technology while providing access at the speed of the fastest technology.





- Cache memory consists of a small, fast memory that acts as a buffer for the large memory.
- The nontechnical definition of *cache* is a safe place for hiding things.
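To make the buffer idea and the principle of locality concrete, here is a toy direct-mapped cache simulator. The function name `hit_rate`, the 64-line × 16-byte geometry, and the three access traces are hypothetical choices for illustration; the sketch only counts hits and misses and ignores timing.

```python
"""Toy direct-mapped cache: a small buffer in front of a large memory.
Geometry and access traces are illustrative assumptions."""


def hit_rate(addresses, num_lines=64, block_bytes=16):
    """Return the fraction of byte addresses that hit in the cache."""
    tags = [None] * num_lines           # one tag per line; None means empty
    hits = 0
    for addr in addresses:
        block = addr // block_bytes     # memory block containing this byte
        index = block % num_lines       # the only line this block may occupy
        tag = block // num_lines        # tells apart blocks sharing a line
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag           # miss: bring the block into the cache
    return hits / len(addresses)


# Spatial locality: consecutive 4-byte words share 16-byte blocks.
sequential = list(range(0, 4 * 4096, 4))
# Temporal locality: loop 10 times over a 512-byte array that fits in the cache.
looping = list(range(0, 512, 4)) * 10
# No reuse: every access touches a new block (stride of 64 x 16 bytes).
strided = [i * 64 * 16 for i in range(4096)]

for name, trace in [("sequential", sequential), ("looping", looping),
                    ("strided", strided)]:
    print(f"{name:12s} hit rate {hit_rate(trace):.3f}")
```

Sequential access misses once per block and then hits on the next three words, the loop misses only on its first pass, and the strided trace never reuses a block, so the small cache helps exactly as much as the access pattern has locality.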



### Intel Core 2 Duo

### Intel Sandy Bridge, January 2011

Figure: memory hierarchy levels, including main memory and disk.





### Characteristics of the Memory Hierarchy



### Memory Hierarchy Technologies

- Caches use SRAM (static random access memory) for speed and for compatibility with the logic process technology
  - Low density (6-transistor cells), high power, expensive, fast
  - Static: content will last "forever" (until power is turned off)
- Main memory uses DRAM for size (density)
  - High density (1-transistor cells), low power, cheap, slow
  - Dynamic: needs to be "refreshed" regularly (~every 8 ms)
    - Refresh consumes 1% to 2% of the active cycles of the DRAM
  - Addresses are divided into 2 halves (row and column)
    - RAS (Row Access Strobe) triggers the row decoder
    - CAS (Column Access Strobe) triggers the column selector
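The row/column address split and the refresh overhead can both be sketched in a few lines. Everything numeric here is hypothetical: the 10 + 10-bit address, the choice of high-order bits as the row, and the 4,096 rows at 30 ns per row refresh are illustration values, not taken from any datasheet. The refresh model also assumes each row refresh blocks the whole DRAM.

```python
"""Sketch of two DRAM details: multiplexed row/column addressing
and refresh overhead. All sizes and timings are hypothetical."""


def split_address(addr, row_bits, col_bits):
    """Split a cell address into (row, col).

    Assumption: the row is the high-order half and the column the
    low-order half; real memory controllers may map bits differently.
    """
    if not 0 <= addr < (1 << (row_bits + col_bits)):
        raise ValueError("address out of range")
    row = addr >> col_bits               # high bits -> row decoder (sent with RAS)
    col = addr & ((1 << col_bits) - 1)   # low bits -> column selector (sent with CAS)
    return row, col


def refresh_overhead(rows, row_refresh_ns, interval_ms):
    """Fraction of time spent refreshing if every row is refreshed
    once per interval and each row refresh blocks the DRAM."""
    return rows * row_refresh_ns / (interval_ms * 1_000_000)


# Hypothetical 1M-cell array: 20 address bits sent over 10 pins in two halves.
row, col = split_address(0x2A5F3, row_bits=10, col_bits=10)
print(f"RAS with row {row}, then CAS with column {col}")

# Hypothetical 4,096 rows, 30 ns per row, all refreshed every 8 ms.
print(f"refresh overhead: {refresh_overhead(4096, 30, 8):.2%}")
```

Sending the address in two halves is what lets a chip get by with half as many address pins; with the assumed refresh numbers the overhead comes out near 1.5%.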

### Classical RAM Organization (~Square)









#### CY7C1049DV33

#### Switching Waveforms



### Datasheet

# Classical DRAM Operation

- DRAM organization: N rows × N columns × M bits
- Read or write M bits at a time
- Each M-bit access requires a RAS/CAS cycle

Figure: an N × N array with M bit planes, selected by a row address and a column address, producing an M-bit output. Timing diagram: the 1st and 2nd M-bit accesses each take one full cycle time; RAS is asserted with the row address, then CAS with the column address.

# Page Mode DRAM Operation

- Page mode DRAM: an N × M SRAM saves a row
- After a row is read into the SRAM "register":
  - Only CAS is needed to access other M-bit words on that row
  - RAS remains asserted while CAS is toggled

Figure: an N × N array with M bit planes and an N × M SRAM row register. Timing diagram: one row address on RAS, followed by column addresses on CAS for the 1st through 4th M-bit accesses.
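The difference between classical and page mode operation can be captured with a toy timing model. The constants `T_ROW_NS` and `T_COL_NS` are hypothetical, and closing the previously open row is folded into `T_ROW_NS` as a simplification; the point is only the shape of the saving, not real nanosecond values.

```python
"""Toy timing model: classical DRAM vs. page mode DRAM.
The timing constants are hypothetical and only illustrate the trend."""

T_ROW_NS = 60   # assumed: assert RAS and read a row into the N x M SRAM
T_COL_NS = 40   # assumed: toggle CAS to select one M-bit word


def classical_ns(accesses):
    """Classical DRAM: every M-bit access needs its own RAS/CAS cycle."""
    return len(accesses) * (T_ROW_NS + T_COL_NS)


def page_mode_ns(accesses):
    """Page mode: keep the last row open; same-row accesses need only CAS.

    Simplification: closing the old row is folded into T_ROW_NS.
    """
    open_row = None
    total = 0
    for row, _col in accesses:
        if row != open_row:
            total += T_ROW_NS   # new row address on RAS
            open_row = row
        total += T_COL_NS       # column address on CAS
    return total


# Four M-bit words on one row, as in the page mode timing diagram.
same_row = [(5, col) for col in range(4)]
# Four words on four different rows: page mode gains nothing.
diff_rows = [(r, 0) for r in range(4)]

print(classical_ns(same_row), page_mode_ns(same_row))    # 400 220
print(classical_ns(diff_rows), page_mode_ns(diff_rows))  # 400 400
```

Page mode pays the row cost once per row, so its benefit depends entirely on how many consecutive accesses fall on the same row.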

# Synchronous DRAM (SDRAM) Operation




### Other DRAM Architectures

- Double Data Rate SDRAMs: DDR-SDRAMs (and DDR-SRAMs)
  - Called double data rate because they transfer data on both the rising and falling edges of the clock
  - They are the most widely used form of SDRAM
- DDR2-SDRAMs
- DDR3-SDRAMs
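Transferring on both clock edges doubles the peak transfer rate at the same clock frequency. The sketch below computes peak bandwidth as clock × transfers per clock × bus width; the 100 MHz clock and 64-bit bus are assumed example values, and real sustained bandwidth is lower because of row activations, refresh and bus turnarounds.

```python
"""Peak transfer rate of a synchronous DRAM interface.
Clock and bus width in the example are assumptions."""


def peak_bandwidth_mb_s(clock_mhz, bus_bits, transfers_per_clock):
    """Peak MB/s = clock (MHz) x transfers per clock x bus width (bytes)."""
    return clock_mhz * transfers_per_clock * bus_bits / 8


# Assumed 100 MHz clock and a 64-bit module.
sdr = peak_bandwidth_mb_s(100, 64, transfers_per_clock=1)  # one edge only
ddr = peak_bandwidth_mb_s(100, 64, transfers_per_clock=2)  # both edges
print(f"SDR: {sdr:.0f} MB/s, DDR: {ddr:.0f} MB/s")
```

With these assumptions the DDR result is 1600 MB/s, which is consistent with the DDR SDRAM entry for a 64-bit module in the milestones table.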





# Which is faster?

From Tokyo to Hiroshima:

|            | Time | Cost       | Max speed | Distance | Passengers | Throughput (P × S)    |
|------------|------|------------|-----------|----------|------------|-----------------------|
| Boeing 737 | 1:20 | 32,000 yen | 800 km/h  | 670 km   | 170        | 85,510 (170 × 503)    |
| Nozomi     | 4:00 | 18,000 yen | 270 km/h  | 820 km   | 1,300      | 266,500 (1,300 × 205) |

Here P is the number of passengers and S is the average speed (distance ÷ time), e.g. 670 km ÷ 1 h 20 min ≈ 503 km/h, so throughput is in passenger-km per hour.

- Time to run the task (ExTime)
  - Execution time, response time, latency
- Tasks per day, hour, week, sec, ns ... (Performance)
  - Throughput, **bandwidth**

From a lecture slide by David E. Culler
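The throughput column can be recomputed from the table's time, distance and passenger figures. The helper names below are illustrative; S is taken as average speed, matching the 503 and 205 km/h factors in the table.

```python
"""Latency vs. throughput for the Tokyo -> Hiroshima example,
using the time, distance and passenger figures from the table."""


def avg_speed_kmh(hours, distance_km):
    """S: average speed over the trip."""
    return distance_km / hours


def throughput(passengers, hours, distance_km):
    """P x S in passenger-km per hour."""
    return passengers * avg_speed_kmh(hours, distance_km)


trips = {
    # name: (travel time in hours, distance in km, passengers)
    "Boeing 737": (1 + 20 / 60, 670, 170),
    "Nozomi": (4.0, 820, 1300),
}

for name, (hours, km, passengers) in trips.items():
    print(f"{name:10s} latency {hours:.2f} h, "
          f"S = {avg_speed_kmh(hours, km):.1f} km/h, "
          f"P x S = {throughput(passengers, hours, km):,.0f}")
```

The plane gives 85,425 rather than 85,510 only because the table rounds S to 503 km/h before multiplying. Either way, the plane wins on latency and the train wins on throughput by roughly a factor of three.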



# DRAM Memory Latency & Bandwidth Milestones

|                             | DRAM | Page<br>DRAM | FastPage<br>DRAM | FastPage<br>DRAM | Synch<br>DRAM | DDR<br>SDRAM |
|-----------------------------|------|--------------|------------------|------------------|---------------|--------------|
| Module Width                | 16b  | 16b          | 32b              | 64b              | 64b           | 64b          |
| Year                        | 1980 | 1983         | 1986             | 1993             | 1997          | 2000         |
| Mb/chip                     | 0.06 | 0.25         | 1                | 16               | 64            | 256          |
| Die size (mm²)              | 35   | 45           | 70               | 130              | 170           | 204          |
| Pins/chip                   | 16   | 16           | 18               | 20               | 54            | 66           |
| Bandwidth (MB/s)            | 13   | 40           | 160              | 267              | 640           | 1600         |
| Latency (nsec)              | 225  | 170          | 125              | 75               | 62            | 52           |

Patterson, CACM, Vol. 47, No. 10, 2004

In the time that memory-to-processor bandwidth doubles, memory latency improves by a factor of only 1.2 to 1.4. To deliver such high bandwidth, the DRAM must be organized internally as interleaved memory banks.
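The 1.2 to 1.4 figure can be checked against the milestones table. This sketch uses only the first (1980) and last (2000) columns and assumes that improvement is uniform on a logarithmic scale in between, which the table itself does not guarantee.

```python
"""Latency improvement per doubling of bandwidth, from the table's
first and last columns (assumes steady exponential trends)."""
import math

# (year, bandwidth MB/s, latency ns) from the milestones table
milestones = [
    (1980, 13, 225), (1983, 40, 170), (1986, 160, 125),
    (1993, 267, 75), (1997, 640, 62), (2000, 1600, 52),
]

first, last = milestones[0], milestones[-1]
bw_gain = last[1] / first[1]     # bandwidth grew ~123x
lat_gain = first[2] / last[2]    # latency improved ~4.3x

# Number of bandwidth doublings over the period is log2(bw_gain);
# spread the latency gain evenly across them.
per_doubling = lat_gain ** (1 / math.log2(bw_gain))
print(f"bandwidth x{bw_gain:.0f}, latency x{lat_gain:.1f}, "
      f"latency per bandwidth doubling x{per_doubling:.2f}")
```

Over these 20 years bandwidth doubled about seven times while latency improved by roughly 1.23 per doubling, inside the range quoted above.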

### NEXYS 4 DDR

- Micron MT47H64M16HR-25:H DDR2 memory
  - 128 MiB DDR2, 16-bit wide interface
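The 128 MiB capacity follows from the part number, assuming Micron's naming in which "64M16" means 64 Meg addressable words, each 16 bits wide. The arithmetic is sketched below.

```python
"""Capacity of the Nexys 4 DDR memory chip from its part number.
Assumption: "64M16" in MT47H64M16 means 64 Meg words x 16 bits."""

words = 64 * 2**20          # 64 Meg addressable locations
width_bits = 16             # 16-bit wide interface

capacity_bits = words * width_bits
capacity_mib = capacity_bits // 8 // 2**20   # bits -> bytes -> MiB
print(f"{capacity_bits // 2**30} Gbit = {capacity_mib} MiB")
```

So a single 1 Gbit chip supplies the board's entire 128 MiB of DDR2 memory.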






