Academic Year 2018 (Heisei 30) Edition

Ver. 2018-11-05a

Course number: CSC.T363

Computer Architecture

10. Virtual Memory and Security

www.arch.cs.titech.ac.jp/lecture/CA/ Room No.W321 Tue 13:20-16:20, Fri 13:20-14:50

CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH

Kenji Kise, Department of Computer Science, Kise \_at\_ c.titech.ac.jp

# A Typical Memory Hierarchy

By taking advantage of the principle of locality (局所性), the memory hierarchy can present as much memory as is available in the cheapest technology.



#### Example of 32-bit memory space (4GB)

0x00000000




#### 2GB Memory !

Screenshot of `top` output (kterm):

- top - 11:35:26 up 10 days, 19:49, 2 users, load average: 0.01, 0.03, …
- Tasks: 164 total, 1 running, 163 sleeping, 0 stopped, 0 zombie
- Cpu(s): 0.0%us
- Mem: 4002924k total
- Swap: 6062072k total, 0k used, 6062072k free, 2570804k cached

| PID | USER | PR | NI | VIRT  | RES | SHR | S | %CPU | %MEM | TIME+   | COMMAND    |
|-----|------|----|----|-------|-----|-----|---|------|------|---------|------------|
| 1   | root | 15 | 0  | 10348 | 696 | 584 | S | 0.0  | 0.0  | 0:01.00 |            |
|     | root | RT | -5 | 0     | 0   | 0   | S | 0.0  |      |         | migratio   |
| 3   | root | 34 | 19 | 0     | 0   | 0   | S | 0.0  | 0.0  |         | ksoftirq   |
| 4   | root | RT | -5 | 0     | 0   | 0   | S | 0.0  |      |         | watchdog   |
| 5   | root | RT | -5 | 0     | 0   | 0   | S | 0.0  | 0.0  |         | migratio   |
| 6   | root | 34 | 19 | 0     | 0   | 0   | S | 0.0  | 0.0  |         | ksoftirq   |
| 7   | root | RT | -5 | 0     | 0   | 0   | S | 0.0  |      |         | watchdog   |
| 8   | root | RT | -5 | 0     | 0   | 0   | S | 0.0  | 0.0  |         | migratio   |
|     | root | 34 | 19 | 0     | 0   | 0   | S | 0.0  |      |         | ksoftirq   |
| 10  | root | RT | -5 | 0     | 0   | 0   | S | 0.0  | 0.0  | 0:00.00 | watchdog/2 |

0xFFFFFFFF



# Virtual Memory(仮想記憶)

- Use main memory as a "cache" for secondary memory
  - Provides the ability to easily run programs larger than the size of physical memory
  - Simplifies loading a program for execution by providing for code relocation (i.e., the code can be loaded anywhere in main memory)
  - Allows efficient and safe sharing of memory among multiple programs
- Security, memory protection
  - control memory access rights



Secondary memory (disk)

# Virtual Memory

- What makes it work? Once again, the principle of locality
  - A program is likely to access a relatively small portion of its address space during any period of time

# Virtual Memory

- Each program is compiled into its own address space – a "virtual address (VA)" space
- A physical address (PA) is used to access the physical memory devices
  - At run time, each virtual address (VA) must be translated to a physical address (PA)








# Two Programs Sharing Physical Memory

- A program's address space is divided into pages (all of one fixed size, typically 4KB) or segments (variable sizes)
  - The starting location of each page (either in main memory or in secondary memory) is contained in the program's page table
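As a minimal sketch of the paging idea above, the following Python splits a 32-bit virtual address into a virtual page number and a page offset, assuming the 4KB page size mentioned in the text. The two page tables and their contents are hypothetical values chosen only for illustration; they show how two programs could map different virtual pages onto the same physical page.

```python
PAGE_SIZE = 4 * 1024                      # 4KB pages, as in the text
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 bits of page offset

def split_virtual_address(va):
    """Return (virtual page number, page offset) for a virtual address."""
    return va >> OFFSET_BITS, va & (PAGE_SIZE - 1)

# Hypothetical page tables: virtual page number -> (location, physical page number).
page_table_a = {0: ("memory", 5), 1: ("disk", None), 2: ("memory", 9)}
page_table_b = {0: ("memory", 7), 1: ("memory", 5)}  # shares physical page 5 with A

vpn, offset = split_virtual_address(0x00002ABC)
print(vpn, hex(offset))        # 2 0xabc
print(page_table_a[vpn])       # ('memory', 9)
```

Only the page number is translated; the offset within the page is the same in the virtual and the physical address.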

Program A's page table (virtual address space)



# Address Translation

A virtual address is translated to a physical address by a combination of hardware and software



So each memory request first requires an address translation from the virtual space to the physical space
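The translation step can be sketched in Python as follows. This is an illustrative model rather than the actual hardware mechanism: the `valid` and `ppn` field names, the `PageFault` exception, and the table contents are assumptions made for the example, and 4KB pages are assumed.

```python
PAGE_SIZE = 4096
OFFSET_BITS = 12

class PageFault(Exception):
    """Raised when the referenced page is not resident in main memory."""

def translate(va, page_table):
    """Translate a virtual address to a physical address via the page table."""
    vpn = va >> OFFSET_BITS
    offset = va & (PAGE_SIZE - 1)
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        raise PageFault(f"virtual page {vpn} is not in main memory")
    # Physical address = physical page number concatenated with the offset.
    return (entry["ppn"] << OFFSET_BITS) | offset

# Hypothetical page table for one program.
page_table = {
    0: {"valid": True, "ppn": 3},
    1: {"valid": False, "ppn": None},  # page currently on disk
}

print(hex(translate(0x00000123, page_table)))  # 0x3123
```

In a real system the page table lookup is performed by hardware (or a software handler), and the page fault is handled by the operating system.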

#### Address Translation Mechanisms



#### Virtual Addressing, the hardware fix

- Because the page table resides in main memory, translating a virtual address to a physical address may take an extra memory access




- The hardware fix is to use a Translation Lookaside Buffer (TLB) (アドレス変換バッファ)
  - a small cache that keeps track of recently used address mappings to avoid having to do a page table lookup
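To illustrate how a TLB avoids page table lookups, here is a small Python model of a fully associative TLB with LRU replacement. The class name, the capacity of 4 entries, and the access sequence are all assumptions for the sake of the example; real TLBs are hardware structures.

```python
from collections import OrderedDict

class TLB:
    """Toy fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # VPN -> PPN, least recently used first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)     # mark as most recently used
            return self.entries[vpn]
        self.misses += 1
        ppn = page_table[vpn]                 # the extra memory access
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        self.entries[vpn] = ppn
        return ppn

page_table = {vpn: vpn + 100 for vpn in range(16)}  # hypothetical mappings
tlb = TLB(capacity=4)
for vpn in [0, 1, 0, 1, 2, 0, 1, 2]:
    tlb.lookup(vpn, page_table)
print(tlb.hits, tlb.misses)  # 5 3
```

Because of locality, the same few pages are referenced repeatedly, so most translations are served from the TLB.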



#### Making Address Translation Fast



# MIPS Direct Mapped Cache Example

- One word per block, cache size = 1K words
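For this configuration, a 32-bit byte address splits into a 2-bit byte offset, a 10-bit index (1K blocks), and a 20-bit tag. The Python below is a small sketch of that field extraction; the function name and the sample address are chosen only for illustration.

```python
NUM_BLOCKS = 1024                  # 1K words, one word per block
BYTE_OFFSET_BITS = 2               # 4 bytes per word
INDEX_BITS = 10                    # log2(1024)
TAG_BITS = 32 - INDEX_BITS - BYTE_OFFSET_BITS  # 20

def cache_fields(addr):
    """Return (tag, index, byte offset) for a 32-bit byte address."""
    byte_offset = addr & ((1 << BYTE_OFFSET_BITS) - 1)
    index = (addr >> BYTE_OFFSET_BITS) & (NUM_BLOCKS - 1)
    tag = addr >> (BYTE_OFFSET_BITS + INDEX_BITS)
    return tag, index, byte_offset

print(TAG_BITS)                  # 20
print(cache_fields(0x00401004))  # (1025, 1, 0)
```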





What kind of locality are we taking advantage of?

#### Translation Lookaside Buffers (TLBs)

- Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped

| V | Virtual Page # | Physical Page # |
|---|----------------|-----------------|
|   |                |                 |
|   |                |                 |
|   |                |                 |
|   |                |                 |

- TLB access time is typically shorter than cache access time (because TLBs are much smaller than caches)
  - TLBs typically have no more than 128 to 256 entries, even on high-end machines

# A TLB in the Memory Hierarchy



- On a TLB miss: is it merely a TLB miss, or a page fault?
  - If the page is in main memory, then the TLB miss can be handled (in hardware or software) by loading the translation information from the page table into the TLB
    - Takes 100s of cycles to find and load the translation info into the TLB
  - If the page is not in main memory, then it is a true page fault
    - Takes 1,000,000s of cycles to service a page fault
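The distinction between a plain TLB miss and a true page fault can be sketched as a small classification routine. This is an illustrative model: the `in_memory` field, the function name, and the exact cycle counts are assumptions, with the costs set to the orders of magnitude quoted above.

```python
# Order-of-magnitude costs from the text; the exact values are assumptions.
COST_CYCLES = {"TLB hit": 0, "TLB miss": 100, "page fault": 1_000_000}

def classify_access(vpn, tlb, page_table):
    """Classify one memory reference and refill the TLB when possible."""
    if vpn in tlb:
        return "TLB hit"
    entry = page_table[vpn]
    if entry["in_memory"]:
        tlb[vpn] = entry["ppn"]  # load the translation from the page table
        return "TLB miss"
    return "page fault"          # the OS must bring the page in from disk

# Hypothetical page table: page 0 is resident, page 1 is on disk.
page_table = {
    0: {"in_memory": True, "ppn": 8},
    1: {"in_memory": False, "ppn": None},
}
tlb = {}
events = [classify_access(vpn, tlb, page_table) for vpn in [0, 0, 1]]
print(events)                               # ['TLB miss', 'TLB hit', 'page fault']
print(sum(COST_CYCLES[e] for e in events))  # 1000100
```

A single page fault dominates the total cost, which is why page faults must be rare for virtual memory to perform well.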




- **Page fault**: the page is not in physical memory
- TLB misses are much more frequent than true page faults

#### Two Machines' TLB Parameters

|                  | Intel P4 | AMD Opteron |
|------------------|----------|-------------|
| TLB organization | 1 TLB for instructions and 1 TLB for data<br>Both 4-way set associative<br>Both use ~LRU replacement<br>Both have 128 entries<br>TLB misses handled in hardware | 2 TLBs for instructions and 2 TLBs for data<br>Both L1 TLBs fully associative with ~LRU replacement<br>Both L2 TLBs are 4-way set associative with round-robin LRU<br>Both L1 TLBs have 40 entries<br>Both L2 TLBs have 512 entries<br>TLB misses handled in hardware |



# TLB Event Combinations

| TLB  | Page Table | Cache    | Possible? Under what circumstances?                                        |
|------|------------|----------|----------------------------------------------------------------------------|
| Hit  | Hit        | Hit      | Yes – what we want!                                                        |
| Hit  | Hit        | Miss     | Yes – although the page table is not checked if the TLB hits               |
| Miss | Hit        | Hit      | Yes – TLB miss, PA in page table                                           |
| Miss | Hit        | Miss     | Yes – TLB miss, PA in page table, but data not in cache                    |
| Miss | Miss       | Miss     | Yes – page fault                                                           |
| Hit  | Miss       | Miss/Hit | Impossible – TLB translation not possible if page is not present in memory |
| Miss | Miss       | Hit      | Impossible – data not allowed in cache if page is not in memory            |
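The table can be reproduced by enumerating all eight hit/miss combinations and applying the two constraints it states: a TLB hit requires the page to be in memory, and so does a cache hit. The sketch below assumes that "page table hit" means the page is present in main memory; the function name is illustrative.

```python
from itertools import product

def is_possible(tlb_hit, page_table_hit, cache_hit):
    """Apply the two constraints from the table to one combination."""
    page_in_memory = page_table_hit
    if tlb_hit and not page_in_memory:
        return False  # no TLB translation for a page that is not in memory
    if cache_hit and not page_in_memory:
        return False  # data cannot be cached if its page is not in memory
    return True

def label(hit):
    return "Hit" if hit else "Miss"

combos = list(product([True, False], repeat=3))
for tlb_hit, pt_hit, cache_hit in combos:
    verdict = "possible" if is_possible(tlb_hit, pt_hit, cache_hit) else "impossible"
    print(label(tlb_hit), label(pt_hit), label(cache_hit), verdict)

possible = [c for c in combos if is_possible(*c)]
print(len(possible))  # 5
```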

# Why Not a Virtually Addressed Cache?

- A virtually addressed cache would only require address translation on cache misses



#### But

- Two different virtual addresses can map to the same physical address (when processes are sharing data)
- As a result, two different cache entries can hold data for the same physical address
  - These entries are called synonyms (別名)
    - All cache entries with the same physical address must be updated, or the memory becomes inconsistent
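The synonym problem can be demonstrated with a toy model in which the cache is keyed by virtual address. Everything here (the addresses, the `page_map` dictionary, and the two write functions) is a hypothetical setup for illustration, not a description of real cache hardware.

```python
memory = {0x5000: 1}                          # physical address -> value
page_map = {0x1000: 0x5000, 0x2000: 0x5000}   # two virtual addresses, one physical address
cache = {}                                    # virtually addressed: keyed by virtual address

def read(va):
    if va not in cache:
        cache[va] = memory[page_map[va]]      # miss: translate, then fetch
    return cache[va]

def write_naive(va, value):
    """Update only the entry for this virtual address (leaves synonyms stale)."""
    cache[va] = value
    memory[page_map[va]] = value

def write_synonym_aware(va, value):
    """Update every cached entry that maps to the same physical address."""
    pa = page_map[va]
    memory[pa] = value
    for other_va in cache:
        if page_map[other_va] == pa:
            cache[other_va] = value
    cache[va] = value

read(0x1000)
read(0x2000)
write_naive(0x1000, 42)
print(read(0x2000))   # 1  (stale: the synonym entry was not updated)
write_synonym_aware(0x1000, 43)
print(read(0x2000))   # 43
```

A physically addressed cache avoids the problem because both virtual addresses resolve to the same cache entry.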

# The Hardware/Software Boundary

- What parts of the virtual-to-physical address translation are done by or assisted by the hardware?
  - Translation Lookaside Buffer (TLB) that caches the recent translations
    - TLB access time is part of the cache hit time
    - May cause an extra stage in the pipeline for TLB access
  - Page table storage, fault detection and updating
    - Page faults result in interrupts (precise) that are then handled by the OS
    - Hardware must support (i.e., update appropriately) Dirty and Reference bits (e.g., ~LRU) in the Page Tables
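As a rough sketch of how Dirty and Reference bits might be used, the Python below models a page table entry whose bits are set on each access, and a simple victim choice that prefers a page whose reference bit is clear (an approximation of LRU). The class, function names, and policy details are assumptions for illustration; actual hardware and OS policies differ.

```python
class PageTableEntry:
    def __init__(self, ppn):
        self.ppn = ppn
        self.reference = False  # set on any access
        self.dirty = False      # set on a write; a dirty page must be written back before reuse

def touch(entry, is_write):
    """Model the hardware updating the bits on a memory access."""
    entry.reference = True
    if is_write:
        entry.dirty = True

def choose_victim(page_table):
    """Prefer a page that was not referenced recently (approximate LRU)."""
    for vpn, entry in page_table.items():
        if not entry.reference:
            return vpn
    return next(iter(page_table))  # all referenced: fall back to the first page

page_table = {vpn: PageTableEntry(ppn=vpn + 10) for vpn in range(3)}
touch(page_table[0], is_write=True)
touch(page_table[2], is_write=False)
victim = choose_victim(page_table)
print(victim, page_table[victim].dirty)  # 1 False
```

Here page 1 is chosen because it was never referenced, and since it is not dirty it could be replaced without a write-back to disk.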






