An introduction to virtual memory

...and the crucial role it plays in modern operating systems.

Article translated from "An introduction to virtual memory"

Computers are complex machines designed to perform a simple task: to run programs — browsers, text editors, web servers, video games, ... — that operate on data — photos, music, text files, databases and so on.

When not in use, such programs and data live peacefully on the hard drive, the device responsible for keeping information alive even when your computer is turned off. Running an application means asking the processor (a.k.a. Central Processing Unit or CPU) to read and execute the machine instructions that make up the computer program, along with any additional data processing.

Hard drives store huge amounts of information, yet they are terribly slow, way slower than the processor: a CPU that read its instructions directly from a hard drive would spend most of its time waiting, and the drive would become a serious bottleneck for the whole system. For this reason, the program and its data are first copied to the main memory (a.k.a. Random Access Memory or RAM), another storage hardware component smaller than a hard drive but much faster, so that the processor can read instructions from there without a significant speed penalty.

The main memory can be seen as a long list of cells, each one containing some binary data and marked with a number called the memory address. Memory addresses span from 0 to N, based on the amount of main memory available in the system. The range of addresses used by a program is called the address space.

(1. Two programs loaded in memory. Each cell is a memory address. Space between program A and program B might be used by other programs or data.)

Usage of the main memory in early computers

In the early days of computing (and still today in many embedded systems), programs had access to the entire main memory, and its management was left to the programmer. Writing software for those machines was challenging: part of the developer's job was to devise a good way to manage RAM accesses and make sure that the whole program would not overflow the available memory.

Things got trickier with the advent of multitasking, when multiple programs could run on the same computer. Programmers had to face new critical issues:

  1. memory layout: programs located in RAM after the first one would have their address space offset by a certain amount, no longer in the initial range 0 to N. This was an additional pain point to take care of during development;
  2. memory fragmentation: as things are moved in and out of memory, the available space becomes fragmented into smaller and smaller chunks. This makes it harder to find available space to load new programs and data in memory;
  3. security: what if program A accidentally overwrites program B's memory? Or, even worse: what if it deliberately reads sensitive data from another program, such as passwords or credit card information?

So it was pretty obvious to hardware architects in the early 1960s that a form of automatic memory management could significantly simplify programming and fix the more critical memory protection problem. Eventually they came up with what is known today as virtual memory.

Virtual memory in a nutshell

In virtual memory, a program does not have direct access to physical RAM. Instead, it interacts with an illusory address space called virtual address space. The operating system works together with the processor to provide such virtual address space and convert it, sooner or later, into the physical one.

Every memory access is performed through a virtual address that does not refer to the actual physical location in memory. A program always reads or writes virtual addresses, and it's completely unaware of what is going on in the underlying hardware.

(2. Two processes with their own virtual address spaces. Notice how the physical memory backing each process is not necessarily contiguous.)

Benefits of the virtual memory

In the picture above you can see an example of virtual to physical translation in action, which reveals two main benefits of the virtual memory:

  1. each program has a virtual address space that starts from 0: this greatly simplifies the programmer's life, since there is no need to manually keep track of memory offsets anymore;
  2. virtual memory is always contiguous, even if the underlying physical counterpart isn't: the operating system does the hard job of gathering the available pieces together into a single, uniform virtual memory chunk.

The virtual memory mechanism also solves the problem of a limited RAM: every process is given the impression that it is working with an undefined amount of memory, often larger than the physical one. Moreover, the virtual memory guarantees security: program A can't read or write virtual memory assigned to program B without triggering an operating system error. We will see how all of this magic is possible in the following paragraphs.

Pages and frames: where it all begins

The virtual memory mechanism needs a place to store the mapping between virtual and physical addresses. That is, given a virtual address X, the system must be able to find the corresponding physical address Y. However, you can't save such information as a 1:1 relationship: it would require a database as big as the whole RAM!

Modern virtual memory implementations overcome this problem (and many others) by interpreting the virtual and the physical memory as a long list of small, fixed-size chunks. The chunks of the virtual memory are called pages and the chunks of the physical one are called frames. The mapping between pages and frames is kept in a special data structure called the page table. A page table is like a database table where each row contains a page index and the frame index it corresponds to. The Memory Management Unit (MMU) is a hardware component in the CPU that uses this mapping to translate addresses. Every running program has its own page table, as you can see in the picture below. (On most architectures the page tables themselves reside in main memory and are maintained by the operating system, while the MMU reads them and caches recent translations.)
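
To get a feel for how much bookkeeping the fixed-size chunks save, here is a back-of-the-envelope calculation in Python. The 4 GiB address space and the 4 KiB page size are assumptions chosen for illustration; the text above does not specify either.

```python
# Assumed numbers, for illustration only.
ADDRESS_SPACE_BYTES = 4 * 1024**3   # 4 GiB of addressable memory
PAGE_SIZE_BYTES = 4 * 1024          # 4 KiB per page

# One mapping entry per byte address: as many entries as there are addresses.
entries_per_address = ADDRESS_SPACE_BYTES

# One mapping entry per page: one entry covers a whole page of addresses.
entries_per_page = ADDRESS_SPACE_BYTES // PAGE_SIZE_BYTES

print(entries_per_address)  # 4294967296
print(entries_per_page)     # 1048576
```

Under these assumptions, mapping whole pages needs 4096 times fewer entries than mapping every single address.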

(3. The MMU mapping in action. Each cell is a process page or a physical memory frame. Some pages may not have a corresponding frame mapped: we will see why in the next paragraphs.)

Converting pages to frames

A virtual address is made up of two things:

  1. a page index, which identifies the page the virtual address belongs to;
  2. a frame offset, which gives the position of the physical address inside the frame.

This information is enough for the MMU to perform the virtual to physical conversion. When a program reads or writes a virtual address, the MMU takes the page index (1) and looks up the corresponding frame in the program's page table. Once the frame is found, the MMU uses the frame offset (2) to compute the exact physical memory address, and the access is carried out there. At this point the conversion is done: the program has read or written a physical location in RAM through the virtual address, without ever seeing the physical one.
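
The conversion can be sketched in a few lines of Python. This is only a toy model of what the hardware does: the page size of 4096 bytes, the contents of `page_table` and the function name `translate` are all assumptions made up for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

# Hypothetical page table for one process: page index -> frame index.
page_table = {0: 5, 1: 2, 2: 9}


def translate(virtual_address: int) -> int:
    """Convert a virtual address to a physical one using the page table."""
    # Split the address into its two parts: page index and offset.
    page_index, offset = divmod(virtual_address, PAGE_SIZE)
    frame_index = page_table[page_index]  # KeyError if the page is unmapped
    # The offset inside the frame is the same as the offset inside the page.
    return frame_index * PAGE_SIZE + offset


# Virtual address 4100 lies in page 1 at offset 4, and page 1 maps to frame 2.
print(translate(4100))  # 8196
```

Note how pages 0, 1 and 2 are contiguous in the virtual address space while frames 5, 2 and 9 are scattered in physical memory.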

Under the hood of virtual memory

While programs are provided with a contiguous, clean and tidy virtual address space, both the operating system and the hardware are allowed to do crazy things in the background with data residing in the physical memory.

For example, the operating system often delays loading parts of a program from the hard drive until the program attempts to use them: some of the code will only be run during initialization or when a special condition occurs. As a result, a program's page table may be filled with entries that point to non-existing or not yet allocated frames. This case is depicted in figure 3 above, where the last two pages map to nowhere.

Tricks like this one are completely transparent to the application, which keeps reading and writing its own virtual address space unaware of the background noise. However, sooner or later the program may want to access one of the virtual addresses that don't map to the RAM: what to do?

Page faults

A page fault (also known as page miss) occurs when a program accesses a virtual address on a page not currently mapped to a physical frame. More specifically, a page fault takes place when the page exists in the program's page table but points to a non-existent or not yet available frame in the physical memory.

The MMU detects the page fault and hands control to the operating system, which will do its best to find a frame in the physical memory for the mapping. Most of the time this is a straightforward operation, unless the system is running out of RAM.
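
The following Python sketch models the straightforward case, where a free frame is available. The `Process` class and its fields are hypothetical names invented for the example, and running out of free frames is deliberately not handled here.

```python
class Process:
    """Toy model of demand paging; all names are hypothetical."""

    def __init__(self, free_frames):
        self.page_table = {}                  # page index -> frame index
        self.free_frames = list(free_frames)  # frames not yet assigned
        self.page_faults = 0

    def access(self, page_index):
        """Return the frame backing page_index, mapping it on first use."""
        if page_index not in self.page_table:
            # Page fault: the "operating system" picks a free frame.
            self.page_faults += 1
            self.page_table[page_index] = self.free_frames.pop(0)
        return self.page_table[page_index]


proc = Process(free_frames=[7, 3, 8])
proc.access(0)   # first touch: page fault, mapped to frame 7
proc.access(0)   # already mapped: no fault
proc.access(1)   # first touch: page fault, mapped to frame 3
print(proc.page_faults)  # 2
```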

Paging, or when the physical memory is not enough

Paging is another memory management trick: the operating system moves some pages to the hard drive, to make room for other programs or data when there is no more physical memory available. Sometimes it is also called swapping, although that is not strictly correct: swapping is about moving an entire process to disk. Some operating systems do this too, when needed.

Paging gives programs the illusion of an unlimited amount of available RAM. The operating system optimistically allows for a virtual address space larger than the physical one, knowing that data can be moved in and out of the hard drive in case of need. Some systems (e.g. Windows) make use of a special file called paging file for this purpose. Others (e.g. Linux) typically have a dedicated hard drive partition called swap area (the name is kept for historical reasons: modern Linux performs paging rather than swapping).

Unfortunately the hard drive is way slower than the main memory. So when a page fault occurs and the page was temporarily moved to the hard drive, the operating system has to read data from the sluggish medium and move it back to memory, causing a lag. All in all, less paging means a system that runs more efficiently.
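
Here is a minimal sketch of that back-and-forth in Python. It assumes a first-in, first-out eviction policy purely for simplicity (the text does not say which policy real systems use, and they are generally more sophisticated); `PagedMemory`, `resident` and `on_disk` are made-up names.

```python
from collections import OrderedDict


class PagedMemory:
    """Toy pager with a fixed number of frames and FIFO eviction (assumed)."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = OrderedDict()   # page -> frame, oldest entry first
        self.on_disk = set()            # pages moved out to the "hard drive"

    def access(self, page):
        if page in self.resident:
            return self.resident[page]
        # Page fault: find a frame, evicting the oldest page if RAM is full.
        if len(self.resident) < self.num_frames:
            frame = len(self.resident)
        else:
            victim, frame = self.resident.popitem(last=False)
            self.on_disk.add(victim)
        self.on_disk.discard(page)      # the page is read back if paged out
        self.resident[page] = frame
        return frame


mem = PagedMemory(num_frames=2)
for page in [0, 1, 2]:
    mem.access(page)
print(sorted(mem.resident))  # [1, 2]
print(sorted(mem.on_disk))   # [0]
```

With only two frames, touching a third page pushes page 0 out to disk; touching page 0 again would bring it back at the expense of another page.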

Thrashing

Thrashing occurs when the system spends more time paging than running applications, triggered by a constant stream of page faults. This is an extreme corner case that happens if you are running too many programs that together fill up the entire RAM, and/or if the paging area on the hard drive is poorly configured. The operating system tries to keep up with the large amount of page fault requests, constantly moving data between the hard drive and the physical memory, grinding the system to a halt. Thrashing can be avoided by increasing the amount of RAM, reducing the number of programs being run, or adjusting the size of the paging file or swap area.
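
The effect can be reproduced with a small simulation in Python. It is a sketch under stated assumptions: eviction is first-in, first-out, the access pattern is a simple loop over four pages, and `count_page_faults` is a hypothetical helper written for this example.

```python
from collections import deque


def count_page_faults(accesses, num_frames):
    """Count page faults for a sequence of page accesses (FIFO assumed)."""
    resident = deque()          # pages currently in RAM, oldest on the left
    faults = 0
    for page in accesses:
        if page in resident:
            continue            # page is in RAM: no fault
        faults += 1
        if len(resident) == num_frames:
            resident.popleft()  # evict the oldest page to make room
        resident.append(page)
    return faults


# A program that loops over 4 pages, 100 times.
accesses = [0, 1, 2, 3] * 100

print(count_page_faults(accesses, num_frames=4))  # 4
print(count_page_faults(accesses, num_frames=3))  # 400
```

With enough frames the program faults only on first touch; with just one frame fewer, every single access in this pattern becomes a page fault.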

Memory protection

Virtual memory also provides security across running applications: your browser can't peep into your text editor's virtual memory and vice versa without triggering an error. The main purpose of memory protection is to prevent a process from accessing memory that doesn't belong to it.

The memory protection mechanism is usually provided by the MMU and the page tables it uses, while some architectures may use different hardware strategies. When a program tries to access a portion of virtual memory it doesn't own, an invalid page fault is triggered. The MMU and the operating system catch it and raise a failure condition called segmentation fault (on Unix) or access violation (on Windows). The operating system usually kills the program in response.
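
As a rough illustration, the check can be modelled in Python as a lookup that fails when the page is not mapped for the current process. The `SegmentationFault` exception, the `checked_access` function, the page size and the `browser_pages` table are all hypothetical stand-ins, not real operating system interfaces.

```python
class SegmentationFault(Exception):
    """Stand-in for the error the operating system raises (made-up name)."""


PAGE_SIZE = 4096  # assumed page size in bytes


def checked_access(page_table, virtual_address):
    """Translate an address, failing if its page is not mapped."""
    page_index, offset = divmod(virtual_address, PAGE_SIZE)
    if page_index not in page_table:
        raise SegmentationFault(f"invalid access at {virtual_address:#x}")
    return page_table[page_index] * PAGE_SIZE + offset


browser_pages = {0: 4, 1: 6}   # page index -> frame index for one process

print(checked_access(browser_pages, 100))  # 16484
try:
    checked_access(browser_pages, 5 * PAGE_SIZE)
except SegmentationFault as error:
    print("killed:", error)
```

Because each process is translated through its own page table, frames that belong to another process are simply not reachable from its virtual addresses.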

Segmentation faults and access violations are also often triggered by programming mistakes. Programming languages with manual memory management give you the ability to set aside portions of memory to store program data: the operating system will provide you with a nice chunk of free memory (a.k.a. a buffer) to read and write according to your program's needs. However, nothing prevents you from reading or writing outside the buffer boundaries, accessing memory that doesn't belong to your program or simply doesn't exist. When such an access lands on memory that is not mapped for your program, the operating system detects the illegal access and raises the usual violation signal.

Read more

Virtual memory paves the road for many other interesting topics. For example, memory-mapped files are a powerful abstraction over the traditional way of reading and writing files. Instead of manually copying data into memory in order to operate on it, memory mapping allows a program to access a file directly from the hard drive as if it was already fully loaded in RAM. The virtual memory mechanism will take care of moving data from the hard drive to RAM as usual, when necessary. Memory-mapped files simplify the programmer's work and usually speed up file access operations. More information here.
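
As a small taste, here is a sketch that uses Python's standard `mmap` module. The temporary file and its contents are made up for the example; the point is only that the file is read and modified through ordinary slice operations rather than explicit read and write calls.

```python
import mmap
import os
import tempfile

# Create a small temporary file to map (its name is generated, not fixed).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello virtual memory")
    path = tmp.name

try:
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the mapping behaves like a bytearray.
        with mmap.mmap(f.fileno(), 0) as mapped:
            first_word = bytes(mapped[:5])   # read through the mapping
            mapped[:5] = b"HELLO"            # write through the mapping
            mapped.flush()                   # push changes back to the file

    with open(path, "rb") as f:
        contents = f.read()
finally:
    os.remove(path)

print(first_word)  # b'hello'
print(contents)    # b'HELLO virtual memory'
```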

Virtual memory also makes it more difficult to reason about memory consumption. Suppose one of your programs is taking up 300 megabytes of memory: is it virtual or physical? Is part of that space paged to disk? And if it is, will the paging operations be fast enough? Also, tuning the paging file/swap area is an important step if you want to keep your system in good shape. Operating systems provide many tools to measure and adjust memory: more information here and here.
