logo资料库

深入理解Linux虚拟内存管理 英文版 PDF.pdf

第1页 / 共732页
第2页 / 共732页
第3页 / 共732页
第4页 / 共732页
第5页 / 共732页
第6页 / 共732页
第7页 / 共732页
第8页 / 共732页
资料共732页,剩余部分请下载后查看
Understanding The Linux Virtual Memory Manager Mel Gorman 6th December 2003
Preface Linux is developed with a stronger practical emphasis than a theoretical one. When new algorithms or changes to existing implementations are suggested, it is common to request code to match the argument. Many of the algorithms used in the Virtual Memory (VM) system were designed by theorists but the implementations have now diverged from the theory considerably. In part, Linux does follow the traditional development cycle of design to implementation but it is more common for changes to be made in reaction to how the system behaved in the “real-world” and intuitive decisions by developers. This means that the VM performs well in practice but there is very little VM specific documentation available except for a few incomplete overviews in a small number of websites, except the web site containing an earlier draft of this book of course! This has lead to the situation where the VM is fully understood only by a small number of core developers. New developers looking for information on how it functions are generally told to read the source and little or no information is available on the theoretical basis for the implementation. This requires that even a casual observer invest a large amount of time to read the code and study the field of Memory Management. This book, gives a detailed tour of the Linux VM as implemented in 2.4.22 and gives a solid introduction of what to expect in 2.6. As well as discussing the implementation, the theory it is is based on will also be introduced. This is not intended to be a memory management theory book but it is often much simpler to understand why the VM is implemented in a particular fashion if the underlying basis is known in advance. To complement the description, the appendix includes a detailed code comment- ary on a significant percentage of the VM. This should drastically reduce the amount of time a developer or researcher needs to invest in understanding what is happening inside the Linux VM. As VM implementations tend to follow similar code patterns even between major versions. This means that with a solid understanding of the 2.4 VM, the later 2.5 development VMs and the final 2.6 release will be decipherable in a number of weeks. The Intended Audience Anyone interested in how the VM, a core kernel subsystem, works will find answers to many of their questions in this book. The VM, more than any other subsystem, i
Preface ii affects the overall performance of the operating system. It is also one of the most poorly understood and badly documented subsystem in Linux, partially because there is, quite literally, so much of it. It is very difficult to isolate and understand individual parts of the code without first having a strong conceptual model of the whole VM, so this book intends to give a detailed description of what to expect without before going to the source. This material should be of prime interest to new developers interested in adapting the VM to their needs and to readers who simply would like to know how the VM works. It also will benefit other subsystem developers who want to get the most from the VM when they interact with it and operating systems researchers looking for details on how memory management is implemented in a modern operating system. For others, who are just curious to learn more about a subsystem that is the focus of so much discussion, they will find an easy to read description of the VM functionality that covers all the details without the need to plough through source code. However, it is assumed that the reader has read at least one general operating system book or one general Linux kernel orientated book and has a general know- ledge of C before tackling this book. While every effort is made to make the material approachable, some prior knowledge of general operating systems is assumed. Book Overview In chapter 1, we go into detail on how the source code may be managed and de- ciphered. Three tools will be introduced that are used for the analysis, easy browsing and management of code. The main tools are the Linux Cross Referencing (LXR) tool which allows source code to be browsed as a web page and CodeViz for gener- ating call graphs which was developed while researching this book. The last tool, PatchSet is for managing kernels and the application of patches. Applying patches manually can be time consuming and the use of version control software such as CVS (http://www.cvshome.org/ ) or BitKeeper (http://www.bitmover.com) are not always an option. With this tool, a simple specification file determines what source to use, what patches to apply and what kernel configuration to use. In the subsequent chapters, each part of the Linux VM implementation will be discussed in detail, such as how memory is described in an architecture independent manner, how processes manage their memory, how the specific allocators work and so on. Each will refer to the papers that describe closest the behaviour of Linux as well as covering in depth the implementation, the functions used and their call graphs so the reader will have a clear view of how the code is structured. At the end of each chapter, there will be a “What’s New” section which introduces what to expect in the 2.6 VM. The appendices are a code commentary of a significant percentage of the VM. It gives a line by line description of some of the more complex aspects of the VM. The style of the VM tends to be reasonably consistent, even between major releases of the kernel so an in-depth understanding of the 2.4 VM will be an invaluable aid to understanding the 2.6 kernel when it is released.
Preface What’s New in 2.6 iii At the time of writing, 2.6.0-test4 has just been released so 2.6.0-final is due “any month now” which means December 2003 or early 2004. Fortunately the 2.6 VM, in most ways, is still quite recognisable in comparison to 2.4. However, there is some new material and concepts in 2.6 and it would be pity to ignore them so to address this, hence the “What’s New in 2.6” sections. To some extent, these sections presume you have read the rest of the book so only glance at them during the first reading. If you decide to start reading 2.5 and 2.6 VM code, the basic description of what to expect from the “Whats New” sections should greatly aid your understanding. It is important to note that the sections are based on the 2.6.0-test4 kernel which should not change change significantly before 2.6. As they are still subject to change though, you should still treat the “What’s New” sections as guidelines rather than definite facts. Companion CD A companion CD is included with this book and it is highly recommended the reader become familiar with it, especially as you progress more through the book and are using the code commentary. It is recommended that the CD is used with a GNU/Linux system but it is not required. The text of the book is constained the CD in HTML, PDF and HTML formats so the reader can perform basic text searches if the index does not have the desired information. If you are reading the first edition of the book, you may notice small differences between the CD version and the paper version due to printing deadlines but they will be minor. Almost all the tools used to research the book’s material are contained on the CD. Each of the tools may be installed on virtually any GNU/Linux installation and references are included to available documentation and the project home sites so you can check for further updates. With many GNU/Linux installations, there is the additional bonus of being able to run a web server directly from the CD which has been tested with Red Hat 7.3 and Debian Woody but should work with any distribution. The small web site it provides at http://localhost:10080 provides a number of useful features; • A searchable index for functions that have a code commentary available. If a function is searched for that does not have a commentary, the browser will be automatically redirected to LXR; • A web browsable copy of the Linux 2.4.22 source. This allows code to be browsed and identifiers to be searched for; • A live version of CodeViz, the tool used to generate call-graphs for the book, is available. If you feel that the book’s graphs are lacking some detail you want, generate them yourself;
Preface iv • The VM Regress, CodeViz and patchset packages which are discussed in Chapter 1 are available in /cdrom/software. gcc-3.0.4 is also provided as it is required for building CodeViz. Mount the CD on /cdrom as followed; root@joshua:/$ mount /dev/cdrom /cdrom -o exec The webserver is Apache 1.3.27 (http://www.apache.org/ ) and has been built and configured to run with it’s root as /cdrom/. If your distribution normally uses another directory, you will need to use this one instead. To start it, run the script /cdrom/start_server. If there are no errors, the output should look like: mel@joshua:~$ /cdrom/start_server Starting CodeViz Server: done done Starting Apache Server: The URL to access is http://localhost:10080/ If the server starts successfully, point your browser to http://localhost:10080 to avail of the CDs web services. To shutdown the server, run the script /cdrom/stop_server and the CD may then be unmounted. Typographic Conventions The conventions used in this document are simple. New concepts that are introduced as well as URLs are in italicised font. Binaries and package names are are in bold. Structures, field names, compile time defines and variables are in a constant-width font. At times when talking about a field in a structure, both the structure and field name will be included like page→list for example. Filenames are in a constant- width font but include files have angle brackets around them like and may be found in the include/ directory of the kernel source. Acknowledgments The compilation of this book was not a trivial task. This book was researched and developed in the open and it would be remiss of me not to mention some of the people who helped me at various intervals. If there is anyone I missed, I apologise now. First, I would like to thank John O’Gorman who tragically passed away while the material for this book was being researched. It was his experience and guidance that largely inspired the format and quality of this book. Secondly, I would like to thank Mark L. Taub from Prentice Hall PTR for giving me the opportunity to publish this book. It has being a rewarding experience and it
Preface v made trawling through all the code worthwhile. Massive thanks go to my reviewers who provided clear and detailed feedback long after I thought I had finished writing. Finally, on the publishers front, I would like to thank Bruce Perens for allowing me to publish under the Bruce Peren’s Open Book Series (http://www.perens.com/Books). With the technical research, a number of people provided invaluable insight. Abhishek Nayani, was a source of encouragement and enthusiasm early in the re- search. Ingo Oeser kindly provided invaluable assistance early on with a detailed explanation on how data is copied from userspace to kernel space including some valuable historical context. He also kindly offered to help me if I felt I ever got lost in the twisty maze of kernel code. Scott Kaplan made numerous corrections to a number of systems from non-contiguous memory allocation, to page replacement policy. Jonathon Corbet provided the most detailed account of the history of the kernel development with the kernel page he writes for Linux Weekly News. Zack Brown, the chief behind Kernel Traffic, is the sole reason I did not drown in kernel related mail. IBM, as part of the Equinox Project, provided an xSeries 350 which was invaluable for running my own test kernels on machines larger than what I pre- viously had access to. Finally, Patrick Healy was crucial to ensuring that this book was consistent and approachable to people who are familiar, but not experts, on Linux or memory management. A number of people helped with smaller technical issues and general inconsisten- cies where material was not covered in sufficient depth. They are Muli Ben-Yehuda, Parag Sharma, Matthew Dobson, Roger Luethi, Brian Lowe and Scott Crosby. All of them sent corrections and queries on differnet parts of the document which ensured too much prior knowledge was assumed. Carl Spalletta sent a number of queries and corrections to every aspect of the book in its earlier online form. Steve Greenland sent a large number of grammar corrections. Philipp Marek went above and beyond being helpful sending over 90 separate corrections and queries on various aspects. Long after I thought I was finished, Aris Sotiropoulos sent a large number of small corrections and suggestions. The last person, whose name I cannot remember but is an editor for a magazine sent me over 140 corrections against an early version to the document. You know who you are, thanks. Eleven people sent a few corrections, though small, were still missed by several of my own checks. They are Marek Januszewski, Amit Shah, Adrian Stanciu, Andy Isaacson, Jean Francois Martinez, Glen Kaukola, Wolfgang Oertl, Michael Babcock, Kirk True, Chuck Luciano and David Wilson. On the development of VM Regress, there were nine people who helped me keep it together. Danny Faught and Paul Larson both sent me a number of bug reports and helped ensure it worked with a variety of different kernels. Cliff White, from the OSDL labs ensured that VM Regress would have a wider application than my own test box. Dave Olien, also associated with the OSDL labs was responsible for updating VM Regress to work with 2.5.64 and later kernels. Albert Cahalan sent all the information I needed to make it function against later proc utilities. Finally, Andrew Morton, Rik van Riel and Scott Kaplan all provided insight on what direction the tool should be developed to be both valid and useful.
Preface vi The last long list are people who sent me encouragement and thanks at various intervals. They are Martin Bligh, Paul Rolland, Mohamed Ghouse, Samuel Chess- man, Ersin Er, Mark Hoy, Michael Martin, Martin Gallwey, Ravi Parimi, Daniel Codt, Adnan Shafi, Xiong Quanren, Dave Airlie, Der Herr Hofrat, Ida Hallgren, Manu Anand, Eugene Teo, Diego Calleja and Ed Cashin. Thanks, the encourage- ment was heartening. In conclusion, I would like to thank a few people without whom, I would not have completed this. I would like to thank my parents who kept me going long after I should have been earning enough money to support myself. I would like to thank my girlfriend Karen, who patiently listened to rants, tech babble, angsting over the book and made sure I was the person with the best toys. Kudos to friends who dragged me away from the computer periodically and kept me relatively sane, including Daren who is cooking me dinner as I write this. Finally, I would like to thank the thousands of hackers that have contributed to GNU, the Linux kernel and other Free Software projects over the years who without I would not have an excellent system to write about. It was an inspiration to me to see such dedication when I first started programming on my own PC 6 years ago after finally figuring out that Linux was not an application for Windows used for reading email.
Contents List of Figures List of Tables xiii xvi 1 Introduction 1 2 1.1 Getting Started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Managing the Source . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3 Browsing the Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.4 Reading the Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.5 Submitting Patches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2 Describing Physical Memory 14 2.1 Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.2 Zones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.3 Zone Initialisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.4 Pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.5 High Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.6 What’s New In 2.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3 Page Table Management 32 3.1 Describing the Page Directory . . . . . . . . . . . . . . . . . . . . . . 33 3.2 Describing a Page Table Entry . . . . . . . . . . . . . . . . . . . . . . 35 3.3 Using Page Table Entries . . . . . . . . . . . . . . . . . . . . . . . . . 36 3.4 Translating and Setting Page Table Entries . . . . . . . . . . . . . . . 38 3.5 Allocating and Freeing Page Tables . . . . . . . . . . . . . . . . . . . 38 3.6 Kernel Page Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.7 Mapping addresses to a struct page . . . . . . . . . . . . . . . . . . 41 3.8 Translation Lookaside Buffer (TLB) . . . . . . . . . . . . . . . . . . . 42 3.9 Level 1 CPU Cache Management . . . . . . . . . . . . . . . . . . . . 43 3.10 What’s New In 2.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4 Process Address Space 52 4.1 Linear Address Space . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.2 Managing the Address Space . . . . . . . . . . . . . . . . . . . . . . . 54 4.3 Process Address Space Descriptor . . . . . . . . . . . . . . . . . . . . 54 vii
分享到:
收藏