Inside The Python Virtual Machine
Obi Ike-Nwosu
This book is for sale at http://leanpub.com/insidethepythonvirtualmachine
This version was published on 2019-03-02
© 2015 - 2019 Obi Ike-Nwosu
Also By Obi Ike-Nwosu
Intermediate Python
Contents
1. Introduction
2. The View From 30,000ft
3. Compiling Python Source Code
3.1 From Source To Parse Tree
3.2 Python tokens
3.3 From Parse Tree To Abstract Syntax Tree
3.4 Building The Symbol Table
3.5 From AST To Code Objects
4. Python Objects
4.1 PyObject
4.2 Under the cover of Types
4.3 Type Object Case Studies
4.4 Minting type instances
4.5 Objects and their attributes
4.6 Method Resolution Order (MRO)
5. Code Objects
5.1 Exploring code objects
5.2 Code Objects within other code objects
5.3 Code Objects in the VM
6. Frames Objects
6.1 Allocating Frame Objects
7. Interpreter and Thread States
7.1 The Interpreter state
7.2 The Thread state
8. Intermezzo: The abstract.c Module
9. The evaluation loop, ceval.c
9.1 Putting names in place
9.2 The parts of the machine
9.3 The Evaluation loop
9.4 A sampling of opcodes
10. The Block Stack
10.1 A Short Note on Exception Handling
11. From Class code to bytecode
12. Generators: Behind the scenes.
12.1 The Generator object
12.2 Running a generator
1. Introduction
The Python programming language has been around for quite a while. Guido van Rossum started
development work on the first version in 1989, and it has since grown to become one of the more
popular languages, used in applications ranging from graphical interfaces to financial¹ and data
analysis² applications.
This write-up aims to go behind the scenes of the Python interpreter and provide a conceptual
overview of how a Python program is executed. This material targets CPython, which as of this
writing is the most popular implementation of Python and is considered the standard.
Python and CPython are used interchangeably in this text, so any mention of Python
refers to CPython, the implementation of Python written in C. Other implementations
include PyPy, which is Python implemented in a restricted subset of Python, and Jython, which
is Python implemented on the Java Virtual Machine.
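Since the text treats Python and CPython as the same thing, it can be useful to confirm which implementation is actually running. The following is a minimal sketch using only the standard library; the variable names are chosen for illustration.

```python
import platform
import sys

# Name of the running implementation, e.g. "CPython", "PyPy" or "Jython".
implementation = platform.python_implementation()

# sys.implementation.name carries similar information in lowercase form.
short_name = sys.implementation.name

print(implementation, short_name, platform.python_version())
```

On the interpreter this book targets, the first value printed should be CPython.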
I like to think of the execution of a Python program as split into two or three main phases, listed
below, depending on how the interpreter is invoked. These are covered to varying degrees within
this write-up:
1. Initialization: This involves the setup of the various data structures needed by the Python
process. This probably only counts when a program is executed non-interactively
through the interpreter shell.
2. Compiling: This involves activities such as parsing source code to build syntax trees, creation
of abstract syntax trees, building of symbol tables and generation of code objects.
3. Interpreting: This involves the actual execution of generated code objects within some context.
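The compiling and interpreting phases listed above can be observed from Python code itself with the built-in compile and exec functions. This is only a sketch of the idea, not what the interpreter does internally; the filename "<example>" and the variable names are made up for illustration.

```python
import types

source = "answer = 6 * 7\n"

# Compiling phase: source text -> code object.
# "<example>" is a made-up filename that only shows up in tracebacks.
code = compile(source, "<example>", "exec")
assert isinstance(code, types.CodeType)

# Interpreting phase: run the code object within some context,
# here an ordinary dict acting as the global namespace.
namespace = {}
exec(code, namespace)

print(namespace["answer"])  # 42
```

The namespace dictionary plays the role of the "context" mentioned in the third phase: the same code object could be executed again against a different dictionary.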
The process of generating parse trees and abstract syntax trees from source code is language agnostic,
so the same methods that apply to other languages also apply to Python; as a result, not much
on this subject is covered here. On the other hand, the process of building symbol tables and
code objects from the abstract syntax tree is the more interesting part of the compilation phase. It
is handled in a more or less Python-specific way, so attention is paid to it. The interpreting
of compiled code objects and all the data structures used in the process is also covered.
Topics that will be touched upon include, but are not limited to, the process of building symbol tables
and generating code objects, Python objects, frame objects, code objects, function objects, Python
opcodes, the interpreter loop, generators and user-defined classes.
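Several of the artifacts named above (abstract syntax trees, symbol tables, code objects and opcodes) can be inspected through standard library modules. The sketch below uses ast, symtable and dis on a small made-up function; it shows what these structures look like from Python and is not a description of the C-level implementation.

```python
import ast
import dis
import symtable

source = """
def add(x, y):
    total = x + y
    return total
"""

# Abstract syntax tree: the module body holds a single FunctionDef node.
tree = ast.parse(source)
func_node = tree.body[0]
print(type(func_node).__name__, func_node.name)  # FunctionDef add

# Symbol table: one nested table per function, listing the names it uses.
table = symtable.symtable(source, "<example>", "exec")
func_table = table.get_children()[0]
print(func_table.get_name(), sorted(func_table.get_identifiers()))

# Code object: compiling the module yields a code object whose constants
# include the code object for the nested function.
module_code = compile(tree, "<example>", "exec")
func_code = next(c for c in module_code.co_consts if hasattr(c, "co_code"))
print(func_code.co_varnames)  # ('x', 'y', 'total')

# Opcodes: the exact listing differs between Python versions.
dis.dis(func_code)
```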
¹http://tpq.io/
²http://pandas.pydata.org/
This material is aimed at anybody who is interested in gaining some insight into how the
CPython virtual machine functions. It is assumed that the reader is already familiar with Python and
understands the fundamentals of the language. As part of this exposé on the virtual machine, we go
through a considerable amount of C code, so a reader with a rudimentary understanding of C will
find it easier to follow. When all is said and done, all that is needed to get through this material is a
healthy desire to learn about the CPython virtual machine.
This work is an expanded version of personal notes taken while investigating the inner workings of
the Python interpreter. There is a substantial amount of wisdom available in PyCon videos³,
school lectures⁴ and blog write-ups⁵. This work would not be complete without acknowledging these
fantastic sources, which have been leveraged in the production of this work.
By the end of this book, a reader should be able to understand the intricacies of how the Python
interpreter executes a program. This includes the various steps involved in executing the program
and the various data structures that are crucial to its execution. We start off with
a gentle bird’s eye view of what happens when a trivial program is executed by passing the module
name to the interpreter at the command line. The CPython executable can be built and installed from
source by following the instructions in the Python Developer’s Guide⁶.
Python version 3 is used throughout this material.
³https://www.youtube.com/watch?v=XGF3Qu4dUqk
⁴http://pgbovine.net/cpython-internals.htm/
⁵https://tech.blog.aknin.name/2010/04/02/pythons-innards-introduction/
⁶https://docs.python.org/devguide/index.html#
2. The View From 30,000ft
This chapter provides a high-level exposé of how the interpreter goes about executing a Python
program. In subsequent chapters, we zoom in on the various pieces of the puzzle and describe each
in more detail. Regardless of the complexity of a Python program, this process is
the same. The excellent explanation of this process provided by Yaniv Aknin in his Python Internal
series¹ provides some of the basis and motivation for this discussion.
Given a Python module, test.py, the module can be executed at the command line by passing it
as an argument to the Python interpreter program, as in $ python test.py. This is just one of the
ways of invoking the Python executable; we could also start the interactive interpreter or execute a
string as code, but these other methods of execution are not of interest to us. When the module
is passed as an argument to the executable on the command line, figure 2.1 captures the flow
of the various activities involved in the actual execution of the supplied module.
Figure 2.1: Flow during execution of source code
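The invocation described above can also be reproduced from within Python. The sketch below writes a throwaway test.py into a temporary directory and runs it as a separate operating system process, using sys.executable as a stand-in for the python command; the file contents and variable names are made up for illustration.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # A trivial module standing in for the test.py mentioned in the text.
    module_path = os.path.join(workdir, "test.py")
    with open(module_path, "w") as handle:
        handle.write('print("hello from test.py")\n')

    # Equivalent to typing `python test.py` at the command line:
    # the interpreter is started as an ordinary process with the
    # module path as its argument.
    result = subprocess.run(
        [sys.executable, module_path],
        capture_output=True,
        text=True,
    )

print(result.returncode)      # 0
print(result.stdout.strip())  # hello from test.py
```

The return code and captured output come back the same way they would for any other program, which is a reminder that the interpreter is itself just an executable.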
The Python executable is a C program just like any other C program, such as the Linux kernel or
a simple hello world program in C, so pretty much the same process happens when the Python
executable is invoked. Take a moment to grasp this: the Python executable is just another program
that runs your own program. The same argument can be made for the relationship between C and
assembly or LLVM. The standard process initialization, which depends on the platform the executable
is running on, starts once the Python executable is invoked with the module name as argument,
¹https://tech.blog.aknin.name/2010/04/02/pythons-innards-introduction/