Overview of Single-User Minibase
It is easiest to understand Minibase in terms of how a query is processed.
A query is first parsed and optimized to obtain an evaluation plan. The
evaluation plan is structured as a tree of iterators, each of which
supports a ``get-next-tuple'' interface, and corresponds typically to a
relational algebra operator. Execution is initiated by calling
get-next-tuple at the root of the plan tree, which successively generates
get-next-tuple calls on descendant nodes. Execution consists essentially
of a data-driven loop of such calls until all the input tuples have been
processed. This is described below in more detail. (Only single-user
Minibase is covered in this overview---in fact, only single-user Minibase
is available for now, and most likely, for a while yet!)
Parser and Optimizer: These modules take an SQL query and find the best
plan for evaluating it. The optimizer is similar to the one used in System
R (and subsequently in most commercial relational database systems). In
optimizing a query, the optimizer considers information in the catalog
about relations and indexes. It can read catalog information from a Unix
file, and thus the parser/optimizer can be used in a stand-alone mode,
independent of the rest of Minibase (say, for optimizer related
exercises).
Execution Planner: This module takes the plan tree produced by the
optimizer, and creates a run-time data structure. This structure is
essentially a tree of ``iterators'', each of which supports a
``get-next-tuple'' interface. Tuples are returned in response to
get-next-tuple calls by copying them into dynamically allocated main
memory (not the buffer pool).
The figure below shows details of iterator trees and the Minibase code
that they call.
Main loop: Execution is triggered by ``pulling'' on the root of the tree
with a get-next-tuple call. This results in similar calls to iterators
lower in the tree. Leaf level iterators retrieve tuples from a relation
(perhaps using an index), and intermediate level iterators in the tree
correspond to joins. Selections and projections are ``piggy-backed'' onto
these iterators. More information is available on each of these
components:
Iterators A ``get-next-tuple'' interface for file scans, index scans and
joins.
Join Methods Nested loops, sort-merge and hash joins are supported.
Heap Files All data records are stored in heap files, which are files of
unordered pages implemented on top of the DB class.
Access Methods: Currently only a single access method is supported, B+
Trees. The access method in Minibase is dense, unclustered, and store
key/rid-of-data-record pairs (Alternative 2, in terms of the text). Data
records are always stored in heap files, as noted above, and access methods
are implemented (like heap files) as files on top of the DB class.
Buffer Manager: The buffer manager swaps pages in and out of the (main
memory) buffer pool in response to requests from access method and the
heap file component.
Disk Space Manager: A database is a fixed size Unix file, and pages (in
the file) on disk are managed by the disk space manager.
Disk Space Manager
The disk space manager (DSM) (implemented as part of the DB class) is the
component of Minibase that takes care of the allocation and deallocation
of pages within a database. It also performs reads and writes of pages
to and from disk, and provides a logical file layer within the context
of a database management system.
A Minibase database is implemented as a single Unix file. Its pages are
simply page-sized blocks of bytes within this file. The higher-level
structures of a Minibase database, such as heap files and B+ tree files,
are actually logical files consisting of collections of database pages.
In discussing the DSM, whenever necessary we will refer to page-sized
blocks of bytes in the underlying Unix file as ``pages'', and pages in
higher-level structures such as heap files as, for example, ``heap file
pages'' in order to avoid confusion.
The Page Class
The abstraction of a page is provided by the Page class. All higher level applications use this Page
class. Higher layers impose their own structure on pages simply by casting page pointers to their
own record types. The data part of a page is guaranteed to start at the beginning of the block.
The Database (DB) Class
The DB class provides the abstraction of a single database stored on disk. It shields the rest of the
software from the fact that the database is implemented as a single Unix file. It provides methods
for allocating additional pages (from the underlying Unix file) for use in the database and
deallocating pages (which may then be re-used in response to subsequent allocation requests).
(The DB class actually supports allocation and deallocation of a consecutive run of pages, though
the higher-level code usually just asks for one page at a time.)
There is one instance of the DB class for every database used. (In Minibase,
a transaction is allowed to have at most one active database, so there
is just one instance of this class.) The operations on this class include
creating and destroying databases, and, as noted above, allocating and
deallocating pages. Further, existing databases can be opened and closed,
and there are methods to retrieve certain characteristic properties of
the database, like the number of pages and page size.
File Directory
The DB class also provides a file naming service, which is used by higher-level code to create
logical ``files of pages''. This service is implemented using records consisting of file names and
their header page ids. There are functions to insert, look up, and delete file entries. The set of file
entries is collectively referred to as the file directory.
Space Management
The DB class keeps track of allocated space within the database using a fixed set of pages called
the space map. They can be thought of as containing a bit map: one bit per database page, with
``0'' denoting that the corresponding page is free and ``1'' denoting that it is allocated.
The DB class maintains the space map, in addition to its other duties,
updating it whenever pages are allocated or deallocated.
Limitations, or Room for Improvement
The current implementation creates fixed-size databases; the space map is set when the database is
created, and never grows. This limitation would be fairly easy to remedy, either by setting a
maximum database size (still fixing the size of the space map, but allowing the database to grow
to fit the maximum number of pages representable in the map), or by having the space map be a
linked list of pages and grow as needed.
Buffer Manager
Introduction
The buffer manager reads disk pages into a main memory page as needed.
The collection of main memory pages (called frames) used by the buffer
manager for this purpose is called the bufferpool. This is just an array
of Page objects. The buffer manager is used by (the code for) access
methods, heap files, and relational operators to read / write /allocate
/ de-allocate pages. The Buffer Manager makes calls to the underlying DB
class object, which actually performs these functions on disk pages.
Replacement policies for the buffer manager can be changed easily at
compile time.
Design Decisions
A hash table is used to figure out what page frame a given disk page (i.e., with a given pageId)
occupies. A buffer descriptor object is associated with every page frame in the buffer pool. It
contains a dirty bit, the page number, and the pin count for the page occupying that frame.
When a page is requested, the buffer manager brings it in and pins it.
The buffer manager does not keep track of all the pages that have been
pinned by a transaction. It is up to the various components (that call
the buffer manager) to make sure that all pinned pages are subsequently
unpinned.
See the text for a more detailed description of the buffer manager.
Heap File
Introduction
A heap file is an unordered set of records. The following operations are supported:
* Heap files can be created and destroyed.
* Existing heapfiles can be opened and closed.
* Records can be inserted and deleted.
* Records are uniquely identified by a record id (rid). A specific record can be retrieved by using
the record id.
Sequential scans on heap files are also supported. A scan object is created,
and get-next calls on this object allow us to retrieve all records starting
from the first record.
A file scan iterator uses a heap file scan, and calls the eval function
to apply any desired selections to the retrieved tuples.
Temporary heap files are supported as well. These are used for external
sorting.
Awordonnaming: The catalog also names files. It would be best to have
a global naming algorithm.
Design Decisions
The main kind of page structure used in the Heap File is HFPage, and this is viewed as a Page
object by lower-level code. The HFPage class uses a slotted page structure with a slot directory
that contains (slot offset, slot length) pairs. The page number and slot number are used together to
uniquely identify a record (rid). When a record is deleted, the space is reclaimed by compacting
records on the page, and the length in the corresponding slot entry is set to a negative value
(indicating that the slot is empty).
Pages are maintained in a doubly linked list, and can be de-allocated when
empty. The heap file layer makes calls to the buffer manager. The only
exception is that the directory (file naming) services of the DB class
(disk space management) are called directly.
Alignment issues are not considered. Tuples are stored only as a series
of bytes. It is up to upper layers to make sense of them. (See related
point in discussion of tuples.)
See the text for a more detailed description of implementation
alternatives for heap files.
Access Methods
Introduction
An access method (or index) facilitates retrieval of a desired record from a relation. In Minibase,
all records in a relation are stored in a Heap File. An index consists of a collection of records of
the form , where `key value' is a value for the search key of the index, and `rid' is
the id of a record in the relation being indexed. Any number of indexes can be defined on a
relation.
Only B+ Tree indexes are currently supported.
Minibase only supports unclustered indexes. (In fact, dense, unclustered
indexes storing key/rid-of-data-record pairs; Alternative 2, in terms of
the text.) Implementing clustered indexes would require major changes.
(The optimizer, however, can correctly handle clustered vs. unclustered
indexes. Of course, this is only useful while running the optimizer in
stand-alone mode against synthetic catalogs.)
Design Decisions
All access methods are based on a common C++ base class. A particular access method can be
understood in terms of its file structure (which determines how the pages in the index file are
organized), page structures (typical page format), and the functionality of scans (e.g., whether
range scans are supported).
Introduction
The relational operators are implemented as iterators, i.e., they support a get-next-tuple interface.
Selections are implemented as part of file-scan iterators or index-scan
iterators; projections (with no selection) are always file-scan iterators.
Selections on single relations are combined with projections.
Whenever possible, selections are combined with joins by applying the
(non-join) selection conditions to the tuples in the join result.
Similarly, any unwanted attributes are projected out as part of the join
operators. Each join method defines a new kind of iterator.
Sorting is implemented as a separate iterator, and is used internally by
the sort-merge join operator. It can also be used for GROUP-BY, ORDER-BY
and DISTINCT.
Design Decisions
External sorting. [ public interface ]
Projection (with elimination of duplicates). [ public interface ]
Selection. [ public interface ]
Join algorithms [ public interface ]
Tuples
Introduction
Each record in a relation is a collection of bytes that is cast as an object of type Tuple. Only
implementors of access methods, heap files and relational operators need to worry about how the
bytes are utilized . Others can treat them as just a collection of bytes.
Tuplesmay have attributes of three types: strings (up to 255 characters),
integers, and real numbers (floats, in C++).