RISCV-BOOM Documentation
Chris Celio, Jerry Zhao, Abraham Gonzalez, Ben Korpan
May 15, 2019
Contents:
1 Introduction and Overview
    1.1 The BOOM Pipeline
    1.2 The RISC-V ISA
    1.3 The Chisel HCL
    1.4 Quick-start
    1.5 The BOOM Repository
    1.6 The Rocket-Chip Repository
2 Instruction Fetch
    2.1 The Rocket I-Cache
    2.2 Fetching Compressed Instructions
    2.3 The Fetch Buffer
    2.4 The Fetch Target Queue
3 Branch Prediction
    3.1 The Next-line Predictor (NLP)
    3.2 The Backing Predictor (BPD)
4 The Decode Stage
    4.1 RVC Changes
5 The Rename Stage
    5.1 The Purpose of Renaming
    5.2 The Explicit Renaming Design
    5.3 The Rename Map Table
    5.4 The Busy Table
    5.5 The Free List
    5.6 Stale Destination Specifiers
6 The Reorder Buffer (ROB) and the Dispatch Stage
    6.1 The ROB Organization
    6.2 ROB State
    6.3 The Commit Stage
    6.4 Exceptions and Flushes
    6.5 Point of No Return (PNR)
7 The Issue Unit
    7.1 Speculative Issue
    7.2 Issue Slot
    7.3 Issue Select Logic
    7.4 Un-ordered Issue Queue
    7.5 Age-ordered Issue Queue
    7.6 Wake-up
8 The Register Files and Bypass Network
    8.1 Register Read
    8.2 Bypass Network
9 The Execute Pipeline
    9.1 Execution Units
    9.2 Functional Units
    9.3 Branch Unit & Branch Speculation
    9.4 Load/Store Unit
    9.5 Floating Point Units
    9.6 Floating Point Divide and Square-root Unit
    9.7 Parameterization
    9.8 Control/Status Register Instructions
    9.9 The Rocket Custom Co-Processor Interface (RoCC)
10 The Load/Store Unit (LSU)
    10.1 Store Instructions
    10.2 Load Instructions
    10.3 The BOOM Memory Model
    10.4 Memory Ordering Failures
11 The Memory System and the Data-cache Shim
12 Micro-architectural Event Tracking
    12.1 Setup HPM events to track
    12.2 Reading HPM counters in software
    12.3 Adding your own HPE
    12.4 External Resources
13 Verification
    13.1 RISC-V Tests
    13.2 RISC-V Torture Tester
    13.3 Continuous Integration (CI)
14 Debugging
    14.1 FireSim Debugging
    14.2 Pipeline Visualization
15 Physical Realization
    15.1 Register Retiming
    15.2 Pipelining Configuration Options
16 Future Work
    16.1 The BOOM Custom Co-processor Interface (BOCC)
17 Parameterization
    17.1 BOOM Parameters
    17.2 Other Parameters
18 Frequently Asked Questions
19 The BOOM Ecosystem
    19.1 Scala, Chisel, Generators, Configs, Oh My!
20 Terminology
21 Bibliography
22 Indices and tables
Bibliography
CHAPTER 1
Introduction and Overview
The goal of this document is to describe the design and implementation of the Berkeley Out-of-Order Machine
(BOOM).
BOOM is heavily inspired by the MIPS R10k and the Alpha 21264 out-of-order processors. Like the R10k and the
21264, BOOM is a unified physical register file design (also known as “explicit register renaming”).
The source code to BOOM can be found at https://github.com/riscv-boom/riscv-boom.
1.1 The BOOM Pipeline
Fig. 1.1: Default BOOM Pipeline with Stages
1.1.1 Overview
Conceptually, BOOM is broken up into 10 stages: Fetch, Decode, Register Rename, Dispatch, Issue, Register
Read, Execute, Memory, Writeback and Commit. However, many of those stages are combined in the current
implementation, yielding seven stages: Fetch, Decode/Rename, Rename/Dispatch, Issue/RegisterRead, Execute,
Memory and Writeback (Commit occurs asynchronously, so it is not counted as part of the “pipeline”).
1.1.2 Stages
Fetch
Instructions are fetched from the Instruction Memory and pushed into a FIFO queue, known as the Fetch Buffer.
Branch prediction also occurs in this stage, redirecting the fetched instructions as necessary. [1]
[1] While the Fetch Buffer is N entries deep, it can instantly read out the first instruction at the front of the FIFO.
Put another way, instructions don’t need to spend N cycles moving through the Fetch Buffer if there are no
instructions in front of them.
Decode
Decode pulls instructions out of the Fetch Buffer and generates the appropriate Micro-Op(s) to place into the pipeline. [2]
Rename
The ISA, or “logical”, register specifiers (e.g. x0-x31) are then renamed into “physical” register specifiers.
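As a rough illustration of the renaming step, here is a minimal Python sketch. It assumes a map table and a free list (structures named in the Rename Stage chapter titles), a hypothetical 64-entry physical register file, and an identity initial mapping. It does not special-case x0 and does not model stalling when the free list is empty. The class and method names are invented for this example and are not taken from the BOOM source.

```python
from collections import deque

NUM_LOGICAL = 32   # x0-x31, as in the text
NUM_PHYSICAL = 64  # hypothetical size, chosen only for this sketch


class Renamer:
    """Toy model of explicit renaming: a map table plus a free list."""

    def __init__(self):
        # Assumption: logical register i initially lives in physical register i.
        self.map_table = list(range(NUM_LOGICAL))
        self.free_list = deque(range(NUM_LOGICAL, NUM_PHYSICAL))

    def rename(self, dst, srcs):
        """Return (pdst, psrcs, stale_pdst) for one instruction."""
        # Sources are looked up before the destination is remapped, so an
        # instruction such as `add x1, x1, x2` reads the old mapping of x1.
        psrcs = [self.map_table[s] for s in srcs]
        stale = self.map_table[dst]        # previous mapping of dst
        pdst = self.free_list.popleft()    # raises IndexError if empty
        self.map_table[dst] = pdst
        return pdst, psrcs, stale


r = Renamer()
print(r.rename(dst=1, srcs=[1, 2]))  # (32, [1, 2], 1)
print(r.rename(dst=3, srcs=[1, 4]))  # (33, [32, 4], 3)
```

The second instruction reads x1 through physical register 32, the destination allocated by the first instruction, which is how renaming preserves the true data dependence while removing name reuse.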
Dispatch
The Micro-Op is then dispatched, or written, into a set of Issue Queues.
Issue
Micro-Ops sitting in an Issue Queue wait until all of their operands are ready and are then issued. [3] This is the beginning
of the out-of-order piece of the pipeline.
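The wait-then-issue behavior can be sketched in a few lines of Python. This is a simplified model under stated assumptions: readiness is tracked as a set of ready physical registers, the scheduler scans oldest-first, and one Micro-Op issues per call. The `MicroOp` and `select` names and the register numbers are hypothetical and do not come from the BOOM source.

```python
from dataclasses import dataclass


@dataclass
class MicroOp:
    name: str
    srcs: tuple  # physical source registers
    dst: int     # physical destination register


def select(queue, ready_regs, width=1):
    """Issue up to `width` Micro-Ops whose sources are all ready.

    Scans oldest-first, a simple priority policy assumed for this sketch.
    """
    issued = [u for u in queue if all(s in ready_regs for s in u.srcs)][:width]
    for u in issued:
        queue.remove(u)
    return issued


queue = [
    MicroOp("mul", srcs=(40, 41), dst=42),  # waits on p40
    MicroOp("add", srcs=(42, 33), dst=43),  # waits on the mul result
    MicroOp("sub", srcs=(33, 34), dst=44),  # operands already ready
]
ready = {33, 34, 41}

first = select(queue, ready)   # sub issues ahead of the older mul and add
ready.add(40)                  # the producer of p40 writes back
second = select(queue, ready)  # mul can now issue
print([u.name for u in first], [u.name for u in second])  # ['sub'] ['mul']
```

The younger `sub` issuing before the older `mul` is the out-of-order behavior the paragraph describes; `add` stays in the queue until p42 becomes ready.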
Register Read
Issued Micro-Ops first read their register operands from the unified physical register file (or from the bypass
network)...
Execute
... and then enter the Execute stage, where the functional units reside. Issued memory operations perform their
address calculations in the Execute stage and then store the calculated addresses in the Load/Store Unit, which resides
in the Memory stage.
Memory
The Load/Store Unit consists of three queues: a Load Address Queue (LAQ), a Store Address Queue (SAQ), and
a Store Data Queue (SDQ). Loads are fired to memory when their address is present in the LAQ. Stores are fired
to memory at Commit time (and naturally, stores cannot be committed until both their address and data have been
placed in the SAQ and SDQ).
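The firing rules for loads and stores can be sketched as a small Python model. It assumes that SAQ entry i pairs with SDQ entry i and that memory is a plain dictionary. Store-to-load forwarding and memory ordering checks are deliberately left out. The `ToyLSU` class and its methods are hypothetical names for this illustration only.

```python
class ToyLSU:
    """Toy model of when loads and stores are fired to memory."""

    def __init__(self):
        self.memory = {}
        self.laq = []  # Load Address Queue: one address slot per load
        self.saq = []  # Store Address Queue
        self.sdq = []  # Store Data Queue (entry i pairs with saq[i])

    def alloc_load(self):
        self.laq.append(None)
        return len(self.laq) - 1

    def alloc_store(self):
        self.saq.append(None)
        self.sdq.append(None)
        return len(self.saq) - 1

    def fire_load(self, idx):
        """A load may fire as soon as its address is present in the LAQ."""
        addr = self.laq[idx]
        if addr is None:
            return None
        return self.memory.get(addr, 0)

    def commit_store(self, idx):
        """Called at Commit time; writes memory only if address and data are present."""
        if self.saq[idx] is None or self.sdq[idx] is None:
            return False
        self.memory[self.saq[idx]] = self.sdq[idx]
        return True


lsu = ToyLSU()
st = lsu.alloc_store()
lsu.saq[st] = 0x100
blocked = lsu.commit_store(st)  # False: data is not yet in the SDQ
lsu.sdq[st] = 7
done = lsu.commit_store(st)     # True: memory[0x100] is now 7
ld = lsu.alloc_load()
lsu.laq[ld] = 0x100
value = lsu.fire_load(ld)
print(blocked, done, value)     # False True 7
```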
Writeback
The results of ALU operations and load operations are written back to the physical register file.
Commit
The Reorder Buffer (ROB) tracks the status of each instruction in the pipeline. When the head of the ROB is
not busy, the ROB commits the instruction. For stores, the ROB signals to the store at the head of the Store Queue
(SAQ/SDQ) that it can now write its data to memory.
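The head-of-ROB commit rule can be illustrated with a short Python sketch. It assumes one commit per call and represents each entry as a dictionary with a `busy` flag. The `ToyROB` name and its methods are hypothetical and are not taken from the BOOM source.

```python
from collections import deque


class ToyROB:
    """Toy reorder buffer: entries are kept in program order."""

    def __init__(self):
        self.entries = deque()  # left end is the head (oldest instruction)

    def dispatch(self, name):
        entry = {"name": name, "busy": True}
        self.entries.append(entry)
        return entry

    def commit(self):
        """Commit the head if it is not busy; otherwise do nothing."""
        if self.entries and not self.entries[0]["busy"]:
            return self.entries.popleft()["name"]
        return None


rob = ToyROB()
a = rob.dispatch("lw")
b = rob.dispatch("add")
b["busy"] = False       # the younger add finishes first
stalled = rob.commit()  # None: the head (lw) is still busy
a["busy"] = False
order = [rob.commit(), rob.commit()]
print(stalled, order)   # None ['lw', 'add']
```

Even though `add` finished executing first, it commits only after `lw`, so architectural state is updated in program order.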
[2] Because RISC-V is a RISC ISA, currently all instructions generate only a single Micro-Op. More details on how
store Micro-Ops are handled can be found in The Memory System and the Data-cache Shim.
[3] More precisely, Micro-Ops that are ready assert their request, and the issue scheduler chooses which Micro-Ops
to issue that cycle.