CPU Pipeline Observation Lab Report
Course: Advanced Computer Architecture
Name:
Student ID:
School: School of Computer Science
November 2017
Contents
1 Preliminaries: Introduction to the DLX Processor
1.1 DLX Basic Architecture
1.2 DLX Pipeline Structure
1.3 DLX Basic Pipeline
1.4 Common Instructions
2 Lab Preparation
2.1 Preparing the Environment: VMware Virtual Machine
2.2 Getting Familiar with the Software
3 Lab 2: Instruction Pipeline Hazard Analysis
3.1 Objectives
3.2 Environment
3.3 Steps
3.4 Procedure
3.4.1 Loading the Program and Data Files
3.4.2 Observing Data/Control/Structural Hazards
3.4.3 Effect of Adding Floating-Point Units on Performance
3.4.4 Effect of Adding Forwarding on Performance
3.4.5 Pipeline Cost of Taken and Not-Taken Branches
3.5 Summary
4 Lab 3: DLX Processor Programming
4.1 Objectives
4.2 Environment
4.3 Principles
4.4 Steps
4.5 Procedure
4.5.1 Writing the Assembly Program
4.5.2 Observing Data/Control/Structural Hazards
4.5.3 Effect of Adding Floating-Point Units on Performance
4.5.4 Effect of Adding Forwarding on Performance
4.5.5 Pipeline Cost of Taken and Not-Taken Branches
4.6 Summary
5 Lab 4: Code Optimization
5.1 Objectives
5.2 Environment
5.3 Principles
5.4 Steps
5.5 Procedure
5.5.1 Optimizing the Assembly Program
5.5.2 Observing Data/Control/Structural Hazards
5.5.3 Effect of Adding Floating-Point Units on Performance
5.5.4 Effect of Adding Forwarding on Performance
5.6 Summary
6 Lab 5: Loop Unrolling
6.1 Objectives
6.2 Environment
6.3 Principles
6.4 Steps
6.5 Procedure
6.5.1 Writing the Matrix Multiplication Program
6.5.2 Observing Data/Control/Structural Hazards
6.5.3 Effect of Adding Floating-Point Units on Performance
6.5.4 Effect of Adding Forwarding on Performance
6.6 Summary
1 Preliminaries: Introduction to the DLX Processor
1.1 DLX Basic Architecture
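DLX is a typical load/store instruction set architecture. It reflects features
common to the instruction sets of many current machines and also anticipates
features of some future ones. The design philosophy of those instruction sets
closely resembles that of DLX; they all emphasize:
a simple load/store instruction set;
efficient pipelining of instructions;
easily decoded instructions;
efficient support for compilers.
DLX is a processor model that is easy to learn and study. Machines of this type
are becoming increasingly popular, and the architecture is easy to understand.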
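1. Registers in DLX
DLX has 32 general-purpose registers (GPRs), named R0, R1, ..., R31. Each GPR is
32 bits wide.
DLX also has 32 floating-point registers (FPRs), named F0, F1, ..., F31, each 32
bits wide. An FPR can hold a 32-bit single-precision floating-point number, or an
even-odd pair of adjacent registers Fi, Fi+1 (i = 0, 2, 4, ..., 30) can hold a
double-precision number. The resulting 64-bit double-precision registers are
named F0, F2, ..., F28, F30.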
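The pairing rule can be illustrated with a small Python sketch. The helper name `double_register_pair` is hypothetical and is not part of any DLX tool; it only maps a double-precision register name to the two 32-bit registers behind it.

```python
def double_register_pair(name):
    """Map a double-precision register name (e.g. 'F2') to its two 32-bit FPRs."""
    if not (name.startswith("F") and name[1:].isdigit()):
        raise ValueError(f"not a floating-point register name: {name!r}")
    index = int(name[1:])
    # Only even indices 0, 2, ..., 30 name a 64-bit register.
    if index % 2 != 0 or index > 30:
        raise ValueError(f"{name} does not name a double-precision register")
    return f"F{index}", f"F{index + 1}"


print(double_register_pair("F30"))
```

An odd name such as F3 is rejected because it is the upper half of the pair that starts at F2.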
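2. DLX data types
DLX provides integer and floating-point data of several sizes. Integers can be
8, 16, or 32 bits; floating-point data can be 32-bit single precision or 64-bit
double precision, in IEEE 754 format. All DLX operations work on 32-bit integers
and on 32-bit or 64-bit floating-point data.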
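3. DLX addressing modes and data transfer
DLX provides four addressing modes: register, immediate, displacement, and
register indirect. A register specifier field is 5 bits wide and identifies one
of the 32 general-purpose or floating-point registers.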
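4. DLX instruction formats
Because DLX has only four addressing modes, the addressing mode is encoded in
the opcode. To simplify decoding and make full use of the pipeline, every DLX
instruction is 32 bits long, with 6 bits for the opcode. The formats of the
different instruction types are shown in Figure 1.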
Figure 1
5. Operations in DLX
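DLX operations fall into four classes: loads and stores, ALU operations,
branches and jumps, and floating-point operations.
(1) Loads and stores
All general-purpose and floating-point registers in DLX can be loaded or stored,
except that loading general-purpose register R0 has no effect.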
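(2) ALU operations
All ALU instructions in DLX are register-register instructions. They cover
simple arithmetic and logical operations such as add, subtract, AND, OR, XOR,
and shifts. Each of these instructions also has an immediate form, in which the
immediate is a 16-bit sign-extended value. LHI (load high immediate) loads an
immediate into the upper half of a register and sets the lower half to 0, so a
full 32-bit constant can be built with two instructions.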
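As a rough illustration of the two-step construction, the Python sketch below models LHI followed by a second step that fills in the low half. The function `or_low_half` is a hypothetical stand-in for that second instruction and assumes the low 16 bits are supplied zero-extended; the report does not say which instruction is used, so treat that choice as an assumption.

```python
MASK32 = 0xFFFFFFFF


def lhi(imm16):
    """LHI: place a 16-bit immediate in the upper half, clear the lower half."""
    return ((imm16 & 0xFFFF) << 16) & MASK32


def or_low_half(reg_value, imm16):
    """Hypothetical second step: OR in a zero-extended 16-bit immediate."""
    return (reg_value | (imm16 & 0xFFFF)) & MASK32


def build_constant(value):
    """Build a 32-bit constant from its high and low 16-bit halves."""
    high = (value >> 16) & 0xFFFF
    low = value & 0xFFFF
    return or_low_half(lhi(high), low)


print(hex(build_constant(0x12345678)))
```

Zero extension matters for the second step: a sign-extended low half with its top bit set would overwrite the upper half that LHI just loaded.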
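R0, which always reads as 0, is used mainly to synthesize useful operations.
Loading a constant is simply an add immediate with R0 as one source operand, and
a register-to-register move is simply an add with R0 as one source operand.
These two operations are written LI and MOV, respectively.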
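The DLX instruction set also includes register compare instructions (=, ≠, <,
>, ≤, ≥). If the comparison is true, the instruction writes 1 (true) to the
destination register; otherwise it writes 0 (false). Because these instructions
"set" the destination register, they are also called set-equal, set-not-equal,
set-less-than, and so on.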
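(3) Branches and jumps
Control flow in DLX is handled by a set of jump and branch instructions. Jumps
fall into four kinds according to how the target address is specified and
whether a link is made. Two kinds add a signed 26-bit offset to the program
counter to form the target address; the other two name a register whose contents
give the target address. Within each pair, one is a plain jump and the other is
a jump and link (used for procedure calls), which saves the address of the next
sequential instruction (the return address) in register R31.
All branches in DLX are conditional. The source register of a branch holds a
data value or the result of a comparison, and the branch tests whether that
register is zero or nonzero to decide whether it is taken. The branch target
address is formed by adding a signed 16-bit offset to the program counter,
which at that point holds the address of the next sequential instruction.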
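The PC-relative target calculation can be sketched in Python as follows. The function names are hypothetical. The field width is passed explicitly: 26 bits for the offset of a PC-relative jump, 16 bits for the immediate field of a conditional branch. The sketch assumes the offset is added to the address of the next sequential instruction (PC + 4).

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def pc_relative_target(pc, offset_field, bits):
    """Target = (PC + 4) + sign-extended offset, wrapped to 32 bits."""
    next_pc = (pc + 4) & MASK32
    return (next_pc + sign_extend(offset_field, bits)) & MASK32


# A branch at 0x100 with offset field 0xFFFC (-4) targets itself.
print(hex(pc_relative_target(0x100, 0xFFFC, 16)))
```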
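(4) Floating-point operations
Floating-point instructions in DLX take their operands from the floating-point
registers, and each instruction specifies whether it is a single-precision or a
double-precision operation.
The DLX floating-point operations are add, subtract, multiply, and divide. The
suffix D denotes double precision and the suffix F single precision (e.g., ADDD,
ADDF, SUBD, SUBF, MULTD, MULTF, DIVD, DIVF). Note that floating-point compares
set a bit in the floating-point status register: 1 if the comparison is true,
0 if it is false. The floating-point branch instructions BFPT and BFPF test this
bit to decide whether the branch is taken.
MOVF copies one single-precision floating-point register to another, and MOVD
copies one double-precision register to another. MOVFP2I and MOVI2FP move data
between a floating-point register and a general-purpose register; moving a
double-precision value into two general-purpose registers takes two
instructions. DLX also provides integer multiply and divide instructions that
operate on the 32-bit floating-point registers.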
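1.2 DLX Pipeline Structure
To explain pipelined execution, we first describe how a DLX instruction executes
without pipelining. Figure 2 shows a simple datapath that implements the DLX
instructions; as the steps below show, a DLX instruction completes in five
clock cycles.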
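1. Instruction fetch cycle (IF):
IR ← Mem[PC]
NPC ← PC + 4
Operation: fetch the instruction from memory at the address in PC and place it
in the instruction register IR; add 4 to PC so that it points to the next
sequential instruction, and place that address in the temporary register NPC.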
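A minimal Python sketch of the fetch step follows. Modeling instruction memory as a dict from byte address to 32-bit word is an assumption made for illustration, as is the function name `fetch`.

```python
def fetch(memory, pc):
    """IF stage: return (IR, NPC) for the instruction at address pc."""
    ir = memory[pc]               # IR  <- Mem[PC]
    npc = (pc + 4) & 0xFFFFFFFF   # NPC <- PC + 4
    return ir, npc


# Two placeholder instruction words at addresses 0 and 4.
program = {0: 0x20010005, 4: 0x20020007}
ir, npc = fetch(program, 0)
print(hex(ir), npc)
```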
Figure 2
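2. Instruction decode/register fetch cycle (ID):
A ← Regs[IR6..10]
B ← Regs[IR11..15]
Imm ← (IR16)^16 ## IR16..31
Operation: decode the opcode, form the operand address from the given
addressing mode and the contents of the address field, and use that address to
read the operand. The operand may be in main memory or in a general-purpose
register. [8]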
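Decoding and register reads proceed in parallel. This is possible because the
opcode and register fields sit at fixed positions in the DLX instruction
format, a technique known as fixed-field decoding. Note that this may read
registers whose values are never used in later cycles; that does not affect
correctness and in fact keeps the design simpler.
Because the immediate also sits at a fixed position in the instruction format,
it is sign-extended here as well so that it is ready for the next cycle.
Depending on the instruction, the immediate may never be used in later cycles,
but forming it early does no harm.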
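The ID-stage transfers can be sketched in Python as below. The sketch assumes bit 0 is the most significant bit of the 32-bit instruction word, which is consistent with a 6-bit opcode followed by the IR6..10 and IR11..15 register fields. The expression (IR16)^16 ## IR16..31 is read as the sign bit IR16 replicated 16 times and concatenated with the 16-bit immediate field. The names `field` and `decode` are hypothetical.

```python
def field(word, first, last):
    """Extract bits first..last of a 32-bit word, numbering bit 0 as the MSB."""
    width = last - first + 1
    shift = 31 - last
    return (word >> shift) & ((1 << width) - 1)


def decode(ir, regs):
    """ID stage: read both source registers and sign-extend the immediate."""
    a = regs[field(ir, 6, 10)]     # A <- Regs[IR6..10]
    b = regs[field(ir, 11, 15)]    # B <- Regs[IR11..15]
    imm16 = field(ir, 16, 31)
    # Replicating the sign bit 16 times equals subtracting 2**16 when it is set.
    imm = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return a, b, imm


regs = [i * 10 for i in range(32)]
ir = (0x08 << 26) | (3 << 21) | (5 << 16) | 0xFFFF
print(decode(ir, regs))
```

Both registers are read unconditionally, mirroring the point that unused reads are harmless.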
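3. Execution/effective address cycle (EX):
In this cycle, different instructions perform different operations. [1]
(1) Memory reference: ALUOutput ← A + Imm
For a memory access instruction, the ALU adds the operands to form the
effective address and places the result in the temporary register ALUOutput.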
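(2) Register-register ALU operation: ALUOutput ← A op B
For a register-register ALU instruction, the ALU applies the function specified
by the opcode to the values in temporary registers A and B and places the
result in the temporary register ALUOutput.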
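(3) Register-immediate ALU operation: ALUOutput ← A op Imm
For a register-immediate ALU instruction, the ALU applies the function specified
by the opcode to the values in temporary registers A and Imm and places the
result in the temporary register ALUOutput.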
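(4) Branch:
ALUOutput ← NPC + Imm
Cond ← (A op 0)
For a branch instruction, the ALU adds the values in temporary registers NPC
and Imm to obtain the branch target address. At the same time, the value read
into register A in the previous cycle is checked to decide whether the branch
is taken. The comparison op is determined by the branch opcode.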
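The effective address cycle and the execution cycle are merged into a single
clock cycle here. The DLX instruction set permits this because no instruction
needs to compute a data memory address, compute a branch target address, and
operate on data at the same time. The four operation types above do not cover
the various forms of jump; these closely resemble branches and are not
described further.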
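The four EX-stage cases can be summarized in a short Python sketch. The `kind` strings, the function name `execute`, and the use of Python's `operator` functions to stand in for the opcode-selected ALU and comparison functions are all assumptions made for illustration.

```python
import operator

MASK32 = 0xFFFFFFFF


def execute(kind, a, b, imm, npc, alu_op=operator.add, cond_op=operator.eq):
    """EX stage: compute ALUOutput (and Cond for branches) for one instruction."""
    if kind == "memory":      # ALUOutput <- A + Imm
        return {"ALUOutput": (a + imm) & MASK32}
    if kind == "reg-reg":     # ALUOutput <- A op B
        return {"ALUOutput": alu_op(a, b) & MASK32}
    if kind == "reg-imm":     # ALUOutput <- A op Imm
        return {"ALUOutput": alu_op(a, imm) & MASK32}
    if kind == "branch":      # ALUOutput <- NPC + Imm; Cond <- (A op 0)
        return {"ALUOutput": (npc + imm) & MASK32, "Cond": cond_op(a, 0)}
    raise ValueError(f"unknown instruction kind: {kind}")


# A branch-if-zero at address 0x100 (NPC = 0x104) with offset 8 and A = 0.
print(execute("branch", a=0, b=0, imm=8, npc=0x104))
```

Each call takes exactly one path through the ALU, which is why a single cycle is enough for both address calculation and execution.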
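4. Memory access/branch completion cycle (MEM)
The only DLX instructions active in this cycle are loads, stores, and branches.
Memory reference: LMD ← Mem[ALUOutput] or Mem[ALUOutput] ← B
Memory access covers both loads and stores. For a load, the value in the
temporary register ALUOutput is used as the address, and the data read from
memory is placed in the temporary register LMD. For a store, the value in
temporary register B is written to memory at the address given by ALUOutput.
Branch: if (cond) PC ← ALUOutput else PC ← NPC
If the branch condition register holds true, the branch is taken and the value
in the temporary register ALUOutput is placed in PC as the branch target
address. Otherwise, the value in the temporary register NPC is placed in PC as
the address of the next instruction. [1]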
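The MEM-stage transfers can be sketched in Python as two small functions. As before, modeling memory as a dict keyed by address is an assumption, and the function names are hypothetical.

```python
def memory_access(is_load, alu_output, b, memory):
    """Load returns the value for LMD; store writes B and returns None."""
    if is_load:
        return memory[alu_output]   # LMD <- Mem[ALUOutput]
    memory[alu_output] = b          # Mem[ALUOutput] <- B
    return None


def branch_complete(cond, alu_output, npc):
    """if (cond) PC <- ALUOutput else PC <- NPC"""
    return alu_output if cond else npc


data = {0x200: 42}
memory_access(False, 0x204, 7, data)          # store 7 at address 0x204
print(memory_access(True, 0x204, 0, data))    # load it back
print(hex(branch_complete(True, 0x10C, 0x104)))
```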
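5. Write-back cycle (WB):
The work done in this cycle also depends on the instruction. It is described
below by instruction type.
(1) Register-register ALU instruction: Regs[IR16..20] ← ALUOutput
(2) Register-immediate ALU instruction: Regs[IR11..15] ← ALUOutput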