logo资料库

B32(SunnyApril)自制虚拟机第一部分英文教程.pdf

第1页 / 共73页
第2页 / 共73页
第3页 / 共73页
第4页 / 共73页
第5页 / 共73页
第6页 / 共73页
第7页 / 共73页
第8页 / 共73页
资料共73页,剩余部分请下载后查看
Introduction Welcome to my tutorial on virtual machines. This tutorial will introduce you to the concept of a virtual machine and then we will, step by step, create our own simple virtual machine in C#. Keep in mind that a virtual machine is a very complicated thing and even the simplest virtual machine can take years for a team of programmers to create. With that said, don’t expect to be able to create your own language or virtual machine that will take over .NET or Java overnight. In this tutorial, we will first layout the plan for our virtual machine. Then we will create a very simple intermediate language. An intermediate language is the lowest level language still readable by humans. It is comparable to assembly language, which is also the lowest level language on most computers. The first program we will create will be a very simple intermediate compiler that will convert our intermediate language to bytecode. Bytecode is a set of binary instructions that our virtual machine will be able to directly execute. It is comparable to machine language, which is a set of binary or machine instructions that all computers and CPUs understand. This virtual machine will be our second project. It will be a virtual machine, created from scratch in C# that will execute our bytecode. It will be very simple at first, but then we will expand it by adding threading support and dual screen outputs (you’ll find out what I’m talking about later). All of the code in this tutorial is created using Visual Studio 2008 Professional, targeting the .NET Framework 2.0. Since I’m targeting the 2.0 framework, you should be able to use Visual Studio 2005 as well. Since creating a virtual machine really does dive down into the nuts and bolts of how computers work, I am assuming the reader of this has a pretty good, or a basic knowledge of, programming, hexadecimal and binary number systems, and threading. It would also really help to know something about assembly language, although I will try to help you understand things on a need-to-know basis. If I haven’t scared you off and you’re still interested in how to make a virtual machine, then let’s begin!
How to create your own virtual machine in a step-by-step tutorial Brought to you by icemanind 2009 Planning it out As described in the introduction, the first thing we will want to do is draw out a rough blue print of what our machine will be able to do. I have decided to call our machine, B32 (Binary 32), although, for simplicity’s sake it will not be a 32-bit machine. It will be a 16-bit machine. B32 will have 64K of memory and it can be addressed anywhere from $0000 - $FFFF. A B32 executable program can access any part of that memory. Along with a 64K memory space, we will introduce 5 registers into our virtual machine. All CPU’s and all virtual machines have what’s called registers. A register is similar to a variable. Registers hold numbers and depending on how large the register is, determines how large of a number it can hold. Unlike variables, however, registers do not take up memory space. Registers are “built into” CPUs. This will make more sense once you see an example, which is coming up real soon. To keep things simple, we will only implement 5 registers into our virtual machines. These registers will be called A, B, D, X and Y. The A and B registers are only 8 bits in length, which means each register can hold any number between 0 and 255 unsigned or between -128 to 127 signed. For now, we are going to worry only about unsigned integers. We will get into signed later and we will briefly touch on floating point numbers later. The X, Y and D registers will be 16 bits in length, capable of storing any number between 0 and 65,535 unsigned or between -32768 to 32767 signed. The D register will be something of a unique register. The D register will hold the concatenated values of the A and B registers. In other words, if register A has $3C and register B has $10, than register D will contain $3C10. Anytime a value in the A or B register is changed, then the value in the D register is also changed. The same is true if a value in the D register is changed, the A and B registers will be changed accordingly. You will see later why this is handy to have. This has been a lot of dry talk, but here is a picture to represent our B32 registers: B32 Registers A 8 B 8 bits bits { D 16 bits X 16 bits Y 16 bits Hopefully this makes sense to you. If not, you will catch on as we progress through the tutorial. Earlier when I told you that our virtual machine had 64K of free memory for an executable to use, that was not entirely true. Really it’s only 60K because 4000 bytes must be reserved for screen 3
2009 How to create your own virtual machine in a step-by-step tutorial output. I’ve chosen to use $A000 - $AFA0. This area of memory will map to our screen. In most CPUs and Brought to you by icemanind most virtual machines, this memory is mapped inside the video card memory, however, for simplicity; I am going to share our 64K of memory with our video output. This memory will give us an 80x25 screen (80 columns, 25 rows). You may be thinking right now, “I think your math is off dude. 80 times 25 is only 2000”. This is true; however, the extra 2000 bytes will be for an attribute. For those of us old enough to remember programming assembly language, back in the old DOS days, will already be familiar with an attribute byte. An attribute byte defines the foreground and background color of our text. How it works is the last 3 bits of the byte make up the RGB or Red, Green, Blue values of our foreground color. The 4th bit is an intensity flag. If this bit is 1 then the color is brighter. The next 3 bits make up the RGB values of our background color. The last bit is not used (back in DOS days, this bit was used to make text blink, but in B32, it is ignored). You will see later how colors are created using this method. The final part of this section will define the mnemonics and the bytecode that make up a B32 executable. Mnemonics are the building block of our assembly language code that will be assembled to bytecode. For now, I am only going to introduce enough for us to get started and we will expand on our list throughout this tutorial. The first mnemonic we will introduce is called “LDA”. “LDA” is short for “Load A Register” and what it will do is assign a value to the A register. Now in most CPUs and virtual machines, you have what’s called addressing modes. Addressing modes determine how a register gets its value. For example, is the value specified directly on the operand (an operand is the data that follows the mnemonic) or does it pull a value from somewhere in memory or is loaded from a value assigned to another register? There can be dozens of addressing modes, depending on how complex of a virtual machine you want to create. For now, our virtual machine will only pull data directly specified in the operand. We will assign this mnemonic a bytecode value of $01. Since we decided earlier that the A register can only hold an 8 bit value, we now that the entire length of a “LDA” mnemonic that pulls direct data from the operand will be 2 bytes in length (1 byte for the mnemonic and 1 byte for the data). The next mnemonic we will discuss will be called “LDX”. “LDX” is short for “Load X Register” and, just like “LDA”, it will load a value directly into the X register from the operand. Another difference between “LDX” and “LDA” is the length. Since our X register can hold 16 bits of data, that means the total length of the bytecodes will be 3 bytes instead of 2 (1 byte for the mnemonic and 2 bytes for the data). We will assign this mnemonic a bytecode of $02. If I lost you guys, keep reading and I promise this will make sense when we look at some examples. The next mnemonic we will discuss now will be called “STA”. “STA” is short for “Store A Register” and its function will be to store the value contained in the A register into a location somewhere in our 64K memory. Unlike our load mnemonics, which pulls the value directly from the operand, our store mnemonic will pull its data from the value stored in one of the 16 bit registers. We will assign this mnemonic a bytecode of $03. The final mnemonic we will discuss is call “END”. “END” will do exactly that. It will terminate the application. All B32 programs must have an END mnemonic as the last line of the program. The operand for the END mnemonic will be a label that will point to where execution of our B32 program will begin. 4
How to create your own virtual machine in a step-by-step tutorial Brought to you by icemanind 2009 Here is a summary of our mnemonics: Mnemonic LDA $01 LDX $02 STA $03 END $04 Description Example What will this example do? Assigns a value to our A register LDA #$2A Assigns a value to our X register LDX #16000 Assigns the hex value $2A to the A register Assigns the number 16,000 to the X register Stores the value of the A register to a memory location STA ,X Terminates the B32 program END START to our Stores the value of the A register the memory location pointed to by the X register Terminate the program and tell that execution of our program should start at the START label assembler You may be wondering what the pound sign ‘#’ means in my examples above. The pound sign will tell our assembler to use “this value”, that is, the value immediately following the pound sign. We will introduce other forms of LDA, LDX and STA later in this tutorial, but for now, this is enough to get us started. For those of you who may also be wondering what the dollar sign ‘$’ means, it is a prefix that will tell our assembler that the value is in hexadecimal format. If there is no dollar sign present, then the assembler will assume the number is a regular integer number. One final mnemonic that we will introduce is called “END”. This is not really a mnemonic though. This is an assembler command that will tell our assembler “this is the end of our program”. All B32 programs we created must have at least 1 and only 1 END statement and it should be the last line of the program. The operand for our END statement will be a label that points to the part of our program where execution will begin. We will discuss labels and execution points later in the tutorial. One final piece of business we need to discuss before we get our hands dirty and start writing our assembler is the file format of our B32 executables. To keep things simple, our file format will be as follows: Data “B32” Length Description 3 Bytes Our magic header number 2 Bytes 2 Bytes This is a 16-bit integer that tells our virtual machine where, in memory, to place our program. This is a 16-bit integer that tells our virtual machine where to begin execution of our program. ?? Bytes This will be the start of our bytecode, which can be any length. 5
How to create your own virtual machine in a step-by-step tutorial Brought to you by icemanind 2009 Most binary file formats have a “magic number” as a header. A magic number is one or more bytes that are unique to that file format. For example, all DOS and Windows executables start with “MZ”. Java binary class files have 4 bytes for its magic number and start with $CAFEBABE. Our B32 executables will start with “B32”. There are two main purposes for this “magic number”. The first is, our virtual machine can check to be sure the file it’s trying to execute is, indeed, a B32 binary file. The second purpose for have magic numbers is some operating systems, such as Linux for example, can automatically execute files by looking at this magic number in a database, then calling the appropriate program. B32 Assembler Finally! It’s time to get our hands dirty and start working on our assembler. The goal of the assembler will be to translate our B32 mnemonics into a B32 binary. Our assembler is going to expect input to be in the following format: [Optional Label:] [Optional white space] A label starts with a letter and is composed of any number of letters or numbers, followed by a colon. As far as the assembler’s concerned, a label will simply be translated into a 16-bit value defining an area of memory. A white space is any number of spaces or tabs. Each mnemonic MUST be preceded by at least 1 space or 1 tab; otherwise, our assembler will treat the mnemonic as a label instead of as a mnemonic. Likewise, each mnemonic must also have at least 1 space or 1 tab after the mnemonic. To demonstrate this, we are going to create our very first B32 assembly language program right now. Open up notepad or some other text editor of your choosing and type the following program EXACTLY as you see it below and don’t forget to put a single space before each mnemonic and also be sure to end the last line with a newline: START: LDA #65 LDX #$A000 STA ,X END START Five points if you can guess what this program will do! That’s right! This program doesn’t do much except put a capital letter ‘A’ in the upper left hand corner of the screen. The first line is our label. This is where our execution will begin. The next line loads our A register with the value of 65. The ASCII value of ‘A’ is 65. The following line loads our X register with hex value $A000. If you remember from our previous discussion, we said that our video memory will start at $A000, thus defining the upper left 6
How to create your own virtual machine in a step-by-step tutorial hand corner. The next line is the action line that actually stores the value in register A (65, which is the Brought to you by icemanind letter ‘A’ in ASCII) at the location pointed to by register X (which is $A000). The final line ends our program and tells the assembler to start execution at our START label. Hopefully this all makes sense to you. If not, scroll back up and reread the “Planning it out” section carefully. 2009 Save this file somewhere on your computer. Call it “Test.asm”. We will now create the assembler that will be able to translate this code into B32 bytecode. Our assembler will work by assembling in two phases. First the assembler will load the program into memory. Then it will start phase one of the assemble process. This phase will scan for all labels we have in the program. Each time the assembler encounters a label, it will store the label as a key in a hash table and the current location in the program as its value. A hash table is a type of .NET collection that stores values based on unique keys. This is a perfect collection to use for gather labels, since each label name must be unique in our program. Once this is complete, the assembler will move onto phase two. Phase two will actually translate our mnemonics into bytecode. Okay, fire up Visual Studio and create a new C# Windows Form project called “B32Assembler”. Target the 2.0 framework or higher. Open up Form1 and change the name to frmMainForm. Resize it so that the width is 300 and the height is 207. Add the following controls to the form: Control Type Label Label Label Label TextBox TextBox TextBox Button Button Button OpenFileDialog Autosize Autosize Autosize Autosize Name Label1 Label2 Label3 Label4 Location Size X = 16 Y = 23 X = 20 Y = 52 X = 44 Y = 77 X = 77 Y = 77 txtSourceFileName X = 87 Y = 20 txtOutputFileName X = 87 Y = 49 X = 87 Y = 75 X = 97 Y = 138 X = 193 Y = 17 X = 193 Y = 46 N/A W = 100 H = 20 W = 100 H = 20 W = 100 H = 20 W = 75 H = 23 W = 75 H = 23 W = 75 H = 23 N/A btnOutputBrowse btnSourceBrowse fdDestinationFile btnAssemble txtOrigin Other properties Text = “Source File:” Text = “Output File:” Text = “Origin:” Text = “$” Text = “” Text = “” Text = “” Text = “Assemble!” Text = “Browse…” Text = “Browse…” Filter = “B32 Files|*.B32” DefaultExt = “B32” Filename = “” 7
How to create your own virtual machine in a step-by-step tutorial 2009 Brought to you by icemanind OpenFileDialog fdSourceFile N/A N/A Your form should look like the following: = Assembly “B32 CheckFileExists = False Filter Files|*.asm” DefaultExt = “asm” Filename = “” Now double click the top browse button to create a “click” event handler for the button, then type in the following code: private void btnSourceBrowse_Click(object sender, EventArgs e) { this.fdSourceFile.ShowDialog(); this.txtSourceFileName.Text = fdSourceFile.FileName; } Now go back to designer view and double click the second browse button, then type in the following code: private void btnOutputBrowse_Click(object sender, EventArgs e) { this.fdDestinationFile.ShowDialog(); this.txtOutputFileName.Text = fdDestinationFile.FileName; } What this will do is allow the user to browse for a source and output file. If you run the program now and click one of the browse buttons, it should pop up with a dialog box allowing you to find and choose a file. Once you select a file and click OK, the filename should pop into the appropriate text box. The origin will be where, in our 64K memory region, you want the program to be placed. Now that we got our interface wired up, let’s add the main functionality. Double click on the “Assemble!” button to create a click event handler. Before coding the event handler though, add the following class members to our class: public partial class frmMainForm : Form { private string SourceProgram; 8
How to create your own virtual machine in a step-by-step tutorial 2009 Brought to you by icemanind private System.Collections.Hashtable LabelTable; private int CurrentNdx; private ushort AsLength; private bool IsEnd; private ushort ExecutionAddress; The SourceProgram variable will hold our program in memory. The B32 assembler will read our source file and dump the contents into the SourceProgram variable. As described earlier, LabelTable is a hash table that will hold our labels. The hash table will be populated during the first stage of the assembly. CurrentNdx will be an integer variable that will be an index pointer to the current location in the file. AsLength will be an unsigned 16-bit variable that will keep track of how big our binary program is. IsEnd is simply a flag to determine if the end of the program has been reached. Finally, ExecutionAddress will hold the value of our execution address. If some of this doesn’t make sense yet, it will as we code our program. We are also going to need an enumeration that will store our registers. Add the following enumeration just below the code you just entered: private enum Registers { Unknown = 0, A = 4, B = 2, D = 1, X = 16, Y = 8 } We will put this enumeration to use later on when we start coding our helper functions. We will create a function that will read a register from our program and return an enumeration type representing the register. Finally, before we start coding our Assemble button handler, add the following lines to the frmMainForm class constructor. These lines will automatically initialize our variables we added earlier, assign a default origin, and allocate memory for our hash table: public frmMainForm() { InitializeComponent(); LabelTable = new System.Collections.Hashtable(50); CurrentNdx = 0; AsLength = 0; ExecutionAddress = 0; IsEnd = false; SourceProgram = ""; txtOrigin.Text = "1000"; } 9
分享到:
收藏