logo资料库

BookSim 2.0 User's Guide.pdf

第1页 / 共12页
第2页 / 共12页
第3页 / 共12页
第4页 / 共12页
第5页 / 共12页
第6页 / 共12页
第7页 / 共12页
第8页 / 共12页
资料共12页,剩余部分请下载后查看
BookSim 1.0 User’s Guide Brian Towles and William J. Dally September 10, 2004 Contents 1 Introduction 2 Getting started 2.1 Downloading and building the simulator . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Running a simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Simulation output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Examples 1 2 2 2 2 4 4 Configuration parameters 4 4.1 Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 4.2 Routing algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4.3 Flow control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.4 Router organizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.4.1 The input-queued router . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.4.2 The event-driven router . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 4.5 Allocators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 4.6 Traffic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 4.7 Simulation parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 A Random number generation 12 1 Introduction This document describes the use of the BookSim interconnection network simulator. The simulator is designed as a companion to the textbook “Principles and Practices of Interconnection Networks” (PPIN) published by Morgan Kaufmann (ISBN: 0122007514) and it is assumed that is reader is familiar with the material covered in that text. This user guide is fairly brief as, with most simulators, the best way to learn and understand the simulator is to study the code. Most of the simulator’s components are designed to be modular so tasks such as adding a new routing algorithm, topology, or router microarchitecture should not require a complete redesign of the code. Once you have downloaded the code, compiled it, and run a simple example (Section 2), the more detailed examples of Section 3 give a good overview of the capabilities of the simulator. A list of configuration options is provided in Section 4 for reference.
2 Getting started 2 Getting started 2 2.1 Downloading and building the simulator The latest version of the simulator is available from http://cva.stanford.edu as a compressed tar archive. UNIX/Linux users can extract this archive using the tar utility tar xvfz booksim-1.0.tar.gz Windows users can use a compression program such as WinZip to extract the archive. The simulator itself is written in C++ and has been specifically tested with GNU’s G++ compiler (version ≥ 3). In addition, both a LEX and YACC tool (also known as FLEX and BISON) are needed to create the configuration parser. These are standard tools in any UNIX/Linux development environment. It is suggested that Windows users download the CYGWIN versions (http://www.cygwin.com) of these UNIX development tools to simplify their compilation process. The Makefile should be edited so that the first lines give the paths to the tools. At Stanford, for example, the compiler, YACC, and LEX are stored in the /usr/pubsw/bin directory. The Makefile reflects this: CPP YACC LEX = /usr/pubsw/bin/g++ = /usr/pubsw/bin/byacc -d = /usr/pubsw/bin/flex Then, the simulator can be compiled by running make in the directory that contains the Makefile. 2.2 Running a simulation The syntax of the simulator is simply booksim [configfile] The optional parameter configfile is a file that contains configuration information for the simu- lator. So, for example, to simulate the performance of a simple 8 × 8 torus (8-ary 2-cube) network on uniform traffic, a configuration such as the one shown in Figure 1 could be used. This particular configuration is stored in examples/torus88. In addition to specifying the topology, the configuration file also contains basic information about the routing algorithm, flow control, and traffic. This simple example uses dimension-order routing and, to ensure deadlock-freedom of this routing function in the torus, two virtual channels are required. The injection rate parameter is added to tell the simulator to inject 0.15 flits per simulation cycle per node. Because the simulator operates at the flit level, most parameters are specified in units of flits as is the case with the injection rate. Also, any line of the configuration that begins with // is treated as a comment and ignored by the simulator. A detailed list of configuration parameters is given in Section 4. 2.3 Simulation output Continuing our example, running the torus simulation produces the output shown in Figure 2. Each simulation has three basic phases: warm up, measurement, and drain. The length of the warm up and measurement phases is a multiple of a basic sample period (defined by sample period in the configuration). As shown in the figure, the current latency and throughput (rate of accepted pack- ets) for the simulation is printed after each sample period. The overall throughput is determined September 10, 2004
2.3 Simulation output 3 // Topology topology = torus; k n = 8; = 2; // Routing routing_function = dim_order; // Flow control num_vcs = 2; // Traffic traffic injection_rate = 0.15; = uniform; Figure 1: Example configuration file for simulating a 8-ary 2-cube network. %================================= % Average latency = 6.02008 % Accepted packets = 0.11 at node 52 (avg = 0.147094) % latency change = 1 % throughput change = 1 ... % Warmed up ... %================================= % Average latency = 6.0796 % Accepted packets = 0.119 at node 5 (avg = 0.148266) = 0.00562457 % latency change % throughput change = 0.00379387 ... % Draining all recorded packets ... % Draining remaining packets ... ====== Traffic class 0 ====== Overall average latency = 6.09083 (1 samples) Overall average accepted rate = 0.149475 (1 samples) Overall min accepted rate = 0.138551 (1 samples) Figure 2: Simulator output from running the examples/torus88 configuration file. September 10, 2004
3 Examples 4 by the lowest throughput of all the destination in the network, but the average throughput is also displayed. After the warm up periods have passed, the simulator prints the “Warmed up” message and resets all the simulation statistics. Then, the measurement phase begins and statistics continue to be reported after each sample period. Once the measurement periods have passed, all the measurement packets are drained from the network before final latency and throughput numbers are reported. Details of the configuration parameters used to control the length of the simulation phases are covered in Section 4.7. 3 Examples One of the most basic performance measures of any interconnection network is its latency versus offered load. Figure 3 shows a simple configuration file for making this measurement in a 8-ary 2-mesh network under the transpose traffic pattern. This configuration was used to generate Figure 25.2 in PPIN. The particular configuration accounts for some small delays and pipelining of the input-queued router and also introduces a small input speedup to account for any inefficiencies in allocation. By running simulations for many increments of injection rate, the average latency curve can be found. Then, to compare the performance of dimension-order routing against several other routing algorithms, for example, the routing function option can be changed. Figure 4 shows a configuration file that can be used to determine the distribution of packet latencies in a 2-ary 6-fly network that uses age-based arbitration. Note the use of the priority configuration parameter along with the select allocators that account for packet priorities. The simulator does not output latency distributions by default, but by editing trafficmanager.cpp, setting the configuration variable DISPLAY LAT DIST to true, and recompiling, the distribution will be displayed at the end of the simulation. This technique was used to produced the distribution shown in Figure 25.12 of PPIN. As a final example, Figure 5 shows the use of the special single-node topology to test the performance of a switch allocator — in this case, the iSLIP allocator. The in ports and out ports options set up a simulation of an 8 × 8 crossbar. 4 Configuration parameters All information used to configure a simulation is passed through a configuration file as illustrated by the example in Section 2.2. This section lists the existing configuration parameters — a user can incorporate additional options by changing the booksim config.cpp file. 4.1 Topologies The topology parameter determines the underlying topology of the network and the simulator supports four basic topologies: fly mesh A k-ary n-fly (butterfly) topology. The k parameter determines the network’s radix and the n parameter determines the network’s dimension. A k-ary n-mesh (mesh) topology. The k parameter determines the network’s radix and the n parameter determines the network’s dimension. September 10, 2004
4.1 Topologies 5 // Topology topology = mesh; k = 8; n = 2; // Routing routing_function = dim_order; // Flow control num_vcs = 8; vc_buf_size = 8; wait_for_tail_credit = 1; // Router architecture vc_allocator = islip; sw_allocator = islip; alloc_iters = 1; credit_delay = 2; routing_delay = 1; vc_alloc_delay = 1; input_speedup output_speedup internal_speedup = 1.0; = 2; = 1; // Traffic traffic const_flits_per_packet = 20; = transpose; // Simulation sim_type injection_rate = 0.1; = latency; Figure 3: A typical configuration file (examples/mesh88 lat) for creating a latency versus offered load curve for a 8-ary 2-mesh network. September 10, 2004
6 4.1 Topologies // Topology topology = fly; k = 2; n = 6; // Routing routing_function = dest_tag; // Flow control num_vcs = 8; vc_buf_size = 8; wait_for_tail_credit = 1; // Router architecture vc_allocator = select; sw_allocator = select; alloc_iters = 1; = 2; credit_delay = 1; routing_delay vc_alloc_delay = 1; input_speedup output_speedup internal_speedup = 1.0; = 2; = 1; // Traffic traffic const_flits_per_packet = 20; priority = age; = uniform; // Simulation sim_type injection_rate = 0.1; = latency; Figure 4: A configuration file (examples/fly26 age) for finding the distribution of packet latencies using age-based arbitration. September 10, 2004
4.1 Topologies 7 // Topology = single; topology in_ports = 8; out_ports = 8; // Routing routing_function = single; // Flow control vc_allocator = islip; sw_allocator = islip; alloc_iters = 2; num_vcs vc_buf_size = 1000; = 8; wait_for_tail_credit = 0; // Simulation sim_type injection_rate = 0.1; = latency; Figure 5: A single-node configuration file (examples/single) for testing the performance of a switch allocator. September 10, 2004
4.2 Routing algorithms 8 single A network with a single node, used for testing single router performance. The number of input and output ports for the node is determined by the in ports and out ports parameters, respectively. torus A k-ary n-cube (torus) topology. The k parameter determines the network’s radix and the n parameter determines the network’s dimension. Both the mesh and torus topologies support the addition of random link failures with the link failures parameter. The value of link failures determines the number of channels that are randomly removed from the topology and are thus no longer available for forwarding packets. Moreover, the randomization for failed channels is controlled by selecting an integer value for the fail seed parameter — a fixed seed gives a fixed set of failed channels, independent of other randomization in the simulation. Also, note that only certain routing functions support this feature (see Section 4.2). 4.2 Routing algorithms The routing function parameter selects a routing algorithm for the topology. Many routing algorithms need multiple virtual channels for deadlock freedom (VCDF). dim order Dimension-order routing. Works for the mesh topology (1 VCDF) and for the torus topology (2 VCDF). dim order bal Dimension-order routing for the torus topology with a more balanced use of VCs to avoid deadlock (2 VCDF). dim order ni A non-interfering version of dimension-order routing. Works on the torus or mesh topology and requires one VC per network terminal. min adapt A minimal adaptive routing algorithm for the mesh topology (2 VCDF) and for the torus topology (3 VCDF). planar adapt Planar-adaptive routing for the mesh topology (2 VCDF). Supports routing around failed channels. romm romm ni single valiant ROMM routing for the mesh (2 VCDF). Load is balanced by routing in two phases: one from the source to a random intermediate node in the minimal quadrant and a second from the intermediate to the destination. A non-interfering version of ROMM routing for the mesh that requires one VC per network terminal. A dummy routing function used for the single topology. Valiant’s randomized routing algorithm for the mesh (2 VCDF) and torus (4 VCDF) topology. valiant ni A non-interfering version of Valiant’s algorithm for the torus that requires 4 VCs per network terminal. Also, the simulator code is structured so that additional routing algorithms can be added with minimal changes to the overall simulator (see the routefunc.cpp file in the simulator’s source code). September 10, 2004
分享到:
收藏