CS 351: Design of Large Programs

Assignment # 3 (preliminary version), Due Date 03/25/02

The goal of this assignment is to simulate the flow of packets in a network using "wormhole routing".

Wormhole routing is a variant of virtual cut-through which, in turn, is a cross between circuit and packet switching. In packet switching, a message to be transmitted is split into packets. Packets typically have a network-specific maximum length and are individually routed. This means that the different packets comprising a message may traverse different paths from the source (S) to the destination (D). One implication of this is that Packet 5 of a message may reach D before Packet 3 which was transmitted before.

In wormhole routing, a packet or worm is split into flits (think of a flit as a unit of information transfer: a flit traverses a link between two adjacent nodes in one time unit.) In the sequel, the terms packet and worm are used interchangeably. The flits move in lock step. Fig. 1 shows the progress of an isolated worm in a 2-dimensional mesh. S and D are 3 hops apart and the packet length is 7 flits. The packet was launched at time t=0. Fig. 1 shows the position of each flit between t=1 and t=9 (at which point the entire packet has reached D). The numbers in Fig. 1 identify the flits of the worm -- 1 represents the head flit and 7 the tail flit. The dark lines indicate the path over which the flits of a worm extend. Note that in the absence of any contention, a packet will take f+h-1 time units to reach its destination - h units for the head flit to reach its destination and f-1 units for the remainder of the flits.

Topology, Addressing and Routing

For the purpose of this assignment, we consider highly regular topologies to simplify the addressing and routing. The topologies to be considered belong to the family of generalized hypercubes.

An r-radix, d-dimensional hypercube is one in which each node may be addressed as a d-tuple
(a0, a1, . . . ad-1) where   0 <= ai < r ,   0 <= i < d.  Each node is connected to a node that differs in exactly one position by 1 in the modulo r sense. An example of a 2-dimensional radix 5 hypercube is shown in Figure 2.

In the first part of this assignment, we confine attention to this structure, with arbitrary radix, called a toroid. We assume that all links are uni-directional. In particular, the horizontal links are left-to-right and the vertical links are top-to-bottom.

We assume that a packet from S to D will always take the shortest path. If the magnitudes of the difference between the X co-ordinates of S and D is x and the difference between the Y co-ordinates is y, then there are (x+y)!/(x!y!) shortest paths between S and D. To minimize the probability of deadlock, however, we assume that all packets are routed through a node that shares the x co-ordinate with D and the y co-ordinate with S. (i.e. the path is made up of a horizontal sub-path(if any) followed by a vertical sub-path (if any).) Figure 3 shows the different shortest paths between S and D. Of these, the first path should be chosen in this assignment.

Node Architecture

A node is made up of a processor and a router. The processor acts as a source and sink of packets. Figure 4 shows a microscopic view of a node in a two-dimensional hypercube. Here, each node includes three input buffers and two output buffers. Each buffer is capable of holding one flit. Note that an outgoing south-bound link can receive a packet from an incoming south-bound link or an incoming east-bound link or from the processor at the node itself. However, an outgoing east-bound link cannot receive a packet from an incoming south-bound link because of the routing algorithm above (first route along the x dimension and then the y dimension.)

Figure 5 shows the usage of relevant buffers in the absence of contention. (Figures 5, 6 and 7 will be provided in class.) A packet starts out at t=0 from S to D (3 hops away.) At t=0, its head flit is in the output buffer connected to the outgoing horizontal link. The rest of the packet is in the processor at S. At t=1, the head flit has been transferred to the output buffer of S' neighbor that is connected to an outgoing vertical link. The rest of the flits follow in tandem. Since the length of the packet is 3, the entire packet is safely in the destination processor at time t=5.

Handling Contention

If part of the onward path of a packet is currently being used by another packet, the former is blocked. Figure 6(a) is a snapshot at t=10 when packet B is just blocked by packet A. It can be verified that A must have begun its journey at t=0 and B at t=5. Figure 6(b) zooms into the routers in the vicinity of the point of contention. Note that the head flit of B sits in an input buffer but that its body flits sit in output buffers of the upstream routers. Fig. 6(b) also shows input snapshots when B is first unblocked.

Fig. 7 shows worm B blocked by A and C blocked by B. It is possible that an unblocked worm, R blocks one or more worms which in turn block one or more worms, etc. This results in a tree of blocked worms rooted at R. In such a case, no worm can be unblocked before its parent. In this scenario, each simulation cycle should first compute the new positions of the flits of R and then proceed down the tree.

Input to the Simulator

Your program should read input from a file formatted as follows

2   5
1   0   1   2   0   4   7
2   0   2   3   1   2   5
3   1   3   2   1   1   7
-1   3
4   4   4   4   3   3   8
5   4   1   4   2   3   6

Here's the interpretation of the different lines.
The first line contains the number of dimensions followed by the radix of the hypercube. (The hypercubes in this assignment will be either 2 or 3 dimensional.)
Lines 2, 3, 4, 6 and 7 specify the characteristics of a newly generated worm.
The first element is the worm id, followed by the time at which it enters the network. The next 2*d digits are the source address followed by the destination address (recall that the address of a node in a d-dimensional hypercube is represented by a d-tuple.) The last digit is the worm length.
Line 5 begins with a "-1". This is an indication that the program should output the state of the network at time t=3.

Output of your Program

For each line begining with a   "-1" the following output should be displayed on the screen. A list of worms currently in the network starting with the worm's id, the router lodging its head flit and whether it is blocked or unblocked. The list of worms should appear in ascending order of id.


Here are a few more clarifications:
In the 2-d hypercube, node (0,0) will be the node in the extreme north-west. The first co-ordinate is the x or horizontal or west-east co-ordinate and the second is the y or vertical or north-south co-ordinate. Co-ordinates will increase west to east and north to south. Thus an unblocked worm will move from west to east or north to south, never in the reverse direction (except while traversing the wrap-around connections.)
Again, in the 3-d cube, a node is represented as a 3-tuple. The first is the x, the second is the y and the third is the z co-ordinate or dimension.

The sink of a processor is capable of absorbing multiple flits in the same cycle. The flits being refered to of course belong to different worms destined for the same processor.

If two or more worms are blocked in a router (the latter may occur in the 3-d hypercube), they will be unblocked in the order of their arrival. However, if two (or more) worms arrive simultaneously, preference will be given to the worm in the following order:
incoming z link first, then incoming y link, then incoming x link and then from the processor (this is for the 3-d case. For the 2-d case, start with y.) The reason for this is: If the worm has arrived via the z-dimension, its body is probably spread over more of the network since it has traversed x and y links already. Probabilistically, it is blocking more worms. Hence let it be unblocked first.

If a processor generates another worm while the previous worm that it generated has not completely left its router, then that worm shall be discarded. (This means that it should not be buffered in the processor for later transmission. So, your program does not need to maintain a queue internal to the processor.)

Finally, here's a hand-crafted example of what your program should expect as input and the expected output.

The input file follows:

2   15
1   0   3   5   9   13   9
2   2   5   0   9   8   11
3   4   0   0   7   9   4
4   7   1   5   11   7   8
-1   7
5   10   2   0   9   3   4
-1   12

The corresponding output is:

State at time t =7
worm idlead flitco-ordinatesb/u
1 1 9,6 u
2 1 9,1 u
3 1 3,0 u
4 1 1,5 u

State at time t =12
worm idlead flitco-ordinatesb/u
1 1 9,11 u
2 1 9,5 b
3 1 5,0 b
4 1 6,5 u
5 1 2,0 b