# Regular Layouts of Butterfly Networks\* Jörg Keller Fachbereich 14 Informatik Universität des Saarlandes Postfach 151150, 66041 Saarbrücken, Germany #### Abstract Physical arrangements of butterfly networks impose severe problems because of wire length. The problem gets even harder if standard technology like printed circuit boards, racks, and cabinets, must be used. We investigate regular arrangements of butterfly networks. We construct xu-stage butterfly networks from u-stage networks. The sub-networks can be arranged in x cubes of dimension x-1 such that sub-networks in cubes i and i+1 are connected if their coordinates differ only in dimension i. This allows for regular wiring and hence for use of standard components. The proposed method generalizes a result by Wise, which is obtained by choosing x=2. For x=3 we obtain an arrangement that can be used to layout a butterfly on a chip with regular wiring or to implement a butterfly on printed circuit boards. ### 1 Introduction Butterfly networks are often used as interconnection networks of parallel machines [1, 2, 3] because of several advantageous properties. A butterfly network with $d^n$ inputs and outputs and nodes with indegree and outdegree d has depth n, which is optimal for a constant-degree network. As the path between each pair of inputs and outputs is unique, routing decisions only need the d-ary representation of the output number. This simplifies routing logic and reduces node delay. Furthermore, there exist packet routing algorithms which route n-relations in T = O(n) time with high probability and only require buffers capable of storing a constant number of packets [4, 3]. This allows to use the same network node design for networks of different size. Moreover, these algorithms allow a throughput that is linear in the number of inputs and outputs, as they deliver $nd^n/T = d^n$ packets per O(1) time steps. This can be used to run the network in a "pipelined" manner, with an initial delay of O(n) and subsequent delay of O(1) between packets. However, physical arrangements of butterfly networks reveal some difficulties. While networks as three-dimensional meshes directly lead to arrangements with constant wire length, butterfly networks cannot achieve this. Vitányi proves that the average wire length in a butterfly network cannot be a constant even under relaxed conditions (nodes have unit volume and arbitrary shape, wires have zero volume) [5]. Moreover, straightforward arrangements lead <sup>\*</sup>This work was supported by NWO through NFI project ALADDIN under contract no. NF 62-376, and by DFG through SFB 124, TP D4. Part of this work was done while the author was at CWI, Amsterdam, The Netherlands. <sup>&</sup>lt;sup>1</sup>Routings such that n packets are fed into every input and n packets are collected by every output. Figure 1: Wise's Arrangement of Boards to wiring that is not regular and therefore not suited for an implementation with standard components. Wise proposes an arrangement based on cutting a 2u-stage butterfly network into two parts [6]. Each part now consists of a number of u-stage subnetworks. Each subnetwork is implemented on one printed circuit board (or board for short). Wise shows that each subnetwork of the first part is connected to each subnetwork of the second part. He arranges the boards as shown in Fig. 1. The boards of each part are put in a rack vertically, the racks are put one atop another such that boards of different parts are orthogonal. Now all connections between different boards can be done by short vertical wires or via connectors. Wise's arrangement has the disadvantage that racks with different orientations of boards do not fit into standard cabinets. Vertical wires or connectors between the boards make the removal of boards very difficult. Also, as boards are only available up to a size of $50\,\mathrm{cm}\times50\,\mathrm{cm}$ and must have a minimum distance of approximately $4\,\mathrm{cm}$ from each other to allow cooling, the applicability of this approach is restricted to a small number of stages. We generalize Wise's approach to overcome these disadvantages. In Sect. 2 we explore the interconnection structure of xu-stage butterfly networks constructed from u-stage butterfly networks. In Sect. 3 we apply our knowledge to derive three arrangements for x=2,3,4, respectively. For x=2 we obtain Wise's arrangement. In Sect. 4 we compute the lengths of wires in the arrangement for x=3, which we think to be most practical for current technology. We present our conclusions in Sect. 5. ### 2 Structure of Interconnections In the sequel we will denote the set $\{0, 1, ..., d-1\}$ by $B_d$ . **Definition 1** The n-stage d-ary butterfly network is a graph G = (V, E) with node set $V = \{0, \ldots, n-1\} \times B_d^{n-1}$ . The first index is called stage, level or column number, the second is called row number. For all $0 \le i, j \le n-1$ and $v_k, w_k \in B_d$ , nodes $\langle i, v_{n-2}, \ldots, v_0 \rangle_{n,d}$ and $\langle j, w_{n-2}, \ldots, w_0 \rangle_{n,d}$ are connected if and only if j = i+1 and and $v_k = w_k$ for $0 \le k \le n-2$ and $k \ne i$ . Inputs are connected to nodes in stage 0, d inputs per node, outputs are connected to nodes in stage n-1, d outputs per node. We denote nodes in n-stage d-ary butterfly networks by $(i, v_{n-2} \dots v_0)_{n,d}$ , where $0 \le i \le n-1$ and $v_j \in B_d$ for $0 \le j \le n-2$ . Note that we write row numbers with digit zero rightmost, as used for the representation of integers. Figure 2: (a) A 4-Stage Binary Butterfly Network, (b) The Network cut between levels 1 and 2 A 4-stage 2-ary (or binary) butterfly is shown in Fig. 2(a). Note that there is still some confusion about what to call a "Butterfly Network". We adapt our definition from Leighton [7, p. 440]. Others call this network an "FFT network" and call "Butterfly" the graph obtained when merging the vertices of the first and the last levels having the same row number (see [8]). Note also that the graph here is drawn differently from the example in Leighton's book, where the levels are in reversed order, because Leighton writes row numbers with digit zero leftmost, but orders the rows 0000,0001,0010,... as we do. By a *cut* through an *n*-stage butterfly network G between stages i and i+1 we mean the removal of all edges in G of the form $(\langle i, v \rangle_{n,d}, \langle i+1, w \rangle_{n,d})$ , for $0 \le i \le n-2$ and $v, w \in B_d^{n-1}$ . Fig. 2(b) shows a 4-stage binary butterfly network cut between stages 1 and 2. The removed edges are shown as dotted lines. The connected components are drawn with different shades. For an n-stage butterfly network G with one cut (or several cuts) we denote by the arranged graph G' = (V', E') the graph where each node represents a connected component of G after the cut. Nodes in G' are connected if and only if the connected components in G that they represent were connected by an edge that was removed during the cut. Hence, G' is the quotient graph of G. Fig. 3 shows the arranged graph of the butterfly network shown in Fig. 2. The nodes are shaded in analogy to the connected components in Fig. 2(b). Consider a n-stage d-ary butterfly network G, where n = xu. If we cut the network between stages iu - 1 and iu for $1 \le i < x$ , it falls into x parts of u stages each, numbered from 0 to x - 1. Figure 3: The Arranged Graph G' of a 4-Stage Butterfly Network **Lemma 1** After the cut, each part of G consists of $d^{(x-1)u}$ u-stage binary butterfly networks, hence the arranged graph G' consists of a set of nodes $V' = \{0, \ldots, x-1\} \times \{0, \ldots, d^{(x-1)u}-1\}$ . **Proof:** When we cut G in parts of u stages, it is clear from Def. 1 that each part consists of u-stage d-ary butterfly networks with $d^{u-1}$ rows each. As each row in one part is only contained in one u-stage butterfly and as there are $d^{n-1}$ rows in total, there must be $d^{n-1}/d^{u-1} = d^{(x-1)u}$ butterfly networks in each part. We number the u-stage butterfly networks of each part (and also the nodes in G' that represent them) from 0 to $d^{(x-1)u}-1$ according to the smallest row number they contain. We here interpret the row numbers — which are strings of length n-1 over the alphabet $B_d$ — as (n-1)-digit representations to base d. More formally, for a string $a_{s-1} \dots a_0 \in B_d^s$ the number represented by it is $\mathbf{n}_d(a_{s-1} \dots a_0) = \sum_{i=0}^{s-1} a_i \cdot d^i$ . Vice versa, for integers s and d and $v \in \{0, \dots, d^s-1\}$ , the function $\mathbf{r}_{s,d}(v) = a_{s-1} \dots a_0$ with $a_i \in B_d$ denotes the s-digit d-ary representation of v, such that $\mathbf{n}_d(a_{s-1} \dots a_0) = v$ . As a consequence, we notice that V' is isomorphic to $\tilde{V} = \{0, \ldots, x-1\} \times B_{d^u}^{x-1}$ by $\varphi : V' \to \tilde{V}$ , where $\varphi(i,v) = \langle i, \mathbf{r}_{x-1,d^u}(v) \rangle_{x,d^u}$ for $0 \le i \le x-1$ and $v \in \{0,\ldots,d^{(x-1)u}-1\}$ . We will use $\tilde{V}$ instead of V' in the sequel. **Theorem 1** If we arrange G after the cuts described above, then E' consists exactly of all edges of the form $$(\langle i-1, c_{x-2} \dots c_0 \rangle_{x,d^u}, \langle i, c_{x-2} \dots c_i \alpha c_{i-2} \dots c_0 \rangle_{x,d^u}),$$ for all $\alpha \in B_{d^u}$ and $c_j \in B_{d^u}$ , with $1 \le i \le x - 1$ and $0 \le j \le x - 2$ . Corollary 1 The arranged graph G' is isomorphic to the x-stage $d^u$ -ary butterfly network. **Proof of Theorem 1:** We will first develop mappings $\psi_i$ between rows in G and nodes in level i of G' of the form "row x of G is contained in node $\langle i, \psi_i(x) \rangle_{x,d^u}$ in part i of the arranged graph". Then we will derive the edge structure of G' by applying this mapping to the edge structure of G. It is easy to see that a node in level i of G' contains only rows of G that differ in digits iu to (i+1)u-2, i.e. rows of the form $$a_{xu-2} \dots a_{(i+1)u-1} \beta_{u-2} \dots \beta_0 a_{iu-1} \dots a_0$$ for all $\beta = \beta_{u-2} \dots \beta_0 \in B_d^{u-1}$ . It follows from our naming convention that nodes in level i of G' are ordered according to rows where $\beta = 0 \dots 0$ . We conclude that for nodes $\langle i, b_{x-2} \dots b_0 \rangle_{x,d^u}$ and $\langle i, b'_{x-2} \dots b'_0 \rangle_{x,d^u}$ in G' that contain rows $a_{xu-2} \dots a_{(i+1)u-1} \beta_{u-2} \dots \beta_0 a_{iu-1} \dots a_0$ and $a'_{xu-2} \dots a'_{(i+1)u-1} \beta'_{u-2} \dots \beta'_0 a'_{iu-1} \dots a'_0$ of G, respectively, $$\begin{array}{lcl} & \mathbf{n}_{d^{u}}(b_{x-2} \dots b_{0}) & < & \mathbf{n}_{d^{u}}(b'_{x-2} \dots b'_{0}) \\ \iff & \mathbf{n}_{d}(a_{xu-2} \dots a_{(i+1)u-1} \underbrace{0 \dots 0}_{u-1} a_{iu-1} \dots a_{0}) & < & \mathbf{n}_{d}(a'_{xu-2} \dots a'_{(i+1)u-1} \underbrace{0 \dots 0}_{u-1} a'_{iu-1} \dots a'_{0}) \\ \iff & \mathbf{n}_{d}(a_{xu-2} \dots a_{(i+1)u-1} a_{iu-1} \dots a_{0}) & < & \mathbf{n}_{d}(a'_{xu-2} \dots a'_{(i+1)u-1} a'_{iu-1} \dots a'_{0}) \end{array}.$$ As $|B_{du}^{x-1}| = |B_d^{(x-1)u}|$ , the orders are total and isomorphic and we obtain mappings $\psi_i$ for each level i of G' of the following form: $$\psi_{i}(a_{n-2} \dots a_{(i+1)u-1}\beta_{u-2} \dots \beta_{0}a_{iu-1} \dots a_{0}) = \mathbf{r}_{x-1,d^{u}}(\mathbf{n}_{d}(a_{n-2} \dots a_{(i+1)u-1}a_{iu-1} \dots a_{0})) = \mathbf{n}_{d}(a_{n-2} \dots a_{(x-1)u-1}) \dots \mathbf{n}_{d}(a_{(i+2)u-2} \dots a_{(i+1)u-1})\mathbf{n}_{d}(a_{iu-1} \dots a_{(i-1)u}) \dots \mathbf{n}_{d}(a_{u-1} \dots a_{0}).$$ (1) Intuitively, we take the (n-1)-digit d-ary representation of a row, erase digits iu to (i+1)u-2, and turn the remaining (x-1)u-digit d-ary string into an (x-1)-digit $d^u$ -ary string by grouping u consecutive digits into the number they represent. To find the edges in E', we need the edges in G that were removed during the cuts. These are all edges between levels iu-1 and iu in G, for $1 \leq i < x$ . Consider node $X = \langle iu-1, v_{n-2} \dots v_0 \rangle_{n,d}$ in G. By Def. 1, X is connected to all nodes of the form $Y_j = \langle iu, v_{n-2} \dots v_{iu} j v_{iu-2} \dots v_0 \rangle_{n,d}$ for all $j \in B_d$ . If we identify nodes X and $Y_j$ with their row numbers, then this generates edges in E' of the form $e_j(X) = (\langle i-1, \psi_{i-1}(X) \rangle_{x,d^u}, \langle i, \psi_i(Y_j) \rangle_{x,d^u})$ . By Eq. (1) we get $$\psi_{i-1}(X) = \mathbf{n}_d(v_{n-2} \dots v_{(x-1)u-1}) \dots \mathbf{n}_d(v_{(i+1)u-2} \dots v_{iu-1}) \mathbf{n}_d(v_{(i-1)u-1} \dots v_{(i-2)u}) \dots \mathbf{n}_d(v_{u-1} \dots v_0)$$ $$\psi_i(Y_j) = \mathbf{n}_d(v_{n-2} \dots v_{(x-1)u-1}) \dots \mathbf{n}_d(v_{(i+2)u-2} \dots v_{(i+1)u-1}) \mathbf{n}_d(jv_{iu-2} \dots v_{(i-1)u}) \dots \mathbf{n}_d(v_{u-1} \dots v_0)$$ It follows that $e_j(X) = (\langle i-1, c_{x-2} \dots c_0 \rangle_{x,d^u}, \langle i, c'_{x-2} \dots c'_0 \rangle_{x,d^u})$ where $$c_k = c'_k = \begin{cases} \mathbf{n}_d(v_{(k+2)u-2} \dots v_{(k+1)u-1}) & : i \leq k \leq x-2 \\ \mathbf{n}_d(v_{(k+1)u-1} \dots v_{ku}) & : 0 \leq k \leq i-2 \end{cases}.$$ As $v_{iu-2} \dots v_{(i-1)u}$ can be chosen freely without changing $\psi_{i-1}(X)$ and j can be arbitrary in $B_d$ , $c'_{i-1} = \mathbf{n}_d(jv_{iu-2}\dots v_{(i-1)u})$ will be any value in $B_{d^u}$ when using all nodes X in G with the same $\psi_{i-1}(X)$ and all $j \in B_d$ . If we choose $\alpha = c'_{i-1}$ the Theorem follows. # 3 Applications The structure of the arranged graph G' allows the geometric interpretation, that each level can be arranged in an (x-1)-dimensional cube with side length $d^u$ . The nodes of the level are placed on the integer lattice in that cube, i.e. on positions with integral coordinates. We then arrange the x cubes in such a way that wiring is regular. We will develop our applications for binary butterfly networks, i.e. d=2. However, the concepts can also be applied to arbitrary d. If we implement each u-stage butterfly network on a printed circuit board or as a basic building block on a chip, then Theorem 1 directly leads to applications for x = 2, 3, 4. We do not consider larger values of x because we do not want to deal with cubes of more than three dimensions. Figure 4: Arrangement of Boards for n = 6 and x = 3 ### 3.1 Arranging 2*u*-stage butterfly networks For butterfly networks with 2u stages (i.e. x=2), we obtain $V'=\{0,1\}\times B_{2^u}$ and $E'=\{(\langle 0,v_0\rangle_{2,2^u},\langle 1,w_0\rangle_{2,2^u})|v_0,w_0\in B_{2^u}\}$ . The arranged graph G' is the complete bipartite graph with $2^u$ nodes per partition. Each level consists of a line (i.e. a 1-dimensional cube) of boards. This is exactly the arrangement proposed by Wise. ### 3.2 Arranging 3*u*-stage butterfly networks For x = 3, we obtain $V' = \{0, 1, 2\} \times B_{2^u}^2$ and $$E' = \{ (\langle 0, v_1 v_0 \rangle_{3,2^u}, \langle 1, v_1 \alpha \rangle_{3,2^u}) \mid v_0, v_1, \alpha \in B_{2^u} \}$$ $$\cup \{ (\langle 1, v_1 v_0 \rangle_{3,2^u}, \langle 2, \alpha v_0 \rangle_{3,2^u}) \mid v_0, v_1, \alpha \in B_{2^u} \}.$$ The nodes of each part of G' form a square (i.e. a 2-dimensional cube) with side length $2^u$ . We arrange the squares as shown in Fig. 4. The squares for parts zero and two are split into two pieces each to make the arrangement symmetric. All interconnections between nodes are horizontal or vertical. Two example wirings are shown in Fig. 4. If each node represents a basic block on a chip, then the proposed geometry Figure 5: Arrangement of Boards for x = 4 leads to a regular layout with cell blocks and wiring channels. Such a layout should even be appropriate for standard cell technology, where a straightforward layout would lead to irregular wiring and thus might impose severe problems to wire routing. If each node represents a printed circuit board, then the boards can be put into standard racks and cabinets. Each board can be removed as soon as the links connected to it are removed. If links are narrow, then they can leave the boards via connectors to the backplane, the wiring can be made on the rear side of the cabinet. If links are wide, then links can leave boards via connectors at the front and the back of the cabinet. The links form wiring channels between the boards. Note that the arrangement can also be applied if n is not a multiple of three. If n = 3u + 1, we cut between levels u - 1 and u and levels 2u and 2u + 1. In parts 0 and 2 of the arranged graph G' we merge two nodes with consecutive row numbers into one. In the same manner we proceed for n = 3u + 2. We cut between levels u - 1 and u and levels 2u + 1 and 2u + 2 and merge in parts 0 and 2 of G' four nodes into one. **Theorem 2** When merging nodes in G' as given by one of the two procedures above, the resulting graph G'' is isomorphic to the 3-stage $d^u$ -ary butterfly network. Each edge in G'' represents d edges of G for n = 3u + 1 and $d^2$ edges of G for n = 3u + 2. The formal proof of Theorem 2 is given in Appendix A. The 3-level arrangement was described for n = 7 and d = 2 in [9] and sketched for arbitrary n and d = 2 in [10]. #### 3.3 Arranging 4*u*-stage butterfly networks For x = 4 we arrange the nodes of each part at positions with integral coordinates in a cube with a side length of $2^u$ . We arrange the four parts as shown in Fig. 5. Connections between nodes all run orthogonal to the plane separating two cubes. This arrangement seems to be inappropriate for current technology. However, it might very well be useful for 3-dimensional VLSI design in the future, or for future types of rack and cabinet technologies. ### 4 Wire Lengths We compute the length of wires in arrangements of 3u-stage butterfly networks. We restrict ourselves to the case x = 3, because the case x = 2 was investigated by Wise [6] and the case x = 4 seems to be inappropriate for current VLSI and rack technology. Consider again Fig. 4. The longest horizontal (vertical) wire crosses the complete part 1 and half of part 2 (or 0 for a vertical wire). We assume that the nodes have uniform rectangular shape with height $h_y$ and width $h_x$ . Let further the width of each wiring channel between rows or columns of nodes be w. We assume no vertical spacing between nodes in part 0 and no horizontal spacing between nodes in part 2. Then the length of a wire is at most $$l_i = w2^u + h_i(2^u + 2^u/2) (2)$$ where i = y for vertical wires and i = x for horizontal wires. The first term represents the $2^u$ wire channels in part 1, and the second term represents the $2^u$ nodes the wire has to pass in part 1 and the $2^u/2$ nodes in part 0 or 2. The channel width w can be expressed in terms of the thickness of a single wire $w_0$ . Each node in part 0 has $2^u$ output wires, and $2^u$ nodes form a column in part 0. Hence $2^u \cdot 2^u$ wires cross the border between part 1 and part 0 (or 2) in each channel and $w = 2^{2u}w_0$ . The nodes' height $h_y$ can be given by the number of wires leaving the node, hence $h_y = 2^u w_1$ where $w_1$ is the length of the device needed to connect a wire to the node. This is either the length of a connector if nodes represent boards or the size of the contact area between a wire and a gate if nodes represent basic blocks on a chip. The nodes' width $h_x$ is a constant $w_2$ depending on technology. Unfortunately, for printed circuit boards $h_y \gg h_x$ , resulting in different lengths for horizontal and vertical wires. We therefore give the arrangement equal height and width. If $h_y = 4h_x$ , then we merge consecutive rows of boards two by two. We reduce the height by a factor two and increase the width by a factor of two, resulting in "pseudo heights" $h'_y = h'_x = h_y/2 = 2h_x$ . If in general $h_y = 4^i h_x$ or $\log_4(h_y/h_x) = i$ , then we merge $\lfloor 2^i \rfloor$ consecutive rows of boards into one and obtain $h'_y = h'_x = \sqrt{h_y h_x}$ as "pseudo" height and width. We now can transform Eq. 2 to $$l_x = l_y = 2^{3u}w_0 + 3 \cdot 2^{u-1}\sqrt{2^u w_1 w_2}.$$ (3) We see that the length of a wire is linear in the number of inputs of the butterfly network, but with a very small constant $w_0$ . For printed circuit boards, we have constants $w_0 = 0.9 \ mm$ for flat cable, $w_1 \approx 10 \ mm$ for standard 64 pin connectors and $w_2 = 2.4 \ mm$ for printed circuit boards with at most six signal layers. Then we obtain the following values for $l_x$ (measured in mm): | u | 1 | 2 | 3 | |-------|------|-------|-------| | $l_x$ | 28.0 | 116.4 | 627.1 | The wire lengths might be somehow too optimistic because printed circuit boards have some minimum height independent of the number of connectors on them and because we assumed no spacing between boards except for the wire channels. These restrictions however should increase the length by a factor of less than two. Note that by grouping each wire channel into u layers of $2^{2u}/u$ wires each, the wire length can be reduced to $O(2^{3u}/u)$ which is equivalent to the best 2-dimensional solution which is optimal [11]. # 5 Conclusions Although butterfly networks have no obvious regular implementation with short wires, their structure can be exploited to obtain arrangements with regular wiring in VLSI and printed circuit technology. Furthermore, the arrangements allow the use of standard components like racks and cabinets if the butterfly network is implemented on printed circuit boards. The method we used is a generalization of a result by Wise. We investigated a 3-level approach in more detail, because we believe that it is most suited for current VLSI and printed circuit technology. The 3-level approach is used in the design of the SB-PRAM, a 128-processor machine with a shared address space but distributed memory modules [10]. ## **Appendix** # A Extension of 3-level arrangements to arbitrary n To prove Theorem 2, we proceed as we did in the proof of Theorem 1. We first develop mappings and then apply them to derive the edge structure of the arranged graphs. First we treat the case n=3u+1. After cutting between levels u-1 and u and 2u and 2u+1, parts 0 and 2 decompose into $d^{2u+1}$ u-stage butterfly networks, part 1 decomposes into $d^{2u}$ (u+1)-stage butterfly networks. If we merge two networks of parts 0 and 2 into one node of the arranged graph, then the node set of this new graph G'' is isomorphic to $\{0,1,2\} \times B_{d^u}^2$ . We obtain a mapping $$\psi_{i}(a_{n-2}...a_{0}) = \begin{cases} \mathbf{n}_{d}(a_{3u-1}...a_{2u})\mathbf{n}_{d}(a_{2u-1}...a_{u}) & : i = 0\\ \mathbf{n}_{d}(a_{3u-1}...a_{2u})\mathbf{n}_{d}(a_{u-1}...a_{0}) & : i = 1\\ \mathbf{n}_{d}(a_{2u-1}...a_{u})\mathbf{n}_{d}(a_{u-1}...a_{0}) & : i = 2. \end{cases}$$ $$(4)$$ To derive the edges between levels 0 and 1 of G'', we need the edges between levels u-1 and u in G that were removed during the cut. We consider a node $X = \langle u-1, a_{n-2} \dots a_0 \rangle_{n,d}$ from G that is by Def. 1 connected to all nodes $Y_j = \langle u, a_{n-2} \dots a_u j a_{u-2} \dots a_0 \rangle_{n,d}$ in G, for $0 \le j \le d-1$ . If we identify each node with its row number, then this generates edges in G'' of the form $(\langle 0, \psi_0(X) \rangle_{3,d^u}, \langle 1, \psi_1(Y_i) \rangle_{3,d^u})$ . By Eq. 4 we obtain $$\psi_0(X) = \mathbf{n}_d(a_{3u-1} \dots a_{2u}) \mathbf{n}_d(a_{2u-1} \dots a_u) \psi_1(Y_j) = \mathbf{n}_d(a_{3u-1} \dots a_{2u}) \mathbf{n}_d(ja_{u-2} \dots a_0) .$$ The value of $a_{u-2} ldots a_0$ can be freely chosen without changing $\psi_0(X)$ . Furthermore, j will have all values in $B_d$ . Hence, $\mathbf{n}_d(ja_{u-2} ldots a_0)$ will have all values in $B_d^u$ and the edges generated in G'' have the required form. Note that each edge in G'' represents d edges of G as $a_{u-1}$ can have any value in $B_d$ without changing $\psi_0(X)$ and $\psi_1(Y_j)$ . The edges between levels 1 and 2 of G'' can be verified in the same manner. For the case n=3u+2, we cut between levels u-1 and u and between levels 2u+1 and 2u+2. Parts 0 and 2 decompose into $d^{2u+2}$ u-stage butterfly networks, part 1 decomposes into $d^{2u}$ (u+2)-stage butterfly networks. In parts 0 and 2 we merge four sub-networks into one node of the arranged graph. Then the node set of G'' is isomorphic to $\{0,1,2\} \times B_{d^u}^2$ and we obtain a mapping $$\psi_i(a_{n-2}...a_0) = \begin{cases} \mathbf{n}_d(a_{3u}...a_{2u+1})\mathbf{n}_d(a_{2u}...a_{u+1}) & : & i = 0 \\ \mathbf{n}_d(a_{3u}...a_{2u+1})\mathbf{n}_d(a_{u-1}...a_0) & : & i = 1 \\ \mathbf{n}_d(a_{2u-1}...a_u)\mathbf{n}_d(a_{u-1}...a_0) & : & i = 2 \end{cases}.$$ Now we can verify the edge structure of G'' in the same manner as we did for the case n=3u+1. The only difference will be that two digits in X can be changed without either changing $\psi(X)$ or $\psi(Y_j)$ . Hence each edge in G'' represents $d^2$ edges of G. ## References - [1] Butterfly Product Overview, BBN Advanced Computers, Inc., Cambridge, MA, 1987. - [2] Gottlieb, A., R. Grishman, C. P. Kruskal, K. McAuliffe, L. Rudolph, and M. Snir, The NYU ultracomputer designing an MIMD shared memory parallel computer, *IEEE Trans. Comput.* C-32(2) (1983) 175–189. - [3] Ranade, A. G., How to emulate shared memory, J. Comput. System Sci. 42(3) (1991) 307–326. - [4] Leighton, F., B. Maggs, and S. Rao, Universal packet routing algorithms, in: *Proc. 29th Symp. on Foundations of Computer Science* (IEEE, 1988) pp. 256–269. - [5] Vitányi, P. M. B., Locality, communication and interconnect length in multicomputers, SIAM J. Comput. 17(4) (1988) 659-672. - [6] Wise, D. S., Compact layouts of Banyan/FFT networks, in: H. T. Kung, B. Sproull, and G. Steele, eds., Proc. CMU Conference on VLSI Systems and Computations, (1981) pp. 186-195. - [7] Leighton, F. T., Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes (Morgan Kaufmann Publ., San Mateo, CA, 1992). - [8] Annexstein, F., M. Baumslag, and A. L. Rosenberg, Group action graphs and parallel architectures, SIAM J. Comput. 19(3) (1990) 544-569. - [9] Abolhassan, F., R. Drefenstedt, J. Keller, W. J. Paul, and D. Scheerer, On the physical design of PRAMs, in: J. Buchmann, H. Ganzinger, and W. J. Paul, eds., *Informatik* Festschrift zum 60. Geburtstag von Günter Hotz (Teubner, Stuttgart, 1992) pp. 1–19. - [10] Abolhassan, F., R. Drefenstedt, J. Keller, W. J. Paul, and D. Scheerer, On the physical design of PRAMs, *Comput. J.* **36**(8) (1993) 756–762. - [11] Beigel, R. and C. P. Kruskal, Processor Networks and Interconnection Networks without Long Wires, in: *Proc. Symp. on Parallel Algorithms and Architectures* (ACM, New York, NY, 1989) pp. 42–51.