删除或更新信息,请邮件至freekaoyan#163.com(#换成@)

A congestion-aware OE router employing fair arbitration for network-on-chip

本站小编 Free考研考试/2022-01-01




1.
Introduction




With the growing communication demand, the highly scalable, reusable, and parallel network-on-chip (NoC) has become an effective solution for integrated circuits[1]. The routing algorithm determines the forwarding paths for packets[2]. Determined routings can easily block injected packets due to the lack of path options[3]. To alleviate unbalanced network loads, adaptive routings supplying varied paths and proper selections are required[4]. Turn models that permit and prohibit dedicated turns are primarily applied to explore path diversity with deadlock freedom[5]. Since the congestion-oblivious turn models share the shortcoming of a disability to adjust the routing path based on network conditions[6], the emerging congestion-aware routings employ the congestion index (CI) in each direction to indicate the network load, and packet traversal can be optimized by selecting the least congested output[7]. In previous works, buffer status[8], channel pressure[9], and the number of blocked ports[10] can be applied as CIs; CIs for local and distant links[11] or even on the entire path[12] can be applied for congestion prediction.



Under the congestion-based priority scheme directed by CI, balanced network load can be achieved since packets can be transferred from heavily congested regions to light congested regions[13]. Nevertheless, random selections are conventionally introduced to the conditions of the same congestion level[14]; the extra delay caused by the random generator weakens the performance improvement of congestion alleviation. To address this issue, we propose a congestion-aware odd-even (OE) router employing fair arbitration (CAOE-FA), whose candidate outputs are determined based on the limitation of the OE turn model, and the least loaded downstream neighbor indicating the lowest buffer occupancy is selected as the output. For the candidates sharing the same CI value, the fair alterable priority (AP) arbitration[15] is introduced instead of selecting an output randomly, with its application field converted from permitted turns to directions with the same congestion level; thus, fairness can be supplied to compensate for the conventional degradations produced by the random generator. The proposed CAOE-FA is realized as a combination of congestion-awareness and fair arbitration. Therefore, the network performance of CAOE-FA can be optimized with the advantages of both the congestion-aware router and the fair arbiter.



The remainder of the paper is organized as follows: the architecture of the CAOE-FA router is introduced in Section 2, the routing scheme and latency model of CAOE-FA are analyzed in Section 3, simulation results are analyzed in Section 4, and conclusions of this paper are drawn in Section 5.




2.
CAOE-FA router architecture




Different from non-adaptive or congestion-oblivious routers, dedicated links are added to the CAOE-FA router for the demand of CI detection and propagation[9]. The internal architecture of a CAOE-FA router is depicted in Fig. 1, where the occupancy that an input buffer is filled with flits is employed to indicate the congestion level in the corresponding direction. A flit stands for the fundamental unit for data storage and forwarding[16]. Links to propagate flits and CIs are represented by black and red arrows respectively; furthermore, CI_local and CI_neighbor in all directions are marked to represent the congestion information of both local and neighboring ports. Thus, bi-directional links are formed to connect a local router to its downstream neighbors: for the input of flits and output of local CIs at each input port, as well as the output of flits and input of neighboring CIs at each output port. Generated cycle-by-cycle, local and downstream CIs are connected from the buffers of corresponding ports to the arbiter logic; the arbiter logic contains a comparator for gathered CIs, and a fair arbiter for outputs with the same CI. Local CIs are delivered to the arbiter logic of adjacent routers via corresponding ports, where they can be applied for adjacent routing determination; the local arbiter logic compares the gathered CIs from candidate neighbors limited by the OE turn model, to make its own routing decision. Inside the arbiter logic, the least congested neighboring port is selected as the next-hop destination for packets forwarding, based on the comparison; for neighbors sharing the same CI, the routing path is decided by the result of the fair arbiter.






onerror="this.onerror=null;this.src='http://www.jos.ac.cn/fileBDTXB/journal/article/jos/2018/12/PIC/18050005-1.jpg'"
class="figure_img" id="Figure1"/>



Download



Larger image


PowerPoint slide






Figure1.
(Color online) CAOE-FA router architecture.





3.
The mechanism of CAOE-FA





3.1
The CAOE-FA routing scheme




The congestion-oblivious OE turn model can achieve deadlock freedom without extra complexity, while the absence of CI information constrains the flexibility of routing[17]. Fully adaptive routings can make a routing decision among diverse paths according to the network condition, while the deadlock freedom cannot be guaranteed[18]. To utilize the advantages of both congestion-awareness and deadlock freedom, the OE turn model tends to be provided with congestion-aware selection to develop path diversity; to propose the CAOE-FA scheme, we take congestion states on each possible path as the primary basis for output selection. The buffer occupancies by flits in corresponding neighbors are employed as CIs to describe the congestion levels in all candidate directions. The CAOE-FA router makes a routing decision by selecting the lightest loaded permitted neighbor as the output. The congestion-based path determination is completed inside the comparator for gathered CIs. For neighboring routers on more than one candidate path sharing the same CI value, a further solution is required for determination; instead of conventional techniques that randomly select an output, the CAOE-FA routing adopts an AP arbiter[15] with fairness outperforming the Round-Robin arbiter[19].



A flowchart of the proposed CAOE-FA is depicted in Fig. 2. The coordinates of both the current node and the destination node for the forwarding packet are analyzed by the CAOE-FA router. According to the limitation of the OE turn model, the permitted and prohibited directions are determined. As shown in Fig. 2, if a single output is permitted, flits should be delivered to the limited output; for the case that multiple outputs are permitted, selection based on the congestion level is required. As a minimal routing, at most two candidate directions d1 and d2 build the set of candidate outputs[6]. According to the collected buffer occupancy of neighbors in each direction indicating its CI, the output is determined by the comparison result: the candidate direction with the lower CI is selected. For the case of multiple candidate directions sharing the same CI, the bits in the variable request representing d1 and d2 are set to 1, with the rest set to 0; the granting result of a fair AP arbiter is applied as the output for flit.






onerror="this.onerror=null;this.src='http://www.jos.ac.cn/fileBDTXB/journal/article/jos/2018/12/PIC/18050005-2.jpg'"
class="figure_img" id="Figure2"/>



Download



Larger image


PowerPoint slide






Figure2.
The flowchart of CAOE-FA.




The pseudo codes of CAOE-FA are listed in Algorithm 1. Path determinations are accomplished by the comparison based on the collected CI values in all permitted directions; to provide fairness, the fair AP arbiter is applied for outputs with the same CI value. The case that E and N directions are permitted by the OE turn model is taken for instance. CI in either the E or N direction is obtained as the variant buffer_occupancy from the corresponding downstream port represented by E_neighbor or N_neighbor; CIs in candidate directions can be calculated by the constant Buffer_size indicating the buffer capacity to receive flits, minusing the detected variant free_slots indicating the available vacant units in each buffer; in addition, the variant grant represents the arbitration result for directions sharing the same congestion level.



Algorithm 1. CAOE-FA scheme (with E and N directions permitted)



buffer_occupancy[E_neighbor]=Buffer_size-free_slots[E_neighbor];



buffer_occupancy[N_neighbor]=Buffer_size-free_slots[N_neighbor];



CI[E] = buffer_occupancy[E_neighbor];



CI[N] = buffer_occupancy[N_neighbor];



if(CI[E] < CI[N])



E←f;



if(CI[E] > CI[N])



N←f;



if(CI[E] = CI[N])



grant←AP arbiter for directions of E N



if(grant=E)



E←f;



if(grant=N)



N←f;



end if



In this paper, the application field for the fair AP arbiter is adjusted from all permitted directions to the candidate outputs sharing the same congestion level. Since CI values are applied as the primary basis for output selection, the fair AP arbiter is employed only when the congestion states of multiple candidate neighbors are the same. Equal probability can be offered to next-hop routers with an equal load condition, thus, the fair selection can provide a balanced load across the network[20]. In the stage of dealing with undetermined candidates with equal CI, the fair arbitration omits the process of random generation; thus, the improved application field for arbitration can contribute to optimized performance metrics for the CAOE-FA router.




3.2
The latency analysis for CAOE-FA




The routing latency for the packet consists of the processing latency on the router (LR) and the transmitting latency through the network (LNoC). For each routing algorithm i, its latency (L(i)) can be defined in Eq. (1):









$Lleft( i
ight) = {L_{
m{R}}}left( i
ight) + {L_{{
m{NoC}}}}left( i
ight).$


(1)



For OE, the router latency is comprised of the delay consumed by permission for directions (LPD), and the time to decide the output when more than one candidate directions exist (l). Since conventional OE simply selects N or S among multiple candidates, its l is negligible[6]. The router latency of OE can be determined in Eq. (2):









${L_{
m{R}}}{
m{(OE)}} = {L_{{
m{PD}}}} + lc.$


(2)



In Eq. (2), c stands for the state of candidate directions: if more than one outputs are permitted, c is set to 1, otherwise to 0.



For congestion-aware OE schemes, the time consumed by the process of CIs detection from candidate neighbors and the selection among them (Ls) is added to the router latency. The router latency for congestion-aware OE that randomly selects an output when multiple candidate tie (CAOE-random) in the CI value is shown in Eq. (3):









${L_{
m{R}}}{text{(CAOE-random)}} = {L_{{
m{PD}}}} + {L_{
m{S}}}c + {L_{{
m{rand}}}}t.$


(3)



In Eq. (3), t stands for the state of whether candidate directions tie in CI: if tie, t is set to 1, otherwise to 0. The Lrand stands for the time consumed by random selection. The router latency of the proposed CAOE-FA is listed in Eq. (4). With the application of fair arbiter to select the outputs that tie in CI, the latency for fair arbitration (LFA) replaces Lrand caused by random selection:









${L_{
m{R}}}{text{(CAOE-FA)}} = {L_{{
m{PD}}}} + {L_{
m{S}}}c + {L_{{
m{FA}}}}t.$


(4)



As observed from Eqs. (2)–(4), congestion-aware schemes introduce extra delay on routers for the comparison of CI and the decision when CI values tie. Whether the price paid can lead to optimized network latency has a significant influence on the performance of NoC. The network latency of routing algorithm i(LNoC(i)) can be defined as the product of a weight matrix (w(i)) and the congestion distribution (CD(i)), as shown in Eq. (5):









$begin{split}&{L_{{
m{NoC}}}}left( i
ight) = wleft( i
ight) cdot {
m{CD}}left( i
ight),&quad i in left{ {{text{OE,CAOE-random,CAOE-FA}}}
ight}.end{split}$


(5)



The weight matrix is determined by the path of packet forwarding, while the congestion distribution plays a prominent role in the LNoC. Even though congestion-aware schemes produce extra LR, they are able to alleviate uneven congestion for NoC to reduce the LNoC.




4.
Results analysis





4.1
Simulation setup




A cycle accurate simulator Noxim based on System C is applied to verify the congestion-alleviation properties of routings. The simulation parameters are listed in Table 1. The congestion-oblivious OE, CAOE-random, and CAOE-FA are used in experiments. Traffic patterns that determine the distributions for packets destinations are selected and are listed in Table 1. The four traffic patterns supported by Noxim are applied to decide the destinations by the coordinates of source nodes[21]. Hotspots can be added as a congested traffic mode to verify the traffic similar to realistic network application[4].






ParameterValue
Network size and topology4 × 4 mesh
Routing algorithmsOE, CAOE-random, CAOE-FA
Flit width32 bits
Packet size3 flits
Buffer sizeTraffic pattern9 flitsrandom, bit reversal, transpose 1, transpose 2, hotspot





Table1.
Configuration parameters for simulation.



Table options
-->


Download as CSV





ParameterValue
Network size and topology4 × 4 mesh
Routing algorithmsOE, CAOE-random, CAOE-FA
Flit width32 bits
Packet size3 flits
Buffer sizeTraffic pattern9 flitsrandom, bit reversal, transpose 1, transpose 2, hotspot






4.2
Network performance




Latency and throughput are the two important performance metrics to evaluate the network speed and capacity. Under heavy congestion, the waiting time for blocked packets increases the average transmission latency, and saturation occurring at the low injection rate prevents the network from receiving more packets; thus, both latency and throughput can be employed to evaluate the congestion alleviation of the dedicated routing algorithm[22]. To verify the flexibility of congestion-aware and congestion-oblivious routing schemes, the average packet latencies and saturation throughputs of OE CAOE-random and CAOE-FA are depicted in Figs. 3 and 4.






onerror="this.onerror=null;this.src='http://www.jos.ac.cn/fileBDTXB/journal/article/jos/2018/12/PIC/18050005-3.jpg'"
class="figure_img" id="Figure3"/>



Download



Larger image


PowerPoint slide






Figure3.
(Color online) Average packet latency without hotspot. (a) Random. (b) Transpose1. (c) Transpose2. (d) Bit reversal.




As observed in Fig. 3, under a low injection rate, the average packet latency of the congestion-oblivious OE remain the lowest, while the proposed CAOE-FA outperforms under a higher injection rate, benefiting from varied path options to bypass massive traffic. The reduction of CAOE-FA router is slight compared to OE (by merely 2.76%–8.72%), while it is more obvious compared to the CAOE-random (by 9.14%–16.02%). The reason is that under synthesis traffic patterns, the distributions of congestions and loads are increased uniformly with the growing packets injection rate; thus the improvements in congestion alleviation are unobvious. The CD(i) for OE and the congestion-aware schemes are almost equal. Since w(i) can be determined by the traffic pattern and remain almost unchanged for each routing algorithm, the NoC latencies of CAOE-random and CAOE-FA are almost the same without obvious congestion alleviation. Whereas, since congestion-aware schemes introduce the delay for CI based path determination, their router latency can be increased; for congestion-aware routers, the time for packets to pass through can be lengthened due to the processes of CI gathering and comparing. The CAOE-random fails to outperform congestion-oblivious OE, since the random generator for outputs selection brings about extra delay; Ls and Lrand increase the delay on the CAOE-random router, with negligible contribution to congestion alleviation.



The proposed CAOE-FA reduces the latency CAOE-random at an obvious level, benefiting from the replacement of the time-consuming random generator with the fair arbiter. The reduction of 9.14%–16.02% is contributed by the time-saving arbitration at the stage of path selection for the same candidate CI, proving LFA for CAOE-FA is much less than Lrand for CAOE-random. While the performance improvement of CAOE-FA compared to OE is slight, since both Ls and LFA can introduce increased router latency. However, the slight reduction of 2.76%–8.72% proves that the congestion is slightly alleviated, leading to shortened network latency LNoC(CAOE-FA) relevant to CD(CAOE-FA), which neutralizes the increased router latency.



As observed in Fig. 4, the improvement of saturation throughput compared to CAOE-random (by 18.06%–25.52%) is also more obvious than the comparison to OE (by 8.59%–22.83%). CAOE-random fails to reduce the OE latency, with increased router latency caused by Ls and Lrand, the capacity for the network is limited. For CAOE-FA, network congestion can be slightly alleviated due to the optimized scheme for path determination, as observed in Fig. 3, which brings about more balanced network load and improved network capacity.






onerror="this.onerror=null;this.src='http://www.jos.ac.cn/fileBDTXB/journal/article/jos/2018/12/PIC/18050005-4.jpg'"
class="figure_img" id="Figure4"/>



Download



Larger image


PowerPoint slide






Figure4.
(Color online) Saturation throughput without hotspot.




For realistic NoCs, packets transmission cannot be distributed ideally as synthesis traffic patterns do; packets tend to be transferred to dedicated destinations to meet the application demand, certain nodes receiving more packets are classified into hotspots. To verify the performance metrics more approximate to realistic network status, hotspots can be added to synthesis traffics. According to the destination-hotspot scenario, dedicated nodes are defined as hotspots that receive more packets than non-hotspot nodes do; multiple dedicated sources apply the defined hotspots as destinations, while the remaining nodes define their destinations according to synthesis traffic patterns. The average packet latencies and saturation throughputs of congestion-aware and congestion-oblivious routings under derived hotspot traffic patterns are depicted in Figs. 5 and 6.






onerror="this.onerror=null;this.src='http://www.jos.ac.cn/fileBDTXB/journal/article/jos/2018/12/PIC/18050005-5.jpg'"
class="figure_img" id="Figure5"/>



Download



Larger image


PowerPoint slide






Figure5.
(Color online) Average packet latency with hotspot. (a) Random-hotspot. (b) Transpose1-hotspot. (c) Transpose2-hotspot. (d) Bit reversal-hotspot.




As observed in Figs. 5 and 6, both CAOE-random and CAOE-FA outperforms congestion-oblivious OE under the hotspot-added traffic patterns; the CAOE-FA router reduces the average packet latency of OE by 10.45%–22.18% and improves the saturation throughput of OE by 39.21%–68.58%. Under the regionally congestion condition introduced by the hotspots destinations, CAOE-FA outperforms OE since packets can be transferred via less congested routes, benefiting from the adjustable output selection with network congestion as the primary basis; as well as the fair AP arbitration to remit the degradation caused by the random generator, at the process of selection under equal congestion levels. The traversing delay on the network can be obviously reduced with congestion alleviation; thus, the CAOE-FA achieves much lower LNoC due to its effective load balancing. On the other hand, the application of the fair arbiter can minimize the time consumed for selection among outputs sharing the same CI; thus, extra LR caused by CI comparing (Ls) and arbitration (LFA) can be overcome by the reduced network latency. As a sequence, CAOE-FA can improve the network capacity at a significant level. The reduction in average packet latency compared to CAOE-random is 8.8%–14.11%; the improvement in saturation throughput compared to CAOE-random is 29.71%–42.3%, benefiting from the replacement of the random generator.






onerror="this.onerror=null;this.src='http://www.jos.ac.cn/fileBDTXB/journal/article/jos/2018/12/PIC/18050005-6.jpg'"
class="figure_img" id="Figure6"/>



Download



Larger image


PowerPoint slide






Figure6.
(Color online) Saturation throughput with hotspot.




By analyzing the performance improvement of CAOE-FA, it can be concluded that both its congestion-awareness and fair arbitration can contribute to enhanced network speed and capacity. Under non-hotspot traffics, the reduced LFA plays a prominent role for lower forwarding latency; under regionally congested traffics with hotspots, the alleviated congestion condition leads to reduced LNoC for improved performance metrics.




4.3
Hardware implementation




To evaluate the hardware implementation overhead of the CAOE-FA router, area and power are the two most important metrics. The area cost of NoC consists of the areas of router, network interface, processing elements, and links. For varied routing algorithms, the structures of processing elements and interfaces remain the same; under the 2D Mesh network, the area is mainly consumed by the architectures of routers and relevant links. To verify the area cost, the buffers, logics, and switches inside the router as well as the signal links between routers should be evaluated under determined CMOS technology, through the synthesizing achieved by EDA tools. The power can be consumed by the router architecture and routing algorithm. Components of input buffers and crossbar switch, procedures of switch allocation and routing computation for data forwarding can produce power consumption. Under determined CMOS technology, power consumption of a router can be achieved by the waveform analysis of EDA tools. Thus, to evaluate the area costs and power consumptions of OE router and the proposed CAOE-FA, Design Complier F2011.09SP2 of Synopsys is employed as the EDA tool; the RTL codes of the routers with Verilog are synthesized by the SMIC CMOS 0.13 μm technology, at an operating frequency of 500 MHz and with a voltage supply of 1.8 V. The experiment results of implementation overheads are listed in Table 2.






Routing algorithmOECAOE-FAIncreasing percentage
Power consumption0.2959 mW0.2996 mW1.25%
Area cost587.473 μm2686.868 μm217.92%





Table2.
Implementation overhead.



Table options
-->


Download as CSV





Routing algorithmOECAOE-FAIncreasing percentage
Power consumption0.2959 mW0.2996 mW1.25%
Area cost587.473 μm2686.868 μm217.92%





As observed in Table 2, the congestion-aware path selection and the fair arbitration of the CAOE-FA router bring about extra hardware overheads of both power and area: 1.25% of increased power consumption and 17.92% of increasing area cost compared to the congestion-oblivious OE router. Compared to the improvements of CAOE-FA in network performance, the increased overhead is acceptable.




5.
Conclusion




In this paper, to meet the demand of varied transmission paths and congestion-alleviated selection, we propose a CAOE-FA router that combines the advantages of both congestion-awareness and fair arbitration. Among all directions permitted by the OE turn model, the lightest loaded output indicated by the least buffer occupancy is selected for packets forwarding. By comparing the gathered congestion information in permitted directions, the routing path determination of CAOE-FA is provided with both congestion-awareness and deadlock freedom. Furthermore, for candidate outputs sharing the same congestion level, the fair arbiter is employed to replace the conventional time-consuming random selection. Since the CAOE-FA router can bypass packets to light congested regions, blocking problems can be alleviated without the degradation caused by the random generator; thus, more balanced network load and more flexible routing can be provided. Experimental results show that the proposed CAOE-FA router can contribute to improved network conditions, by reducing the packet traversing latency and improving the network throughput significantly with acceptable increasing in area and power consumption.



相关话题/congestionaware router employing

  • 领限时大额优惠券,享本站正版考研考试资料!
    大额优惠券
    优惠券领取后72小时内有效,10万种最新考研考试考证类电子打印资料任你选。涵盖全国500余所院校考研专业课、200多种职业资格考试、1100多种经典教材,产品类型包含电子书、题库、全套资料以及视频,无论您是考研复习、考证刷题,还是考前冲刺等,不同类型的产品可满足您学习上的不同需求。 ...
    本站小编 Free壹佰分学习网 2022-09-19