CN109547263B - Network-on-chip optimization method based on approximate calculation - Google Patents

Network-on-chip optimization method based on approximate calculation

Info

Publication number
CN109547263B
CN109547263B (application CN201811537135.6A)
Authority
CN
China
Prior art keywords
data
node
flow
congestion
error budget
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811537135.6A
Other languages
Chinese (zh)
Other versions
CN109547263A (en
Inventor
肖思源
王小航
潘文明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Jianfei Communication Co ltd
South China University of Technology SCUT
Original Assignee
Guangzhou Jianfei Communication Co ltd
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Jianfei Communication Co ltd, South China University of Technology SCUT filed Critical Guangzhou Jianfei Communication Co ltd
Priority to CN201811537135.6A priority Critical patent/CN109547263B/en
Publication of CN109547263A publication Critical patent/CN109547263A/en
Application granted granted Critical
Publication of CN109547263B publication Critical patent/CN109547263B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0823 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H04L43/0882 Utilisation of link capacity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/32 Flow control; Congestion control by discarding or delaying data units, e.g. packets or frames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/10 Packet switching elements characterised by the switching fabric construction
    • H04L49/109 Integrated on microchip, e.g. switch-on-chip
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a network-on-chip optimization method based on approximate computation, implemented by a data clipper, a data restorer, a flow predictor, a global controller and a local controller. The data clipper clips data before a data packet is injected into the network port, shortening the packet; the data restorer recovers the lost data after receiving a clipped packet; the flow predictor predicts the data traffic in the next regulation interval from the node's past communication behaviour; the global controller, working from a global view, computes the optimal approximate-computation configuration of each node based on the global information and the user's quality requirement and sends control information to each node; and the local controller configures the data loss rate of each data packet waiting to be injected into the network according to the received control information. The method can optimize the performance and power consumption of the network-on-chip at low cost without violating the user's requirement on output quality.

Description

Network-on-chip optimization method based on approximate calculation
Technical Field
The invention relates to the technical field of network optimization, in particular to a network-on-chip optimization method based on approximate calculation.
Background
Against the background of continuously advancing semiconductor technology, many-core chips are an effective design for improving a system's performance-to-power ratio. To meet the requirements of many-core chips in terms of communication bandwidth, low power consumption and scalability, the network-on-chip is regarded as a promising technology. Many applications popular today in fields such as image processing and machine learning are characterized by heavy data communication and by tolerance of a certain amount of error in the output result; approximate computation has therefore emerged as a new computer design idea that improves system performance or saves energy by relaxing the requirement on result accuracy. However, conventional network-on-chip designs make no or little use of applications' tolerance of errors, and controlling the output quality of applications in the network-on-chip environment is a complex problem.
The invention patent application with application publication number CN108183860A discloses a two-dimensional network-on-chip adaptive routing method based on a particle swarm algorithm; the particle swarm algorithm is used to compute the optimal routing path of a data packet, so as to reduce network delay and balance network load. That application performs network optimization by changing the data transmission path and does not exploit the error-tolerance characteristics of the application.
The invention patent application with application publication number CN108173760A discloses a network-on-chip mapping method based on an improved simulated annealing algorithm; the optimal mapping from IP cores to network-on-chip nodes is computed by the improved simulated annealing algorithm to optimize the power consumption of the system. That application optimizes the network only by adjusting the placement of IP cores in the network-on-chip and does not exploit the application's tolerance of errors in the output.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a network-on-chip optimization method based on approximate calculation, which makes maximal use of the application's error tolerance to optimize the performance of the network-on-chip without violating the user's requirement on output quality.
The technical solution of the present invention for solving the above technical problems is as follows.
A network-on-chip optimization method based on approximate computation comprises a flow predictor, a data clipper, a data restorer, a global controller and a local controller. The flow predictor runs on each node; at the beginning of each regulation interval it predicts the data traffic of that interval from the node's past operating behaviour, and each node sends its traffic prediction to the global controller.
Each node of the network-on-chip is provided with a data clipper and a data restorer; the data clipper clips a data packet according to the set data loss rate before the packet is injected into the network port, and the data restorer recovers the lost data after the packet is received.
A global controller is assembled on a master control node; it allocates an optimal configuration to each node according to the global traffic information and the user's requirement on the quality of the application's computed result, and sends the generated control information to the corresponding nodes through the network. The traffic information is the unidirectional communication volume between nodes, i.e. the traffic from node i to node j; a network with n nodes has n(n-1) flows in total. The control information includes the congestion error budget allocated to the node, the serialization error budget, and the data loss rate configuration that minimizes network congestion.
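For illustration only (the patent does not specify a concrete data structure), the per-node control information described above can be sketched as a small record; all field and type names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class NodeControlInfo:
    """Hypothetical sketch of the control message the global controller sends to one node."""
    congestion_error_budget: float      # amount of data this node may drop to relieve congestion
    serialization_error_budget: float   # remaining budget usable to shorten packets further
    # congestion-minimizing data loss rate for each flow this node sends,
    # keyed by (source node, destination node)
    loss_rates: Dict[Tuple[int, int], float] = field(default_factory=dict)
```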
A local controller is assembled on each node; it sets the data loss rate of each data packet waiting to be injected into the network according to the received control information and then delivers the packet to the data clipper for processing. The data packet carries an original data sequence consisting of a plurality of data units, for example 20 four-byte integers.
Further, the global controller collects the local information and performs the global optimization and regulation, while the local controller sets the data loss rate for each individual data packet.
Further, by relaxing the system's requirement on data precision, the data transmitted in the network-on-chip is lossily clipped to reduce the amount of data circulating in the network, and the lost data is recovered at the destination node so that the data sequence remains complete.
Further, the flow prediction process is as follows: the flow predictor of each node collects runtime data of the application, namely the amount of data the node sends to the other nodes, and an autoregressive model for time-series prediction is constructed through offline training; on the basis of the offline-trained autoregressive model, subsequent flow predictions are corrected using the difference between the previous predicted value and the observed value, i.e. the obtained true value, as feedback.
Further, the data clipper clips data as follows: according to the data loss rate given by the local controller, the clipper drops data at an interval of l data units; out of every l data units, one data unit is discarded, resulting in a clipped data packet that is smaller than the original packet.
Further, the lost data is recovered as follows: for a clipped data sequence, the data restorer inserts recovered data units into the sequence according to the interval l used during clipping; every l data units, the data restorer inserts a recovery value equal to the average of the data values of the neighbouring units.
Further, the specific control process of the global controller is as follows:
1) Quality loss model
The mechanisms of the data clipper and the data restorer are simulated in software; the application is run with different data loss ratios to obtain samples of data loss rate and quality loss, and a quality loss model is established for each application by linear interpolation. The quality loss model is used to estimate the error that the lost data will cause in the application's final output result.
2) Error budget
According to the flow prediction results and the quality requirement defined by the user, after the predicted value of every flow on every node has been collected, the quality loss model is used in reverse to compute the amount of data that is allowed to be clipped and approximately recovered. This yields the total amount of data allowed to be clipped in the whole network, i.e. the total error budget; the clipping amounts distributed to the different nodes are the node error budgets, and a node's error budget consists of a congestion error budget and a serialization error budget.
3) Determining congestion level of a single link
For a single link, if the traffic flowing through it exceeds the maximum bandwidth of the link, the link is considered congested, and the excess is defined as the congestion degree of the link; the congestion degree of a flow is defined as the sum of the congestion degrees of all links on the flow's path.
4) Congestion minimization
A data loss rate is computed for each flow with a greedy strategy, based on the total error budget and the predicted traffic, so that congestion is minimized. First the congestion degree of every link and every flow is initialized, and then iteration begins. In each iteration the flow with the highest congestion degree is selected and the most congested link on its path is identified, and a traffic loss ratio is chosen for the flow: if that loss ratio makes the link no longer congested, the flow's data loss rate is set to that ratio; otherwise the maximum data loss rate is set, i.e. half of the packet payload is clipped; and if the total error budget is insufficient, the flow only loses the data loss amount allocated by the master control node. Each time a data loss rate is set, the expected amount of lost data is deducted from the total error budget, every link and every flow is updated, and the next iteration begins.
The iteration over links and flows continues until the total error budget is exhausted, no link is congested any more, or all flows have been processed. When the iteration ends, each flow has a congestion-minimizing data loss rate configuration, i.e. how much data each flow will lose is predetermined. For a node, the sum of the expected lost data of all flows it sends is the congestion error budget allocated to that node.
5) Reducing serialization latency
On the basis of the congestion-minimizing data loss rates, the master control node redistributes the total error budget remaining after the allocation in step 4) to the nodes in proportion to each node's likelihood of continuing to lose data; this is the node's serialization error budget. Clipping a packet reduces its serialization latency, and the amount of data lost by each clipping is deducted from the node's serialization error budget.
At this point, global control is complete, and the congestion error budget, the serialization error budget and the congestion-minimizing data loss rate are assigned to each corresponding node, i.e. each node's optimal configuration.
Further, since the global controller only computes how much budget to allocate to each node, once a node has received its allocated budget the local controller still has to set the data loss rate and thereby consume (deduct from) the error budget whenever data is transmitted lossily.
Further, the control process of the local controller is as follows: the local controller sets the data loss rate of each data packet waiting to be injected into the network according to the congestion error budget, the serialization error budget and the congestion-minimizing data loss rate distributed by the global controller. The initial loss rate is set to zero; if the node's congestion error budget is not zero, the loss rate is set to the rate that minimizes network congestion and the discarded data is deducted from the congestion error budget. After the congestion error budget has been deducted, if serialization error budget still remains, the loss rate is raised to the maximum loss rate provided by the data clipper, i.e. half of the clipped packet payload, and the additional discarded data is deducted from the serialization error budget.
Further, the congestion error budget is the amount of data expected to be discarded from the data flows sent by each node; the serialization error budget is the error budget remaining after each node has deducted its congestion error budget.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention optimizes the overall performance and energy consumption of the network by reducing the amount of data circulating in the network, relieving network congestion and reducing network latency.
2. The invention clips data and regulates the clipping optimally on the basis of traffic prediction and quality requirements, so that the performance benefit is maximized while the quality requirement is still satisfied.
Drawings
FIG. 1 is a global traffic prediction graph of the present invention.
FIG. 2 is a global information control map of the present invention.
FIG. 3 is a flow chart of the error budgeting of the present invention.
Fig. 4 is a block diagram of a specific embodiment.
FIG. 5 is a flow chart of data clipping and restoring in the present invention.
Fig. 6 is pseudo code of the congestion minimization algorithm of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Referring to fig. 4, C denotes a processor core in the network-on-chip node, M denotes a processor Cache (Cache memory), R denotes a router, and NI denotes a network interface. The invention discloses a network-on-chip optimization method based on approximate computation, which comprises a flow predictor, a data clipper, a data restorer, a global controller and a local controller. Wherein:
As shown in fig. 1, the traffic predictor runs on each node L; at the beginning of each regulation interval it predicts, from the node's past operating behaviour, the communication of the data streams the node will send during that interval, and sends the prediction to the master control node G.
Each node of the network-on-chip is provided with a data clipper and a data restorer; the data clipper clips a data packet according to the set data loss rate before the packet is injected into the network port, and the data restorer recovers the lost data after the packet is received.
The global controller is installed on the selected master control node G; it allocates an optimal configuration to each node according to the global traffic information and the user's requirement on the quality of the application's computed result and, as shown in fig. 2, sends the generated control information to the corresponding nodes through the network.
A local controller is assembled on each node; it sets the data loss rate of each data packet waiting to be injected into the network according to the received control information and then delivers the packet to the data clipper for processing.
The flow prediction specifically comprises the following steps: collecting runtime information of the application, constructing an autoregressive model for time series prediction through offline training,
y_AR(t) = λ_1·y_{t-1} + λ_2·y_{t-2} + … + λ_p·y_{t-p} + μ + ε_t
where y_AR(t) is the value at time t predicted by the autoregressive model, y_{t-1}, …, y_{t-p} are observed values of the network traffic, λ_1, …, λ_p are the regression coefficients, μ is a constant term, and ε_t is the model error at time t; the parameters may be fitted by least squares. On the basis of the offline-trained autoregressive model, the prediction is corrected online with feedback from the difference between the predicted and observed traffic values,
ŷ(t) = y_AR(t) + α·Δ_t,  Δ_t = y_{t-1} - ŷ(t-1)
where ŷ(t) is the final predicted traffic, Δ_t is the feedback value at time t, y_{t-1} - ŷ(t-1) is the previous prediction error, and α is the learning rate.
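A minimal sketch of this prediction step (an illustration, not the patent's implementation), assuming the feedback-corrected form reconstructed above, i.e. the autoregressive prediction plus the learning-rate-weighted previous prediction error; the function names are illustrative.

```python
import numpy as np

def fit_ar(history: np.ndarray, p: int):
    """Least-squares fit of an order-p AR model y_t = sum_k lam_k * y_{t-k} + mu."""
    X = np.column_stack([history[p - k:len(history) - k] for k in range(1, p + 1)])
    X = np.column_stack([X, np.ones(len(X))])      # constant term mu
    coeffs, *_ = np.linalg.lstsq(X, history[p:], rcond=None)
    return coeffs[:-1], coeffs[-1]                 # (lam_1..lam_p, mu)

def predict_with_feedback(recent: np.ndarray, lam: np.ndarray, mu: float,
                          prev_pred: float, prev_obs: float, alpha: float) -> float:
    """One-step traffic prediction corrected by the previous prediction error."""
    y_ar = float(np.dot(lam, recent[::-1])) + mu   # recent: last p observations, oldest first
    delta = prev_obs - prev_pred                   # feedback: previous prediction error
    return y_ar + alpha * delta
```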
The specific clipping of the data is: referring to FIG. 5, for a raw data sequence a_1, …, a_{l-1}, a_l, a_{l+1}, …, the data clipper clips data at an interval of l data units according to the given data loss rate; out of every l data units, one data unit is discarded, resulting in a clipped data packet that is smaller than the original packet.
The specific recovery of lost data is: referring to FIG. 5, for a clipped data sequence a_1, …, a_{l-1}, a_{l+1}, …, the data restorer inserts recovered data units into the data sequence according to the interval l used during clipping. Every l data units, the data restorer inserts a recovery value, e.g.
a'_l = (a_{l-1} + a_{l+1}) / 2
i.e. the average of the data values of the neighbouring units. Since l ≥ 2, the maximum loss rate provided by the data clipper in this example is 50%, reached when l = 2.
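A minimal Python sketch of this clip-and-restore scheme, under the assumption that every l-th data unit is dropped (loss rate 1/l) and reconstructed at the receiver as the average of its two neighbours; a dropped unit at the very end of a sequence has no right neighbour and is simply left out here.

```python
def clip(data: list, l: int) -> list:
    """Drop every l-th data unit (1-based positions l, 2l, ...); l >= 2, so the loss rate 1/l is at most 50%."""
    return [v for pos, v in enumerate(data, start=1) if pos % l != 0]

def restore(clipped: list, l: int) -> list:
    """Re-insert each dropped unit as the average of its left and right neighbours."""
    out = []
    for i, v in enumerate(clipped):
        out.append(v)
        group_done = (i + 1) % (l - 1) == 0          # l-1 kept units since the last dropped one
        if group_done and i + 1 < len(clipped):      # a dropped unit sat between this value and the next
            out.append((v + clipped[i + 1]) / 2)     # a'_l = (a_{l-1} + a_{l+1}) / 2
    return out

# Example with l = 3 (one unit in three is dropped):
# clip([1, 2, 3, 4, 5, 6], 3)   -> [1, 2, 4, 5]
# restore([1, 2, 4, 5], 3)      -> [1, 2, 3.0, 4, 5]
```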
The global control specifically comprises:
1) Quality loss model
The mechanisms of the data clipper and the data restorer are implemented and embedded in the source code of the application to simulate their influence on it; the modified application is run with different data loss ratios, samples of data loss rate and quality loss are collected, unknown values are completed by linear interpolation, and a quality loss model is established. The model is used to estimate the error that losing a given proportion of the data will cause in the application's final output.
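A minimal sketch of such a model, assuming the (data loss rate, quality loss) samples have already been collected by profiling the modified application; the sample values below are invented for illustration, and numpy's interp provides the linear interpolation in both directions (the inverse lookup is what the error budget step below relies on).

```python
import numpy as np

# Profiled samples for one application (illustrative numbers only):
# data loss rate x  ->  measured quality loss q(x), assumed monotonically non-decreasing.
loss_rates     = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
quality_losses = np.array([0.0, 0.01, 0.03, 0.07, 0.12, 0.20])

def quality_loss(x: float) -> float:
    """q(x): estimated quality loss caused by losing a fraction x of the data."""
    return float(np.interp(x, loss_rates, quality_losses))

def allowed_loss_rate(theta: float) -> float:
    """q^{-1}(theta): largest data loss rate whose estimated quality loss stays within theta."""
    return float(np.interp(theta, quality_losses, loss_rates))
```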
2) Error budget
According to the global traffic information and the user's requirement on the quality of the application's computed result, after the traffic prediction results for the whole network have been collected, the amount of data that is allowed to be clipped and approximately recovered can be computed in reverse with the quality loss model,
B = q^{-1}(θ) · Σ_{i=1..n} Σ_{j≠i} v_{ij}
i.e. the total error budget of the entire network, where q^{-1} is the inverse function of the quality loss model, θ is the quality loss limit given by the user, q^{-1}(θ) is the proportion of data allowed to be lost, v_{ij} is the traffic that node i communicates to node j, n is the number of network nodes, and Σ_{i=1..n} Σ_{j≠i} v_{ij} is the sum of the traffic in the whole network. Each time a node approximates data, a corresponding amount is deducted from the error budget allocated to that node (a congestion error budget, a serialization error budget, or both).
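A one-function sketch of this budget computation, reusing the allowed_loss_rate (q^{-1}) helper sketched above; the matrix layout is an assumption.

```python
import numpy as np

def total_error_budget(v: np.ndarray, theta: float) -> float:
    """Total error budget B = q^{-1}(theta) * sum of all predicted traffic.

    v is an n x n matrix of predicted traffic, v[i, j] = traffic from node i to node j
    (diagonal entries are zero)."""
    return allowed_loss_rate(theta) * float(v.sum())
```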
3) Degree of congestion
For a single link, if the traffic flowing through it exceeds the maximum bandwidth of the link, the link is considered congested, the excess is defined as the congestion degree of the link, and the congestion degree of link k is
d_k = max(0, u_k - c),  with u_k = Σ_{i≠j} r_{ijk}·(1 - x_{ij})·v_{ij}
where c is the maximum bandwidth of the link, x_{ij} is the data loss rate on flow i → j, and r_{ijk} is a binary indicator used in the formulation: when the traffic carried by a link is computed, it masks out in the summation the flows that do not pass through that link. The congestion degree of a flow is defined as
D_{ij} = Σ_k r_{ijk}·d_k
i.e. the sum of the congestion degrees of all links on the path of one flow.
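A short numpy sketch of these two quantities, under the reconstruction above (link load = traffic of the flows routed over the link after clipping, congestion = the part of that load exceeding the bandwidth c); array shapes and names are assumptions.

```python
import numpy as np

def link_congestion(v: np.ndarray, x: np.ndarray, r: np.ndarray, c: float) -> np.ndarray:
    """Congestion degree d_k of every link: clipped traffic routed over the link, minus bandwidth c, floored at 0.

    v[i, j]    predicted traffic of flow i -> j
    x[i, j]    data loss rate currently assigned to flow i -> j
    r[i, j, k] 1 if flow i -> j traverses link k, else 0
    """
    load = np.einsum('ij,ijk->k', v * (1.0 - x), r)   # traffic carried by each link
    return np.maximum(load - c, 0.0)

def flow_congestion(d_link: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Congestion degree D_ij of every flow: sum of the congestion degrees of the links on its path."""
    return np.einsum('k,ijk->ij', d_link, r)
```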
4) Congestion minimization
The detailed algorithm pseudocode is illustrated in FIG. 6. After the traffic information (the predicted value of each flow on each node) has been collected, the error budget of the whole network is computed. A data loss rate is then computed for each flow with a greedy strategy, based on the total error budget and the predicted traffic, so that congestion is minimized. First the congestion degree of every link and every flow is initialized as described above, and then iteration begins. In each iteration the flow with the highest congestion degree is selected and the most congested link on its path is identified; if a traffic loss ratio can be set for the flow such that this link is no longer congested, the flow's data loss rate is set to that ratio; otherwise the maximum data loss rate is set; and if the total error budget is insufficient, only the amount allowed by the total error budget is lost. Each time a data loss rate is set, the expected amount of lost data is subtracted from the total error budget and the state is updated before the next iteration.
The iteration over links and flows continues until the total error budget is exhausted, no link is congested any more, or all flows have been processed. When the iteration ends, each flow has a congestion-minimizing data loss rate configuration, i.e. it is predetermined how much data each flow will lose. For a node, the sum of the expected lost data of all flows it sends is the congestion error budget allocated to that node.
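Since the pseudocode of FIG. 6 is not reproduced in this text, the following is a hedged Python reconstruction of the greedy loop described in the prose, reusing the link_congestion and flow_congestion helpers sketched above; the 0.5 cap (half of the packet payload) follows the text, while the tolerances and tie-breaking are assumptions.

```python
import numpy as np

MAX_LOSS_RATE = 0.5   # half of the packet payload (clipping interval l = 2)

def minimize_congestion(v: np.ndarray, r: np.ndarray, c: float, budget: float):
    """Greedy per-flow data loss rate assignment. Returns (loss rates x, per-node congestion error budget)."""
    x = np.zeros_like(v, dtype=float)            # data loss rate of each flow
    processed = np.zeros(v.shape, dtype=bool)
    while budget > 1e-9:
        d_link = link_congestion(v, x, r, c)
        d_flow = flow_congestion(d_link, r)
        d_flow[processed] = -1.0                 # each flow is handled at most once
        if d_flow.max() <= 0:                    # no congested, unprocessed flow left
            break
        i, j = np.unravel_index(np.argmax(d_flow), d_flow.shape)
        worst = max(d_link[k] for k in range(r.shape[2]) if r[i, j, k])  # most congested link on the path
        if v[i, j] > 0:
            rate = min(worst / v[i, j],          # ratio that would just relieve that link
                       MAX_LOSS_RATE,            # otherwise clip half of the payload
                       budget / v[i, j])         # never exceed what the budget allows
        else:
            rate = 0.0
        x[i, j] = rate
        budget -= rate * v[i, j]                 # deduct the expected lost data
        processed[i, j] = True
    # congestion error budget of a node = expected lost data over all flows it sends
    return x, (x * v).sum(axis=1)
```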
5) Reducing serialization delay
When network congestion is not significant, the main determinant of network latency is no longer the long queueing delay that occurs under congestion, but the serialization delay of the packets themselves. The serialization delay of a data packet can be reduced by clipping the packet, and clipping requires the support of the error budget.
As shown in fig. 3, on the basis of the congestion-minimizing data loss rates, the master control node allocates the total error budget remaining after the allocation in 4) to the nodes in proportion to each node's likelihood of continuing to lose data, as the serialization error budget allocated to each node; the serialization latency of a packet can be reduced by clipping the packet, and the amount of data lost by each clipping is subtracted from the serialization error budget.
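A minimal sketch of this proportional split; the patent allocates in proportion to each node's likelihood of continuing to lose data, and the concrete weight used here (for example each node's predicted outgoing traffic) is an assumption.

```python
import numpy as np

def serialization_budgets(remaining_budget: float, weights: np.ndarray) -> np.ndarray:
    """Split the remaining total error budget among nodes in proportion to `weights`
    (e.g. each node's predicted outgoing traffic -- an illustrative choice)."""
    w = np.asarray(weights, dtype=float)
    if w.sum() <= 0:
        return np.zeros_like(w)
    return remaining_budget * w / w.sum()
```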
At this point, global control is complete, and the congestion error budget, the serialization error budget and the congestion-minimizing data loss rate configuration of each node are sent to that node.
The specific process of local control is as follows:
for each data packet waiting to be injected into the network, the local controller makes a selection of the data loss rate according to the margin of the two error budgets. Setting the initial loss rate to zero, temporarily setting the loss rate to a congestion minimized loss rate if the congestion error budget remains, and deducting the discarded data from the congestion error budget; if the serialization error budget remains, then the loss rate is again set to the maximum loss rate and the increased portion of the discarded data is subtracted from the serialization error budget.
The present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents and are included in the scope of the present invention.

Claims (8)

1. A network-on-chip optimization method based on approximate computation is characterized in that the method is realized by a flow predictor, a data clipper, a data restorer, a global controller and a local controller; the flow predictor operates on each node, data flow in a regulation interval is predicted at the beginning of each regulation interval according to the past operating condition of the node, and each node sends a flow prediction value to the global controller;
each node of the network-on-chip is provided with a data clipper and a data restorer; the data clipper clips a data packet according to a set data loss rate before the data packet is injected into the network port, and the data restorer restores lost data after receiving the clipped data packet;
assembling a global controller on a main control node, distributing an optimal configuration for each node according to global flow information and the requirement of a user on the quality of an application calculation result, and sending the generated control information to the corresponding node through a network; the flow information is unidirectional communication traffic among all nodes, namely communication traffic from a node i to a node j, for a network with n nodes, n (n-1) flows are in total, n represents the number of network nodes, and i and j respectively represent different nodes; the control information includes a congestion error budget, a serialization error budget, and a data loss rate configuration that minimizes network congestion;
assembling local controllers on each node, setting the data loss rate of each data packet waiting to be injected into the network according to the received control information, and then delivering the data packets to the data clipper for processing; the data packet contains an original data sequence, and the original data sequence comprises a plurality of data units.
2. The method as claimed in claim 1, wherein the global controller collects the local information and performs global optimization and regulation, and the local controller sets the data loss rate for each individual data packet.
3. The method of claim 1, wherein the traffic predictor comprises the following steps:
the flow predictor of each node collects runtime data of the application, namely the amount of data sent from the node to the other nodes, and an autoregressive model for time-series prediction is constructed through offline training; on the basis of the offline-trained autoregressive model, subsequent flow predictions are corrected using the difference between the previous flow prediction value and the observed value, namely the obtained true value, as feedback.
4. The method for optimizing network on chip based on approximate computation of claim 1, wherein the data clipping unit clips the data by:
the clipper clips data by l data units at intervals according to the data loss rate given by the local controller; the data clipper discards one data unit every time l data units are retained, resulting in a clipped data packet that is smaller than the original data packet, l representing the clipping interval used by the data clipper.
5. The method for optimizing network on chip based on approximate computation of claim 1, wherein the recovery procedure of the lost data is as follows: for the clipped data sequence, the data restorer inserts the recovered data units into the data sequence according to the interval l used in clipping; the data restorer inserts a recovery value every l data units, the recovery value being the average of the data values of the left and right neighbouring units of the recovered data unit, and l represents the clipping interval used by the data clipper.
6. The method for optimizing network on chip based on approximate computation of claim 1, wherein the control process of the global controller is as follows:
1) establishing a quality loss model
simulating the mechanisms of the data clipper and the data restorer in software, running the application with different data loss ratios to obtain samples of data loss rate and quality loss, completing unknown values by linear interpolation, and establishing a quality loss model for each application; the quality loss model is used to estimate the error that the lost data will cause in the application's final output result;
2) error budget
According to the global traffic information and the user's requirement on the quality of the application's computed result, after the flow predicted value of every flow on every node of the whole network has been collected, the quality loss model is used in reverse to compute the amount of data that is allowed to be clipped and approximately recovered, thereby obtaining the total amount of data allowed to be clipped in the whole network, namely the total error budget; the clipping amounts are then distributed to the different nodes as the node error budgets, and the error budget of a node comprises a congestion error budget and a serialization error budget;
3) determining congestion level of a single link
For a single link, if the flow rate of the flow exceeds the maximum bandwidth of the link, the link is considered to be congested, and the exceeding part is defined as the congestion degree of the link; defining the flow congestion degree as the sum of the congestion degrees of all links on a path of one flow;
4) congestion minimization
Calculating a data loss rate for each flow by using a greedy strategy according to the total error budget and the total flow predicted value so as to minimize congestion; firstly, initializing the congestion degree of each link and each flow, and then starting iteration; in each iteration, selecting the flow with the highest congestion degree, selecting the most congested link on the path through which the flow passes, and setting a traffic loss proportion for the flow: if the set traffic loss proportion ensures that the link is no longer congested, setting the data loss rate of the flow to that proportion; if not, setting the maximum data loss rate, namely clipping half of the payload of the data packet; and if the total error budget is not sufficient, the data flow only loses the data loss amount distributed by the master control node; each time a data loss rate is set, the expected lost data amount is deducted from the total error budget, each link and each flow are updated, and the next iteration is entered;
the iteration over each link and each flow continues until the total error budget is exhausted, no link is congested any more, or all flows have been processed; after the iteration is finished, each flow has a congestion-minimizing data loss rate configuration, which is preset; for a certain node, the sum of the data amount expected to be lost by all the flows sent by the node is the congestion error budget allocated to the node;
5) reducing serialization delay
On the basis of minimizing the congestion data loss rate, the main control node allocates the residual total error budget after the allocation in the step 4) to each node according to the probability of each node continuing to lose data, namely the serialization error budget of the node; the serialization time delay of the data packet is reduced by cutting the data packet, and the data volume lost by each cutting is deducted from the serialization error budget of the node;
to this end, global control is completed and the congestion error budget, the serialization error budget, and the congestion minimized data loss rate are assigned to each corresponding node, i.e., the optimal configuration of each node.
7. The method for optimizing network on chip based on approximate calculation according to claim 1, wherein the control process of the local controller is as follows:
the local controller sets the data loss rate of each data packet waiting to be injected into the network according to the congestion error budget, the serialization error budget and the congestion minimum data loss rate distributed by the global controller; setting the initial loss rate to be zero, if the congestion error budget of the node is not zero, setting the loss rate to be the loss rate capable of minimizing the network congestion, and deducting the discarded data from the congestion error budget; after the congestion error budget is deducted, if the serialization error budget still remains, the loss rate is set to the maximum loss rate provided by the data clipper, i.e. half the clipped packet load, and the added discarded data is deducted from the serialization error budget.
8. The method according to claim 7, wherein the congestion error budget is an expected amount of dropped data on a data flow from each node; the serialized error budget is the error budget remaining after each node drops the congestion error budget.
CN201811537135.6A 2018-12-15 2018-12-15 Network-on-chip optimization method based on approximate calculation Active CN109547263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811537135.6A CN109547263B (en) 2018-12-15 2018-12-15 Network-on-chip optimization method based on approximate calculation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811537135.6A CN109547263B (en) 2018-12-15 2018-12-15 Network-on-chip optimization method based on approximate calculation

Publications (2)

Publication Number Publication Date
CN109547263A CN109547263A (en) 2019-03-29
CN109547263B true CN109547263B (en) 2021-08-20

Family

ID=65856475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811537135.6A Active CN109547263B (en) 2018-12-15 2018-12-15 Network-on-chip optimization method based on approximate calculation

Country Status (1)

Country Link
CN (1) CN109547263B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113467938B (en) * 2021-06-18 2024-05-17 山东云海国创云计算装备产业创新中心有限公司 Bus resource allocation method and device and related equipment
CN113839878B (en) * 2021-09-26 2023-05-23 南京宁麒智能计算芯片研究院有限公司 Network-on-chip approximate communication system for data intensive application
CN115277563B (en) * 2022-06-07 2024-03-19 南京大学 Network-on-chip approximation control system based on offline reinforcement learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102065019A (en) * 2011-01-21 2011-05-18 西安电子科技大学 IP (Internet Protocol) core fast mapping method for network on chip based on region division
CN103814563A (en) * 2011-09-21 2014-05-21 富士通株式会社 Data communication apparatus, data transmission method, and computer system
CN104243242A (en) * 2014-09-26 2014-12-24 厦门亿联网络技术股份有限公司 Network dithering algorithm based network congestion reflection method
CN106453109A (en) * 2016-10-28 2017-02-22 南通大学 Network-on-chip communication method and network-on-chip router
CN107395503A (en) * 2017-08-25 2017-11-24 东南大学 A kind of network-on-chip method for routing based on linear programming
CN108156090A (en) * 2018-03-15 2018-06-12 北京邮电大学 Based on the optimal arrival rate method for routing of satellite DTN web impact factors
CN108965024A (en) * 2018-08-01 2018-12-07 重庆邮电大学 A kind of virtual network function dispatching method of the 5G network slice based on prediction

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9185023B2 (en) * 2013-05-03 2015-11-10 Netspeed Systems Heterogeneous SoC IP core placement in an interconnect to optimize latency and interconnect performance

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102065019A (en) * 2011-01-21 2011-05-18 西安电子科技大学 IP (Internet Protocol) core fast mapping method for network on chip based on region division
CN103814563A (en) * 2011-09-21 2014-05-21 富士通株式会社 Data communication apparatus, data transmission method, and computer system
CN104243242A (en) * 2014-09-26 2014-12-24 厦门亿联网络技术股份有限公司 Network dithering algorithm based network congestion reflection method
CN106453109A (en) * 2016-10-28 2017-02-22 南通大学 Network-on-chip communication method and network-on-chip router
CN107395503A (en) * 2017-08-25 2017-11-24 东南大学 A kind of network-on-chip method for routing based on linear programming
CN108156090A (en) * 2018-03-15 2018-06-12 北京邮电大学 Based on the optimal arrival rate method for routing of satellite DTN web impact factors
CN108965024A (en) * 2018-08-01 2018-12-07 重庆邮电大学 A kind of virtual network function dispatching method of the 5G network slice based on prediction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research status and development trends of low-power on-chip interconnection networks (NoCs); Chen Shujiang et al.; Computer Knowledge and Technology (电脑知识与技术); 2012-05-15; full text *

Also Published As

Publication number Publication date
CN109547263A (en) 2019-03-29

Similar Documents

Publication Publication Date Title
CN113296845B (en) Multi-cell task unloading algorithm based on deep reinforcement learning in edge computing environment
CN109547263B (en) Network-on-chip optimization method based on approximate calculation
CN108566659B (en) 5G network slice online mapping method based on reliability
CN103036792B (en) Transmitting and scheduling method for maximizing minimal equity multiple data streams
CN110769512B (en) Satellite-borne resource allocation method and device, computer equipment and storage medium
CN110233755B (en) Computing resource and frequency spectrum resource allocation method for fog computing in Internet of things
CN113498106A (en) Scheduling method and device for time-sensitive network TSN (transport stream network) stream
CN113692058A (en) Spectrum resource evaluation set-based satellite optical network spectrum allocation method and system
CN109636709A (en) A kind of figure calculation method suitable for heterogeneous platform
CN102946443A (en) Multitask scheduling method for realizing large-scale data transmission
CN117135059A (en) Network topology structure, construction method, routing algorithm, equipment and medium
WO2015069761A1 (en) System and method for delay management for traffic engineering
Zhou et al. DRL-Based Workload Allocation for Distributed Coded Machine Learning
CN112422455B (en) Centralized multipath utility fair bandwidth allocation method
Olsen et al. Qrp01-5: Quantitative analysis of access strategies to remote information in network services
CN106209683B (en) Data transmission method and system based on data center's wide area network
CN114443716A (en) Gaia platform water line dynamic adjustment method and device based on event time window
CN113672372A (en) Multi-edge cooperative load balancing task scheduling method based on reinforcement learning
Jena System level approach to NoC design space exploration
Álvarez-Llorente et al. Formal modeling and performance evaluation of a run-time rank remapping technique in broadcast, Allgather and Allreduce MPI collective operations
Ko et al. Two-Phase Split Computing Framework in Edge-Cloud Continuum
Ying et al. Buffer overflow asymptotics for multiplexed regulated traffic
Durand et al. Distributed dynamic rate adaptation on a network on chip with traffic distortion
Li et al. Virtual network embedding based on multi-objective group search optimizer
CN117278557B (en) Wide area deterministic algorithm network scheduling method, system, device and medium based on double-layer planning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant