CN106919368A

CN106919368A - Acceleration method, acceleration system, device and chip for the critical path of a streaming dataflow graph

Info

Publication number: CN106919368A
Application number: CN201710028096.6A
Authority: CN
Inventors: 欧焱; 谭旭; 朱亚涛; 叶笑春; 王达; 严龙; 范东睿
Original assignee: Beijing Zhongke Rui Core Technology Co Ltd
Current assignee: Beijing Ruixin Data Flow Technology Co ltd; Beijing Zhongke Ruixin Technology Group Co ltd
Priority date: 2017-01-12
Filing date: 2017-01-12
Publication date: 2017-07-04
Anticipated expiration: 2037-01-12
Also published as: CN106919368B

Abstract

The present invention discloses an acceleration method, acceleration system, device and chip for the critical path of a streaming dataflow graph. The method comprises the following steps: determining the key node in the data flow; adding a predecessor node before the key node and a successor node after the key node; replicating the key node to form multiple key sub-nodes; and, after the data to be transmitted passes through the predecessor node, selecting one available key sub-node and outputting the data from the successor node via the selected key sub-node. The invention optimizes only within a single operand storage space of the dataflow graph, so that the key operations on the critical path within that single operand storage space can execute in parallel. This shortens the execution time of the critical path in the dataflow graph and gives a clear advantage in execution efficiency over the traditional structure.

Description

Acceleration method, acceleration system, device and chip for the critical path of a streaming dataflow graph

Technical field

The present invention relates to the technical field of dataflow architectures, and more particularly to an acceleration method, acceleration system, device and chip for the critical path of a streaming dataflow architecture.

Background technology

In the supercomputing field, limits such as power consumption have created great resistance to the push toward the exascale (E-level) computing milestone, and dataflow architectures have again attracted wide attention for their outstanding parallelism and performance-per-watt. To run a large-scale scientific computation on a dataflow architecture, a program that can execute on the architecture must first be written in a dataflow language. That program is then translated into a dataflow graph, which expresses the dependences between the modules of the program and between its instructions. Once the dataflow graph is obtained, it must be mapped onto the physical dataflow processing units. Finally, the data to be computed is injected into the dataflow processor in a streaming manner and executed continuously. In a dataflow architecture, data can be passed directly between instructions, which alleviates the latency and power problems caused by instruction dependences and data dependences in the traditional von Neumann structure and increases instruction parallelism.

The core of a dataflow system must support the correct mapping of dataflow instructions onto the dataflow graph. The traditional approach maps the operations in the dataflow graph one-to-one onto its nodes, so that each operation corresponds to one node. (An operation here may be a single instruction or a set of several sequential instructions; its granularity may be an instruction, a module, a task, and so on.) Within one dataflow graph, different nodes have different execution efficiency, and different nodes often share the same successor node. For example, the 2 operands of an addition node come from 2 different predecessor nodes (a predecessor may be a memory-access node, another computation node, and so on). Because nodes execute at different speeds, the source data a node needs cannot all arrive at the same time, and the node can execute only after all of its source data has arrived. The start time of a node's execution therefore depends on the arrival time of its latest source data, and the set of mutually dependent nodes along which data flows slowest is called the critical path. As shown in Fig. 1, if node 3 executes more slowly than node 2, path 1-3-4 is called the critical path. The critical path lowers the utilization of the functional units of the nodes in the dataflow graph and directly determines the execution efficiency of an individual dataflow graph.
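The notion of a critical path described above can be illustrated with a minimal Python sketch. The four nodes and their edges follow the Fig. 1 example (node 1 feeds nodes 2 and 3, which both feed node 4); the latency values and the names `latency`, `successors` and `slowest_path_from` are assumptions made for illustration only, since the text states only that node 3 is slower than node 2.

```python
from functools import lru_cache

# Hypothetical latencies in time beats; only "node 3 is slower than node 2"
# comes from the text, the concrete numbers are assumed.
latency = {1: 1, 2: 1, 3: 3, 4: 1}

# Edges of the example graph: 1 -> 2 -> 4 and 1 -> 3 -> 4.
successors = {1: [2, 3], 2: [4], 3: [4], 4: []}


@lru_cache(maxsize=None)
def slowest_path_from(node):
    """Return (total latency, path) of the slowest path starting at node."""
    best_time, best_path = 0, ()
    for nxt in successors[node]:
        t, p = slowest_path_from(nxt)
        if t > best_time:
            best_time, best_path = t, p
    return latency[node] + best_time, (node,) + best_path


critical_time, critical_path = slowest_path_from(1)
print(critical_path, critical_time)
```

With these assumed latencies the slowest chain of dependent nodes is 1-3-4, which matches the path the text names as critical.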

To alleviate the long delays that key operations (the inefficient operations on the critical path) impose on a program, and to improve the execution efficiency of the functional units in the nodes of the dataflow graph, the usual method is to replicate the whole dataflow graph several times. This forms multiple operand storage spaces and gives the data several paths, so that the different operand storage spaces of the program can execute in parallel and mask the delay caused by the key operations on the critical path, as shown in Fig. 1. However, replicating the dataflow graph has the following 2 key problems:

1. Replicating multiple operand storage spaces does not solve the problem that the key operation executes slowly within each operand storage space. The method gives the data several paths to travel, but the execution efficiency of each individual path is unchanged.

2. Using multiple operand storage spaces can cause the storage space to explode. With this method, when the key operation is slow enough, enough storage spaces are needed to mask the delay it causes. Each operand storage space, however, occupies the physical storage of a whole dataflow graph, and it also puts pressure on data transfer between dataflow processors. When enough operand storage spaces are needed, the storage space explodes.

Given the above problems, the traditional way of accelerating critical-path operations places heavy demands on hardware and does not fundamentally improve the execution efficiency of the critical path within a single operand storage space. A method is therefore needed that accelerates the critical path of a dataflow graph within an acceptable amount of storage space.

Summary of the invention

In view of the shortcomings of the prior art, the present invention proposes a method, a system and a device for accelerating the critical path of a dataflow graph.

The present invention first provides an acceleration method for the critical path of a streaming dataflow graph, comprising the following steps:

determining the key node in the data flow;

adding a predecessor node before the key node and a successor node after the key node;

replicating the key node to form multiple key sub-nodes;

after the data to be transmitted passes through the predecessor node, selecting one available key sub-node, and outputting the data to be transmitted from the successor node via the selected key sub-node.

In the acceleration method for the critical path of a streaming dataflow graph proposed by the present invention, the predecessor node includes a first flag bit, first selection logic and first forwarding logic. The first flag bit indicates whether a key sub-node can receive data; the first selection logic selects one available key sub-node from the multiple key sub-nodes according to the first flag bit; and the first forwarding logic receives the key sub-node selected by the first selection logic and forwards the data to be transmitted to the selected key sub-node.

In the acceleration method for the critical path of a streaming dataflow graph proposed by the present invention, the successor node includes a second flag bit, second selection logic and second forwarding logic. The second flag bit indicates whether the data in a key sub-node is valid; the second selection logic selects a key sub-node whose data is valid according to the second flag bit; and the second forwarding logic takes the key sub-node with valid data selected by the second selection logic and outputs the data to be transmitted through the successor node.

In the acceleration method for the critical path of a streaming dataflow graph proposed by the present invention, when one of the key sub-nodes holds data, its corresponding first flag bit is set to "the key sub-node is unavailable"; when one of the key sub-nodes holds no data, its corresponding first flag bit is set to "the key sub-node is available"; when the data stored in one of the key sub-nodes is valid, its corresponding second flag bit is set to "the data of the key node can be forwarded"; and when the data stored in one of the key sub-nodes is invalid, its corresponding second flag bit is set to "the data of the key node cannot be forwarded".
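The four flag-setting rules above can be restated as a minimal Python sketch. The class `KeySubNode` and its fields `data` and `done` are hypothetical names introduced for illustration; in particular, modelling "the stored data is valid" as "data is present and has finished processing" is an assumption, not something the text spells out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeySubNode:
    """One replicated copy of the key node (illustrative model only)."""
    data: Optional[int] = None  # operand currently held, if any
    done: bool = False          # assumed: True once the held data is processed

    @property
    def first_flag(self) -> bool:
        """First flag bit, read by the predecessor node: True = available."""
        return self.data is None

    @property
    def second_flag(self) -> bool:
        """Second flag bit, read by the successor node: True = can forward."""
        return self.data is not None and self.done


node = KeySubNode()
print(node.first_flag, node.second_flag)   # empty sub-node
node.data = 7
print(node.first_flag, node.second_flag)   # holds data, result not yet valid
node.done = True
print(node.first_flag, node.second_flag)   # result valid
```

Deriving both flags from the sub-node state keeps the two views consistent: a sub-node is never reported as available while it still holds data.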

The present invention also provides an acceleration system for the critical path of a streaming dataflow graph, including:

multiple key sub-nodes, formed by replicating a key node in the data flow, the multiple key sub-nodes being able to perform the same operation;

a predecessor node, located before the multiple key sub-nodes, for selecting an available key sub-node from the multiple key sub-nodes to transmit data to;

a successor node, located after the multiple key sub-nodes, for selecting a key sub-node whose data is valid from the multiple key sub-nodes and sending out the valid data it holds.

In the acceleration system for the critical path of a streaming dataflow graph proposed by the present invention, the predecessor node includes a first flag bit, first selection logic and first forwarding logic. The first flag bit indicates whether a key sub-node can receive data; the first selection logic selects one available key sub-node from the multiple key sub-nodes according to the first flag bit; and the first forwarding logic receives the key sub-node selected by the first selection logic and forwards the data to be transmitted to the selected key sub-node.

In the acceleration system for the critical path of a streaming dataflow graph proposed by the present invention, the successor node includes a second flag bit, second selection logic and second forwarding logic. The second flag bit indicates whether the data in a key sub-node is valid; the second selection logic selects a key sub-node whose data is valid according to the second flag bit; and the second forwarding logic takes the key sub-node with valid data selected by the second selection logic and outputs the data to be transmitted through the successor node.

In the acceleration system for the critical path of a streaming dataflow graph proposed by the present invention, when one of the key sub-nodes holds data, its corresponding first flag bit is set to "the key sub-node is unavailable"; when one of the key sub-nodes holds no data, its corresponding first flag bit is set to "the key sub-node is available"; when the data stored in one of the key sub-nodes is valid, its corresponding second flag bit is set to "the data of the key node can be forwarded"; and when the data stored in one of the key sub-nodes is invalid, its corresponding second flag bit is set to "the data of the key node cannot be forwarded".

The present invention also provides a device that includes the above acceleration system for the critical path of a streaming dataflow graph.

The present invention also provides a chip that includes the above acceleration system for the critical path of a streaming dataflow graph.

Compared with the prior art, the beneficial effects of the present invention are as follows:

The invention optimizes only within a single operand storage space of the dataflow graph, so that the key operations on the critical path within that single operand storage space can execute in parallel. This shortens the execution time of the critical path in the dataflow graph and gives a clear advantage in execution efficiency over the traditional structure.

Brief description of the drawings

Fig. 1 shows the traditional way of accelerating the critical path of a dataflow graph;

Fig. 2 shows how the critical-path acceleration method for a streaming dataflow structure differs from the traditional method;

Fig. 3 is a structural diagram of the Map node;

Fig. 4 is a structural diagram of the Merge node;

Fig. 5 shows an embodiment of the critical-path acceleration method for a streaming dataflow structure.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the invention.

Referring to Fig. 2, the left side is the traditional dataflow architecture and the right side is the new dataflow graph obtained with the method of the present invention. Node 3 on the left of Fig. 2 is the key operation node. Nodes 3.1 and 3.2 on the right of Fig. 2 are key sub-nodes that can perform the same operation, and both are connected to the same Map node and the same Merge node.

The method for accelerating the critical path proposed by the present invention specifically includes the following:

Within a single operand storage space, the key operation node is replicated several times to obtain multiple key sub-nodes. Every key sub-node can perform the same operation, but there are no dependences between key sub-nodes. All key sub-nodes share the same predecessor node (Map) and the same successor node (Merge), and the predecessor and successor nodes hold flag bits that identify which key sub-node the data goes to or comes from. Replicating the key operation node gives the operand storage space several data paths on the critical path; for any one piece of data, the number of instructions it passes through is the same as in the traditional structure.

A Map node is added before the key sub-nodes, so that the Map node is the predecessor of all key sub-nodes. Without a Map node there is only one key node, and the data has only a single path. After the Map node is added there are multiple identical key sub-nodes, which provide several data paths. The Map node is responsible for distributing the data of its predecessor node to the key sub-nodes. The structure of the Map node is shown in Fig. 3; it contains three modules: flag bits, selection logic and forwarding logic. A flag bit is the feedback a key sub-node gives to the Map node and indicates whether that key sub-node is idle. If the key sub-node is not idle, the flag bit indicates that no more data can be sent to it; if the key sub-node is idle, the flag bit indicates that data can be sent to it. The main function of the selection logic is to select one available key sub-node according to the flag bits of the Map node's multiple successor nodes. After choosing a sub-node, the selection logic clears that sub-node's flag bit, setting it to "node not idle". The forwarding logic receives the key sub-node chosen by the selection logic and forwards the current data to it.
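The three modules of the Map node (flag bits, selection logic, forwarding logic) can be sketched in Python as follows. This is an illustrative software model, not the hardware structure of Fig. 3: the class and method names are hypothetical, and picking the lowest-numbered idle sub-node is an assumed policy, since the text does not say how a sub-node is chosen when several are idle.

```python
class MapNode:
    """Illustrative model of the Map node placed before the key sub-nodes."""

    def __init__(self, num_sub_nodes):
        # Flag bits: one per key sub-node, True = idle (can receive data).
        self.idle = [True] * num_sub_nodes

    def select(self):
        """Selection logic: pick an idle sub-node and mark it 'not idle'.

        Returns the chosen index, or None if every sub-node is busy.
        Lowest-index-first is an assumed policy.
        """
        for i, free in enumerate(self.idle):
            if free:
                self.idle[i] = False
                return i
        return None

    def forward(self, data, sub_node_inputs):
        """Forwarding logic: hand data to the sub-node chosen by select()."""
        target = self.select()
        if target is None:
            return None  # data stays at the Map node and is retried later
        sub_node_inputs[target].append(data)
        return target

    def release(self, i):
        """Feedback from key sub-node i: it is idle again."""
        self.idle[i] = True


inputs = [[], []]                    # input queues of two key sub-nodes
m = MapNode(2)
print(m.forward("N", inputs))        # goes to sub-node 0
print(m.forward("N+1", inputs))      # goes to sub-node 1
print(m.forward("N+2", inputs))      # both busy, so not forwarded
```

Returning `None` when no sub-node is idle mirrors the behaviour in the text, where data that cannot be forwarded simply waits at the Map node.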

A Merge node is added after the key sub-nodes and is the successor of all key sub-nodes. A traditional operand storage space has only one data path, whereas in the newly proposed method for accelerating the critical path, based on the two structures described above, the multiple key sub-nodes form several data paths. The main function of the Merge node is to send the data of the key sub-nodes to its successor node serially. The structure of the Merge node is shown in Fig. 4; it contains three modules: flag bits, selection logic and forwarding logic. A flag bit is the feedback a key sub-node gives to the Merge node and indicates whether the data that key sub-node presents to the Merge node is valid. If the data is invalid, the flag bit indicates that the Merge node cannot forward that key sub-node's data; if the data is valid, the flag bit indicates that the data can be forwarded. The main function of the selection logic is to select one available key sub-node according to the flag bits of the key sub-nodes connected to the Merge node. After choosing a sub-node, the selection logic clears that sub-node's flag bit, setting it to "data invalid". The forwarding logic receives the key sub-node chosen by the selection logic and forwards that sub-node's data to the successor node of the Merge node.
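A matching Python sketch of the Merge node follows. As with any software model of this structure, the names (`MergeNode`, `offer`, `forward_one`) are hypothetical, and forwarding the lowest-numbered valid sub-node first is an assumed policy; the text only requires that results leave the Merge node one at a time.

```python
class MergeNode:
    """Illustrative model of the Merge node placed after the key sub-nodes."""

    def __init__(self, num_sub_nodes):
        # Flag bits: one per key sub-node, True = its data is valid.
        self.valid = [False] * num_sub_nodes
        self.results = [None] * num_sub_nodes

    def offer(self, i, result):
        """Feedback from key sub-node i: its result is ready to be forwarded."""
        self.results[i] = result
        self.valid[i] = True

    def forward_one(self):
        """Selection + forwarding logic: emit one valid result, if any.

        The chosen sub-node's flag is cleared ('data invalid') so the same
        result is never forwarded twice. Returns None when nothing is valid.
        """
        for i, ok in enumerate(self.valid):
            if ok:
                self.valid[i] = False
                result, self.results[i] = self.results[i], None
                return result
        return None


merge = MergeNode(2)
merge.offer(1, "result of N+1")
merge.offer(0, "result of N")
print(merge.forward_one())   # one result per call, serially
print(merge.forward_one())
print(merge.forward_one())   # nothing left
```

Calling `forward_one` once per time beat gives the serial hand-off to the successor node that the text describes.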

The Map node and the Merge node both contain three modules (flag bits, selection logic and forwarding logic), and in both cases the flag bits are feedback from the key sub-nodes to these two special nodes. The difference is that a flag bit in the Map node indicates whether a key sub-node is idle and available, that is, whether new data can be forwarded to it, whereas a flag bit in the Merge node indicates whether the data in a key sub-node is valid and usable, that is, whether that data can be forwarded onward. The Map node distributes data to the key sub-nodes; the Merge node forwards the data of the key sub-nodes out. In addition, the Map node distributes one data stream onto several data paths, while the Merge node selects one piece of data from the several data paths and passes it to its successor node. Physically, the Map node is a one-to-many path and the Merge node is a many-to-one path, but logically the data passing through the Map node and the Merge node has only one path.

This differs from the traditional way of accelerating the critical path. The traditional method increases parallelism between instructions by replicating multiple operand storage spaces, which raises the program's parallelism at the instruction level. Each time it replicates an operand storage space it must replicate the whole dataflow graph, and the critical path of each operand storage space still exists. The method proposed in the present invention instead optimizes within the operand storage space, providing several data paths inside the same operand storage space. It is a finer-grained optimization, and it can be combined with the traditional method to reach higher execution efficiency.

The present invention also proposes an acceleration system for the critical path of a streaming dataflow graph and a device containing that system.

Referring now to Fig. 5, this embodiment illustrates how data moves through a single operand storage space over 6 basic time beats. Nodes A and B are key sub-nodes that can implement the same function. Assume that no data is circulating in the dataflow graph at initialization and that, starting from the 1st time beat, the N-th data item begins to flow into the dataflow graph. Assume also that a key sub-node needs 2 to 3 time beats to finish executing.

Step 501: The N-th data item reaches the Map node. The Map node checks its successor nodes and finds that nodes A and B are both idle (its status bits are 11). It forwards the data to node A (assume the N-th data item needs 3 time beats in the key operation) and sets the status bits to 01. Node A receives the N-th data item and starts executing (the 1st time beat); node B holds no data and does not execute. The status bits of the Merge node are 00: there is no valid data, so it forwards nothing to its successor node.

Step 502: The (N+1)-th data item reaches the Map node. The Map node checks its successor nodes and finds that the status bits are 01. It forwards the (N+1)-th data item to node B (assume the (N+1)-th data item needs 2 time beats) and sets the status bits to 00. Node A is executing the N-th data item (time beat 1 of its execution), and node B starts executing the (N+1)-th data item (time beat 0 of its execution). The status bits of the Merge node are 00: there is no valid data, so it forwards nothing to its successor node.

Step 503: The (N+2)-th data item reaches the Map node. The Map node checks its successor nodes and finds that the status bits are 00, so the (N+2)-th data item is not forwarded. Node A is executing the N-th data item (time beat 2 of its execution), and node B is executing the (N+1)-th data item (time beat 1 of its execution). The status bits of the Merge node are 00: there is no valid data, so it forwards nothing to its successor node.

Step 504: Because the (N+2)-th data item was not forwarded in the previous beat, it is still the (N+2)-th data item that reaches the Map node in this beat. The Map node checks its successor nodes and finds that the status bits are 00, so the (N+2)-th data item is not forwarded. Node A is executing the N-th data item (time beat 3 of its execution), and node B is executing the (N+1)-th data item (time beat 2 of its execution). The data of node A and the data of node B finish executing at the same time, and the status bits of the Merge node are set to 11. With status bits 11, the Merge node chooses the data of node B, forwards it, and sets its status bits to 10.

Step 505: Because the (N+2)-th data item was not forwarded in the previous beat, it is still the (N+2)-th data item that reaches the Map node in this beat. The Map node checks its successor nodes and finds that the status bits are 11. It forwards the (N+2)-th data item to node B (assume the (N+2)-th data item needs 2 time beats) and sets the status bits to 10. Node A is idle, and node B starts executing the (N+2)-th data item (time beat 0 of its execution). The status bits of the Merge node are 10; it chooses the data of node A, forwards it, and sets its status bits to 00.

Step 506: The (N+3)-th data item reaches the Map node. The Map node checks its successor nodes and finds that the status bits are 01. It forwards the (N+3)-th data item to node A (assume the (N+3)-th data item needs 2 time beats) and sets the status bits to 00. Node A starts executing the (N+3)-th data item (time beat 0 of its execution), and node B is executing the (N+2)-th data item (time beat 1 of its execution). The status bits of the Merge node are 00: there is no valid data, so it forwards nothing to its successor node.
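The six steps above can be replayed with a small beat-by-beat simulation in Python. It is a sketch under several assumptions that are not stated in the text: a sub-node that finishes in one beat appears idle to the Map node from the next beat on, the Merge node emits at most one result per beat, and both the Map node and the Merge node pick the lowest-numbered candidate. The function name `simulate` and its parameters are hypothetical. Items 0 to 3 stand for the N-th to (N+3)-th data items with the latencies 3, 2, 2, 2 assumed in the steps.

```python
from collections import deque


def simulate(latencies, num_sub_nodes=2, beats=6):
    """Replay Map -> key sub-nodes -> Merge for a number of time beats.

    latencies[k] is the number of beats item k needs inside a key sub-node.
    Returns (dispatch, emitted): item index -> beat in which the Map node
    forwarded it / the Merge node sent it on.
    """
    idle = [True] * num_sub_nodes                    # Map flag bits
    running = [None] * num_sub_nodes                 # (item, finish_beat)
    ready = [deque() for _ in range(num_sub_nodes)]  # results held for Merge
    dispatch, emitted = {}, {}
    pending = 0                                      # next item waiting at Map
    for beat in range(1, beats + 1):
        # Sub-nodes that finished in an earlier beat now look idle to Map.
        for i in range(num_sub_nodes):
            if running[i] is not None and running[i][1] < beat:
                running[i], idle[i] = None, True
        # Map: forward the waiting item if some sub-node is idle.
        if pending < len(latencies) and True in idle:
            i = idle.index(True)
            idle[i] = False
            running[i] = (pending, beat + latencies[pending])
            dispatch[pending] = beat
            pending += 1
        # Sub-nodes finishing in this beat present valid data to Merge.
        for i in range(num_sub_nodes):
            if running[i] is not None and running[i][1] == beat:
                ready[i].append(running[i][0])
        # Merge: forward one valid result per beat.
        for queue in ready:
            if queue:
                emitted[queue.popleft()] = beat
                break
    return dispatch, emitted


dispatch, emitted = simulate([3, 2, 2, 2])
print("forwarded by Map in beat:", dispatch)
print("sent on by Merge in beat:", emitted)
```

Under these assumptions the Map node forwards the four items in beats 1, 2, 5 and 6, and the Merge node sends results on in beats 4 and 5, which agrees with the timing in steps 501 to 506. Which sub-node receives a given item, and which of the two simultaneous results leaves first, follows the assumed lowest-index policy and so can differ from the walk-through, where the result of node B is forwarded before that of node A.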

Those of ordinary skill in the art will understand that the accompanying drawings are schematic diagrams of one embodiment, and that the modules or flows in the drawings are not necessarily required to implement the present invention.

Those of ordinary skill in the art will understand that the modules of a device in an embodiment may be distributed in the device as described in the embodiment, or may be changed accordingly and located in one or more devices different from that of the present embodiment. The modules of the above embodiments may be combined into one module or further split into multiple sub-modules.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An acceleration method for the critical path of a streaming dataflow graph, characterized by comprising the following steps:

determining the key node in the data flow;

replicating the key node to form multiple key sub-nodes;

after the data to be transmitted passes through the predecessor node, selecting one available key sub-node, and outputting the data to be transmitted from the successor node via the selected key sub-node.

2. The acceleration method for the critical path of a streaming dataflow graph according to claim 1, characterized in that the predecessor node includes a first flag bit, first selection logic and first forwarding logic; the first flag bit indicates whether a key sub-node can receive data; the first selection logic selects one available key sub-node from the multiple key sub-nodes according to the first flag bit; and the first forwarding logic receives the key sub-node selected by the first selection logic and forwards the data to be transmitted to the selected key sub-node.

3. The acceleration method for the critical path of a streaming dataflow graph according to claim 2, characterized in that the successor node includes a second flag bit, second selection logic and second forwarding logic; the second flag bit indicates whether the data in a key sub-node is valid; the second selection logic selects a key sub-node whose data is valid according to the second flag bit; and the second forwarding logic takes the key sub-node with valid data selected by the second selection logic and outputs the data to be transmitted through the successor node.

4. The acceleration method for the critical path of a streaming dataflow graph according to claim 3, characterized in that when one of the key sub-nodes holds data, its corresponding first flag bit is set to "the key sub-node is unavailable"; when one of the key sub-nodes holds no data, its corresponding first flag bit is set to "the key sub-node is available"; when the data stored in one of the key sub-nodes is valid, its corresponding second flag bit is set to "the data of the key node can be forwarded"; and when the data stored in one of the key sub-nodes is invalid, its corresponding second flag bit is set to "the data of the key node cannot be forwarded".

5. An acceleration system for the critical path of a streaming dataflow graph, characterized by including:

multiple key sub-nodes, formed by replicating a key node in the data flow, the multiple key sub-nodes being able to perform the same operation;

a predecessor node, located before the multiple key sub-nodes, for selecting an available key sub-node from the multiple key sub-nodes to transmit data to;

a successor node, located after the multiple key sub-nodes, for selecting a key sub-node whose data is valid from the multiple key sub-nodes and sending out the valid data it holds.

6. The acceleration system for the critical path of a streaming dataflow graph according to claim 5, characterized in that the predecessor node includes a first flag bit, first selection logic and first forwarding logic; the first flag bit indicates whether a key sub-node can receive data; the first selection logic selects one available key sub-node from the multiple key sub-nodes according to the first flag bit; and the first forwarding logic receives the key sub-node selected by the first selection logic and forwards the data to be transmitted to the selected key sub-node.

7. The acceleration system for the critical path of a streaming dataflow graph according to claim 6, characterized in that the successor node includes a second flag bit, second selection logic and second forwarding logic; the second flag bit indicates whether the data in a key sub-node is valid; the second selection logic selects a key sub-node whose data is valid according to the second flag bit; and the second forwarding logic takes the key sub-node with valid data selected by the second selection logic and outputs the data to be transmitted through the successor node.

8. The acceleration system for the critical path of a streaming dataflow graph according to claim 7, characterized in that when one of the key sub-nodes holds data, its corresponding first flag bit is set to "the key sub-node is unavailable"; when one of the key sub-nodes holds no data, its corresponding first flag bit is set to "the key sub-node is available"; when the data stored in one of the key sub-nodes is valid, its corresponding second flag bit is set to "the data of the key node can be forwarded"; and when the data stored in one of the key sub-nodes is invalid, its corresponding second flag bit is set to "the data of the key node cannot be forwarded".

9. A device including the acceleration system for the critical path of a streaming dataflow graph according to claims 5-8.

10. A chip including the acceleration system for the critical path of a streaming dataflow graph according to claims 5-8.