Acceleration method, acceleration system, device, and chip for the critical path of a streaming dataflow graph
Technical field
The present invention relates to the technical field of dataflow architectures, and in particular to an acceleration method, acceleration system, device, and chip for the critical path of a streaming dataflow architecture.
Background technology
Limited by power consumption and other factors, the supercomputing field has met strong resistance in advancing to the exascale (E-level) computing milestone, and dataflow architectures have again attracted wide attention for their outstanding parallelism and performance-per-watt. To run a large-scale scientific computation on a dataflow architecture, a program that can execute on the architecture is first written in a dataflow language. The program is then translated into a dataflow graph that expresses the dependencies between modules and between instructions in the program. Once the dataflow graph is obtained, it is mapped onto the physical dataflow processing units. Finally, the data to be computed is injected into the dataflow processor as a stream and executed continuously. In a dataflow architecture, data can be passed directly between instructions, which alleviates the latency and power problems caused by instruction dependencies and data dependencies in the traditional von Neumann architecture and increases instruction-level parallelism.
The dataflow core must support a correct mapping of dataflow instructions onto the dataflow graph. The traditional mapping maps the operations of the program one-to-one onto the nodes of the dataflow graph, so that each operation corresponds to one node. An operation here may be a single instruction or a set of sequential instructions; its granularity may be an instruction, a module, a task, and so on. Within a dataflow graph, different nodes execute with different efficiency, and different nodes often share the same successor node. For example, the two operands of an addition node come from two different predecessor nodes (a predecessor may be a memory-access node, another compute node, and so on). Because nodes execute at different speeds, the source data a node needs may not arrive at the same time, and the node can execute only after all of its source data has arrived. The start time of a node's execution is therefore determined by the source data that arrives last. The set of mutually dependent nodes through which data flows slowest is called the critical path. As shown in Fig. 1, if node 3 executes more slowly than node 2, then path 1-3-4 is the critical path. The critical path lowers the utilization of the functional units of the nodes in the dataflow graph and directly determines the execution efficiency of a single dataflow graph.
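The definition above can be illustrated with a small sketch. The four-node graph and the latency values below are assumptions made for illustration in the spirit of Fig. 1 (node 1 feeds nodes 2 and 3, which both feed node 4); they are not taken from the drawings. The function name critical_path is likewise hypothetical.

```python
from functools import lru_cache

# Hypothetical graph: node 1 feeds nodes 2 and 3, which both feed node 4.
# Latencies are made-up beat counts; node 3 is the slow node.
EDGES = {1: [2, 3], 2: [4], 3: [4], 4: []}
LATENCY = {1: 1, 2: 1, 3: 3, 4: 1}


def critical_path(edges, latency):
    """Return (total_latency, path) of the slowest dependency chain in a DAG."""

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Slowest chain starting at `node`: its own latency plus
        # the slowest chain among its successors.
        best_len, best_path = 0, ()
        for succ in edges[node]:
            length, path = longest_from(succ)
            if length > best_len:
                best_len, best_path = length, path
        return latency[node] + best_len, (node,) + best_path

    return max(longest_from(n) for n in edges)


length, path = critical_path(EDGES, LATENCY)
print(length, path)  # 5 (1, 3, 4)
```

With these assumed latencies the slowest chain is 1-3-4, matching the path named in the text.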
To mitigate the long latency that key operations (the inefficient operations on the critical path) impose on a program, and to improve the execution efficiency of the functional units of the nodes in the dataflow graph, the usual method is to replicate the whole dataflow graph several times, forming multiple operand storage spaces. This gives data multiple paths, so that the different operand storage spaces of the program can execute in parallel and mask the latency of the key operations on the critical path, as shown in Fig. 1. However, replicating the dataflow graph in this way has the following two key problems:
1. Replicating multiple operand storage spaces does not solve the problem that the key operation executes slowly within each operand storage space. The method gives data multiple paths to take, but the execution efficiency of each individual path is unchanged.
2. Using multiple operand storage spaces can cause a storage explosion. When the key operation is slow enough, enough storage spaces are needed to mask its latency. Yet each operand storage space occupies the physical storage of an entire dataflow graph and also puts pressure on data transfer between dataflow processors. When many operand storage spaces are needed, the storage requirement explodes.
Given these problems, the traditional method of accelerating critical-path operations places heavy demands on hardware and does not fundamentally improve the execution efficiency of the critical path within a single operand storage space. A method is therefore needed that accelerates the critical path of a dataflow graph within an acceptable amount of storage.
Summary of the invention
In view of the shortcomings of the prior art, the present invention proposes a method, a system, and a device for accelerating the critical path of a dataflow graph.
The present invention first provides an acceleration method for the critical path of a streaming dataflow graph, comprising the following steps:
Determining the key node in the dataflow graph;
Adding a predecessor node before the key node and a successor node after the key node;
Replicating the key node to form multiple key child nodes;
After the data to be transmitted passes through the predecessor node, selecting one available key child node, and outputting the data to be transmitted from the successor node via the selected key child node.
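The structural part of these steps (adding the predecessor and successor nodes and replicating the key node) can be sketched as a graph rewrite. This is a minimal sketch, assuming the dataflow graph is held as a dictionary from node name to successor list; the function name replicate_key_node and the node names map_3 and merge_3 are hypothetical, while 3.1 and 3.2 follow the child-node labels used for Fig. 2.

```python
def replicate_key_node(edges, key, copies):
    """Return a new graph in which `key` is replaced by Map -> copies -> Merge.

    `edges` maps each node name to the list of its successors.
    """
    map_node, merge_node = f"map_{key}", f"merge_{key}"
    children = [f"{key}.{i}" for i in range(1, copies + 1)]

    new_edges = {}
    for node, succs in edges.items():
        if node == key:
            continue
        # Every edge that used to enter the key node now enters the Map node.
        new_edges[node] = [map_node if s == key else s for s in succs]

    new_edges[map_node] = list(children)      # Map fans out to every key child node
    for child in children:
        new_edges[child] = [merge_node]       # each child feeds only the Merge node
    new_edges[merge_node] = list(edges[key])  # Merge inherits the key node's successors
    return new_edges


# Hypothetical four-node graph with node "3" as the key node.
graph = {"1": ["2", "3"], "2": ["4"], "3": ["4"], "4": []}
new_graph = replicate_key_node(graph, key="3", copies=2)
for node, succs in new_graph.items():
    print(node, "->", succs)
```

The rewrite leaves every other node untouched, which is the point of the method: only the key node is duplicated, not the whole graph.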
According to the acceleration method for the critical path of a streaming dataflow graph proposed by the present invention, the predecessor node comprises a first flag bit, first selection logic, and first forwarding logic. The first flag bit indicates whether a key child node can receive data. The first selection logic selects an available key child node from the multiple key child nodes according to the first flag bit. The first forwarding logic receives the key child node selected by the first selection logic and forwards the data to be transmitted to the selected key child node.
According to the acceleration method for the critical path of a streaming dataflow graph proposed by the present invention, the successor node comprises a second flag bit, second selection logic, and second forwarding logic. The second flag bit indicates whether the data in a key child node is valid. The second selection logic selects a key child node whose data is valid according to the second flag bit. The second forwarding logic takes the key child node selected by the second selection logic and outputs its data through the successor node.
According to the acceleration method for the critical path of a streaming dataflow graph proposed by the present invention, when one of the key child nodes holds data, its corresponding first flag bit is set to "key child node unavailable"; when one of the key child nodes holds no data, its corresponding first flag bit is set to "key child node available". When the data stored in one of the key child nodes is valid, its corresponding second flag bit is set to "the data of this key child node can be forwarded"; when the data stored in one of the key child nodes is invalid, its corresponding second flag bit is set to "the data of this key child node cannot be forwarded".
The present invention also provides an acceleration system for the critical path of a streaming dataflow graph, comprising:
Multiple key child nodes, formed by replicating a key node in the dataflow graph, all of which can perform the same operation;
A predecessor node, located before the multiple key child nodes, for selecting an available key child node from among them to which data is transmitted;
A successor node, located after the multiple key child nodes, for selecting a key child node whose data is valid from among them and sending out the valid data it holds.
According to the acceleration system for the critical path of a streaming dataflow graph proposed by the present invention, the predecessor node comprises a first flag bit, first selection logic, and first forwarding logic. The first flag bit indicates whether a key child node can receive data. The first selection logic selects an available key child node from the multiple key child nodes according to the first flag bit. The first forwarding logic receives the key child node selected by the first selection logic and forwards the data to be transmitted to the selected key child node.
According to the acceleration system for the critical path of a streaming dataflow graph proposed by the present invention, the successor node comprises a second flag bit, second selection logic, and second forwarding logic. The second flag bit indicates whether the data in a key child node is valid. The second selection logic selects a key child node whose data is valid according to the second flag bit. The second forwarding logic takes the key child node selected by the second selection logic and outputs its data through the successor node.
According to the acceleration system for the critical path of a streaming dataflow graph proposed by the present invention, when one of the key child nodes holds data, its corresponding first flag bit is set to "key child node unavailable"; when one of the key child nodes holds no data, its corresponding first flag bit is set to "key child node available". When the data stored in one of the key child nodes is valid, its corresponding second flag bit is set to "the data of this key child node can be forwarded"; when the data stored in one of the key child nodes is invalid, its corresponding second flag bit is set to "the data of this key child node cannot be forwarded".
The present invention also provides a device comprising the above acceleration system for the critical path of a streaming dataflow graph.
The present invention also provides a chip comprising the above acceleration system for the critical path of a streaming dataflow graph.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention optimizes only a single operand storage space within the dataflow graph, so that the key operations on the critical path within that single operand storage space can execute in parallel. This shortens the execution time of the critical path in the dataflow graph, and the execution efficiency is clearly better than that of the traditional structure.
Brief description of the drawings
Fig. 1 shows the traditional way of accelerating the critical path of a dataflow graph;
Fig. 2 shows how the critical path acceleration method for the streaming dataflow structure differs from the traditional method;
Fig. 3 is a structural diagram of the Map node;
Fig. 4 is a structural diagram of the Merge node;
Fig. 5 shows an embodiment of the critical path acceleration method for the streaming dataflow structure.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the invention.
Referring to Fig. 2, the left side shows the traditional dataflow structure and the right side shows the new dataflow graph obtained with the method of the present invention. Node 3 on the left of Fig. 2 is the key operation node. Nodes 3.1 and 3.2 on the right of Fig. 2 are key child nodes that perform the same operation, and both are connected to the same Map node and the same Merge node.
The method for accelerating the critical path proposed by the present invention specifically includes the following.
Within a single operand storage space, the key operation node is replicated to obtain multiple key child nodes. Every key child node performs the same operation, but there are no dependencies between key child nodes, and all key child nodes share the same predecessor node (Map) and the same successor node (Merge). The predecessor node and the successor node contain flag bits that identify which key child node the data is going to or coming from. Replicating the key operation node provides the operand storage space with multiple data paths on the critical path. For each individual data item, the number of instructions it passes through is the same as in the traditional structure.
A Map node is added before the key child nodes, so that the Map node is the predecessor of every key child node. Without the Map node there is only one key node, and data has only a single path. After the Map node is added there are multiple identical key child nodes, which provide multiple data paths. The Map node is responsible for distributing the data of its predecessor node to the key child nodes. The structure of the Map node is shown in Fig. 3: it contains three modules, namely flag bits, selection logic, and forwarding logic. A flag bit is feedback from a key child node to the Map node and indicates whether that key child node is idle. If the key child node is not idle, the flag bit indicates that no more data may be sent to it; if the key child node is idle, the flag bit indicates that data may be sent to it. The main function of the selection logic is to select an available key child node according to the flag bits of the Map node's successor nodes. After selecting a child node, the selection logic clears that child node's flag bit, setting it to "node not idle". The forwarding logic receives the key child node chosen by the selection logic and forwards the current data to that key child node.
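The three modules of the Map node can be sketched in software as follows. This is an illustrative model, not the hardware of Fig. 3: the class name MapNode, its method names, and the lowest-index-first selection policy are all assumptions, and each key child node is modeled as a plain list of received inputs.

```python
class MapNode:
    """Sketch of a Map node: one idle flag per key child node, plus select and forward."""

    def __init__(self, num_children):
        # True means "child is idle and can accept data".
        self.idle = [True] * num_children

    def select(self):
        """Selection logic: pick an idle child and mark it busy; None if all are busy."""
        for index, is_idle in enumerate(self.idle):
            if is_idle:
                self.idle[index] = False  # flag now reads "node not idle"
                return index
        return None

    def forward(self, data, children):
        """Forwarding logic: hand `data` to the selected child; return False on a stall."""
        index = self.select()
        if index is None:
            return False
        children[index].append(data)
        return True

    def release(self, index):
        """Feedback from a child that has finished and is idle again."""
        self.idle[index] = True


children = [[], []]  # each key child node modeled as a list of received inputs
map_node = MapNode(2)
print(map_node.forward("N", children))    # True: goes to child 0
print(map_node.forward("N+1", children))  # True: goes to child 1
print(map_node.forward("N+2", children))  # False: both children are busy
map_node.release(1)                       # child 1 reports that it is idle again
print(map_node.forward("N+2", children))  # True: goes to child 1
```

A stalled item is simply offered again on a later beat, which corresponds to data waiting at the Map node until a key child node becomes idle.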
A Merge node is added after the key child nodes and is the successor of every key child node. A traditional operand storage space has only one data path, whereas in the newly proposed method for accelerating the critical path, based on the two structures proposed above, multiple key child nodes form multiple data paths. The main function of the Merge node is to send the data of the key child nodes to its successor node serially. The structure of the Merge node is shown in Fig. 4: it contains three modules, namely flag bits, selection logic, and forwarding logic. A flag bit is feedback from a key child node to the Merge node and indicates whether the data that the key child node presents to the Merge node is valid. If the data is invalid, the flag bit indicates that the Merge node cannot forward that key child node's data; if the data is valid, the flag bit indicates that the data delivered by that key child node can be forwarded. The main function of the selection logic is to choose an available key child node according to the flag bits of the key child nodes that feed the Merge node. After choosing a child node, the selection logic clears that child node's flag bit, setting it to "data invalid". The forwarding logic receives the key child node chosen by the selection logic and forwards that key child node's data to the successor node of the Merge node.
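The Merge node can be modeled in the same style. Again this is a sketch under assumptions rather than the structure of Fig. 4: the class name MergeNode, the offer method used for the child's feedback, and the lowest-index-first selection policy are hypothetical.

```python
class MergeNode:
    """Sketch of a Merge node: one valid flag per key child node, plus select and forward."""

    def __init__(self, num_children):
        self.valid = [False] * num_children   # True means "child holds a finished result"
        self.results = [None] * num_children

    def offer(self, index, result):
        """Feedback from a child: its result is ready to be forwarded."""
        self.results[index] = result
        self.valid[index] = True

    def select(self):
        """Selection logic: pick a child with valid data and clear its flag."""
        for index, is_valid in enumerate(self.valid):
            if is_valid:
                self.valid[index] = False     # flag now reads "data invalid"
                return index
        return None

    def forward(self):
        """Forwarding logic: emit one result per call, or None if nothing is valid."""
        index = self.select()
        return None if index is None else self.results[index]


merge = MergeNode(2)
print(merge.forward())       # None: no valid data yet
merge.offer(0, "result A")
merge.offer(1, "result B")
print(merge.forward())       # result A
print(merge.forward())       # result B
print(merge.forward())       # None: both flags have been cleared
```

Because forward emits at most one result per call, results from several key child nodes leave the Merge node serially, as the text describes. A different selection policy would change only the order in which simultaneously valid results are emitted.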
Both the Map node and the Merge node contain flag bits, selection logic, and forwarding logic, and in both the flag bits are feedback from the key child nodes to these two special nodes. The difference is that a flag bit in the Map node indicates whether a key child node is idle and available, that is, whether new data can be forwarded to it, while a flag bit in the Merge node indicates whether the data in a key child node is valid, that is, whether the data of the key child node can be forwarded onward. The Map node distributes data to the key child nodes; the Merge node forwards data out of the key child nodes. In addition, the Map node distributes one data stream over multiple data paths, while the Merge node selects one item from the data on the multiple data paths and sends it to its successor node. Physically, the Map node is a one-to-many path and the Merge node is a many-to-one path; logically, however, data through the Map node and the Merge node follows only a single logical path.
This differs from the traditional method of accelerating the critical path. The traditional method increases parallelism between instructions by replicating multiple operand storage spaces, which raises program concurrency at the instruction level. Each additional operand storage space requires a full copy of the dataflow graph, and the critical path within each operand storage space remains. The method proposed in the present invention instead optimizes the operand storage space itself, providing multiple data paths within the same operand storage space. It is a finer-grained optimization method, and it can be combined with the traditional method to reach higher execution efficiency.
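The difference in storage cost can be made concrete with a rough count. The sketch below counts node instances only, as a proxy for operand storage space; that proxy, the function names, and the example sizes are assumptions for illustration. The count for the proposed method follows the description above: one graph, the key node replaced by its copies, plus one Map node and one Merge node.

```python
def node_instances_whole_graph(graph_nodes, copies):
    """Traditional method: every extra operand storage space is a full graph copy."""
    return graph_nodes * copies


def node_instances_key_replication(graph_nodes, copies):
    """Proposed method: one graph, `copies` key child nodes, plus Map and Merge."""
    return (graph_nodes - 1) + copies + 2


for graph_nodes, copies in [(4, 2), (100, 4)]:
    print(graph_nodes, copies,
          node_instances_whole_graph(graph_nodes, copies),
          node_instances_key_replication(graph_nodes, copies))
# 4 2 8 7
# 100 4 400 105
```

Under this proxy the cost of whole-graph replication grows with the size of the graph, while the cost of key-node replication grows only with the number of copies.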
The present invention also proposes an acceleration system for the critical path of a streaming dataflow graph and a device comprising the system.
Referring now to Fig. 5, this embodiment illustrates how data flows within a single operand storage space over 6 basic time beats. Nodes A and B are key child nodes that implement the same function. Assume that at initialization no data is flowing in the dataflow graph, and that from the 1st time beat the Nth data begins to be injected into the dataflow graph. Assume that a key child node needs 2 to 3 time beats to finish executing.
Step 501: The Nth data arrives at the Map node. The Map node checks its successor nodes and finds that nodes A and B are both idle, with status bits 11. It forwards the data to node A (assume the Nth data needs 3 time beats in the key operation) and sets the status bits to 01. Node A receives the Nth data and starts executing (its 1st time beat); node B holds no data and does not execute. The status bits of the Merge node are 00: there is no valid data, so it forwards nothing to its successor node.
Step 502: The (N+1)th data arrives at the Map node. The Map node checks its successor nodes, finds the status bits to be 01, forwards the (N+1)th data to node B (assume the (N+1)th data needs 2 time beats), and sets the status bits to 00. Node A is executing the Nth data (1 time beat executed), and node B starts executing the (N+1)th data (0 time beats executed). The status bits of the Merge node are 00: there is no valid data, so it forwards nothing to its successor node.
Step 503: The (N+2)th data arrives at the Map node. The Map node checks its successor nodes, finds the status bits to be 00, and does not forward the (N+2)th data. Node A is executing the Nth data (2 time beats executed), and node B is executing the (N+1)th data (1 time beat executed). The status bits of the Merge node are 00: there is no valid data, so it forwards nothing to its successor node.
Step 504: Because the (N+2)th data was not forwarded in the previous time beat, it is still the (N+2)th data that arrives at the Map node in this time beat. The Map node checks its successor nodes, finds the status bits to be 00, and does not forward the (N+2)th data. Node A is executing the Nth data (3 time beats executed), and node B is executing the (N+1)th data (2 time beats executed). The data of node A and the data of node B finish executing at the same time and set the status bits of the Merge node to 11. With status bits 11, the Merge node chooses the data of node B, forwards it, and sets its status bits to 10.
Step 505: Because the (N+2)th data was not forwarded in the previous time beat, it is still the (N+2)th data that arrives at the Map node in this time beat. The Map node checks its successor nodes, finds the status bits to be 11, forwards the (N+2)th data to node B (assume the (N+2)th data needs 2 time beats), and sets the status bits to 10. Node A is idle, and node B starts executing the (N+2)th data (0 time beats executed). The status bits of the Merge node are 10; it chooses the data of node A, forwards it, and sets its status bits to 00.
Step 506: The (N+3)th data arrives at the Map node. The Map node checks its successor nodes, finds the status bits to be 10 (node A idle and node B busy, as set in step 505), forwards the (N+3)th data to node A (assume the (N+3)th data needs 2 time beats), and sets the status bits to 00. Node A starts executing the (N+3)th data (0 time beats executed), and node B is executing the (N+2)th data (1 time beat executed). The status bits of the Merge node are 00: there is no valid data, so it forwards nothing to its successor node.
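The timing of steps 501 to 506 can be approximated with a small beat-level simulation. This is a sketch under stated assumptions, not the embodiment itself: the function name simulate is hypothetical, the order of events within a beat was chosen so that finished data reaches the Merge node in the 4th beat while the Map node sees the freed nodes only in the 5th beat (as in steps 504 and 505), and both Map and Merge pick the lowest-numbered eligible key child node, so which node receives a given item may differ from Fig. 5.

```python
from collections import deque


def simulate(latencies, num_children):
    """Return the beat in which the last result leaves the Merge node.

    `latencies[i]` is the number of beats (at least 1) item i spends in a key child node.
    """
    pending = deque(latencies)       # beats of work needed by each waiting item
    remaining = [0] * num_children   # beats of work left in each key child node
    busy = [False] * num_children    # Map-side flag: True means "node not idle"
    ready = 0                        # number of children whose Merge-side flag is "valid"
    emitted = 0
    beat = 0
    while emitted < len(latencies):
        beat += 1
        # 1. Map: forward the waiting item to an idle child, or stall.
        if pending and not all(busy):
            child = busy.index(False)
            busy[child] = True
            remaining[child] = pending.popleft()
        # 2. Feedback: children that finished in the previous beat become
        #    idle for Map and valid for Merge.
        for child in range(num_children):
            if busy[child] and remaining[child] == 0:
                busy[child] = False
                ready += 1
        # 3. Merge: forward at most one valid result per beat.
        if ready:
            ready -= 1
            emitted += 1
        # 4. Execute: every busy child performs one beat of work.
        for child in range(num_children):
            if busy[child]:
                remaining[child] -= 1
    return beat


latencies = [3, 2, 2, 2]        # beats assumed for data N, N+1, N+2, N+3 in the embodiment
print(simulate(latencies, 1))   # 13: a single key node, no replication
print(simulate(latencies, 2))   # 8: two key child nodes, as with nodes A and B
```

Under these assumptions the four data items drain in 8 beats with two key child nodes, against 13 beats with a single key node, while only the key node has been duplicated.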
A person of ordinary skill in the art will understand that the accompanying drawings are schematic diagrams of one embodiment, and that the modules or flows in the drawings are not necessarily required to implement the present invention.
A person of ordinary skill in the art will understand that the modules of the device in the embodiment may be distributed in the device as described in the embodiment, or may be changed accordingly and located in one or more devices different from this embodiment. The modules of the above embodiment may be merged into one module or further split into multiple submodules.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.