CN106919368A - Acceleration method, acceleration system, device and chip for the critical path of a streaming data-flow graph - Google Patents

Info

Publication number
CN106919368A
Authority
CN
China
Legal status
Granted
Application number
CN201710028096.6A
Other languages
Chinese (zh)
Other versions
CN106919368B
Inventor
欧焱
谭旭
朱亚涛
叶笑春
王达
严龙
范东睿
Current Assignee
Beijing Ruixin Data Flow Technology Co ltd
Beijing Zhongke Ruixin Technology Group Co ltd
Original Assignee
Beijing Zhongke Rui Core Technology Co Ltd
Priority date
2017-01-12
Filing date
2017-01-12
Publication date
2017-07-04
2017-01-12 Application filed by Beijing Zhongke Rui Core Technology Co Ltd
2017-01-12 Priority to CN201710028096.6A
2017-07-04 Publication of CN106919368A
Application granted
2019-01-29 Publication of CN106919368B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3824 Operand accessing
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
    • G06F9/3869 Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking

Abstract

The present invention discloses an acceleration method, acceleration system, device and chip for the critical path of a streaming data-flow graph. The method comprises the following steps: determining a critical node in the data flow; adding a predecessor node before the critical node and a successor node after the critical node; replicating the critical node to form multiple critical child nodes; and, after the data to be transmitted passes through the predecessor node, selecting one available critical child node and outputting the data from the successor node via the selected critical child node. The invention optimizes only within a single operand storage space of the data-flow graph, so that the critical operation on the critical path within that operand storage space can be executed in parallel. This shortens the execution time of the critical path in the data-flow graph, giving a clear advantage in execution efficiency over the traditional structure.

Description

Acceleration method, acceleration system, device and chip for the critical path of a streaming data-flow graph
Technical field
The present invention relates to the technical field of data-flow architectures, and in particular to an acceleration method, acceleration system, device and chip for the critical path of a streaming data-flow architecture.
Background art
In supercomputing, power-consumption limits have become a major obstacle on the way to exascale (E-class) computing, and data-flow architectures have again attracted wide attention because of their excellent parallelism and performance per watt. To run large-scale scientific computations on a data-flow architecture, a program executable on that architecture must first be written in a data-flow language. The program is then translated into a data-flow graph, which expresses the dependences between modules of the program and between individual instructions. The data-flow graph must next be mapped onto the physical data-flow processing units. Finally, the data to be computed is injected into the data-flow processor in a streaming fashion and executed continuously. In a data-flow architecture, data can be passed directly between instructions, which alleviates the latency and power problems caused by instruction and data dependences in the traditional von Neumann architecture and increases instruction-level parallelism.
The data-flow kernel must support a correct mapping of data-flow instructions onto the data-flow graph. The traditional mapping maps the operations in the data-flow graph one-to-one onto its nodes (an operation here may be a single instruction or an instruction set formed by several sequential instructions, i.e. an instruction, module, task or other granularity; each operation corresponds to one node in the data-flow graph). Different nodes in a data-flow graph execute with different efficiency, and different nodes often share the same successor node: for example, the two operands of an add node may come from two different predecessor nodes (a predecessor may be a memory-access node or some other computation node). Because nodes execute at different speeds, the source data a node needs do not all arrive at the same time, and a node can execute only after all of its source data have arrived. A node's start time is therefore determined by the arrival of its latest source datum, and the set of mutually dependent nodes along which data flows most slowly is called the critical path. As shown in Fig. 1, if node 3 executes more slowly than node 2, the path 1-3-4 is the critical path. The critical path lowers the utilization of the functional units of the nodes in the data-flow graph and directly determines the execution efficiency of a single data-flow graph.
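The firing rule and the definition of the critical path above can be illustrated with a short sketch. It assumes the Fig. 1 topology (node 1 feeds nodes 2 and 3, which both feed node 4) and hypothetical latencies in which node 3 is slower than node 2; the function name `critical_path` and the latency values are illustrative, not taken from the patent. Each node starts when its latest operand arrives, and the critical path is recovered by walking back along the latest-arriving inputs.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path(preds, latency):
    """Return (finish_times, critical_path) for an acyclic data-flow graph.

    preds:   {node: [nodes whose results it consumes]}  (illustrative format)
    latency: {node: execution time in cycles}           (assumed values)
    """
    order = TopologicalSorter(preds).static_order()
    finish, slowest_pred = {}, {}
    for node in order:
        inputs = preds.get(node, ())
        # A data-flow node fires only after its LAST operand has arrived.
        start = max((finish[p] for p in inputs), default=0)
        slowest_pred[node] = max(inputs, key=lambda p: finish[p], default=None)
        finish[node] = start + latency[node]
    # Walk back from the node that finishes last along the latest-arriving inputs.
    node = max(finish, key=finish.get)
    path = []
    while node is not None:
        path.append(node)
        node = slowest_pred[node]
    return finish, path[::-1]


# Fig. 1 topology with hypothetical latencies: node 3 is slower than node 2.
preds = {2: [1], 3: [1], 4: [2, 3]}
latency = {1: 1, 2: 1, 3: 3, 4: 1}
finish, path = critical_path(preds, latency)
print(path)  # [1, 3, 4]
```

With these assumed latencies node 4 cannot start until node 3 finishes, so the sketch reports 1-3-4 as the critical path, matching the Fig. 1 example.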
To mitigate the long latency that critical operations (the low-efficiency operations on the critical path) impose on a program, and to improve the execution efficiency of the functional units of the nodes in the data-flow graph, the usual method is to replicate the entire data-flow graph several times, forming multiple operand storage spaces. This provides multiple paths for the data, so that different operand storage spaces of the program can execute in parallel and the latency of the critical operation on the critical path is hidden, as shown in Fig. 1. However, replicating multiple copies of the data-flow graph has the following two key problems:
1. Replicating multiple operand storage spaces does not solve the slow execution of the critical operation within an operand storage space. The method gives the data multiple paths, but the execution efficiency of each individual path is unchanged.
2. Using multiple operand storage spaces can cause a storage-space explosion. When the critical operation is slow enough, enough storage must be provided to hide the latency it causes. Each operand storage space is the physical storage of an entire data-flow graph, and it also puts pressure on data transfer between data-flow processors. When many operand storage spaces are needed, the storage requirement explodes.
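The difference in storage cost between whole-graph replication and replicating only the critical node can be made concrete with a deliberately simplified count. The sketch below assumes that every node occupies one unit of operand storage and that each replicated critical node adds exactly one Map and one Merge node; these unit costs and the function names are assumptions for illustration, not figures from the patent.

```python
def storage_whole_graph(num_nodes: int, copies: int) -> int:
    """Traditional approach: each operand storage space holds a full copy of the graph."""
    return num_nodes * copies


def storage_critical_node_only(num_nodes: int, copies: int, num_critical: int = 1) -> int:
    """Replicate only the critical node(s), adding one Map and one Merge node per critical node.

    Each critical node is replaced by `copies` child nodes, i.e. (copies - 1) extra nodes,
    plus the Map and Merge nodes (2 more). The rest of the graph is not copied.
    """
    return num_nodes + num_critical * ((copies - 1) + 2)


# Hypothetical graph of 100 nodes whose critical operation needs 4-way replication.
print(storage_whole_graph(100, 4))         # 400
print(storage_critical_node_only(100, 4))  # 105
```

Under this model the whole-graph cost grows with the graph size times the replication factor, while the per-node scheme grows only with the replication factor of the critical node. Real hardware costs such as buffer depth and routing are not modelled.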
Because of these problems, the traditional way of accelerating critical-path operations places huge demands on hardware and does not fundamentally improve the execution efficiency of the critical path within a single operand storage space. A method is therefore needed that accelerates the critical path of a data-flow graph within an acceptable storage budget.
Summary of the invention
In view of the shortcomings of the prior art, the present invention proposes a method, a system and a device for accelerating the critical path of a data-flow graph.
The present invention first provides an acceleration method for the critical path of a streaming data-flow graph, comprising the following steps:
Determining a critical node in the data flow;
Adding a predecessor node before the critical node and a successor node after the critical node;
Replicating the critical node to form multiple critical child nodes;
After the data to be transmitted passes through the predecessor node, selecting one of the available critical child nodes, and outputting the data to be transmitted from the successor node via the selected critical child node.
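The four steps above amount to a local rewrite of the data-flow graph. The following sketch applies them to an adjacency map `{node: [successors]}`; the representation, the function name `accelerate_critical_node`, and the node labels `Map(3)`, `Merge(3)`, `3.1`, `3.2` are assumptions chosen to mirror Fig. 2, and step 1 (finding the critical node) is taken as already done.

```python
def accelerate_critical_node(succs, critical, copies):
    """Replace `critical` with a Map node, `copies` child nodes and a Merge node.

    succs: {node: [successor nodes]}; every node must appear as a key.
    """
    map_node, merge_node = f"Map({critical})", f"Merge({critical})"
    children = [f"{critical}.{i}" for i in range(1, copies + 1)]
    new = {}
    for node, outs in succs.items():
        if node == critical:
            continue
        # Edges that used to enter the critical node now enter the Map node.
        new[node] = [map_node if s == critical else s for s in outs]
    new[map_node] = list(children)  # Map fans out to every child copy
    for child in children:
        new[child] = [merge_node]  # each copy feeds the single Merge node
    new[merge_node] = list(succs[critical])  # Merge inherits the critical node's out-edges
    return new


# Fig. 2: node 3 is the critical node and is replicated into 3.1 and 3.2.
succs = {1: [2, 3], 2: [4], 3: [4], 4: []}
new = accelerate_critical_node(succs, critical=3, copies=2)
print(new["Map(3)"], new["Merge(3)"])  # ['3.1', '3.2'] [4]
```

Only the edges touching the critical node change; every other node and edge of the graph is kept as it was.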
In the method proposed by the present invention, the predecessor node includes a first flag bit, first selection logic and first forwarding logic. The first flag bit indicates whether a critical child node can receive data; the first selection logic selects one available critical child node from the multiple critical child nodes according to the first flag bits; and the first forwarding logic receives the critical child node selected by the first selection logic and forwards the data to be transmitted to it.
In the method proposed by the present invention, the successor node includes a second flag bit, second selection logic and second forwarding logic. The second flag bit indicates whether the data in a critical child node is valid; the second selection logic selects a critical child node with valid data according to the second flag bits; and the second forwarding logic outputs the data, via the successor node, from the critical child node with valid data selected by the second selection logic.
In the method proposed by the present invention, when a critical child node holds data, its first flag bit is set to "this critical child node is unavailable"; when a critical child node holds no data, its first flag bit is set to "this critical child node is available"; when the data stored in a critical child node is valid, its second flag bit is set to "the data of this critical node can be forwarded"; and when the data stored in a critical child node is invalid, its second flag bit is set to "the data of this critical node cannot be forwarded".
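The flag-bit rules above can be expressed as two predicates over the state of a critical child node. In this sketch a child is modelled by the operand it holds (or `None`) and a `done` marker; the class name `CriticalChild`, its fields, and the bit-string rendering (leftmost bit for the first child, as in the state values of the Fig. 5 walk-through) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CriticalChild:
    data: Optional[Any] = None  # operand currently held; None means the child is empty
    done: bool = False          # True once the replicated operation has finished

    @property
    def first_flag(self) -> bool:
        """Read by the predecessor node: True = "this critical child node is available"."""
        return self.data is None

    @property
    def second_flag(self) -> bool:
        """Read by the successor node: True = "this child's data can be forwarded"."""
        return self.data is not None and self.done


def flag_bits(children, flag: str) -> str:
    """Render one flag for all children as a bit string, leftmost bit = first child."""
    return "".join("1" if getattr(child, flag) else "0" for child in children)


idle = CriticalChild()
busy = CriticalChild(data="N", done=False)
finished = CriticalChild(data="N+1", done=True)
print(flag_bits([idle, busy, finished], "first_flag"))   # 100
print(flag_bits([idle, busy, finished], "second_flag"))  # 001
```

A child that is still executing has both flags cleared: it can neither accept new data nor have its data forwarded.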
The present invention also provides an acceleration system for the critical path of a streaming data-flow graph, comprising:
Multiple critical child nodes, formed by replicating a critical node in the data flow, the multiple critical child nodes being able to perform the same operation;
A predecessor node, located before the multiple critical child nodes, for selecting an available critical child node from the multiple critical child nodes to which data is transmitted;
A successor node, located after the multiple critical child nodes, for selecting a critical child node with valid data from the multiple critical child nodes and sending out the valid data therein.
In the system proposed by the present invention, the predecessor node includes a first flag bit, first selection logic and first forwarding logic. The first flag bit indicates whether a critical child node can receive data; the first selection logic selects one available critical child node from the multiple critical child nodes according to the first flag bits; and the first forwarding logic receives the critical child node selected by the first selection logic and forwards the data to be transmitted to it.
In the system proposed by the present invention, the successor node includes a second flag bit, second selection logic and second forwarding logic. The second flag bit indicates whether the data in a critical child node is valid; the second selection logic selects a critical child node with valid data according to the second flag bits; and the second forwarding logic outputs the data, via the successor node, from the critical child node with valid data selected by the second selection logic.
In the system proposed by the present invention, when a critical child node holds data, its first flag bit is set to "this critical child node is unavailable"; when a critical child node holds no data, its first flag bit is set to "this critical child node is available"; when the data stored in a critical child node is valid, its second flag bit is set to "the data of this critical node can be forwarded"; and when the data stored in a critical child node is invalid, its second flag bit is set to "the data of this critical node cannot be forwarded".
The present invention also provides a device including the above acceleration system for the critical path of a streaming data-flow graph.
The present invention also provides a chip including the above acceleration system for the critical path of a streaming data-flow graph.
Compared with the prior art, the beneficial effects of the present invention are:
The present invention optimizes only within a single operand storage space of the data-flow graph, so that the critical operation on the critical path within that operand storage space can be executed in parallel. This shortens the execution time of the critical path in the data-flow graph and gives a clear advantage in execution efficiency over the traditional structure.
Brief description of the drawings
Fig. 1 shows the traditional way of accelerating the critical path of a data-flow graph;
Fig. 2 shows how the critical-path acceleration method for the streaming data-flow structure changes the graph relative to the traditional method;
Fig. 3 is a structural diagram of the Map node;
Fig. 4 is a structural diagram of the Merge node;
Fig. 5 shows an embodiment of the critical-path acceleration method for the streaming data-flow structure.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 2, the left side shows a traditional data-flow architecture and the right side shows the new data-flow graph obtained with the method of the present invention. Node 3 on the left of Fig. 2 is the critical operation node; nodes 3.1 and 3.2 on the right are critical child nodes that perform the same operation, and both are connected to the same Map node and the same Merge node.
The method for accelerating the critical path proposed by the present invention specifically includes the following.
Within a single operand storage space, the critical operation node is replicated several times to obtain multiple critical child nodes. Each critical child node performs the same operation, but there is no dependence between the critical child nodes. All critical child nodes share the same predecessor node (Map) and successor node (Merge), and the predecessor and successor nodes contain flag bits that identify which critical child node the data goes to or comes from. Replicating the critical operation node provides the operand storage space with multiple data paths along the critical path; each individual datum still flows through the same number of instructions as in the traditional structure.
A Map node is added before the critical child nodes, i.e. the Map node is the predecessor of every critical child node. Without a Map node there is only one critical node and the data has a single path; after the Map node is added there are multiple identical critical child nodes, which provide multiple data paths. The Map node distributes the data arriving from the predecessor node to the critical child nodes. As shown in Fig. 3, the Map node contains three modules: flag bits, selection logic and forwarding logic. The flag bits are feedback from the critical child nodes to the Map node and indicate whether each critical child node is idle. If a critical child node is not idle, its flag bit indicates that no further data may be sent to it; if it is idle, its flag bit indicates that data may be sent to it. The main function of the selection logic is to select one available critical child node according to the flag bits of its successors (the critical child nodes); after choosing a child node, the selection logic clears that child's flag bit, marking it "node not idle". The forwarding logic receives the critical child node chosen by the selection logic and forwards the current datum to it.
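A behavioural sketch of the Map node's three modules follows. Assumptions not stated in the patent: children are modelled as a list of slots, the selection logic picks the lowest-numbered idle child, and `release` stands in for the feedback a child sends when it becomes idle again. The class and method names are hypothetical.

```python
from typing import Any, List, Optional


class MapNode:
    """Flag bits + selection logic + forwarding logic of a Map (predecessor) node."""

    def __init__(self, num_children: int) -> None:
        # One flag bit per critical child node: True = "idle, may receive data".
        self.flags: List[bool] = [True] * num_children

    def select(self) -> Optional[int]:
        """Selection logic: choose an idle child and mark it "not idle"."""
        for i, idle in enumerate(self.flags):
            if idle:
                self.flags[i] = False
                return i
        return None  # no child available; the datum must wait in the Map node

    def forward(self, data: Any, slots: List[Any]) -> Optional[int]:
        """Forwarding logic: send the datum to the child chosen by select()."""
        i = self.select()
        if i is not None:
            slots[i] = data
        return i

    def release(self, i: int) -> None:
        """Feedback from child i: it is idle again."""
        self.flags[i] = True


slots = [None, None]
m = MapNode(2)
print(m.forward("N", slots), m.forward("N+1", slots), m.forward("N+2", slots))  # 0 1 None
m.release(1)
print(m.forward("N+2", slots), slots)  # 1 ['N', 'N+2']
```

Returning `None` when every flag is cleared models the case in the Fig. 5 walk-through where a datum stays in the Map node until a child becomes free.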
A Merge node is added after the critical child nodes; it is the successor of all critical child nodes. In a traditional operand storage space there is only one data path. In the newly proposed method, the two structures above give multiple critical child nodes and hence multiple data paths, and the main function of the Merge node is to send the data in the critical child nodes serially to its successor node. As shown in Fig. 4, the Merge node also contains three modules: flag bits, selection logic and forwarding logic. The flag bits are feedback from the critical child nodes to the Merge node and indicate whether the data each critical child node offers the Merge node is valid. If the data is invalid, the flag bit indicates that the Merge node cannot forward that critical child node's data; if it is valid, the flag bit indicates that the data from that critical child node can be forwarded. The main function of the selection logic is to choose one available critical child node according to the flag bits of its predecessors (the critical child nodes); after choosing a child node, the selection logic clears that child's flag bit, marking it "data invalid". The forwarding logic receives the critical child node chosen by the selection logic and forwards that node's data to the Merge node's successor.
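The Merge node can be sketched the same way. Here `mark_valid` stands in for a child's feedback when its result is ready, the selection logic again picks the lowest-numbered child with valid data, and each call to `forward` moves at most one result, so results leave serially. All names and the priority rule are assumptions for illustration.

```python
from typing import Any, List, Optional


class MergeNode:
    """Flag bits + selection logic + forwarding logic of a Merge (successor) node."""

    def __init__(self, num_children: int) -> None:
        # One flag bit per critical child node: True = "holds valid data, may be forwarded".
        self.flags: List[bool] = [False] * num_children

    def mark_valid(self, i: int) -> None:
        """Feedback from child i: its replicated operation has produced a result."""
        self.flags[i] = True

    def select(self) -> Optional[int]:
        """Selection logic: choose a child with valid data and mark it "data invalid"."""
        for i, valid in enumerate(self.flags):
            if valid:
                self.flags[i] = False
                return i
        return None

    def forward(self, slots: List[Any], successor: List[Any]) -> Optional[int]:
        """Forwarding logic: pass one chosen child's result on to the successor node."""
        i = self.select()
        if i is not None:
            successor.append(slots[i])
            slots[i] = None  # the child is empty again
        return i


slots, out = ["r(N)", "r(N+1)"], []
mg = MergeNode(2)
mg.mark_valid(1)
mg.mark_valid(0)
print(mg.forward(slots, out), mg.forward(slots, out), mg.forward(slots, out))  # 0 1 None
print(out)  # ['r(N)', 'r(N+1)']
```

Even when both children hold valid results at once, the successor receives them one at a time, which is the many-to-one behaviour described for the Merge node.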
The Map node and the Merge node both consist of flag bits, selection logic and forwarding logic, and in both the flag bits are feedback from the critical child nodes to these two special nodes. The difference is that the flag bits in the Map node indicate whether a critical child node is idle and can accept new data, whereas the flag bits in the Merge node indicate whether the data in a critical child node is valid and can be forwarded. The Map node distributes data to the critical child nodes; the Merge node forwards data out of them. Moreover, the Map node distributes one data stream onto multiple data paths, while the Merge node selects from the data on multiple paths and passes one item at a time to its successor. Physically, the Map node's path is one-to-many and the Merge node's path is many-to-one, but logically the data passing through the Map and Merge nodes follows a single path.
This differs from the traditional acceleration method. The traditional method increases parallelism between instructions by replicating multiple operand storage spaces, i.e. it raises program concurrency at the instruction level; every additional operand storage space requires a full copy of the data-flow graph, and the critical path in each operand storage space remains. The method proposed in the present invention instead optimizes within an operand storage space, providing multiple data paths for the same operand storage space. It is a finer-grained optimization and can be combined with the traditional method to achieve higher execution efficiency.
The present invention also proposes an acceleration system for the critical path of a streaming data-flow graph and a device containing the system.
Referring again to Fig. 5, this embodiment illustrates how data flows within a single operand storage space over 6 basic time beats. Nodes A and B are critical child nodes and implement the same function. Assume that at initialization no data is circulating in the data-flow graph, and that starting from beat 1 the N-th datum begins to be injected into the data-flow graph. Assume also that a critical child node needs 2 to 3 beats to finish executing. In the two-bit state values below, the left bit refers to node A and the right bit to node B.
Step 501: Datum N reaches the Map node. The Map node checks its successor nodes and finds that nodes A and B are both idle; its state bits are 11. It forwards the datum to node A (assuming datum N needs 3 beats of the critical operation) and sets its state bits to 01. Node A receives datum N and starts executing (its 1st beat); node B holds no data and does not execute. The Merge node's state bits are 00, so it holds no valid data and forwards nothing to its successor node.
Step 502: Datum N+1 reaches the Map node. The Map node checks its successor nodes and finds the state bits are 01; it forwards datum N+1 to node B (assuming datum N+1 needs 2 beats) and sets its state bits to 00. Node A continues executing datum N (1 beat executed), and node B starts executing datum N+1 (0 beats executed). The Merge node's state bits are 00, so there is no valid data and nothing is forwarded to its successor node.
Step 503: Datum N+2 reaches the Map node. The Map node finds its state bits are 00 and does not forward datum N+2. Node A continues executing datum N (2 beats executed), and node B continues executing datum N+1 (1 beat executed). The Merge node's state bits are 00, so there is no valid data and nothing is forwarded to its successor node.
Step 504: Because datum N+2 was not forwarded in the previous beat, it is still at the Map node. The Map node finds its state bits are 00 and again does not forward datum N+2. Node A completes datum N (3 beats executed) and node B completes datum N+1 (2 beats executed). The data in nodes A and B finish at the same time, and the Merge node's state bits are set to 11. With state bits 11, the Merge node chooses node B's data to forward and sets its state bits to 10.
Step 505: Because datum N+2 has not yet been forwarded, it is still at the Map node. The Map node finds its state bits are 11 and forwards datum N+2 to node B (assuming datum N+2 needs 2 beats), setting its state bits to 10. Node A is idle, and node B starts executing datum N+2 (0 beats executed). The Merge node's state bits are 10; it chooses node A's data to forward and sets its state bits to 00.
Step 506: Datum N+3 reaches the Map node. The Map node checks its successor nodes and finds the state bits are 10 (node A available); it forwards datum N+3 to node A (assuming datum N+3 needs 2 beats) and sets its state bits to 00. Node A starts executing datum N+3 (0 beats executed), and node B continues executing datum N+2 (1 beat executed). The Merge node's state bits are 00, so there is no valid data and nothing is forwarded to its successor node.
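The overlap shown in steps 501 to 506 can be reproduced approximately with a small beat-level model. The sketch assumes the per-datum latencies of the embodiment (3 beats for datum N, 2 beats for each later datum), an upstream node able to supply a new datum every beat, and at most one datum dispatched by the Map node and one result forwarded by the Merge node per beat, lowest-numbered child first. These timing conventions are assumptions, so the beat numbers differ slightly from the walk-through; the point is the comparison between one and two critical child nodes. The function `simulate` is hypothetical.

```python
def simulate(latencies, num_children):
    """Return {datum index: beat at which the Merge node forwards its result}."""
    if num_children < 1:
        raise ValueError("need at least one critical child node")
    slots = [None] * num_children  # per child: [datum index, beats executed] or None
    next_item, emitted, beat = 0, {}, 0
    while len(emitted) < len(latencies):
        beat += 1
        # Merge: forward at most one finished result per beat (lowest index first).
        for i, s in enumerate(slots):
            if s is not None and s[1] == latencies[s[0]]:
                emitted[s[0]] = beat
                slots[i] = None
                break
        # Map: dispatch the next datum, in order, to the first empty child (if any).
        if next_item < len(latencies):
            for i, s in enumerate(slots):
                if s is None:
                    slots[i] = [next_item, 0]
                    next_item += 1
                    break
        # Execute: every child still working performs one beat of the operation.
        for s in slots:
            if s is not None and s[1] < latencies[s[0]]:
                s[1] += 1
    return emitted


lat = [3, 2, 2, 2]  # data N, N+1, N+2, N+3
print(simulate(lat, 1))  # {0: 4, 1: 6, 2: 8, 3: 10}
print(simulate(lat, 2))  # {0: 4, 1: 5, 2: 6, 3: 7}
```

With a single copy of the critical node the last result leaves at beat 10; with two copies (nodes A and B) it leaves at beat 7, because datum N+1 executes while datum N is still in progress.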
Those of ordinary skill in the art will understand that the accompanying drawings are schematic diagrams of one embodiment, and the modules or flows in the drawings are not necessarily required to implement the present invention.
Those of ordinary skill in the art will understand that the modules of the device in an embodiment may be distributed in the device as described in the embodiment, or may be changed accordingly and located in one or more devices different from this embodiment. The modules of the above embodiment may be merged into one module or further split into multiple sub-modules.
Finally, it should be noted that the above embodiments merely illustrate, rather than limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An acceleration method for the critical path of a streaming data-flow graph, characterized by comprising the following steps:
Determining a critical node in the data flow;
Adding a predecessor node before the critical node, and adding a successor node after the critical node;
Replicating the critical node to form multiple critical child nodes;
After the data to be transmitted passes through the predecessor node, selecting one available critical child node, and outputting the data to be transmitted from the successor node via the selected critical child node.
2. The acceleration method for the critical path of a streaming data-flow graph according to claim 1, characterized in that the predecessor node comprises a first flag bit, first selection logic and first forwarding logic; the first flag bit indicates whether the critical child node can receive data; the first selection logic selects one available critical child node from the multiple critical child nodes according to the first flag bit; and the first forwarding logic receives the critical child node selected by the first selection logic and forwards the data to be transmitted to the selected critical child node.
3. The acceleration method for the critical path of a streaming data-flow graph according to claim 2, characterized in that the successor node comprises a second flag bit, second selection logic and second forwarding logic; the second flag bit indicates whether the data in the critical child node is valid; the second selection logic selects a critical child node with valid data according to the second flag bit; and the second forwarding logic outputs the data to be transmitted, through the successor node, from the critical child node with valid data selected by the second selection logic.
4. The acceleration method for the critical path of a streaming data-flow graph according to claim 3, characterized in that when one of the critical child nodes holds data, its corresponding first flag bit is set to "the critical child node is unavailable"; when one of the critical child nodes holds no data, its corresponding first flag bit is set to "the critical child node is available"; when the data stored in one of the critical child nodes is valid, its corresponding second flag bit is set to "the data of the critical node can be forwarded"; and when the data stored in one of the critical child nodes is invalid, its corresponding second flag bit is set to "the data of the critical node cannot be forwarded".
5. An acceleration system for the critical path of a streaming data-flow graph, characterized by comprising:
Multiple critical child nodes, formed by replicating a critical node in the data flow, the multiple critical child nodes being able to perform the same operation;
A predecessor node, located before the multiple critical child nodes, for selecting an available critical child node from the multiple critical child nodes to which data is transmitted;
A successor node, located after the multiple critical child nodes, for selecting a critical child node with valid data from the multiple critical child nodes and sending out the valid data therein.
6. the acceleration system in streamed data stream graph key path according to claim 5, it is characterised in that forerunner's section Point includes the first flag bit, the forwarding logic of first choice logical sum first;First flag bit is used to represent the crucial son Can node receive data, and the first choice logic selects one according to first flag bit from multiple key child nodes Available crucial child node, first forwarding logic is used to receive the selected crucial child node of the first choice logic, and Data to be transmitted is forwarded to the selected crucial child node.
7. the acceleration system in streamed data stream graph key path according to claim 6, it is characterised in that the follow-up section Point includes the second flag bit, the second selection forwarding logic of logical sum second;Second flag bit is used to represent the crucial son Whether effectively, the second selection logic selects the effective crucial son section of data to data in node according to second flag bit Point, the effective key child node of data that second forwarding logic is selected by the described second selection logic, by number to be transmitted Exported according to by the descendant node.
8. The system for accelerating a key path of a streaming dataflow graph according to claim 7, characterized in that: when a key child node holds data, its corresponding first flag bit is set to "key child node unavailable"; when a key child node holds no data, its corresponding first flag bit is set to "key child node available"; when the data stored in a key child node is valid, its corresponding second flag bit is set to "data of the key node can be forwarded"; and when the data stored in a key child node is invalid, its corresponding second flag bit is set to "data of the key node cannot be forwarded".
9. A device, comprising the system for accelerating a key path of a streaming dataflow graph according to any one of claims 5 to 8.
10. A chip, comprising the system for accelerating a key path of a streaming dataflow graph according to any one of claims 5 to 8.
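The division of labour in claims 5 to 8 (a predecessor that dispatches according to the first flag bit, replicated key child nodes performing the same operation, and a successor that collects according to the second flag bit) can be illustrated with a small cycle-level model in Python. It is a sketch under stated assumptions, not the patented hardware: `KeyChildNode`, `predecessor`, `successor` and `run` are hypothetical names; each replica is assumed to need a fixed `latency` cycles per operation; both selection logics are assumed to use fixed priority (lowest index first), since the claims do not name a policy; and at most one datum enters and one leaves per cycle.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class KeyChildNode:
    """One replica of the key node (hypothetical cycle-level model)."""
    value: Optional[int] = None  # data held by the replica, if any
    remaining: int = 0           # cycles left before the data becomes valid

    @property
    def first_flag(self) -> bool:
        """True means 'key child node available' (it holds no data)."""
        return self.value is None

    @property
    def second_flag(self) -> bool:
        """True means 'data of the key node can be forwarded' (data is valid)."""
        return self.value is not None and self.remaining == 0


def predecessor(children: List[KeyChildNode], data: int,
                op: Callable[[int], int], latency: int) -> bool:
    """First selection + first forwarding logic: send `data` to an available
    replica. Returns False (stall upstream) when every replica is busy."""
    for child in children:              # assumed fixed priority: lowest index first
        if child.first_flag:
            child.value = op(data)      # result computed now, marked valid later
            child.remaining = latency
            return True
    return False


def successor(children: List[KeyChildNode]) -> Optional[int]:
    """Second selection + second forwarding logic: emit one valid result."""
    for child in children:              # assumed fixed priority: lowest index first
        if child.second_flag:
            out, child.value = child.value, None  # drain, so it is available again
            return out
    return None


def run(inputs: List[int], op: Callable[[int], int],
        replicas: int, latency: int) -> Tuple[List[int], int]:
    """Push all inputs through `replicas` copies of one key node.

    Returns (outputs, cycles) under the assumed timing model.
    """
    children = [KeyChildNode() for _ in range(replicas)]
    pending = list(inputs)
    outputs: List[int] = []
    cycles = 0
    while len(outputs) < len(inputs):
        cycles += 1
        result = successor(children)
        if result is not None:
            outputs.append(result)
        if pending and predecessor(children, pending[0], op, latency):
            pending.pop(0)
        for child in children:          # every busy replica advances one cycle
            if child.value is not None and child.remaining > 0:
                child.remaining -= 1
    return outputs, cycles
```

With `op = lambda x: x * x`, inputs `[1, 2, 3, 4]` and `latency=3`, a single replica needs 13 cycles in this model, while three replicas need 7: the predecessor keeps dispatching to free replicas while earlier results are still being computed, which illustrates how replicating a key node can shorten a key path.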

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710028096.6A CN106919368B (en) 2017-01-12 2017-01-12 Accelerated method, acceleration system, device and the chip in a kind of streamed data stream graph key path

Publications (2)

Publication Number Publication Date
CN106919368A CN106919368A (en) 2017-07-04
CN106919368B CN106919368B (en) 2019-01-29

Family

ID=59453510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710028096.6A Active CN106919368B (en) 2017-01-12 2017-01-12 Accelerated method, acceleration system, device and the chip in a kind of streamed data stream graph key path

Country Status (1)

Country Link
CN (1) CN106919368B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030061576A1 (en) * 2001-06-15 2003-03-27 Sanjeev Saluja Enhancing mergeability of datapaths and reducing datapath widths responsively to upper bound on information content
CN1494301A (en) * 2002-10-30 2004-05-05 华为技术有限公司 Method of storing and reading out data information through data information key field
CN103699365A (en) * 2014-01-07 2014-04-02 西南科技大学 Thread division method for avoiding unrelated dependence on many-core processor structure
CN103944748A (en) * 2014-02-17 2014-07-23 北京航空航天大学 Network-key-node self-similar-traffic generation simplification method based on genetic algorithm
US20150081693A1 (en) * 2013-09-18 2015-03-19 International Business Machines Corporation Managing data paths in an operator graph
CN104572182A (en) * 2014-12-23 2015-04-29 杭州华为数字技术有限公司 Configuration method of stream application, nodes and stream computing system
CN104679484A (en) * 2013-11-27 2015-06-03 华为技术有限公司 Concurrence analysis method and device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107506175A (en) * 2017-07-21 2017-12-22 北京中科睿芯科技有限公司 A kind of DFD congestion detection method based on execution efficiency gradient prediction
CN107506175B (en) * 2017-07-21 2020-07-28 北京中科睿芯科技有限公司 Data flow graph congestion detection method based on execution efficiency gradient prediction
CN108287729A (en) * 2017-12-22 2018-07-17 北京中科睿芯智能计算产业研究院有限公司 A kind of data flow instruction mapping method
CN108287729B (en) * 2017-12-22 2021-04-13 北京中科睿芯智能计算产业研究院有限公司 Data stream instruction mapping method
CN112015397A (en) * 2020-09-07 2020-12-01 深圳职业技术学院 Loop detection method and system
CN112015397B (en) * 2020-09-07 2023-09-26 深圳职业技术学院 Loop detection method and system

Also Published As

Publication number Publication date
CN106919368B (en) 2019-01-29

Similar Documents

Publication Publication Date Title
US11392740B2 (en) Dataflow function offload to reconfigurable processors
Sterling et al. Gilgamesh: A multithreaded processor-in-memory architecture for petaflops computing
US11816560B2 (en) Performance estimation-based resource allocation for reconfigurable architectures
Pasandi et al. SFQmap: A technology mapping tool for single flux quantum logic circuits
US20210271630A1 (en) Compiler Flow Logic for Reconfigurable Architectures
TWI634489B (en) Multi-layer artificial neural network
TWI781441B (en) A method, non-transitory computer readable storage medium and system of efficiently executing an operation unit graph on a reconfigurable data processor with a target architecture
CN106919368A (en) A kind of accelerated method in streamed data stream graph key path, acceleration system, device and chip
Dousti et al. Minimizing the latency of quantum circuits during mapping to the ion-trap circuit fabric
Werner et al. Hardware-accelerated join processing in large Semantic Web databases with FPGAs
Wang et al. Reconfigurable hardware accelerators: Opportunities, trends, and challenges
Mallon et al. On the scalability of the clusters-booster concept: a critical assessment of the deep architecture
Ashby et al. The impact of global communication latency at extreme scales on Krylov methods
Werner et al. Accelerated join evaluation in Semantic Web databases by using FPGAs
TWI784845B (en) Dataflow function offload to reconfigurable processors
Gottlieb et al. Clustered programmable-reconfigurable processors
Zulberti et al. Architectural implications for inference of graph neural networks on cgra-based accelerators
CN107368287A (en) A kind of acceleration system, accelerator and its accelerated method of data flow architecture Circular dependency
Guo Mapping applications onto FPGA-centric clusters
Miller A multiple-stream registerless shared-resource processor
Touzene All-to-all broadcast in hexagonal torus networks on-chip
Zong et al. SlideAcross: A low-latency adaptive router for chip multi-processor
Mast et al. Traveling salesman: a heuristic scaling analysis
Liu et al. Hardware Architecture of Reconfigurable Cryptographic Processors
Belkebir et al. Two-steps into energy consumption optimisation due to the mapping of multimedia application to network on chip architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100000 Beijing City, Haidian District North Road Zhongguancun green garden wensong Road No. 1

Patentee after: Beijing Zhongke Ruixin Technology Group Co.,Ltd.

Address before: 100000 Beijing City, Haidian District North Road Zhongguancun green garden wensong Road No. 1

Patentee before: SMARTCORE (BEIJING) Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20200513

Address after: Room 312, 3 / F, building 26, lanxiyuan District 4, Shunyi District, Beijing

Patentee after: Beijing Ruixin data flow technology Co.,Ltd.

Address before: 100000 Beijing City, Haidian District North Road Zhongguancun green garden wensong Road No. 1

Patentee before: Beijing Zhongke Ruixin Technology Group Co.,Ltd.