CN116663464B

CN116663464B - Optimization method and system for critical timing path

Info

Publication number: CN116663464B
Application number: CN202310954693.7A
Authority: CN
Inventors: 贝泽华; 杨静磊; 唐洁群
Original assignee: Shanghai Hejian Industrial Software Group Co Ltd
Current assignee: Shanghai Hejian Industrial Software Group Co Ltd
Priority date: 2023-08-01
Filing date: 2023-08-01
Publication date: 2023-10-20
Anticipated expiration: 2043-08-01
Also published as: CN116663464A

Abstract

The invention relates to the technical field of electronic design automation, and in particular to a method and system for optimizing a critical timing path. The method determines the target processor of each node by evaluating the arrival times of the nodes on the critical path, where the target processor of a node i is the processor that makes the arrival time at the end point of the relevant timing path the shortest. Because the shortest arrival time of each node is calculated directly on the timing graph during optimization, inconsistency between simulation results and actual results is avoided. The arrival time at every end point of the timing graph can be made optimal, that is, the arrival time on each critical path is the shortest arrival time, which improves overall circuit performance.

Description

Optimization method and system for critical timing path

Technical Field

The invention relates to the technical field of electronic design automation, and in particular to a method and a system for optimizing a critical timing path.

Background

In FPGA-based hardware accelerated verification systems, one design will be split and put into multiple FPGAs. Thus, the timing path will span multiple FPGAs. Because the delay between FPGAs is greater than the delay within FPGAs, critical timing paths typically span multiple FPGAs, limiting system performance. In order to increase the operating frequency of the system, the transmission time on the critical timing path needs to be optimized.

Current optimization methods for critical timing paths include inserting registers, register balancing, operator balancing, eliminating code priority, logic replication, and moving critical signals later in the combinational logic. Logic replication targets the case in which a signal on a critical timing path has a large fanout, which makes the path from the signal to each destination logic node excessively long; replicating the signal reduces the fanout.

In the prior art, the fan-out of the critical timing path is reduced by copying the current signal, so that the delay of the critical timing path can be reduced to a certain extent, but the following technical problems exist:

1. In the prior art, the timing graph is rebuilt separately from the actual logic circuit, and the delay of a critical path that crosses FPGAs is calculated by simulating that timing graph on its own, so the simulation result may be inconsistent with the actual result.

2. The timing graph contains many nodes, so path optimization has high computational complexity and consumes substantial computing resources.

Disclosure of Invention

To address technical problem 1, the invention adopts the following technical solution:

In a first aspect, an embodiment of the present invention provides a method for optimizing a critical timing path, the method comprising:

S100, acquiring a plurality of supernodes in the critical timing path, wherein a supernode i has a port set Pi and a candidate processor set Fi and consists of one or more circuit modules located in the same processor; the m-th port in Pi is pm, and the a-th candidate processor fa in Fi is a candidate location to which supernode i can be copied; m ranges from 1 to M, where M is the total number of ports of supernode i, and a ranges from 1 to A, where A is the total number of candidate processors of supernode i.

S200, visiting the supernodes in topological order and, when supernode i is not a fixed node, calculating a target processor for supernode i and copying supernode i into the target processor; the target processor is obtained by the following steps:

S220, acquiring the candidate path $arc_{i,p_m \to j,p_n}$ between port pm of supernode i and port pn of a fan-out node j located in the b-th processor fb; copying pm of supernode i into each candidate processor in Fi to obtain the arrival time set $Arr^{j}_{i,p_m,f_b,p_n}$ formed by the A arrival times at which pm reaches pn of fan-out node j, where the a-th element $arr^{j}_{i,f_a,p_m,f_b,p_n}$ of $Arr^{j}_{i,p_m,f_b,p_n}$ is the arrival time obtained by copying pm of supernode i into fa.

S240, calculating the arrival time sets of all candidate paths of fan-out node j, and calculating from these sets the optimal arrival time $arr^{j}_{f_b,p_n}$ of pn when fan-out node j is located at fb.

S260, obtaining all arrival times in $Arr^{j}_{i,p_m,f_b,p_n}$ that satisfy $arr^{j}_{i,f_a,p_m,f_b,p_n} \le arr^{j}_{f_b,p_n}$; the set of processors to which pm of supernode i is copied for these arrival times is the best candidate processor set of pm of supernode i.

S280, obtaining all best candidate processor sets of pm of supernode i from all fan-out nodes of supernode i, and obtaining the target processor to which supernode i is copied from all best candidate processor sets of pm of supernode i.

In a second aspect, another embodiment of the present invention provides a critical timing path optimization system, including a processor and a storage medium communicatively coupled to the processor, wherein the system is capable of implementing a critical timing path optimization method as described above when the processor executes a program in the storage medium.

Compared with the prior art, the method and system for optimizing a critical timing path provided by the invention offer clear technical progress and practical industrial value, and have at least the following beneficial effects:

The first embodiment of the invention provides a method and a system for optimizing a critical timing path that determine the target processor of each node by evaluating the arrival times of the nodes on the critical timing path. For each candidate path between node i and a fan-out node j, the arrival times obtained by copying node i to different processors are calculated and the shortest one is selected; the optimal arrival time of fan-out node j is the maximum of these shortest arrival times over all of its ports, and the target processor to which node i is copied is obtained from the optimal arrival time of fan-out node j. The optimization does not depart from the logic circuit but calculates the shortest arrival time of each node directly on it, so inconsistency between the simulation result and the actual result is avoided. The arrival time of each node can be made the shortest arrival time, that is, the arrival time on each critical path is the shortest arrival time, which improves overall circuit performance.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a method for optimizing a critical timing path according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of connection relationships between partial supernodes according to an embodiment of the present invention;

FIG. 3 is a flowchart of a method for extracting a supernode according to another embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

In order to solve the technical problem 1, the present invention provides a first embodiment.

Example 1

Without changing the connections between the nodes of the critical timing path, the invention optimizes the propagation time on the path by reassigning the processor in which each node is placed.

Referring to FIG. 1, which shows a flowchart of a method for optimizing a critical timing path, the method includes the following steps:

S100, acquiring a plurality of supernodes in the critical timing path, wherein a supernode i has a port set Pi and a candidate processor set Fi and consists of one or more circuit modules located in the same processor.

The critical timing path is the path with the longest delay from input to output in the user design. In general, the delay across processors is much larger than the delay inside a processor, so optimizing the critical timing path amounts to optimizing the delay across processors.

Specifically, the port set Pi = {p1, p2, …, pm, …, pM}, where M is the number of ports of supernode i and m ranges from 1 to M. The candidate processor set of supernode i is Fi = {f1, f2, …, fa, …, fA}, where A is the number of candidate processors of supernode i and a ranges from 1 to A. It should be noted that different supernodes may have the same or different numbers of ports and candidate processor sets.

The candidate processors comprise all processors capable of placing the copied supernode i, and a candidate processor set corresponding to each supernode can be specified by a user.

Optionally, the processor is a field programmable gate array (FPGA).

Optionally, the circuit module is a standard element in the processor, a functional module formed by a logic circuit, or a combined circuit of a plurality of standard elements. Supernodes include two types, one type having one output and at least one input and the other type having multiple outputs and at least one input. Supernodes are the basic unit of replication, and smaller logical units cannot be replicated. When a supernode is selected to be replicated to other processors, each of the units that make up the supernode is replicated.
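The supernode described above can be pictured as a small record holding its port set Pi, its candidate processor set Fi and its current placement. The following is only an illustrative sketch: the class name `Supernode`, its field names, the helper `can_copy_to` and the processor labels are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass
class Supernode:
    name: str
    ports: list[str]        # port set Pi = [p1, ..., pM]
    candidates: list[str]   # candidate processor set Fi = [f1, ..., fA]
    processor: str          # processor the supernode currently sits in
    fixed: bool = False     # fixed nodes are never copied

    def can_copy_to(self, target: str) -> bool:
        # A supernode is copied as a whole, and only into one of its candidates.
        return not self.fixed and target in self.candidates


# Hypothetical example: one movable supernode and one fixed start node.
node_i = Supernode("i", ports=["p1", "p2"], candidates=["fpga0", "fpga1"], processor="fpga0")
start = Supernode("start", ports=["q"], candidates=[], processor="fpga0", fixed=True)
```

Keeping the candidate set on the supernode itself mirrors the statement that the set may be specified per supernode by the user.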

S200, accessing the supernodes according to the topological order, and when the supernode i does not belong to a fixed node, calculating a target processor of the supernode i and copying the supernode i into the target processor.

Optionally, the fixed nodes include a user-specified non-movable circuit element, a start node and a termination node for each critical timing path.

The connections between circuits are the same before and after copying. If the current node is copied to another processor so that two connected nodes end up in the same processor, the cross-processor delay between them becomes an intra-processor delay, which reduces the delay of the critical timing path and thus optimizes it. If the two nodes are still in different processors after copying but the delay on the path is smaller than before copying, the delay of the critical timing path is likewise reduced.

Further, the obtaining step of the target processor includes:

S220, acquiring the candidate path $arc_{i,p_m \to j,p_n}$ between port pm of supernode i and port pn of a fan-out node j located in the b-th processor fb; copying pm of supernode i into each candidate processor in Fi to obtain the arrival time set $Arr^{j}_{i,p_m,f_b,p_n}$ of the A arrival times at which $arc_{i,p_m \to j,p_n}$ reaches pn of fan-out node j, where the a-th element $arr^{j}_{i,f_a,p_m,f_b,p_n}$ of $Arr^{j}_{i,p_m,f_b,p_n}$ is the arrival time obtained by copying pm of supernode i into fa.

It should be noted that the connections of supernode i are the same before and after copying; that is, the connectivity of candidate path $arc_{i,p_m \to j,p_n}$ is unchanged no matter which processor supernode i is copied to.

Referring to FIG. 2, supernode j is the fan-out node of both supernode i and supernode k. Port pm of supernode i connects to port pn of supernode j to form candidate path $arc_{i,p_m \to j,p_n}$, and port pd of supernode k connects to port ps of supernode j to form candidate path $arc_{k,p_d \to j,p_s}$. Supernode i has candidate processor set Fi and supernode k has candidate processor set Fk. According to S220, pm of supernode i is copied to each processor of Fi, and for each such placement the arrival time at pn of node j via candidate path $arc_{i,p_m \to j,p_n}$ is calculated, which gives the arrival time set $Arr^{j}_{i,p_m,f_b,p_n}$ of pn. Similarly, according to S220, pd of supernode k is copied to each processor of Fk to obtain the arrival time set $Arr^{j}_{k,p_d,f_b,p_s}$ of ps.

As a preferred embodiment, $arr^{j}_{i,f_a,p_m,f_b,p_n}$ satisfies: $arr^{j}_{i,f_a,p_m,f_b,p_n} = arr^{i}_{f_a,p_m} + d^{f_a,f_b}_{i,p_m \to j,p_n}$, where $arr^{i}_{f_a,p_m}$ is the optimal arrival time of pm of supernode i when it is copied to fa; $d^{f_a,f_b}_{i,p_m \to j,p_n}$ is the delay on candidate path $arc_{i,p_m \to j,p_n}$ when pm of supernode i is copied to fa and pn of fan-out node j is copied to fb; and $arr^{j}_{i,f_a,p_m,f_b,p_n}$ is the time of arrival at pn via candidate path $arc_{i,p_m \to j,p_n}$ when pm of supernode i is placed in fa and fan-out node j is placed in fb.

As a preferred embodiment, $Arr^{j}_{i,p_m,f_b,p_n}$ satisfies: $Arr^{j}_{i,p_m,f_b,p_n} = \{\, arr^{j}_{i,f_a,p_m,f_b,p_n} \mid f_a \in F_i,\ arc_{i,p_m \to j,p_n} \,\}$; that is, with candidate path $arc_{i,p_m \to j,p_n}$ unchanged, the set contains the A times of arrival at pn via that path when pm of supernode i is placed in each processor of Fi in turn and fan-out node j is placed in fb.

Arrival times accumulate iteratively from supernode to supernode along the propagation path; S220 yields, for the output port of each supernode, the arrival time of the corresponding candidate path in each candidate processor.
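The arrival time set of S220 can be sketched in a few lines: for every candidate processor fa, add the delay of the candidate path for the placement pair (fa, fb) to the optimal arrival time of pm at fa. The function name `arrival_set`, the dictionary layout and all numbers below are hypothetical and chosen only for illustration.

```python
def arrival_set(arr_i, delay, candidates, fb):
    """Return {fa: arr_i[fa] + delay[(fa, fb)]} for every candidate fa in Fi.

    arr_i[fa]        : optimal arrival time of pm when supernode i sits in fa
    delay[(fa, fb)]  : delay of the candidate path for placement pair (fa, fb)
    """
    return {fa: arr_i[fa] + delay[(fa, fb)] for fa in candidates}


# Hypothetical numbers: crossing from f1 to f2 costs 10.0, staying inside f2 costs 1.0.
arr_i = {"f1": 4.0, "f2": 5.0}
delay = {("f1", "f2"): 10.0, ("f2", "f2"): 1.0}
Arr = arrival_set(arr_i, delay, ["f1", "f2"], "f2")
print(Arr)  # {'f1': 14.0, 'f2': 6.0}
```

In this toy case, copying pm into the same processor as the fan-out node turns the cross-processor delay into an intra-processor delay, so the entry for f2 is the smaller one.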

S240, calculating the arrival time sets of all candidate paths of fan-out node j, and calculating from these sets the optimal arrival time $arr^{j}_{f_b,p_n}$ of pn when fan-out node j is located at fb.

As a preferred embodiment, $arr^{j}_{f_b,p_n}$ satisfies: $arr^{j}_{f_b,p_n} = \max_{i,p_m}\left(\min_{f_a}\left(arr^{j}_{i,f_a,p_m,f_b,p_n}\right)\right)$, where $\min_{f_a}(arr^{j}_{i,f_a,p_m,f_b,p_n})$ is the shortest of the arrival times obtained by copying pm of supernode i to each candidate processor in Fi, and $\max_{i,p_m}(\cdot)$ takes the maximum of these shortest arrival times over all candidate paths into pn of fan-out node j.

It should be noted that, setting the arrival time of the fan-out node j to the maximum value of the arrival times corresponding to all the ports can ensure that the data in all the candidate paths can arrive within the optimal arrival time.

Optionally, the arrival time of each node can be obtained with a search algorithm such as depth-first or breadth-first traversal; other prior-art algorithms capable of calculating the arrival time by searching also fall within the protection scope of the invention. With the search algorithm, the arrival time of each node is accumulated step by step in the forward direction. The start node and the end node of a critical timing path are fixed nodes and cannot be copied to other processors; the search starts from the start node of the critical timing path, calculates the optimal arrival time of its fan-out nodes, and proceeds in the same way until the optimal arrival times of all fan-out nodes have been calculated.

Referring again to FIG. 2, if node j has only two input ports pn and ps and one output port pu, the shortest arrival time of pn is $\min_{f_a}(arr^{j}_{i,f_a,p_m,f_b,p_n})$, the shortest arrival time of ps is $\min_{f_a}(arr^{j}_{k,f_a,p_d,f_b,p_s})$, and the optimal arrival time of pu is the maximum of $\min_{f_a}(arr^{j}_{i,f_a,p_m,f_b,p_n})$ and $\min_{f_a}(arr^{j}_{k,f_a,p_d,f_b,p_s})$.
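The max-of-min rule of S240 can be illustrated for the two-input case of FIG. 2. This is a sketch under assumed data: the function name `optimal_arrival` and the arrival times are hypothetical, and each dictionary plays the role of one arrival time set keyed by candidate processor.

```python
def optimal_arrival(arrival_sets):
    """Maximum, over all candidate paths, of the shortest arrival time on each path."""
    return max(min(times.values()) for times in arrival_sets)


# Hypothetical arrival time sets at node j placed in fb:
Arr_from_i = {"f1": 14.0, "f2": 6.0}   # path from pm of supernode i into pn
Arr_from_k = {"f1": 9.0, "f3": 12.0}   # path from pd of supernode k into ps
best = optimal_arrival([Arr_from_i, Arr_from_k])
print(best)  # 9.0, i.e. max(6.0, 9.0)
```

Taking the maximum means the slower of the two inputs sets the arrival time at the output port, even after each input has been given its best placement.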

The optimal arrival time of each supernode can be obtained through S240.

S260, obtaining all arrival times in $Arr^{j}_{i,p_m,f_b,p_n}$ that satisfy $arr^{j}_{i,f_a,p_m,f_b,p_n} \le arr^{j}_{f_b,p_n}$; the set of processors to which pm of supernode i is copied for these arrival times is the best candidate processor set of pm of supernode i.

It should be noted that the optimal arrival time $arr^{j}_{f_b,p_n}$ is achieved when pm of supernode i is placed in any processor of the best candidate processor set.
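The filter in S260 keeps every candidate processor whose arrival time does not exceed the optimal arrival time of the fan-out port. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def best_candidates(arrival_set, optimal):
    """Processors fa whose arrival time at pn is no later than the optimal arrival time."""
    return {fa for fa, t in arrival_set.items() if t <= optimal}


# Hypothetical arrival time set for pm of supernode i, with optimal arrival time 9.0 at pn.
R = best_candidates({"f1": 14.0, "f2": 6.0, "f3": 9.0}, 9.0)
print(sorted(R))  # ['f2', 'f3']
```

Because the bound is the maximum over all paths into the fan-out node, a path that is not the slowest one typically has several processors passing this test, which leaves room for the selection in S280.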

S280, obtaining all best candidate processor sets of pm of supernode i from all fan-out nodes of supernode i, and obtaining the target processor to which supernode i is copied from all best candidate processor sets of pm of supernode i.

As a preferred embodiment, the step of obtaining the target processor in S280 includes:

S282, obtaining W best candidate processor sets $R^{i}_{t} = \{R^{i}_{t1}, R^{i}_{t2}, \ldots, R^{i}_{tw}, \ldots, R^{i}_{tW}\}$ of pm of supernode i from all W fan-out nodes of pm of supernode i, where $R^{i}_{tw}$ is the best candidate processor set of pm of supernode i obtained from the w-th fan-out node, and w ranges from 1 to W. The output port of supernode i is treated as the object of optimization when the best candidate processors are obtained.

S283, counting the number of times each processor appears in $R^{i}_{t}$ over every fan-out w, and selecting the processor that appears most often in $R^{i}_{t}$ as the target processor to which pm of supernode i is copied.

S284, taking the target processor to which pm of supernode i is copied as the target processor of supernode i. Because supernode i is a single unit, once the target processor for the output port of supernode i has been obtained, the output port passes the target processor to the other ports of supernode i, so that all ports belonging to supernode i are copied to the same target processor.

Referring again to FIG. 2, suppose node i has multiple fan-out nodes, fan-out node j and fan-out node h, and port pm of supernode i connects to port pv of supernode h to form candidate path $arc_{i,p_m \to h,p_v}$. Fan-out node j selects, from back to front through candidate path $arc_{i,p_m \to j,p_n}$, the processors to which pm may be copied, and fan-out node h does the same through candidate path $arc_{i,p_m \to h,p_v}$; together these give the best processor sets to which pm may be copied. Each time a processor is selected, its weight increases by one, and the candidate processor with the largest weight satisfies the copying requirement of the most fan-out nodes, which reduces the copying cost. It should be noted that, after the target processor for pm has been obtained, pm passes it to the other ports in the same supernode, and the relevant ports of supernode i and the circuit modules between them are copied together. The copying cost is the copied area: each supernode consists of a logic circuit or is a circuit unit with multiple output ports, every copy occupies hardware resources in the receiving processor, and the more supernodes are copied, the greater the copying cost.

As a preferred embodiment, S200 further comprises: S250, obtaining the worst negative timing margin (worst negative slack) $arrBnd_{i,f_a,p_m}$ of pm of node i copied at fa, and updating the optimal arrival time $arr^{i}_{f_a,p_m}$ to $arrBnd_{i,f_a,p_m}$. S260 is executed after S250. Relaxing the arrival time in this way admits more qualifying processors into the best candidate processor sets, so that more qualifying processors are available when the target processor is subsequently obtained. For example, suppose supernode i in S282 has three fan-out nodes. When S250 is not executed, the 3 best candidate processor sets corresponding to pm of supernode i are {fa, fb, fc}, {fa, fd, fe} and {fg}, and the final target processors are fa and fg. When S250 is executed, more arrival times satisfy the condition, the best candidate processor sets become {fa, fb, fc}, {fa, fc, fd, fe} and {fg, fc}, and the target processor obtained is fc. With more qualifying best candidate processors, a better optimization result can be obtained while still meeting the system requirement.
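The vote over best candidate processor sets and the effect of S250 on it can be traced with the sets from the example above. The sketch below makes one assumption that the text does not spell out: to reproduce the outcome "fa and fg", the most frequent processor is picked repeatedly until every fan-out node is served by some chosen processor. The function name `pick_targets` and the alphabetical tie-break are likewise hypothetical.

```python
from collections import Counter


def pick_targets(best_sets):
    """Repeatedly pick the processor that appears in the most unserved sets."""
    remaining = [set(s) for s in best_sets]
    targets = []
    while remaining:
        counts = Counter(f for s in remaining for f in s)
        # sorted() makes ties deterministic: the alphabetically first name wins.
        winner = max(sorted(counts), key=counts.get)
        targets.append(winner)
        remaining = [s for s in remaining if winner not in s]
    return targets


without_s250 = [{"fa", "fb", "fc"}, {"fa", "fd", "fe"}, {"fg"}]
with_s250 = [{"fa", "fb", "fc"}, {"fa", "fc", "fd", "fe"}, {"fg", "fc"}]
print(pick_targets(without_s250))  # ['fa', 'fg']
print(pick_targets(with_s250))     # ['fc']
```

With the relaxed sets a single copy into fc serves all three fan-out nodes, whereas the unrelaxed sets need two copies, which is the reduction in copying cost that the example describes.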

The worst negative timing margin $arrBnd_{i,f_a,p_m}$ of pm of node i copied at fa is obtained by the following steps:

S252, taking node i as the target node i, acquiring the R fan-out nodes of target node i, and acquiring the delay of target node i on the candidate path to each fan-out node and the worst negative timing margin of each fan-out node. It should be noted that the worst negative timing margin of each supernode output port is calculated successively from back to front, starting from the worst negative timing margin of the end point, which is a preset value.

S254, calculating R candidate worst negative timing margins of target node i, where the candidate worst negative timing margin $arrBnd_{i,f_a,p_m}$ for pm of target node i and pn of fan-out node j satisfies: $arrBnd_{i,f_a,p_m} = arrBnd_{j,f_b,p_n} - d^{f_a,f_b}_{i,p_m \to j,p_n}$, where $arrBnd_{j,f_b,p_n}$ is the worst negative timing margin of pn of fan-out node j, and $d^{f_a,f_b}_{i,p_m \to j,p_n}$ is the delay on candidate path $arc_{i,p_m \to j,p_n}$ when port pm of target node i is copied to fa and port pn of node j is copied to processor fb.

S256, taking the minimum of the R candidate worst negative timing margins to obtain $arrBnd_{i,f_a,p_m}$.

When the next-stage fan-out node y is an end point, $Arr0_{y}$ is a preset value.
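Steps S252 to S256 amount to a backward pass: each fan-out node contributes its own bound minus the delay of the connecting path, and the tightest (smallest) contribution wins. A minimal sketch with a hypothetical function name and made-up numbers:

```python
def arr_bound(fanouts):
    """fanouts: list of (bound of the fan-out port, delay on the candidate path) pairs.

    Returns the minimum over fan-outs of (bound - delay), as in S254 and S256.
    """
    return min(bound - delay for bound, delay in fanouts)


# Hypothetical case with R = 2 fan-out nodes:
# fan-out j has bound 20.0 behind a 3.0 delay, fan-out h has bound 18.0 behind a 5.0 delay.
bound_i = arr_bound([(20.0, 3.0), (18.0, 5.0)])
print(bound_i)  # 13.0, i.e. min(17.0, 13.0)
```

Starting from preset values at the end points and applying this rule node by node from back to front gives a bound for every supernode output port.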

In summary, the first embodiment provides a method for optimizing a critical timing path. For each candidate path between node i and a fan-out node j, the method calculates the arrival times obtained by copying node i to different processors and selects the shortest; the best arrival time of fan-out node j is the maximum of the shortest arrival times over all of its ports, and the target processor to which node i is copied is obtained from that best arrival time. The optimization does not depart from the logic circuit but calculates the shortest arrival time of each node directly on it, so inconsistency between the simulation result and the actual result is avoided. The arrival time of each node can be made the shortest arrival time, that is, the arrival time on each critical path is the shortest arrival time, which improves overall circuit performance.

Based on the same inventive concept as the method of the first embodiment, the first embodiment of the present invention further provides a critical timing path optimization system, which includes a processor and a storage medium communicatively connected to the processor. When the processor executes a program in the storage medium, the system implements the critical timing path optimization method provided in the first embodiment; that method is described in detail above and is not repeated here.

In order to solve the problem 2, the embodiment of the invention further provides a method and a system for extracting the supernode.

Example two

Referring to FIG. 3, a second embodiment provides a method for extracting a supernode, including the following steps:

P100, extracting a directed graph from the timing graph of the sequential circuit, wherein the nodes of the directed graph are the input ports and output ports of circuit devices and the edges of the directed graph are timing arcs.

Optionally, the sequential circuit is distributed among a plurality of processors.

A sequential circuit is a circuit with a memory function. Its memory elements are generally flip-flops, and it consists of combinational circuits and flip-flops.

A timing arc in the timing graph is a delay path between two nodes; it connects a port that sends a signal to a port that receives it. The sending port and the receiving port may be the output port of one cell and the input port of another cell, or the input port and the output port of the same cell. Timing arcs therefore include paths between sequential cells and paths within a sequential cell. A path between sequential cells is a delay path from the output port of one cell to the input port of another cell. A path within a sequential cell is a delay path from an input port of the cell to an output port of the same cell. Types of sequential cells include registers, latches, memories, and the like.
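The directed graph of P100 can be held as two adjacency maps, one for fan-out and one for fan-in, with ports as nodes and timing arcs as weighted edges. The sketch below is illustrative only: the function name `build_graph`, the port naming scheme and the delays are hypothetical.

```python
from collections import defaultdict


def build_graph(timing_arcs):
    """timing_arcs: iterable of (source port, destination port, delay) triples."""
    succ, pred = defaultdict(dict), defaultdict(dict)
    for src, dst, delay in timing_arcs:
        succ[src][dst] = delay   # fan-out view
        pred[dst][src] = delay   # fan-in view, used for back-to-front searches
    return succ, pred


# Hypothetical arcs: two arcs between cells and one arc inside the cell "and1".
arcs = [
    ("ff1/Q", "and1/A", 0.2),   # output port of one cell to input port of another
    ("and1/A", "and1/Y", 0.1),  # input port to output port of the same cell
    ("and1/Y", "ff2/D", 0.3),
]
succ, pred = build_graph(arcs)
```

Storing the fan-in map alongside the fan-out map costs little and makes the backward cone searches of the later steps a direct lookup.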

It should be noted that, because the user design is relatively large, it needs to be placed into a plurality of processors. To obtain a complete timing graph, the timing graph must be extracted as a whole from the sequential circuits in all of the processors.

P200, acquiring the target critical path end points in the directed graph, and extracting, according to a preset delay threshold, all target critical paths that end at each target critical path end point; the edges and nodes on all target critical paths of a target critical path end point form its key cone subgraph. It should be noted that each target critical path end point corresponds to one key cone subgraph.

A predecessor node (front node) is a node in the stage immediately before a target node, and the target node is a fan-out node of the predecessor node.

Optionally, the search algorithm is a depth-first traversal algorithm or a breadth-first traversal algorithm; other prior-art algorithms capable of calculating the arrival time by searching also fall within the protection scope of the present invention.

As a preferred embodiment, the step of obtaining the critical paths includes: acquiring the propagation time of each path in the timing graph, acquiring the delay threshold range of a critical path, and selecting the paths whose propagation time falls within the delay threshold range as critical paths.

As a preferred embodiment, P200 further comprises a step of extracting cone subgraphs: the cone subgraph of each end point is obtained with a search algorithm. The cone subgraph of target critical path end point e is obtained as follows: take target critical path end point e as the target node and traverse all predecessor nodes of the target node to obtain the target predecessor nodes; take each target predecessor node as a new target node and obtain all of its target predecessor nodes; continue searching from back to front in the same way until no predecessor node remains. The region searched is the cone subgraph of target critical path end point e.

The key cone subgraph of target critical path end point e is extracted in P200 in the same way, except that a predecessor node becomes a target predecessor node only when, according to the preset delay threshold, the propagation time between it and the target node is within the delay threshold range. Each target predecessor node is then taken as a new target node, and the search continues from back to front until no predecessor node remains; the region searched is the key cone subgraph of end point e.
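Both cone extractions can be sketched as one backward breadth-first search that differs only in which fan-in edges it accepts. The patent text does not fix the exact form of the threshold test, so the sketch takes it as a caller-supplied predicate; the function name `cone_subgraph`, the `keep` parameter and the sample test `delay >= 2.0` are assumptions for illustration.

```python
from collections import deque


def cone_subgraph(pred, end, keep=lambda q, r, delay: True):
    """Backward BFS from `end`; `keep` decides whether edge q -> r is followed."""
    nodes, edges = {end}, set()
    queue = deque([end])
    while queue:
        r = queue.popleft()
        for q, delay in pred.get(r, {}).items():
            if not keep(q, r, delay):
                continue
            edges.add((q, r))
            if q not in nodes:
                nodes.add(q)
                queue.append(q)
    return nodes, edges


# Hypothetical fan-in map: pred[node] = {predecessor: delay}.
pred = {"e": {"a": 5.0, "b": 1.0}, "a": {"s": 4.0}, "b": {"s": 0.5}}
cone = cone_subgraph(pred, "e")                                        # full cone of e
key_cone = cone_subgraph(pred, "e", keep=lambda q, r, d: d >= 2.0)     # assumed threshold test
```

Here the full cone contains all four nodes, while the filtered search drops the branch through b and keeps only e, a and s.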

P300, merging all key cone subgraphs into a key directed graph G_crit, topologically sorting the nodes of G_crit, and visiting them in order to obtain the supernodes. Further, the step of obtaining the supernodes includes:

P310, for a node r, querying all predecessor nodes of node r, and merging each predecessor node that meets the merging condition with node r into a new node r'. The merging condition is: node r and predecessor node q are in the same processor, predecessor node q is a replicable node, and predecessor node q has only one fan-out signal whose only receiving node is node r.

P320, repeating P310 to P320 for the new node r', and ending the merge when no predecessor node of node r' meets the merging condition, thereby obtaining a supernode.
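One way to realize the merging of P310 and P320 is with a union-find (disjoint-set) structure over the nodes in topological order; this data structure is a choice made for the sketch, not something the text prescribes. The function name, argument layout and example graph are hypothetical.

```python
def extract_supernodes(topo_nodes, pred, succ, processor, replicable):
    """Group nodes into supernodes using the merging condition of P310."""
    parent = {n: n for n in topo_nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n

    for r in topo_nodes:                    # predecessors are visited before r
        for q in pred.get(r, ()):
            same_processor = processor[q] == processor[r]
            single_fanout_to_r = list(succ[q]) == [r]
            if same_processor and replicable[q] and single_fanout_to_r:
                parent[find(q)] = find(r)   # q's group joins r's group
    groups = {}
    for n in topo_nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=sorted)


# Hypothetical graph: a -> b -> c, d -> c, d -> e; node e sits in another processor.
topo = ["a", "b", "d", "c", "e"]
succ = {"a": ["b"], "b": ["c"], "d": ["c", "e"], "c": [], "e": []}
pred = {"b": ["a"], "c": ["b", "d"], "e": ["d"]}
processor = {"a": "f0", "b": "f0", "c": "f0", "d": "f0", "e": "f1"}
replicable = {n: True for n in topo}
supernodes = extract_supernodes(topo, pred, succ, processor, replicable)
print(supernodes)  # [{'a', 'b', 'c'}, {'d'}, {'e'}] (element order within each set may vary)
```

Node d has two fan-outs, so it fails the merging condition and stays a supernode of its own, which matches the later remark that multi-fan-out units are placed in separate supernodes.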

As a preferred embodiment, the method further includes P330, dividing the nodes in the timing graph into three classes: key nodes, relevant nodes and irrelevant nodes. A key node is a node in a key cone subgraph; a relevant node is a node in the cone subgraph of any target critical path end point; an irrelevant node is any node in the timing graph that is not a relevant node. Supernodes are then formed according to the class: for key nodes, supernodes are obtained according to P310 to P320; the non-key relevant nodes in each processor are grouped into one supernode; and the irrelevant nodes in each processor are grouped into one supernode. It should be noted that the relevant nodes include the key nodes but are not limited to them, and the irrelevant nodes and the relevant nodes do not intersect.
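The three-way split of P330 and the per-processor grouping of the non-key classes can be sketched with plain set operations. The function names and the example placement are hypothetical.

```python
def classify(all_nodes, cone_nodes, key_cone_nodes):
    """Return (key nodes, non-key relevant nodes, irrelevant nodes)."""
    key = set(key_cone_nodes)
    relevant = set(cone_nodes) | key        # relevant nodes include the key nodes
    return key, relevant - key, set(all_nodes) - relevant


def group_by_processor(nodes, processor):
    """One supernode per processor for the given class of nodes."""
    groups = {}
    for n in sorted(nodes):
        groups.setdefault(processor[n], set()).add(n)
    return groups


# Hypothetical placement of five nodes on two processors.
processor = {"a": "f0", "b": "f0", "c": "f1", "d": "f1", "e": "f0"}
key, non_key_relevant, irrelevant = classify(processor, {"a", "b", "c"}, {"a"})
relevant_supernodes = group_by_processor(non_key_relevant, processor)
irrelevant_supernodes = group_by_processor(irrelevant, processor)
```

Only the key nodes go through the finer-grained merging of P310 to P320; everything else collapses to at most two coarse supernodes per processor, which keeps the graph that the optimizer has to traverse small.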

When path optimization is subsequently performed, data such as the calculation time of the nodes inside a supernode need not be traversed; each supernode is handled as a whole, which reduces computational complexity and the computing resources occupied. Logic units with a single fan-out are merged into one supernode that itself has a single fan-out, so copying duplicates only the relevant logic units and not the logic units on other fan-out paths, which reduces the copied area.

The supernodes obtained by P310 to P320 do not introduce new loops, and they avoid redundant copying, reduce copying cost and save hardware resources. A sequential unit that does not meet the merging condition, that is, a unit with multiple fan-outs, is placed in a supernode of its own. Even though such a supernode has multiple fan-outs, it contains only the single multi-fan-out logic unit, so copying it does not copy other related circuits along with it; redundant copying is avoided and the copied area is reduced. A predecessor node with a single fan-out is merged to create a supernode; this does not create multi-output nodes, so the supernode copied is single fan-out and unrelated logic units are decoupled from its fan-out path. Only the relevant logic units are copied and the logic units on other fan-out paths are not, which minimizes the copied area and avoids redundant copying.

As a preferred embodiment, the timing arcs inside a supernode are the timing paths between the boundary nodes of that supernode, where a boundary node is a node inside the supernode that is connected to a node outside the supernode. The internal nodes of a supernode comprise boundary nodes and non-boundary nodes; a non-boundary node is a node that has no connection to any node outside the supernode.
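The boundary/non-boundary distinction can be made concrete with a short sketch. The function name and the edge-list representation below are assumptions chosen for illustration only.

```python
def split_boundary(members, edges):
    """Split the nodes of one supernode into (boundary, non_boundary) sets.

    A node is a boundary node if at least one of its edges crosses the
    supernode border, in either direction.
    """
    members = set(members)
    boundary = set()
    for src, dst in edges:
        if (src in members) != (dst in members):  # edge crosses the border
            boundary.add(src if src in members else dst)
    return boundary, members - boundary


# Hypothetical supernode {b, c, d}: b is driven from outside, d drives outside.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
boundary, non_boundary = split_boundary({"b", "c", "d"}, edges)
```

Here `b` and `d` are boundary nodes and `c` is a non-boundary node, so the only timing arc that the supernode needs to expose runs from `b` to `d`.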

As a preferred embodiment, the method further comprises:

P400, updating the timing diagram with the supernodes, wherein the nodes of the updated timing diagram are supernodes; each supernode comprises boundary nodes, which are connected to nodes outside the supernode, and non-boundary nodes, which are not; the connections between the boundary nodes of a supernode and external supernodes form the edges of the updated timing diagram.
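A minimal sketch of the P400 update is given below, assuming the original timing diagram is an edge list and that a hypothetical mapping `supernode_of` assigns every original node to its supernode. Each edge of the updated diagram records both supernodes and the boundary nodes it connects.

```python
def collapse_to_supernodes(edges, supernode_of):
    """Return the edges of the updated timing diagram.

    Each result edge is a tuple
    (src_supernode, src_boundary_node, dst_supernode, dst_boundary_node).
    """
    updated = set()
    for src, dst in edges:
        s, d = supernode_of[src], supernode_of[dst]
        if s != d:  # arcs inside one supernode do not become edges
            updated.add((s, src, d, dst))
    return updated


# Hypothetical example with two supernodes S0 = {a, b} and S1 = {c, d}.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
supernode_of = {"a": "S0", "b": "S0", "c": "S1", "d": "S1"}
updated_edges = collapse_to_supernodes(edges, supernode_of)
```

Only the connection from `b` to `c` crosses a supernode border, so the updated diagram contains a single edge from `S0` to `S1`.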

The method for optimizing the critical timing path according to the first embodiment is executed on the updated timing diagram, which reduces the computing resources, the computational complexity, and the replicated area, saves hardware resources, and avoids introducing additional timing paths through the replication of unrelated circuits.

In summary, the second embodiment provides a supernode extraction method: a directed graph is extracted from the timing diagram of a sequential circuit, and for each target end point a key cone subgraph ending at that end point is extracted from the timing diagram; all key cone subgraphs are merged into a key directed graph; the nodes are visited in the topological order of the key directed graph, each front (predecessor) node that meets the merging condition is merged with the current node, the merged node is then treated as a new node whose front nodes are visited in turn, and so on until the supernodes are obtained; sequential units that do not meet the merging condition, and fixed nodes, each form a supernode on their own. The supernodes obtained by this extraction method reduce the computational complexity and the consumption of computing resources during path optimization, and because the created supernodes do not introduce any new loop path, redundant replication is avoided, replication cost is reduced, and hardware resources are saved.
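The merging idea can be illustrated with a simplified sketch. It assumes a merging condition that this passage does not spell out in full: a front node is merged into its successor when it has exactly one fan-out and neither end of the edge is a fixed node. It uses a union-find structure instead of an explicit topological traversal, so it shows the grouping only and does not reproduce the visiting order of P310-P320. All names are hypothetical.

```python
from collections import defaultdict


def extract_supernodes(nodes, edges, fixed=frozenset()):
    """Group nodes into supernodes under the assumed single-fan-out rule."""
    succs = defaultdict(set)
    for src, dst in edges:
        succs[src].add(dst)

    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for src, dst in edges:
        # Assumed merging condition: single fan-out, no fixed node involved.
        if len(succs[src]) == 1 and src not in fixed and dst not in fixed:
            parent[find(src)] = find(dst)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical chain a -> b -> c, where c fans out to both d and e.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e")]
merged = extract_supernodes(nodes, edges)
with_fixed = extract_supernodes(nodes, edges, fixed={"a"})
```

In the example, `a` and `b` each have a single fan-out and are absorbed into `c`, whereas `c` has two fan-outs and is never merged forward. Since every path leaving a merged front node passes through its only successor, contracting such an edge cannot close a new loop in an acyclic graph.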

Based on the same inventive concept as the above method embodiment, the present invention further provides a supernode extraction system comprising a processor and a storage medium communicatively connected to the processor, wherein, when the processor executes a program in the storage medium, the system implements the supernode extraction method provided by the second embodiment. That method is described in detail in the second embodiment and is not repeated here.

While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims

1. A method of optimizing a critical timing path, the method comprising:

S100, acquiring a plurality of supernodes in a critical timing path, wherein a supernode i has a port set Pi and a candidate processor set Fi and consists of one or more circuit modules in the same processor; the m-th port in Pi is pm, and the a-th candidate processor fa in Fi is a candidate position to which the supernode i can be copied; m ranges from 1 to M, where M is the total number of ports of the supernode i; a ranges from 1 to A, where A is the total number of candidate processors of the supernode i;

S240, calculating the arrival time sets of all candidate paths of the fan-out node j, and calculating, according to the arrival time sets, the optimal arrival time $arr^{j}_{fb,pn}$ of pn when the fan-out node j is located at fb;

S260, obtaining, from $Arr^{j}_{i,pm,fb,pn}$, all arrival times that satisfy $arr^{j}_{i,fa,pm,fb,pn} \le arr^{j}_{fb,pn}$, and taking the set of processors to which pm of the supernode i is copied for those arrival times as the optimal candidate processor set of pm of the supernode i;

S280, acquiring all optimal candidate processor sets of pm of the supernode i according to all fan-out nodes of the supernode i, and acquiring the target processor to which the supernode i is copied according to all optimal candidate processor sets of pm of the supernode i;

the step of obtaining the target processor in S280 includes:

S282, obtaining, according to all W fan-out nodes of pm of the supernode i, W optimal candidate processor sets of pm of the supernode i, $R^{i}_{t} = \{R^{i}_{t1}, R^{i}_{t2}, \ldots, R^{i}_{tw}, \ldots, R^{i}_{tW}\}$, wherein $R^{i}_{tw}$ is the optimal candidate processor set of pm of the supernode i obtained according to the w-th fan-out node, and w ranges from 1 to W;

S283, counting the number of times each processor appears in $R^{i}_{t}$; for each fan-out node w, selecting the processor with the highest occurrence count in $R^{i}_{t}$ as the target processor to which pm of the supernode i is copied;

S284, taking the target processor to which pm of the supernode i is copied as the target processor of the supernode i.
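For illustration only, steps S260 and S282-S283 can be read as a threshold filter followed by a vote. The sketch below is not part of the claim and rests on assumptions: arrival times for one fan-out node are held in a dictionary keyed by candidate processor, the function names are hypothetical, and ties in the vote are broken by processor name, which the claim does not specify.

```python
from collections import Counter


def optimal_candidates(arrivals, bound):
    """S260 (sketch): keep processors fa whose arrival time is within the bound.

    `arrivals` maps each candidate processor fa to the arrival time at pn of
    one fan-out node; `bound` is the optimal arrival time of that pn.
    """
    return {fa for fa, t in arrivals.items() if t <= bound}


def target_processor(candidate_sets):
    """S282-S283 (sketch): pick the processor that occurs in the most sets.

    `candidate_sets` holds one optimal candidate processor set per fan-out
    node and must contain at least one processor overall.
    """
    counts = Counter(fa for s in candidate_sets for fa in s)
    best = max(counts.values())
    # Assumed tie-break: smallest processor name among the most frequent.
    return min(fa for fa, c in counts.items() if c == best)


# Hypothetical data: three fan-out nodes, candidate processors F0..F2.
first = optimal_candidates({"F0": 5.0, "F1": 3.0, "F2": 4.0}, bound=4.0)
chosen = target_processor([first, {"F2"}, {"F0", "F2"}])
```

In this example the first fan-out node admits `F1` and `F2`, and `F2` appears in all three sets, so it is chosen as the target processor.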

2. The method of claim 1, wherein in S220, $arr^{j}_{i,fa,pm,fb,pn}$ satisfies:

$$arr^{j}_{i,fa,pm,fb,pn} = arr^{i}_{fa,pm} + d^{fa,fb}_{i,pm \to j,pn};$$

wherein $arr^{i}_{fa,pm}$ is the optimal arrival time of pm of the supernode i when replicated at fa; $d^{fa,fb}_{i,pm \to j,pn}$ is the delay on the candidate path $arc_{i,pm \to j,pn}$ when pm of the supernode i is replicated at fa and pn of the fan-out node j is replicated at fb.

3. The method according to claim 1, wherein in S240, $arr^{j}_{fb,pn}$ satisfies:

$$arr^{j}_{fb,pn} = \max_{i,pm}\Bigl(\min_{fa}\bigl(arr^{j}_{i,fa,pm,fb,pn}\bigr)\Bigr);$$

wherein $\min_{fa}(arr^{j}_{i,fa,pm,fb,pn})$ is the shortest of the arrival times obtained by replicating pm of the supernode i to each candidate processor in Fi; $\max_{i,pm}(\cdot)$ takes the maximum of the shortest arrival times over all candidate paths of the fan-out node j.
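The relations in claims 2 and 3 can be checked numerically with a small sketch. It is illustrative only: the data layout (one `(arr_i, delay)` pair per fan-in port, with `delay` keyed by `(fa, fb)`) and the function names are assumptions, and the numbers are invented.

```python
def candidate_arrival(arr_i, delay, fa, fb):
    """Claim 2 relation: arrival at pn when pm sits at fa and pn sits at fb."""
    return arr_i[fa] + delay[(fa, fb)]


def optimal_arrival(fan_ins, fb):
    """Claim 3 relation: max over fan-in ports of the min over candidates fa.

    `fan_ins` is a list of (arr_i, delay) pairs, one per fan-in port (i, pm):
    `arr_i` maps fa to the optimal arrival time of pm replicated at fa, and
    `delay` maps (fa, fb) to the delay of the candidate path.
    """
    return max(
        min(candidate_arrival(arr_i, delay, fa, fb) for fa in arr_i)
        for arr_i, delay in fan_ins
    )


# Hypothetical data: crossing between processors costs 4.0, staying costs 1.0.
port_1 = ({"F0": 2.0, "F1": 3.0}, {("F0", "F1"): 4.0, ("F1", "F1"): 1.0})
port_2 = ({"F0": 1.0}, {("F0", "F1"): 4.0})
result = optimal_arrival([port_1, port_2], fb="F1")
```

For the first port the best choice is to replicate at `F1` (3.0 + 1.0 = 4.0); the second port can only sit at `F0` (1.0 + 4.0 = 5.0); the later of the two, 5.0, is the optimal arrival time at `F1`.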

4. The method of claim 1, wherein the fixed nodes comprise user-specified non-movable circuit elements, a start node and an end node for each critical timing path.

5. The method of claim 2, wherein S200 further comprises:

S250, obtaining the worst negative timing margin $arrBnd_{i,fa,pm}$ of pm of the node i when replicated at fa, and updating the optimal arrival time $arr^{i}_{fa,pm}$ of pm of the node i replicated at fa to $arrBnd_{i,fa,pm}$.

6. The method of claim 5, wherein the step of acquiring the worst negative timing margin $arrBnd_{i,fa,pm}$ comprises:

S252, taking the node i as a target node i, acquiring the R fan-out nodes of the target node i, and acquiring the delay of the target node i on the candidate path to each fan-out node and the worst negative timing margin of each fan-out node;

S254, calculating R candidate worst negative timing margins of the target node i, wherein the candidate worst negative timing margin $arrBnd_{i,fa,pm}$ between pm of the target node i and pn of the fan-out node j satisfies: $arrBnd_{i,fa,pm} = arrBnd_{j,fb,pn} - d^{fa,fb}_{i,pm \to j,pn}$, wherein $arrBnd_{j,fb,pn}$ is the worst negative timing margin of pn of the fan-out node j, and $d^{fa,fb}_{i,pm \to j,pn}$ is the delay on the candidate path $arc_{i,pm \to j,pn}$ when port pm of the target node i is replicated at fa and port pn of the node j is replicated at fb;

7. A critical timing path optimization system comprising a processor and a storage medium in communication with the processor, wherein the system is operable to implement the method of any of claims 1-6 when the processor executes a program in the storage medium.