CN113420520A

CN113420520A - Integrated circuit device design simulation method, device, apparatus and readable storage medium

Info

Publication number: CN113420520A
Application number: CN202110723085.6A
Authority: CN
Inventors: 王斌; 潘于; 陈俊
Original assignee: Haiguang Information Technology Co Ltd
Current assignee: Haiguang Information Technology Co Ltd
Priority date: 2021-06-25
Filing date: 2021-06-25
Publication date: 2021-09-21
Anticipated expiration: 2041-06-25
Also published as: CN113420520B

Abstract

The present disclosure provides an integrated circuit device design simulation method, apparatus, device, and readable storage medium. The design simulation method for the integrated circuit device comprises the following steps: obtaining a run queue constructed based on a plurality of modules of an integrated circuit device, wherein the run queue comprises at least one module selected from the plurality of modules, and in the run queue the at least one module is arranged in sequence based on its execution hierarchy; determining available processors; and running, using the available processors, the simulation functions of the modules in the run queue for simulation verification of the integrated circuit device.

Description

Integrated circuit device design simulation method, device, apparatus and readable storage medium

Technical Field

Embodiments of the present disclosure relate to the field of integrated circuit technologies, and in particular, to a method, an apparatus, a device, and a readable storage medium for design simulation of an integrated circuit device.

Background

As integrated circuits continue to grow in complexity, so do their designs. For example, demand for computing power is rising rapidly; general-purpose graphics processing units (GPGPUs) are very widely used because of their parallel-computing design; and integrated circuits are expected to iterate ever faster. The quality of the integrated circuit's model plays a critical role throughout the circuit design.

Owing to the nature of integrated circuit design, models are generally built from interfaces, modules, registers, and clocks that serve transaction processing. Developers usually abstract these common components into a unified model framework and then fill in the model's functions on top of that framework, which eases subsequent design steps such as simulation verification and physical design. A large integrated circuit such as a GPGPU has many modules, many interfaces, and extensive data interaction, so even simulating a GPGPU at the model-framework level consumes a great deal of verification time, and co-simulation with the design takes even longer. Accelerating model simulation is therefore critical in the design of an integrated circuit.

Disclosure of Invention

Some embodiments of the present disclosure provide a design simulation method, apparatus, device, and readable storage medium for an integrated circuit device, for increasing the model simulation rate of the integrated circuit.

According to an aspect of the present disclosure, a design simulation method for an integrated circuit device is provided. The design simulation method comprises the following steps: obtaining a run queue constructed based on a plurality of modules of an integrated circuit device, wherein the run queue comprises at least one module selected from the plurality of modules, and in the run queue the at least one module is arranged in sequence based on its execution hierarchy; determining available processors; and running, using the available processors, the simulation functions of the modules in the run queue for simulation verification of the integrated circuit device.

According to some embodiments of the present disclosure, the run queue is built from the plurality of modules of the integrated circuit device as follows: obtaining a hierarchical structure among the plurality of modules of the integrated circuit device; generating, based on the hierarchical structure, a multi-way tree structure corresponding to the at least one module, wherein the multi-way tree structure represents different modules by different nodes, the module represented by each child node is contained in the module represented by that child node's parent node, and the execution hierarchy of the at least one module is represented by the connection order between child nodes and parent nodes, each node having a node depth; and traversing the nodes in the multi-way tree structure based on node depth to construct the run queue.
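A minimal Python sketch of the first two steps, under stated assumptions: the hierarchy is given as a hypothetical `HIERARCHY` dictionary mapping each module name to the modules it directly contains (loosely modeled on FIG. 1, with sub-modules 1 to 4 assumed to sit under module 1), and node depth is counted from 1 at the root. The helper names `build_tree` and `node_depths` are illustrative, not taken from the disclosure.

```python
from collections import deque

# Hypothetical hierarchy: module name -> names of the modules it directly contains.
HIERARCHY = {
    "top": ["module1", "module2"],
    "module1": ["sub1", "sub2", "sub3", "sub4"],
    "module2": [],
}


def build_tree(hierarchy):
    """Return parent and children maps describing the multi-way tree."""
    children = {name: list(kids) for name, kids in hierarchy.items()}
    parent = {}
    for name, kids in hierarchy.items():
        for kid in kids:
            children.setdefault(kid, [])  # modules with no entry are leaves
            parent[kid] = name
    for name in children:
        parent.setdefault(name, None)  # nodes without a parent are roots
    return parent, children


def node_depths(parent, children):
    """Breadth-first walk from the root(s); a root has node depth 1."""
    roots = [name for name, p in parent.items() if p is None]
    depth = {root: 1 for root in roots}
    todo = deque(roots)
    while todo:
        node = todo.popleft()
        for kid in children[node]:
            depth[kid] = depth[node] + 1
            todo.append(kid)
    return depth


parent, children = build_tree(HIERARCHY)
depth = node_depths(parent, children)
by_depth = {}
for name, d in depth.items():
    by_depth.setdefault(d, []).append(name)
print(by_depth)
# {1: ['top'], 2: ['module1', 'module2'], 3: ['sub1', 'sub2', 'sub3', 'sub4']}
```

Grouping the nodes by depth yields the levels that the traversal walks, from the deepest level (3 here) back to the root.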

According to some embodiments of the present disclosure, the nodes are arranged in the run queue in the connection order, and the run queue contains node information associated with each node. The node information includes a node name, a run indicator, parent node information, and child node information, wherein the run indicator indicates whether the simulation function of the node can run, the parent node information includes the node name of the parent node directly connected to the node, and the child node information includes the node names of the child nodes directly connected to the node.
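The node information can be pictured as a small record. The sketch below is one possible layout, assuming Python; the `NodeInfo` field names and the `RUNNABLE`/`BLOCKED` constants are hypothetical labels for the run indicator values that later embodiments call mark 1 and mark 2.

```python
from dataclasses import dataclass, field
from typing import List, Optional

RUNNABLE = 1  # mark 1: the node's simulation function may run
BLOCKED = 2   # mark 2: the node must wait for its child nodes


@dataclass
class NodeInfo:
    """Node information stored in the run queue for one module."""
    name: str                     # node name
    run_indicator: int            # RUNNABLE or BLOCKED
    parent: Optional[str] = None  # name of the directly connected parent node
    children: List[str] = field(default_factory=list)  # names of direct child nodes

    def can_run(self) -> bool:
        return self.run_indicator == RUNNABLE


leaf = NodeInfo("sub1", RUNNABLE, parent="module1")
inner = NodeInfo("module1", BLOCKED, parent="top",
                 children=["sub1", "sub2", "sub3", "sub4"])
print(leaf.can_run(), inner.can_run())  # True False
```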

According to some embodiments of the present disclosure, traversing the nodes in the multi-way tree structure based on node depth to construct the run queue comprises: following the depth-first principle, visiting the nodes of the multi-way tree in order of decreasing node depth, starting from the leaf nodes with the maximum node depth and ending at the root node with a node depth of 1, so as to construct the run queue, wherein a leaf node is a node without child nodes and the root node is a node without a parent node.

According to some embodiments of the present disclosure, visiting the nodes of the multi-way tree, following the depth-first principle, in order of decreasing node depth from the leaf nodes with the maximum node depth to the root node with a node depth of 1 comprises: storing the node information of the nodes in the run queue in order of node depth, wherein, among nodes with the same node depth, it is determined whether each node is a leaf node or a parent node; the node information of leaf nodes is stored ahead of the node information of parent nodes in the run queue; and the run indicator of a leaf node is set to mark 1 and the run indicator of a parent node is set to mark 2, where mark 1 indicates that the simulation function of the node can run and mark 2 indicates that it cannot yet run.

According to some embodiments of the present disclosure, visiting the nodes of the multi-way tree in order of decreasing node depth, following the depth-first principle, further comprises: determining, among the nodes in the multi-way tree structure, the active nodes that hold data information, and marking the active nodes, together with the parent nodes and child nodes directly connected to them, as active.

According to some embodiments of the present disclosure, storing the node information of the nodes in the run queue according to their node depths comprises: in each simulation clock cycle, storing in the run queue, according to node depth, the node information of only those nodes currently marked as active in the multi-way tree structure, and not storing nodes that are not marked as active.
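The active-state filter can be sketched as follows, assuming the tree is held as hypothetical `PARENT`/`CHILDREN` dictionaries, that `nodes_with_data` lists the nodes holding data in the current simulation clock cycle, and that `DEPTH_ORDER` is an already-computed deepest-first node order. Only the marked nodes are kept, in depth order, for this cycle's run queue.

```python
# Hypothetical tree: parent and children maps (root "top" has no parent).
PARENT = {"module1": "top", "module2": "top",
          "sub1": "module1", "sub2": "module1", "sub3": "module1", "sub4": "module1",
          "top": None}
CHILDREN = {"top": ["module1", "module2"],
            "module1": ["sub1", "sub2", "sub3", "sub4"],
            "module2": [], "sub1": [], "sub2": [], "sub3": [], "sub4": []}


def mark_active(parent, children, nodes_with_data):
    """Mark each node holding data, plus its direct parent and direct children, as active."""
    active = set()
    for node in nodes_with_data:
        active.add(node)
        if parent.get(node) is not None:
            active.add(parent[node])
        active.update(children.get(node, []))
    return active


def active_in_depth_order(depth_order, active):
    """Keep only active nodes for this cycle's run queue, preserving depth order."""
    return [name for name in depth_order if name in active]


DEPTH_ORDER = ["sub1", "sub2", "sub3", "sub4", "module2", "module1", "top"]
active = mark_active(PARENT, CHILDREN, ["sub2"])
print(active_in_depth_order(DEPTH_ORDER, active))  # ['sub2', 'module1']
```

Nodes outside the active set, such as sub-modules 1, 3 and 4 here, are simply not stored for this cycle.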

According to some embodiments of the present disclosure, storing the node information of the nodes in the run queue in order of node depth comprises: for the maximum node depth N: determining the nodes with node depth N and storing their node information in the run queue, wherein the run indicator of each node with node depth N is set to mark 1 and the node name of the parent node directly connected to it is used as its parent node information; and for each node depth M less than N: determining the leaf nodes with node depth M and storing their node information in the run queue, wherein the run indicator of each such leaf node is set to mark 1 and the node name of the parent node directly connected to it is used as its parent node information; and determining the parent nodes with node depth M and storing their node information in the run queue, wherein the run indicator of each such parent node is set to mark 2, the node name of the parent node directly connected to it is used as its parent node information, and the node names of the child nodes directly connected to it are used as its child node information, where N is an integer greater than 1 and M is an integer greater than 1 and less than N.
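The ordering rule can be sketched in Python. Assumptions: the hierarchy is a hypothetical `HIERARCHY` dictionary, the tree has a single root, and the root (depth 1) is stored like any other parent node, which the text leaves implicit. The optional `active` argument applies the active-state filter; restricting each stored parent's child list to stored children is a further assumption, made so that a parent never waits on a node absent from the queue.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

RUNNABLE, BLOCKED = 1, 2  # mark 1 and mark 2


@dataclass
class NodeInfo:
    name: str
    run_indicator: int
    parent: Optional[str] = None
    children: List[str] = field(default_factory=list)


def build_run_queue(hierarchy: Dict[str, List[str]],
                    active: Optional[Set[str]] = None) -> List[NodeInfo]:
    """Store nodes from the deepest level up to the root, leaves before parents."""
    parent = {kid: name for name, kids in hierarchy.items() for kid in kids}
    root = next(name for name in hierarchy if name not in parent)  # assumes one root
    depth, order, todo = {root: 1}, [], deque([root])
    while todo:  # breadth-first walk assigns node depths
        node = todo.popleft()
        order.append(node)
        for kid in hierarchy.get(node, []):
            depth[kid] = depth[node] + 1
            todo.append(kid)

    keep = set(order) if active is None else set(active)

    def parent_of(name):
        # Record the parent only if it is also stored in the queue.
        p = parent.get(name)
        return p if p in keep else None

    queue = []
    for d in range(max(depth.values()), 0, -1):  # from depth N down to 1
        level = [n for n in order if depth[n] == d and n in keep]
        kids = {n: [k for k in hierarchy.get(n, []) if k in keep] for n in level}
        for n in level:  # leaf nodes first, marked runnable
            if not kids[n]:
                queue.append(NodeInfo(n, RUNNABLE, parent_of(n)))
        for n in level:  # then parent nodes, marked blocked
            if kids[n]:
                queue.append(NodeInfo(n, BLOCKED, parent_of(n), kids[n]))
    return queue


HIERARCHY = {"top": ["module1", "module2"],
             "module1": ["sub1", "sub2", "sub3", "sub4"],
             "module2": []}
for info in build_run_queue(HIERARCHY):
    print(info.name, info.run_indicator, info.parent, info.children)
```

For this hierarchy the queue is sub1 to sub4, module2, module1, top: module2 is a leaf at depth 2, so it is stored ahead of module1, the depth-2 parent that must wait for its four children.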

According to some embodiments of the present disclosure, there are a plurality of available processors, and running the simulation functions of the modules in the run queue using the available processors comprises: running the simulation functions of multiple modules in the run queue in parallel using the plurality of available processors.

According to some embodiments of the present disclosure, running the simulation functions of multiple modules in the run queue in parallel comprises: determining the number of nodes in the run queue whose run indicator is mark 1; and running, in parallel on the available processors, the simulation functions of the nodes in the run queue whose run indicator is mark 1, based on the number of available processors and the number of such nodes.

According to some embodiments of the present disclosure, the number of available processors is denoted S and the number of nodes whose run indicator is mark 1 is denoted T, where S and T are integers greater than or equal to 0, and running the simulation functions of the nodes in the run queue in parallel on the available processors, based on S and T, comprises: when T is less than or equal to S, running the simulation functions of all T nodes marked 1 in the run queue in parallel on the available processors; or, when T is greater than S, running the simulation functions of the first S nodes marked 1 in the run queue in parallel on the available processors.
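The choice between the two cases can be sketched as a small selection function. Assumptions: the run queue is represented as a hypothetical list of (node name, run indicator) pairs in queue order, and `s` is the processor count S.

```python
from typing import List, Tuple

MARK_RUNNABLE = 1  # mark 1


def select_for_dispatch(queue: List[Tuple[str, int]], s: int) -> List[str]:
    """Return the nodes to run in parallel this round.

    `queue` holds (node name, run indicator) pairs in run-queue order and
    `s` is the number of available processors S.
    """
    runnable = [name for name, indicator in queue if indicator == MARK_RUNNABLE]
    t = len(runnable)  # T
    return runnable if t <= s else runnable[:s]


QUEUE = [("sub1", 1), ("sub2", 1), ("sub3", 1), ("sub4", 1),
         ("module2", 1), ("module1", 2), ("top", 2)]
print(select_for_dispatch(QUEUE, 4))  # ['sub1', 'sub2', 'sub3', 'sub4']
print(select_for_dispatch(QUEUE, 8))  # ['sub1', 'sub2', 'sub3', 'sub4', 'module2']
```

Both branches amount to `runnable[:s]`; they are written out separately only to mirror the two cases in the text.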

According to some embodiments of the present disclosure, running the simulation functions of multiple modules in the run queue in parallel further comprises: updating the run queue while running.

According to some embodiments of the present disclosure, updating the run queue while running comprises: denoting each node whose simulation function has finished running as a first node, determining the parent node of the first node from the parent node information of the first node, and updating the node information of that parent node; deleting the node information of the first node from the run queue; and continuing to update in this way until the run queue is empty.

According to some embodiments of the present disclosure, the parent node of the first node is denoted a second node, and updating the node information of the parent node of the first node comprises: deleting the node name of the first node from the child node information of the second node; determining whether the child node information of the second node is empty; and, when it is empty, setting the run indicator of the second node to mark 1.
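The update rule and the loop that repeats it until the run queue is empty can be sketched as below. Assumptions: the run queue is a Python dict keyed by node name (insertion order keeps queue order and gives constant-time lookup of the parent), the entry keys `run`, `parent` and `children` are hypothetical, and nodes are processed in rounds for simplicity, whereas the described method may dispatch a parent as soon as its indicator becomes mark 1.

```python
from typing import Dict, List


def finish_node(queue: Dict[str, dict], name: str) -> None:
    """Update the run queue after node `name` (the first node) has run."""
    first = queue.pop(name)              # delete the first node's information
    second = queue.get(first["parent"])  # its parent, i.e. the second node
    if second is not None:
        second["children"].remove(name)  # drop the first node from the child list
        if not second["children"]:       # child list now empty:
            second["run"] = 1            # the parent becomes runnable (mark 1)


def drain(queue: Dict[str, dict]) -> List[List[str]]:
    """Run rounds of all runnable nodes until the run queue is empty."""
    rounds = []
    while queue:
        ready = [n for n, info in queue.items() if info["run"] == 1]
        if not ready:
            raise RuntimeError("no runnable node: inconsistent run queue")
        rounds.append(ready)
        for n in ready:  # a real scheduler would run these in parallel
            finish_node(queue, n)
    return rounds


def example_queue() -> Dict[str, dict]:
    """Hypothetical run queue for a top / module1 / module2 / sub1-sub4 tree."""
    queue = {n: {"run": 1, "parent": "module1", "children": []}
             for n in ["sub1", "sub2", "sub3", "sub4"]}
    queue["module2"] = {"run": 1, "parent": "top", "children": []}
    queue["module1"] = {"run": 2, "parent": "top",
                        "children": ["sub1", "sub2", "sub3", "sub4"]}
    queue["top"] = {"run": 2, "parent": None, "children": ["module1", "module2"]}
    return queue


print(drain(example_queue()))
# [['sub1', 'sub2', 'sub3', 'sub4', 'module2'], ['module1'], ['top']]
```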

According to another aspect of the present disclosure, there is also provided a design simulation apparatus for an integrated circuit device, including: a run queue obtaining unit configured to obtain a run queue constructed based on a plurality of modules of an integrated circuit device, wherein the run queue comprises at least one module selected from the plurality of modules, and in the run queue the at least one module is arranged in sequence based on its execution hierarchy; a computing unit configured to determine available processors; and a simulation verification unit configured to run, using the available processors, the simulation functions of the modules in the run queue for simulation verification of the integrated circuit device.

According to some embodiments of the present disclosure, the design simulation apparatus further includes a run queue building unit. The run queue building unit is configured to: obtain a hierarchical structure among the plurality of modules of the integrated circuit device; generate, based on the hierarchical structure, a multi-way tree structure corresponding to the at least one module, wherein the multi-way tree structure represents different modules by different nodes, the module represented by each child node is contained in the module represented by that child node's parent node, and the execution hierarchy of the at least one module is represented by the connection order between child nodes and parent nodes, each node having a node depth; and traverse the nodes in the multi-way tree structure based on node depth to construct the run queue, wherein the nodes are arranged in the run queue in the connection order and the run queue contains node information associated with each node, the node information including a node name, a run indicator, parent node information, and child node information, wherein the run indicator indicates whether the simulation function of the node can run, the parent node information includes the node name of the parent node directly connected to the node, and the child node information includes the node names of the child nodes directly connected to the node.

According to some embodiments of the present disclosure, the run queue building unit is configured to: following the depth-first principle, visit the nodes of the multi-way tree in order of decreasing node depth, starting from the leaf nodes with the maximum node depth and ending at the root node with a node depth of 1, so as to construct the run queue, wherein a leaf node is a node without child nodes and the root node is a node without a parent node.

According to some embodiments of the present disclosure, the run queue building unit is configured to: store the node information of the nodes in the run queue in order of node depth, wherein, among nodes with the same node depth, it is determined whether each node is a leaf node or a parent node; the node information of leaf nodes is stored ahead of that of parent nodes in the run queue; and the run indicator of a leaf node is set to mark 1 and that of a parent node to mark 2, where mark 1 indicates that the simulation function of the node can run and mark 2 indicates that it cannot yet run.

According to some embodiments of the present disclosure, the run queue building unit is further configured to: determine, among the nodes in the multi-way tree structure, the active nodes that hold data information, and mark the active nodes, together with the parent nodes and child nodes directly connected to them, as active.

According to some embodiments of the present disclosure, the run queue building unit is configured to: in each simulation clock cycle, store in the run queue, according to node depth, the node information of only those nodes currently marked as active in the multi-way tree structure, and not store nodes that are not marked as active.

According to some embodiments of the present disclosure, the run queue building unit is configured to: for the maximum node depth N: determine the nodes with node depth N and store their node information in the run queue, wherein the run indicator of each node with node depth N is set to mark 1 and the node name of the parent node directly connected to it is used as its parent node information; and for each node depth M less than N: determine the leaf nodes with node depth M and store their node information in the run queue, wherein the run indicator of each such leaf node is set to mark 1 and the node name of the parent node directly connected to it is used as its parent node information; and determine the parent nodes with node depth M and store their node information in the run queue, wherein the run indicator of each such parent node is set to mark 2, the node name of the parent node directly connected to it is used as its parent node information, and the node names of the child nodes directly connected to it are used as its child node information, where N is an integer greater than 1 and M is an integer greater than 1 and less than N.

According to some embodiments of the present disclosure, there are a plurality of available processors, and the simulation verification unit is configured to run the simulation functions of multiple modules in the run queue in parallel using the plurality of available processors.

According to some embodiments of the present disclosure, the simulation verification unit is configured to: determine the number of nodes in the run queue whose run indicator is mark 1; and run, in parallel on the available processors, the simulation functions of the nodes in the run queue whose run indicator is mark 1, based on the number of available processors and the number of such nodes.

According to some embodiments of the present disclosure, the number of available processors is denoted S and the number of nodes whose run indicator is mark 1 is denoted T, where S and T are integers greater than or equal to 0, and the simulation verification unit is configured to: when T is less than or equal to S, run the simulation functions of all T nodes marked 1 in the run queue in parallel on the available processors; or, when T is greater than S, run the simulation functions of the first S nodes marked 1 in the run queue in parallel on the available processors.

According to some embodiments of the present disclosure, the simulation verification unit is further configured to update the run queue while running.

According to some embodiments of the present disclosure, the simulation verification unit is configured to: denote each node whose simulation function has finished running as a first node, determine the parent node of the first node from the parent node information of the first node, and update the node information of that parent node; delete the node information of the first node from the run queue; and continue updating in this way until the run queue is empty.

According to some embodiments of the present disclosure, the parent node of the first node is denoted a second node, and the simulation verification unit is configured to: delete the node name of the first node from the child node information of the second node; determine whether the child node information of the second node is empty; and, when it is empty, set the run indicator of the second node to mark 1.

According to yet another aspect of the present disclosure, there is also provided a design simulation apparatus for an integrated circuit device. The design simulation apparatus for an integrated circuit device includes: a processor; and a memory, wherein the memory has stored therein computer readable code which, when executed by the processor, performs a design simulation method for an integrated circuit device as described above.

According to yet another aspect of the present disclosure, there is also provided a non-transitory computer readable storage medium having stored thereon instructions that, when executed by a processor, cause the processor to perform the design simulation method for an integrated circuit device as described above.

With the design simulation method, apparatus, device, and readable storage medium for an integrated circuit device provided by the embodiments of the present disclosure, simulation verification of the integrated circuit device can be performed based on a run queue constructed from at least one module selected from the plurality of modules of the integrated circuit device, which increases the model simulation rate of the integrated circuit and shortens its design cycle.

Drawings

In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 shows a schematic diagram of a model framework of an integrated circuit;

FIG. 2A shows a flow diagram of a design simulation method according to an embodiment of the present disclosure;

FIG. 2B illustrates a flow diagram for building a run queue according to an embodiment of the disclosure;

FIG. 3 shows a schematic diagram of a multi-way tree structure corresponding to the model framework in FIG. 1;

FIG. 4 illustrates a schematic diagram of node information for a run queue according to an embodiment of the disclosure;

FIG. 5 shows a schematic diagram of a multi-way tree structure with a maximum node depth of 5;

FIG. 6 illustrates an overall flow diagram of a design simulation method according to an embodiment of the present disclosure;

FIG. 7 shows a schematic block diagram of a design simulation apparatus according to an embodiment of the present disclosure;

FIG. 8 shows a schematic block diagram of a design simulation device according to an embodiment of the present disclosure;

FIG. 9 shows a schematic diagram of an architecture of an exemplary computing device, in accordance with embodiments of the present disclosure;

FIG. 10 shows a schematic diagram of a non-transitory computer-readable storage medium according to an embodiment of the disclosure.

Detailed Description

The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. It is to be understood that the described embodiments are merely some, not all, of the embodiments of the present disclosure. All other embodiments that a person skilled in the art can derive from the embodiments disclosed herein without inventive effort are intended to fall within the scope of the present disclosure.

Furthermore, as used in this disclosure and in the claims, unless the context clearly indicates otherwise, the terms "a," "an," "one," and/or "the" do not specifically denote the singular and may also include the plural. The terms "first," "second," and the like used in this disclosure do not indicate any order, quantity, or importance; they are only used to distinguish one element from another. Likewise, the word "comprising" or "comprises" and the like means that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections and may include electrical connections, whether direct or indirect.

For an integrated circuit, the high manufacturing cost and the unrecoverable cost of development time mean that no step in the circuit design process can tolerate even a small error. As integrated circuits grow more complex, the space and range that verification must explore grow as well, which increases the time required for verification. A thorough and fast verification process helps find errors in the design in time, thereby improving the design success rate.

For large integrated circuit structures such as GPGPUs, the model framework of the circuit structure is typically abstracted into multiple clock-based modules with a hierarchical structure. Dividing a design into modules can be understood as splitting a large design into smaller sub-designs; for example, the circuit as a whole may be divided into modules according to the functions they implement. The division may take into account factors such as the number of resulting modules and the signal interconnections between modules.

Schematically, FIG. 1 shows a model framework with a hierarchical structure, here with 3 levels: level 1 contains the top-level module, level 2 contains module 1 and module 2, and level 3 contains sub-module 1, sub-module 2, sub-module 3, and sub-module 4. Data may be transmitted between modules in the model framework; such transmission is shown in FIG. 1 as pipes, e.g., pipe 1, sub-pipe 1, and sub-pipe 2. For example, data may be transmitted from sub-module 1 to sub-module 2 through sub-pipe 1.
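To make the framework concrete, here is a small Python data model of FIG. 1. The class, module, and pipe names are hypothetical; only sub-pipe 1 is modeled because the text gives its endpoints (sub-module 1 to sub-module 2), while the endpoints of pipe 1 and sub-pipe 2 are not stated. Modeling a pipe as a FIFO is also an assumption.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque


@dataclass
class Pipe:
    """Hypothetical one-way data channel between two modules."""
    name: str
    src: str
    dst: str
    fifo: Deque[int] = field(default_factory=deque)

    def send(self, sender: str, value: int) -> None:
        if sender != self.src:
            raise ValueError(f"{sender} is not the source of {self.name}")
        self.fifo.append(value)

    def receive(self, receiver: str) -> int:
        if receiver != self.dst:
            raise ValueError(f"{receiver} is not the destination of {self.name}")
        return self.fifo.popleft()


# Levels of the FIG. 1 framework, from level 1 (top-level module) to level 3.
LEVELS = [["top"], ["module1", "module2"], ["sub1", "sub2", "sub3", "sub4"]]

sub_pipe1 = Pipe("sub_pipe1", src="sub1", dst="sub2")
sub_pipe1.send("sub1", 7)         # sub-module 1 writes a data item
print(sub_pipe1.receive("sub2"))  # 7: sub-module 2 reads it
```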

In the related art, to perform simulation verification of the model framework shown in FIG. 1, the modules in the model framework are verified serially in each simulation clock cycle, in bottom-up order through the hierarchical structure. For example, in one simulation clock cycle, the simulation functions of the modules are executed one by one in the order sub-module 1, sub-module 2, sub-module 3, sub-module 4, module 1, module 2, and the top-level module to verify whether each module achieves its designed function.
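The serial related-art flow can be sketched as a nested loop, assuming the FIG. 1 levels are given as lists (names hypothetical) and `step` stands in for a module's simulation function. Because every module runs one after another, the wall-clock time per simulation clock cycle is the sum of all modules' run times.

```python
from typing import Callable, List

# Levels of FIG. 1, level 1 first.
LEVELS = [["top"], ["module1", "module2"], ["sub1", "sub2", "sub3", "sub4"]]


def serial_order(levels: List[List[str]]) -> List[str]:
    """Bottom-up order: deepest level first, top-level module last."""
    return [name for level in reversed(levels) for name in level]


def simulate_serially(levels: List[List[str]], cycles: int,
                      step: Callable[[str, int], None]) -> None:
    """Related-art flow: one module at a time, in every simulation clock cycle."""
    order = serial_order(levels)
    for cycle in range(cycles):
        for name in order:
            step(name, cycle)  # each module's simulation function, one after another


trace = []
simulate_serially(LEVELS, cycles=2, step=lambda name, cycle: trace.append((cycle, name)))
print(serial_order(LEVELS))
# ['sub1', 'sub2', 'sub3', 'sub4', 'module1', 'module2', 'top']
```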

Although this serial verification method is easy to implement, it severely limits the speed of simulation verification. This is especially true for structurally complex integrated circuits such as GPGPUs, whose model frameworks have many deep hierarchy levels and contain a large number of modules: serial verification makes the verification run time of the whole model too long, which hinders optimizing the integrated circuit design and cannot meet the market demand for ever-shorter integrated circuit design cycles.

In order to improve the model simulation efficiency of an integrated circuit, some embodiments of the present disclosure provide a design simulation method for an integrated circuit device, which shortens the simulation verification time while ensuring the design effect.

It is to be understood that the integrated circuit devices referred to herein may broadly refer to various types of circuit structures, such as a GPGPU, an SoC, or a microprocessor, without limitation. Unless otherwise defined, all terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

To illustrate the design simulation method according to an embodiment of the present disclosure more clearly, FIG. 2A shows a flowchart of the method. The implementation of the design simulation method is described below with reference to FIG. 2A.

As shown in FIG. 2A, first, in step S201, a run queue built based on a plurality of modules of an integrated circuit device is acquired. For example, the integrated circuit device may be a GPGPU, and the plurality of modules may be the modules included in a model framework of the integrated circuit device, such as the one shown in FIG. 1. It is to be understood that the model framework shown in FIG. 1 is only schematic: for different integrated circuit structures the hierarchy may have more than 3 levels, for example several or even tens of levels, and the modules included in the model framework are not limited to those shown in FIG. 1.

According to some embodiments of the present disclosure, the run queue may include at least one module selected from the plurality of modules. For example, the run queue may include some or all of the modules in the model framework. Further, according to some embodiments of the present disclosure, the at least one module in the run queue is ordered based on its execution hierarchy. The execution hierarchy can be understood as the execution-order relationship among the modules shown in FIG. 1; for example, module 1 should execute after sub-modules 1-4 contained in it, i.e., the execution logic should process modules at level 3 before processing modules at level 2. The process of building the run queue is described in detail below.
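The execution-hierarchy constraint can be stated as a check on any candidate ordering. A minimal sketch, assuming a hypothetical `HIERARCHY` dictionary in which sub-modules 1 to 4 belong to module 1, as in the module 1 example in the text: a valid run queue places every contained module before the module that contains it.

```python
from typing import Dict, List


def respects_hierarchy(order: List[str], hierarchy: Dict[str, List[str]]) -> bool:
    """True if every module appears after all of the modules it contains."""
    pos = {name: i for i, name in enumerate(order)}
    return all(pos[kid] < pos[parent]
               for parent, kids in hierarchy.items()
               for kid in kids
               if parent in pos and kid in pos)


HIERARCHY = {"top": ["module1", "module2"],
             "module1": ["sub1", "sub2", "sub3", "sub4"],
             "module2": []}
good = ["sub1", "sub2", "sub3", "sub4", "module2", "module1", "top"]
bad = ["module1", "sub1", "sub2", "sub3", "sub4", "module2", "top"]
print(respects_hierarchy(good, HIERARCHY), respects_hierarchy(bad, HIERARCHY))  # True False
```

Modules missing from the ordering are skipped, so the same check also applies to a run queue that holds only some of the modules.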

Next, in step S202, available processors are determined. Here, the available processors are processors that can be used to perform simulation verification. There may be multiple available processors, and they may also be distributed processors or processors in cloud computing, without limitation.

Next, in step S203, the simulation functions of the modules in the acquired run queue are run using the available processors for simulation verification of the integrated circuit device. The simulation function of a module may be the module's main function, which serves as the entry point of the simulation program: any other functions it calls return control to it, and the simulation program ends when the main function finishes.
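Steps S202 and S203 can be sketched with Python's standard `concurrent.futures` module. Assumptions: the available processors are the local CPUs reported by `os.cpu_count()` (the text also allows distributed or cloud processors), and each module's simulation function is a hypothetical zero-argument callable standing in for its main function.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def available_processors() -> int:
    """Local CPU count as a stand-in for step S202; os.cpu_count() may return None."""
    return os.cpu_count() or 1


def run_simulation_functions(sim_functions: Dict[str, Callable[[], str]],
                             names: List[str], workers: int) -> Dict[str, str]:
    """Step S203 sketch: run each named module's simulation function on a worker pool."""
    with ThreadPoolExecutor(max_workers=max(1, workers)) as pool:
        results = list(pool.map(lambda name: sim_functions[name](), names))
    return dict(zip(names, results))


# Hypothetical simulation functions: each just reports that its module ran.
SIM_FUNCTIONS = {f"sub{i}": (lambda i=i: f"sub{i} ok") for i in range(1, 5)}
print(run_simulation_functions(SIM_FUNCTIONS, ["sub1", "sub2", "sub3", "sub4"],
                               available_processors()))
```

Threads keep the sketch short. Because of CPython's global interpreter lock, CPU-bound simulation functions written in Python would need `ProcessPoolExecutor` or native code to run truly in parallel, and distributed or cloud processors would need a different executor altogether.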

With the design simulation method according to some embodiments of the present disclosure, the simulation verification of the individual modules is no longer performed serially as in the related art; instead, the integrated circuit device is simulated and verified based on a run queue constructed from at least one module selected from its plurality of modules. This helps to improve the model simulation efficiency of the integrated circuit and to shorten its design cycle. Especially for large integrated circuit designs such as a GPGPU, queue-based simulation verification can speed up model simulation and thereby save time in circuit design.

The construction of the run queue according to an embodiment of the present disclosure is described in detail below. Specifically, fig. 2B shows a schematic flowchart of constructing a run queue according to an embodiment of the present disclosure.

According to some embodiments of the present disclosure, to construct the run queue from the plurality of modules of the integrated circuit device, first, in step S210, the hierarchical structure among the plurality of modules is obtained. Here, the hierarchy among the modules may refer to a model framework generated for the integrated circuit to be designed, which consists of a plurality of modules organized hierarchically, for example as shown in fig. 1.

Next, in step S220, a multi-way tree structure corresponding to the at least one module is generated based on the hierarchical structure. In some embodiments according to the present disclosure, the at least one module may be all modules in the model framework or only some of them (for example, only the modules that are to be verified), without limitation.

For convenience of description, the model framework in fig. 1 is taken as an example; the multi-way tree structure constructed for all of its modules is shown in fig. 3. As shown in fig. 3, each node of the multi-way tree structure, drawn as a circle, represents one module in the model framework of the integrated circuit. Nodes are further divided into child nodes and parent nodes, where the module represented by a child node is contained in the module represented by its parent node. For example, the node corresponding to module 1 is the parent node of the nodes corresponding to sub-modules 1-4, and the node corresponding to the top module is the parent node of the nodes corresponding to modules 1 and 2. In other words, in the multi-way tree structure, the execution hierarchy of the at least one module is represented by the node depths and by the connections between child nodes and parent nodes. For example, for the model framework shown in fig. 1, which has a 3-level structure, the maximum node depth of the resulting multi-way tree structure (shown in fig. 3) is 3: nodes with a node depth of 1 correspond to the 1st level, nodes with a node depth of 2 to the 2nd level, and nodes with a node depth of 3 to the 3rd level.
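As a minimal sketch of how node depths could be derived from the hierarchy, the Python below assumes the fig. 3 hierarchy is available as a parent-to-children dictionary (`HIERARCHY` and `node_depths` are hypothetical names, not taken from the disclosure) and assigns depth 1 to the root with a breadth-first walk.

```python
from collections import deque

# Hypothetical parent -> children map mirroring the fig. 3 example.
# Nodes that do not appear as keys are treated as leaves.
HIERARCHY = {
    "top module": ["module 1", "module 2"],
    "module 1": ["sub-module 1", "sub-module 2", "sub-module 3", "sub-module 4"],
    "module 2": [],
}


def node_depths(hierarchy, root):
    """Return {node name: node depth}, with the root at depth 1."""
    depths = {root: 1}
    pending = deque([root])
    while pending:
        node = pending.popleft()
        for child in hierarchy.get(node, []):
            depths[child] = depths[node] + 1  # a child is one level below its parent
            pending.append(child)
    return depths


depths = node_depths(HIERARCHY, "top module")
max_depth = max(depths.values())
print(max_depth)                                   # 3
print(depths["module 2"], depths["sub-module 1"])  # 2 3
```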

Next, as shown in fig. 2B, in step S230, the nodes in the multi-way tree structure are traversed based on node depth to construct the run queue. According to some embodiments of the present disclosure, the nodes are arranged in the run queue according to the connection order between the nodes in the multi-way tree structure, and the run queue includes the node information associated with each node.

Fig. 4 is a schematic diagram of the node information in a run queue according to an embodiment of the present disclosure. As shown in fig. 4, the run queue contains nodes (node 1, node 2, ..., node n) arranged according to the connection order between the nodes, together with the node information corresponding to each of them.

According to some embodiments of the present disclosure, the node information may include a node name, a run indicator, parent node information, and child node information. The node name may be the name of the module corresponding to the node; for example, if node 1 in the run queue corresponds to sub-module 1, its node name may be "sub-module 1". The node name may also be other information identifying the corresponding module, such as a number, without limitation. The run indicator indicates whether the simulation function of the node is runnable; how it is set is described below. The parent node information may include the node name of the parent node directly connected to the node, and the child node information may include the node names of the child nodes directly connected to the node. For example, when node 1 corresponds to sub-module 1, the parent node information of node 1 may contain the name of module 1, which is directly connected to sub-module 1, i.e., "module 1"; since node 1 has no child nodes, its child node information may be empty.
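One possible in-memory form of this node information is sketched below, assuming the 1-bit run indicator encoding described later in this section (1 for flag 1, 0 for flag 2). The `NodeInfo` class and its field names are illustrative assumptions, not the disclosed data layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional

RUNNABLE = 1      # flag 1: the simulation function may run
NOT_RUNNABLE = 0  # flag 2: the node must wait for its children


@dataclass
class NodeInfo:
    name: str                        # node name, e.g. the module name or a number
    run_indicator: int               # RUNNABLE or NOT_RUNNABLE
    parent: Optional[str]            # directly connected parent, None for the root
    children: List[str] = field(default_factory=list)  # directly connected children


# Node 1 from the text: sub-module 1, which has no children and whose parent is module 1.
node1 = NodeInfo(name="sub-module 1", run_indicator=RUNNABLE, parent="module 1")
```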

According to some embodiments of the present disclosure, step S230 may include: following the depth-first principle, visiting the nodes of the multi-way tree in order of decreasing node depth, starting from the leaf nodes with the maximum node depth and ending at the root node with a node depth of 1, so as to construct the run queue. A leaf node is a node without child nodes, and the root node is the node without a parent node. For example, in the multi-way tree structure shown in fig. 3, sub-modules 1 to 4 are leaf nodes with a node depth of 3, module 2 is a leaf node with a node depth of 2, module 1 is a parent node with a node depth of 2, and the top module is the root node with a node depth of 1.

According to some embodiments of the present disclosure, the traversal may store the node information of the nodes in the run queue in order of node depth. Node information of nodes with a greater node depth is stored in the run queue first, because the execution hierarchy of a node with a greater node depth precedes that of a node with a smaller node depth. Further, according to some embodiments of the present disclosure, for nodes with the same node depth, it is first determined whether each node is a leaf node or a parent node; the node information of the leaf nodes is stored in the run queue before that of the parent nodes, the run indicator of each leaf node is set to flag 1, and the run indicator of each parent node is set to flag 2. Flag 1 indicates that the simulation function of the node is runnable, and flag 2 indicates that it is not runnable. As an example, the run indicator may be represented by a 1-bit binary number, e.g., a binary 1 corresponds to flag 1 and a binary 0 corresponds to flag 2.

According to some embodiments of the present disclosure, let N be the maximum node depth; in the multi-way tree structure shown in fig. 3, for example, N is 3. In this case, storing the node information in the run queue in order of node depth includes: determining the nodes with a node depth of N and storing their node information in the run queue, where the run indicator of each such node is set to flag 1 and the node name of the parent node directly connected to it is taken as its parent node information.

In fig. 3, the nodes with a node depth of 3 are those of sub-modules 1 to 4. Such leaf nodes contain no child nodes and can therefore be executed first, so their node information may be stored in the run queue one after another. For each of these nodes, the run indicator is first set to flag 1 to indicate that the simulation functions of these 4 nodes are runnable, and then the node name of the directly connected parent node (i.e., "module 1") may be stored in the run queue as its parent node information.

According to some embodiments of the present disclosure, for a node depth M less than N (i.e., for nodes other than those with the maximum node depth; in fig. 3, M is 1 or 2), storing the node information in the run queue in order of node depth includes the following. First, the leaf nodes with a node depth of M are determined and their node information is stored in the run queue, where the run indicator of each such leaf node is set to flag 1 and the node name of its directly connected parent node is taken as its parent node information. Next, the parent nodes with a node depth of M are determined and their node information is stored in the run queue, where the run indicator of each such parent node is set to flag 2, the node name of the parent node directly connected to it is taken as its parent node information, and the node names of the child nodes directly connected to it are taken as its child node information.
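The construction rules above (deepest level first; within a level, leaf nodes with flag 1 before parent nodes with flag 2) can be sketched as follows. This is an illustrative Python sketch, not the disclosed implementation: it assumes a parent-to-children dictionary as input, encodes flag 1 as 1 and flag 2 as 0, and keeps nodes of the same depth in left-to-right order; `build_run_queue` is a hypothetical name.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional

RUNNABLE, NOT_RUNNABLE = 1, 0  # flag 1 / flag 2 as a 1-bit value


@dataclass
class NodeInfo:
    name: str
    run_indicator: int
    parent: Optional[str]
    children: List[str] = field(default_factory=list)


def build_run_queue(hierarchy: Dict[str, List[str]], root: str) -> List[NodeInfo]:
    # Record each node's depth (root = 1), its parent, and a left-to-right visit order.
    depth = {root: 1}
    parent: Dict[str, Optional[str]] = {root: None}
    order = [root]
    pending = deque([root])
    while pending:
        node = pending.popleft()
        for child in hierarchy.get(node, []):
            depth[child] = depth[node] + 1
            parent[child] = node
            order.append(child)
            pending.append(child)

    queue: List[NodeInfo] = []
    for d in range(max(depth.values()), 0, -1):  # from depth N down to depth 1
        level = [n for n in order if depth[n] == d]
        leaves = [n for n in level if not hierarchy.get(n)]
        parents = [n for n in level if hierarchy.get(n)]
        for n in leaves:   # leaf nodes first: runnable, parent info only
            queue.append(NodeInfo(n, RUNNABLE, parent[n]))
        for n in parents:  # then parent nodes: not runnable, parent and child info
            queue.append(NodeInfo(n, NOT_RUNNABLE, parent[n], list(hierarchy[n])))
    return queue


FIG3 = {
    "top module": ["module 1", "module 2"],
    "module 1": ["sub-module 1", "sub-module 2", "sub-module 3", "sub-module 4"],
    "module 2": [],
}
queue = build_run_queue(FIG3, "top module")
print([n.name for n in queue])
```

For the fig. 3 tree this yields sub-modules 1 to 4, then module 2, module 1, and finally the top module, matching the order described for depths 3, 2, and 1.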

In this process, the nodes are arranged according to the connection order between them, which reflects the execution hierarchy of the modules in the model framework. Therefore, when verification is performed on the node information stored in order in the run queue, the simulation functions can be executed following that order, which avoids the time-consuming approach of the related art in which the simulation functions of all modules are executed serially.

Fig. 5 shows a schematic diagram of a multi-way tree structure with a maximum node depth of 5, and a run queue building process according to some embodiments of the present disclosure will be described below with the multi-way tree structure shown in fig. 5 as a specific example.

As shown in fig. 5, the maximum node depth of the multi-way tree structure is 5. First, the whole multi-way tree is traversed according to the depth-first principle, and the node information of each traversed node is put into the run queue. That is, the tree is traversed from bottom to top: the nodes with a node depth of 5 are traversed first, then the nodes with a node depth of 4, and so on, until the root node with a node depth of 1 (shown as the processor core (Core) in fig. 5) is reached.

For the current node being traversed, it is first determined whether it is a leaf node. If so, its run indicator is set to flag 1, and the node name of the parent node connected to it is stored in its parent node information. For example, for the node "command execution sub-module 1" with a node depth of 5, the run indicator may be set to flag 1, and the name of the directly connected parent node (i.e., "command execution module") may be stored in its parent node information. If the current node is a parent node, for example the command execution module with a node depth of 4, its run indicator is set to flag 2 and all of its child nodes are put into its child node information; the child nodes of the command execution module with a node depth of 4 are command execution sub-module 1 and command execution sub-module 2.

The traversal continues in this manner until the root node is reached. As an example, for the multi-way tree structure shown in fig. 5, the nodes at the front of the resulting run queue may be ordered as: {command execution sub-module 1 (arithmetic core 1), command execution sub-module 2 (arithmetic core 1), arithmetic sub-module 1 (arithmetic core 1), arithmetic sub-module 2 (arithmetic core 1), command execution sub-module 1 (arithmetic core 2), command execution sub-module 2 (arithmetic core 2), arithmetic sub-module 1 (arithmetic core 2), arithmetic sub-module 2 (arithmetic core 2)}. Here, "command execution sub-module 1 (arithmetic core 1)" denotes the command execution sub-module 1 belonging to arithmetic core 1 in fig. 5. Since these nodes are leaf nodes with the maximum node depth of 5, they are placed in the run queue first, and their run indicators may be set to flag 1, indicating that their simulation functions are runnable.

After the leaf nodes with a node depth of 5, the run queue may contain the nodes with a node depth of 4. Among these, the leaf nodes come first, in the order: {command receiving module (arithmetic core 1), arithmetic control module (arithmetic core 1), data cache (arithmetic core 1), control module 1 (arithmetic core 1), control module 2 (arithmetic core 1), command receiving module (arithmetic core 2), arithmetic control module (arithmetic core 2), data cache (arithmetic core 2), control module 1 (arithmetic core 2), control module 2 (arithmetic core 2)}. Since these are leaf nodes with a node depth of 4, their run indicators may be set to flag 1, i.e., their simulation functions are also runnable.

The leaf nodes with a node depth of 4 are followed by the parent nodes with a node depth of 4, in the order: {command execution module (arithmetic core 1), arithmetic unit (arithmetic core 1), command execution module (arithmetic core 2), arithmetic unit (arithmetic core 2)}. Since these nodes are parent nodes, their run indicators may be set to flag 2, indicating that their simulation functions are not yet runnable. In other words, a parent node may execute its simulation function only after the simulation functions of all of its child nodes have been executed. Marking parent nodes as not runnable while building the run queue therefore prevents the execution hierarchy from being violated, for example by executing the simulation function of the command execution module before that of a command execution sub-module, which would contradict the execution order of the modules in the model framework.

According to the process described above, nodes in the multi-way tree structure are traversed until the root node with the node depth of 1 is traversed.

The run indicator can be implemented in several ways. For example, a run flag Flag may be set, where Flag = 1 indicates that the simulation function is runnable (flag 1) and Flag = 0 indicates that it is not runnable (flag 2). Alternatively, two indicators may be used to represent the running state of a module's simulation function, e.g., parameters Ready and Pending, where Ready = 1 indicates that the simulation function is runnable (flag 1) and Pending = 1 indicates that it is not runnable (flag 2). The run indicator may also be implemented in other ways, without limitation. In addition, the node information may include other indication information besides the node name, run indicator, parent node information, and child node information, as required.
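The two encodings mentioned above can be related as in the sketch below. The `RunState` type, the helper names, and the rule that a node is runnable only when Ready = 1 and Pending = 0 are assumptions made for illustration.

```python
from dataclasses import dataclass


def is_runnable_flag(flag: int) -> bool:
    """Single 1-bit Flag: 1 means flag 1 (runnable), 0 means flag 2 (not runnable)."""
    return flag == 1


@dataclass
class RunState:
    """Two-indicator variant: Ready = 1 marks flag 1, Pending = 1 marks flag 2."""
    ready: int = 0
    pending: int = 0

    def is_runnable(self) -> bool:
        # Assumption: runnable only when Ready is set and Pending is clear.
        return self.ready == 1 and self.pending == 0


def to_run_state(flag: int) -> RunState:
    """Translate the 1-bit Flag encoding into the Ready/Pending encoding."""
    return RunState(ready=1, pending=0) if flag == 1 else RunState(ready=0, pending=1)
```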

According to some embodiments of the present disclosure, visiting the nodes of the multi-way tree in order of decreasing node depth according to the depth-first principle, from the leaf nodes with the maximum node depth to the root node with a node depth of 1, further includes: among the nodes in the multi-way tree structure, determining the active nodes that hold data information, and marking the active nodes, together with the parent and child nodes directly connected to them, as active. These embodiments introduce a busy mechanism that adds an active-state (Active) attribute to each module. Within a simulation clock cycle, only the nodes currently marked as active in the multi-way tree structure have their node information stored in the run queue according to node depth; nodes not marked as active are not stored.

A node holding data information can be understood as a node involved in data transmission. For example, fig. 1 shows pipes for transmitting data, which receive data input through the module interfaces. When a pipe contains data, the nodes upstream and downstream of it are busy. These nodes may therefore be set to the active state, and the parent and child nodes directly connected to each active node are also marked as active.

While the multi-way tree structure is traversed to construct the run queue, only the node information of nodes marked as active is stored. As an example, suppose that among the nodes with a node depth of 5, only command execution sub-module 2 (arithmetic core 1) and arithmetic sub-module 2 (arithmetic core 1) are marked as active. Then the nodes with a node depth of 5 placed in the run queue are {command execution sub-module 2 (arithmetic core 1), arithmetic sub-module 2 (arithmetic core 1)}, while the other nodes with a node depth of 5 are not placed in the run queue because they are inactive, i.e., have no data to process. In addition, because command execution sub-module 2 (arithmetic core 1) and arithmetic sub-module 2 (arithmetic core 1) are active, their parent nodes (command execution module (arithmetic core 1) and arithmetic unit (arithmetic core 1), respectively) are also marked as active and are placed in the run queue in turn according to the depth-first principle.
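A minimal sketch of the busy mechanism, applied to the smaller fig. 3 tree for brevity: nodes holding data are marked active together with their directly connected parent and child nodes, and the depth-ordered queue is then filtered. `mark_active`, `filter_queue`, and the example data are hypothetical. One point the text leaves open is handled here only by assumption: for a parent to become runnable later, its child node information would need to list only its active children.

```python
from typing import Dict, List, Optional, Set


def mark_active(parent: Dict[str, Optional[str]],
                children: Dict[str, List[str]],
                has_data: Set[str]) -> Set[str]:
    """Mark nodes holding data, plus their directly connected parent and child nodes."""
    active = set(has_data)
    for node in has_data:
        if parent.get(node) is not None:
            active.add(parent[node])
        active.update(children.get(node, []))
    return active


def filter_queue(queue: List[str], active: Set[str]) -> List[str]:
    """Keep only active nodes, preserving the depth-ordered queue order."""
    return [name for name in queue if name in active]


# Hypothetical cycle on the fig. 3 tree in which only sub-module 2 holds data.
PARENT = {"sub-module 1": "module 1", "sub-module 2": "module 1",
          "sub-module 3": "module 1", "sub-module 4": "module 1",
          "module 1": "top module", "module 2": "top module", "top module": None}
CHILDREN = {"top module": ["module 1", "module 2"],
            "module 1": ["sub-module 1", "sub-module 2", "sub-module 3", "sub-module 4"]}
QUEUE = ["sub-module 1", "sub-module 2", "sub-module 3", "sub-module 4",
         "module 2", "module 1", "top module"]

active = mark_active(PARENT, CHILDREN, {"sub-module 2"})
print(filter_queue(QUEUE, active))  # ['sub-module 2', 'module 1']
```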

In embodiments applying the busy mechanism, the nodes in the run queue comprise only the active, i.e., busy, subset of the plurality of modules of the integrated circuit device. Simulation verification based on the run queue therefore skips modules that are not busy, saving unnecessary verification time compared with the related-art approach of verifying all modules serially. For example, when the verification task in a given simulation clock cycle is relatively simple, many modules may be idle; with a run queue constructed using the busy mechanism, the simulation functions of modules not marked as active are not run, which can greatly improve the verification efficiency of the integrated circuit, an effect that is especially significant for a large integrated circuit such as a GPGPU.

The process of building the run queue, which may be referred to as the post-processing phase of simulation verification, has been described above with reference to figs. 2B to 5. Based on the built run queue, the design simulation method according to some embodiments of the present disclosure may determine the available processors and then use them to run the simulation functions of the modules in the run queue, thereby performing simulation verification of the integrated circuit device. Simulation verification based on the run queue may be referred to as the run phase, whose implementation is described below.

According to some embodiments of the present disclosure, there may be one or more available processors. When there is one available processor, the simulation functions of the modules may be run one after another in run queue order. When there are multiple available processors, running the simulation functions of the modules in the run queue includes running the simulation functions of multiple modules in the run queue in parallel on the multiple available processors, i.e., different processors run the simulation functions of different modules, which improves verification efficiency.
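For the single-processor case, the queue order alone already guarantees that children run before their parents, so a plain loop suffices. The sketch assumes each queue entry pairs a node name with a zero-argument simulation (main) function; using `os.cpu_count()` to determine the available processors is one possible choice, not something the disclosure specifies, and both function names are hypothetical.

```python
import os
from typing import Callable, List, Tuple


def available_processors() -> int:
    """One possible way to count usable processors; os.cpu_count() may return None."""
    return os.cpu_count() or 1


def run_sequentially(queue: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run each node's simulation (main) function in run-queue order on one processor.

    The queue is already ordered child-before-parent, so every parent runs only
    after all of its children have finished.
    """
    executed: List[str] = []
    for name, simulate in queue:
        simulate()
        executed.append(name)
    return executed
```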

Specifically, running the simulation functions of the modules in the run queue in parallel may include: determining the number of nodes in the run queue whose run indicators are flag 1; and, based on the number of available processors and the number of nodes whose run indicators are flag 1, running the simulation functions of those nodes in parallel on the available processors. Let S denote the number of available processors and T the number of nodes whose run indicators are flag 1, where S and T are integers greater than or equal to 0. If T is less than or equal to S, i.e., the number of modules currently runnable does not exceed the number of available processors, the simulation functions of all T nodes may be run in parallel on the available processors. If T is greater than S, i.e., there are more runnable modules than available processors, the simulation functions of the first S nodes in the run queue whose run indicators are flag 1 may be run in parallel on the available processors.
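The S/T rule above reduces to taking the first S runnable nodes in queue order, which covers both the case where T is at most S and the case where T exceeds S. `QueueEntry` and `select_batch` are illustrative names, and flag 1 is assumed to be encoded as 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueEntry:
    name: str
    run_indicator: int  # 1 = flag 1 (runnable), 0 = flag 2 (not runnable)


def select_batch(queue: List[QueueEntry], available: int) -> List[QueueEntry]:
    """Choose the nodes to run in parallel for S = `available` processors.

    If T (the number of runnable nodes) is at most S, all T are returned;
    otherwise the first S runnable nodes in queue order are returned.
    """
    runnable = [entry for entry in queue if entry.run_indicator == 1]  # T nodes
    return runnable[:max(available, 0)]
```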

In some embodiments according to the present disclosure, the run queue may also be updated during the run phase after an available processor has run the simulation verification function of a module. Specifically, updating the run queue may include: treating each node whose simulation function has finished as a first node, determining the parent node of the first node from its parent node information, and updating the node information of that parent node; then deleting the node information of the first node from the run queue. The updating continues until the run queue is empty.

For example, taking node 1 as the first node, after the simulation function of the module corresponding to node 1 has been run on an available processor, the parent node of node 1 (i.e., the second node) may be determined from the parent node information of node 1 in the run queue, and the node information of that parent node may be updated, for example by deleting the node name of node 1 from its child node information. Then, the node information of node 1 may be deleted from the run queue, indicating that node 1 has completed simulation verification.

According to some embodiments of the present disclosure, updating the node information of the parent node of the first node further includes: determining whether the child node information of the second node is empty; and if it is empty, setting the run indicator of the second node to flag 1. For example, after the node name of node 1 is deleted from the child node information of its parent node, it may be determined whether that child node information is empty; if it is, all child nodes of the parent node have completed simulation verification. Since all of its child nodes have finished running, the parent node effectively becomes a leaf node and is itself runnable.
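The update step can be sketched as below, assuming the run queue is held in an insertion-ordered Python dictionary keyed by node name (a plain `dict` preserves insertion order in Python 3.7 and later) and that flag 1 is encoded as 1; `on_finished` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NodeInfo:
    name: str
    run_indicator: int                     # 1 = flag 1, 0 = flag 2
    parent: Optional[str]
    children: List[str] = field(default_factory=list)


def on_finished(queue: Dict[str, NodeInfo], first: str) -> None:
    """Update the run queue after the simulation function of node `first` has run."""
    info = queue[first]
    second = queue.get(info.parent) if info.parent is not None else None
    if second is not None:
        second.children.remove(first)      # drop the first node from the child info
        if not second.children:            # every child has been verified
            second.run_indicator = 1       # the second node becomes runnable
    del queue[first]                       # the first node has completed verification
```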

In addition, the number of available processors may also be updated after the above node information update is completed, because the processor that ran node 1 has finished its verification task for node 1 and can be reused to verify other modules.

The above parallel verification process based on the run queue is repeated until all nodes in the run queue have been verified, i.e., until the run queue is empty, which indicates that verification for the current simulation clock cycle is complete. After the run phase of the current simulation clock cycle is completed, the post-processing phase of the next simulation clock cycle may be entered to build the run queue for the next simulation clock cycle.

In addition, a monitoring program may be configured in the run phase to check whether the verification of any module times out. A timeout indicates that the simulation function of the module may contain an error; in this case, the monitoring program may exit the run phase and report the error.
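A possible shape for the timeout monitor, using Python's standard `concurrent.futures` module: `Future.result(timeout=...)` raises a timeout error when the simulation function does not finish in time, and the sketch converts that into an error report. The timeout value, the use of threads, and the name `run_with_monitor` are assumptions; a Python thread cannot be forcibly stopped, so a real monitor might run simulation functions in separate processes instead.

```python
import concurrent.futures as cf
import time


def run_with_monitor(simulate, timeout_s: float):
    """Run one simulation function and treat a timeout as a probable error in it."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(simulate)
    try:
        return future.result(timeout=timeout_s)
    except cf.TimeoutError:
        elapsed = time.monotonic() - start
        name = getattr(simulate, "__name__", repr(simulate))
        # Report the error and leave the run phase (modelled here as an exception).
        raise RuntimeError(
            f"simulation function {name} still running after {elapsed:.2f} s"
        ) from None
    finally:
        # Do not block on a hung function; the worker thread itself cannot be killed.
        pool.shutdown(wait=False)
```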

Fig. 6 shows a flowchart of the design simulation method according to an embodiment of the present disclosure, illustrating the verification process for one simulation clock cycle, which includes a run phase and a post-processing phase.

As shown in fig. 6, the verification process for each simulation clock cycle may be divided into a run phase 601 and a post-processing phase 602. First, Run may be invoked for the simulation verification functions of all modules, for example through the function cycle n. Next, it may be determined whether a trigger flag (Trigger) is 1, where Trigger may be activated by a test stimulus or by another external module, such as a memory module. If Trigger is 1, a multi-way tree structure (e.g., as shown in fig. 3 or fig. 5) may be constructed and the active nodes, i.e., the nodes holding data in the current simulation clock cycle, marked. Then the multi-way tree may be traversed by node depth, and the run queue built depth by depth in the manner described above until the root node is reached.

If Trigger is not 1, the run phase 601 may be entered. It is first determined whether the run queue is empty. If it is, the entire model is idle and no simulation function needs to be run, i.e., the verification process of the current simulation clock cycle ends. If the run queue is not empty, the number of nodes in the run queue whose run indicators are flag 1 (denoted T) is determined, and the number of currently available processors (CPUs, denoted S) is checked, so that simulation verification is performed in parallel based on S and T. For example, if T is less than or equal to S, the simulation functions of all runnable nodes may be run in parallel; if T is greater than S, the simulation functions of the first S nodes in the run queue whose run indicators are flag 1 may be run in parallel.

After a node's simulation function has finished running, the node information in the run queue can be updated. For example, the node information of the parent node (i.e., the second node) of the first node described above is updated, and the node information of the first node is deleted from the run queue. In addition, if the child node information of the second node is empty, meaning that all child nodes of the second node have finished running, the run indicator of the second node is set to flag 1, i.e., the second node is updated to a runnable node. The process then returns to checking whether the run queue is empty; if not, parallel verification continues as described above until the run queue is empty, at which point the verification process of the current simulation clock cycle ends.
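Putting the run-phase steps together, one clock cycle's run phase might look like the following sketch. A thread pool stands in for the available processors purely for illustration (CPU-bound Python code would need processes or native code for real parallel speedup), and `run_phase`, `NodeInfo`, and the dictionary-based queue are assumptions rather than the disclosed implementation. Runnable nodes are dispatched from the front of the queue whenever a processor is free, and each completion triggers the parent update described above.

```python
import concurrent.futures as cf
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class NodeInfo:
    name: str
    run_indicator: int                      # 1 = flag 1 (runnable), 0 = flag 2 (waiting)
    parent: Optional[str]                   # directly connected parent, None for the root
    children: List[str] = field(default_factory=list)


def run_phase(queue: Dict[str, NodeInfo],
              simulate: Dict[str, Callable[[], None]],
              processors: int) -> List[str]:
    """Drain the run queue for one simulation clock cycle; return the completion order."""
    done_order: List[str] = []
    in_flight: Dict[cf.Future, str] = {}
    with cf.ThreadPoolExecutor(max_workers=processors) as pool:
        while queue:
            # Dispatch runnable nodes from the front of the queue while processors are free.
            running = set(in_flight.values())
            for info in queue.values():
                if len(in_flight) >= processors:
                    break
                if info.run_indicator == 1 and info.name not in running:
                    in_flight[pool.submit(simulate[info.name])] = info.name
            if not in_flight:
                raise RuntimeError("no runnable node left: inconsistent run queue")
            # Wait until at least one simulation function finishes.
            finished, _ = cf.wait(in_flight, return_when=cf.FIRST_COMPLETED)
            for fut in finished:
                name = in_flight.pop(fut)
                fut.result()                                  # re-raise a failure
                info = queue.pop(name)                        # delete the first node
                parent = queue.get(info.parent) if info.parent else None
                if parent is not None:                        # update the second node
                    parent.children.remove(name)
                    if not parent.children:
                        parent.run_indicator = 1              # all children done
                done_order.append(name)
    return done_order
```

Dispatching as soon as any node finishes, rather than waiting for a whole batch of S nodes, is a design choice of this sketch; the flow in fig. 6 only requires that runnable nodes be taken from the front of the queue.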

With the design simulation method for an integrated circuit device provided by embodiments of the present disclosure, simulation verification of the integrated circuit device can be performed based on a run queue constructed from at least one module selected from the plurality of modules of the device, which helps to improve the model simulation efficiency of the integrated circuit and to shorten its design cycle. In addition, simulation verification can be performed in parallel based on the constructed run queue, which increases verification speed, and the speedup is especially pronounced when the degree of parallelism is very high. For example, for a multi-way tree structure built from the model framework of a GPGPU containing 8734 nodes with a maximum node depth of 14 levels, the simulation verification rate can be improved by several tens of times compared with the related art in which the verification of all modules is executed serially. In addition, in some embodiments according to the present disclosure, a busy mechanism is introduced when constructing the run queue: only nodes with data to process are placed in the run queue, and idle modules are not, so no simulation verification time is spent on idle modules; in this case, the simulation verification rate based on the run queue can be improved by more than several hundred times. Faster model verification reduces the verification time cost of the integrated circuit and shortens its design cycle, meeting the market demand for ever-shorter integrated circuit design and production cycles.

Fig. 7 shows a schematic block diagram of a design simulation apparatus according to an embodiment of the present disclosure. As shown in fig. 7, the design simulation apparatus 1000 may include a run queue obtaining unit 1010, a computing unit 1020, and a simulation verification unit 1030.

According to some embodiments of the present disclosure, the run queue obtaining unit 1010 may be configured to obtain a run queue constructed based on a plurality of modules of an integrated circuit device. For example, the integrated circuit device may be a GPGPU, and the plurality of modules may be the modules included in a model framework of the integrated circuit device, such as the model framework shown in fig. 1. It will be appreciated that the model framework shown in fig. 1 is merely illustrative: depending on the integrated circuit structure, the hierarchy may have more than 3 levels, e.g., several levels or several tens of levels, and the modules included in the model framework are not limited to those shown in fig. 1. According to some embodiments of the present disclosure, the run queue may include at least one module selected from the plurality of modules, that is, some or all of the modules in the model framework. Further, according to some embodiments of the present disclosure, the at least one module is ordered in the run queue based on its execution hierarchy. The execution hierarchy can be understood as the execution-order relationship among the modules shown in fig. 1; for example, module 1 should be executed after the sub-modules 1-4 that it contains, that is, the execution logic should process the modules at the 3rd level before processing the modules at the 2nd level.

According to some embodiments of the present disclosure, the computing unit 1020 may be configured to determine an available processor. An available processor is a processor that can be used to perform simulation verification. There may be multiple available processors, and they may be, without limitation, distributed processors or processors in a cloud computing environment.

According to some embodiments of the present disclosure, the simulation verification unit 1030 may be configured to run, using the available processors, the simulation functions of the modules within the run queue for simulation verification of the integrated circuit device.

Performing simulation verification of an integrated circuit with the design simulation apparatus 1000 according to some embodiments of the present disclosure avoids executing the simulation verification of each module serially, as in the related art. Instead, the integrated circuit device is verified based on a run queue constructed from at least one module selected from its plurality of modules, which helps improve model simulation efficiency and shorten the integrated circuit design cycle. Especially for large integrated circuit designs such as a GPGPU, run-queue-based simulation verification can speed up model simulation and thereby save time in circuit design.

According to some embodiments of the present disclosure, as shown in FIG. 7, the design simulation apparatus 1000 may further include a run queue building unit 1040. The run queue building unit 1040 may be configured to obtain a hierarchy between the plurality of modules of the integrated circuit device. Here, the hierarchy refers to a model framework generated for the integrated circuit to be designed, which consists of a plurality of modules arranged in a hierarchical structure, for example as shown in FIG. 1.

Next, the run queue building unit 1040 may be configured to generate, based on the hierarchy, a multi-way tree structure corresponding to the at least one module. In some embodiments of the present disclosure, the at least one module may be all of the modules in the model framework or only some of them (for example, only the modules that are to be verified), without limitation. In the multi-way tree structure, each node represents a different module, and the module represented by a child node is included in the module represented by its parent node. The execution hierarchy of the at least one module is thus represented by the connection order between child nodes and parent nodes together with their node depths.
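As a minimal sketch of this step, the Python code below turns a module hierarchy into a multi-way tree and records each node's depth, with the top module at depth 1. It assumes the hierarchy is given as a nested dictionary mapping each module name to its sub-modules; the names `TreeNode`, `build_tree`, `max_depth`, and the example module names are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TreeNode:
    """One module of the model framework, represented as a multi-way tree node."""
    name: str
    depth: int  # the root (top) module has depth 1
    # repr/compare disabled so printing or comparing nodes does not recurse upward
    parent: Optional["TreeNode"] = field(default=None, repr=False, compare=False)
    children: List["TreeNode"] = field(default_factory=list)


def build_tree(hierarchy: Dict[str, dict]) -> TreeNode:
    """Build a multi-way tree from a nested {module: {sub_module: {...}}} mapping."""
    if len(hierarchy) != 1:
        raise ValueError("expected exactly one top-level module")
    root_name, sub_modules = next(iter(hierarchy.items()))
    root = TreeNode(root_name, depth=1)
    pending = [(root, sub_modules)]
    while pending:
        node, subs = pending.pop()
        for child_name, grandchildren in subs.items():
            # A child module is contained in its parent and sits one level deeper.
            child = TreeNode(child_name, depth=node.depth + 1, parent=node)
            node.children.append(child)
            pending.append((child, grandchildren))
    return root


def max_depth(node: TreeNode) -> int:
    """Largest node depth in the subtree rooted at `node`."""
    return max([node.depth] + [max_depth(child) for child in node.children])


# Hypothetical hierarchy: a top module, module 1 with sub-modules 1-4, and module 2.
EXAMPLE_HIERARCHY = {
    "top": {
        "module1": {"sub1": {}, "sub2": {}, "sub3": {}, "sub4": {}},
        "module2": {},
    }
}
root = build_tree(EXAMPLE_HIERARCHY)
print(root.name, max_depth(root))  # top 3
```

Storing the depth on each node at construction time means the later depth-ordered traversal does not need to recompute it.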

Next, the run queue building unit 1040 may be configured to traverse the nodes in the multi-way tree structure based on node depth to construct the run queue. According to some embodiments of the present disclosure, the nodes are arranged in the run queue in the connection order of the nodes in the multi-way tree structure, and the run queue includes node information associated with each node.

According to some embodiments of the disclosure, the node information may include a node name, a run indicator, parent node information, and child node information. The run indicator indicates whether the simulation function of the node can run, the parent node information includes the node name of the parent node directly connected to the node, and the child node information includes the node names of the child nodes directly connected to the node. For the node information, reference may be made to the description above in connection with FIG. 4, which is not repeated here.

According to some embodiments of the present disclosure, the run queue building unit 1040 is configured to construct the run queue by visiting the nodes of the multi-way tree in descending order of node depth, following the depth-first principle: it starts from the leaf nodes with the maximum node depth and proceeds to the root node, whose node depth is 1. A leaf node is a node without child nodes, and the root node is the node without a parent node. For example, in the multi-way tree structure shown in FIG. 3, sub-modules 1 to 4 are leaf nodes and the top module is the root node.

According to some embodiments of the present disclosure, the traversal performed by the run queue building unit 1040 may include sequentially storing the node information of the nodes in the run queue according to their node depths. Node information of nodes with greater node depth is stored in the run queue first, because a node with greater node depth precedes a node with smaller node depth in the execution hierarchy. Further, according to some embodiments of the present disclosure, among nodes having the same node depth, each node is first classified as a leaf node or a parent node. The node information of the leaf nodes is stored in the run queue before that of the parent nodes, the run indicator of each leaf node is set to flag 1, and the run indicator of each parent node is set to flag 2. Flag 1 indicates that the simulation function of the node can run, and flag 2 indicates that the simulation function of the node cannot run.

According to some embodiments of the present disclosure, let N be the maximum node depth; for example, in the multi-way tree structure shown in FIG. 3, the maximum node depth is 3, i.e., N = 3. For depth N, the step in which the run queue building unit 1040 sequentially stores node information according to node depth includes: determining the nodes whose node depth is N and storing their node information in the run queue, where the run indicator of each such node (for example, each node with node depth 3 in FIG. 3) is set to flag 1, and the node name of the parent node directly connected to each node is used as that node's parent node information.

According to some embodiments of the present disclosure, for a node depth M smaller than N, the run queue building unit 1040 is configured to: determine the leaf nodes with node depth M and store their node information in the run queue, where the run indicator of each such leaf node is set to flag 1 and the node name of the parent node directly connected to it is used as its parent node information; and then determine the parent nodes with node depth M and store their node information in the run queue, where the run indicator of each such parent node is set to flag 2, the node name of the parent node directly connected to it is used as its parent node information, and the node names of the child nodes directly connected to it are used as its child node information. Here, N is an integer greater than 1, and M is an integer smaller than N and greater than 1.
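The construction described in the preceding paragraphs can be sketched in Python as follows. This is an illustrative sketch, not the disclosed implementation: the tree is assumed to be given as a hypothetical `children` mapping from each parent to its direct children, node depths are computed with a breadth-first walk from the root, and the queue is filled level by level from the maximum depth N down to depth 1, leaves before parents. The sketch also emits the root at depth 1 (with no parent information), since the traversal described above ends at the root. `NodeInfo`, `RUNNABLE`, `NOT_RUNNABLE`, `node_depths`, and `build_run_queue` are assumed names.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

RUNNABLE = 1      # "flag 1": the node's simulation function can run
NOT_RUNNABLE = 2  # "flag 2": the node must wait for its children


@dataclass
class NodeInfo:
    name: str
    run_flag: int
    parent: Optional[str]                            # parent node information
    children: Set[str] = field(default_factory=set)  # child node information


def node_depths(children: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Breadth-first walk assigning depth 1 to the root and +1 per level."""
    depths = {root: 1}
    frontier = deque([root])
    while frontier:
        name = frontier.popleft()
        for child in children.get(name, []):
            depths[child] = depths[name] + 1
            frontier.append(child)
    return depths


def build_run_queue(children: Dict[str, List[str]], root: str) -> List[NodeInfo]:
    depths = node_depths(children, root)
    parent_of = {c: p for p, kids in children.items() for c in kids}
    run_queue: List[NodeInfo] = []
    # Visit depths N, N-1, ..., 1 so deeper modules precede the modules containing them.
    for depth in range(max(depths.values()), 0, -1):
        level = [n for n, d in depths.items() if d == depth]
        # Leaves at this depth go first and are immediately runnable.
        for name in level:
            if not children.get(name):
                run_queue.append(NodeInfo(name, RUNNABLE, parent_of.get(name)))
        # Parents at this depth follow and wait until all their children finish.
        for name in level:
            if children.get(name):
                run_queue.append(
                    NodeInfo(name, NOT_RUNNABLE, parent_of.get(name), set(children[name]))
                )
    return run_queue


# Hypothetical tree: top -> {module1 -> sub1..sub4, module2}.
TREE = {"top": ["module1", "module2"], "module1": ["sub1", "sub2", "sub3", "sub4"]}
queue = build_run_queue(TREE, "top")
print([(n.name, n.run_flag) for n in queue])
# [('sub1', 1), ('sub2', 1), ('sub3', 1), ('sub4', 1), ('module2', 1), ('module1', 2), ('top', 2)]
```

Note that `module2` is a leaf at depth 2, so it is placed before its sibling `module1` and starts out runnable, exactly as the leaf-before-parent rule for equal depths requires.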

The run indicator may be implemented in various ways. For example, a run flag Flag may be set, where Flag = 1 indicates that the simulation function is in a runnable state and Flag = 0 indicates that it is in a non-runnable state. As another example, two indicators may be used to represent the running state of a module's simulation function, such as parameters Ready and Pending, where Ready = 1 indicates that the simulation function is runnable and Pending = 1 indicates that it is not runnable. The run indicator may also be implemented in other ways, without limitation. In addition, it is understood that the node information may include other indication information besides the node name, run indicator, parent node information, and child node information, which may be set according to actual requirements.

According to some embodiments of the present disclosure, the run queue building unit 1040 may be further configured to determine, among the nodes in the multi-way tree structure, active nodes that have data information, and to mark the active nodes, together with the parent nodes and child nodes directly connected to them, as being in an active state. These embodiments introduce a busy mechanism that adds an active state (Active) attribute to each module. Within one simulation clock cycle, the run queue building unit 1040 may be configured to store, according to node depth, the node information of only those nodes marked as active in the current multi-way tree structure; nodes not marked as active are not stored in the run queue.
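A minimal sketch of the busy mechanism for one simulation clock cycle is given below, assuming the set of modules that hold data in the current cycle is known (the hypothetical `busy` argument) and that a deepest-first ordering of node names is already available. The hypothetical `mark_active` marks each busy node plus its direct parent and direct children as active, and `active_order` keeps only active nodes while preserving their order. One detail the text does not spell out: when node information is built for such a filtered queue, a parent's child node information would presumably need to list only its active children, otherwise the parent would wait on children that never run; the sketch does not model that choice.

```python
from typing import Dict, List, Set


def mark_active(children: Dict[str, List[str]], busy: Set[str]) -> Set[str]:
    """Busy nodes plus their directly connected parent and child nodes."""
    parent_of = {c: p for p, kids in children.items() for c in kids}
    active: Set[str] = set()
    for name in busy:
        active.add(name)
        if name in parent_of:  # the root has no parent
            active.add(parent_of[name])
        active.update(children.get(name, []))
    return active


def active_order(ordered_names: List[str], active: Set[str]) -> List[str]:
    """Drop idle nodes from a deepest-first ordering, keeping the order intact."""
    return [name for name in ordered_names if name in active]


TREE = {"top": ["module1", "module2"], "module1": ["sub1", "sub2", "sub3", "sub4"]}
DEEPEST_FIRST = ["sub1", "sub2", "sub3", "sub4", "module2", "module1", "top"]

# Hypothetical cycle in which only sub-module 2 has data to process.
active = mark_active(TREE, busy={"sub2"})
print(active_order(DEEPEST_FIRST, active))  # ['sub2', 'module1']
```

In this example only two of the seven nodes enter the queue for the cycle, which is where the busy mechanism saves simulation time on idle modules.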

According to some embodiments of the present disclosure, there may be one or more available processors. When there are multiple available processors, the simulation verification unit 1030 may be configured to run the simulation functions of multiple modules within the run queue in parallel using the multiple available processors. Running in parallel means that the simulation functions of different modules are run on different processors, which improves verification efficiency.

According to some embodiments of the present disclosure, the simulation verification unit 1030 may be configured to: determine the number of nodes in the run queue whose run indicator is flag 1; and, based on the number of available processors and the number of nodes whose run indicator is flag 1, run the simulation functions of those nodes in parallel using the available processors.

According to some embodiments of the present disclosure, denote the number of available processors as S and the number of nodes whose run indicator is flag 1 as T, where S and T are integers greater than or equal to 0. The simulation verification unit 1030 may be configured to: when T is less than or equal to S, run the simulation functions of all T nodes in the run queue whose run indicator is flag 1 in parallel using the available processors; or, when T is greater than S, run the simulation functions of the first S nodes in the run queue whose run indicator is flag 1 in parallel using the available processors.
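The dispatch rule above amounts to taking the first min(S, T) runnable nodes in queue order. The sketch below assumes the run queue is a list of hypothetical `(node_name, run_flag)` pairs and uses Python's `concurrent.futures.ThreadPoolExecutor` as a stand-in for the S available processors; for CPU-bound simulation functions in CPython, a `ProcessPoolExecutor` would be the closer analogue. `select_batch`, `run_batch`, and the example simulation functions are illustrative names only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

RUNNABLE = 1  # run indicator "flag 1"


def select_batch(run_queue: List[Tuple[str, int]], num_processors: int) -> List[str]:
    """Names of the first min(S, T) nodes whose run indicator is flag 1."""
    runnable = [name for name, flag in run_queue if flag == RUNNABLE]  # T = len(runnable)
    # Slicing covers both cases: all T nodes if T <= S, otherwise the first S.
    return runnable[:num_processors]


def run_batch(batch: List[str], sim_funcs: Dict[str, Callable[[], str]],
              num_processors: int) -> List[str]:
    """Run the selected simulation functions in parallel; results keep batch order."""
    if not batch:
        return []
    with ThreadPoolExecutor(max_workers=num_processors) as pool:
        return list(pool.map(lambda name: sim_funcs[name](), batch))


QUEUE = [("sub1", 1), ("sub2", 1), ("sub3", 1), ("sub4", 1),
         ("module2", 1), ("module1", 2), ("top", 2)]
SIM_FUNCS = {name: (lambda n=name: f"{n} simulated") for name, _ in QUEUE}

batch = select_batch(QUEUE, num_processors=2)  # T = 5 > S = 2
print(batch, run_batch(batch, SIM_FUNCS, 2))
# ['sub1', 'sub2'] ['sub1 simulated', 'sub2 simulated']
```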

According to some embodiments of the present disclosure, the simulation verification unit 1030 may be further configured to update the run queue during running.

According to some embodiments of the present disclosure, the simulation verification unit 1030 may be configured to: denote each node whose simulation function has finished running as a first node, determine the parent node of the first node from the first node's parent node information, and update the node information of that parent node; delete the node information of the first node from the run queue; and continue updating in this way until the run queue is empty.

According to some embodiments of the disclosure, the parent node of a first node is denoted as a second node. The simulation verification unit 1030 may be configured to: delete the node name of the first node from the child node information of the second node; determine whether the child node information of the second node is empty; and, if it is empty, set the run indicator of the second node to flag 1.
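The update rule in the two paragraphs above can be sketched as follows. The sketch assumes the run queue is an insertion-ordered Python dictionary from node name to a hypothetical `NodeInfo` record, and runs nodes one round at a time on a single thread so the bookkeeping is easy to follow; the names `on_node_finished` and `drain` are assumptions. After each first node finishes, it is removed from its parent's (the second node's) child node information, the parent becomes runnable once that information is empty, and the first node's entry is deleted, until the queue is empty.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

RUNNABLE, NOT_RUNNABLE = 1, 2  # run indicator flags 1 and 2


@dataclass
class NodeInfo:
    run_flag: int
    parent: Optional[str]
    children: Set[str] = field(default_factory=set)


def on_node_finished(run_queue: Dict[str, NodeInfo], first: str) -> None:
    """Update the parent (second node) of a finished node, then drop the finished node."""
    parent = run_queue[first].parent
    if parent is not None and parent in run_queue:
        second = run_queue[parent]
        second.children.discard(first)  # delete the first node's name
        if not second.children:         # child node information is empty
            second.run_flag = RUNNABLE
    del run_queue[first]                # delete the first node's information


def drain(run_queue: Dict[str, NodeInfo], simulate: Callable[[str], None]) -> List[str]:
    """Repeat rounds of 'run every runnable node, then update' until the queue is empty."""
    finished: List[str] = []
    while run_queue:
        ready = [n for n, info in run_queue.items() if info.run_flag == RUNNABLE]
        if not ready:
            raise RuntimeError("run queue is stuck: no node has flag 1")
        for name in ready:
            simulate(name)
            finished.append(name)
            on_node_finished(run_queue, name)
    return finished


# Hypothetical queue: top -> {module1 -> {sub1, sub2}, module2}.
queue = {
    "sub1": NodeInfo(RUNNABLE, "module1"),
    "sub2": NodeInfo(RUNNABLE, "module1"),
    "module2": NodeInfo(RUNNABLE, "top"),
    "module1": NodeInfo(NOT_RUNNABLE, "top", {"sub1", "sub2"}),
    "top": NodeInfo(NOT_RUNNABLE, None, {"module1", "module2"}),
}
print(drain(queue, simulate=lambda name: None))
# ['sub1', 'sub2', 'module2', 'module1', 'top']
```

The `RuntimeError` branch is a defensive addition of this sketch: if no node carries flag 1 while the queue is non-empty, the queue can never drain.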

According to some embodiments of the present disclosure, during the running phase, the design simulation apparatus 1000 may further run a monitoring program that checks whether the verification of any module times out. A timeout indicates that the simulation function of that module may contain an error; in this case, the monitoring program may exit the running phase and report the error.
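One way to realize such a monitoring program is a per-module time budget. The sketch below is an assumption-laden illustration rather than the disclosed mechanism: it runs a module's simulation function in a worker thread via `concurrent.futures`, waits at most a hypothetical `timeout_s` seconds with `Future.result(timeout=...)`, and raises a hypothetical `SimulationTimeout` error naming the module if the budget is exceeded. Python threads cannot be forcibly stopped, so the worker is abandoned rather than killed; a real simulator might prefer processes for that reason.

```python
import concurrent.futures
import time
from typing import Any, Callable


class SimulationTimeout(RuntimeError):
    """A module's simulation function exceeded its time budget (possible model error)."""


def run_with_watchdog(name: str, sim_func: Callable[[], Any], timeout_s: float) -> Any:
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(sim_func)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Leave the running phase and report which module timed out.
        raise SimulationTimeout(
            f"simulation of {name!r} exceeded {timeout_s} s; its simulation function may be faulty"
        ) from None
    finally:
        pool.shutdown(wait=False)  # do not block on a thread that may never finish


def slow_module() -> str:
    """Hypothetical simulation function that takes too long."""
    time.sleep(0.3)
    return "done"


print(run_with_watchdog("sub1", lambda: "ok", timeout_s=1.0))  # ok
try:
    run_with_watchdog("sub2", slow_module, timeout_s=0.05)
except SimulationTimeout as err:
    print("error reported:", err)
```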

With regard to the specific implementation process of the steps performed by the design simulation apparatus 1000, reference may be made to the design simulation method according to the present disclosure described above with reference to the drawings, and the description will not be repeated here.

According to still another aspect of the present disclosure, there is also provided a design simulation device for an integrated circuit device. FIG. 8 shows a schematic block diagram of a design simulation device according to an embodiment of the present disclosure.

As shown in fig. 8, design simulation device 2000 may include a processor 2010 and a memory 2020. In accordance with an embodiment of the present disclosure, stored in memory 2020 is computer readable code, which when executed by processor 2010, may perform a design simulation method as described above.

The processor 2010 may perform various actions and processes according to programs stored in the memory 2020. Specifically, the processor 2010 may be an integrated circuit with signal processing capability. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present disclosure. The general-purpose processor may be a microprocessor or any conventional processor, and may be of the x86 architecture, the ARM architecture, or the like.

The memory 2020 stores computer-executable instruction code that, when executed by the processor 2010, implements the design simulation method according to embodiments of the present disclosure. The memory 2020 may be volatile memory or nonvolatile memory, or may include both. The nonvolatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct Rambus random access memory (DR RAM). It should be noted that the memory described herein is intended to include, without being limited to, these and any other suitable types of memory.

Methods or apparatus according to embodiments of the present disclosure may also be implemented by means of the architecture of the computing device 3000 shown in FIG. 9. As shown in FIG. 9, the computing device 3000 may include a bus 3010, one or more CPUs 3020, a read-only memory (ROM) 3030, a random access memory (RAM) 3040, a communication port 3050 for connecting to a network, input/output components 3060, a hard disk 3070, and the like. A storage device in the computing device 3000, such as the ROM 3030 or the hard disk 3070, may store various data or files used in the processing and/or communication of the design simulation method provided by the present disclosure, as well as program instructions executed by the CPU. The computing device 3000 may also include a user interface 3080. Of course, the architecture shown in FIG. 9 is merely exemplary, and one or more components of the computing device shown in FIG. 9 may be omitted as needed when implementing different devices. By way of example, the computing device 3000 may be implemented as a computer on which a design simulation application for integrated circuits is installed, without being limited thereto.

For example, an application program for integrated circuit design simulation may be installed on the computing device 3000, and the computing device 3000 may include a plurality of CPUs 3020, for example in the form of a multi-core processor. The computing device can thus construct or obtain a run queue based on the plurality of modules of the integrated circuit device and run the simulation functions of the modules in the run queue in parallel on the multiple CPUs, performing parallel simulation verification of the integrated circuit device. This increases the simulation verification speed of the integrated circuit and shortens the circuit design cycle.

According to yet another aspect of the present disclosure, there is also provided a non-transitory computer-readable storage medium. Fig. 10 shows a schematic diagram of a non-transitory computer-readable storage medium according to an embodiment of the disclosure.

As shown in FIG. 10, the computer-readable storage medium 4020 has instructions stored thereon, such as computer-readable instructions 4010. When executed by a processor, the computer-readable instructions 4010 can perform the design simulation method described with reference to the above figures. Computer-readable storage media include, but are not limited to, volatile memory and/or nonvolatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Nonvolatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. For example, the computer-readable storage medium 4020 may be connected to a computing device such as a computer, and the design simulation method described above may then be performed when the computing device executes the computer-readable instructions 4010 stored on the computer-readable storage medium 4020.

According to yet another aspect of the disclosure, there is also provided a computer program product or computer program comprising computer readable instructions stored in a computer readable storage medium. The computer readable instructions may be read by a processor of a computer device from a computer readable storage medium, and the computer readable instructions are executed by the processor to cause the computer device to perform the design simulation method described in the above embodiments.

With the design simulation method, apparatus, device, and readable storage medium for an integrated circuit device provided by the embodiments of the present disclosure, simulation verification of the integrated circuit device can be performed based on a run queue constructed from at least one module selected from the plurality of modules of the device, which improves model simulation efficiency and shortens the design cycle of the integrated circuit. In addition, simulation verification can be carried out in parallel based on the constructed run queue, which increases verification speed, especially when the degree of parallelism is very high. Furthermore, in some embodiments of the present disclosure, a busy mechanism is introduced when constructing the run queue: only nodes that have data to process are placed in the run queue, and idle modules are left out, so no simulation verification time is spent on them. In this case, the run-queue-based simulation verification rate can be improved by more than several hundred times. Faster model verification lowers the verification time cost of the integrated circuit, shortens its design cycle, and meets the market demand for ever shorter integrated circuit design and production cycles.

Those skilled in the art will appreciate that the content of the present disclosure is susceptible to numerous variations and modifications. For example, the various devices or components described above may be implemented in hardware, or in software, firmware, or a combination of some or all of the three.

Further, although the present disclosure makes various references to certain units of a system according to embodiments of the present disclosure, any number of different units may be used and run on a client and/or a server. The units are merely illustrative, and different aspects of the systems and methods may use different units.

Flowcharts are used in this disclosure to illustrate the steps of methods according to embodiments of the disclosure. It should be understood that the steps are not necessarily performed in the exact order shown; rather, various steps may be processed in reverse order or simultaneously, and other operations may be added to the processes.

It will be understood by those skilled in the art that all or part of the steps of the above methods may be implemented by instructing the relevant hardware through a computer program, and the program may be stored in a computer readable storage medium, such as a read-only memory, a magnetic or optical disk, and the like. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware, and may also be implemented in the form of a software functional module. The present disclosure is not limited to any specific form of combination of hardware and software.

Unless otherwise defined, all terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

The foregoing is illustrative of the present disclosure and is not to be construed as limiting thereof. Although a few exemplary embodiments of this disclosure have been described, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of this disclosure. Accordingly, all such modifications are intended to be included within the scope of this disclosure as defined in the claims. It is to be understood that the foregoing is illustrative of the present disclosure and is not to be construed as limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments, as well as other embodiments, are intended to be included within the scope of the appended claims. The present disclosure is defined by the claims and their equivalents.

Claims

1. A design simulation method for an integrated circuit device, comprising:

obtaining a run queue constructed based on a plurality of modules of the integrated circuit device, wherein the run queue comprises at least one module selected from the plurality of modules, and the at least one module is arranged in the run queue in order based on an execution hierarchy of the at least one module;

determining an available processor; and

running, using the available processors, simulation functions of modules within the run queue for simulation verification of the integrated circuit device.

2. The method of claim 1, wherein the run queue is built based on a plurality of modules of the integrated circuit device according to the steps of:

obtaining a hierarchy between a plurality of modules of the integrated circuit device;

generating, based on the hierarchy, a multi-way tree structure corresponding to the at least one module, wherein in the multi-way tree structure different nodes represent different modules, the module represented by each child node is included in the module represented by the parent node corresponding to that child node, and the execution hierarchy of the at least one module is represented by the connection order between child nodes and parent nodes having node depths; and

traversing nodes in the multi-way tree structure based on node depth to construct the run queue.

3. The method of claim 2, wherein nodes are arranged in the run queue in the connection order and the run queue includes node information associated with the nodes, the node information comprising a node name, a run indicator, parent node information, and child node information,

wherein the run indicator indicates whether the simulation function of the node can run, the parent node information comprises the node name of the parent node directly connected to the node, and the child node information comprises the node names of the child nodes directly connected to the node.

4. The method of claim 3, wherein traversing the nodes in the multi-way tree structure based on node depth to construct the run queue comprises:

visiting, according to the depth-first principle, the nodes of the multi-way tree in descending order of node depth, starting from the leaf nodes with the maximum node depth and ending at the root node with a node depth of 1, so as to construct the run queue, wherein a leaf node is a node without a child node, and the root node is a node without a parent node.

5. The method of claim 4, wherein visiting, according to the depth-first principle, the nodes of the multi-way tree in descending order of node depth from the leaf nodes with the maximum node depth to the root node with a node depth of 1 comprises:

sequentially storing node information of nodes in the run queue according to node depths of the nodes, wherein, for nodes having the same node depth, it is determined whether each node is a leaf node or a parent node, the node information of the leaf nodes is stored in the run queue before the node information of the parent nodes, the run indicator of each leaf node is set to flag 1, and the run indicator of each parent node is set to flag 2,

wherein a run indicator of flag 1 indicates that the simulation function of the node can run, and a run indicator of flag 2 indicates that the simulation function of the node cannot run.

6. The method of claim 5, wherein visiting, according to the depth-first principle, the nodes of the multi-way tree in descending order of node depth from the leaf nodes with the maximum node depth to the root node with a node depth of 1 further comprises:

determining, among the nodes in the multi-way tree structure, active nodes having data information, and marking the active nodes and the parent nodes and child nodes directly connected to the active nodes as being in an active state.

7. The method of claim 6, wherein sequentially storing node information of nodes in the run queue according to node depths of the nodes comprises:

within one simulation clock cycle, storing in the run queue, according to node depth, the node information of only the nodes marked as being in the active state in the current multi-way tree structure, and not storing nodes that are not marked as being in the active state in the current multi-way tree structure.

8. The method of claim 5, wherein sequentially storing node information of nodes in the run queue according to node depths of the nodes comprises:

for maximum node depth N:

determining the nodes with a node depth of N, and storing the node information of the nodes with the node depth of N in the run queue, wherein the run indicator of each node with the node depth of N is set to flag 1, and the node name of the parent node directly connected to the node with the node depth of N is used as the parent node information of that node; and

for each node depth M less than N:

determining the leaf nodes with a node depth of M, and storing the node information of the leaf nodes with the node depth of M in the run queue, wherein the run indicator of each leaf node with the node depth of M is set to flag 1, and the node name of the parent node directly connected to the leaf node is used as the parent node information of that leaf node;

determining the parent nodes with a node depth of M, and storing the node information of the parent nodes with the node depth of M in the run queue, wherein the run indicator of each parent node with the node depth of M is set to flag 2, the node name of the parent node directly connected to it is used as its parent node information, and the node names of the child nodes directly connected to it are used as its child node information,

wherein N is an integer greater than 1, and M is an integer less than N and greater than 1.

9. The method of claim 5, wherein there are a plurality of available processors, and running the simulation functions of the modules within the run queue using the available processors comprises:

running simulation functions for a plurality of modules within the run queue in parallel using a plurality of the available processors.

10. The method of claim 9, wherein running the simulation functions of the plurality of modules within the run queue in parallel comprises:

determining the number of nodes in the run queue whose run indicator is flag 1; and

running in parallel, using the available processors, the simulation functions of the nodes in the run queue whose run indicator is flag 1, based on the number of available processors and the number of nodes whose run indicator is flag 1.

11. The method of claim 10, wherein the number of available processors is denoted S, the number of nodes whose run indicator is flag 1 is denoted T, and S and T are integers greater than or equal to 0,

running in parallel, using the available processors, the simulation functions of the nodes in the run queue whose run indicator is flag 1, based on the number of available processors and the number of nodes whose run indicator is flag 1, comprises:

in the case that T is less than or equal to S, running in parallel, using the available processors, the simulation functions of the T nodes in the run queue whose run indicator is flag 1; or

in the case that T is greater than S, running in parallel, using the available processors, the simulation functions of the first S nodes in the run queue whose run indicator is flag 1.

12. The method of claim 9, wherein running the simulation functions of the plurality of modules within the run queue in parallel further comprises:

updating the run queue during the run.

13. The method of claim 12, wherein updating the run queue during the run comprises:

denoting each node whose simulation function has finished running as a first node, determining the parent node of the first node according to the parent node information of the first node, and updating the node information of the parent node of the first node;

deleting the node information of the first node from the run queue; and

continuing the updating until the run queue is empty.

14. The method of claim 13, wherein the parent node of the first node is denoted as a second node, and updating the node information of the parent node of the first node comprises:

deleting the node name of the first node from the child node information of the second node;

determining whether the child node information of the second node is empty;

and setting the run indicator of the second node to 1 in the event that the child node information of the second node is determined to be empty.
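The update procedure of claims 13 and 14, repeated until the run queue is empty, can be sketched as follows. This is a sequential, hedged illustration: the `Node` record, the helper names `on_node_finished`, `drain`, and `example_queue`, the dictionary-based queue, and the three-level example tree are all hypothetical, and the actual simulation functions are not executed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical record for one entry of the run queue."""
    name: str
    run_indicator: int  # assumed encoding: 1 = may run, 2 = waiting on children
    parent: Optional[str] = None  # parent node information
    children: List[str] = field(default_factory=list)  # child node information


def on_node_finished(run_queue: Dict[str, Node], first: str) -> None:
    """Update the queue after the 'first node' has run (claims 13 and 14)."""
    node = run_queue[first]
    if node.parent is not None and node.parent in run_queue:
        second = run_queue[node.parent]  # the 'second node'
        second.children.remove(first)  # drop the first node from its child info
        if not second.children:  # child info is now empty
            second.run_indicator = 1  # so the parent may run
    del run_queue[first]  # remove the first node's information


def drain(run_queue: Dict[str, Node], s: int) -> List[List[str]]:
    """Run rounds of at most S nodes until the queue is empty; return the rounds."""
    rounds: List[List[str]] = []
    while run_queue:
        ready = [n for n, node in run_queue.items() if node.run_indicator == 1][:s]
        if not ready:
            raise RuntimeError("no runnable node: S is 0 or the queue is inconsistent")
        # A real simulator would execute these simulation functions in parallel here.
        for name in ready:
            on_node_finished(run_queue, name)
        rounds.append(ready)
    return rounds


def example_queue() -> Dict[str, Node]:
    """Three-level example tree, deepest nodes first (dicts keep insertion order)."""
    nodes = [
        Node("leafA", 1, "mid1"), Node("leafB", 1, "mid1"), Node("leafC", 1, "mid2"),
        Node("mid1", 2, "top", ["leafA", "leafB"]), Node("mid2", 2, "top", ["leafC"]),
        Node("top", 2, None, ["mid1", "mid2"]),
    ]
    return {node.name: node for node in nodes}


print(drain(example_queue(), 2))
# [['leafA', 'leafB'], ['leafC', 'mid1'], ['mid2'], ['top']]
```

Because `ready` is computed before any update in a round, a parent unlocked mid-round (here `mid1` after `leafB`) waits for the next round, so a parent never runs in the same batch as its own children.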

15. A design simulation apparatus for an integrated circuit device, comprising:

a run queue obtaining unit configured to obtain a run queue constructed based on a plurality of modules of the integrated circuit device, wherein the run queue comprises at least one module selected from the plurality of modules, and the at least one module is sequentially arranged in the run queue based on an execution hierarchy of the at least one module;

a computing unit configured to determine an available processor; and

a simulation verification unit configured to run simulation functions of modules within the run queue using the available processors for simulation verification of the integrated circuit device.

16. The apparatus of claim 15, further comprising a run queue building unit configured to:

traverse nodes in the multi-way tree structure based on node depth to construct the run queue,

wherein the nodes are arranged in the run queue in connection order and include node information associated with the nodes, the node information including the simulation function of the node, a node name, a run indicator, parent node information, and child node information, wherein the run indicator indicates whether the simulation function of the node can run, the parent node information comprises the node name of the parent node directly connected to the node, and the child node information comprises the node name of each child node directly connected to the node.
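A depth-based construction of the run queue with the node information listed above might look like the sketch below. Several points are assumptions rather than details confirmed by the text: the module hierarchy is given as a dictionary from node name to child names, each node has exactly one parent, the root has depth 1, the queue is ordered deepest level first so children precede their parents, leaves are flagged 1, and every node with children is flagged 2 (consistent with the flag 2 assigned to parent nodes and with claim 14 setting a parent to 1 once its children are done). `Node` and `build_run_queue` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """Hypothetical node information: name, run indicator, parent, children, function."""
    name: str
    run_indicator: int  # assumed: 1 = may run (leaf), 2 = waiting on children
    parent: Optional[str] = None
    children: List[str] = field(default_factory=list)
    simulate: Optional[Callable[[], object]] = None  # the node's simulation function


def build_run_queue(tree: Dict[str, List[str]], root: str) -> List[Node]:
    """Breadth-first traversal by node depth, then order the queue deepest level first."""
    depth: Dict[str, int] = {root: 1}
    parent: Dict[str, Optional[str]] = {root: None}
    order: List[str] = []
    pending = deque([root])
    while pending:
        name = pending.popleft()
        order.append(name)
        for child in tree.get(name, []):
            depth[child] = depth[name] + 1
            parent[child] = name
            pending.append(child)
    # Stable sort: nodes at the same depth keep their breadth-first order.
    order.sort(key=lambda n: depth[n], reverse=True)
    return [
        Node(
            name=n,
            run_indicator=1 if not tree.get(n) else 2,
            parent=parent[n],
            children=list(tree.get(n, [])),
        )
        for n in order
    ]


tree = {"top": ["mid1", "mid2"], "mid1": ["leafA", "leafB"], "mid2": ["leafC"]}
q = build_run_queue(tree, "top")
print([(n.name, n.run_indicator) for n in q])
# [('leafA', 1), ('leafB', 1), ('leafC', 1), ('mid1', 2), ('mid2', 2), ('top', 2)]
```

Python's sort stays stable with `reverse=True`, so nodes at the same depth keep their traversal order, which is one plausible reading of "connection order".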

17. The apparatus of claim 16, wherein the run queue building unit is configured to:

18. The apparatus of claim 17, wherein the run queue building unit is configured to:

19. A design simulation apparatus for an integrated circuit device, comprising:

a processor; and

a memory having stored therein computer-readable code that, when executed by the processor, performs the design simulation method for an integrated circuit device of any one of claims 1-14.

20. A non-transitory computer readable storage medium having stored thereon instructions that, when executed by a processor, cause the processor to perform a design simulation method for an integrated circuit device as recited in any of claims 1-14.