CN113420520B - Integrated circuit device design simulation method, device, apparatus and readable storage medium

Info

Publication number: CN113420520B
Application number: CN202110723085.6A
Authority: CN
Inventors: 王斌; 潘于; 陈俊
Original assignee: Haiguang Information Technology Co Ltd
Current assignee: Haiguang Information Technology Co Ltd
Priority date: 2021-06-25
Filing date: 2021-06-25
Publication date: 2024-01-23
Anticipated expiration: 2041-06-25
Also published as: CN113420520A

Abstract

The present disclosure provides an integrated circuit device design simulation method, apparatus, device, and readable storage medium. The design simulation method for the integrated circuit device comprises the following steps: obtaining a run queue constructed based on a plurality of modules of the integrated circuit device, wherein the run queue includes at least one module selected from the plurality of modules, and in the run queue, the at least one module is sequentially arranged based on an execution hierarchy of the at least one module; determining available processors; and running a simulation function of the module within the run queue using the available processor for simulation verification of the integrated circuit device.

Description

Integrated circuit device design simulation method, device, apparatus and readable storage medium

Technical Field

Embodiments of the present disclosure relate to the field of integrated circuit technology, and more particularly, to an integrated circuit device design simulation method, apparatus, device, and readable storage medium.

Background

As integrated circuits become more complex, circuit design becomes more complex as well. For example, the computing power of computing devices has increased dramatically, and general-purpose graphics processing units (GPGPUs) are very widely used owing to their parallel computing design. Furthermore, the demand for fast iteration of integrated circuits keeps increasing, and the quality of the model of an integrated circuit plays an extremely important role in the overall circuit design.

Because of the nature of integrated circuit design, models are usually built on interfaces, modules, registers, and clocks to service transactions. Developers therefore mostly abstract these generic components into a unified model framework and then fill in the functions of the model on the basis of that framework, to facilitate subsequent design steps such as simulation verification and physical design. A large-scale integrated circuit such as a GPGPU is characterized by many modules, many interfaces, and many data interactions; even when GPGPU simulation is performed only at the level of the model framework, a large amount of verification time is required, and still more time is required for hybrid simulation with the design. How to accelerate model simulation therefore becomes critical in the design process of integrated circuits.

Disclosure of Invention

Some embodiments of the present disclosure provide a design simulation method, apparatus, device, and readable storage medium for an integrated circuit device for increasing a model simulation rate of an integrated circuit.

According to one aspect of the present disclosure, a design simulation method for an integrated circuit device is provided. The design simulation method comprises the following steps: obtaining a run queue constructed based on a plurality of modules of the integrated circuit device, wherein the run queue includes at least one module selected from the plurality of modules, and in the run queue, the at least one module is sequentially arranged based on an execution hierarchy of the at least one module; determining available processors; and running a simulation function of the module within the run queue using the available processor for simulation verification of the integrated circuit device.

According to some embodiments of the present disclosure, the run queue is constructed based on the plurality of modules of the integrated circuit device according to the following steps: acquiring a hierarchical structure among the plurality of modules of the integrated circuit device; generating, based on the hierarchical structure, a multi-way tree structure corresponding to the at least one module, wherein the multi-way tree structure characterizes different modules with different nodes, the module characterized by each child node is included in the module characterized by the parent node corresponding to that child node, and the execution hierarchy of the at least one module is characterized by the connection order between child nodes and parent nodes, each node having a node depth; and traversing the nodes in the multi-way tree structure based on the node depth to construct the run queue.

According to some embodiments of the present disclosure, the nodes are arranged in the run queue according to the connection order, and the run queue includes node information associated with each node, the node information including a run indicator, parent node information, and child node information, wherein the run indicator indicates whether the simulation function of the node is operable, the parent node information includes the node name of the parent node directly connected to the node, and the child node information includes the node names of the child nodes directly connected to the node.
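
As a non-limiting illustration, the node information described above could be held in a small record type. The following Python sketch is an assumption made for explanatory purposes only: the names `NodeInfo`, `OPERABLE`, and `INOPERABLE` are hypothetical, and the values 1 and 2 simply mirror the flag 1 and flag 2 used for the run indicator elsewhere in this disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical constants mirroring the run indicator flags of the disclosure.
OPERABLE = 1    # flag 1: the simulation function of the node may be run
INOPERABLE = 2  # flag 2: the node still waits for its child nodes


@dataclass
class NodeInfo:
    """Node information stored in the run queue for one node (one module)."""
    name: str
    run_indicator: int
    # Name of the directly connected parent node (None for the root node).
    parent: Optional[str] = None
    # Names of the directly connected child nodes (empty for a leaf node).
    children: List[str] = field(default_factory=list)


# Example entries: a leaf node and its parent node (names are illustrative).
leaf = NodeInfo(name="sub-module 1", run_indicator=OPERABLE, parent="module 1")
parent = NodeInfo(name="module 1", run_indicator=INOPERABLE,
                  parent="top-level module", children=["sub-module 1"])
```

Storing node names rather than object references keeps each queue entry independent, which matches the description of parent node information and child node information as node names.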

According to some embodiments of the present disclosure, traversing the nodes in the multi-way tree structure based on the node depth to construct the run queue includes: according to the depth-first principle, starting from the leaf nodes having the maximum node depth, sequentially visiting the nodes of the multi-way tree in order of node depth from largest to smallest until the root node having a node depth of 1 is reached, so as to construct the run queue, wherein a leaf node is a node without child nodes and the root node is a node without a parent node.

According to some embodiments of the present disclosure, sequentially visiting the nodes of the multi-way tree in order of node depth from largest to smallest, starting from the leaf nodes having the maximum node depth and ending at the root node having a node depth of 1, according to the depth-first principle, includes: sequentially storing the node information of the nodes in the run queue according to the node depth of the nodes, wherein, for nodes having the same node depth, it is determined whether each node is a leaf node or a parent node, the node information of the leaf nodes is stored in the run queue before the node information of the parent nodes, the run indicator of each leaf node is set to flag 1, and the run indicator of each parent node is set to flag 2, where flag 1 indicates that the simulation function of the node is operable and flag 2 indicates that the simulation function of the node is inoperable.
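
The ordering rule above can be sketched as a sort over the nodes. The following Python example is only an illustration under stated assumptions: the function name `build_run_queue`, the dictionary-based inputs, and the node names `top`, `a`, `b`, `a1`, and `a2` are hypothetical, and the node depths are assumed to have been determined beforehand.

```python
from typing import Dict, List, Tuple


def build_run_queue(depth: Dict[str, int],
                    children: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Order nodes from the largest node depth to depth 1.

    Within one depth, leaf nodes come before parent nodes. Each entry is
    (node name, run indicator), with flag 1 for leaf nodes and flag 2 for
    parent nodes.
    """
    def sort_key(name: str) -> Tuple[int, bool]:
        is_parent = bool(children.get(name))
        # Negative depth sorts deeper nodes first; False (leaf) sorts
        # before True (parent) within the same depth.
        return (-depth[name], is_parent)

    ordered = sorted(depth, key=sort_key)
    return [(name, 2 if children.get(name) else 1) for name in ordered]


# Hypothetical three-level tree: "b" is a leaf at depth 2, "a" is a parent.
depth = {"top": 1, "a": 2, "b": 2, "a1": 3, "a2": 3}
children = {"top": ["a", "b"], "a": ["a1", "a2"]}
run_queue = build_run_queue(depth, children)
# run_queue == [("a1", 1), ("a2", 1), ("b", 1), ("a", 2), ("top", 2)]
```

Because the sort is stable, nodes that share both depth and leaf/parent status keep their original relative order.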

According to some embodiments of the present disclosure, sequentially visiting the nodes of the multi-way tree in order of node depth from largest to smallest, starting from the leaf nodes having the maximum node depth and ending at the root node having a node depth of 1, according to the depth-first principle, further includes: for the nodes in the multi-way tree structure, determining active nodes having data information, and marking each active node, together with the parent node and child nodes directly connected to it, as being in an active state.

According to some embodiments of the present disclosure, storing the node information of the nodes in the run queue according to the node depth of the nodes includes: within one simulation clock cycle, storing node information in the run queue, according to node depth, only for the nodes marked as active in the current multi-way tree structure, and not storing the nodes that are not marked as active in the current multi-way tree structure.
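
The active-state marking and filtering described above can be illustrated as follows. This Python sketch is hypothetical: the function name `mark_active`, the input format, and the node names are assumptions, and which nodes hold data information in a given simulation clock cycle is taken as given.

```python
from typing import Dict, Iterable, List, Set


def mark_active(children: Dict[str, List[str]],
                nodes_with_data: Iterable[str]) -> Set[str]:
    """Return every node with data plus its direct parent and direct children."""
    parent_of = {child: parent
                 for parent, kids in children.items()
                 for child in kids}
    active: Set[str] = set()
    for node in nodes_with_data:
        active.add(node)
        if node in parent_of:
            active.add(parent_of[node])
        active.update(children.get(node, []))
    return active


# Hypothetical tree in which only node "a" holds data in the current cycle.
children = {"top": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
active = mark_active(children, ["a"])

# Nodes already ordered from the largest node depth to depth 1.
ordered_by_depth = ["a1", "a2", "b1", "a", "b", "top"]
queued = [name for name in ordered_by_depth if name in active]
# queued == ["a1", "a2", "a", "top"]; "b" and "b1" are skipped this cycle.
```

The sketch covers only the selection of nodes for the run queue; how the child node information of a queued parent is handled when some of its children are inactive is not addressed here.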

According to some embodiments of the present disclosure, sequentially storing the node information of the nodes in the run queue according to the node depth of the nodes includes: for the maximum node depth N: determining the nodes having node depth N, and storing the node information of the nodes having node depth N in the run queue, wherein the run indicator of each node having node depth N is set to flag 1, and the node name of the parent node directly connected to the node having node depth N is used as the parent node information of that node; and for each node depth M less than N: determining the leaf nodes having node depth M, and storing the node information of the leaf nodes having node depth M in the run queue, wherein the run indicator of each leaf node having node depth M is set to flag 1, and the node name of the parent node directly connected to the leaf node having node depth M is used as the parent node information of that leaf node; and determining the parent nodes having node depth M, and storing the node information of the parent nodes having node depth M in the run queue, wherein the run indicator of each parent node having node depth M is set to flag 2, the node name of the parent node directly connected to the parent node having node depth M is used as its parent node information, and the node names of the child nodes directly connected to the parent node having node depth M are used as its child node information, wherein N is an integer greater than 1, and M is an integer less than N and greater than 1.

According to some embodiments of the present disclosure, there are a plurality of available processors, and running the simulation functions of the modules within the run queue using the available processors includes: running the simulation functions of a plurality of modules within the run queue in parallel using the plurality of available processors.

According to some embodiments of the present disclosure, running the simulation functions of the plurality of modules within the run queue in parallel includes: determining the number of nodes in the run queue whose run indicator is flag 1; and, based on the number of available processors and the number of nodes whose run indicator is flag 1, running in parallel, using the available processors, the simulation functions of the nodes in the run queue whose run indicator is flag 1.

According to some embodiments of the present disclosure, the number of available processors is denoted as S and the number of nodes whose run indicator is flag 1 is denoted as T, where S and T are integers greater than or equal to 0, and running in parallel, using the available processors, the simulation functions of the nodes in the run queue based on the number of available processors and the number of nodes whose run indicator is flag 1 includes: in the case where T is less than or equal to S, running in parallel, using the available processors, the simulation functions of the T nodes in the run queue whose run indicator is flag 1; or, in the case where T is greater than S, running in parallel, using the available processors, the simulation functions of the first S nodes in the run queue whose run indicator is flag 1.
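
The two cases for S and T reduce to selecting at most S operable nodes in queue order. A minimal Python sketch follows; the function name `select_batch` and the (node name, run indicator) tuple format are assumptions made for illustration.

```python
from typing import List, Tuple


def select_batch(run_queue: List[Tuple[str, int]], s: int) -> List[str]:
    """Pick the nodes to run in parallel on S available processors.

    T is the number of entries whose run indicator is flag 1. If T <= S,
    all T nodes are returned; otherwise only the first S of them.
    """
    operable = [name for name, indicator in run_queue if indicator == 1]
    return operable[:s]


# Hypothetical run queue with T = 3 operable nodes.
run_queue = [("a1", 1), ("a2", 1), ("b", 1), ("a", 2), ("top", 2)]
all_operable = select_batch(run_queue, 4)  # T <= S: ["a1", "a2", "b"]
first_two = select_batch(run_queue, 2)     # T > S:  ["a1", "a2"]
```

The slice covers both cases in one expression, since slicing a list of length T to at most S elements yields min(S, T) entries.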

According to some embodiments of the present disclosure, running the simulation functions of the plurality of modules within the run queue in parallel further includes: updating the run queue during the run.

According to some embodiments of the present disclosure, updating the run queue during the run includes: denoting each node whose simulation function has finished running as a first node, determining the parent node of the first node according to the parent node information of the first node, and updating the node information of the parent node of the first node; deleting the node information of the first node from the run queue; and continuing to update until the run queue is empty.

According to some embodiments of the present disclosure, the parent node of the first node is denoted as a second node, and updating the node information of the parent node of the first node includes: deleting the node name of the first node from the child node information of the second node; determining whether the child node information of the second node is empty; and, in the case where it is determined that the child node information of the second node is empty, setting the run indicator of the second node to flag 1.
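
The update of the run queue after a node finishes can be sketched as below. The Python code is illustrative only: the function name `finish_node`, the use of a dictionary keyed by node name, and the field names `run_indicator`, `parent`, and `children` are assumptions.

```python
from typing import Dict


def finish_node(run_queue: Dict[str, dict], first: str) -> None:
    """Update the run queue after the simulation function of `first` completes."""
    parent_name = run_queue[first]["parent"]
    second = run_queue.get(parent_name)  # None when `first` has no queued parent
    if second is not None:
        # Delete the first node's name from the second node's child information.
        second["children"].remove(first)
        # An empty child list means every child has run: set flag 1.
        if not second["children"]:
            second["run_indicator"] = 1
    # Delete the node information of the first node from the run queue.
    del run_queue[first]


# Hypothetical run queue: two leaf nodes and their parent node.
run_queue = {
    "a1": {"run_indicator": 1, "parent": "a", "children": []},
    "a2": {"run_indicator": 1, "parent": "a", "children": []},
    "a": {"run_indicator": 2, "parent": None, "children": ["a1", "a2"]},
}
finish_node(run_queue, "a1")
indicator_after_first = run_queue["a"]["run_indicator"]   # still flag 2
finish_node(run_queue, "a2")
indicator_after_second = run_queue["a"]["run_indicator"]  # now flag 1
finish_node(run_queue, "a")                               # run queue is empty
```

A parent node thus becomes operable only after the last of its queued child nodes has been removed, which preserves the bottom-to-top execution order.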

According to another aspect of the present disclosure, there is also provided a design simulation apparatus for an integrated circuit device, including: a run queue acquisition unit configured to obtain a run queue constructed based on a plurality of modules of the integrated circuit device, wherein the run queue includes at least one module selected from the plurality of modules, and in the run queue, the at least one module is sequentially arranged based on an execution hierarchy of the at least one module; a computing unit configured to determine available processors; and a simulation verification unit configured to run the simulation functions of the modules within the run queue using the available processors, for simulation verification of the integrated circuit device.

According to some embodiments of the present disclosure, the design simulation apparatus further includes a run queue construction unit. The run queue construction unit is configured to: acquire a hierarchical structure among the plurality of modules of the integrated circuit device; generate, based on the hierarchical structure, a multi-way tree structure corresponding to the at least one module, wherein the multi-way tree structure characterizes different modules with different nodes, the module characterized by each child node is included in the module characterized by the parent node corresponding to that child node, and the execution hierarchy of the at least one module is characterized by the connection order between child nodes and parent nodes, each node having a node depth; and traverse the nodes in the multi-way tree structure based on the node depth to construct the run queue, wherein the nodes are arranged in the run queue according to the connection order, and the run queue includes node information associated with each node, the node information including a run indicator, parent node information, and child node information, wherein the run indicator indicates whether the simulation function of the node is operable, the parent node information includes the node name of the parent node directly connected to the node, and the child node information includes the node names of the child nodes directly connected to the node.

According to some embodiments of the present disclosure, the run queue construction unit is configured to: according to the depth-first principle, starting from the leaf nodes having the maximum node depth, sequentially visit the nodes of the multi-way tree in order of node depth from largest to smallest until the root node having a node depth of 1 is reached, so as to construct the run queue, wherein a leaf node is a node without child nodes and the root node is a node without a parent node.

According to some embodiments of the present disclosure, the run queue construction unit is configured to: sequentially store the node information of the nodes in the run queue according to the node depth of the nodes, wherein, for nodes having the same node depth, it is determined whether each node is a leaf node or a parent node, the node information of the leaf nodes is stored in the run queue before the node information of the parent nodes, the run indicator of each leaf node is set to flag 1, and the run indicator of each parent node is set to flag 2, where flag 1 indicates that the simulation function of the node is operable and flag 2 indicates that the simulation function of the node is inoperable.

According to some embodiments of the present disclosure, the run queue construction unit is further configured to: for the nodes in the multi-way tree structure, determine active nodes having data information, and mark each active node, together with the parent node and child nodes directly connected to it, as being in an active state.

According to some embodiments of the present disclosure, the run queue construction unit is configured to: within one simulation clock cycle, store node information in the run queue, according to node depth, only for the nodes marked as active in the current multi-way tree structure, and not store the nodes that are not marked as active in the current multi-way tree structure.

According to some embodiments of the present disclosure, the run queue construction unit is configured to: for the maximum node depth N: determine the nodes having node depth N, and store the node information of the nodes having node depth N in the run queue, wherein the run indicator of each node having node depth N is set to flag 1, and the node name of the parent node directly connected to the node having node depth N is used as the parent node information of that node; and for each node depth M less than N: determine the leaf nodes having node depth M, and store the node information of the leaf nodes having node depth M in the run queue, wherein the run indicator of each leaf node having node depth M is set to flag 1, and the node name of the parent node directly connected to the leaf node having node depth M is used as the parent node information of that leaf node; and determine the parent nodes having node depth M, and store the node information of the parent nodes having node depth M in the run queue, wherein the run indicator of each parent node having node depth M is set to flag 2, the node name of the parent node directly connected to the parent node having node depth M is used as its parent node information, and the node names of the child nodes directly connected to the parent node having node depth M are used as its child node information, wherein N is an integer greater than 1, and M is an integer less than N and greater than 1.

According to some embodiments of the present disclosure, there are a plurality of available processors, and the simulation verification unit is configured to: run the simulation functions of a plurality of modules within the run queue in parallel using the plurality of available processors.

According to some embodiments of the present disclosure, the simulation verification unit is configured to: determine the number of nodes in the run queue whose run indicator is flag 1; and, based on the number of available processors and the number of nodes whose run indicator is flag 1, run in parallel, using the available processors, the simulation functions of the nodes in the run queue whose run indicator is flag 1.

According to some embodiments of the present disclosure, the number of available processors is denoted as S and the number of nodes whose run indicator is flag 1 is denoted as T, where S and T are integers greater than or equal to 0, and the simulation verification unit is configured to: in the case where T is less than or equal to S, run in parallel, using the available processors, the simulation functions of the T nodes in the run queue whose run indicator is flag 1; or, in the case where T is greater than S, run in parallel, using the available processors, the simulation functions of the first S nodes in the run queue whose run indicator is flag 1.

According to some embodiments of the present disclosure, the simulation verification unit is further configured to: update the run queue during the run.

According to some embodiments of the present disclosure, the simulation verification unit is configured to: denote each node whose simulation function has finished running as a first node, determine the parent node of the first node according to the parent node information of the first node, and update the node information of the parent node of the first node; delete the node information of the first node from the run queue; and continue to update until the run queue is empty.

According to some embodiments of the present disclosure, the parent node of the first node is denoted as a second node, and the simulation verification unit is configured to: delete the node name of the first node from the child node information of the second node; determine whether the child node information of the second node is empty; and, in the case where it is determined that the child node information of the second node is empty, set the run indicator of the second node to flag 1.

According to yet another aspect of the present disclosure, there is also provided a design simulation device for an integrated circuit device. The design simulation device includes: a processor; and a memory, wherein the memory has stored therein computer readable code which, when executed by the processor, performs the design simulation method for the integrated circuit device as described above.

According to yet another aspect of the present disclosure, there is also provided a non-transitory computer-readable storage medium having instructions stored thereon, which when executed by a processor, cause the processor to perform the design simulation method for an integrated circuit device as described above.

With the design simulation method, apparatus, device, and readable storage medium for an integrated circuit device provided by the embodiments of the present disclosure, the simulation verification process for the integrated circuit device can be performed based on a run queue constructed from at least one module selected from the plurality of modules of the integrated circuit device, which helps to increase the model simulation rate of the integrated circuit and shorten the design cycle of the integrated circuit.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present disclosure, and a person of ordinary skill in the art may obtain other drawings from these drawings without inventive effort.

FIG. 1 shows a schematic diagram of a model framework of an integrated circuit;

FIG. 2A illustrates a flow chart of a design simulation method in accordance with an embodiment of the present disclosure;

FIG. 2B illustrates a flow diagram for building a run queue according to an embodiment of the present disclosure;

FIG. 3 shows a schematic diagram of a multi-way tree structure corresponding to the model framework of FIG. 1;

FIG. 4 illustrates a schematic diagram of node information of a run queue according to an embodiment of the present disclosure;

FIG. 5 shows a schematic diagram of a multi-way tree structure with a maximum node depth of 5;

FIG. 6 illustrates an overall flow diagram of a design simulation method in accordance with an embodiment of the present disclosure;

FIG. 7 shows a schematic block diagram of a design simulation apparatus in accordance with an embodiment of the present disclosure;

FIG. 8 shows a schematic block diagram of a design simulation device in accordance with an embodiment of the present disclosure;

FIG. 9 illustrates a schematic diagram of an architecture of an exemplary computing device, according to an embodiment of the present disclosure;

FIG. 10 shows a schematic diagram of a non-transitory computer-readable storage medium according to an embodiment of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure. It will be apparent that the described embodiments are merely some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by one of ordinary skill in the art based on the embodiments in this disclosure without inventive effort fall within the scope of the present disclosure.

Furthermore, as used in the present disclosure and claims, unless the context clearly indicates otherwise, the words "a," "an," and/or "the" do not refer specifically to the singular, but may also include the plural. The terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. Likewise, the word "comprising" or "comprises", and the like, means that the element or item preceding the word encompasses the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected" or "coupled," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.

For integrated circuits, the high manufacturing cost and the irrecoverable cost of development time mean that no step of the circuit design process can tolerate even a small error. As the complexity of integrated circuits increases, the space and scope that must be verified also grow, so that verification takes longer. A thorough and rapid verification process helps to find design errors in time and thereby improves the design success rate.

For large integrated circuit structures, such as GPGPUs, the model framework of the circuit structure is typically abstracted into multiple modules having a hierarchical structure based on a clock. In particular, module division can be understood as dividing a large design into small sub-designs; for example, a circuit as a whole may be divided into a plurality of modules according to the functions implemented. Factors to be considered when dividing the modules may include, for example, the number of modules and the signal interconnections between modules.

FIG. 1 schematically shows a model framework having a hierarchical structure. The model framework shown in FIG. 1 has a 3-level structure, in which level 1 comprises a top-level module, level 2 comprises module 1 and module 2, and level 3 comprises sub-module 1, sub-module 2, sub-module 3, and sub-module 4. Data may be transmitted between the modules of the model framework; these transmissions are shown in FIG. 1 as pipes, e.g., pipe 1, sub-pipe 1, and sub-pipe 2. For example, data may be transmitted from sub-module 1 to sub-module 2 through sub-pipe 1.

In the related art, in order to perform simulation verification on a model framework as shown in FIG. 1, the modules within the model framework must be verified serially in each simulation clock cycle, in bottom-to-top order of the hierarchical structure. For example, in one simulation clock cycle, the simulation functions of the modules need to be executed one by one in the order sub-module 1, sub-module 2, sub-module 3, sub-module 4, module 1, module 2, top-level module, to verify whether each module can implement its designed function.
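
For comparison with the embodiments described later, the serial verification of the related art can be sketched as a plain loop over the modules. The Python code below is only an illustration: the function name `run_cycle_serially` and the `simulate` callback are hypothetical, while the module order is the one given above for FIG. 1.

```python
from typing import Callable, List

# Bottom-to-top execution order of the modules of FIG. 1, as listed above.
SERIAL_ORDER = [
    "sub-module 1", "sub-module 2", "sub-module 3", "sub-module 4",
    "module 1", "module 2", "top-level module",
]


def run_cycle_serially(simulate: Callable[[str], None]) -> List[str]:
    """Run one simulation clock cycle, one module after another."""
    executed = []
    for name in SERIAL_ORDER:
        simulate(name)  # each call must finish before the next one starts
        executed.append(name)
    return executed


# A placeholder simulation function that does nothing.
executed = run_cycle_serially(lambda name: None)
```

In this scheme the time for one clock cycle is the sum of the run times of all simulation functions, regardless of how many processors are available.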

Although the above serial verification method is easy to implement, it seriously limits the rate of simulation verification of an integrated circuit. This is especially true for integrated circuits with complex structures, such as a GPGPU, whose model framework has many hierarchy levels and contains a very large number of modules. For such circuits, serial verification makes the verification run time of the whole model too long, which hinders design optimization of the integrated circuit and cannot meet the market demand for ever shorter integrated circuit design and production cycles.

In order to improve model simulation efficiency of an integrated circuit, some embodiments of the present disclosure provide a design simulation method for an integrated circuit device, which shortens simulation verification time while ensuring design effects.

It is to be understood that the integrated circuit devices referred to herein may refer broadly to various types of circuit structures, such as a GPGPU, an SoC, or a microprocessor, without limitation. Unless defined otherwise, all terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains.

To illustrate the design simulation method according to an embodiment of the present disclosure more clearly, FIG. 2A shows a flowchart of the method. Implementation of the design simulation method according to the embodiment of the present disclosure is described below with reference to FIG. 2A.

As shown in FIG. 2A, first, in step S201, a run queue constructed based on a plurality of modules of an integrated circuit device is acquired. For example, the integrated circuit device may be a GPGPU, and the plurality of modules may be the modules included in a model framework of the integrated circuit device, e.g., the model framework shown in FIG. 1. It will be appreciated that the model framework shown in FIG. 1 is merely illustrative: for different integrated circuit structures, the hierarchy may have more than 3 levels, e.g., several levels or tens of levels, and the modules included in the model framework are not limited to those shown in FIG. 1.

According to some embodiments of the present disclosure, the run queue may include at least one module selected from the plurality of modules. For example, the run queue may include some of the modules in the model framework, or may include all of them. Further, according to some embodiments of the present disclosure, the at least one module in the run queue is ordered based on the execution hierarchy of the at least one module. The execution hierarchy may be understood as the execution order relationship between the modules shown in FIG. 1. For example, module 1 should be executed after sub-modules 1-4, which it includes; that is, in the execution logic, modules at level 3 should be processed before modules at level 2. The construction process of the run queue is described in detail below.

Next, in step S202, available processors are determined. In this step, an available processor may be any processor that can be used for performing simulation verification. There may be a plurality of available processors; further, the processors may be distributed processors or processors in a cloud computing environment, without limitation.

Next, in step S203, the available processors are used to run the simulation functions of the modules within the acquired run queue, for simulation verification of the integrated circuit device. The simulation function of a module may be the main function of the module. The main function serves as the starting point for executing the simulation program; if other functions exist, control returns to the main function after the calls to those functions complete, and the main function finally ends the whole simulation program.
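
Steps S202 and S203 can be illustrated with a worker pool from the Python standard library. The sketch below is an assumption made for explanatory purposes: the names `run_in_parallel` and `make_simulation_function` are hypothetical, the processor count is taken from `os.cpu_count()` when none is given, and each simulation function is modeled as a callable without arguments. Threads are used only to keep the example short; for CPU-bound simulation functions a process pool or distributed workers may be more suitable.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def run_in_parallel(simulation_functions: Sequence[Callable[[], str]],
                    num_processors: Optional[int] = None) -> List[str]:
    """Run the given simulation functions on the available processors."""
    # Step S202 (sketch): determine how many processors may be used.
    workers = num_processors or os.cpu_count() or 1
    # Step S203 (sketch): run the simulation functions concurrently and
    # collect their results in submission order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fn) for fn in simulation_functions]
        return [future.result() for future in futures]


def make_simulation_function(name: str) -> Callable[[], str]:
    """Build a placeholder main function for the module called `name`."""
    def main() -> str:
        return f"{name} verified"
    return main


results = run_in_parallel(
    [make_simulation_function(n) for n in ("sub-module 1", "sub-module 2")],
    num_processors=2,
)
# results == ["sub-module 1 verified", "sub-module 2 verified"]
```

Only simulation functions that do not depend on one another should be submitted together, which is what the ordering of the run queue is intended to guarantee.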

With the design simulation method according to some embodiments of the present disclosure, the serial simulation verification of each module performed in the related art is avoided. Instead, the simulation verification process for the integrated circuit device is performed based on the run queue constructed from at least one module selected from the plurality of modules of the integrated circuit device, which helps to improve the model simulation efficiency of the integrated circuit and shorten the integrated circuit design cycle. Particularly for large integrated circuit designs such as a GPGPU, the run-queue-based simulation verification process can further speed up model simulation, thereby saving circuit design time.

Next, a construction process of the run queue according to an embodiment of the present disclosure will be described in detail. Specifically, fig. 2B shows a schematic flow diagram of building a run queue according to an embodiment of the present disclosure.

According to some embodiments of the present disclosure, a run queue is constructed based on a plurality of modules of an integrated circuit device. First, at step S210, a hierarchical structure between the plurality of modules of the integrated circuit device is acquired. The hierarchical structure among the plurality of modules herein may refer to a model framework generated for an integrated circuit to be designed, the model framework being composed of a plurality of modules having a hierarchical structure, for example, as shown in fig. 1.

Next, in step S220, a multi-way tree structure corresponding to at least one module is generated based on the hierarchical structure. In some embodiments according to the present disclosure, the at least one module may be all or a portion of the modules in the model framework (e.g., including only the modules for which verification is desired), without limitation.

For ease of description, taking the model framework of fig. 1 as an example, a multi-way tree structure constructed for all modules of the model framework of fig. 1 is shown in fig. 3. As shown in fig. 3, in the multi-way tree structure, different modules are characterized by different nodes, e.g., one node is represented by a circular box, which corresponds to one module in the model framework of the integrated circuit. Further, the nodes in the multi-way tree structure are divided into child nodes and parent nodes, wherein the module characterized by each child node is included in the module characterized by the parent node corresponding to that child node, e.g., the node corresponding to module 1 is a parent node and the nodes corresponding to sub-module 1 to sub-module 4 are its child nodes. For another example, the node corresponding to the top-level module is a parent node and the nodes corresponding to modules 1 and 2 are its child nodes. That is, in the multi-way tree structure, the execution hierarchy of the at least one module is characterized by the connection order between child nodes and parent nodes, each of which has a node depth. For example, for the model framework shown in fig. 1 that includes a 3-level structure, the maximum node depth of the multi-way tree structure constructed therefrom (as shown in fig. 3) is 3, where nodes with node depth 1 correspond to the level-1 structure, nodes with node depth 2 correspond to the level-2 structure, and nodes with node depth 3 correspond to the level-3 structure.
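As a minimal sketch of the node-depth relationship described above, the following Python fragment rebuilds the hierarchy of fig. 1 as a multi-way tree and computes the node depth of each node, with the root at depth 1. The class name `Module`, the function `node_depths`, and the module name strings are assumptions introduced for illustration only; they do not come from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """One node of the multi-way tree; a module may contain sub-modules."""
    name: str
    children: list["Module"] = field(default_factory=list)


def node_depths(root: Module) -> dict[str, int]:
    """Return {module name: node depth}, with the root node at depth 1."""
    depths = {}
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        depths[node.name] = depth
        # Every child node is one level deeper than its parent node.
        stack.extend((child, depth + 1) for child in node.children)
    return depths


# Hierarchy of fig. 1 as described in the text (names are illustrative).
top = Module("top", [
    Module("module 1", [Module(f"sub-module {i}") for i in range(1, 5)]),
    Module("module 2"),
])
depths = node_depths(top)
```

Here `depths` maps the top-level module to 1, modules 1 and 2 to 2, and sub-modules 1-4 to 3, matching the 3-level structure of the example.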

Next, as shown in fig. 2B, at step S230, nodes in the multi-way tree structure are traversed based on the node depth to build a run queue. According to some embodiments of the present disclosure, nodes are arranged in a run queue in order of connection between nodes in a multi-way tree structure, and node information associated with the nodes is included in the run queue.

Fig. 4 illustrates a schematic diagram of node information of a run queue according to an embodiment of the present disclosure, and as illustrated in fig. 4, the run queue includes nodes such as node 1, node 2, node n, and the like arranged in connection order between the nodes, and includes node information corresponding to the nodes such as node 1, node 2, node n, and the like, respectively.

According to some embodiments of the present disclosure, the node information may include a node name, a run indicator, parent node information, and child node information. As an example, the node name may be the name of the module corresponding to the node. As an example, if node 1 in the run queue corresponds to sub-module 1, then the node name of node 1 may be denoted as "sub-module 1", and as other examples, the node name may also be other information, such as a number, for identifying the module to which the node corresponds, which is not limited herein. The run indicator may be used to indicate whether the simulation function of the node is operable, as will be described below with respect to the setting of the run indicator. For example, the parent node information may include a node name of a parent node directly connected to the node, and the child node information includes a node name of a child node directly connected to the node. As an example, in the case where the node 1 corresponds to the child module 1, the parent node information of the node 1 may include the name of the module 1 directly connected to the child module 1, i.e., "module 1", and the node 1 does not have a child node, and thus, the child node information of the node 1 may be empty.
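The node information described above could be represented, for example, by a small record type. The sketch below is one possible layout under stated assumptions: the names `NodeInfo`, `RUNNABLE`, and `NOT_RUNNABLE` are hypothetical, and the integer encoding of the run indicator is only one of the options the disclosure leaves open.

```python
from dataclasses import dataclass, field
from typing import Optional

RUNNABLE = 1      # assumed encoding of "flag 1": simulation function may run
NOT_RUNNABLE = 2  # assumed encoding of "flag 2": simulation function may not run yet


@dataclass
class NodeInfo:
    name: str                      # node name, e.g. the module name
    run_indicator: int             # RUNNABLE or NOT_RUNNABLE
    parent: Optional[str] = None   # name of the directly connected parent node
    children: set[str] = field(default_factory=set)  # names of direct child nodes


# Node 1 corresponds to sub-module 1: its parent is module 1 and it has no child nodes.
node1 = NodeInfo(name="sub-module 1", run_indicator=RUNNABLE, parent="module 1")
```

Storing node names rather than object references keeps each queue entry self-describing, which mirrors the way the text looks up a parent node through the parent node information.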

According to some embodiments of the present disclosure, the above step S230 may include: according to the depth-first principle, starting from the leaf nodes with the largest node depth, sequentially accessing the nodes in the multi-way tree in order of node depth from largest to smallest, up to the root node with node depth 1, to construct the run queue. A leaf node is a node without child nodes, and the root node is the node without a parent node. For example, in the multi-way tree structure shown in fig. 3, the sub-modules 1-4 are leaf nodes with a node depth of 3, the module 2 is a leaf node with a node depth of 2, the module 1 is a parent node with a node depth of 2, and the top-level module is the root node with a node depth of 1.

According to some embodiments of the present disclosure, the traversal described above may be: node information of the nodes is sequentially stored in the run queue according to the node depth of the nodes. For example, node information having a larger node depth is preferentially stored in the run queue because the execution hierarchy of a node having a larger node depth precedes the execution hierarchy of a node having a smaller node depth. Further, according to some embodiments of the present disclosure, for a node having the same node depth, it is first determined whether the node is a leaf node or a parent node, node information of the leaf node is stored before node information of the parent node in a run queue, a run indicator of the leaf node is set to a flag 1, and a run indicator of the parent node is set to a flag 2, wherein the run indicator is the flag 1 indicating that a simulation function of the node is operable, and the run indicator is the flag 2 indicating that the simulation function of the node is not operable. As an example, the marking of the running indicator may be represented in a 1-bit binary number, e.g., a binary number of 1 corresponds to marking 1 and a binary number of 0 corresponds to marking 2.

According to some embodiments of the present disclosure, the maximum node depth is denoted N; for example, in the multi-way tree structure shown in fig. 3, the maximum node depth is 3, i.e., N=3. In this case, sequentially storing node information of the nodes in the run queue according to the node depth of the nodes includes: determining the nodes with node depth N, and storing node information of the nodes with node depth N in the run queue, wherein for each node with node depth N, the run indicator is set to flag 1, and the node name of the parent node directly connected to the node is used as its parent node information.

In fig. 3, the nodes with node depth 3 are the nodes of sub-modules 1-4. Such leaf nodes include no child nodes and may be executed preferentially, whereby the node information of the nodes with node depth 3 may be sequentially stored in the run queue. For the node information of a node with node depth 3, its run indicator is first set to flag 1, indicating that the simulation functions of these 4 nodes are in a runnable state, and then the node name of the parent node directly connected thereto (i.e., "module 1") may be stored in the run queue as its parent node information.

According to some embodiments of the present disclosure, for a node depth M that is less than N, i.e., for nodes other than those having the largest node depth (e.g., M is 1 or 2), sequentially storing node information of the nodes in the run queue according to the node depth of the nodes includes: first, determining the leaf nodes with node depth M, and storing node information of the leaf nodes with node depth M in the run queue, wherein the run indicator of each leaf node with node depth M is set to flag 1, and the node name of the parent node directly connected to the leaf node is used as the parent node information of the leaf node; next, determining the parent nodes with node depth M, and storing node information of the parent nodes with node depth M in the run queue, wherein the run indicator of each parent node with node depth M is set to flag 2, the node name of the parent node directly connected to that parent node is used as its parent node information, and the node names of the child nodes directly connected to that parent node are used as its child node information.

In the above process of sequentially storing node information of the nodes in the run queue according to the node depth of the nodes, the nodes are arranged based on the connection order between the nodes, which characterizes the execution hierarchy of the modules in the model framework. Therefore, when verification is executed based on the node information sequentially stored in the run queue, the simulation functions can be executed according to the sequentially arranged node information, and the time-consuming verification procedure of the related art, which serially executes the simulation functions of all modules, is avoided.
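The ordering rules above (larger node depth first; at equal depth, leaf nodes before parent nodes; flag 1 for leaf nodes and flag 2 for parent nodes) can be sketched as follows. This is an illustrative reading rather than the disclosed implementation: node depths are computed here with a simple breadth-first walk followed by a stable sort, and the function name `build_run_queue`, the dictionary layout, and the module names are assumptions.

```python
def build_run_queue(children: dict[str, list[str]], root: str) -> list[dict]:
    """Order nodes deepest-first; at equal depth, leaf nodes precede parent nodes."""
    parent_of, depth = {root: None}, {root: 1}
    order = [root]
    for name in order:  # breadth-first walk; `order` grows while it is iterated
        for child in children.get(name, []):
            parent_of[child] = name
            depth[child] = depth[name] + 1
            order.append(child)
    # Stable sort: larger depth first, then leaf nodes (False) before parent nodes (True).
    ranked = sorted(order, key=lambda n: (-depth[n], bool(children.get(n))))
    return [
        {
            "name": n,
            "run_indicator": 2 if children.get(n) else 1,  # flag 2 = not runnable yet
            "parent": parent_of[n],
            "children": set(children.get(n, [])),
        }
        for n in ranked
    ]


# Hierarchy of fig. 1 (names are illustrative).
FIG1 = {
    "top": ["module 1", "module 2"],
    "module 1": ["sub-module 1", "sub-module 2", "sub-module 3", "sub-module 4"],
}
queue = build_run_queue(FIG1, "top")
names = [node["name"] for node in queue]
```

For the fig. 1 hierarchy, `names` lists sub-modules 1-4 first, then the leaf node module 2, then the parent node module 1, and finally the top-level module.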

Fig. 5 shows a schematic diagram of a multi-way tree structure with a maximum node depth of 5, and a run queue construction process according to some embodiments of the present disclosure will be described below with the multi-way tree structure shown in fig. 5 as a specific example.

As shown in fig. 5, the maximum node depth of the multi-way tree structure is 5. First, the entire multi-way tree is traversed according to the depth-first principle, and node information of the traversed nodes is put into the run queue. That is, the multi-way tree structure is traversed from bottom to top: the nodes with node depth 5 are traversed first, then the nodes with node depth 4, and so on, until the root node with node depth 1 (shown as the processor core (Core) in fig. 5) is reached.

For the current node being traversed, it is first determined whether the current node is a leaf node. If so, the run indicator of the leaf node is set to flag 1, and the node name of the parent node connected thereto is stored in its parent node information. As an example, for the node "command execution sub-module 1" with node depth 5, its run indicator may be set to flag 1, and the name of the parent node directly connected thereto (i.e., "command execution module") may be stored in its parent node information. If the current node is a parent node, for example, the command execution module with node depth 4, the run indicator of the current node is set to flag 2, and all child nodes of the current node are put into its child node information, wherein the child nodes of the command execution module with node depth 4 include: command execution sub-module 1 and command execution sub-module 2.

Traversal is performed in the manner described above until the root node is reached. As an example, for the multi-way tree structure shown in fig. 5, the order of nodes within the run queue obtained by the traversal may be: { command execution sub-module 1 (operation core 1), command execution sub-module 2 (operation core 1), operation sub-module 1 (operation core 1), operation sub-module 2 (operation core 1), command execution sub-module 1 (operation core 2), command execution sub-module 2 (operation core 2), operation sub-module 1 (operation core 2), operation sub-module 2 (operation core 2) }. It is to be understood that "command execution sub-module 1 (operation core 1)" shown above represents the command execution sub-module 1 corresponding to the operation core 1 in fig. 5. Since the nodes listed above are leaf nodes with the maximum node depth of 5, these nodes are preferentially placed in the run queue and their run indicators may be set to flag 1, i.e., to indicate that the simulation functions of these nodes are in a runnable state.

Next, after the above leaf nodes with node depth 5, the nodes with node depth 4 may be placed in the run queue. For the leaf nodes with the same node depth 4, the order of arrangement may be: { command accepting module (operation core 1), operation control module (operation core 1), data cache (operation core 1), control module 1 (operation core 1), control module 2 (operation core 1), command accepting module (operation core 2), operation control module (operation core 2), data cache (operation core 2), control module 1 (operation core 2), control module 2 (operation core 2) }. Since the above nodes are leaf nodes with a node depth of 4, the run indicators of these nodes may be set to flag 1, i.e., to indicate that the simulation functions of these nodes are also in a runnable state.

After the leaf nodes with node depth 4, the parent nodes with node depth 4 may be arranged in the following order: { command execution module (operation core 1), operation unit (operation core 1), command execution module (operation core 2), operation unit (operation core 2) }. Since these nodes are parent nodes, their run indicators may be set to flag 2, indicating that the simulation functions of these nodes are in a non-runnable state. In other words, the simulation function of a parent node must be executed after the simulation functions of the child nodes it includes. Thus, in the process of constructing the run queue, the run state of the parent node may be marked as non-runnable to avoid confusing the execution hierarchy among the nodes; for example, executing the simulation function of the command execution module before that of the command execution sub-modules would not conform to the execution hierarchy order of the modules in the model framework.

According to the above described process, nodes in the multi-way tree structure are traversed until traversing to the root node with the node depth of 1.

As for the implementation of the above run indicator, for example, a run flag Flag may be set, where Flag=1 indicates that the simulation function is in a runnable state, i.e., serves as flag 1, and Flag=0 indicates that the simulation function is in a non-runnable state, i.e., serves as flag 2. For another example, two indicators may be set to indicate the running state of the simulation function of the module, e.g., parameters Ready and Pending, where Ready=1 indicates that the simulation function is in a runnable state, i.e., serves as flag 1, and Pending=1 indicates that the simulation function is in a non-runnable state, i.e., serves as flag 2. The above run indicator may also be implemented in other ways, without limitation. In addition, it is understood that the above node information may further include other indication information besides the node name, the run indicator, the parent node information, and the child node information, which may be set according to actual requirements.

According to some embodiments of the present disclosure, starting from a leaf node with the largest node depth, sequentially accessing in the multi-way tree in order of node depth from large to small to a root node with node depth 1 according to the depth-first principle, further comprising: for nodes in the multi-way tree structure, active nodes with data information are determined, and the active nodes and parent nodes and child nodes directly connected with the active nodes are marked as active states. In these embodiments, a busy mechanism is introduced, adding an Active attribute to the module. In one simulation clock period, only for the node marked as the active state in the current multi-way tree structure, storing the node information of the node in the running queue according to the node depth of the node, and not storing the node which is not marked as the active state in the current multi-way tree structure.

A node that has data information may be understood as a node on which data transmission is taking place. For example, fig. 1 shows a pipeline for transmitting data, where the pipeline is used to receive data put in through a module interface. When there is data content in the pipeline, the nodes upstream and downstream of it are in a busy state. Thus, these nodes can be set to an active state, and further, the parent and child nodes directly connected to an active node are also marked as active.

In constructing the run queue by traversing the multi-way tree structure, only node information of nodes marked as active is stored in the run queue. As an example, in traversing the nodes with node depth 5, assuming that only command execution sub-module 2 (operation core 1) and operation sub-module 2 (operation core 1) are marked as active, the nodes with node depth 5 put into the run queue may include { command execution sub-module 2 (operation core 1), operation sub-module 2 (operation core 1) }, while the other nodes with node depth 5 are not put into the run queue because they are not active, i.e., they have no data to process. In addition, since command execution sub-module 2 (operation core 1) and operation sub-module 2 (operation core 1) are in an active state, their parent nodes (the command execution module (operation core 1) and the operation unit (operation core 1), respectively) are also marked as active and placed in the run queue in order according to the above-described depth-first principle.
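A possible reading of the busy mechanism is sketched below: a node is marked active if it has data content, or if it is the direct parent or a direct child of such a node. The function name `mark_active`, the `parent_of` mapping, and the module names are assumptions for illustration, and whether activity should propagate beyond directly connected nodes is not settled by this sketch.

```python
def mark_active(parent_of: dict, has_data: set) -> set:
    """Return the nodes to mark active for the current simulation clock cycle.

    parent_of maps each node name to its parent's name (None for the root);
    has_data is the set of node names that currently hold data content.
    """
    active = set(has_data)
    for node, parent in parent_of.items():
        if node in has_data and parent is not None:
            active.add(parent)   # directly connected parent node
        if parent in has_data:
            active.add(node)     # directly connected child node
    return active


# Illustrative hierarchy based on fig. 1; only sub-module 2 holds data this cycle.
PARENT_OF = {
    "top": None,
    "module 1": "top",
    "module 2": "top",
    "sub-module 1": "module 1",
    "sub-module 2": "module 1",
}
active = mark_active(PARENT_OF, {"sub-module 2"})
```

With this input, only sub-module 2 and its parent module 1 are marked active, so only their node information would be considered when the run queue is built for the cycle.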

In embodiments applying the busy mechanism described above, the nodes within the run queue include only the active modules, i.e., the busy modules, among the plurality of modules of the integrated circuit device. This makes it possible to skip the verification of modules that are not busy when performing simulation verification based on the run queue, saving unnecessary verification time compared to verifying all modules serially as in the related art. For example, when the verification task for a certain simulation clock cycle is relatively simple, many modules may be in an idle state. In this case, in simulation verification based on a run queue constructed with the busy mechanism, the simulation functions of modules not marked as active are not run, so that the verification efficiency of the integrated circuit can be greatly improved, which is especially significant for a large-scale integrated circuit such as a GPGPU.

The process of building a run queue, which may be referred to as the post-processing stage of simulation verification, is described above in connection with fig. 2B-5. Based on the constructed run queue, design simulation methods according to some embodiments of the present disclosure may determine available processors and then run simulation functions of modules within the run queue using the determined available processors to implement simulation verification of the integrated circuit device. The process of performing simulation verification based on the run queue may be referred to as a run phase, and the implementation of the run phase will be described below.

According to some embodiments of the present disclosure, the number of available processors may be one or more, and in the case where the number of available processors is one, the simulation functions of the modules may be sequentially verified from the run queue. In the case where the number of available processors is multiple, running the simulation functions of the modules within the run queue using the available processors includes: multiple available processors are used to run simulation functions for multiple modules within the run queue in parallel. The parallel operation can be to run the simulation functions of different modules by a plurality of processors respectively so as to improve the verification efficiency.

Specifically, running the simulation functions of the plurality of modules within the run queue in parallel may include: determining the number of nodes in the run queue whose run indicator is flag 1; and running, in parallel using the available processors, the simulation functions of the nodes in the run queue whose run indicator is flag 1, based on the number of available processors and the number of nodes whose run indicator is flag 1. It is assumed that the number of available processors is denoted S, the number of nodes whose run indicator is flag 1 is denoted T, and S and T are integers equal to or greater than 0. In the case where T is determined to be less than or equal to S, i.e., the number of modules currently in a runnable state is not greater than the number of available processors, the available processors may be utilized to run in parallel the simulation functions of the T nodes in the run queue whose run indicator is flag 1. In the case where T is determined to be greater than S, i.e., the number of modules currently in a runnable state is greater than the number of available processors, the available processors may be utilized to run in parallel the simulation functions of the first S nodes in the run queue whose run indicator is flag 1.

In accordance with some embodiments of the present disclosure, the run queue may also be updated during the run phase after the available processors have run the simulation functions of the modules. Specifically, updating the run queue may include the following. Each node whose simulation function has finished running is denoted a first node, and the parent node of the first node, denoted a second node, is determined according to the parent node information of the first node. The node information of the second node is then updated, i.e., the node name of the first node is deleted from the child node information of the second node. Then, the node information of the first node is deleted from the run queue. The updating continues in this manner until the run queue is empty.
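The selection rule above amounts to taking the first min(S, T) runnable nodes from the run queue. The sketch below illustrates this under stated assumptions: threads from Python's standard `concurrent.futures` module stand in for the available processors, S is assumed to be at least 1, and the names `select_batch`, `run_batch`, and the sample queue are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def select_batch(run_queue: list, num_processors: int) -> list:
    """Pick the nodes to run now: all T runnable nodes if T <= S, else the first S."""
    runnable = [n for n in run_queue if n["run_indicator"] == 1]  # the T nodes
    return runnable[:num_processors]


def run_batch(batch: list, simulate, num_processors: int) -> list:
    """Run the simulation function of each selected node in parallel."""
    with ThreadPoolExecutor(max_workers=num_processors) as pool:
        return list(pool.map(simulate, batch))


sample_queue = [
    {"name": "sub-module 1", "run_indicator": 1},
    {"name": "sub-module 2", "run_indicator": 1},
    {"name": "module 2", "run_indicator": 1},
    {"name": "module 1", "run_indicator": 2},  # parent node, not runnable yet
]
first_two = select_batch(sample_queue, 2)    # T = 3 > S = 2: first S runnable nodes
all_three = select_batch(sample_queue, 8)    # T = 3 <= S = 8: all T runnable nodes
results = run_batch(first_two, lambda node: node["name"], 2)
```

Because `pool.map` preserves input order, `results` lists the outputs in run-queue order even though the calls may finish in a different order.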

For example, taking the first node as the node 1 as an example, after the simulation function of the module corresponding to the node 1 is executed by using the available processor, the parent node (i.e., the second node) of the node 1 may be determined according to the parent node information of the node 1 in the execution queue, and the node information of the parent node may be updated, for example, the node name of the node 1 in the child node information of the parent node is deleted. Then, node information of the node 1 may be deleted in the run queue, indicating that the simulation function of the node 1 has completed the simulation verification process.

According to some embodiments of the present disclosure, updating the node information of the parent node of the first node further includes: determining whether the child node information of the second node is empty; and in the case where it is determined that the child node information of the second node is empty, setting the run indicator of the second node to flag 1. For example, after the node name of node 1 is deleted from the child node information of the parent node, it may be determined whether the child node information of the parent node is empty. If it is empty, all child nodes of the parent node have completed simulation verification, and in this case the run indicator of the parent node may be set to flag 1, indicating that the simulation function of the parent node is now in a runnable state. This is because once all child nodes of the parent node have finished running, the parent node is effectively exposed as a leaf node and becomes runnable.

In addition, after the above node information updating process is completed, the number of available processors may also be updated. This is because the processor that ran node 1 has completed the verification task for node 1 and can again serve as an available processor for the verification of other modules.
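The update steps described above can be sketched as a single function applied to each node whose simulation function has finished. The function name `complete_node` and the dictionary layout of the queue entries are assumptions for illustration.

```python
def complete_node(run_queue: list, finished: str) -> None:
    """Update the run queue after the simulation function of `finished` has run."""
    by_name = {node["name"]: node for node in run_queue}
    first = by_name[finished]                 # the "first node"
    second = by_name.get(first["parent"])     # its parent, the "second node"
    if second is not None:
        # Delete the first node's name from the parent's child node information.
        second["children"].discard(finished)
        if not second["children"]:
            second["run_indicator"] = 1       # all children done: parent becomes runnable
    run_queue.remove(first)                   # delete the first node's information


run_queue = [
    {"name": "sub-module 1", "run_indicator": 1, "parent": "module 1", "children": set()},
    {"name": "sub-module 2", "run_indicator": 1, "parent": "module 1", "children": set()},
    {"name": "module 1", "run_indicator": 2, "parent": None,
     "children": {"sub-module 1", "sub-module 2"}},
]
complete_node(run_queue, "sub-module 1")
indicator_after_one = run_queue[-1]["run_indicator"]   # module 1 still waits on sub-module 2
complete_node(run_queue, "sub-module 2")
indicator_after_two = run_queue[-1]["run_indicator"]   # module 1 is now runnable
```

In this example, module 1 keeps flag 2 after the first call and switches to flag 1 only once both of its child nodes have been removed.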

The above parallel validation process based on the run queue is looped until all nodes within the run queue have completed validation, i.e., the run queue is empty, indicating that the validation process for the current simulation clock cycle has been completed. After completion of the run phase of the current simulation clock cycle, the post-processing phase of the next simulation clock cycle may be re-entered for use in building a run queue for the next simulation clock cycle.

In addition, in the run phase, a monitoring program may further be provided to check whether a timeout occurs during the verification of a module. If a timeout occurs, the simulation function of the module may be faulty, and in this case the monitoring program can force an exit from the run phase and report an error.
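One way such a monitor could be approximated is with a per-call timeout, sketched below using Python's standard `concurrent.futures` module. The function names `run_with_monitor` and `stuck_module` and the timeout values are assumptions; the disclosure does not specify how the monitoring program is implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_with_monitor(simulate, module_name: str, timeout_s: float):
    """Run one simulation function; raise if it exceeds timeout_s seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(simulate, module_name)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Jump out of the run phase and report the error.
        raise RuntimeError(f"simulation function of {module_name!r} timed out") from None
    finally:
        pool.shutdown(wait=False)  # do not block on a stuck worker


def stuck_module(name: str) -> str:
    time.sleep(0.3)  # stands in for a simulation function that hangs
    return name
```

Note that a Python thread cannot be forcibly terminated, so this sketch only detects and reports the timeout; a real monitor would likely need process-level control to stop a hung simulation function.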

FIG. 6 illustrates a flow chart of performing a design simulation method in accordance with an embodiment of the present disclosure, wherein a verification process including a run phase and a post-processing phase is shown for a certain simulation clock cycle.

As shown in fig. 6, the verification process for each simulation clock cycle can be divided into a run phase 601 and a post-processing phase 602. First, the simulation verification functions of all modules may be called with, for example, the function cycle_n.run. Next, it may be determined whether a trigger (Trigger) is 1; the trigger may be activated by a test stimulus, or by another external module such as a memory module. If Trigger is determined to be 1, a multi-way tree structure (e.g., as shown in fig. 3 or fig. 5) may be constructed and the active nodes marked, i.e., the nodes that have data content in the current simulation clock cycle. Next, the multi-way tree may be traversed by node depth, and the run queue may be built one node depth at a time in the manner described above until the root node is reached.

If it is determined that Trigger is not 1, the run phase 601 may be entered. It may first be determined whether the run queue is empty. If the run queue is empty, the entire model is in an idle state and no simulation function needs to be run, i.e., the verification process of the current simulation clock cycle ends. If the run queue is not empty, the number of nodes in the run queue whose run indicator is flag 1 (denoted T) is determined, and the number of currently available processors (CPUs) (denoted S) is checked, so that simulation verification is performed in parallel based on the determined S and T. For example, if T is less than or equal to S, the simulation functions of all the nodes in the runnable state may be run in parallel, and if T is greater than S, the simulation functions of the first S nodes in the run queue whose run indicator is flag 1 may be run in parallel.

After the simulation function of a node completes, the node information in the run queue can be updated. For example, the node information of the parent node (i.e., the second node) of the first node is updated, and the node information of the first node is deleted from the run queue. Further, if the child node information of the second node is empty, the run indicator of the second node is set to flag 1, which means that all child nodes of the second node have finished running and the second node, as a parent node, is updated to a runnable node. The process then returns to the step of determining whether the run queue is empty; if it is not empty, the parallel verification process continues as described above until the run queue is empty, at which point the verification process of the current simulation clock cycle ends.
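The run phase loop of fig. 6 can be sketched end to end as follows. This is a simplified illustration under stated assumptions: threads stand in for the available processors, the number of processors is fixed and at least 1 for the whole phase, and the function name `run_phase`, the queue layout, and the module names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_phase(run_queue: list, simulate, num_processors: int) -> list:
    """Drain the run queue; return the node names of each parallel batch, in order."""
    batches = []
    by_name = {node["name"]: node for node in run_queue}
    with ThreadPoolExecutor(max_workers=num_processors) as pool:
        while run_queue:                      # an empty queue ends the clock cycle
            runnable = [n for n in run_queue if n["run_indicator"] == 1]
            if not runnable:
                raise RuntimeError("no runnable node left in a non-empty run queue")
            batch = runnable[:num_processors]  # all T nodes if T <= S, else first S
            list(pool.map(simulate, (n["name"] for n in batch)))
            for node in batch:                 # update the queue after each node finishes
                parent = by_name.get(node["parent"])
                if parent is not None:
                    parent["children"].discard(node["name"])
                    if not parent["children"]:
                        parent["run_indicator"] = 1
                run_queue.remove(node)
            batches.append([n["name"] for n in batch])
    return batches


def leaf(name, parent):
    return {"name": name, "run_indicator": 1, "parent": parent, "children": set()}


# Run queue for the fig. 1 hierarchy (names are illustrative).
subs = [f"sub-module {i}" for i in range(1, 5)]
fig1_queue = [leaf(s, "module 1") for s in subs] + [
    leaf("module 2", "top"),
    {"name": "module 1", "run_indicator": 2, "parent": "top", "children": set(subs)},
    {"name": "top", "run_indicator": 2, "parent": None,
     "children": {"module 1", "module 2"}},
]
batches = run_phase(fig1_queue, lambda name: name, num_processors=2)
```

With two processors, the sub-modules run in two batches of two, module 2 and the newly runnable module 1 run together in the third batch, and the top-level module runs last.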

With the design simulation method for an integrated circuit device provided by the embodiments of the present disclosure, the simulation verification process for the integrated circuit device can be performed based on the run queue constructed from at least one module selected from a plurality of modules of the integrated circuit device, which is advantageous in improving the model simulation efficiency of the integrated circuit and shortening the design cycle of the integrated circuit. In addition, the simulation verification process can be performed in parallel based on the constructed run queue, which improves the verification speed; the improvement is especially significant when the degree of parallelism is very high. For example, when the multi-way tree structure built for the model framework of a GPGPU includes 8734 nodes and has a maximum node depth of 14 levels, the simulation verification rate can be improved by tens of times compared to the verification process of the related art in which all modules are executed serially. In addition, in some embodiments according to the present disclosure, a busy mechanism is introduced in the process of constructing the run queue: only nodes having data to process are put into the run queue, and modules in an idle state are not, so that simulation verification time for idle modules is avoided. In this case, the simulation verification rate based on the run queue can be improved by several hundred times or more. The improved verification rate of the model reduces the verification time cost of the integrated circuit and shortens its design cycle, meeting the market demand for continuously shortening the design and production cycle of integrated circuits.

FIG. 7 shows a schematic block diagram of a design simulation apparatus in accordance with an embodiment of the present disclosure. As shown in fig. 7, the design simulation apparatus 1000 may include a run queue acquisition unit 1010, a calculation unit 1020, and a simulation verification unit 1030.

According to some embodiments of the present disclosure, the run queue acquisition unit 1010 may be configured to: obtain a run queue constructed based on a plurality of modules of an integrated circuit device. For example, the integrated circuit device may be a GPGPU, and the plurality of modules of the integrated circuit device may be modules included in a model framework of the integrated circuit device, e.g., the model framework may be as shown in fig. 1. It will be appreciated that the model framework shown in fig. 1 is merely illustrative, that the hierarchy may have more than 3 levels, e.g., several levels or tens of levels, for different integrated circuit structures, and that the modules included in the model framework are not limited to the modules shown in fig. 1. According to some embodiments of the present disclosure, the run queue may include at least one module selected from a plurality of modules. For example, the run queue may include a portion of the modules in the model framework, or may include all of the modules in the model framework. Further, according to some embodiments of the present disclosure, the at least one module in the run queue is ordered based on an execution hierarchy of the at least one module. This execution hierarchy may be understood as the execution order relationship between the modules shown in fig. 1. For example, the execution of module 1 should follow the execution of the sub-modules 1-4 it includes, i.e., in execution logic, modules at hierarchy 3 should be processed before modules at hierarchy 2.

According to some embodiments of the present disclosure, the computing unit 1020 may be configured to determine the available processors. The available processors may be processors that can be used to perform simulation verification. There may be a plurality of available processors; further, the processors may be distributed processors or processors in a cloud computing environment, without limitation.

According to some embodiments of the present disclosure, the simulation verification unit 1030 may be configured to run, using the available processors, the simulation functions of the modules within the run queue for simulation verification of the integrated circuit device.

With the design simulation apparatus 1000 according to some embodiments of the present disclosure, the related-art simulation verification process in which each module is executed serially can be avoided. Instead, the simulation verification process for the integrated circuit device is performed based on a run queue constructed from at least one module selected from the plurality of modules of the integrated circuit device, which helps improve the model simulation efficiency of the integrated circuit and shorten the integrated circuit design cycle. For large integrated circuit designs such as a GPGPU in particular, the simulation verification process based on the run queue can accelerate model simulation to a greater extent, thereby saving time cost in circuit design.

According to some embodiments of the present disclosure, as shown in fig. 7, the design simulation apparatus 1000 may further include a run queue construction unit 1040. The run queue construction unit 1040 may be configured to acquire a hierarchical structure among the plurality of modules of the integrated circuit device. The hierarchical structure among the plurality of modules may refer to a model framework generated for the integrated circuit to be designed, the model framework being composed of a plurality of modules having a hierarchical structure, for example, as shown in fig. 1.

Next, the run queue construction unit 1040 may be configured to generate, based on the hierarchical structure, a multi-way tree structure corresponding to the at least one module. In some embodiments according to the present disclosure, the at least one module may be all or a portion of the modules in the model framework (e.g., only the modules for which verification is desired), without limitation. The multi-way tree structure characterizes different modules by different nodes, the module characterized by each child node is included in the module characterized by the parent node corresponding to that child node, and the execution hierarchy of the at least one module is characterized by the connection order between child nodes and parent nodes having node depths.

Next, the run queue construction unit 1040 may be configured to traverse the nodes in the multi-way tree structure based on node depth to build the run queue. According to some embodiments of the present disclosure, the nodes are arranged in the run queue according to the connection order between the nodes in the multi-way tree structure, and the run queue includes node information associated with the nodes.

According to some embodiments of the present disclosure, the node information may include a node name, a run indicator, parent node information, and child node information. The run indicator indicates whether the simulation function of the node is runnable, the parent node information includes the node name of the parent node directly connected to the node, and the child node information includes the node names of the child nodes directly connected to the node. The description of the node information given above with reference to fig. 4 applies here and is not repeated.
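As an illustration only, the node information described above could be held in a small record type. The following is a minimal Python sketch; the class name `NodeInfo`, the field names, the constants `RUNNABLE` and `NOT_RUNNABLE`, and the module names are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical encodings of the run indicator ("flag 1" / "flag 2").
RUNNABLE = 1      # the simulation function of the node may run
NOT_RUNNABLE = 2  # the node is still waiting for its child nodes


@dataclass
class NodeInfo:
    """One run-queue entry: node name, run indicator, parent and child info."""
    name: str
    run_indicator: int
    parent: Optional[str] = None                        # name of the directly connected parent
    children: List[str] = field(default_factory=list)   # names of directly connected children


# A leaf node has no child node information and is runnable at once.
leaf = NodeInfo(name="submodule_1", run_indicator=RUNNABLE, parent="module_1")

# A parent node lists its children and is not runnable until they finish.
parent = NodeInfo(
    name="module_1",
    run_indicator=NOT_RUNNABLE,
    parent="top",
    children=["submodule_1", "submodule_2"],
)
```

Storing names rather than object references mirrors the text, in which parent and child node information consist of node names.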

According to some embodiments of the present disclosure, the run queue construction unit 1040 is configured to construct the run queue by sequentially accessing the nodes of the multi-way tree in descending order of node depth, according to a depth-first principle, starting from the leaf node with the largest node depth and ending at the root node with a node depth of 1. A leaf node is a node without child nodes, and the root node is a node without a parent node. For example, in the multi-way tree structure shown in fig. 3, sub-modules 1-4 are leaf nodes and the top-level module is the root node.

According to some embodiments of the present disclosure, the above traversal performed by the run queue construction unit 1040 may include sequentially storing the node information of the nodes in the run queue according to the node depth of the nodes. For example, the node information of a node with a larger node depth is stored in the run queue first, because the execution hierarchy of a node with a larger node depth precedes that of a node with a smaller node depth. Further, according to some embodiments of the present disclosure, for nodes having the same node depth, it is first determined whether each node is a leaf node or a parent node; the node information of the leaf nodes is stored in the run queue before the node information of the parent nodes, the run indicator of each leaf node is set to flag 1, and the run indicator of each parent node is set to flag 2, where flag 1 indicates that the simulation function of the node is runnable and flag 2 indicates that the simulation function of the node is not runnable.

According to some embodiments of the present disclosure, let the maximum node depth be N; for example, in the multi-way tree structure shown in fig. 3 the maximum node depth is 3, i.e., N=3. In this case, the run queue construction unit 1040 sequentially storing the node information of the nodes in the run queue according to the node depth of the nodes includes: determining the nodes with node depth N, and storing the node information of the nodes with node depth N in the run queue, wherein, for each node with node depth N (here, 3), the run indicator is set to flag 1 and the node name of the parent node directly connected to the node is used as the parent node information.

According to some embodiments of the present disclosure, for a node depth M that is less than N, the run queue construction unit 1040 is configured to: determine the leaf nodes with node depth M, and store the node information of the leaf nodes with node depth M in the run queue, wherein the run indicator of each leaf node with node depth M is set to flag 1, and the node name of the parent node directly connected to the leaf node is used as the parent node information of the leaf node; and determine the parent nodes with node depth M, and store the node information of the parent nodes with node depth M in the run queue, wherein the run indicator of each parent node with node depth M is set to flag 2, the node name of the parent node directly connected to that parent node is used as its parent node information, and the node names of the child nodes directly connected to that parent node are used as its child node information. Here, N is an integer greater than 1, and M is an integer less than N and greater than 1.
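The ordering described above (deeper nodes first and, at equal depth, leaf nodes before parent nodes) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the disclosed implementation: the function name `build_run_queue`, the dictionary-based tree, the entry keys, and the example module names are all hypothetical, and node depths are computed here with a simple level-by-level walk from the root.

```python
from collections import deque


def build_run_queue(children, root):
    """Build a run queue from a tree given as {node name: [child names]}."""
    depth = {root: 1}          # the root node has node depth 1
    parent = {root: None}
    order = [root]
    pending = deque([root])
    while pending:
        node = pending.popleft()
        for child in children.get(node, []):
            depth[child] = depth[node] + 1
            parent[child] = node
            order.append(child)
            pending.append(child)

    # Larger node depth first; at equal depth, leaves (False) before parents (True).
    ranked = sorted(order, key=lambda n: (-depth[n], bool(children.get(n))))

    queue = []
    for name in ranked:
        kids = list(children.get(name, []))
        queue.append({
            "name": name,
            "flag": 2 if kids else 1,   # flag 1: runnable, flag 2: not runnable
            "parent": parent[name],
            "children": kids,
        })
    return queue


# Hypothetical three-level hierarchy, not the one shown in the figures.
tree = {
    "top": ["module_1", "module_2"],
    "module_1": ["sub_1", "sub_2"],
    "module_2": [],
}
queue = build_run_queue(tree, "top")
print([(entry["name"], entry["flag"]) for entry in queue])
# [('sub_1', 1), ('sub_2', 1), ('module_2', 1), ('module_1', 2), ('top', 2)]
```

In this example `module_2` is a leaf at node depth 2, so it is stored before the parent node `module_1` at the same depth and is marked runnable.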

The run indicator described above may be implemented, for example, as a single run flag Flag, where Flag=1 indicates that the simulation function is in a runnable state and Flag=0 indicates that the simulation function is in a non-runnable state. As another example, two indicators may be used to indicate the running state of the simulation function of a module, e.g., parameters Ready and Pending, where Ready=1 indicates that the simulation function is in a runnable state and Pending=1 indicates that the simulation function is in a non-runnable state. The run indicator may also be implemented in other ways, without limitation. In addition, it is understood that the node information may include other indication information besides the node name, the run indicator, the parent node information, and the child node information, which may be set according to actual requirements.

According to some embodiments of the present disclosure, the run queue construction unit 1040 may be further configured to determine, among the nodes in the multi-way tree structure, the active nodes that have data information, and to mark each active node, together with the parent node and child nodes directly connected to it, as being in an active state. In these embodiments, a busy mechanism is introduced by adding an Active attribute to the module. In one simulation clock cycle, the run queue construction unit 1040 may be configured to store node information in the run queue, according to node depth, only for the nodes marked as active in the current multi-way tree structure, and not to store the nodes that are not marked as active in the current multi-way tree structure.
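The busy mechanism can be pictured as a marking pass followed by a filter over the queue entries. The sketch below is a hedged Python illustration: the names `mark_active` and `filter_active` and the entry layout are hypothetical. It also makes one assumption the text does not state, namely that the child node information of a kept parent is trimmed to its active children, so that the parent can still become runnable within the clock cycle.

```python
def mark_active(children, parent, has_data):
    """Return the nodes with data plus their direct parents and children."""
    active = set()
    for node in has_data:
        active.add(node)
        if parent.get(node) is not None:
            active.add(parent[node])
        active.update(children.get(node, []))
    return active


def filter_active(run_queue, active):
    """Keep only active entries (assumption: child lists are trimmed too)."""
    kept = []
    for entry in run_queue:
        if entry["name"] not in active:
            continue
        kids = [c for c in entry["children"] if c in active]
        kept.append({**entry, "children": kids, "flag": 2 if kids else 1})
    return kept


# Hypothetical hierarchy and a full run queue in depth order.
children = {"top": ["module_1", "module_2"], "module_1": ["sub_1", "sub_2"]}
parent = {"top": None, "module_1": "top", "module_2": "top",
          "sub_1": "module_1", "sub_2": "module_1"}
full_queue = [
    {"name": "sub_1", "flag": 1, "parent": "module_1", "children": []},
    {"name": "sub_2", "flag": 1, "parent": "module_1", "children": []},
    {"name": "module_2", "flag": 1, "parent": "top", "children": []},
    {"name": "module_1", "flag": 2, "parent": "top", "children": ["sub_1", "sub_2"]},
    {"name": "top", "flag": 2, "parent": None, "children": ["module_1", "module_2"]},
]

# Only sub_1 holds data in this clock cycle.
active = mark_active(children, parent, has_data={"sub_1"})
cycle_queue = filter_active(full_queue, active)
print([entry["name"] for entry in cycle_queue])  # ['sub_1', 'module_1']
```

Here only two of the five nodes enter the run queue for the cycle, which is the source of the time saving attributed to the busy mechanism.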

According to some embodiments of the present disclosure, the number of available processors may be one or more. In the case where there are a plurality of available processors, the simulation verification unit 1030 may be configured to run the simulation functions of a plurality of modules within the run queue in parallel using the plurality of available processors. Running in parallel may mean that the plurality of processors each run the simulation function of a different module, so as to improve verification efficiency.
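As a rough illustration of running several simulation functions at once, the following Python sketch dispatches module names to a worker pool from the standard library. The names `run_parallel` and `fake_simulate` are hypothetical. A thread pool is used only to keep the example short; in CPython, CPU-bound simulation functions would more plausibly use a process pool or native code to obtain real parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(simulate, module_names, num_processors):
    """Run simulate(name) for each module, at most num_processors at a time."""
    with ThreadPoolExecutor(max_workers=num_processors) as pool:
        # map() returns results in the order of module_names.
        return list(pool.map(simulate, module_names))


def fake_simulate(name):
    # Stand-in for a module's simulation function (hypothetical).
    return f"{name}: done"


results = run_parallel(fake_simulate, ["sub_1", "sub_2", "module_2"], num_processors=2)
print(results)  # ['sub_1: done', 'sub_2: done', 'module_2: done']
```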

According to some embodiments of the present disclosure, the simulation verification unit 1030 may be configured to: determine the number of nodes in the run queue whose run indicator is flag 1; and, based on the number of available processors and the number of nodes whose run indicator is flag 1, use the available processors to run in parallel the simulation functions of the nodes whose run indicator is flag 1 in the run queue.

According to some embodiments of the present disclosure, the number of available processors is denoted as S and the number of nodes whose run indicator is flag 1 is denoted as T, where S and T are integers greater than or equal to 0. The simulation verification unit 1030 may be configured to: in the case where T is determined to be less than or equal to S, use the available processors to run in parallel the simulation functions of the T nodes whose run indicator is flag 1 in the run queue; or, in the case where T is determined to be greater than S, use the available processors to run in parallel the simulation functions of the first S nodes whose run indicator is flag 1 in the run queue.
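The two cases on S and T reduce to taking at most S runnable entries from the front of the run queue. A minimal Python sketch follows; the function name `select_batch` and the entry layout are hypothetical.

```python
def select_batch(run_queue, num_processors):
    """Pick the nodes to run now: all T runnable nodes if T <= S, else the first S."""
    runnable = [entry for entry in run_queue if entry["flag"] == 1]
    t, s = len(runnable), num_processors
    if t <= s:
        return runnable       # every runnable node gets a processor
    return runnable[:s]       # only the first S runnable nodes run this round


run_queue = [
    {"name": "sub_1", "flag": 1},
    {"name": "sub_2", "flag": 1},
    {"name": "module_2", "flag": 1},
    {"name": "module_1", "flag": 2},
    {"name": "top", "flag": 2},
]
print([e["name"] for e in select_batch(run_queue, 2)])  # ['sub_1', 'sub_2']
print([e["name"] for e in select_batch(run_queue, 8)])  # ['sub_1', 'sub_2', 'module_2']
```

Because the queue is already ordered by node depth, taking the first S runnable entries favors the deepest pending work.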

According to some embodiments of the present disclosure, the simulation verification unit 1030 may be further configured to update the run queue during running.

According to some embodiments of the present disclosure, the simulation verification unit 1030 may be configured to: denote each node whose simulation function has finished running as a first node, determine the parent node of the first node according to the parent node information of the first node, and update the node information of the parent node of the first node; delete the node information of the first node from the run queue; and continue updating until the run queue is empty.

According to some embodiments of the present disclosure, the parent node of the first node is denoted as a second node. The simulation verification unit 1030 may be configured to: delete the node name of the first node from the child node information of the second node; determine whether the child node information of the second node is empty; and, in the case where it is determined that the child node information of the second node is empty, set the run indicator of the second node to flag 1.
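The update rule above can be sketched as a loop that runs the currently runnable nodes, removes each finished node from its parent's child node information, and promotes the parent once that information is empty. The Python below is a serial illustration under stated assumptions: the names `complete_node` and `drain` and the entry layout are hypothetical, and the parallel dispatch is replaced by a plain loop so that the bookkeeping is easy to follow.

```python
def complete_node(run_queue, finished_name):
    """Apply the update for one finished node (the 'first node')."""
    by_name = {entry["name"]: entry for entry in run_queue}
    first = by_name[finished_name]
    second = by_name.get(first["parent"])     # the 'second node', if still queued
    if second is not None:
        second["children"].remove(finished_name)
        if not second["children"]:            # child node information is now empty
            second["flag"] = 1                # the parent becomes runnable
    run_queue.remove(first)


def drain(run_queue, simulate):
    """Run until the queue is empty; return the names in execution order."""
    executed = []
    while run_queue:
        batch = [entry["name"] for entry in run_queue if entry["flag"] == 1]
        if not batch:
            raise RuntimeError("no runnable node left in a non-empty run queue")
        for name in batch:
            simulate(name)
            executed.append(name)
            complete_node(run_queue, name)
    return executed


run_queue = [
    {"name": "sub_1", "flag": 1, "parent": "module_1", "children": []},
    {"name": "sub_2", "flag": 1, "parent": "module_1", "children": []},
    {"name": "module_2", "flag": 1, "parent": "top", "children": []},
    {"name": "module_1", "flag": 2, "parent": "top", "children": ["sub_1", "sub_2"]},
    {"name": "top", "flag": 2, "parent": None, "children": ["module_1", "module_2"]},
]
order = drain(run_queue, simulate=lambda name: None)
print(order)  # ['sub_1', 'sub_2', 'module_2', 'module_1', 'top']
```

Each parent runs only after all of its children have finished, which preserves the execution hierarchy without rescanning the tree.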

According to some embodiments of the present disclosure, in the run phase, the design simulation apparatus 1000 may further be provided with a monitor program for checking whether a timeout occurs during the verification process for a module. A timeout indicates that there may be an error in the simulation function of the module, in which case the monitor program may cause execution to exit the run phase and report the error.
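One possible shape for such a monitor is a wrapper that waits for a simulation function with a time limit and raises an error when the limit is exceeded. The Python sketch below is an assumption-laden illustration: the name `run_with_watchdog` and the timeout value are hypothetical, and the disclosure does not specify how the monitor program is implemented. A worker thread cannot be forcibly stopped in Python, so this sketch only detects and reports the timeout.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_with_watchdog(simulate, name, timeout_s):
    """Run simulate(name); raise RuntimeError if it exceeds timeout_s seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(simulate, name)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Leave the run phase and report the suspected faulty module.
        raise RuntimeError(f"simulation function of {name!r} timed out") from None
    finally:
        # Do not block on a simulation function that may still be running.
        pool.shutdown(wait=False)


print(run_with_watchdog(lambda n: n.upper(), "alu", timeout_s=1.0))  # ALU
```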

For the specific implementation of the steps performed by the design simulation apparatus 1000, reference may be made to the design simulation method according to the present disclosure described above in connection with the drawings; the description is not repeated here.

According to yet another aspect of the present disclosure, there is also provided a design simulation device for an integrated circuit device. FIG. 8 shows a schematic block diagram of a design simulation device in accordance with an embodiment of the present disclosure.

As shown in fig. 8, design simulation device 2000 may include a processor 2010 and a memory 2020. In accordance with an embodiment of the present disclosure, memory 2020 has stored therein computer readable code which, when executed by processor 2010, can perform a design simulation method as described above.

Processor 2010 may perform various actions and processes in accordance with programs stored in memory 2020. In particular, processor 2010 may be an integrated circuit chip having signal processing capabilities. The processor may be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present disclosure. The general purpose processor may be a microprocessor or any conventional processor, and may be of an X86 architecture, an ARM architecture, or the like.

The memory 2020 stores computer executable instruction code that, when executed by the processor 2010, implements a design simulation method in accordance with an embodiment of the present disclosure. The memory 2020 may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct Rambus random access memory (DR RAM). It should be noted that the memory described herein is intended to include, without being limited to, these and any other suitable types of memory.

Methods or apparatus according to embodiments of the present disclosure may also be implemented by means of the architecture of the computing device 3000 shown in fig. 9. As shown in fig. 9, the computing device 3000 may include a bus 3010, one or more CPUs 3020, a read-only memory (ROM) 3030, a random access memory (RAM) 3040, a communication port 3050 connected to a network, an input/output component 3060, a hard disk 3070, and the like. A storage device in the computing device 3000, such as the ROM 3030 or the hard disk 3070, may store various data or files used in the processing and/or communication of the design simulation method provided by the present disclosure, as well as program instructions executed by the CPU. The computing device 3000 may also include a user interface 3080. Of course, the architecture shown in fig. 9 is merely exemplary, and one or more components of the computing device shown in fig. 9 may be omitted as needed when implementing different devices. By way of example, the computing device 3000 may be implemented as a computer on which a design simulation application for integrated circuits is installed, without limitation.

For example, an application for integrated circuit design simulation may be installed in the computing device 3000, and the computing device 3000 may be a multi-core device including a plurality of CPUs 3020. The computing device can thus construct or acquire a run queue constructed based on a plurality of modules of the integrated circuit device, and run the simulation functions of the modules in the run queue in parallel by means of the plurality of CPUs, so as to perform the simulation verification process of the integrated circuit device in parallel, thereby improving the simulation verification rate of the integrated circuit and shortening the circuit design cycle.

According to yet another aspect of the present disclosure, there is also provided a non-transitory computer-readable storage medium. Fig. 10 shows a schematic diagram of a non-transitory computer-readable storage medium according to an embodiment of the disclosure.

As shown in fig. 10, the computer-readable storage medium 4020 has instructions stored thereon, such as computer-readable instructions 4010. When the computer-readable instructions 4010 are executed by a processor, the design simulation method described with reference to the above figures may be performed. The computer-readable storage medium includes, but is not limited to, volatile memory and/or nonvolatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The nonvolatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. For example, the computer-readable storage medium 4020 may be connected to a computing device such as a computer, and the design simulation method described above may then be performed when the computing device runs the computer-readable instructions 4010 stored on the computer-readable storage medium 4020.

According to yet another aspect of the present disclosure, there is also provided a computer program product or a computer program comprising computer readable instructions stored in a computer readable storage medium. The processor of the computer device may read the computer-readable instructions from the computer-readable storage medium, and execute the computer-readable instructions, so that the computer device performs the design simulation method described in the above embodiments.

With the design simulation method, apparatus, device, and readable storage medium for an integrated circuit device provided by the embodiments of the present disclosure, the simulation verification process for the integrated circuit device can be performed based on a run queue constructed from at least one module selected from a plurality of modules of the integrated circuit device, which helps improve the model simulation efficiency of the integrated circuit and shorten its design cycle. In addition, the simulation verification process can be performed in parallel based on the constructed run queue, which increases the verification speed; the gain is especially pronounced where the degree of parallelism is very high. Furthermore, in some embodiments according to the present disclosure, a busy mechanism is introduced when constructing the run queue: only nodes that have data to process are put into the run queue, and modules in an idle state are not, so that no simulation verification time is spent on idle modules. In this case, the simulation verification rate based on the run queue can be improved by a factor of several hundred or more. Improving the verification rate of the model reduces the verification time cost of the integrated circuit and shortens its design cycle, meeting the market demand for ever shorter integrated circuit design and production cycles.

Those skilled in the art will appreciate that various modifications and improvements can be made to the disclosure. For example, the various devices or components described above may be implemented in hardware, or may be implemented in software, firmware, or a combination of some or all of the three.

Further, while the present disclosure makes various references to certain elements in a system according to embodiments of the present disclosure, any number of different elements may be used and run on a client and/or server. The units are merely illustrative and different aspects of the systems and methods may use different units.

Flowcharts are used in this disclosure to describe the steps of methods according to embodiments of the present disclosure. It should be understood that the preceding or following steps are not necessarily performed in the exact order shown. Rather, the various steps may be processed in reverse order or simultaneously. Also, other operations may be added to these processes.

Those of ordinary skill in the art will appreciate that all or a portion of the steps of the methods described above may be implemented by a computer program to instruct related hardware, and the program may be stored in a computer readable storage medium, such as a read only memory, a magnetic disk, or an optical disk. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiment may be implemented in the form of hardware, or may be implemented in the form of a software functional module. The present disclosure is not limited to any specific form of combination of hardware and software.

Unless defined otherwise, all terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

The foregoing is illustrative of the present disclosure and is not to be construed as limiting thereof. Although a few exemplary embodiments of this disclosure have been described, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of this disclosure. Accordingly, all such modifications are intended to be included within the scope of this disclosure as defined in the claims. It is to be understood that the foregoing is illustrative of the present disclosure and is not to be construed as limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments, as well as other embodiments, are intended to be included within the scope of the appended claims. The disclosure is defined by the claims and their equivalents.

Claims

1. A design simulation method for an integrated circuit device, comprising:

obtaining a run queue constructed based on a plurality of modules of the integrated circuit device, wherein the run queue includes at least one module selected from the plurality of modules, and in the run queue, the at least one module is sequentially arranged based on an execution hierarchy of the at least one module;

determining available processors; and

running, using the available processor, a simulation function of a module within the run queue for simulation verification of the integrated circuit device, wherein the run queue is constructed based on a plurality of modules of the integrated circuit device according to the steps of:

acquiring a hierarchical structure among a plurality of modules of the integrated circuit device;

generating a multi-way tree structure corresponding to the at least one module based on the hierarchy, wherein the multi-way tree structure characterizes different modules by different nodes, each child node characterized module is included in a parent node characterized module corresponding to the child node, and the execution hierarchy of the at least one module is characterized by a connection order between child nodes and parent nodes with node depths; and

traversing nodes in the multi-way tree structure based on node depth to build the run queue,

wherein nodes are arranged in the run queue in the connection order and node information associated with the nodes is included in the run queue, the node information including: a node name, a run indicator, parent node information, and child node information, wherein the run indicator is used for indicating whether a simulation function of the node is runnable, the parent node information comprises the node name of a parent node directly connected with the node, and the child node information comprises the node name of a child node directly connected with the node.

2. The method of claim 1, wherein traversing nodes in the multi-way tree structure based on node depth to build the run queue comprises:

sequentially accessing nodes in the multi-way tree in order of node depth from large to small, according to a depth-first principle, starting from a leaf node having a maximum node depth and ending at a root node having a node depth of 1, so as to construct the run queue, wherein the leaf node represents a node having no child node, and the root node represents a node having no parent node.

3. The method of claim 2, wherein said sequentially accessing in the multi-way tree in order of node depth from large to small, starting from a leaf node having a largest node depth to a root node having a node depth of 1 according to a depth-first principle, comprises:

sequentially storing node information of nodes in the run queue according to node depths of the nodes, wherein, for nodes having the same node depth, it is determined whether each node is a leaf node or a parent node, the node information of the leaf node is stored before the node information of the parent node in the run queue, a run indicator of the leaf node is set to flag 1, and a run indicator of the parent node is set to flag 2,

wherein the run indicator being flag 1 indicates that the simulation function of the node is runnable, and the run indicator being flag 2 indicates that the simulation function of the node is not runnable.

4. The method of claim 3, wherein said sequentially accessing in the multi-way tree in order of node depth from large to small, starting from a leaf node having a largest node depth to a root node having a node depth of 1 according to a depth-first principle, further comprises:

for nodes in the multi-way tree structure, determining active nodes having data information, and marking the active nodes and the parent nodes and child nodes directly connected with the active nodes as being in an active state.

5. The method of claim 4, wherein storing node information for a node in the run queue according to its node depth comprises:

in one simulation clock cycle, storing node information in the run queue according to node depth only for the nodes marked as being in the active state in the current multi-way tree structure, and not storing the nodes that are not marked as being in the active state in the current multi-way tree structure.

6. The method of claim 3, wherein the sequentially storing node information of nodes in the run queue according to node depths of the nodes comprises:

for maximum node depth N:

determining a node with a node depth of N, and storing node information of the node with the node depth of N in the run queue, wherein a run indicator of the node with the node depth of N is set to flag 1, and a node name of a parent node directly connected with the node with the node depth of N is used as parent node information of the node with the node depth of N; and

for node depths M less than N:

determining a leaf node with a node depth M, and storing node information of the leaf node with the node depth M in the run queue, wherein a run indicator of the leaf node with the node depth M is set to flag 1, and a node name of a parent node directly connected with the leaf node with the node depth M is used as parent node information of the leaf node with the node depth M;

determining a parent node with a node depth M, and storing node information of the parent node with the node depth M in the run queue, wherein a run indicator of the parent node with the node depth M is set to flag 2, a node name of the parent node directly connected to the parent node with the node depth M is taken as the parent node information of the parent node with the node depth M, and a node name of a child node directly connected to the parent node with the node depth M is taken as the child node information of the parent node with the node depth M,

wherein N is an integer greater than 1, and M is an integer less than N and greater than 1.

7. The method of claim 3, wherein the number of available processors is a plurality, the running a simulation function of a module within the run queue using the available processors comprising:

a plurality of the available processors are used to run simulation functions of a plurality of modules within the run queue in parallel.

8. The method of claim 7, wherein the running in parallel the simulation functions of the plurality of modules within the run queue comprises:

determining the number of nodes whose run indicator is flag 1 in the run queue; and

based on the number of available processors and the number of nodes whose run indicator is flag 1, using the available processors to run in parallel the simulation functions of the nodes whose run indicator is flag 1 within the run queue.

9. The method of claim 8, wherein the number of available processors is denoted as S, the number of nodes whose run indicator is flag 1 is denoted as T, and S and T are integers greater than or equal to 0,

the using the available processors to run in parallel the simulation functions of the nodes whose run indicator is flag 1 within the run queue, based on the number of available processors and the number of nodes whose run indicator is flag 1, comprises:

in the case where T is determined to be less than or equal to S, using the available processors to run in parallel the simulation functions of the T nodes whose run indicator is flag 1 in the run queue; or,

in the case where T is determined to be greater than S, using the available processors to run in parallel the simulation functions of the first S nodes whose run indicator is flag 1 in the run queue.

10. The method of claim 7, wherein the running simulation functions of a plurality of modules within the run queue in parallel further comprises:

updating the run queue during running.

11. The method of claim 10, wherein the updating the run queue during run time comprises:

denoting each node whose simulation function has finished running as a first node, determining the parent node of the first node according to the parent node information of the first node, and updating the node information of the parent node of the first node;

deleting the node information of the first node from the run queue; and

the updating continues until the run queue is empty.

12. The method of claim 11, wherein the parent node of the first node is denoted as a second node, and the updating node information of the parent node of the first node comprises:

deleting the node name of the first node from the child node information of the second node;

determining whether the child node information of the second node is empty; and

in the case where it is determined that the child node information of the second node is empty, setting the run indicator of the second node to flag 1.
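
The queue update described in claims 10 to 12 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the claimed implementation: the run queue is modeled as a dictionary keyed by node name, the field names `parent`, `children`, and `run_indicator` and the helpers `make_queue`, `complete_node`, and `drain` are hypothetical, and the simulation itself is elided so that only the bookkeeping is shown.

```python
def make_queue():
    """Build a small example run queue: two leaf modules under one parent."""
    return {
        "alu": {"parent": "core", "children": [], "run_indicator": 1},
        "lsu": {"parent": "core", "children": [], "run_indicator": 1},
        "core": {"parent": None, "children": ["alu", "lsu"], "run_indicator": 0},
    }


def complete_node(run_queue, finished_name):
    """Update the run queue after the node named finished_name has finished."""
    first = run_queue[finished_name]           # the "first node"
    parent_name = first["parent"]
    if parent_name is not None and parent_name in run_queue:
        second = run_queue[parent_name]        # the "second node"
        # Delete the first node's name from the second node's child info.
        second["children"].remove(finished_name)
        # When no children remain, the parent becomes runnable (flag 1).
        if not second["children"]:
            second["run_indicator"] = 1
    # Delete the first node's information from the run queue.
    del run_queue[finished_name]


def drain(run_queue):
    """Repeat the update until the run queue is empty; return completion order."""
    order = []
    while run_queue:
        ready = [n for n, node in run_queue.items() if node["run_indicator"] == 1]
        if not ready:
            raise ValueError("no runnable node; the hierarchy may contain a cycle")
        for name in ready:
            order.append(name)  # the simulation function would run here
            complete_node(run_queue, name)
    return order


print(drain(make_queue()))  # ['alu', 'lsu', 'core']
```

In this sketch the parent `core` only becomes runnable after both of its children have been removed from its child list, which mirrors the bottom-up execution hierarchy of the run queue.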

13. A design simulation apparatus for an integrated circuit device, comprising:

a run queue acquisition unit configured to: obtaining a run queue constructed based on a plurality of modules of the integrated circuit device, wherein the run queue includes at least one module selected from the plurality of modules, and in the run queue, the at least one module is sequentially arranged based on an execution hierarchy of the at least one module;

a computing unit configured to: determining available processors; and

a simulation verification unit configured to run a simulation function of a module within the run queue using the available processor for simulation verification of the integrated circuit device, wherein the design simulation device further comprises a run queue construction unit configured to:

14. The apparatus of claim 13, wherein the run queue construction unit is configured to:

15. The apparatus of claim 14, wherein the run queue construction unit is configured to:

16. A design simulation apparatus for an integrated circuit device, comprising:

a processor; and

a memory, wherein the memory has stored therein computer readable code which, when executed by the processor, performs the design simulation method for an integrated circuit device of any of claims 1-12.

17. A non-transitory computer readable storage medium having instructions stored thereon that, when executed by a processor, cause the processor to perform the design simulation method for an integrated circuit device of any of claims 1-12.