WO2024067303A1 - Simulation method, electronic device, and computer-readable medium - Google Patents

Simulation method, electronic device, and computer-readable medium

Info

Publication number
WO2024067303A1
Authority
WO
WIPO (PCT)
Prior art keywords
simulation
data flow
flow node
calculation weight
module
Prior art date
Application number
PCT/CN2023/120117
Other languages
English (en)
French (fr)
Inventor
胡韶华 (HU Shaohua)
石金锋 (SHI Jinfeng)
朱智华 (ZHU Zhihua)
刘衡祁 (LIU Hengqi)
Original Assignee
深圳市中兴微电子技术有限公司 (Sanechips Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司 (Sanechips Technology Co., Ltd.)
Publication of WO2024067303A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 - Computer-aided design [CAD]
    • G06F 30/30 - Circuit design
    • G06F 30/32 - Circuit design at the digital level
    • G06F 30/33 - Design verification, e.g. functional simulation or model checking
    • G06F 30/3308 - Design verification, e.g. functional simulation or model checking using simulation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2115/00 - Details relating to the type of the circuit
    • G06F 2115/06 - Structured ASICs

Definitions

  • the present disclosure relates to the technical field of integrated circuit design, and in particular to a simulation method, an electronic device, and a computer-readable medium.
  • the embodiments of the present disclosure provide a simulation method, an electronic device, and a computer-readable medium.
  • an embodiment of the present disclosure provides a simulation method, comprising: analyzing the code of multiple simulation modules to generate multiple data flow node trees, wherein each simulation module corresponds to at least one data flow node tree; determining calculation weights of the multiple simulation modules based on the multiple data flow node trees; and integrating the multiple simulation modules into multiple simulation threads based on the calculation weights of the multiple simulation modules, wherein the multiple simulation threads correspond one-to-one to multiple simulation units.
  • an embodiment of the present disclosure provides an electronic device, comprising: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the simulation method described in the first aspect of the embodiments of the present disclosure.
  • an embodiment of the present disclosure provides a computer-readable medium storing a computer program which, when executed by a processor, implements the simulation method described in the first aspect of the embodiments of the present disclosure.
  • FIG. 1 is a schematic diagram of a parallelized simulation;
  • FIG. 2 is a schematic diagram of another parallelized simulation;
  • FIG. 3 is a flowchart of a simulation method according to an embodiment of the present disclosure;
  • FIG. 4 is a flowchart of some steps in another simulation method according to an embodiment of the present disclosure;
  • FIG. 5 is a block diagram of an electronic device according to an embodiment of the present disclosure;
  • FIG. 6 is a block diagram of a computer-readable medium according to an embodiment of the present disclosure.
  • the execution core alternates between two states, waiting for scheduling and executing, which reduces simulation efficiency; as shown in FIG. 2, because different simulation modules differ in design and implementation, the calculation amounts of different simulation threads vary greatly.
  • within one simulation cycle, some simulation threads usually finish their calculation and then have to wait for other simulation threads to finish theirs.
  • for example, the execution state of execution core 0 is shorter than that of execution core 1.
  • after finishing its calculation, execution core 0 waits for scheduling and can enter the next simulation cycle only after execution core 1 finishes, resulting in low simulation parallelism.
  • an embodiment of the present disclosure provides a simulation method, comprising the following steps S1 to S3 .
  • in step S1, the code of multiple simulation modules is analyzed to generate multiple data flow node trees, wherein each simulation module corresponds to at least one data flow node tree.
  • in step S2, calculation weights of the plurality of simulation modules are determined according to the plurality of data flow node trees.
  • in step S3, the plurality of simulation modules are integrated into a plurality of simulation threads according to the calculation weights of the plurality of simulation modules, wherein the plurality of simulation threads correspond one-to-one to the plurality of simulation units.
  • the data flow node tree is a tree structure representing the signal flow and operation in the simulation module.
  • a simulation module may correspond to one or more data flow node trees, and different data flow node trees represent different signal flows in the simulation module.
  • the disclosed embodiment does not specifically limit this.
  • the calculation weight represents the amount of calculation.
  • the calculation weight of a simulation module represents the amount of calculation corresponding to the simulation module, and determines the time required to complete the calculation of the simulation module.
  • multiple simulation threads corresponding one-to-one to multiple simulation units means that each simulation thread is bound to a simulation unit, establishing a fixed binding relationship between the simulation threads and the simulation units.
  • the number of simulation threads obtained by integrating the multiple simulation modules is equal to the number of simulation units used for simulation.
  • one simulation thread is executed in the same simulation unit in different simulation cycles, and different simulation threads are executed in parallel in different simulation units, thereby reducing scheduling overhead.
  • the simulation unit may be a CPU core or another kind of simulation unit.
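As one concrete way to realize this fixed thread-to-unit binding, the sketch below (an editor's illustration, not part of the patent) pins one simulation thread per CPU core on Linux via pthread_setaffinity_np; run_simulation_thread and its cycle loop are hypothetical placeholders.

```cpp
#include <pthread.h>   // pthread_setaffinity_np (Linux/glibc specific)
#include <sched.h>     // cpu_set_t, CPU_ZERO, CPU_SET
#include <algorithm>
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical per-thread work: one integrated simulation thread's cycle loop.
void run_simulation_thread(int thread_id, int cycles) {
    for (int c = 0; c < cycles; ++c) {
        // ... evaluate this thread's data flow node trees for cycle c ...
    }
    std::printf("simulation thread %d finished\n", thread_id);
}

int main() {
    const int num_units = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> threads;
    for (int i = 0; i < num_units; ++i) {
        threads.emplace_back(run_simulation_thread, i, 1000);
        // Fixed one-to-one binding: simulation thread i always runs on core i,
        // so the OS scheduler never migrates it between simulation units.
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(i, &set);
        pthread_setaffinity_np(threads.back().native_handle(), sizeof(set), &set);
    }
    for (auto& t : threads) t.join();
}
```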
  • the embodiments of the present disclosure do not specifically limit how to integrate multiple simulation modules.
  • the simulation modules may be split or multiple simulation modules may be merged.
  • multiple data flow node trees are generated by analyzing the code of multiple simulation modules, and the calculation weights of the multiple simulation modules are then determined from the data flow node trees; on this basis, the multiple simulation modules are integrated into multiple simulation threads corresponding one-to-one to the multiple simulation units, so that each simulation thread is executed in the same simulation unit in every simulation cycle, thereby reducing scheduling overhead, improving the efficiency of parallelized simulation, and shortening the simulation time of integrated circuit design.
  • the disclosed embodiment does not specifically limit how to integrate multiple simulation modules into multiple simulation threads according to the calculation weights of the simulation modules.
  • integrating the plurality of simulation modules into a plurality of simulation threads according to the calculation weights of the plurality of simulation modules includes the following steps S31 to S33 .
  • in step S31, the average calculation weight of the simulation units is determined according to the calculation weights of the plurality of simulation modules.
  • in step S32, the plurality of simulation modules are integrated into a plurality of simulation threads according to the average calculation weight.
  • in step S33, simulation code is generated for each simulation thread.
  • the average calculation weight of the simulation units is the average calculation weight that makes the calculation amounts of the multiple simulation units equal within one simulation cycle; integrating multiple simulation modules into multiple simulation threads according to the average calculation weight means integrating them into simulation threads whose calculation weights are close to the average calculation weight.
  • for example, the calculation weight of an integrated simulation thread is equal to the average calculation weight, or differs from the average calculation weight by less than a specific value.
  • when multiple simulation modules are integrated into multiple simulation threads according to the average calculation weight, the calculation weights of the simulation threads are similar or equal, and the running times of the simulation units executing each thread's simulation code are similar or equal, which improves the parallelism of the simulation and the efficiency of parallelized simulation, further reducing the simulation time of integrated circuit design.
  • the disclosed embodiments do not specifically limit how to determine the average calculation weight of a simulation unit according to the calculation weights of multiple simulation modules.
  • the total calculation weight may be calculated according to the calculation weights of each simulation module, and then the average calculation weight may be calculated using the total calculation weight and the number of simulation threads or the number of simulation units.
  • the embodiments of the present disclosure do not specifically limit how to integrate multiple simulation modules.
  • multiple simulation modules are merged to generate a simulation thread.
  • the plurality of simulation modules are integrated into the plurality of simulation threads (the above step S32), including: according to the average calculation weight, the plurality of first simulation modules are combined to generate one simulation thread, wherein the first simulation module is the simulation module having a calculation weight less than the average calculation weight. module.
  • calculation weight of the simulation thread generated by combining multiple first simulation modules is equal to or close to the average calculation weight.
  • the simulation module whose calculation weight is less than the average calculation weight and whose difference from the average calculation weight is greater than a specific value is used as the first simulation module. That is, only the simulation modules with sufficiently small calculation weight are merged; and the simulation modules whose calculation weight is less than the average calculation weight and whose difference from the average calculation weight is less than a specific value directly generate a simulation thread, occupying one simulation unit.
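A minimal C++ sketch of steps S31 and S32 (an illustration; the patent does not prescribe a packing algorithm, and the module names and weights here are invented). It derives the average calculation weight from the total weight and the number of simulation units, then packs modules largest-first into threads, closing a thread once it nears the average; only the merge direction is shown, and oversized modules such as alu and mac below would in the full method be candidates for splitting:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

struct Module { const char* name; long weight; };

int main() {
    std::vector<Module> modules = {
        {"alu", 40}, {"fifo", 5}, {"arb", 8}, {"mac", 35}, {"csr", 6}, {"dma", 26}};
    const int num_units = 4;  // number of simulation units (e.g. CPU cores)

    // Step S31: average calculation weight per simulation unit.
    long total = 0;
    for (const auto& m : modules) total += m.weight;
    const long avg = total / num_units;  // 120 / 4 = 30

    // Step S32 (merge direction only): pack modules largest-first, closing a
    // simulation thread once adding the next module would exceed the average.
    std::sort(modules.begin(), modules.end(),
              [](const Module& a, const Module& b) { return a.weight > b.weight; });
    std::vector<std::vector<Module>> threads(1);
    long current = 0;
    for (const auto& m : modules) {
        if (!threads.back().empty() && current + m.weight > avg) {
            threads.emplace_back();  // start the next simulation thread
            current = 0;
        }
        threads.back().push_back(m);
        current += m.weight;
    }
    for (std::size_t t = 0; t < threads.size(); ++t) {
        long w = 0;
        std::printf("thread %zu:", t);
        for (const auto& m : threads[t]) { std::printf(" %s", m.name); w += m.weight; }
        std::printf("  (weight %ld, avg %ld)\n", w, avg);
    }
}
```

In this toy run the modules pack into four threads with weights 40, 35, 26, and 19 for four simulation units.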
  • a simulation module is split into multiple simulation threads.
  • multiple simulation modules are integrated into multiple simulation threads (the above-mentioned step S32), including: according to the data flow node tree corresponding to the second simulation module, the second simulation module is split to generate multiple simulation threads, wherein the second simulation module is the simulation module whose calculation weight is greater than the average calculation weight.
  • splitting the second simulation module according to the data flow node tree means splitting with the data flow node tree as the splitting unit.
  • one data flow node tree corresponding to the second simulation module may generate one simulation thread, or multiple data flow node trees corresponding to the second simulation module may generate one simulation thread; the disclosed embodiment places no special restriction on this.
  • the calculation weights of the multiple simulation threads generated by splitting the second simulation module are equal to or close to the average calculation weight.
  • a simulation module whose calculation weight is greater than the average calculation weight by more than a specific value is used as a second simulation module; that is, only simulation modules with sufficiently large calculation weights are split, while a simulation module whose calculation weight exceeds the average calculation weight by less than the specific value directly generates one simulation thread, occupying one simulation unit.
  • when the calculation weight of a data flow node tree corresponding to the second simulation module is large, the data flow node tree may also be split.
  • splitting the second simulation module to generate multiple simulation threads includes: splitting a first data flow node tree into multiple data flow node subtrees according to the average calculation weight, wherein the first data flow node tree is a data flow node tree corresponding to the second simulation module whose calculation weight is greater than the average calculation weight, and the calculation weight of each data flow node subtree is less than or equal to the average calculation weight; and splitting the second simulation module according to the data flow node subtrees and the second data flow node trees corresponding to the second simulation module to generate multiple simulation threads, wherein a second data flow node tree is a data flow node tree corresponding to the second simulation module whose calculation weight is less than or equal to the average calculation weight.
  • the disclosed embodiment does not specifically limit how to split the first data flow node tree into multiple data flow node subtrees.
  • the first data flow node tree is split into multiple data flow node subtrees by: traversing the first data flow node tree from the output signal corresponding to the first data flow node tree; determining the calculation weights of the candidate data flow node subtrees obtained during the traversal; and determining the candidate data flow node subtrees whose calculation weights are less than or equal to the average calculation weight as the data flow node subtrees.
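The traversal-and-cut rule just described can be sketched as follows (illustrative C++ assuming integer weights and the binary trees mentioned later in this document; the naive recomputation in subtree_weight is kept for clarity):

```cpp
#include <cstdio>
#include <vector>

// One node of a data flow node tree (the patent's trees are binary).
struct Node {
    long weight;              // calculation weight of this node alone
    Node* left = nullptr;
    Node* right = nullptr;
};

long subtree_weight(const Node* n) {
    if (!n) return 0;
    return n->weight + subtree_weight(n->left) + subtree_weight(n->right);
}

// Walk down from the output-signal root. A candidate subtree whose total
// weight is <= avg is kept whole as one data flow node subtree; a heavier
// node stays with the parent thread and its children are examined in turn.
void split(Node* n, long avg, std::vector<Node*>& cut, std::vector<Node*>& rest) {
    if (!n) return;
    if (subtree_weight(n) <= avg) { cut.push_back(n); return; }
    rest.push_back(n);  // its cut-off children become new input signals
    split(n->left, avg, cut, rest);
    split(n->right, avg, cut, rest);
}

int main() {
    // out = (a*b + c*d) + e, with multiply nodes deliberately heavy.
    Node a{1}, b{1}, c{1}, d{1}, e{1};
    Node mul1{10, &a, &b}, mul2{10, &c, &d};
    Node add1{5, &mul1, &mul2};
    Node root{5, &add1, &e};

    std::vector<Node*> cut, rest;
    split(&root, /*avg=*/15, cut, rest);
    std::printf("subtrees cut: %zu, nodes kept: %zu\n", cut.size(), rest.size());
}
```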
  • the second simulation module is split to generate multiple simulation threads by: determining the calculation weight of the overlapping nodes of the data flow node trees corresponding to any two output signals of the second simulation module; and, when the proportion of the calculation weight of the overlapping nodes is less than a preset threshold, dividing the data flow node trees corresponding to the two output signals into different simulation threads.
  • the overlapping node refers to the common node of two data flow node trees.
  • within the same simulation thread, the overlapping nodes can be reused across different data flow node trees; in different simulation threads, for different data flow node trees, the overlapping nodes cannot be reused.
  • when the proportion of the calculation weight of the overlapping nodes is less than a preset threshold, dividing the data flow node trees corresponding to the two output signals into different simulation threads does not add too much calculation weight; when the proportion is greater than the preset threshold, the data flow node trees corresponding to the two output signals are divided into the same simulation thread, so that reusing the overlapping nodes avoids added calculation weight and helps ensure simulation efficiency.
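A small sketch of the overlap test (illustrative; the patent does not define how the "proportion" is normalized, so shared weight over the combined weight of both trees is assumed here, and the threshold value is invented):

```cpp
#include <cstdio>
#include <map>
#include <set>

int main() {
    // Calculation weight of every data flow node, keyed by node id.
    std::map<int, long> weight = {{1, 8}, {2, 8}, {3, 2}, {4, 6}, {5, 6}};
    // Node sets of the data flow node trees of two output signals.
    std::set<int> tree_a = {1, 2, 3};     // out_a depends on nodes 1, 2, 3
    std::set<int> tree_b = {1, 2, 4, 5};  // out_b shares nodes 1 and 2

    long overlap = 0, combined = 0;
    for (int id : tree_a) combined += weight[id];
    for (int id : tree_b) {
        combined += weight[id];
        if (tree_a.count(id)) overlap += weight[id];  // duplicated if separated
    }
    const double ratio = double(overlap) / double(combined);
    const double threshold = 0.25;  // hypothetical preset threshold
    std::printf("overlap ratio %.2f -> %s simulation threads\n", ratio,
                ratio < threshold ? "different" : "same");
}
```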
  • generating simulation code for each simulation thread includes: traversing the data flow node tree corresponding to the simulation thread, and generating the simulation code for the simulation thread according to the node type and data of the nodes of the data flow node tree.
  • the generated simulation code is a compilable code that can be executed by the simulation unit.
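The per-node code generation can be pictured with the sketch below (illustrative: the node kinds, the data field, and the C-style expressions emitted are assumptions; real generated simulation code would also carry signal-update and scheduling logic):

```cpp
#include <cstdio>
#include <string>

// Node kinds the toy generator knows about; the patent only speaks of
// "node type and data", so this enum is an illustrative assumption.
enum class Kind { Signal, Add, Mul, Mux };

struct Node {
    Kind kind;
    std::string data;           // signal name for leaf (Signal) nodes
    const Node* a = nullptr;    // operands; for Mux, `a` is the condition
    const Node* b = nullptr;
    const Node* c = nullptr;
};

// Post-order traversal: emit an expression for each node from its children.
std::string gen(const Node* n) {
    switch (n->kind) {
        case Kind::Signal: return n->data;
        case Kind::Add: return "(" + gen(n->a) + " + " + gen(n->b) + ")";
        case Kind::Mul: return "(" + gen(n->a) + " * " + gen(n->b) + ")";
        case Kind::Mux:
            return "(" + gen(n->a) + " ? " + gen(n->b) + " : " + gen(n->c) + ")";
    }
    return "";
}

int main() {
    Node a{Kind::Signal, "a"}, b{Kind::Signal, "b"}, sel{Kind::Signal, "sel"};
    Node sum{Kind::Add, "", &a, &b};
    Node prod{Kind::Mul, "", &a, &b};
    Node out{Kind::Mux, "", &sel, &sum, &prod};
    // Emits one compilable assignment of the thread's simulation code:
    // out = (sel ? (a + b) : (a * b));
    std::printf("out = %s;\n", gen(&out).c_str());
}
```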
  • the disclosed embodiment does not specifically limit how to analyze the code of the simulation module to generate a data flow node tree.
  • the codes of multiple simulation modules are analyzed to generate multiple data flow node trees, including the following steps S11 to S13 .
  • in step S11, lexical analysis is performed on the code of the simulation module to generate lexical information.
  • in step S12, syntax analysis is performed on the lexical information to generate an abstract syntax tree corresponding to the simulation module.
  • in step S13, the abstract syntax tree is traversed according to the output signals of the simulation module to generate the data flow node tree corresponding to each output signal, wherein the simulation module includes at least one output signal.
  • performing lexical analysis on the code of the simulation module includes: scanning for and extracting information such as the tokens, parameter lists, port lists, register lists, operators, keywords, and line numbers of the simulation language.
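As an illustration of this step, a toy lexer (a sketch, not the patent's implementation) that records each token's text and line number; a real lexer for a simulation language would additionally classify keywords, operators, parameter lists, port lists, and register lists:

```cpp
#include <cctype>
#include <cstdio>
#include <string>
#include <vector>

struct Token { std::string text; int line; };

std::vector<Token> lex(const std::string& src) {
    std::vector<Token> out;
    int line = 1;
    for (std::size_t i = 0; i < src.size();) {
        char ch = src[i];
        if (ch == '\n') { ++line; ++i; }
        else if (std::isspace((unsigned char)ch)) { ++i; }
        else if (std::isalpha((unsigned char)ch) || ch == '_') {
            // Identifier or keyword: consume the whole word.
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum((unsigned char)src[j]) || src[j] == '_')) ++j;
            out.push_back({src.substr(i, j - i), line});
            i = j;
        } else {
            // Single-character operator or punctuation.
            out.push_back({std::string(1, ch), line});
            ++i;
        }
    }
    return out;
}

int main() {
    for (const auto& t : lex("assign y =\n a & b;"))
        std::printf("line %d: %s\n", t.line, t.text.c_str());
}
```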
  • an abstract syntax tree is generated according to the simulation language rules.
  • the abstract syntax tree is a tree-like abstract representation of the syntax structure.
  • the disclosed embodiment does not specifically limit how to generate the data flow node tree.
  • the abstract syntax tree is traversed according to the output signal of the simulation module to generate the data flow node tree corresponding to the output signal (the above-mentioned step S13), including: traversing the abstract syntax tree to extract the input signal and output signal of the simulation module; for any one of the output signals, the input signals associated with the output signal are combined into the data flow node tree corresponding to the output signal; and, according to the node type of the node of the data flow node tree, the calculation weight of each of the nodes is set.
  • when the calculation weight of each node is set, each node's weight is set according to its node type, such as a branch structure, an operator expression of some kind, or a terminal node.
  • traversing the abstract syntax tree according to the output signal of the simulation module to generate the data flow node tree corresponding to the output signal also includes: parallelizing the nodes in the data flow node tree to obtain parallel nodes; setting parallel flags for the parallel nodes; and determining the calculation weights of the parallel nodes.
  • the disclosed embodiment does not specifically limit how to determine the calculation weight of the simulation module according to the data flow node tree.
  • the calculation weights of the plurality of simulation modules are determined according to the plurality of data flow node trees through the following steps S21 and S22.
  • in step S21, the calculation weight of a data flow node tree is determined according to the number of nodes in the tree, the tree depth, and the calculation weight of each node.
  • in step S22, the calculation weight of a simulation module is determined according to the calculation weights of the data flow node trees corresponding to the simulation module.
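Steps S21 and S22, together with the per-node-type weights described above, can be made concrete as follows (illustrative; here a tree's weight is simply the sum of its nodes' type weights, with node count and depth implicit in the recursion, and the weight values are invented):

```cpp
#include <cstdio>

// Data flow node: binary, with a weight derived from the node's type.
struct Node {
    long type_weight;
    const Node* left = nullptr;
    const Node* right = nullptr;
};

// Step S21: weight of one data flow node tree as the sum of per-node weights.
long tree_weight(const Node* n) {
    if (!n) return 0;
    return n->type_weight + tree_weight(n->left) + tree_weight(n->right);
}

int main() {
    // A module with two output signals, i.e. two data flow node trees.
    Node a{1}, b{1}, c{1};              // terminal (signal) nodes
    Node add{2, &a, &b};                // out1 = a + b
    Node mul{6, &add, &c};              // out2 = (a + b) * c
    long w1 = tree_weight(&add);        // 4
    // out2's tree reuses the adder subtree; its weight is counted again here,
    // matching the cost model when overlapping nodes are not shared.
    long w2 = tree_weight(&mul);        // 11
    // Step S22: the module's calculation weight from its trees' weights.
    std::printf("module weight = %ld\n", w1 + w2);
}
```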
  • the simulation method provided in the embodiments of the present disclosure is suitable for, but not limited to, the design simulation of large-scale integrated circuits, especially system-modeling SystemC simulation and register transfer level (RTL) simulation.
  • such large-scale integrated circuits include but are not limited to switching chips and network processor chips.
  • an embodiment of the present disclosure provides an electronic device, comprising: one or more processors 101; a memory 102 storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the simulation method described in the first aspect of the embodiments of the present disclosure; and one or more I/O interfaces 103, connected between the processor and the memory and configured to implement information exchange between the processor and the memory.
  • the processor 101 is a device with data processing capabilities, including but not limited to a central processing unit (CPU); the memory 102 is a device with data storage capabilities, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH); the I/O interface (read-write interface) 103 is connected between the processor 101 and the memory 102, and can realize information exchange between the processor 101 and the memory 102, including but not limited to a data bus (Bus), etc.
  • the processor 101 , the memory 102 , and the I/O interface 103 are connected to each other via a bus 104 , and further connected to other components of the computing device.
  • an embodiment of the present disclosure provides a computer-readable medium having a computer program stored thereon, and when the program is executed by a processor, the simulation method described in the first aspect of the embodiment of the present disclosure is implemented.
  • This embodiment provides a parallel simulation method, comprising the following steps:
  • Perform lexical analysis on the code of each simulation module: scan and extract the tokens, parameter lists, port lists, register lists, operators, keywords, line numbers, and other information of the simulation language, and output the corresponding lexical information.
  • Perform syntax analysis on the lexical information: generate an abstract syntax tree according to the syntax rules of the language used.
  • Simulation module and signal analysis: traverse the abstract syntax tree to obtain the calling relationships and instantiation parameters of each simulation module and a symbol table including input signals, output signals, registers, and the like, and clarify the signal connections between simulation modules.
  • Binding relationship analysis: traverse the abstract syntax tree and apply module parameters (parameter and localparam) to each simulation module and expression node; for any output signal in the symbol table, construct the subtree composed of all related input signals into a definite data flow node tree using branch-structure, operator-expression, terminal, and other nodes.
  • Modify the nodes of the data flow node tree according to the simulation mode: when the simulation hardware supports multi-operand parallel processing, the nodes are parallelized.
  • Determine the calculation weight of each data flow node tree from the number of nodes in the tree and the calculation weights corresponding to the node types, and thereby obtain the calculation weight of each simulation module; determine the average calculation weight per simulation unit from the total calculation weight of all simulation modules and the number of simulation threads or simulation units.
  • Module splitting and merging: multiple modules whose calculation weights are less than the average calculation weight of the simulation units are merged to generate the running code of a single simulation thread, occupying one simulation unit; a module with a large calculation weight is split according to the data flow node tree of each output signal to generate the code of multiple simulation threads, occupying multiple simulation units.
  • Splitting by output signal: traverse the data flow node trees of two different output signals and compute the calculation weight of the overlapping nodes; when the proportion of the overlapping nodes' calculation weight is below the preset threshold, the simulation calculations of the two signals can be separated into different simulation threads or simulation units.
  • Data flow node tree splitting: the data flow node trees of some output signals have calculation weights far greater than the average calculation weight of the simulation units and need to be split; traverse the data flow node tree from the output signal, computing the calculation weight of each subtree, until a subtree whose weight is less than the average calculation weight of the simulation units is found, and split it off.
  • Code generation: traverse the data flow node trees and generate the corresponding simulation code.
  • This embodiment provides a device for implementing the above simulation method, including the following modules: lexical analysis, syntax analysis, syntax tree access, and symbol table construction; it also includes modules for data flow node tree conversion, node parallelization, a node calculation weight table, data flow node tree operations, and code generation.
  • the lexical analysis module mainly extracts language feature information and outputs the results to the syntax analysis module.
  • the syntax analysis module determines the correctness of the syntax according to the syntax rules, and records the syntax information to form an abstract syntax tree.
  • the syntax tree access module implements the method of traversing the syntax tree, and the symbol table module records the symbol information of the syntax tree and extracts the input and output signals.
  • the syntax tree is traversed for the output signal, and all related input signals are formed into a new data flow node tree.
  • each node's calculation weight is set according to its node type, such as a branch structure, an operator expression of some kind, or a terminal node; the data flow node tree is a binary tree.
  • when the simulation unit supports multi-operand parallel processing, the nodes involved also need corresponding flags set and their calculation weights adjusted.
  • after the above operations, the calculation weight of the entire tree and the calculation weights of all modules can be obtained by traversing the trees.
  • dividing the total calculation weight of all modules by the number of simulation units gives the average calculation weight that needs to be assigned to each simulation unit.
  • multiple modules or data flow node trees whose calculation weights are less than the average calculation weight of the simulation units can be merged to occupy one simulation unit.
  • for a data flow node tree with a large calculation weight, the tree is traversed from the output signal while the calculation weight of each subtree is computed, until a subtree whose calculation weight is smaller than the average calculation weight of the simulation units is found; that subtree is split off, and its output and input signals are readjusted. Finally, the data flow node trees are traversed, and compilable simulation code is generated for the target simulation units according to node types and data.
  • Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transient media).
  • the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
  • communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure provides a simulation method, including: analyzing the code of a plurality of simulation modules to generate a plurality of data flow node trees, wherein each of the simulation modules corresponds to at least one of the data flow node trees; determining calculation weights of the plurality of simulation modules according to the plurality of data flow node trees; and integrating the plurality of simulation modules into a plurality of simulation threads according to the calculation weights of the plurality of simulation modules, wherein the plurality of simulation threads correspond one-to-one to a plurality of simulation units. The present disclosure further provides an electronic device and a computer-readable medium.

Description

Simulation method, electronic device, and computer-readable medium
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 202211208476.5, filed on September 30, 2022, the content of which is incorporated herein by reference in its entirety.
Technical field
The present disclosure relates to the technical field of integrated circuit design, and in particular to a simulation method, an electronic device, and a computer-readable medium.
Background
As the scale of integrated circuit designs continues to grow, the time required for simulation rises rapidly. With simulation runs routinely taking days or even weeks, system-modeling SystemC simulation and register transfer level (RTL) simulation have become a bottleneck in the integrated circuit design flow. Parallelized simulation on facilities such as multi-core servers is an important means of improving simulation efficiency.
However, the performance of parallelized simulation remains to be further optimized.
Summary
Embodiments of the present disclosure provide a simulation method, an electronic device, and a computer-readable medium.
In a first aspect, an embodiment of the present disclosure provides a simulation method, including: analyzing the code of a plurality of simulation modules to generate a plurality of data flow node trees, wherein each simulation module corresponds to at least one data flow node tree; determining calculation weights of the plurality of simulation modules according to the plurality of data flow node trees; and integrating the plurality of simulation modules into a plurality of simulation threads according to the calculation weights of the plurality of simulation modules, wherein the plurality of simulation threads correspond one-to-one to a plurality of simulation units.
In a second aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the simulation method described in the first aspect of the embodiments of the present disclosure.
In a third aspect, an embodiment of the present disclosure provides a computer-readable medium storing a computer program which, when executed by a processor, implements the simulation method described in the first aspect of the embodiments of the present disclosure.
Brief description of the drawings
FIG. 1 is a schematic diagram of a parallelized simulation;
FIG. 2 is a schematic diagram of another parallelized simulation;
FIG. 3 is a flowchart of a simulation method according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of some steps in another simulation method according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 6 is a block diagram of a computer-readable medium according to an embodiment of the present disclosure.
Detailed description
To enable those skilled in the art to better understand the technical solutions of the present disclosure, the simulation method, electronic device, and computer-readable medium provided by the present disclosure are described in detail below with reference to the accompanying drawings.
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but they may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The embodiments of the present disclosure and the features in the embodiments may be combined with each other where no conflict arises.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used herein, the singular forms "a" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the terms "comprises" and/or "made of", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In some related art, when parallelized simulation is performed, the simulation threads of the multiple simulation modules (BLOCKs) that need to run in the current simulation cycle are first determined and added to a ready queue, and the simulation threads in the ready queue are scheduled onto the execution cores of multiple central processing units (CPUs) or onto simulation units to run in parallel; after all simulation threads finish running, the simulation threads that need to run in the next simulation cycle are determined, and so on cyclically. As shown in FIG. 1, when such parallelized simulation is used for the design simulation of a large-scale integrated circuit, the large number of threads in the ready queue consumes scheduling overhead, so an execution core alternates between two states, waiting for scheduling and executing, which reduces simulation efficiency. As shown in FIG. 2, because different simulation modules differ in design and implementation, the calculation amounts of different simulation threads vary greatly; within one simulation cycle, some simulation threads usually finish their calculation and then have to wait for other simulation threads to finish. For example, the execution state of execution core 0 is shorter than that of execution core 1; after finishing its calculation, execution core 0 waits for scheduling and can enter the next simulation cycle only after execution core 1 finishes, resulting in low simulation parallelism.
In view of this, in a first aspect, referring to FIG. 3, an embodiment of the present disclosure provides a simulation method including the following steps S1 to S3.
In step S1, the code of a plurality of simulation modules is analyzed to generate a plurality of data flow node trees, wherein each simulation module corresponds to at least one data flow node tree.
In step S2, calculation weights of the plurality of simulation modules are determined according to the plurality of data flow node trees.
In step S3, the plurality of simulation modules are integrated into a plurality of simulation threads according to the calculation weights of the plurality of simulation modules, wherein the plurality of simulation threads correspond one-to-one to a plurality of simulation units.
In the embodiments of the present disclosure, a data flow node tree is a tree structure representing the signal flows and operations in a simulation module. A simulation module may contain one or more signal flows; accordingly, a simulation module may correspond to one or more data flow node trees, with different data flow node trees representing different signal flows in the simulation module. The embodiments of the present disclosure place no particular limitation on this.
In the embodiments of the present disclosure, a calculation weight represents an amount of calculation. For example, the calculation weight of a simulation module represents the amount of calculation corresponding to the simulation module and determines the time required to complete the module's calculation.
In the embodiments of the present disclosure, the one-to-one correspondence between the simulation threads and the simulation units means that each simulation thread is bound to a simulation unit, establishing a fixed binding relationship between them. The number of simulation threads obtained by integrating the simulation modules is equal to the number of simulation units used for simulation; one simulation thread executes on the same simulation unit in every simulation cycle, and different simulation threads execute in parallel on different simulation units, thereby reducing scheduling overhead.
The embodiments of the present disclosure place no particular limitation on the simulation unit. For example, the simulation unit may be a CPU core or another kind of simulation unit.
The embodiments of the present disclosure place no particular limitation on how the multiple simulation modules are integrated. For example, a simulation module may be split, or multiple simulation modules may be merged.
In the simulation method provided by the embodiments of the present disclosure, the code of the multiple simulation modules is analyzed to generate multiple data flow node trees, the calculation weights of the multiple simulation modules are determined from the data flow node trees, and on this basis the multiple simulation modules are integrated into multiple simulation threads corresponding one-to-one to the multiple simulation units, so that each simulation thread executes on the same simulation unit in every simulation cycle, thereby reducing scheduling overhead, improving the efficiency of parallelized simulation, and shortening the simulation time of integrated circuit design.
The embodiments of the present disclosure place no particular limitation on how the multiple simulation modules are integrated into multiple simulation threads according to their calculation weights.
In some embodiments, referring to FIG. 4, integrating the plurality of simulation modules into a plurality of simulation threads according to the calculation weights of the plurality of simulation modules includes the following steps S31 to S33.
In step S31, the average calculation weight of the simulation units is determined according to the calculation weights of the plurality of simulation modules.
In step S32, the plurality of simulation modules are integrated into the plurality of simulation threads according to the average calculation weight.
In step S33, simulation code is generated for each simulation thread.
It should be noted that, in the embodiments of the present disclosure, the average calculation weight of the simulation units is the average calculation weight that makes the calculation amounts of the multiple simulation units equal within one simulation cycle; integrating the multiple simulation modules into multiple simulation threads according to the average calculation weight means integrating them into simulation threads whose calculation weights are close to the average calculation weight. For example, the calculation weight of an integrated simulation thread is equal to the average calculation weight, or differs from it by less than a specific value.
In the embodiments of the present disclosure, when the multiple simulation modules are integrated into multiple simulation threads according to the average calculation weight, the calculation weights of the simulation threads are similar or equal, and the running times of the simulation units executing each thread's simulation code are similar or equal, which improves the parallelism of the simulation and the efficiency of parallelized simulation, further reducing the simulation time of integrated circuit design.
The embodiments of the present disclosure place no particular limitation on how the average calculation weight of the simulation units is determined from the calculation weights of the multiple simulation modules. For example, a total calculation weight may be computed from the calculation weights of the simulation modules, and the average calculation weight may then be computed from the total calculation weight and the number of simulation threads or simulation units.
The embodiments of the present disclosure place no particular limitation on how the multiple simulation modules are integrated.
In some embodiments, several of the simulation modules are merged to generate one simulation thread.
Accordingly, in some embodiments, integrating the plurality of simulation modules into the plurality of simulation threads according to the average calculation weight (step S32 above) includes: merging, according to the average calculation weight, a plurality of first simulation modules to generate one simulation thread, wherein a first simulation module is a simulation module whose calculation weight is less than the average calculation weight.
It should be noted that the calculation weight of the simulation thread generated by merging multiple first simulation modules is equal to or close to the average calculation weight.
In some embodiments, a simulation module whose calculation weight is less than the average calculation weight by more than a specific value is used as a first simulation module; that is, only simulation modules with sufficiently small calculation weights are merged, while a simulation module whose calculation weight is below the average calculation weight by less than the specific value directly generates one simulation thread, occupying one simulation unit.
In some embodiments, one simulation module is split into multiple simulation threads.
Accordingly, in some embodiments, integrating the plurality of simulation modules into the plurality of simulation threads according to the average calculation weight (step S32 above) includes: splitting a second simulation module according to the data flow node trees corresponding to the second simulation module to generate multiple simulation threads, wherein the second simulation module is a simulation module whose calculation weight is greater than the average calculation weight.
In the embodiments of the present disclosure, splitting the second simulation module according to its data flow node trees means using the data flow node tree as the unit of splitting. One data flow node tree corresponding to the second simulation module may generate one simulation thread, or multiple data flow node trees corresponding to the second simulation module may generate one simulation thread; the embodiments of the present disclosure place no particular limitation on this.
It should be noted that the calculation weights of the multiple simulation threads generated by splitting the second simulation module are equal to or close to the average calculation weight.
In some embodiments, a simulation module whose calculation weight is greater than the average calculation weight by more than a specific value is used as a second simulation module; that is, only simulation modules with sufficiently large calculation weights are split, while a simulation module whose calculation weight exceeds the average calculation weight by less than the specific value directly generates one simulation thread, occupying one simulation unit.
In some embodiments, when a data flow node tree corresponding to the second simulation module has a large calculation weight, the data flow node tree may also be split.
Accordingly, in some embodiments, splitting the second simulation module according to the data flow node trees corresponding to the second simulation module to generate multiple simulation threads includes: splitting a first data flow node tree into multiple data flow node subtrees according to the average calculation weight, wherein the first data flow node tree is a data flow node tree corresponding to the second simulation module whose calculation weight is greater than the average calculation weight, and the calculation weight of each data flow node subtree is less than or equal to the average calculation weight; and splitting the second simulation module according to the data flow node subtrees and the second data flow node trees corresponding to the second simulation module to generate multiple simulation threads, wherein a second data flow node tree is a data flow node tree corresponding to the second simulation module whose calculation weight is less than or equal to the average calculation weight.
The embodiments of the present disclosure place no particular limitation on how the first data flow node tree is split into multiple data flow node subtrees.
In some embodiments, splitting the first data flow node tree into multiple data flow node subtrees according to the average calculation weight includes: traversing the first data flow node tree from the output signal corresponding to the first data flow node tree; determining the calculation weights of the candidate data flow node subtrees obtained during the traversal; and determining the candidate data flow node subtrees whose calculation weights are less than or equal to the average calculation weight as the data flow node subtrees.
In some embodiments, splitting the second simulation module according to the data flow node trees corresponding to the second simulation module to generate multiple simulation threads includes: determining the calculation weight of the overlapping nodes of the data flow node trees corresponding to any two output signals of the second simulation module; and, when the proportion of the calculation weight of the overlapping nodes is less than a preset threshold, dividing the data flow node trees corresponding to the two output signals into different simulation threads.
In the embodiments of the present disclosure, an overlapping node is a node shared by two data flow node trees. Within the same simulation thread, overlapping nodes can be reused across different data flow node trees; in different simulation threads, for different data flow node trees, overlapping nodes cannot be reused.
In the embodiments of the present disclosure, when the proportion of the calculation weight of the overlapping nodes is less than the preset threshold, dividing the data flow node trees corresponding to the two output signals into different simulation threads does not add too much calculation weight; when the proportion is greater than the preset threshold, the data flow node trees corresponding to the two output signals are divided into the same simulation thread, so that reusing the overlapping nodes avoids added calculation weight and helps ensure simulation efficiency.
In some embodiments, generating the simulation code of each simulation thread (step S33 above) includes: traversing the data flow node trees corresponding to the simulation thread, and generating the simulation code of the simulation thread according to the node types and data of the nodes of the data flow node trees.
It should be noted that the generated simulation code is compilable code that can be executed by the simulation units.
The embodiments of the present disclosure place no particular limitation on how the code of a simulation module is analyzed to generate data flow node trees.
In some embodiments, referring to FIG. 4, analyzing the code of the plurality of simulation modules to generate the plurality of data flow node trees includes the following steps S11 to S13.
In step S11, lexical analysis is performed on the code of the simulation module to generate lexical information.
In step S12, syntax analysis is performed on the lexical information to generate an abstract syntax tree corresponding to the simulation module.
In step S13, the abstract syntax tree is traversed according to the output signals of the simulation module to generate the data flow node tree corresponding to each output signal, wherein the simulation module includes at least one output signal.
In some embodiments, performing lexical analysis on the code of the simulation module includes: scanning for and extracting, through lexical analysis, information such as the tokens, parameter lists, port lists, register lists, operators, keywords, and line numbers of the simulation language.
In some embodiments, during syntax analysis, the abstract syntax tree is generated according to the rules of the simulation language; the abstract syntax tree is a tree-shaped abstract representation of the syntactic structure.
The embodiments of the present disclosure place no particular limitation on how the data flow node tree is generated.
In some embodiments, traversing the abstract syntax tree according to the output signals of the simulation module to generate the data flow node tree corresponding to each output signal (step S13 above) includes: traversing the abstract syntax tree to extract the input signals and output signals of the simulation module; for any one of the output signals, composing the input signals associated with the output signal into the data flow node tree corresponding to the output signal; and setting the calculation weight of each node of the data flow node tree according to the node type of the node.
In some embodiments, when the calculation weight of each node is set, each node's weight is set according to its node type, such as a branch structure, an operator expression of some kind, or a terminal node.
In some embodiments, traversing the abstract syntax tree according to the output signals of the simulation module to generate the data flow node tree corresponding to each output signal (step S13 above) further includes: parallelizing nodes in the data flow node tree to obtain parallel nodes; setting parallel flags for the parallel nodes; and determining the calculation weights of the parallel nodes.
It should be noted that, in the embodiments of the present disclosure, when the simulation unit supports multi-operand parallel processing, parallelizing the nodes helps improve computational efficiency.
The embodiments of the present disclosure place no particular limitation on how the calculation weight of a simulation module is determined from its data flow node trees.
In some embodiments, referring to FIG. 4, determining the calculation weights of the plurality of simulation modules according to the plurality of data flow node trees includes the following steps S21 and S22.
In step S21, the calculation weight of a data flow node tree is determined according to the number of nodes in the tree, the tree depth, and the calculation weight of each node.
In step S22, the calculation weight of a simulation module is determined according to the calculation weights of the data flow node trees corresponding to the simulation module.
It should be noted that the simulation method provided by the embodiments of the present disclosure is suitable for, but not limited to, the design simulation of large-scale integrated circuits, especially system-modeling SystemC simulation and register transfer level (RTL) simulation; such large-scale integrated circuits include but are not limited to switching chips and network processor chips.
In a second aspect, referring to FIG. 5, an embodiment of the present disclosure provides an electronic device, including: one or more processors 101; a memory 102 storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the simulation method described in the first aspect of the embodiments of the present disclosure; and one or more I/O interfaces 103, connected between the processors and the memory and configured to implement information exchange between the processors and the memory.
The processor 101 is a device with data processing capability, including but not limited to a central processing unit (CPU); the memory 102 is a device with data storage capability, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, and the like), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH); the I/O interface (read/write interface) 103 is connected between the processor 101 and the memory 102, enables information exchange between the processor 101 and the memory 102, and includes but is not limited to a data bus (Bus) and the like.
In some embodiments, the processor 101, the memory 102, and the I/O interface 103 are interconnected via a bus 104 and further connected to other components of a computing device.
In a third aspect, referring to FIG. 6, an embodiment of the present disclosure provides a computer-readable medium storing a computer program which, when executed by a processor, implements the simulation method described in the first aspect of the embodiments of the present disclosure.
To enable those skilled in the art to understand more clearly the technical solutions provided by the embodiments of the present disclosure, these solutions are described in detail below through specific embodiments.
Embodiment 1
This embodiment provides a parallelized simulation method, including the following steps.
Perform lexical analysis on the code of each simulation module: scan and extract information such as the tokens, parameter lists, port lists, register lists, operators, keywords, and line numbers of the simulation language, and output the corresponding lexical information.
Perform syntax analysis on the lexical information: generate an abstract syntax tree according to the syntax rules of the language used.
Simulation module and signal analysis: traverse the abstract syntax tree to obtain the calling relationships and instantiation parameters of each simulation module and a symbol table containing input signals, output signals, registers, and other information, and clarify the signal connections between simulation modules.
Binding relationship analysis: traverse the abstract syntax tree and apply module parameters (parameter and localparam) to each simulation module, expression node, and the like; for any output signal in the symbol table, build the subtree composed of all related input signals into a definite data flow node tree using branch-structure, operator-expression, terminal, and other nodes.
Modify the nodes of the data flow node tree according to the simulation mode: when the simulation hardware supports multi-operand parallel processing, parallelize the nodes.
Determine the calculation weight of each data flow node tree according to the number of nodes in the tree and the calculation weights corresponding to the node types, and thereby obtain the calculation weight of each simulation module.
Determine the average calculation weight per simulation unit from the total calculation weight of all simulation modules and the number of simulation threads or simulation units.
Module splitting and merging: multiple modules whose calculation weights are less than the average calculation weight of the simulation units are merged to generate the running code of a single simulation thread, occupying one simulation unit; a module with a large calculation weight is split according to the data flow node tree of each of its output signals to generate the code of multiple simulation threads, occupying multiple simulation units.
Splitting by output signal: traverse the data flow node trees of two different output signals and compute the calculation weight of the overlapping nodes; when the proportion of the overlapping nodes' calculation weight is below a preset threshold, the simulation calculations of the two signals can be separated into different simulation threads or simulation units.
Data flow node tree splitting: the data flow node trees of some output signals have calculation weights far greater than the average calculation weight of the simulation units and need to be split; traverse the data flow node tree from the output signal, computing the calculation weight of each subtree, until a subtree whose weight is less than the average calculation weight of the simulation units is found, and split it off.
Code generation: traverse the data flow node trees and generate the corresponding simulation code.
Embodiment 2
This embodiment provides a device implementing the above simulation method, including the following modules: lexical analysis, syntax analysis, syntax tree access, and symbol table construction; it also includes modules for data flow node tree conversion, node parallelization, a node calculation weight table, data flow node tree operations, code generation, and the like.
The lexical analysis module mainly extracts language feature information and outputs the result to the syntax analysis module.
The syntax analysis module judges syntactic correctness according to the syntax rules and records the syntax information to form an abstract syntax tree. The syntax tree access module implements the method for traversing the syntax tree, and the symbol table module records the symbol information of the syntax tree and extracts the input and output signals. The syntax tree is traversed for each output signal, and all related input signals are formed into a new data flow node tree, with each node's calculation weight set according to its node type, such as a branch structure, an operator expression of some kind, or a terminal node; the data flow node tree is a binary tree, and when the simulation unit supports multi-operand parallel processing, the nodes involved also need corresponding flags set and their calculation weights adjusted.
After the above operations, the calculation weight of the entire tree and the calculation weights of all modules can be obtained by traversing the trees. Dividing the total calculation weight of all modules by the number of simulation units gives the average calculation weight to be assigned to each simulation unit. Multiple modules or data flow node trees whose calculation weights are less than the average calculation weight of the simulation units can be merged to occupy one simulation unit.
For a data flow node tree with a large calculation weight, the tree is traversed from the output signal while the calculation weight of each subtree is computed, until a subtree whose weight is less than the average calculation weight of the simulation units is found; that subtree is split off, and its output and input signals are readjusted.
Finally, the data flow node trees are traversed, and compilable simulation code is generated for the target simulation units according to node types and data.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods disclosed above and the functional modules/units of the systems and devices may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, it will be apparent to those skilled in the art that features, characteristics, and/or elements described in connection with a particular embodiment may be used alone, or in combination with features, characteristics, and/or elements described in connection with other embodiments, unless expressly stated otherwise. Accordingly, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Claims (14)

  1. A simulation method, comprising:
    analyzing the code of a plurality of simulation modules to generate a plurality of data flow node trees, wherein each of the simulation modules corresponds to at least one of the data flow node trees;
    determining calculation weights of the plurality of simulation modules according to the plurality of data flow node trees; and
    integrating the plurality of simulation modules into a plurality of simulation threads according to the calculation weights of the plurality of simulation modules, wherein the plurality of simulation threads correspond one-to-one to a plurality of simulation units.
  2. The simulation method according to claim 1, wherein integrating the plurality of simulation modules into a plurality of simulation threads according to the calculation weights of the plurality of simulation modules comprises:
    determining an average calculation weight of the simulation units according to the calculation weights of the plurality of simulation modules;
    integrating the plurality of simulation modules into the plurality of simulation threads according to the average calculation weight; and
    generating simulation code for each of the simulation threads.
  3. The simulation method according to claim 2, wherein integrating the plurality of simulation modules into the plurality of simulation threads according to the average calculation weight comprises:
    merging, according to the average calculation weight, a plurality of first simulation modules to generate one simulation thread, wherein a first simulation module is a simulation module whose calculation weight is less than the average calculation weight.
  4. The simulation method according to claim 2, wherein integrating the plurality of simulation modules into the plurality of simulation threads according to the average calculation weight comprises:
    splitting a second simulation module according to the data flow node trees corresponding to the second simulation module to generate a plurality of the simulation threads, wherein the second simulation module is a simulation module whose calculation weight is greater than the average calculation weight.
  5. The simulation method according to claim 4, wherein splitting the second simulation module according to the data flow node trees corresponding to the second simulation module to generate a plurality of the simulation threads comprises:
    splitting a first data flow node tree into a plurality of data flow node subtrees according to the average calculation weight, wherein the first data flow node tree is a data flow node tree corresponding to the second simulation module whose calculation weight is greater than the average calculation weight, and the calculation weight of each data flow node subtree is less than or equal to the average calculation weight; and
    splitting the second simulation module according to the data flow node subtrees and a second data flow node tree corresponding to the second simulation module to generate a plurality of the simulation threads, wherein the second data flow node tree is a data flow node tree corresponding to the second simulation module whose calculation weight is less than or equal to the average calculation weight.
  6. The simulation method according to claim 5, wherein splitting the first data flow node tree into a plurality of data flow node subtrees according to the average calculation weight comprises:
    traversing the first data flow node tree from the output signal corresponding to the first data flow node tree;
    determining the calculation weights of candidate data flow node subtrees obtained during the traversal; and
    determining the candidate data flow node subtrees whose calculation weights are less than or equal to the average calculation weight as the data flow node subtrees.
  7. The simulation method according to claim 4, wherein splitting the second simulation module according to the data flow node trees corresponding to the second simulation module to generate a plurality of the simulation threads comprises:
    determining the calculation weight of overlapping nodes of the data flow node trees corresponding to any two output signals of the second simulation module; and
    dividing the data flow node trees corresponding to the two output signals into different simulation threads when the proportion of the calculation weight of the overlapping nodes is less than a preset threshold.
  8. The simulation method according to claim 2, wherein generating simulation code for each of the simulation threads comprises:
    traversing the data flow node trees corresponding to the simulation thread, and generating the simulation code of the simulation thread according to the node types and data of the nodes of the data flow node trees.
  9. The simulation method according to any one of claims 1 to 8, wherein analyzing the code of a plurality of simulation modules to generate a plurality of data flow node trees comprises:
    performing lexical analysis on the code of the simulation module to generate lexical information;
    performing syntax analysis on the lexical information to generate an abstract syntax tree corresponding to the simulation module; and
    traversing the abstract syntax tree according to the output signals of the simulation module to generate the data flow node tree corresponding to each output signal, wherein the simulation module comprises at least one output signal.
  10. The simulation method according to claim 9, wherein traversing the abstract syntax tree according to the output signals of the simulation module to generate the data flow node tree corresponding to each output signal comprises:
    traversing the abstract syntax tree to extract the input signals and output signals of the simulation module;
    for any one of the output signals, composing the input signals associated with the output signal into the data flow node tree corresponding to the output signal; and
    setting the calculation weight of each node of the data flow node tree according to the node type of the node.
  11. The simulation method according to claim 10, wherein traversing the abstract syntax tree according to the output signals of the simulation module to generate the data flow node tree corresponding to each output signal further comprises:
    parallelizing nodes in the data flow node tree to obtain parallel nodes;
    setting parallel flags for the parallel nodes; and
    determining the calculation weights of the parallel nodes.
  12. The simulation method according to any one of claims 1 to 8, wherein determining calculation weights of the plurality of simulation modules according to the plurality of data flow node trees comprises:
    determining the calculation weight of a data flow node tree according to the number of nodes in the data flow node tree, the tree depth, and the calculation weight of each node; and
    determining the calculation weight of a simulation module according to the calculation weights of the data flow node trees corresponding to the simulation module.
  13. An electronic device, comprising:
    one or more processors; and
    a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the simulation method according to any one of claims 1 to 12.
  14. A computer-readable medium storing a computer program which, when executed by a processor, implements the simulation method according to any one of claims 1 to 12.
PCT/CN2023/120117 2022-09-30 2023-09-20 Simulation method, electronic device, and computer-readable medium WO2024067303A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211208476.5A 2022-09-30 2022-09-30 Simulation method, electronic device, and computer-readable medium
CN202211208476.5 2022-09-30

Publications (1)

Publication Number Publication Date
WO2024067303A1 true WO2024067303A1 (zh) 2024-04-04

Family

ID=90476238

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/120117 WO2024067303A1 (zh) 2022-09-30 2023-09-20 Simulation method, electronic device, and computer-readable medium

Country Status (2)

Country Link
CN (1) CN117852456A (zh)
WO (1) WO2024067303A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160112830A (ko) * 2015-03-20 2016-09-28 한국전자통신연구원 Method and apparatus for generating an optimized task-based data processing service
CN107329828A (zh) * 2017-06-26 2017-11-07 华中科技大学 Data flow programming method and system for CPU/GPU heterogeneous clusters
CN112115302A (zh) * 2020-09-29 2020-12-22 集美大学 Parallel XPath query method for XML documents, terminal device, and storage medium
CN114385173A (zh) * 2020-10-16 2022-04-22 深圳前海微众银行股份有限公司 Compilation method, apparatus, device, and storage medium
CN114970814A (zh) * 2022-05-17 2022-08-30 北京灵汐科技有限公司 Processing method and processing apparatus for a neural network computation graph


Also Published As

Publication number Publication date
CN117852456A (zh) 2024-04-09

Similar Documents

Publication Publication Date Title
CN108280023B (zh) 2018-07-03 Task execution method, apparatus, and server
US10332008B2 (en) Parallel decision tree processor architecture
US20140282444A1 (en) Programming language transformations with abstract syntax tree extensions
WO2016078592A1 (zh) 2016-05-26 Batch data query method and apparatus
CN106611044B (zh) 2020-01-07 SQL optimization method and device
US9760531B2 (en) Method, apparatus, and computer-readable medium for parallelization of a computer program on a plurality of computing cores
AU2014315619B2 (en) Methods and systems of four-valued simulation
US20240045860A1 (en) Data query method and system, heterogeneous acceleration platform, and storage medium
CN109791492B (zh) 2023-01-17 Pipeline dependency tree query optimizer and scheduler
WO2023124217A1 (zh) 2023-07-06 Method and device for obtaining comprehensive permutation data of multiple columns of data
CN108513658A (zh) 2018-09-07 Transaction processing method and apparatus
Su et al. An efficient GPU implementation of inclusion-based pointer analysis
CN114385148A (zh) 2022-04-22 Method, apparatus, device, and storage medium for implementing a linkage function
WO2024067303A1 (zh) 2024-04-04 Simulation method, electronic device, and computer-readable medium
CN112988163A (zh) 2021-06-18 Programming language intelligent adaptation method and apparatus, electronic device, and medium
US20130173647A1 (en) String matching device based on multi-core processor and string matching method thereof
WO2015062035A1 (zh) 2015-05-07 Columnar database processing method and apparatus
CN114817124A (zh) 2022-07-29 Inter-core mapping method for multi-core microcontrollers, device, and computer-readable storage medium
CN113703739A (zh) 2021-11-26 Cross-language fusion computing method, system, and terminal based on the OMIGA engine
Zhao et al. AutoGraph: Optimizing DNN computation graph for parallel GPU kernel execution
CN110244954A (zh) 2019-09-17 Application program compilation method and device
CN114817311B (zh) 2022-09-23 Parallel computing method applied to GaussDB database stored procedures
CN117435248B (zh) 2024-04-16 Adaptive instruction set encoding automatic generation method and apparatus
CN116821559B (zh) 2023-12-22 Method, system, and terminal for quickly obtaining the central tendency of a large data set
US10592212B2 (en) System and method for software development based on procedures

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23870535

Country of ref document: EP

Kind code of ref document: A1