WO2023093185A1 - Data flow method and apparatus for neural network computation - Google Patents

Data flow method and apparatus for neural network computation

Info

Publication number
WO2023093185A1
WO2023093185A1 · PCT/CN2022/116171 · CN2022116171W
Authority
WO
WIPO (PCT)
Prior art keywords
node
variable
variables
calculation
output
Prior art date
Application number
PCT/CN2022/116171
Other languages
English (en)
French (fr)
Inventor
王宏升
陈光
Original Assignee
之江实验室
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 之江实验室 filed Critical 之江实验室
Priority to US17/954,109 priority Critical patent/US11941507B2/en
Publication of WO2023093185A1 publication Critical patent/WO2023093185A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present invention relates to the field of computer systems based on specific computational models, and in particular to a data flow method and apparatus for neural network computation.
  • The object of the present invention is to provide a data flow method and apparatus for neural network computation that overcomes the deficiencies of the prior art.
  • To achieve this object, the present invention provides the following technical solutions:
  • The invention discloses a data flow method for neural network computation, comprising the following steps:
  • Step 1: Initialize the life cycles of the variables in the computation graph, where the period from a variable's definition to the end of its use is initialized as the variable's life cycle in the computation graph;
  • Step 2: Define the propagation rule for a used variable flowing through a node: when a variable is used at a node of the computation graph, its definition is generated at a predecessor node of that node, so that the input set of valid variables flowing through the node contains the variable;
  • Step 3: Design the propagation rule for a redefined variable flowing through a node: when a variable is redefined at a node of the computation graph, the life cycle of the variable coming from the node's predecessor nodes is terminated as it flows through the node;
  • Step 4: Define the propagation rules for associated variables in the computation graph flowing through nodes;
  • Step 5: Analyze the valid variables at the input and output of each node based on the data flow;
  • Step 6: Collect the sets of valid variables before and after each node, i.e. the sets of valid variables flowing through every node obtained from the life-cycle analysis of the variables of the computation graph;
  • Step 7: Allocate memory unit blocks for the valid variables on the edges of the computation graph;
  • Step 8: Define the propagation rules for available expressions in the computation graph;
  • Step 9: Analyze the available expressions at the input and output of each node based on the data flow;
  • Step 10: Optimize the available expressions in the computation graph: the computed result of an available expression at a node of the computation graph is saved into an intermediate variable, and subsequent occurrences of that available expression in successor nodes are replaced by the intermediate variable.
  • The propagation rule in step 3 for a redefined variable flowing through a node is specifically: if the variable is not redefined at a node of the computation graph, the variables in the output set of valid variables flowing through the node can propagate through the node to its input set; if the variable is redefined at a node of the computation graph, the variable cannot pass through the node.
  • The specific sub-steps of step 4 are as follows:
  • Step 4.1: Define that at a node of the computation graph the defined variable equals the used variables (the node has the form v = E);
  • Step 4.2: Define the set of output variables of the node as the union of the input sets of all successor nodes of the node;
  • Step 4.3: Define the set of input variables of the node as the set of output variables minus the set of variables defined at the node, unioned with the set of variables used at the node.
  • The specific sub-steps of step 5 are as follows:
  • Step 5.1: Initialize the input valid-variable set of the output node. For the output node of the whole computation graph, its output valid-variable set is the empty set, and its input valid-variable set is derived by removing the variables redefined at the node from its output valid-variable set and then taking the union with the variables used at the node;
  • Step 5.2: Initialize the output valid-variable set of the input node. For the input node of the whole computation graph, the elements of its output valid-variable set are initialized to the variables defined at the node;
  • Step 5.3: Derive the output valid-variable set of an intermediate node: the output valid-variable set of the node is the union of the input valid-variable sets of its successor nodes;
  • Step 5.4: Derive the input valid-variable set of an intermediate node: the input valid-variable set of the node is its output valid-variable set with the variables redefined at the node removed, unioned with the set of variables used at the node.
  • Preferably, in step 7, the condition for allocating a unit memory block to a variable at a node of the computation graph is that the variable at the node is within its life cycle; that is, only the valid variables on the edges of the computation graph are allocated unit memory blocks. During compilation of the computation graph, memory unit blocks are pre-allocated for each variable according to the number of variables in the sets of valid variables collected above.
  • The specific sub-steps of step 8 are as follows:
  • Step 8.1: Define that at a node of the computation graph the defined variable equals the used variables;
  • Step 8.2: Define the set of input variables of the node as the intersection of the output-variable sets of all predecessor nodes of the node;
  • Step 8.3: Define the set of output variables of the node as the union of the set of input variables of the node and the used variables, with every expression that contains a redefined variable removed.
  • The specific sub-steps of step 9 are as follows:
  • Step 9.1: Initialize the set of available expressions: the set of available expressions flowing into the entry node of the whole computation graph is initialized to the empty set;
  • Step 9.2: Add expressions that newly appear in successor nodes to the available-expression set. For a node with multiple input edges, the intersection of the available-expression sets on the input edges is taken as the newly added elements of the node's available-expression set;
  • Step 9.3: For a successor node at which a variable of an available expression is redefined, the expressions containing the redefined variable must be removed from the available-expression set.
  • The specific sub-steps of step 10 are as follows:
  • Step 10.1: Compute the available expression separately;
  • Step 10.2: Save the computed result of the available expression at the node of the computation graph into an intermediate variable;
  • Step 10.3: For successor nodes in which no variable of the available expression is redefined, replace the available expression in the successor node with the value of the intermediate variable;
  • Step 10.4: For successor nodes in which a variable of the available expression is redefined, insert an update node for the available expression.
  • The invention also discloses a data flow apparatus for neural network computation. The apparatus includes a memory and one or more processors; executable code is stored in the memory, and when the one or more processors execute the executable code, they implement the above data flow method for neural network computation.
  • The present invention discloses a data flow method and apparatus for neural network computation in a deep learning training system, which can determine, from the perspective of the global scope, the memory unit blocks that need to be allocated for the nodes of each computation subgraph inside local loops.
  • The data flow method for neural network computation improves the computational efficiency of the whole computation graph and saves time and cost.
  • When researchers and engineering users build models with the above data flow method for neural network computation, the computational efficiency of the neural network is high and memory resources are saved, which promotes the development and deployment of large-scale deep neural network models.
  • Fig. 1 is a schematic diagram of the propagation rule for used variables in an embodiment of the present invention;
  • Fig. 2 is a schematic diagram of the propagation rule for redefined variables in an embodiment of the present invention;
  • Fig. 3 is a schematic diagram of the definition of the data flow graph in an embodiment of the present invention;
  • Fig. 4 is a schematic diagram of the life-cycle analysis of the variables of the computation graph based on the data flow graph in an embodiment of the present invention;
  • Fig. 5 is the data flow graph defining the available expressions in the computation graph in an embodiment of the present invention;
  • Fig. 6 is a schematic diagram of analyzing the available expressions at the input and output of each node based on the data flow in an embodiment of the present invention;
  • Fig. 7 is a schematic diagram of optimizing the available expressions in the computation graph in an embodiment of the present invention;
  • Fig. 8 is a schematic diagram of the data flow apparatus for neural network computation in an embodiment of the present invention.
  • An embodiment of the present invention provides a data flow method for neural network computation, including the following steps:
  • Step 1: Initialize the life cycles of the variables in the computation graph;
  • Step 2: Define the propagation rule for used variables flowing through nodes;
  • Step 3: Design the propagation rule for redefined variables flowing through nodes;
  • Step 4: Define the propagation rules for associated variables in the computation graph flowing through nodes;
  • Step 5: Analyze the valid variables at the input and output of each node based on the data flow;
  • Step 6: Collect the sets of valid variables before and after each node;
  • Step 7: Allocate memory unit blocks for the valid variables on the edges of the computation graph;
  • Step 8: Define the propagation rules for available expressions in the computation graph;
  • Step 9: Analyze the available expressions at the input and output of each node based on the data flow;
  • Step 10: Optimize the available expressions in the computation graph.
  • In step 1, the life cycles of the variables in the computation graph are initialized: the period from a variable's definition to the end of its use is initialized as the variable's life cycle in the computation graph, and only variables within their life cycle are valid in the computation graph.
  • In step 2, the propagation rule for used variables flowing through nodes is defined. Given a computation graph G(E, V), where E is an edge of the computation graph G and V is a node of the computation graph G: if variable 1 is used at node V, then the definition of variable 1 must have been generated at a predecessor node of node V, so the input set of valid variables flowing through node V must contain variable 1.
  • The propagation rule for used variables is shown in Figure 1.
  • The mark "■" appearing in this embodiment indicates that the expression at that position may be any expression.
  • In step 3, the propagation rule for redefined variables flowing through nodes is designed.
  • If variable 1 is redefined at node V, the life cycle of variable 1 coming from the predecessor nodes of node V is terminated as it flows through node V; that is, the earlier variable 1 arriving at node V is invalid, and the input set of valid variables flowing through node V does not contain variable 1.
  • Because variable 1 is redefined at node V, the output set of valid variables flowing through node V contains variable 1, but variable 1 in the output set cannot propagate through node V into the input set.
  • The propagation rule for redefined variables is shown in Figure 2.
  • From the perspective of node V this can be summarized as follows: if a variable is not redefined at node V, the variables in the output set of valid variables flowing through node V can propagate through node V into its input set, such as variable 2 in Figure 2; otherwise, the variable cannot pass through node V, such as variable 1 in Figure 2.
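For illustration only (this code is not part of the patent text), the rule above amounts to the backward transfer equation In = (Out \ Def) ∪ Use. A minimal Python sketch follows; the Node record with defs/uses fields is an assumption made for this sketch.

```python
class Node:
    """Minimal node record: variables (re)defined at the node and variables used at it."""
    def __init__(self, defs, uses):
        self.defs = set(defs)
        self.uses = set(uses)

def live_in(node, live_out):
    """Backward transfer of valid (live) variables through one node:
    a variable redefined at the node is killed and cannot pass through;
    a variable used at the node is added to the input set."""
    return (live_out - node.defs) | node.uses

# Figure 2 situation: variable "1" is redefined at node V, variable "2" is not.
v = Node(defs={"1"}, uses=set())
print(live_in(v, live_out={"1", "2"}))  # {'2'} -- variable 1 cannot pass through node V
```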
  • In step 4, the propagation rules for associated variables in the computation graph flowing through nodes are defined, as shown in Figure 3. Defining these propagation rules includes the following process: for a node V of the computation graph, the node is modeled as computing a defined variable v from the used variables E.
  • The set of output variables of node V is defined as the union of the input sets of all successor nodes of node V: Out(node V) = ∪ In(successor node Vs), Vs ∈ successors(V).
  • The set of input variables of node V is defined as the set of output variables minus the set of variables v defined at node V, unioned with the set of variables E used at node V: In(node V) = (Out(node V) \ {defined variables v}) ∪ {used variables E}.
  • In step 5, the valid variables at the input and output of each node are analyzed based on the data flow, as shown in Figure 4, where Si: {...} denotes the set of valid variables flowing through the corresponding node obtained in the i-th analysis step.
  • The output valid-variable set of a node is the union of the input valid-variable sets of its successor nodes; for the node z = tf.nn.relu(z), for example, the output valid-variable set is S5: {x}.
  • The input valid-variable set of a node is its output valid-variable set with the variables redefined at the node removed, unioned with the set of variables used at the node.
  • For the node x.shape[0] > 520, for example, its output valid-variable set S2: {x} has the variables redefined at the node removed and is then unioned with the set {x} of variables used at the node, so its input valid-variable set is finally S3: {x}.
  • In step 6, the sets of valid variables before and after each node are collected: the sets of valid variables flowing through every node, obtained from the life-cycle analysis of the variables of the computation graph based on the above data flow graph, are gathered.
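A hedged sketch of how steps 4 to 6 might be carried out is given below: it iterates the two data-flow equations to a fixed point and returns, for every node, the input and output sets of valid variables (the Si sets of Figure 4 that step 6 collects). The dictionary-based graph representation is an assumption of this sketch, not a structure prescribed by the patent.

```python
def analyze_liveness(nodes, successors, defs, uses):
    """Iterate the backward data-flow equations of steps 4-5 to a fixed point:
        out[n] = union of in[s] over all successors s of n
        in[n]  = (out[n] - defs[n]) | uses[n]
    `nodes` is a list of node ids; `successors`, `defs` and `uses` are dicts
    keyed by node id (an assumed representation, chosen only for this sketch)."""
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in reversed(nodes):  # visiting nodes in reverse order speeds convergence
            new_out = set().union(*(live_in[s] for s in successors[n])) if successors[n] else set()
            new_in = (new_out - defs[n]) | uses[n]
            if new_in != live_in[n] or new_out != live_out[n]:
                live_in[n], live_out[n] = new_in, new_out
                changed = True
    return live_in, live_out
```

The two returned dictionaries are exactly the collections of valid-variable sets that step 6 gathers and that step 7 uses for memory planning.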
  • In step 7, memory unit blocks are allocated for the valid variables on the edges of the computation graph. The condition for allocating a unit memory block to variable 1 at node V is defined as: variable 1 at node V is within its life cycle; that is, only valid variables are allocated unit memory blocks. During compilation of the computation graph, memory unit blocks are pre-allocated for each variable according to the number of variables in the collected sets of valid variables.
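Purely as an illustration of this allocation condition, the following sketch pre-assigns one unit memory block per valid variable at compile time; the block_size parameter and the offset-map layout are assumptions of this example.

```python
def preallocate_blocks(live_sets, block_size):
    """Compile-time pre-allocation of one unit memory block per valid variable.
    `live_sets` is the collection of valid-variable sets gathered in step 6;
    only variables that are live somewhere (within their life cycle on some
    edge of the graph) receive a block. Returns variable -> byte offset."""
    live_vars = sorted(set().union(*live_sets))
    return {var: i * block_size for i, var in enumerate(live_vars)}

# e.g. preallocate_blocks([{"x"}, {"x", "z"}], block_size=4096) -> {"x": 0, "z": 4096}
```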
  • In step 8, the propagation rules for available expressions in the computation graph are defined, as shown in Figure 5.
  • Defining the propagation rules for available expressions in the computation graph includes the following process: for a node V of the computation graph, the node is modeled as computing a defined variable v from the used variables E.
  • The set of input variables of node V is defined as the intersection of the output-variable sets of all predecessor nodes of node V: In(node V) = ∩ Out(predecessor node Vs), Vs ∈ predecessors(V).
  • The set of output variables of node V is defined as the union of the set of input variables of node V and the used variables E, with the expressions containing redefined variables then removed: Out(node V) = (In(node V) ∪ {used variables E}) \ {expressions containing the redefined variable v}.
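For illustration, these two rules can be written as a single forward transfer step; the function below is a hedged sketch, and the arguments describing the node (generated, redefined, exprs_using) are names invented for this example.

```python
def avail_out(pred_outs, generated, redefined, exprs_using):
    """Forward transfer for available expressions at one node (step 8):
        in  = intersection of the output sets of all predecessor nodes
        out = (in | expressions newly computed at the node) minus every
              expression that mentions a variable redefined at the node."""
    avail_in = set.intersection(*pred_outs) if pred_outs else set()
    killed = {e for var in redefined for e in exprs_using(var)}
    return (avail_in | generated) - killed

# Redefinition node a = tf.add(a, tf.ones(a.shape)) of Figure 6: the incoming set
# contains tf.add(a, b); a is redefined, so the expression is killed (its own
# expression is omitted here because it also mentions the redefined a).
print(avail_out(pred_outs=[{"tf.add(a,b)"}], generated=set(), redefined={"a"},
                exprs_using=lambda v: {"tf.add(a,b)"} if v == "a" else set()))  # set()
```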
  • In step 9, the available expressions at the input and output of each node are analyzed based on the data flow, as shown in Figure 6.
  • The available expressions among the global expressions of the computation graph are analyzed step by step, where Si: {...} denotes the set of available expressions flowing through the corresponding node obtained in the i-th analysis step.
  • Figure 6 shows the sets of globally available expressions flowing through the input and output of each node of the entire computation graph.
  • tf.add(x, y) denotes the element-wise addition of tensor x and tensor y;
  • tf.abs(y) denotes taking the absolute value of tensor y;
  • tf.matmul(x, y) denotes the matrix multiplication of tensor x and tensor y;
  • tf.ones(a.shape) denotes creating a tensor with the same shape as tensor a whose elements are all 1.
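The analysis of step 9 has to decide when two occurrences such as tf.add(a, b) denote the same expression. One simple possibility, sketched below purely as an assumption of this illustration, is to key every occurrence by its op name and operand names, which makes the set intersections of step 9.2 (e.g. s4 = s3 ∩ s6) straightforward.

```python
def expr_key(op_name, *operands):
    """Canonical, hashable key for an expression occurrence, e.g. tf.add(a, b)."""
    return (op_name,) + operands

s3 = {expr_key("tf.add", "a", "b")}   # available on one input edge of the s4 node
s6 = {expr_key("tf.add", "a", "b")}   # available on the other input edge
s4 = s3 & s6                          # intersection taken at a node with two input edges
print(s4)                             # {('tf.add', 'a', 'b')}
```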
  • Analyzing the available expressions at the input and output of each node based on the data flow includes the following steps:
  • Step 1: Initialize the set of available expressions. The set of available expressions flowing into the entry node of the entire computation graph is initialized to the empty set, as in step s1 of Figure 6.
  • Step 2: Add expressions that newly appear in successor nodes to the available-expression set.
  • For a node with multiple input edges, the intersection of the available-expression sets on the input edges is taken as the newly added elements of the node's available-expression set; for the node tf.abs(y) > tf.abs(tf.add(a,b)) corresponding to the set s4, the node has two input edges, so its set of available expressions is the intersection of the available-expression sets on the two input edges, i.e. the intersection of s3 and s6.
  • Step 3: For a successor node at which a variable of an available expression is redefined, the expressions containing the redefined variable must be removed from the available-expression set, as in step s5 of Figure 6.
  • For the node x = tf.add(a, b) corresponding to the set s5, its predecessor node a = tf.add(a, tf.ones(a.shape)) redefines the variable a, which changes the computed result of the available expression tf.add(a, b) contained in the predecessor's available-expression set s4: {tf.add(a, b)}.
  • The node therefore removes the expression tf.add(a, b), which contains the redefined variable a, from the available-expression set s4: {tf.add(a, b)}, and the node's available-expression set ends up being the empty set.
  • In step 10, the available expressions in the computation graph are optimized.
  • The computed result of an available expression at a node of the computation graph is saved into an intermediate variable, and subsequent occurrences of that available expression in successor nodes are replaced by the intermediate variable.
  • The purpose of optimizing the available expressions is to reduce the recomputation of identical available expressions in successor nodes, improve the computational efficiency of the whole computation graph, and save time and cost.
  • Optimizing the available expressions in the computation graph includes the following process: the available expression is computed separately; its result is saved into an intermediate variable; for successor nodes in which no variable of the expression is redefined, the expression is replaced by the value of the intermediate variable; for successor nodes in which a variable of the expression is redefined, an update node for the expression is inserted.
  • The process of optimizing the available expressions in the computation graph is shown in Figure 7.
  • In step s2, the computed result of the available expression tf.add(a, b) is saved in the variable w, and the available expression tf.add(a, b) in the successor node tf.abs(y) > tf.abs(tf.add(a, b)) is replaced with the value of the variable w; because no available expression survives at step s5 (the redefinition of a changes the expression's result), a node that updates the variable w, such as w = x, must be inserted.
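The rewrite just described can be mimicked directly in TensorFlow-style code. The snippet below is only a sketch of the idea on the Figure 7 example; the concrete input tensors are stand-ins invented for the illustration, since the full example graph is not reproduced in the text.

```python
import tensorflow as tf

# Stand-in inputs; the real tensors of the Figure 7 graph are not given in the text.
a = tf.constant([1.0, -2.0])
b = tf.constant([3.0, 4.0])
y = tf.constant([0.5, -0.5])

w = tf.add(a, b)                  # step s2: save the available expression's result once
x = w                             # x = tf.add(a, b) reuses the cached result
cond = tf.abs(y) > tf.abs(w)      # successor node reuses w instead of recomputing tf.add(a, b)

# A successor that redefines a variable of the expression makes w stale,
# so an update node for w (w = x in the text) is inserted after it.
a = tf.add(a, tf.ones(a.shape))
x = tf.add(a, b)
w = x                             # update node keeping the cached result consistent
```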
  • Referring to Figure 8, an embodiment of the present invention also provides a data flow apparatus for neural network computation, which includes a memory and one or more processors; executable code is stored in the memory, and when the one or more processors execute the executable code, they implement the data flow method for neural network computation of the foregoing embodiments.
  • The embodiment of the data flow apparatus for neural network computation of the present invention can be applied to any device with data processing capability, which may be a device or apparatus such as a computer.
  • The apparatus embodiment may be implemented by software, or by hardware or a combination of hardware and software. Taking a software implementation as an example, the apparatus in the logical sense is formed by the processor of the device with data processing capability reading the corresponding computer program instructions from the non-volatile memory into memory and running them. In terms of hardware, Figure 8 shows a hardware structure diagram of the device with data processing capability in which the data flow apparatus for neural network computation is located; in addition to the processor, memory, network interface and non-volatile memory shown in Figure 8, the device in which the apparatus of the embodiment is located may also include other hardware according to its actual function, which is not described again here.
  • For the implementation process of the functions and effects of each unit in the above apparatus, refer to the implementation process of the corresponding steps in the above method, which is not repeated here.
  • Since the apparatus embodiment basically corresponds to the method embodiment, for related parts reference may be made to the description of the method embodiment.
  • The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present invention, and those skilled in the art can understand and implement it without creative effort.
  • An embodiment of the present invention also provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, the data flow method for neural network computation of the above embodiments is implemented.
  • The computer-readable storage medium may be an internal storage unit of any device with data processing capability described in any of the foregoing embodiments, such as a hard disk or a memory.
  • The computer-readable storage medium may also be an external storage device of the device with data processing capability, such as a plug-in hard disk, a smart media card (SMC), an SD card or a flash card provided on the device.
  • Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the device with data processing capability.
  • The computer-readable storage medium is used to store the computer program and the other programs and data required by the device with data processing capability, and may also be used to temporarily store data that has been or will be output.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

A data flow method and apparatus for neural network computation, including the following steps: step 1, initializing the life cycles of the variables in the computation graph, where the period from a variable's definition to the end of its use is initialized as the variable's life cycle in the computation graph; step 2, defining the propagation rule for a used variable flowing through a node: when a variable is used at a node of the computation graph, its definition is generated at a predecessor node of that node, so that the input set of valid variables flowing through the node contains the variable. The method can determine, from the perspective of the global scope, the memory unit blocks that need to be allocated for the nodes of each computation subgraph inside local loops. The data flow method for neural network computation improves the computational efficiency of the whole computation graph and saves time and cost.

Description

Data flow method and apparatus for neural network computation
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. CN202210952808.4, entitled "Data flow method and apparatus for neural network computation", filed with the China National Intellectual Property Administration on August 10, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of computer systems based on specific computational models, and in particular to a data flow method and apparatus for neural network computation.
Background
With the rapid development of industrial applications of artificial intelligence, training systems for large-scale deep neural network models have increasingly become a research focus in both academia and industry. In the local scope, the existing way of passing data between the subgraphs used for neural network computation is to exchange data frequently between computing nodes and memory, and a data flow algorithm has to be designed for each subgraph. In the global scope, this approach cannot satisfy the data flow efficiency of computation subgraphs containing loop structures.
Summary
The object of the present invention is to provide a data flow method and apparatus for neural network computation to overcome the deficiencies of the prior art.
To achieve the above object, the present invention provides the following technical solutions:
The invention discloses a data flow method for neural network computation, including the following steps:
Step 1: initializing the life cycles of the variables in the computation graph, where the period from a variable's definition to the end of its use is initialized as the variable's life cycle in the computation graph;
Step 2: defining the propagation rule for a used variable flowing through a node: when a variable is used at a node of the computation graph, its definition is generated at a predecessor node of that node, so that the input set of valid variables flowing through the node contains the variable;
Step 3: designing the propagation rule for a redefined variable flowing through a node: when a variable is redefined at a node of the computation graph, the life cycle of the variable coming from the node's predecessor nodes is terminated as it flows through the node;
Step 4: defining the propagation rules for associated variables in the computation graph flowing through nodes;
Step 5: analyzing the valid variables at the input and output of each node based on the data flow;
Step 6: collecting the sets of valid variables before and after each node, i.e. the sets of valid variables flowing through every node obtained from the life-cycle analysis of the variables of the computation graph;
Step 7: allocating memory unit blocks for the valid variables on the edges of the computation graph;
Step 8: defining the propagation rules for available expressions in the computation graph;
Step 9: analyzing the available expressions at the input and output of each node based on the data flow;
Step 10: optimizing the available expressions in the computation graph: the computed result of an available expression at a node of the computation graph is saved into an intermediate variable, and subsequent occurrences of that available expression in successor nodes are replaced by the intermediate variable.
Preferably, the propagation rule in step 3 for a redefined variable flowing through a node is specifically: if the variable is not redefined at a node of the computation graph, the variables in the output set of valid variables flowing through the node can propagate through the node to its input set; if the variable is redefined at a node of the computation graph, the variable cannot pass through the node.
Preferably, the specific sub-steps of step 4 are as follows:
Step 4.1: defining that at a node of the computation graph the defined variable equals the used variables;
Step 4.2: defining the set of output variables of the node as the union of the input sets of all successor nodes of the node;
Step 4.3: defining the set of input variables of the node as the set of output variables minus the set of variables defined at the node, unioned with the set of variables used at the node.
Preferably, the specific sub-steps of step 5 are as follows:
Step 5.1: initializing the input valid-variable set of the output node; for the output node of the whole computation graph, its output valid-variable set is the empty set, and its input valid-variable set is derived by removing the variables redefined at the node from its output valid-variable set and then taking the union with the variables used at the node;
Step 5.2: initializing the output valid-variable set of the input node; for the input node of the whole computation graph, the elements of its output valid-variable set are initialized to the variables defined at the node;
Step 5.3: deriving the output valid-variable set of an intermediate node; the output valid-variable set of the node is the union of the input valid-variable sets of its successor nodes;
Step 5.4: deriving the input valid-variable set of an intermediate node; the input valid-variable set of the node is its output valid-variable set with the variables redefined at the node removed, unioned with the set of variables used at the node.
Preferably, in step 7, the condition for allocating a unit memory block to a variable at a node of the computation graph is defined as: the variable at the node is within its life cycle, i.e. only the valid variables on the edges of the computation graph are allocated unit memory blocks; during compilation of the computation graph, memory unit blocks are pre-allocated for each variable according to the number of variables in the collected sets of valid variables.
Preferably, the specific sub-steps of step 8 are as follows:
Step 8.1: defining that at a node of the computation graph the defined variable equals the used variables;
Step 8.2: defining the set of input variables of the node as the intersection of the output-variable sets of all predecessor nodes of the node; Step 8.3: defining the set of output variables of the node as the union of the set of input variables of the node and the used variables, with the expressions containing redefined variables removed.
Preferably, the specific sub-steps of step 9 are as follows:
Step 9.1: initializing the set of available expressions; the set of available expressions flowing into the entry node of the whole computation graph is initialized to the empty set;
Step 9.2: adding expressions that newly appear in successor nodes to the available-expression set; for a node with multiple input edges, the intersection of the available-expression sets on the input edges is taken as the newly added elements of the node's available-expression set;
Step 9.3: for a successor node at which a variable of an available expression is redefined, removing the expressions containing the redefined variable from the available-expression set.
Preferably, the specific sub-steps of step 10 are as follows:
Step 10.1: computing the available expression separately;
Step 10.2: saving the computed result of the available expression at the node of the computation graph into an intermediate variable;
Step 10.3: for successor nodes in which no variable of the available expression is redefined, replacing the available expression in the successor node with the value of the intermediate variable;
Step 10.4: for successor nodes in which a variable of the available expression is redefined, inserting an update node for the available expression.
The invention also discloses a data flow apparatus for neural network computation; the apparatus includes a memory and one or more processors, executable code is stored in the memory, and when the one or more processors execute the executable code, they implement the above data flow method for neural network computation.
Beneficial effects of the present invention: the present invention discloses a data flow method and apparatus for neural network computation in a deep learning training system, which can determine, from the perspective of the global scope, the memory unit blocks that need to be allocated for the nodes of each computation subgraph inside local loops. The data flow method for neural network computation improves the computational efficiency of the whole computation graph and saves time and cost. When researchers and engineering users develop distributed models and build them with the described data flow method for neural network computation, the computational efficiency of the neural network is high and memory resources are saved, which promotes the development and deployment of large-scale deep neural network models.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the propagation rule for used variables in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the propagation rule for redefined variables in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the definition of the data flow graph in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the life-cycle analysis of the variables of the computation graph based on the data flow graph in an embodiment of the present invention;
Fig. 5 is the data flow graph defining the available expressions in the computation graph in an embodiment of the present invention;
Fig. 6 is a schematic diagram of analyzing the available expressions at the input and output of each node based on the data flow in an embodiment of the present invention;
Fig. 7 is a schematic diagram of optimizing the available expressions in the computation graph in an embodiment of the present invention;
Fig. 8 is a schematic diagram of the data flow apparatus for neural network computation in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the drawings and embodiments. It should be understood, however, that the specific embodiments described here are intended only to explain the present invention and not to limit its scope. In addition, descriptions of well-known structures and technologies are omitted from the following description to avoid unnecessarily obscuring the concepts of the present invention.
An embodiment of the present invention provides a data flow method for neural network computation, including the following steps:
Step 1: Initialize the life cycles of the variables in the computation graph;
Step 2: Define the propagation rule for used variables flowing through nodes;
Step 3: Design the propagation rule for redefined variables flowing through nodes;
Step 4: Define the propagation rules for associated variables in the computation graph flowing through nodes;
Step 5: Analyze the valid variables at the input and output of each node based on the data flow;
Step 6: Collect the sets of valid variables before and after each node;
Step 7: Allocate memory unit blocks for the valid variables on the edges of the computation graph;
Step 8: Define the propagation rules for available expressions in the computation graph;
Step 9: Analyze the available expressions at the input and output of each node based on the data flow;
Step 10: Optimize the available expressions in the computation graph.
In step 1, the life cycles of the variables in the computation graph are initialized: the period from a variable's definition to the end of its use is initialized as the variable's life cycle in the computation graph, and only variables within their life cycle are valid in the computation graph.
In step 2, the propagation rule for used variables flowing through nodes is defined. Given a computation graph G(E, V), where E is an edge of the computation graph G and V is a node of the computation graph G: if variable 1 is used at node V, then the definition of variable 1 must have been generated at a predecessor node of node V, so the input set of valid variables flowing through node V must contain variable 1. The propagation rule for used variables is shown in Figure 1. The mark "■" appearing in this embodiment indicates that the expression at that position may be any expression.
In step 3, the propagation rule for redefined variables flowing through nodes is designed. If variable 1 is redefined at node V, the life cycle of variable 1 coming from the predecessor nodes of node V is terminated as it flows through node V; that is, the earlier variable 1 arriving at node V is invalid, and the input set of valid variables flowing through node V does not contain variable 1. Moreover, because variable 1 is redefined at node V, the output set of valid variables flowing through node V contains variable 1, but variable 1 in the output set cannot propagate through node V into the input set. The propagation rule for redefined variables is shown in Figure 2. From the perspective of node V this can be summarized as follows: if a variable is not redefined at node V, the variables in the output set of valid variables flowing through node V can propagate through node V into its input set, such as variable 2 in Figure 2; otherwise, the variable cannot pass through node V, such as variable 1 in Figure 2.
In step 4, the propagation rules for associated variables in the computation graph flowing through nodes are defined, as shown in Figure 3. Defining these propagation rules includes the following process:
(1) For a node V of the computation graph: defined variable v = used variables E;
(2) Define the set of output variables of a node. The set of output variables of node V is defined as the union of the input sets of all successor nodes of node V, expressed as: Out(node V) = ∪ In(successor node Vs), Vs ∈ successors(V);
(3) Define the set of input variables of a node. The set of input variables of node V is defined as the set of output variables minus the set of variables v defined at node V, unioned with the set of variables E used at node V, expressed as: In(node V) = (Out(node V) \ {defined variables v}) ∪ {used variables E}.
In step 5, the valid variables at the input and output of each node are analyzed based on the data flow, as shown in Figure 4.
The life cycles of the global variables of the computation graph are analyzed step by step on the data flow graph, where Si: {...} denotes the set of valid variables flowing through the corresponding node obtained in the i-th analysis step. Figure 4 shows the sets of globally valid variables flowing through the input and output of every node of the entire computation graph. In this embodiment, "{}" denotes the empty set, and x = input means that the input tensor is assigned to the tensor variable x.
Analyzing the valid variables at the input and output of each node based on the data flow includes the following process:
(1) Initialize the input valid-variable set of the output node. For the output node of the whole computation graph, its output valid-variable set is the empty set, and its input valid-variable set is derived by removing the variables redefined at the node from its output valid-variable set and then taking the union with the variables used at the node. Its input valid-variable set is therefore the union of the empty set with the variables used at the node, so it contains those elements, as S1: {x} in Figure 4.
(2) Initialize the output valid-variable set of the input node. For the input node of the whole computation graph, the elements of its output valid-variable set are initialized to the variables defined at the node, as S3: {x} in Figure 4.
(3) Derive the output valid-variable set of an intermediate node. The output valid-variable set of a node is the union of the input valid-variable sets of its successor nodes. In Figure 4, for the node x.shape[0] > 520, its output valid-variable set is the union of the input valid-variable set S2: {x} of its successor node y = x/8, so its output valid-variable set is S2: {x}. Similarly, for the node z = tf.nn.relu(z), its output valid-variable set is S5: {x}.
(4) Derive the input valid-variable set of an intermediate node. The input valid-variable set of a node is its output valid-variable set with the variables redefined at the node removed, unioned with the set of variables used at the node. In Figure 4, for the node x.shape[0] > 520, its output valid-variable set S2: {x} has the variables redefined at the node removed and is then unioned with the set {x} of variables used at the node, so its input valid-variable set is finally S3: {x}. Similarly, for the node z = tf.nn.relu(z), its input valid-variable set consists of the set S7: {x, z} on one input edge and the set S6: {x, z} on the other input edge. For the node x = x/3, its input valid-variable set is its output valid-variable set S6: {x, z} with the redefined variable {x} removed, unioned with the used variable {x}, so the input valid-variable set is finally S8: {x, z}. For the node |z| > 0, its input valid-variable set is the union of the input valid-variable sets of its successor nodes, namely S9: {x, z}.
In step 6, the sets of valid variables before and after each node are collected: the sets of valid variables flowing through every node, obtained from the life-cycle analysis of the variables of the computation graph based on the above data flow graph, are gathered.
In step 7, memory unit blocks are allocated for the valid variables on the edges of the computation graph. The condition for allocating a unit memory block to variable 1 at node V is defined as: variable 1 at node V is within its life cycle; that is, only valid variables are allocated unit memory blocks. During compilation of the computation graph, memory unit blocks are pre-allocated for each variable according to the number of variables in the collected sets of valid variables.
In step 8, the propagation rules for available expressions in the computation graph are defined, as shown in Figure 5.
Defining the propagation rules for available expressions in the computation graph includes the following process:
(1) For a node V of the computation graph: defined variable v = used variables E;
(2) Define the set of input variables of a node. The set of input variables of node V is defined as the intersection of the output-variable sets of all predecessor nodes of node V, expressed as: In(node V) = ∩ Out(predecessor node Vs), Vs ∈ predecessors(V);
(3) Define the set of output variables of a node. The set of output variables of node V is defined as the union of the set of input variables of node V and the used variables E, with the expressions containing redefined variables then removed, expressed as: Out(node V) = (In(node V) ∪ {used variables E}) \ {expressions(redefined variable v)}.
In step 9, the available expressions at the input and output of each node are analyzed based on the data flow, as shown in Figure 6.
The available expressions among the global expressions of the computation graph are analyzed step by step on the data flow graph, where Si: {...} denotes the set of available expressions flowing through the corresponding node obtained in the i-th analysis step. Figure 6 shows the sets of globally available expressions flowing through the input and output of every node of the entire computation graph.
tf.add(x, y): denotes the element-wise addition of tensor x and tensor y;
tf.abs(y): denotes taking the absolute value of tensor y;
tf.matmul(x, y): denotes the matrix multiplication of tensor x and tensor y;
tf.ones(a.shape): denotes creating a tensor with the same shape as tensor a and all elements equal to 1.
Analyzing the available expressions at the input and output of each node based on the data flow includes the following steps:
Step 1: Initialize the set of available expressions. The set of available expressions flowing into the entry node of the entire computation graph is initialized to the empty set, as in step s1 of Figure 6.
Step 2: Add expressions that newly appear in successor nodes to the available-expression set. For a node with multiple input edges, the intersection of the available-expression sets on the input edges is taken as the newly added elements of the node's available-expression set, as in steps s2, s3 and s4 of Figure 6. For the node tf.abs(y) > tf.abs(tf.add(a,b)) corresponding to the set s4, the derivation of the node's available-expression set is as follows: since the node has two input edges, its set of available expressions is the intersection of the available-expression sets on the two input edges, i.e. the intersection of s3 and s6.
Step 3: For a successor node at which a variable of an available expression is redefined, the expressions containing the redefined variable must be removed from the available-expression set, as in step s5 of Figure 6. For the node x = tf.add(a, b) corresponding to the set s5, the derivation of the node's available expressions is as follows: its predecessor node a = tf.add(a, tf.ones(a.shape)) redefines the variable a, which changes the computed result of the available expression tf.add(a, b) contained in the predecessor's available-expression set s4: {tf.add(a, b)}; the node therefore removes the expression tf.add(a, b), which contains the redefined variable a, from the available-expression set s4: {tf.add(a, b)}, and the node's available-expression set is finally the empty set.
In step 10, the available expressions in the computation graph are optimized. The computed result of an available expression at a node of the computation graph is saved into an intermediate variable, and subsequent occurrences of that available expression in successor nodes are replaced by the intermediate variable. The purpose of optimizing the available expressions is to reduce the recomputation of identical available expressions in successor nodes, improve the computational efficiency of the whole computation graph, and save time and cost. Optimizing the available expressions in the computation graph includes the following process:
(1) Compute the available expression. The available expression is computed separately.
(2) Save the computed result of the available expression into an intermediate variable.
(3) For successor nodes in which no variable of the available expression is redefined, replace the available expression in the successor node with the value of the intermediate variable.
(4) For successor nodes in which a variable of the available expression is redefined, insert an update node for the available expression. Because a variable of the available expression is redefined in the successor node, the computed result of the available expression held by the predecessor nodes changes, so a node that updates the computed result of the available expression must be inserted.
The process of optimizing the available expressions in the computation graph is shown in Figure 7. In step s2, the computed result of the available expression tf.add(a, b) is saved in the variable w, and the available expression tf.add(a, b) in the successor node tf.abs(y) > tf.abs(tf.add(a, b)) is replaced with the value of the variable w. Because step s5 has no available expression left, the computed result of the available expression tf.add(a, b) held by the predecessor node has changed, so a node that updates the variable w, such as w = x, must be inserted.
Referring to Figure 8, an embodiment of the present invention also provides a data flow apparatus for neural network computation, which includes a memory and one or more processors; executable code is stored in the memory, and when the one or more processors execute the executable code, they implement the data flow method for neural network computation of the above embodiments.
The embodiment of the data flow apparatus for neural network computation of the present invention can be applied to any device with data processing capability, which may be a device or apparatus such as a computer. The apparatus embodiment may be implemented by software, or by hardware or a combination of hardware and software. Taking a software implementation as an example, the apparatus in the logical sense is formed by the processor of the device with data processing capability in which it is located reading the corresponding computer program instructions from the non-volatile memory into memory and running them. In terms of hardware, Figure 8 shows a hardware structure diagram of a device with data processing capability in which the data flow apparatus for neural network computation of the present invention is located; in addition to the processor, memory, network interface and non-volatile memory shown in Figure 8, the device in which the apparatus of the embodiment is located may also include other hardware according to the actual function of the device, which is not described again here. For the implementation process of the functions and effects of each unit in the above apparatus, refer to the implementation process of the corresponding steps in the above method, which is not repeated here.
Since the apparatus embodiment basically corresponds to the method embodiment, for related parts reference may be made to the description of the method embodiment. The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present invention, and those of ordinary skill in the art can understand and implement it without creative effort.
An embodiment of the present invention also provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, the data flow method for neural network computation of the above embodiments is implemented.
The computer-readable storage medium may be an internal storage unit of any device with data processing capability described in any of the foregoing embodiments, such as a hard disk or a memory. The computer-readable storage medium may also be an external storage device of the device with data processing capability, such as a plug-in hard disk, a smart media card (SMC), an SD card or a flash card provided on the device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the device with data processing capability. The computer-readable storage medium is used to store the computer program and the other programs and data required by the device with data processing capability, and may also be used to temporarily store data that has been output or will be output.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (9)

  1. A data flow method for neural network computation, characterized by comprising the following steps:
    Step 1: initializing the life cycles of the variables in the computation graph, wherein the period from a variable's definition to the end of its use is initialized as the variable's life cycle in the computation graph;
    Step 2: defining the propagation rule for a used variable flowing through a node, wherein when a variable is used at a node of the computation graph, the definition of the variable is generated at a predecessor node of the node, so that the input set of valid variables flowing through the node contains the variable;
    Step 3: designing the propagation rule for a redefined variable flowing through a node, wherein when a variable is redefined at a node of the computation graph, the life cycle of the variable in the predecessor nodes of the node is terminated as it flows through the node;
    Step 4: defining the propagation rules for associated variables in the computation graph flowing through nodes;
    Step 5: analyzing the valid variables at the input and output of each node based on the data flow;
    Step 6: collecting the sets of valid variables before and after each node, namely the sets of valid variables flowing through every node obtained from the life-cycle analysis of the variables of the computation graph;
    Step 7: allocating memory unit blocks for the valid variables on the edges of the computation graph;
    Step 8: defining the propagation rules for available expressions in the computation graph;
    Step 9: analyzing the available expressions at the input and output of each node based on the data flow;
    Step 10: optimizing the available expressions in the computation graph: the computed result of an available expression at a node of the computation graph is saved into an intermediate variable, and subsequent occurrences of the available expression in successor nodes are replaced by the intermediate variable.
  2. The data flow method for neural network computation according to claim 1, characterized in that the propagation rule in step 3 for a redefined variable flowing through a node is specifically: if the variable is not redefined at a node of the computation graph, the variables in the output set of valid variables flowing through the node can propagate through the node to its input set; if the variable is redefined at a node of the computation graph, the variable cannot pass through the node.
  3. The data flow method for neural network computation according to claim 1, characterized in that the specific sub-steps of step 4 are as follows:
    Step 4.1: defining that at a node of the computation graph the defined variable equals the used variables;
    Step 4.2: defining the set of output variables of the node as the union of the input sets of all successor nodes of the node;
    Step 4.3: defining the set of input variables of the node as the set of output variables minus the set of variables defined at the node, unioned with the set of variables used at the node.
  4. The data flow method for neural network computation according to claim 1, characterized in that the specific sub-steps of step 5 are as follows:
    Step 5.1: initializing the input valid-variable set of the output node, wherein for the output node of the whole computation graph, its output valid-variable set is the empty set, and its input valid-variable set is derived by removing the variables redefined at the node from its output valid-variable set and then taking the union with the variables used at the node;
    Step 5.2: initializing the output valid-variable set of the input node, wherein for the input node of the whole computation graph, the elements of its output valid-variable set are initialized to the variables defined at the node;
    Step 5.3: deriving the output valid-variable set of an intermediate node, wherein the output valid-variable set of the node is the union of the input valid-variable sets of its successor nodes;
    Step 5.4: deriving the input valid-variable set of an intermediate node, wherein the input valid-variable set of the node is its output valid-variable set with the variables redefined at the node removed, unioned with the set of variables used at the node.
  5. The data flow method for neural network computation according to claim 1, characterized in that in step 7 the condition for allocating a unit memory block to a variable at a node of the computation graph is defined as: the variable at the node is within its life cycle, i.e. only the valid variables on the edges of the computation graph are allocated unit memory blocks; during compilation of the computation graph, memory unit blocks are pre-allocated for each variable according to the number of variables in the collected sets of valid variables.
  6. The data flow method for neural network computation according to claim 1, characterized in that the specific sub-steps of step 8 are as follows:
    Step 8.1: defining that at a node of the computation graph the defined variable equals the used variables;
    Step 8.2: defining the set of input variables of the node as the intersection of the output-variable sets of all predecessor nodes of the node;
    Step 8.3: defining the set of output variables of the node as the union of the set of input variables of the node and the used variables, with the expressions containing redefined variables removed.
  7. The data flow method for neural network computation according to claim 1, characterized in that the specific sub-steps of step 9 are as follows:
    Step 9.1: initializing the set of available expressions, wherein the set of available expressions flowing into the entry node of the whole computation graph is initialized to the empty set;
    Step 9.2: adding expressions that newly appear in successor nodes to the available-expression set, wherein for a node with multiple input edges, the intersection of the available-expression sets on the input edges is taken as the newly added elements of the node's available-expression set;
    Step 9.3: for a successor node at which a variable of an available expression is redefined, removing the expressions containing the redefined variable from the available-expression set.
  8. The data flow method for neural network computation according to claim 1, characterized in that the specific sub-steps of step 10 are as follows:
    Step 10.1: computing the available expression separately;
    Step 10.2: saving the computed result of the available expression at the node of the computation graph into an intermediate variable;
    Step 10.3: for successor nodes in which no variable of the available expression is redefined, replacing the available expression in the successor node with the value of the intermediate variable;
    Step 10.4: for successor nodes in which a variable of the available expression is redefined, inserting an update node for the available expression.
  9. A data flow apparatus for neural network computation, characterized in that the apparatus comprises a memory and one or more processors, executable code is stored in the memory, and when the one or more processors execute the executable code, they implement the data flow method for neural network computation according to any one of claims 1-8.
PCT/CN2022/116171 2022-08-10 2022-08-31 Data flow method and apparatus for neural network computation WO2023093185A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/954,109 US11941507B2 (en) 2022-08-10 2022-09-27 Data flow method and apparatus for neural network computation by determining input variables and output variables of nodes of a computational graph of a neural network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210952808.4A CN115033391B (zh) 2022-08-10 2022-08-10 Data flow method and apparatus for neural network computation
CN202210952808.4 2022-08-10

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/954,109 Continuation US11941507B2 (en) 2022-08-10 2022-09-27 Data flow method and apparatus for neural network computation by determining input variables and output variables of nodes of a computational graph of a neural network

Publications (1)

Publication Number Publication Date
WO2023093185A1 true WO2023093185A1 (zh) 2023-06-01

Family

ID=83130104

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/116171 WO2023093185A1 (zh) 2022-08-10 2022-08-31 Data flow method and apparatus for neural network computation

Country Status (3)

Country Link
US (1) US11941507B2 (zh)
CN (1) CN115033391B (zh)
WO (1) WO2023093185A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115268877B (zh) * 2022-09-27 2022-12-13 之江实验室 Intermediate representation method and apparatus for parallel execution of graph computation
US11782723B1 (en) 2022-09-27 2023-10-10 Zhejiang Lab Intermediate representation method and apparatus for parallel execution of graph computation
CN115658331B (zh) * 2022-12-28 2023-03-21 杭州登临瀚海科技有限公司 Compilation method and apparatus for dynamic neural networks, electronic device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190279086A1 (en) * 2017-08-19 2019-09-12 Wave Computing, Inc. Data flow graph node update for machine learning
US20200151580A1 (en) * 2018-11-13 2020-05-14 International Business Machines Corporation Generating and managing deep tensor neural networks
US20210174190A1 (en) * 2019-12-05 2021-06-10 International Business Machines Corporation Neural network training using a data flow graph and dynamic memory management
CN114186687A (zh) * 2022-02-17 2022-03-15 之江实验室 一种面向神经网络模型计算的中间表示方法和装置
CN114461351A (zh) * 2022-04-13 2022-05-10 之江实验室 一种用于神经网络计算的动态图执行方法及装置

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268023B (zh) * 2014-10-10 2017-08-25 西安电子科技大学 A memory management method for MSVL programs
DE102019205081A1 (de) * 2019-04-09 2020-10-15 Robert Bosch Gmbh Method, device and computer program for operating a deep neural network
WO2021077284A1 (zh) * 2019-10-22 2021-04-29 深圳鲲云信息科技有限公司 Neural network operation system and method
CN111045959B (zh) * 2019-11-18 2024-03-19 中国航空工业集团公司西安航空计算技术研究所 A variable mapping method for complex algorithms based on storage optimization
CN110941494A (zh) * 2019-12-02 2020-03-31 哈尔滨工程大学 A data processing method for deep-learning-oriented GPU parallel computing
CN111338635B (zh) * 2020-02-20 2023-09-12 腾讯科技(深圳)有限公司 Graph compilation method, apparatus, device and storage medium for computation graphs
WO2021219211A1 (en) * 2020-04-29 2021-11-04 Huawei Technologies Co., Ltd. Memory allocation in a neural network
US11347442B2 (en) * 2020-04-30 2022-05-31 Sap Se In-memory database service abstraction component
CN112527502A (zh) * 2020-12-09 2021-03-19 北京百瑞互联技术有限公司 Method, apparatus, medium and device for optimizing audio/video codec memory

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190279086A1 (en) * 2017-08-19 2019-09-12 Wave Computing, Inc. Data flow graph node update for machine learning
US20200151580A1 (en) * 2018-11-13 2020-05-14 International Business Machines Corporation Generating and managing deep tensor neural networks
US20210174190A1 (en) * 2019-12-05 2021-06-10 International Business Machines Corporation Neural network training using a data flow graph and dynamic memory management
CN114746871A (zh) * 2019-12-05 2022-07-12 国际商业机器公司 使用数据流图和动态存储器管理的神经网络训练
CN114186687A (zh) * 2022-02-17 2022-03-15 之江实验室 一种面向神经网络模型计算的中间表示方法和装置
CN114461351A (zh) * 2022-04-13 2022-05-10 之江实验室 一种用于神经网络计算的动态图执行方法及装置

Also Published As

Publication number Publication date
CN115033391A (zh) 2022-09-09
US20240054319A1 (en) 2024-02-15
US11941507B2 (en) 2024-03-26
CN115033391B (zh) 2022-11-11

Similar Documents

Publication Publication Date Title
WO2023093185A1 (zh) Data flow method and apparatus for neural network computation
WO2022222839A1 (zh) Intermediate representation method and apparatus for neural network model computation
WO2024021192A1 (zh) Graph optimization method and apparatus for neural network computation
US20230236888A1 (en) Memory allocation method, related device, and computer-readable storage medium
US20230177312A1 (en) Method for distributed type training adaptation and apparatus in deep learning framework and ai accelerator card
Seinstra et al. A software architecture for user transparent parallel image processing
CN108921188B (zh) A parallel CRF method based on the Spark big data platform
Florea et al. A generalized accelerated composite gradient method: Uniting Nesterov's fast gradient method and FISTA
CN115269204B (zh) Memory optimization method and apparatus for neural network compilation
Tschaikowski et al. Tackling continuous state-space explosion in a Markovian process algebra
US20240127027A1 (en) Optimization method and apparatus for compiling computation graph
Klawonn et al. Energy efficiency of nonlinear domain decomposition methods
Manor et al. Using HW/SW codesign for deep neural network hardware accelerator targeting low-resources embedded processors
Pingali et al. Amorphous data-parallelism in irregular algorithms
Uhrmacher et al. Effective and efficient modeling and simulation with DEVS variants
JP2022127619A (ja) ハードウェアアクセラレートされた命令の使用
Guermouche et al. Scheduling trees of malleable tasks for sparse linear algebra
Cunningham High performance algorithms for quantum gravity and cosmology
US20240028886A1 (en) Graph Optimization Method and Apparatus for Neural Network Computation
CN115269205B (zh) Memory optimization method and apparatus for neural network computation
WO2024065866A1 (zh) Intermediate representation method and apparatus for computation graph compilation
US20240104341A1 (en) Memory optimization method and apparatus for neural network compilation
US11907693B2 (en) Job decomposition processing method for distributed computing
US20240104016A1 (en) Intermediate Representation Method and Apparatus for Compiling Computation Graphs
US20240104395A1 (en) Memory optimization method and device oriented to neural network computing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897277

Country of ref document: EP

Kind code of ref document: A1