WO2023212975A1 - Mapping method, electronic device, and computer-readable storage medium - Google Patents

Mapping method, electronic device, and computer-readable storage medium

Info

Publication number
WO2023212975A1
WO2023212975A1 (PCT/CN2022/092624, CN2022092624W)
Authority
WO
WIPO (PCT)
Prior art keywords
nodes
node
control
input
output
Prior art date
Application number
PCT/CN2022/092624
Other languages
English (en)
French (fr)
Inventor
张伟豪
沈杨书
祝夭龙
Original Assignee
北京灵汐科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京灵汐科技有限公司
Publication of WO2023212975A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present disclosure relates to the technical field of computational graph mapping, and in particular to mapping methods, electronic devices, and computer-readable storage media.
  • a "computation graph” In order to implement mapping, a "computation graph" must first be constructed. In the calculation graph, multiple nodes and connections between nodes are used to represent the logical process of the overall operation. Each node represents part of the sub-operation in the overall operation, and the connections represent the data between nodes. (Token) flow, such as the output of one node being connected to the input of another node, indicating that the output data of the previous node is used as the input data of the subsequent node (for example, the operation result of one neuron is used as the basis for the operation of another neuron) . After obtaining the calculation graph, each node in it can be mapped to the corresponding processing unit according to the calculation graph.
  • control flow means that the actual operations performed may differ for different inputs; for example, processes such as judgments (branches) and loops all belong to control flow.
  • a very simple control flow will greatly increase the complexity of the computation graph ("control explosion"), which makes the graph difficult to construct and to understand, and leads to unreasonable mapping results based on such graphs: the computing power of the processing units cannot be fully utilized, and the amount of data to be transmitted between different processing units increases.
  • the present disclosure provides a mapping method, electronic device, and computer-readable storage medium.
  • an embodiment of the present disclosure provides a mapping method, which includes:
  • each of the nodes is connected to at least one other node and has an input and an output;
  • the nodes include a computing node, a control node, and a switching node;
  • each of the computing nodes is used to perform a sub-operation on the input data from its input to generate output data at its output;
  • each of the control nodes is used to perform logical operations on input data from its input to generate control instructions at its output;
  • each of the switching nodes is connected to at least one control node and is used to control the connection relationship between its input and output according to the control instructions from the control nodes to which it is connected (a data-model sketch of these three node roles is given below).
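The three node roles can be pictured with a small data model. The following Python sketch is illustrative only and not taken from the disclosure; the class names, field names, and the dictionary encoding of the routing state are assumptions chosen for readability.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class ComputeNode:
    """Performs a sub-operation on its input data and emits output data."""
    op: Callable[..., float]                 # e.g. lambda a, b: a + b
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)

@dataclass
class ControlNode:
    """Performs a logical operation on its input data and emits a control instruction."""
    predicate: Callable[..., bool]           # e.g. lambda a, b: a == b
    inputs: List[str] = field(default_factory=list)

@dataclass
class SwitchNode:
    """Connects its inputs to its outputs according to control instructions."""
    inputs: List[str]
    outputs: List[str]
    # routing[(input, output)] == True means data may flow from that input to that output
    routing: Dict[Tuple[str, str], bool] = field(default_factory=dict)

    def apply_control(self, instruction: Dict[Tuple[str, str], bool]) -> None:
        # A control instruction simply overwrites the current routing state.
        self.routing = dict(instruction)
```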
  • the nodes further include connection nodes
  • connection nodes each include inputs and outputs in unequal numbers, where each input is connected to at least one output and each output is connected to at least one input.
  • the method, before mapping the nodes in the computing graph to the processing units, further includes at least one of the following: merging two directly connected switching nodes into one switching node; splitting one switching node into two directly connected switching nodes.
  • the method, before mapping the nodes in the computing graph to the processing units, further includes at least one of the following: merging two control nodes connected to the same switching node into one control node; splitting one control node into two control nodes connected to the same switching node.
  • mapping nodes in the computing graph to processing units includes:
  • the nodes in the calculation graph are mapped to the processing cores of the many-core system.
  • mapping nodes in the computing graph to processing units includes:
  • Nodes in the calculation graph are mapped to at least two different types of processing units; wherein nodes of the same type are mapped to processing units of the same type.
  • when mapping nodes in the computing graph to at least two different types of processing units, at least one of the following is satisfied:
  • the computing nodes are mapped to processors
  • the control node is mapped to the logic controller
  • the switching nodes are mapped into switching circuits.
  • the computing node satisfies at least one of the following:
  • At least some of the operation nodes perform operations when all of their inputs have input data
  • At least some of the operation nodes perform operations when at least one of their inputs has input data.
  • an electronic device which includes:
  • a memory communicatively connected to the at least one processor
  • the memory stores one or more computer programs that can be executed by the at least one processor, and the one or more computer programs are executed by the at least one processor, so that the at least one processor can execute the above mapping method.
  • the present disclosure provides a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the above mapping method when executed by a processor.
  • the control node plays the role of deciding "what control to perform", while the switching node "implements the corresponding control"; each control (such as a judgment) can therefore be realized with two nodes (a control node and a switching node). Thus, for the same operation with control flow, the computation graph obtained by the embodiment of the present disclosure has a much simpler structure than a computation graph based on data flow.
  • the calculation graph is easy to construct and easy to understand, and the mapping results obtained from such a graph are also more reasonable, making full use of the computing power of each processing unit and reducing the amount of data to be transmitted between different processing units.
  • Figure 1 is a calculation diagram constructed based on certain operations in related technologies
  • Figure 2 is a flow chart of a mapping method provided by an embodiment of the present disclosure
  • Figure 3 is a calculation diagram constructed based on certain operations in the mapping method of the embodiment of the present disclosure
  • Figure 4 is a schematic diagram of the role of switching nodes in the mapping method according to the embodiment of the present disclosure.
  • Figure 5 is a schematic diagram of the node merging process in the mapping method according to the embodiment of the present disclosure.
  • Figure 6 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
  • Figure 7 is a block diagram of a computer-readable storage medium provided by an embodiment of the present disclosure.
  • a calculation graph (also called a data flow graph) is mainly constructed based on "data flow” and mapped according to the calculation graph.
  • x, y, w, v, and z respectively represent the corresponding data (Token)
  • h, f, p, q, and g respectively represent the corresponding functions (sub-operations)
  • T and F respectively represent the cases where the judgment result is True or False
  • the line represents the connection
  • the arrow represents the data flow direction in the connection.
  • embodiments of the present disclosure provide a mapping method.
  • the mapping method in the embodiment of the present disclosure is used to map (or distribute) a target algorithm (such as a neural network) that needs to be run to each processing unit of the processing device.
  • the target algorithm that needs to be run (such as a neural network) is used to achieve certain functions, such as speech recognition, image processing, etc.
  • the processing device includes multiple processing units, each processing unit has certain data processing capabilities, and data transmission can be achieved between different processing units; thus, after the mapping is completed, each processing unit can process the target algorithm mapped into it Part of the content, through the cooperation of each processing unit, the above target algorithm is jointly realized.
  • mapping method in the embodiment of the present disclosure can be implemented by an independent mapping device or by some components in the processing device (such as a separate controller or part of the processing unit).
  • mapping method includes:
  • each node is connected to at least one other node and has inputs and outputs; the nodes include operation nodes, control nodes, and switching nodes; each operation node is used to perform a sub-operation on the input data from its inputs to produce output data at its outputs; each control node is used to perform a logical operation on the input data from its inputs to produce a control instruction at its outputs; each switching node is connected to at least one control node and is used to control the connection relationship between its inputs and outputs according to the control instructions from the control nodes to which it is connected.
  • a calculation graph is first constructed according to the target algorithm.
  • the calculation graph includes multiple nodes, each node has one or more inputs (input terminals), and one or more outputs (output terminals), so that each node can perform certain processing on the input data from its input, to produce output data at its output.
  • the calculation graph also includes connections between different nodes. The connections are used to transmit data (Token), such as transmitting the output data generated by one node to the input of other nodes.
  • each node must be mapped to a processing unit for processing.
  • each processing unit of the processing device may have one or more nodes mapped to it, or no nodes at all; each processing unit then processes the computation of the nodes mapped to it, and if connected nodes (that is, nodes that need to exchange data) are located in different processing units, data must be transmitted between the corresponding processing units.
  • the calculation graph is usually mapped to multiple processing units; but when the total size of the calculation graph is small, it is also feasible if all nodes are mapped to one processing unit.
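To make the cost of a placement concrete, the hedged sketch below counts how many graph connections cross processing-unit boundaries for a given mapping; the edge list, the mapping dictionary, and the assumption of one transfer per crossing edge are illustrative only and not taken from the disclosure.

```python
# Edges are (producer, consumer) pairs; mapping assigns each node to a processing unit.
edges = [("h", "f"), ("f", "p"), ("p", "q"), ("q", "g")]
mapping = {"h": 0, "f": 0, "p": 1, "q": 1, "g": 1}

def cross_unit_edges(edges, mapping):
    """Return the edges whose endpoints were mapped to different processing units."""
    return [(u, v) for (u, v) in edges if mapping[u] != mapping[v]]

print(cross_unit_edges(edges, mapping))  # [('f', 'p')] -> one inter-unit transfer
```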
  • in the calculation graph of the embodiment of the present disclosure, in addition to the operation nodes directly used to perform actual operations (sub-operations), there are also control nodes and switching nodes.
  • the control node can generate control instructions based on the input data; that is, it performs logical processing on the input data (such as judging whether the input data meets certain conditions) and, according to the result of that logical processing (such as the judgment result), determines what operation "should" be performed in the current situation, in other words how the data should currently flow, and generates a corresponding control instruction.
  • the switch node can change the connection relationship between its own inputs and outputs according to the control instructions from the control node, that is, determine to which output (or outputs) the data from each of its inputs can be transmitted; the switch node can thus change the connection relationships between other nodes according to the control instructions without performing any substantive operation, in other words, it "controls the flow of data".
  • the specific form of the control instructions is diverse.
  • the control instructions may be in the form of a "matrix".
  • for example, for a switching node with four inputs a1, a2, a3, a4 and three outputs b1, b2, b3, its corresponding control instruction can be a matrix with 4 rows and 3 columns.
  • if the control instruction is the matrix in Figure 4, the connections it represents can be seen on the right side of Figure 4: the input data from input a2 can reach output b1 (that is, input a2 is connected to output b1), the input data from input a3 can reach output b2, and the input data from inputs a1, a3, and a4 can all reach output b3 (a sketch of such a routing matrix is given below).
  • connection relationships represented by the switching nodes are also different.
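A hedged sketch of the matrix-style control instruction described above follows. The actual matrix values in Figure 4 are not reproduced; the Boolean encoding (a 1 in row i, column j meaning input a(i+1) is connected to output b(j+1)) is an assumption that matches the connections listed in the text (a2 to b1, a3 to b2, a1/a3/a4 to b3).

```python
# Rows: inputs a1..a4, columns: outputs b1..b3.
routing = [
    [0, 0, 1],   # a1 -> b3
    [1, 0, 0],   # a2 -> b1
    [0, 1, 1],   # a3 -> b2 and b3
    [0, 0, 1],   # a4 -> b3
]

def route(routing, tokens):
    """Distribute input tokens to outputs according to the routing matrix.

    tokens: one value (or None for 'no data') per input.
    Returns a list of token lists, one per output.
    """
    n_outputs = len(routing[0])
    outputs = [[] for _ in range(n_outputs)]
    for i, token in enumerate(tokens):
        if token is None:
            continue
        for j in range(n_outputs):
            if routing[i][j]:
                outputs[j].append(token)
    return outputs

print(route(routing, ["x1", "x2", "x3", "x4"]))
# [['x2'], ['x3'], ['x1', 'x3', 'x4']]
```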
  • the nodes also include connection nodes
  • Each connection node includes inputs and outputs in unequal numbers, where each input is connected to at least one output and each output is connected to at least one input.
  • connection nodes likewise perform no substantive operation; they are used to connect each input to one or more outputs, that is, to implement data splitting (copying), merging, crossing, and so on, in order to change the direction of data flow.
  • a connection node can be regarded as a special "switching node" (a static switching node); the difference from an actual switching node is that the connection relationship between the inputs and outputs of a connection node is fixed and does not change according to switching instructions.
  • the node symbol whose upper end has three prongs and whose lower end has one prong indicates that the node is used to connect all three of its inputs to that single output.
  • switching node can also implement other functions, such as branching (copying) one input to multiple outputs, etc.
  • before the mapping, at least one of the following may further be performed: merging two directly connected switching nodes into one switching node; splitting one switching node into two directly connected switching nodes; merging two control nodes connected to the same switching node into one control node; splitting one control node into two control nodes connected to the same switching node.
  • that is, after the calculation graph is obtained, the switching nodes and control nodes in it can be merged or split to further optimize the graph, and the nodes of the optimized graph are then mapped, so as to obtain better mapping results.
  • when two switching nodes are directly connected and are controlled by two control nodes respectively, the two can be merged into a single switching node that is controlled by both control nodes at the same time and implements a more complex switching function (see the sketch below).
  • there is also a connection node in Figure 3 and, as before, a connection node can be regarded as a "static switching node" whose state does not change; therefore, referring to Figure 5, the connection node can also be merged with a switching node, with the difference that some of the connection relationships in the merged switching node are fixed.
  • operation nodes can also be split and merged. For example, when the amount of computation required by a single computing node is too large for it to be mapped to one processing unit, it can be split into multiple serial or parallel computing nodes; conversely, multiple serial or parallel computing nodes can be merged into one computing node for processing.
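One way to picture the merging of two directly connected switching nodes is to compose their routing matrices: the merged node connects input i to output j whenever some intermediate line links them. The sketch below is an illustration under that assumption, not a procedure taken from the disclosure.

```python
def compose(r1, r2):
    """Boolean matrix product: merged[i][j] is 1 if any intermediate line
    links input i of the first switch to output j of the second switch."""
    rows, mid, cols = len(r1), len(r2), len(r2[0])
    merged = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            merged[i][j] = int(any(r1[i][k] and r2[k][j] for k in range(mid)))
    return merged

# First switch: 2 inputs -> 2 intermediate lines; second switch: 2 intermediates -> 3 outputs.
r1 = [[1, 0],
      [0, 1]]
r2 = [[0, 1, 0],
      [1, 0, 1]]
print(compose(r1, r2))  # [[0, 1, 0], [1, 0, 1]]
```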
  • mapping nodes in the computational graph to processing units (S102) includes:
  • each node in the calculation graph can be mapped to the processing core of the many-core system, that is, the target operation is mapped to the many-core system.
  • the many-core system includes multiple processing cores (Core) with processing capability; data can be transmitted between the processing cores, and between each processing core and the outside of the many-core system, through a network on chip. By mapping some of the nodes of the calculation graph to each processing core, the many-core system can be used to process the target operation, for example to run a neural network (a placement sketch is given below).
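Below is a minimal, hypothetical placement sketch for the many-core case: each node carries an estimated cost and is assigned greedily to the least-loaded core. The cost figures, the core count, and the greedy policy are assumptions; the disclosure does not prescribe a particular placement algorithm.

```python
def map_to_cores(node_costs, num_cores):
    """Greedy load-balancing sketch: assign each node to the currently least-loaded core.

    node_costs: dict mapping node name -> estimated computation cost.
    Returns a dict mapping node name -> core index.
    """
    loads = [0.0] * num_cores
    placement = {}
    for node, cost in sorted(node_costs.items(), key=lambda kv: -kv[1]):
        core = loads.index(min(loads))       # least-loaded core so far
        placement[node] = core
        loads[core] += cost
    return placement

print(map_to_cores({"h": 4.0, "f": 3.0, "p": 1.0, "q": 2.0, "g": 2.0}, num_cores=2))
```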
  • mapping nodes in the computational graph to processing units (S102) includes:
  • nodes of the same type are mapped to processing units of the same type.
  • when mapping nodes in the calculation graph to at least two different types of processing units (S1022), at least one of the following is satisfied:
  • Computational nodes are mapped to processors
  • Control nodes are mapped into logical controllers
  • Switching nodes are mapped into switching circuits.
  • the computing node mainly performs actual operations (such as addition, subtraction, multiplication, division, matrix operations, etc.), so that it can be mapped to a processor (such as a processing core) used to process general operations.
  • the control node mainly performs logical operations (such as comparisons and judgments), so it can be mapped to a logic controller; the switching node effectively acts as a "switch", so it can be mapped to a switch circuit (such as a gate circuit), which of course should be controlled by the control instructions (a type-to-unit sketch is given below).
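The type-to-unit correspondence above can be captured as a simple lookup table. The sketch below is illustrative only; the unit-type strings are hypothetical names, not identifiers from the disclosure.

```python
# Hypothetical unit types; the disclosure only states the correspondence, not an API.
UNIT_FOR_NODE_TYPE = {
    "compute": "processor",        # e.g. a processing core for general arithmetic
    "control": "logic_controller", # comparisons, judgments
    "switch":  "switch_circuit",   # e.g. gating circuits driven by control instructions
}

def place(node_type: str) -> str:
    """Return the processing-unit type a node of the given type is mapped to."""
    return UNIT_FOR_NODE_TYPE[node_type]

print(place("switch"))  # switch_circuit
```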
  • the computing node satisfies at least one of the following:
  • At least some operation nodes perform operations when all their inputs have input data
  • At least some of the operation nodes perform operations when at least one of their inputs has input data.
  • the calculation graph constructed according to the embodiment of the present disclosure includes control nodes and switching nodes. Since switching nodes change the data flow direction, the input data available to a computing node may differ at different times; for example, sometimes all inputs have input data, and sometimes some or all inputs have no input data.
  • different "trigger rules" (start rules) can therefore be set for different operation nodes according to their needs, so that an operation node performs its operation only when its trigger rule is satisfied.
  • the trigger rules can stipulate that the operation is performed only when all inputs have input data, or that the operation is performed when at least one input, or some specific subset of the inputs, has input data.
  • for example, if an operation node has three inputs and its sub-operation is to sum the input data as output data,
  • its triggering rule can be to perform the summation only when all three inputs have input data, or it can be to perform the summation as long as at least one input (or some specific subset of inputs) has input data (an input without input data can be treated as 0); both variants are sketched below.
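The two trigger-rule variants for the three-input summing node can be written as small predicates. The sketch below is an illustration only; None standing for "no input data" and the helper names are assumptions.

```python
def all_inputs_ready(inputs):
    """Trigger only when every input slot holds data (None means 'no data')."""
    return all(x is not None for x in inputs)

def any_input_ready(inputs):
    """Trigger as soon as at least one input slot holds data."""
    return any(x is not None for x in inputs)

def summing_node(inputs, trigger):
    """Sum the available inputs, treating missing inputs as 0, if the trigger rule fires."""
    if not trigger(inputs):
        return None                       # not triggered: no output this step
    return sum(x if x is not None else 0 for x in inputs)

print(summing_node([1, 2, None], all_inputs_ready))  # None (not all inputs present)
print(summing_node([1, 2, None], any_input_ready))   # 3
```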
  • the control node performs certain logical operations on its input data and generates control instructions, which can also be regarded as the control node working according to certain triggering rules (that is, a control instruction is generated from the inputs only when the triggering rule is met). For example, if a control node has two inputs, performs a "comparison operation" on them, and generates a control instruction for a specific comparison result (such as the two input data being the same), then its triggering rule can be viewed as "trigger when both inputs have input data and the two input data are the same".
  • the switching node determines the connection relationship between its input and output according to the control instructions, which can also be regarded as the switching node working according to certain triggering rules (that is, at least one input is connected to an output only when the triggering rule is met). For example, suppose a switching node has one input and one output, is connected to control node 1, control node 2, and control node 3 at the same time, and connects its input to its output when control node 1 and control node 2 both have control instructions, and/or when control node 3 has a control instruction.
  • the triggering rule of the switching node can then be regarded as "trigger when at least one of the following is satisfied: (1) control node 1 and control node 2 both have control instructions; (2) control node 3 has a control instruction" (a sketch of this rule is given below).
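The trigger rule in this example can be expressed directly as a Boolean function. The sketch below is illustrative; None stands for "no control instruction currently present", which is an assumed encoding.

```python
def switch_triggered(ctrl1, ctrl2, ctrl3):
    """Trigger rule from the example: fire if control nodes 1 and 2 both currently
    hold a control instruction, or if control node 3 holds one (None = no instruction)."""
    return (ctrl1 is not None and ctrl2 is not None) or ctrl3 is not None

print(switch_triggered("route_A", None, None))       # False
print(switch_triggered("route_A", "route_B", None))  # True
print(switch_triggered(None, None, "route_C"))       # True
```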
  • the connection relationship between the inputs and outputs of a connection node is fixed, so it can also be regarded as a special static switching node. Whether the connection node has output data is therefore determined only by its input data, which can also be regarded as the connection node working according to certain triggering rules (that is, output data is produced only when the triggering rule is met). For example, if the three inputs of a connection node are all connected to the same output, its triggering rule can be regarded as "trigger as long as at least one input has input data".
  • the present disclosure provides an electronic device.
  • an electronic device 500 includes:
  • At least one processor 501 At least one processor 501;
  • the memory 502 stores one or more computer programs that can be executed by at least one processor 501, and the one or more computer programs are executed by at least one processor 501, so that at least one processor 501 can perform the above mapping method.
  • the present disclosure provides a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium 600 of the embodiment of the present disclosure, where the computer program implements the above-mentioned mapping method when executed by a processor (including a processing core).
  • the computer-readable storage media of embodiments of the present disclosure may be volatile or non-volatile computer-readable storage media.
  • Embodiments of the present disclosure also provide a computer program product, which includes computer readable code, or a non-volatile computer readable storage medium carrying the computer readable code.
  • when the computer readable code runs in a processor of an electronic device, the processor in the electronic device executes the above mapping method.
  • computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable program instructions, data structures, program modules or other data.
  • Computer storage media include, but are not limited to, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM), static random access memory (SRAM), flash memory or other memory technology, portable Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disk (DVD) or other optical disk storage, magnetic cassette, magnetic tape, disk storage or other magnetic storage device, or that can be used to store the desired information and can be accessed by a computer any other media.
  • communication media typically embodies computer readable program instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery medium.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to various computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage on a computer-readable storage medium in the respective computing/processing device .
  • Computer program instructions for performing operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or instructions in one or more programming languages.
  • the computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the disclosure.
  • the computer program product described here may be implemented specifically through hardware, software, or a combination thereof.
  • the computer program product is embodied as a computer storage medium.
  • the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK) and so on.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create an apparatus that implements the functions/actions specified in one or more blocks of the flowchart and/or block diagram.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions cause the computer, programmable data processing device, and/or other equipment to work in a specific manner, so that the computer-readable storage medium storing the instructions includes an article of manufacture that includes instructions implementing aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment, causing a series of operating steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, thereby causing the instructions executed on the computer, other programmable data processing apparatus, or other equipment to implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of specialized hardware and computer instructions.
  • Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a general illustrative sense only and not for purposes of limitation. In some instances, as will be apparent to those skilled in the art, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics, and/or elements described in connection with other embodiments, unless expressly stated otherwise. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Processing (AREA)
  • Hardware Redundancy (AREA)

Abstract

The present disclosure provides a mapping method, including: constructing a computation graph having multiple nodes according to a target algorithm to be mapped; and mapping the nodes in the computation graph to processing units. Each of the nodes is connected to at least one other node and has inputs and outputs; the nodes include operation nodes, control nodes, and switching nodes. Each operation node is used to perform a sub-operation on the input data from its inputs to produce output data at its outputs; each control node is used to perform a logical operation on the input data from its inputs to produce a control instruction at its outputs; each switching node is connected to at least one control node and is used to control the connection relationship between its inputs and outputs according to the control instructions from the control nodes to which it is connected. The present disclosure further provides an electronic device and a computer-readable storage medium.

Description

Mapping method, electronic device, and computer-readable storage medium
Technical Field
The present disclosure relates to the technical field of computation graph mapping, and in particular to a mapping method, an electronic device, and a computer-readable storage medium.
Background
To improve operating efficiency, different parts of an overall operation to be performed (an operation used to implement a certain function, such as speech recognition or image processing) can be mapped (in other words, allocated) to different processing units for execution. For example, different neurons or different layers of a neural network can be mapped to different processing cores of a many-core system and run separately.
To implement the mapping, a "computation graph" must first be constructed. In the computation graph, multiple nodes and the connections between them represent the logical process of the overall operation: each node represents a sub-operation within the overall operation, and a connection represents the flow of data (tokens) between nodes. For example, the output of one node being connected to the input of another node indicates that the output data of the former is used as the input data of the latter (for instance, the operation result of one neuron serves as the basis for the operation of another neuron). After the computation graph is obtained, each of its nodes can be mapped to a corresponding processing unit according to the graph.
As technology develops, more and more operations include "control flow", meaning that the actual computation performed may differ for different inputs; processes such as judgments and loops all belong to control flow. In the related art, even very simple control flow greatly increases the complexity of the computation graph (control explosion), which makes the graph difficult to construct and to understand, and leads to unreasonable mapping results based on such graphs: the computing power of the processing units cannot be fully utilized, and the amount of data to be transmitted between different processing units increases.
Summary
The present disclosure provides a mapping method, an electronic device, and a computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a mapping method, including:
constructing a computation graph having multiple nodes according to a target algorithm to be mapped;
mapping the nodes in the computation graph to processing units;
wherein each of the nodes is connected to at least one other node and has inputs and outputs; the nodes include operation nodes, control nodes, and switching nodes; each operation node is configured to perform a sub-operation on the input data from its inputs to produce output data at its outputs; each control node is configured to perform a logical operation on the input data from its inputs to produce a control instruction at its outputs; and each switching node is connected to at least one control node and is configured to control the connection relationship between its inputs and outputs according to the control instructions from the control nodes to which it is connected.
In some embodiments, the nodes further include connection nodes;
each connection node includes inputs and outputs in unequal numbers, where each input is connected to at least one output and each output is connected to at least one input.
In some embodiments, before mapping the nodes in the computation graph to the processing units, the method further includes at least one of the following:
merging two directly connected switching nodes into one switching node;
splitting one switching node into two directly connected switching nodes.
In some embodiments, before mapping the nodes in the computation graph to the processing units, the method further includes at least one of the following:
merging two control nodes connected to the same switching node into one control node;
splitting one control node into two control nodes connected to the same switching node.
In some embodiments, mapping the nodes in the computation graph to processing units includes:
mapping the nodes in the computation graph to processing cores of a many-core system.
In some embodiments, mapping the nodes in the computation graph to processing units includes:
mapping the nodes in the computation graph to at least two different types of processing units, where nodes of the same type are mapped to processing units of the same type.
In some embodiments, when the nodes in the computation graph are mapped to at least two different types of processing units, at least one of the following is satisfied:
the operation nodes are mapped to processors;
the control nodes are mapped to logic controllers;
the switching nodes are mapped to switch circuits.
In some embodiments, the operation nodes satisfy at least one of the following:
at least some of the operation nodes perform their operation only when all of their inputs have input data;
at least some of the operation nodes perform their operation as soon as at least one of their inputs has input data.
In a second aspect, the present disclosure provides an electronic device, including:
at least one processor;
a memory communicatively connected to the at least one processor;
wherein the memory stores one or more computer programs executable by the at least one processor, and the one or more computer programs are executed by the at least one processor to enable the at least one processor to perform the above mapping method.
In a third aspect, the present disclosure provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the above mapping method.
In the embodiments of the present disclosure, the control node decides "what control to perform", while the switching node "implements the corresponding control"; each control (such as a judgment) can thus be realized with two nodes (a control node and a switching node). Therefore, for the same operation containing control flow, the computation graph obtained by the embodiments of the present disclosure is much simpler in structure than a computation graph based on data flow; it is easy to construct and easy to understand, and the mapping results obtained from such a graph are also more reasonable, making full use of the computing power of each processing unit and reducing the amount of data to be transmitted between different processing units.
Brief Description of the Drawings
The accompanying drawings are provided for a further understanding of the present disclosure and constitute a part of the specification; together with the detailed embodiments, they serve to explain the present disclosure and do not limit it. The above and other features and advantages will become more apparent to those skilled in the art from the description of the detailed embodiments with reference to the drawings, in which:
Figure 1 is a computation graph constructed for a certain operation in the related art;
Figure 2 is a flowchart of a mapping method provided by an embodiment of the present disclosure;
Figure 3 is a computation graph constructed for a certain operation in the mapping method of an embodiment of the present disclosure;
Figure 4 is a schematic diagram of the role of a switching node in the mapping method of an embodiment of the present disclosure;
Figure 5 is a schematic diagram of a node merging process in the mapping method of an embodiment of the present disclosure;
Figure 6 is a block diagram of an electronic device provided by an embodiment of the present disclosure;
Figure 7 is a block diagram of a computer-readable storage medium provided by an embodiment of the present disclosure.
Detailed Description
To enable those skilled in the art to better understand the technical solutions of the present disclosure, exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
The embodiments of the present disclosure and the features in the embodiments may be combined with each other where no conflict arises.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for describing particular embodiments only and is not intended to limit the present disclosure. As used herein, the singular forms "a" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that when the terms "comprise" and/or "made of" are used in this specification, they specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Terms such as "connected" are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meanings as commonly understood by those of ordinary skill in the art. It will further be understood that terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In some related art, a computation graph (also called a data flow graph) is constructed mainly on the basis of "data flow", and the mapping is performed according to that graph.
In a data-flow-based computation graph, when there is a control-flow process such as a judgment or a loop, each different result branch is treated as a separate connection (in other words, as a separate data flow), which greatly increases the complexity of the graph.
For example, for the operation represented by the following program code containing control flow (a judgment process), the corresponding data-flow-based computation graph in the related art can be seen in Figure 1 (see Dennis J. B., Fosseen J. B., Linderman J. P., "Data flow schemas", Lecture Notes in Computer Science, 1974: 187-216):
[Program code shown in the original publication as image PCTCN2022092624-appb-000001; not reproduced here.]
In Figure 1, x, y, w, v, and z denote the corresponding data (tokens); h, f, p, q, and g denote the corresponding functions (sub-operations); T and F denote the cases where the judgment result is True or False, respectively; the lines denote connections; and the arrows denote the direction of data flow along the connections.
It can be seen that although the logical process represented by the above program code is actually very simple, the corresponding computation graph is already very complex. Therefore, the mapping approach of the related art leads to complicated graph structures, which makes the graphs difficult to construct and to understand, and results in unreasonable mappings that cannot fully utilize the computing power of the processing units and that increase the amount of data to be transmitted between different processing units.
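The program code referred to above appears only as an image in the source and is not reproduced here. As a purely illustrative stand-in (not the patent's actual example), the short routine below shows the kind of single-judgment control flow that already forces a data-flow graph like Figure 1 to branch; the concrete operations behind h, f, p, q, and g are invented placeholders.

```python
# Placeholder sub-operations; the real h, f, p, q, g of Figure 1 are not specified here.
def h(x): return x + 1
def f(w): return w * 2
def q(w): return w - 2
def g(v): return v ** 2
def p(w, y): return w > y          # the judgment that creates the control flow

def example(x, y):
    w = h(x)
    v = f(w) if p(w, y) else q(w)  # one branch is already enough to split the data flow
    return g(v)

print(example(3, 1), example(3, 10))
```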
In a first aspect, an embodiment of the present disclosure provides a mapping method.
The mapping method of the embodiment of the present disclosure is used to map (in other words, allocate) a target algorithm to be run (such as a neural network) to the processing units of a processing device.
The target algorithm to be run (such as a neural network) is used to implement a certain function, such as speech recognition or image processing.
The processing device includes multiple processing units. Each processing unit has a certain data processing capability, and data can be transmitted between different processing units. After the mapping is completed, each processing unit can process the part of the target algorithm mapped to it, and the processing units jointly implement the target algorithm through their cooperation.
The mapping method of the embodiment of the present disclosure may be implemented by an independent mapping device, or by some components in the processing device (such as a separate controller, or some of the processing units).
Referring to Figure 2, the mapping method of the embodiment of the present disclosure includes:
S101: constructing a computation graph having multiple nodes according to the target algorithm to be mapped;
S102: mapping the nodes in the computation graph to processing units.
Each node is connected to at least one other node and has inputs and outputs. The nodes include operation nodes, control nodes, and switching nodes. Each operation node is used to perform a sub-operation on the input data from its inputs to produce output data at its outputs; each control node is used to perform a logical operation on the input data from its inputs to produce a control instruction at its outputs; each switching node is connected to at least one control node and is used to control the connection relationship between its inputs and outputs according to the control instructions from the control nodes to which it is connected.
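The two steps S101 and S102 can be pictured as a two-stage driver. The helpers build_graph and assign_nodes below are hypothetical stand-ins (here a trivial node list and a round-robin assignment); the disclosure does not prescribe their implementation.

```python
def build_graph(target_algorithm):
    # S101: construct a computation graph with operation, control and switching nodes.
    # In this sketch the "graph" is simply the node list the algorithm description provides.
    return list(target_algorithm["nodes"])

def assign_nodes(graph, processing_units):
    # S102: map every node onto some processing unit (round-robin, for illustration only).
    return {node: processing_units[i % len(processing_units)]
            for i, node in enumerate(graph)}

def mapping_method(target_algorithm, processing_units):
    graph = build_graph(target_algorithm)              # S101
    return assign_nodes(graph, processing_units)       # S102

print(mapping_method({"nodes": ["h", "f", "p_ctrl", "switch"]}, ["core0", "core1"]))
```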
In the mapping method of the embodiment of the present disclosure, a computation graph is first constructed according to the target algorithm.
The computation graph includes multiple nodes. Each node has one or more inputs (input terminals) and one or more outputs (output terminals), so that each node can perform some processing on the input data from its inputs to produce output data at its outputs. The computation graph also includes connections between different nodes; the connections are used to transmit data (tokens), for example transmitting the output data produced by one node to the inputs of other nodes.
After the above computation graph is obtained, its nodes are mapped to the processing units of the processing device. Specifically, every node must be mapped to a processing unit for processing; of course, each processing unit of the processing device may have one or more nodes mapped to it, or no node at all. Each processing unit then handles the computation of the nodes mapped to it, and if connected nodes (that is, nodes that need to exchange data) are located in different processing units, data must be transmitted between the corresponding processing units.
From the point of view of fully utilizing the processing capability of the processing device, the computation graph is usually mapped to multiple processing units; but when the overall size of the graph is small, it is also feasible to map all nodes to a single processing unit.
In the computation graph of the embodiment of the present disclosure, in addition to the operation nodes directly used to perform actual operations (sub-operations), there are also control nodes and switching nodes.
A control node can generate a control instruction according to its input data. That is, it performs logical processing on the input data (such as judging whether the input data satisfies a certain condition) and, according to the result of that logical processing (such as the judgment result), determines what operation "should" be performed in the current situation, in other words how the data should currently flow, and generates a corresponding control instruction.
A switching node (switch) can change the connection relationship between its own inputs and outputs according to the control instructions from the control nodes, that is, decide to which output (or outputs) the data from each of its inputs can be transmitted. The switching node can thus change the connection relationships between other nodes according to the control instructions without performing any substantive operation; in other words, it "controls the flow of data".
In the embodiment of the present disclosure, the control node decides "what control to perform", while the switching node "implements the corresponding control"; each control (such as a judgment) can therefore be realized with two nodes (a control node and a switching node). Thus, for the same operation containing control flow, the computation graph obtained by the embodiment of the present disclosure is much simpler in structure than a computation graph based on data flow; it is easy to construct and easy to understand, and the mapping results obtained from such a graph are also more reasonable, making full use of the computing power of each processing unit and reducing the amount of data to be transmitted between different processing units.
For example, for the data-flow-based computation graph above (Figure 1), the corresponding computation graph constructed in the manner of the embodiment of the present disclosure can be seen in Figure 3. Clearly, the structure of the graph in Figure 3 is far simpler than that of the graph in Figure 1.
In Figure 3, the meaning of each symbol is as indicated above it: x, y, w, v, and z denote the corresponding data (tokens); h, f, p, q, and g denote the corresponding functions (sub-operations); the lines denote connections; and the arrows denote the direction of data flow along the connections.
The specific form of the control instruction can vary. For example, a control instruction may take the form of a matrix.
Specifically, referring to the left side of Figure 4, suppose a switching node has four inputs a1, a2, a3, a4 and three outputs b1, b2, b3; its corresponding control instruction can then be a matrix with 4 rows and 3 columns.
If the control instruction is the matrix shown in Figure 4, the connections it represents are shown on the right side of Figure 4: the input data from input a2 can reach output b1 (that is, input a2 is connected to output b1), the input data from input a3 can reach output b2, and the input data from inputs a1, a3, and a4 can all reach output b3.
Of course, different control instructions correspond to different connection relationships in the switching node.
In some embodiments, the nodes further include connection nodes;
each connection node includes inputs and outputs in unequal numbers, where each input is connected to at least one output and each output is connected to at least one input.
Referring to Figure 3, the computation graph obtained by the embodiment of the present disclosure may also include connection nodes. A connection node likewise performs no substantive operation; it is used to connect each input to one or more outputs, that is, to implement data splitting (copying), merging, crossing, and so on, in order to change the direction of data flow.
A connection node can therefore be regarded as a special "switching node" (a static switching node); the difference from an actual switching node is that the connection relationship between the inputs and outputs of a connection node is fixed and does not change according to switching instructions.
It should be understood that the numbers of inputs and outputs of a connection node should differ; it cannot be the case that every input is connected to exactly one output, because the node would then be equivalent to one or more plain connections rather than a node.
For example, referring to Figure 3, the symbol inside the switching node has three prongs at its upper end and one prong at its lower end, which means that the node is used to connect all three of its inputs to that single output.
It should be understood that a switching node can also implement other functions, for example splitting (copying) one input to multiple outputs.
It should be understood that the meaning of the specific symbols shown inside the switching nodes in Figure 3 is merely exemplary and does not limit the embodiments of the present disclosure.
In some embodiments, before the nodes in the computation graph are mapped to processing units (S102), the method further includes at least one of the following:
S1011: merging two directly connected switching nodes into one switching node;
S1012: splitting one switching node into two directly connected switching nodes;
S1013: merging two control nodes connected to the same switching node into one control node;
S1014: splitting one control node into two control nodes connected to the same switching node.
As one form of the embodiment of the present disclosure, after the computation graph is obtained, the switching nodes and control nodes in it can be merged or split to further optimize the graph, and the nodes of the optimized graph are then mapped, so as to obtain better mapping results.
For example, referring to Figure 5, in the computation graph of Figure 3 two switching nodes are directly connected and are controlled by two control nodes respectively; the two can therefore be merged into a single switching node that is controlled by both control nodes at the same time and that implements a more complex switching function.
As another example, referring to Figure 5, after the switching nodes are merged, the two control nodes effectively control the same switching node; these two control nodes can therefore also be merged into one control node, which can generate more complex control instructions according to more complex rules to control that switching node.
It should be understood that splitting control nodes or switching nodes is the reverse of the above process and is not described in detail here.
It should be understood that Figure 3 also contains a connection node and, as mentioned above, a connection node can be regarded as a "static switching node" whose state does not change. Therefore, referring to Figure 5, the connection node can also be merged with a switching node; in the merged switching node, some of the connection relationships are simply fixed.
It should be understood that whether the above splitting and merging steps are performed each time the method of the embodiment of the present disclosure is executed, and to which nodes they are applied, can be determined according to the situation of the computation graph and of the processing units.
It should be understood that operation nodes can also be split and merged. For example, when the amount of computation required by a single operation node is too large for it to be mapped to one processing unit, it can be split into multiple serial or parallel operation nodes; conversely, multiple serial or parallel operation nodes can be merged into one operation node for processing.
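A minimal illustration of splitting one over-large operation node into parallel partial nodes plus a combining node follows; the summation example and the chunking scheme are assumptions chosen for clarity, not a procedure taken from the disclosure.

```python
def split_sum_node(values, parts):
    """Split one large summation node into `parts` parallel partial-sum nodes
    plus a final combining node; purely illustrative of the splitting described above."""
    chunk = (len(values) + parts - 1) // parts
    partial_nodes = [values[i * chunk:(i + 1) * chunk] for i in range(parts)]
    partial_sums = [sum(c) for c in partial_nodes]   # work that can run on separate units
    return sum(partial_sums)                          # the combining node

data = list(range(10))
assert split_sum_node(data, parts=3) == sum(data)
print(split_sum_node(data, parts=3))  # 45
```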
In some embodiments, mapping the nodes in the computation graph to processing units (S102) includes:
S1021: mapping the nodes in the computation graph to processing cores of a many-core system.
As one approach of the embodiment of the present disclosure, the nodes of the computation graph may each be mapped to processing cores of a many-core system; that is, the target operation is mapped to the many-core system.
A many-core system includes multiple processing cores (Core) with processing capability; data can be transmitted between the processing cores, and between each processing core and the outside of the many-core system, through a network on chip. By mapping some of the nodes of the computation graph to each processing core, the many-core system can be used to process the target operation, for example to run a neural network.
In some embodiments, mapping the nodes in the computation graph to processing units (S102) includes:
S1022: mapping the nodes in the computation graph to at least two different types of processing units,
where nodes of the same type are mapped to processing units of the same type.
As another approach of the embodiment of the present disclosure, multiple different types of processing units may be provided, each particularly suited to processing a different type of node (in this case the nodes of the computation graph are necessarily mapped to multiple processing units), so that each type of node can be mapped to processing units of its corresponding type, further exploiting the computing power of the processing units.
In some embodiments, when the nodes in the computation graph are mapped to at least two different types of processing units (S1022), at least one of the following is satisfied:
the operation nodes are mapped to processors;
the control nodes are mapped to logic controllers;
the switching nodes are mapped to switch circuits.
As a specific approach of the embodiment of the present disclosure, the operation nodes mainly perform actual operations (such as addition, subtraction, multiplication, division, and matrix operations), so they can be mapped to processors (such as processing cores) intended for general computation; the control nodes mainly perform logical operations (such as comparisons and judgments), so they can be mapped to logic controllers; and the switching nodes effectively act as "switches", so they can be mapped to switch circuits (such as gate circuits), where of course the switch circuits should be controlled by the control instructions.
In some embodiments, the operation nodes satisfy at least one of the following:
at least some of the operation nodes perform their operation only when all of their inputs have input data;
at least some of the operation nodes perform their operation as soon as at least one of their inputs has input data.
As noted above, the computation graph constructed according to the embodiment of the present disclosure includes control nodes and switching nodes. Because a switching node changes the direction of data flow, the input data available to an operation node may differ at different times; for example, sometimes all of its inputs have input data, and sometimes some or all of them do not.
Depending on the needs of each operation node, different "trigger rules" (start rules) can therefore be set for different operation nodes; that is, an operation node performs its operation only when its trigger rule is satisfied. For example, a trigger rule may require that all inputs have input data before the operation is performed, or it may specify that the operation is performed as soon as at least one input, or some specific subset of the inputs, has input data.
For example, if an operation node has three inputs and its sub-operation is to sum the input data as its output data, its trigger rule may be to perform the summation only when all three inputs have input data, or it may be to perform the summation as long as at least one input (or some specific subset of inputs) has input data (inputs without input data can be treated as 0).
It should be understood that a trigger rule must not contradict the requirements of the node's sub-operation. For example, if an operation node must operate on several input data, it certainly cannot be triggered when only one of its inputs has input data.
It should be understood that a control node performs a certain logical operation on its input data and generates a control instruction, and this too can be viewed as the control node working according to a trigger rule (that is, the control instruction is generated from the inputs only when the trigger rule is satisfied). For example, if a control node has two inputs, performs a "comparison operation" on them, and generates a control instruction for a specific comparison result (such as the two input data being equal), then its trigger rule can be viewed as "trigger when both inputs have input data and the two input data are equal".
It should be understood that a switching node determines the connection relationship between its inputs and outputs according to the control instructions, and this too can be viewed as the switching node working according to a trigger rule (that is, at least one input is connected to an output only when the trigger rule is satisfied). For example, suppose a switching node has one input and one output, is connected to control node 1, control node 2, and control node 3 at the same time, and connects its input to its output when control node 1 and control node 2 both have control instructions, and/or when control node 3 has a control instruction; then its trigger rule can be viewed as "trigger when at least one of the following is satisfied: (1) control node 1 and control node 2 both have control instructions; (2) control node 3 has a control instruction".
It should be understood that the connection relationship between the inputs and outputs of a connection node is fixed, so it can also be regarded as a special static switching node. Whether a connection node produces output data is therefore determined only by its input data, and this too can be viewed as the connection node working according to a trigger rule (that is, output data is produced only when the trigger rule is satisfied). For example, if all three inputs of a connection node are connected to the same output, its trigger rule can be viewed as "trigger as long as at least one input has input data".
In a second aspect, the present disclosure provides an electronic device.
Referring to Figure 6, the electronic device 500 of the embodiment of the present disclosure includes:
at least one processor 501;
a memory 502 communicatively connected to the at least one processor 501;
wherein the memory 502 stores one or more computer programs executable by the at least one processor 501, and the one or more computer programs are executed by the at least one processor 501 to enable the at least one processor 501 to perform the above mapping method.
In a third aspect, the present disclosure provides a computer-readable storage medium.
Referring to Figure 7, a computer program is stored on the computer-readable storage medium 600 of the embodiment of the present disclosure; the computer program, when executed by a processor (including a processing core), implements the above mapping method.
The computer-readable storage medium of the embodiment of the present disclosure may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying the computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device performs the above mapping method.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and apparatus, may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable program instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), static random access memory (SRAM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable program instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The computer-readable program instructions described here may be downloaded from a computer-readable storage medium to the respective computing/processing devices, or to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical-fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions used to carry out the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement the various aspects of the present disclosure.
The computer program product described here may be implemented in hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).
Aspects of the present disclosure are described here with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, so that the computer-readable storage medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices, causing a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer-implemented process, such that the instructions executed on the computer, other programmable apparatus, or other devices implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures; for example, two successive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, as will be apparent to those skilled in the art, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics, and/or elements described in connection with other embodiments, unless expressly stated otherwise. Accordingly, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Claims (10)

  1. A mapping method, comprising:
    constructing a computation graph having multiple nodes according to a target algorithm to be mapped;
    mapping the nodes in the computation graph to processing units;
    wherein each of the nodes is connected to at least one other node and has inputs and outputs; the nodes comprise operation nodes, control nodes, and switching nodes; each operation node is configured to perform a sub-operation on the input data from its inputs to produce output data at its outputs; each control node is configured to perform a logical operation on the input data from its inputs to produce a control instruction at its outputs; and each switching node is connected to at least one control node and is configured to control the connection relationship between its inputs and outputs according to the control instructions from the control nodes to which it is connected.
  2. The method according to claim 1, wherein the nodes further comprise connection nodes;
    each connection node comprises inputs and outputs in unequal numbers, wherein each input is connected to at least one output and each output is connected to at least one input.
  3. The method according to claim 1, wherein, before the mapping of the nodes in the computation graph to processing units, the method further comprises at least one of the following:
    merging two directly connected switching nodes into one switching node;
    splitting one switching node into two directly connected switching nodes.
  4. The method according to claim 1, wherein, before the mapping of the nodes in the computation graph to processing units, the method further comprises at least one of the following:
    merging two control nodes connected to the same switching node into one control node;
    splitting one control node into two control nodes connected to the same switching node.
  5. The method according to claim 1, wherein the mapping of the nodes in the computation graph to processing units comprises:
    mapping the nodes in the computation graph to processing cores of a many-core system.
  6. The method according to claim 1, wherein the mapping of the nodes in the computation graph to processing units comprises:
    mapping the nodes in the computation graph to at least two different types of processing units, wherein nodes of the same type are mapped to processing units of the same type.
  7. The method according to claim 6, wherein, when the nodes in the computation graph are mapped to at least two different types of processing units, at least one of the following is satisfied:
    the operation nodes are mapped to processors;
    the control nodes are mapped to logic controllers;
    the switching nodes are mapped to switch circuits.
  8. The method according to claim 1, wherein the operation nodes satisfy at least one of the following:
    at least some of the operation nodes perform their operation only when all of their inputs have input data;
    at least some of the operation nodes perform their operation as soon as at least one of their inputs has input data.
  9. An electronic device, comprising:
    at least one processor;
    a memory communicatively connected to the at least one processor; wherein
    the memory stores one or more computer programs executable by the at least one processor, and the one or more computer programs are executed by the at least one processor to enable the at least one processor to perform the mapping method according to any one of claims 1 to 8.
  10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the mapping method according to any one of claims 1 to 8.
PCT/CN2022/092624 2022-05-06 2022-05-13 Mapping method, electronic device, and computer-readable storage medium WO2023212975A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210486838.0A CN114860443A (zh) 2022-05-06 2022-05-06 Mapping method, electronic device, and computer-readable storage medium
CN202210486838.0 2022-05-06

Publications (1)

Publication Number Publication Date
WO2023212975A1 true WO2023212975A1 (zh) 2023-11-09

Family

ID=82635934

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/092624 WO2023212975A1 (zh) 2022-05-06 2022-05-13 映射的方法、电子设备、计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN114860443A (zh)
WO (1) WO2023212975A1 (zh)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080184041A1 (en) * 2007-01-31 2008-07-31 Microsoft Corporation Graph-Based Tamper Resistance Modeling For Software Protection
CN108292374A (zh) * 2015-11-09 2018-07-17 谷歌有限责任公司 训练表示为计算图的神经网络
CN110321064A (zh) * 2018-03-30 2019-10-11 北京深鉴智能科技有限公司 用于神经网络的计算平台实现方法及系统
CN114330735A (zh) * 2020-09-30 2022-04-12 伊姆西Ip控股有限责任公司 处理机器学习模型的方法、电子设备和计算机程序产品
CN112862068A (zh) * 2021-01-15 2021-05-28 复旦大学 面向复杂卷积神经网络的容错架构及方法
CN114356511A (zh) * 2021-08-16 2022-04-15 中电长城网际系统应用有限公司 任务分配方法、系统

Also Published As

Publication number Publication date
CN114860443A (zh) 2022-08-05

Similar Documents

Publication Publication Date Title
JP7308003B2 (ja) Decentralized distributed deep learning
TW201807621A (zh) Artificial neural network, artificial neuron, and control method thereof
CN109684087B (zh) Operation method, apparatus, and related products
WO2019239254A1 (en) Parallel computational architecture with reconfigurable core-level and vector-level parallelism
US20200311539A1 (en) Cloud computing data compression for allreduce in deep learning
WO2022171066A1 (zh) Task allocation method based on Internet-of-Things devices, network training method, and apparatus
US20210209450A1 (en) Compressed weight distribution in networks of neural processors
JP7256811B2 (ja) Method and system for accelerating AI training using advanced interconnect technology
US20190332924A1 (en) Central scheduler and instruction dispatcher for a neural inference processor
CN110034920B (zh) Mapping method and apparatus for a coarse-grained reconfigurable cryptographic logic array
US20220207327A1 (en) Method for dividing processing capabilities of artificial intelligence between devices and servers in network environment
CN112868033A (zh) Systems and methods for providing machine learning models with adjustable computational requirements
CN114861941A (zh) Multi-model fusion method and apparatus, electronic device, and computer-readable storage medium
WO2023212975A1 (zh) Mapping method, electronic device, and computer-readable storage medium
CA3163625C (en) Optimal interpretable decision trees using integer programming techniques
CN116097281A (zh) Hyperparameter transfer via the theory of infinite-width neural networks
CN116909748A (zh) Computing power resource allocation method and apparatus, electronic device, and storage medium
Ortner et al. Online spatio-temporal learning with target projection
US20190325295A1 (en) Time, space, and energy efficient neural inference via parallelism and on-chip memory
WO2022063157A1 (zh) Parameter configuration method and related system
WO2022098495A1 (en) Compiler configurable to generate instructions executable by different deep learning accelerators from a description of an artificial neural network
CN114443824A (zh) Data processing method and apparatus, electronic device, and computer storage medium
CA3151974A1 (en) Automated real time mortgage servicing and whole loan valuation
WO2021077284A1 (zh) Neural network operation system and method
US20240028899A1 (en) Stickification using anywhere padding to accelerate data manipulation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22940669

Country of ref document: EP

Kind code of ref document: A1