WO2021164506A1 - Data processing method, apparatus, device and storage medium for neural network model - Google Patents

Data processing method, apparatus, device and storage medium for neural network model

Info

Publication number
WO2021164506A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
calculation
fused
network operators
operator
Prior art date
Application number
PCT/CN2021/073758
Other languages
English (en)
French (fr)
Inventor
黄炯凯
蔡权雄
牛昕宇
Original Assignee
深圳鲲云信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳鲲云信息科技有限公司
Priority to US17/800,689 (published as US20240220765A1)
Publication of WO2021164506A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The embodiments of the present application relate to the technical field of computing boards, for example, to a data processing method, apparatus, device, and storage medium for a neural network model.
  • With the development of artificial intelligence technology, many scalable deep learning systems have emerged. A deep learning system can be configured to provide a variety of neural network models that run on processors such as a central processing unit (CPU) or a graphics processing unit (GPU).
  • There are many kinds of deep learning frameworks, and framework versions iterate quickly, so fusion technology needs to be designed according to the architectural characteristics of each framework.
  • When a processor runs a neural network model, such as a Caffe network model, it compiles and parses the multiple computing nodes in the neural network model each time, and executes the operations of the multiple computing nodes in a certain form according to the structure of the neural network model.
  • When the above computing nodes are executed on different processors, frequent switching between different processors occurs, communication between different processors is frequent, and data is copied multiple times, which reduces the computation speed of the neural network model.
  • This application provides a data processing method, apparatus, device, and storage medium for a neural network model, so as to improve the speed and efficiency of data flow.
  • In one embodiment, an embodiment of the present application provides a data processing method for a neural network model, including: obtaining multiple neural network operators in the neural network model; fusing the multiple neural network operators according to a preset rule to obtain a fused neural network operator; combining the fused neural network operators into calculation instructions; and using a calculation engine to compute the calculation instructions.
  • Optionally, before fusing the multiple neural network operators according to the preset rule, the method further includes: judging whether the multiple neural network operators can be fused, and in response to a judgment result that the multiple neural network operators can be fused, fusing the multiple neural network operators according to the preset rule to obtain a fused neural network operator.
  • Optionally, after judging whether the multiple neural network operators can be fused, the method further includes: in response to a judgment result that the multiple neural network operators cannot be fused, continuing to obtain new neural network operators.
  • Optionally, fusing the multiple neural network operators according to the preset rule to obtain a fused neural network operator includes: arranging the multiple neural network operators in the order of convolution, activation function, pooling/upsampling, shortcut, activation function, and global pooling, and fusing the arranged neural network operators to obtain a fused neural network operator.
  • Optionally, using the calculation engine to compute the calculation instruction includes: judging whether the calculation instruction is the only data stream operation, and in response to a judgment result that the calculation instruction is the only data stream operation, using the calculation engine to compute the calculation instruction.
  • Optionally, after judging whether the calculation instruction is the only data stream operation, the method further includes: in response to a judgment result that the calculation instruction is not the only data stream operation, recombining the calculation instructions according to the fused neural network operators.
  • Optionally, before using the calculation engine to compute the calculation instruction, the method further includes: parsing the calculation instruction.
  • In one embodiment, an embodiment of the present application further provides a data processing apparatus for a neural network model, including:
  • an acquisition module, configured to acquire multiple neural network operators in the neural network model;
  • a fusion module, configured to fuse the multiple neural network operators according to a preset rule to obtain a fused neural network operator;
  • a combination module, configured to combine the fused neural network operators into calculation instructions; and
  • a calculation module, configured to use a calculation engine to compute the calculation instructions.
  • In one embodiment, an embodiment of the present application further provides a neural network data processing device, including one or more processors;
  • a storage apparatus, configured to store one or more programs,
  • wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the foregoing method.
  • In one embodiment, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; the computer program includes program instructions that, when executed by a processor, implement the foregoing method.
  • The embodiments of the application disclose a data processing method, apparatus, device, and storage medium for a neural network model.
  • The method includes: obtaining multiple neural network operators in the neural network model; fusing the multiple neural network operators according to a preset rule to obtain a fused neural network operator; combining the fused neural network operators into calculation instructions; and using a calculation engine to compute the calculation instructions.
  • The data processing method for a neural network model provided in this application speeds up the calculation process and solves the problem that data flow calculation cannot be realized quickly in the related art: after the algorithm instructions reach the driver, multiple operators are parsed into a single data flow process, and multiple data flow processes together implement the data flow of one neural network, so that the data flow speed can reach the highest efficiency.
  • FIG. 1 is a flowchart of a data processing method of a neural network model provided in an embodiment of the application
  • FIG. 2 is a flowchart of a data processing method of a neural network model provided in an embodiment of the application
  • FIG. 3 is a schematic structural diagram of a data processing device of a neural network model provided in an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
  • Some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes multiple steps as sequential processing, many of the steps can be implemented in parallel, concurrently, or simultaneously. In addition, the order of multiple steps can be rearranged. Processing may be terminated when its operations are completed, but may also have additional steps not included in the drawings. Processing may correspond to methods, functions, procedures, subroutines, subprograms, and so on.
  • The terms "first", "second", etc. may be used herein to describe various directions, actions, steps, elements, and the like, but these directions, actions, steps, and elements are not limited by these terms. These terms are only used to distinguish a first direction, action, step, or element from another direction, action, step, or element.
  • For example, a first calculation engine may be referred to as a second calculation engine, and similarly, a second calculation engine may be referred to as a first calculation engine. Both the first calculation engine and the second calculation engine are calculation engines, but they are not the same calculation engine.
  • The terms "first" and "second" are not to be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature.
  • "Plurality" means at least two, such as two or three, unless expressly defined otherwise.
  • FIG. 1 is a flowchart of a data processing method for a neural network model provided by an embodiment of this application. This embodiment is applicable to the case where fused calculation operators accelerate a data flow architecture; the method includes steps 100 to 130.
  • Step 100: Obtain multiple neural network operators in the neural network model.
  • In one embodiment, a neural network model includes multiple algorithm instructions, one algorithm instruction includes multiple neural network operators, and the operation flow includes the multiple neural network operators of a multi-layer structure and the connection relationships between them. After the calculation engine processes the calculation instructions of the neural network model, the information of all the neural network operators is obtained, including operation symbols, operation parameters, the connection relationships between multiple operators, and so on, as sketched below.
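  • The following is a minimal sketch of one way to record the operator information gathered in this step. The field names and the `model.nodes` accessor are illustrative assumptions, not an API defined by this application.

```python
from dataclasses import dataclass, field

@dataclass
class Operator:
    """Hypothetical record of one neural network operator."""
    name: str                                   # node identifier, e.g. "conv1"
    op_type: str                                # operation symbol, e.g. "conv", "activation"
    params: dict = field(default_factory=dict)  # operation parameters (kernel size, stride, ...)
    inputs: list = field(default_factory=list)  # names of upstream operators (connection relationship)

def collect_operators(model) -> list[Operator]:
    # Walk an already-parsed model and gather every operator together with
    # its operation symbol, parameters, and connection relationships.
    return [Operator(n.name, n.op_type, dict(n.params), list(n.inputs))
            for n in model.nodes]
```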
  • Step 110: Fuse the multiple neural network operators according to a preset rule to obtain a fused neural network operator.
  • In this embodiment, the multiple neural network operators in the neural network are fused according to a preset fusion rule (i.e., the preset rule).
  • The fusion process includes selecting a target operator from the multiple neural network operators, obtaining the connection relationship between the target operator and its lower-layer operators, and determining the fusion relationship according to the connection relationship.
  • Exemplarily, a first neural network model includes a multi-layer structure; the calculation process includes the multiple neural network operators of the multi-layer structure and the connection relationships between them, and each layer corresponds to at least one neural network operator. An architecture fusion apparatus for the neural network model generates the calculation graph of the first neural network model from the calculation process as follows: the apparatus selects a target operator from the multiple neural network operators, the target operator being the starting node of a directed acyclic graph; the apparatus obtains the lower-layer operators of the target operator and the connection relationships between the target operator and the lower-layer operators; and the apparatus connects the lower-layer nodes corresponding to the lower-layer operators with the starting node according to those connection relationships to obtain the directed acyclic graph.
  • The architecture fusion apparatus determines, according to the information of the at least two processing units corresponding to the multiple neural network operators, the N fusable nodes and M non-fusable nodes in the directed acyclic graph, where the neural network operator corresponding to a fusable node is an operator executed on an image processing unit (IPU) and both N and M are integers greater than 1. The apparatus divides the N fusable nodes into fusion segments to obtain a segment-divided directed acyclic graph containing P fusion segments, where P is an integer greater than or equal to 1 and less than or equal to N. The apparatus obtains the Q paths and M node layers of the M non-fusable nodes in the directed acyclic graph, where Q is greater than M and each non-fusable node corresponds to at least one path and one node layer. The apparatus then simplifies the segment-divided directed acyclic graph according to the Q paths and the M node layers to obtain the fused directed acyclic graph.
  • The neural network operator corresponding to a non-fusable node is an operator that is not executed on the IPU. Each fusion segment is a subgraph of the directed acyclic graph; the operators corresponding to the fusable nodes in the same fusion segment are all executed on the IPU, and executing them there requires neither switching processing units nor multiple data copies.
  • In some embodiments, the architecture fusion apparatus obtains the Q paths and M node layers of the M non-fusable nodes as follows: it traverses the directed acyclic graph layer by layer starting from the first node layer and obtains at least one path and one node layer for each non-fusable node, thereby obtaining the Q paths and M node layers of the M non-fusable nodes in the directed acyclic graph.
  • The architecture fusion apparatus obtains the node connection relationships among the N fusable nodes and divides the N fusable nodes into fusion segments as follows: if the node connection relationship between fusable node m and fusable node n is that of adjacent nodes in the same node layer or parent-child nodes in different node layers, the apparatus divides fusable node m and fusable node n into the same fusion segment, where fusable node m and fusable node n are each any one of the N fusable nodes.
  • The architecture fusion apparatus simplifies the segment-divided directed acyclic graph according to the Q paths and the M node layers as follows: it obtains the node position relationships among the M non-fusable nodes; if the operator corresponding to non-fusable node p is the same as the operator corresponding to non-fusable node q, the apparatus determines the node position relationship between p and q, where p and q are each any one of the M non-fusable nodes; if p and q lie at different node layers and on different paths, the apparatus points the edges that point to node p at node q instead, adds an edge from node q to the node that the outgoing edge of node p pointed to, and deletes node p. The operator corresponding to node q then receives data sent by different nodes at different times and performs the calculations, the number of different nodes being the same as the number of different times. Symmetrically, the apparatus may instead point the edges that point to node q at node p, add an edge from node p to the node that the outgoing edge of node q pointed to, and delete node q; in that case the neural network operator corresponding to node p receives data sent by different nodes at different times and performs the calculations. One way to read this graph transformation in code is sketched below.
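  • The sketch below is one illustrative reading of the directed-acyclic-graph manipulation just described, assuming a `networkx`-style graph. The predicate `runs_on_ipu` and the helpers `same_operator` and `on_disjoint_paths` are assumptions for illustration, not names defined by this application, and the segment rule is simplified to edge-connected (parent-child) fusable nodes.

```python
import networkx as nx

def build_dag(operators):
    # Connect each operator's lower-layer nodes to obtain the directed acyclic
    # graph; fusable nodes are those whose operator executes on the IPU.
    g = nx.DiGraph()
    for op in operators:
        g.add_node(op.name, op=op, fusable=runs_on_ipu(op))  # runs_on_ipu: assumed predicate
        for src in op.inputs:
            g.add_edge(src, op.name)
    return g

def fusion_segments(g):
    # Divide the fusable nodes into fusion segments; here, edge-connected
    # fusable nodes (the parent-child case) fall into the same segment.
    fusable = g.subgraph(n for n, d in g.nodes(data=True) if d["fusable"])
    return [set(c) for c in nx.weakly_connected_components(fusable)]

def merge_duplicate_non_fusable(g):
    # If non-fusable nodes p and q run the same operator on different node
    # layers and different paths, redirect p's edges into q and delete p, so
    # that q receives data from different nodes at different times.
    non_fusable = [n for n, d in g.nodes(data=True) if not d["fusable"]]
    for p in non_fusable:
        for q in non_fusable:
            if p == q or not g.has_node(p) or not g.has_node(q):
                continue
            if same_operator(g, p, q) and on_disjoint_paths(g, p, q):  # assumed helpers
                for u in list(g.predecessors(p)):
                    g.add_edge(u, q)      # edges pointing to p now point to q
                for v in list(g.successors(p)):
                    g.add_edge(q, v)      # q gains an edge to the node p pointed to
                g.remove_node(p)
                break
```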
  • Step 120: Combine the fused neural network operators into calculation instructions.
  • In this embodiment, in the neural network model one algorithm instruction includes multiple neural network operators; the fused neural network operators are combined into multiple calculation instructions, which makes it convenient to allocate an appropriate number of calculation engines for computation, as sketched below.
  • In one embodiment, algorithm instructions and calculation instructions may differ, and one calculation instruction may contain one or more algorithm instructions.
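  • A minimal sketch of this combination step follows. The application only states that fused operators are combined into calculation instructions, so the fixed-size packing policy here is an invented placeholder for illustration.

```python
def build_instructions(fused_ops, ops_per_instruction=4):
    # Pack fused neural network operators into calculation instructions so
    # that a suitable number of calculation engines can later be allocated.
    return [fused_ops[i:i + ops_per_instruction]
            for i in range(0, len(fused_ops), ops_per_instruction)]
```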
  • Step 130: Use a calculation engine to compute the calculation instructions.
  • In this embodiment, according to the number of calculation instructions from step 120, an appropriate number of calculation engines is selected to compute the calculation instructions. At least one calculation engine is used, the number being determined by the priority of the calculation task; the more calculation engines, the faster the task is processed.
  • Exemplarily, in deep learning computation, multiple calculation engines are used to recognize an image: the more calculation engines are used, the faster the image is compared with the images in the database, and the shorter the time to output the comparison result. A dispatch sketch follows.
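  • The engine allocation might look like the sketch below, where the number of engines scales with task priority. The priority-to-engine mapping and `execute_on_engine` are assumptions standing in for whatever the real calculation engine exposes.

```python
from concurrent.futures import ThreadPoolExecutor

def run_instructions(instructions, priority, max_engines=8):
    # Higher-priority tasks get more calculation engines, so they finish sooner.
    num_engines = min(max_engines, max(1, priority))
    with ThreadPoolExecutor(max_workers=num_engines) as pool:
        return list(pool.map(execute_on_engine, instructions))  # execute_on_engine: assumed
```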
  • The technical solution of this embodiment of the present application obtains multiple neural network operators in a neural network model; fuses the multiple neural network operators according to a preset rule to obtain fused neural network operators; combines the fused neural network operators into calculation instructions; and uses a calculation engine to compute the calculation instructions. This speeds up the calculation process, solves the problem that data flow calculation cannot be realized quickly in the related art, and realizes that, after the algorithm instructions reach the driver, multiple operators are parsed into a single data flow process, with multiple data flow processes together implementing the data flow of one neural network, so that the data flow speed can reach the highest efficiency.
  • FIG. 2 is a flowchart of a data processing method for a neural network model in an embodiment of the application. This embodiment is an optional embodiment based on the first embodiment. In one implementation, the method includes steps 200 to 250.
  • Step 200: Obtain multiple neural network operators in the neural network model.
  • Step 210: Judge whether the multiple neural network operators can be fused; in response to a judgment result that the multiple neural network operators can be fused, fuse them according to a preset rule to obtain a fused neural network operator; in response to a judgment result that they cannot be fused, continue to obtain new neural network operators.
  • In this embodiment, before fusing the neural network operators it is necessary to judge whether these operators can be fused; the fusion method refers to the example in the first embodiment.
  • When the neural network operators can be fused, they are arranged in the order of convolution, activation function, pooling/upsampling, shortcut, activation function, and global pooling, and the arranged neural network operators are fused to obtain the fused neural network operator.
  • Here, convolution refers to the convolutional layer in a neural network, the core building block of a convolutional network that performs most of the heavy computation in the network; the parameters of a convolutional layer consist of a set of learnable filters.
  • The function of pooling is to gradually reduce the spatial size of the representation, reducing the parameters and computation in the network and thereby controlling overfitting.
  • The pooling layer operates independently on each depth slice of the input and resizes it spatially using the MAX operation, as illustrated in the sketch below.
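  • As a concrete illustration of that MAX operation, the following NumPy sketch applies 2x2 max pooling with stride 2 to an (H, W, C) tensor, handling each depth slice independently; the shape convention is an assumption for the example.

```python
import numpy as np

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    # Each depth slice (last axis) is pooled independently; spatial size halves.
    h, w, c = x.shape
    x = x[:h - h % 2, :w - w % 2, :]                        # trim odd rows/columns
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))
```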
  • An activation function is generally used between the layers of a neural network to transform the output of the previous layer before feeding it to the next layer. Without the nonlinearity introduced by activation functions, a neural network would be equivalent to the matrix multiplications of the original perceptron.
  • Activation functions are nonlinear: when the activation function is nonlinear, a two-layer neural network can be proved to approximate arbitrarily complex functions.
  • Activation functions should be continuously differentiable: since neural network training is a gradient-based optimization method whose mathematical foundation is continuous differentiability, the selected activation function should also be continuously differentiable.
  • The step activation function is discontinuous at 0 and its derivative is 0 everywhere except at 0, so gradient-based methods cannot be used with it.
  • When the value range of an activation function is bounded, gradient-based training methods tend to be more stable, because the representation of features is affected more significantly by the limited weights.
  • When the range is unbounded, training is usually more efficient, because the representation of features significantly affects most of the weights; in this case, a smaller learning rate is usually required.
  • Activation functions are monotonic: when the activation function is monotonic, the error surface of a single-layer network is guaranteed to be convex.
  • If the current neural network operators cannot be fused, new neural network operators are obtained for fusion. After new operators are obtained, the neural network operators are arranged according to the preset rule, i.e., in the order convolution + activation function + pooling/upsampling + shortcut + activation function + global pooling, and it is judged whether the arranged operators can be fused; if they can, they are fused according to the fusion method of the first embodiment, as sketched below.
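  • The sketch below shows one way to implement the arrangement-and-check just described. The slot names in `FUSION_PATTERN` mirror the preset rule, while the `op_type` attribute comes from the operator record assumed earlier; both are illustrative, not an API of this application.

```python
FUSION_PATTERN = [("conv",), ("activation",), ("pool", "upsample"),
                  ("shortcut",), ("activation",), ("global_pool",)]

def arrange_for_fusion(ops):
    # Arrange operators in the preset order; return None when the group cannot
    # be fused, in which case the caller keeps fetching new operators.
    arranged, remaining = [], list(ops)
    for slot in FUSION_PATTERN:
        match = next((op for op in remaining if op.op_type in slot), None)
        if match is None:
            return None                       # incomplete pattern: not fusable
        arranged.append(match)
        remaining.remove(match)
    return arranged
```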
  • Step 220: Fuse the multiple neural network operators according to the preset rule to obtain a fused neural network operator.
  • Step 230: Combine the fused neural network operators into calculation instructions.
  • Step 240: Parse the calculation instructions.
  • In this embodiment, parsing a calculation instruction includes disassembling it into multiple neural network operators, determining one or more neural network data streams according to these operators, and allocating the corresponding calculation engines for computation according to the data streams; a minimal parsing sketch follows.
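  • This parsing step might be sketched as follows, assuming a calculation instruction is a list of the operator records used earlier and that operators connected to each other form one data stream; both assumptions go beyond what the text specifies.

```python
def parse_instruction(instruction):
    # Disassemble the calculation instruction into operators and group
    # connected operators into neural network data streams (one stream per chain).
    streams, current = [], []
    for op in instruction:                    # instruction: a list of fused operators
        if current and not set(op.inputs) & {o.name for o in current}:
            streams.append(current)           # no link to the running chain: new stream
            current = []
        current.append(op)
    if current:
        streams.append(current)
    return streams
```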
  • Step 250: Judge whether the calculation instruction is the only data stream operation; in response to a judgment result that it is the only data stream operation, use the calculation engine to compute it; in response to a judgment result that it is not, recombine the calculation instructions according to the fused neural network operators.
  • In this embodiment, the neural network data streams obtained by parsing the calculation instruction are examined to judge whether there is only one data stream. In neural network computation, a calculation engine can generally process only one data stream at a time, so the simultaneous existence of multiple data streams affects the computation time of the neural network. First it is judged whether there is only one data stream: if so, a calculation engine is directly allocated to compute it; computing only one neural network data stream greatly saves the processing time of the neural network and improves the speed and efficiency of the data stream. If not, the neural network operators obtained by splitting the calculation instruction are re-fused with new neural network operators to form new calculation instructions; the multiple new calculation instructions are then recombined, and the check for a single data stream continues, as in the driver loop sketched below.
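  • Putting steps 240 and 250 together, the driver-side loop might look like this sketch, reusing the earlier `parse_instruction` and `build_instructions` sketches; `re_fuse` stands for the re-fusion with new operators described above and is assumed, as is `engine.compute`.

```python
def drive(instructions, engine):
    queue = list(instructions)
    while queue:
        instr = queue.pop(0)
        streams = parse_instruction(instr)
        if len(streams) == 1:                 # the only data stream: compute it directly
            engine.compute(streams[0])
        else:                                 # re-fuse operators and recombine instructions
            fused = re_fuse([op for s in streams for op in s])  # re_fuse: assumed helper
            queue.extend(build_instructions(fused))
```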
  • The technical solution of this embodiment of the present application obtains multiple neural network operators in the neural network model; judges whether the multiple neural network operators can be fused and, if so, fuses them according to a preset rule to obtain fused neural network operators, otherwise continues to obtain new neural network operators; fuses the multiple neural network operators according to the preset rule to obtain fused neural network operators; combines the fused neural network operators into calculation instructions; parses the calculation instructions; and judges whether each calculation instruction is the only data stream operation, computing it with a calculation engine if it is and recombining the calculation instructions according to the fused neural network operators if it is not. This speeds up the calculation process, solves the problem that data flow calculation cannot be realized quickly in the related art, and realizes that, after the algorithm instructions reach the driver, multiple operators are parsed into a single data flow process, with multiple data flow processes together implementing the data flow of one neural network, so that the data flow speed can reach the highest efficiency.
  • FIG. 3 is a schematic structural diagram of a data processing apparatus 300 for a neural network model in an embodiment of the present application.
  • Referring to FIG. 3, the data processing apparatus 300 for a neural network model provided by the embodiment of the present application may include:
  • an obtaining module 310, configured to obtain multiple neural network operators in the neural network model;
  • a fusion module 320, configured to fuse the multiple neural network operators according to a preset rule to obtain a fused neural network operator;
  • a combination module 330, configured to combine the fused neural network operators into calculation instructions; and
  • a calculation module 340, configured to use a calculation engine to compute the calculation instructions.
  • In one embodiment, the apparatus is further configured to: before fusing the multiple neural network operators according to a preset rule to obtain a fused neural network operator, judge whether the multiple neural network operators can be fused, and in response to a judgment result that they can be fused, fuse the multiple neural network operators according to the preset rule to obtain a fused neural network operator.
  • In one embodiment, the apparatus is further configured to: after judging whether the multiple neural network operators can be fused, in response to a judgment result that they cannot be fused, continue to obtain new neural network operators.
  • In one embodiment, the fusion module is configured to arrange the multiple neural network operators in the order of convolution, activation function, pooling/upsampling, shortcut, activation function, and global pooling, and fuse the arranged neural network operators to obtain the fused neural network operator.
  • In one embodiment, the calculation module is configured to judge whether the calculation instruction is the only data stream operation and, in response to a judgment result that it is, use the calculation engine to compute the calculation instruction.
  • In one embodiment, the calculation module is configured to, after judging whether the calculation instruction is the only data stream operation, in response to a judgment result that it is not, recombine the calculation instructions according to the fused neural network operators.
  • In one embodiment, the apparatus is further configured to parse the calculation instructions.
  • The neural network data processing apparatus provided in this embodiment of the application obtains multiple neural network operators in a neural network model; fuses the multiple neural network operators according to a preset rule to obtain fused neural network operators; combines the fused neural network operators into calculation instructions; and uses a calculation engine to compute the calculation instructions. This speeds up the calculation process, solves the problem that data flow calculation cannot be realized quickly in the related art, and realizes that, after the algorithm instructions reach the driver, multiple operators are parsed into a single data flow process, with multiple data flow processes together implementing the data flow of one neural network, so that the data flow speed can reach the highest efficiency.
  • FIG. 4 is a schematic structural diagram of a computer device provided by an embodiment of the application. As shown in FIG. 4, the computer device includes a memory 410 and a processor 420; the number of processors 420 in the computer device may be one or more, and one processor 420 is taken as an example in FIG. 4. The memory 410 and the processor 420 in the device may be connected through a bus or in other ways; connection through a bus is taken as an example in FIG. 4.
  • As a computer-readable storage medium, the memory 410 can be configured to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present application (for example, the obtaining module 310, the fusion module 320, the combination module 330, and the calculation module 340 in the data processing apparatus of the neural network model).
  • By running the software programs, instructions, and modules stored in the memory 410, the processor 420 executes the various functional applications and data processing of the device/terminal, thereby implementing the foregoing method.
  • The processor 420 is configured to run a computer program stored in the memory 410 and implement the following steps: obtaining multiple neural network operators in the neural network model; fusing the multiple neural network operators according to a preset rule to obtain a fused neural network operator; combining the fused neural network operators into calculation instructions; and using a calculation engine to compute the calculation instructions.
  • In the computer device provided in this embodiment of the present application, the computer program is not limited to the above method operations and can also execute the method provided in any embodiment of the present application.
  • The memory 410 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the terminal, and the like.
  • the memory 410 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • In some examples, the memory 410 may include memory located remotely relative to the processor 420, and such remote memory may be connected to the device/terminal through a network. Examples of such networks include the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • The technical solution provided in the embodiments of this application obtains multiple neural network operators in a neural network model; fuses the multiple neural network operators according to a preset rule to obtain fused neural network operators; combines the fused neural network operators into calculation instructions; and uses a calculation engine to compute the calculation instructions. This speeds up the calculation process, solves the problem that data flow calculation cannot be realized quickly in the related art, and realizes that, after the algorithm instructions reach the driver, multiple operators are parsed into a single data flow process, with multiple data flow processes together implementing the data flow of one neural network, so that the data flow speed can reach the highest efficiency.
  • Embodiment 5 of the present application further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the above method, the method including: obtaining multiple neural network operators in the neural network model; fusing the multiple neural network operators according to a preset rule to obtain a fused neural network operator; combining the fused neural network operators into calculation instructions; and using a calculation engine to compute the calculation instructions.
  • In the storage medium containing computer-executable instructions provided by an embodiment of the present application, the computer-executable instructions are not limited to the method operations described above and can also execute the methods provided in any embodiment of the present application.
  • the computer-readable storage medium of the embodiment of the present application may adopt a combination of one or more computer-readable media.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium may be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination thereof.
  • Computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • In this document, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and computer-readable program code is carried therein. This propagated data signal can take many forms, including electromagnetic signals, optical signals, or a suitable combination of the above.
  • the computer-readable signal medium may also be a computer-readable medium other than a computer-readable storage medium, and the computer-readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • The program code contained on the storage medium may be transmitted by any suitable medium, including wireless, wire, optical cable, radio frequency (RF), etc., or any suitable combination of the above.
  • Computer program code for performing the operations of this application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • The program code may execute entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or terminal.
  • In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • The storage medium provided by the embodiments of the present application obtains multiple neural network operators in a neural network model; fuses the multiple neural network operators according to a preset rule to obtain fused neural network operators; combines the fused neural network operators into calculation instructions; and uses a calculation engine to compute the calculation instructions. This speeds up the calculation process, solves the problem that data flow calculation cannot be realized quickly in the related art, and realizes that, after the algorithm instructions reach the driver, multiple operators are parsed into a single data flow process, with multiple data flow processes together implementing the data flow of one neural network, so that the data flow speed can reach the highest efficiency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present application disclose a data processing method, apparatus, device, and storage medium for a neural network model. The method includes: obtaining multiple neural network operators in the neural network model; fusing the multiple neural network operators according to a preset rule to obtain a fused neural network operator; combining the fused neural network operators into calculation instructions; and using a calculation engine to compute the calculation instructions.

Description

Data processing method, apparatus, device and storage medium for neural network model
This application claims priority to Chinese patent application No. 202010099460.X, filed with the China Patent Office on February 18, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of computing boards, for example, to a data processing method, apparatus, device, and storage medium for a neural network model.
Background
With the development of artificial intelligence technology, many scalable deep learning systems have emerged. A deep learning system can be configured to provide a variety of neural network models that run on processors such as a central processing unit (CPU) or a graphics processing unit (GPU). There are many kinds of deep learning frameworks, and framework versions iterate quickly, so fusion technology needs to be designed according to the architectural characteristics of each framework.
When a processor runs a neural network model, such as a Caffe network model, it compiles and parses the multiple computing nodes in the neural network model each time, and executes the operations of the multiple computing nodes in a certain form according to the structure of the neural network model. When the above computing nodes are executed on different processors, frequent switching between different processors occurs, communication between different processors is frequent, and data is copied multiple times, which reduces the computation speed of the neural network model.
Summary
This application provides a data processing method, apparatus, device, and storage medium for a neural network model, so as to improve the speed and efficiency of data flow.
In one embodiment, an embodiment of the present application provides a data processing method for a neural network model, including: obtaining multiple neural network operators in the neural network model;
fusing the multiple neural network operators according to a preset rule to obtain a fused neural network operator;
combining the fused neural network operators into calculation instructions; and
using a calculation engine to compute the calculation instructions.
Optionally, before fusing the multiple neural network operators according to the preset rule to obtain the fused neural network operator, the method further includes: judging whether the multiple neural network operators can be fused, and in response to a judgment result that the multiple neural network operators can be fused, fusing the multiple neural network operators according to the preset rule to obtain the fused neural network operator.
Optionally, after judging whether the multiple neural network operators can be fused, the method further includes: in response to a judgment result that the multiple neural network operators cannot be fused, continuing to obtain new neural network operators.
Optionally, fusing the multiple neural network operators according to the preset rule to obtain the fused neural network operator includes: arranging the multiple neural network operators in the order of convolution, activation function, pooling/upsampling, shortcut, activation function, and global pooling, and fusing the arranged neural network operators to obtain the fused neural network operator.
Optionally, using the calculation engine to compute the calculation instruction includes: judging whether the calculation instruction is the only data stream operation, and in response to a judgment result that the calculation instruction is the only data stream operation, using the calculation engine to compute the calculation instruction.
Optionally, after judging whether the calculation instruction is the only data stream operation, the method further includes: in response to a judgment result that the calculation instruction is not the only data stream operation, recombining the calculation instructions according to the fused neural network operator.
Optionally, before using the calculation engine to compute the calculation instruction, the method further includes: parsing the calculation instruction.
In one embodiment, an embodiment of the present application further provides a data processing apparatus for a neural network model, including:
an acquisition module, configured to acquire multiple neural network operators in the neural network model;
a fusion module, configured to fuse the multiple neural network operators according to a preset rule to obtain a fused neural network operator;
a combination module, configured to combine the fused neural network operators into calculation instructions; and
a calculation module, configured to use a calculation engine to compute the calculation instructions.
In one embodiment, an embodiment of the present application further provides a neural network data processing device, including one or more processors; and
a storage apparatus, configured to store one or more programs,
wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the above method.
In one embodiment, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, the computer program including program instructions which, when executed by a processor, implement the above method.
The embodiments of the present application disclose a data processing method, apparatus, device, and storage medium for a neural network model. The method includes: obtaining multiple neural network operators in the neural network model; fusing the multiple neural network operators according to a preset rule to obtain a fused neural network operator; combining the fused neural network operators into calculation instructions; and using a calculation engine to compute the calculation instructions. The data processing method for a neural network model provided in this application speeds up the calculation process and solves the problem that data flow calculation cannot be realized quickly in the related art: after the algorithm instructions reach the driver, multiple operators are parsed into a single data flow process, and multiple data flow processes together implement the data flow of one neural network, so that the data flow speed can reach the highest efficiency.
Brief Description of the Drawings
FIG. 1 is a flowchart of a data processing method for a neural network model provided in an embodiment of the present application;
FIG. 2 is a flowchart of a data processing method for a neural network model provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a data processing apparatus for a neural network model provided in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It can be understood that the specific embodiments described here are only used to explain the present application and do not limit it. It should also be noted that, for ease of description, only the parts related to the present application, rather than the entire structure, are shown in the drawings.
The present application is described below with reference to the drawings and embodiments. The embodiments described here are only used to explain the present application and do not limit it. For ease of description, only the parts related to the present application, rather than the entire structure, are shown in the drawings.
Some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes multiple steps as sequential processing, many of the steps can be implemented in parallel, concurrently, or simultaneously. In addition, the order of multiple steps can be rearranged. Processing may be terminated when its operations are completed, but may also have additional steps not included in the drawings. Processing may correspond to methods, functions, procedures, subroutines, subprograms, and so on.
In addition, the terms "first", "second", etc. may be used herein to describe various directions, actions, steps, elements, and the like, but these directions, actions, steps, and elements are not limited by these terms. These terms are only used to distinguish a first direction, action, step, or element from another direction, action, step, or element. For example, a first calculation engine may be referred to as a second calculation engine, and similarly, a second calculation engine may be referred to as a first calculation engine; both are calculation engines, but they are not the same calculation engine. The terms "first" and "second" are not to be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features; thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this application, "plurality" means at least two, such as two or three, unless expressly defined otherwise.
Embodiment One
FIG. 1 is a flowchart of a data processing method for a neural network model provided by an embodiment of the present application. This embodiment is applicable to the case where fused calculation operators accelerate a data flow architecture; the method includes steps 100 to 130.
Step 100: Obtain multiple neural network operators in the neural network model.
In one embodiment, a neural network model includes multiple algorithm instructions, one algorithm instruction includes multiple neural network operators, and the operation flow includes the multiple neural network operators of a multi-layer structure and the connection relationships between them. After the calculation engine processes the calculation instructions of the neural network model, the information of all the neural network operators is obtained, including operation symbols, operation parameters, the connection relationships between multiple operators, and so on.
Step 110: Fuse the multiple neural network operators according to a preset rule to obtain a fused neural network operator.
In this embodiment, the multiple neural network operators in the neural network are fused according to a preset fusion rule (i.e., the preset rule). The fusion process includes selecting a target operator from the multiple neural network operators, obtaining the connection relationship between the target operator and its lower-layer operators, and determining the fusion relationship according to the connection relationship.
Exemplarily, a first neural network model includes a multi-layer structure; the calculation process includes the multiple neural network operators of the multi-layer structure and the connection relationships between the multiple neural network operators, and each layer of the structure corresponds to at least one neural network operator. An architecture fusion apparatus for the neural network model generates the calculation graph of the first neural network model from the calculation process as follows: the apparatus selects a target operator from the multiple neural network operators, the target operator being the starting node of a directed acyclic graph; the apparatus obtains the lower-layer operators of the target operator and the connection relationships between the target operator and the lower-layer operators; and the apparatus connects the lower-layer nodes corresponding to the lower-layer operators with the starting node according to those connection relationships to obtain the directed acyclic graph. The apparatus determines, according to the information of the at least two processing units corresponding to the multiple neural network operators, the N fusable nodes and M non-fusable nodes in the directed acyclic graph, where the neural network operator corresponding to a fusable node is an operator executed on an image processing unit (IPU) and both N and M are integers greater than 1. The apparatus divides the N fusable nodes into fusion segments to obtain a segment-divided directed acyclic graph containing P fusion segments, where P is an integer greater than or equal to 1 and less than or equal to N. The apparatus obtains the Q paths and M node layers of the M non-fusable nodes in the directed acyclic graph, where Q is greater than M and each non-fusable node corresponds to at least one path and one node layer. The apparatus then simplifies the segment-divided directed acyclic graph according to the Q paths and the M node layers to obtain the fused directed acyclic graph. Here, the neural network operator corresponding to a non-fusable node is an operator that is not executed on the IPU. Each fusion segment is a subgraph of the directed acyclic graph; the at least one operator corresponding to the at least one fusable node in the same fusion segment is executed on the IPU, and executing it there requires neither switching processing units nor multiple data copies. In some embodiments, the apparatus obtains the Q paths and M node layers of the M non-fusable nodes as follows: it traverses the directed acyclic graph layer by layer starting from the first node layer and obtains at least one path and one node layer for each non-fusable node, thereby obtaining the Q paths and M node layers of the M non-fusable nodes in the directed acyclic graph. The apparatus obtains the node connection relationships among the N fusable nodes and divides the N fusable nodes into fusion segments as follows: if the node connection relationship between fusable node m and fusable node n is that of adjacent nodes in the same node layer or parent-child nodes in different node layers, the apparatus divides fusable node m and fusable node n into the same fusion segment, where fusable node m and fusable node n are each any one of the N fusable nodes. The apparatus simplifies the segment-divided directed acyclic graph according to the Q paths and the M node layers as follows: it obtains the node position relationships among the M non-fusable nodes; if the operator corresponding to non-fusable node p is the same as the operator corresponding to non-fusable node q, the apparatus determines the node position relationship between p and q, where p and q are each any one of the M non-fusable nodes; if p and q lie at different node layers and on different paths, the apparatus points the edges that point to node p at node q instead, adds an edge from node q to the node that the outgoing edge of node p pointed to, and deletes node p; the operator corresponding to node q then receives data sent by different nodes at different times and performs the calculations, the number of different nodes being the same as the number of different times. Symmetrically, the apparatus may instead point the edges that point to node q at node p, add an edge from node p to the node that the outgoing edge of node q pointed to, and delete node q; in that case the neural network operator corresponding to node p receives data sent by different nodes at different times and performs the calculations.
Step 120: Combine the fused neural network operators into calculation instructions.
In this embodiment, in the neural network model one algorithm instruction includes multiple neural network operators; the fused neural network operators are combined into multiple calculation instructions, which makes it convenient to allocate an appropriate number of calculation engines for computation. In one embodiment, algorithm instructions and calculation instructions may differ, and one calculation instruction may contain one or more algorithm instructions.
Step 130: Use a calculation engine to compute the calculation instructions.
In this embodiment, according to the number of calculation instructions from step 120, an appropriate number of calculation engines is selected to compute the calculation instructions. At least one calculation engine is used, the number being determined by the priority of the calculation task; the more calculation engines, the faster the task is processed. Exemplarily, in deep learning computation, multiple calculation engines are used to recognize an image: the more calculation engines are used, the faster the image is compared with the images in the database, and the shorter the time to output the comparison result.
The technical solution of this embodiment of the present application obtains multiple neural network operators in a neural network model; fuses the multiple neural network operators according to a preset rule to obtain fused neural network operators; combines the fused neural network operators into calculation instructions; and uses a calculation engine to compute the calculation instructions. This speeds up the calculation process, solves the problem that data flow calculation cannot be realized quickly in the related art, and realizes that, after the algorithm instructions reach the driver, multiple operators are parsed into a single data flow process, with multiple data flow processes together implementing the data flow of one neural network, so that the data flow speed can reach the highest efficiency.
Embodiment Two
FIG. 2 is a flowchart of a data processing method for a neural network model in an embodiment of the present application. This embodiment is an optional embodiment based on Embodiment One. In one implementation, the method includes steps 200 to 250.
Step 200: Obtain multiple neural network operators in the neural network model.
Step 210: Judge whether the multiple neural network operators can be fused; in response to a judgment result that the multiple neural network operators can be fused, fuse them according to a preset rule to obtain a fused neural network operator; in response to a judgment result that they cannot be fused, continue to obtain new neural network operators.
In this embodiment, before fusing the neural network operators it is necessary to judge whether these operators can be fused; the fusion method refers to the example in Embodiment One. When the neural network operators can be fused, they are arranged in the order of convolution, activation function, pooling/upsampling, shortcut, activation function, and global pooling, and the arranged neural network operators are fused to obtain the fused neural network operator. Here, convolution refers to the convolutional layer in a neural network, the core building block of a convolutional network that performs most of the heavy computation in the network; the parameters of a convolutional layer consist of a set of learnable filters. The function of pooling is to gradually reduce the spatial size of the representation, reducing the parameters and computation in the network and thereby controlling overfitting; the pooling layer operates independently on each depth slice of the input and resizes it spatially using the MAX operation. An activation function is generally used between the layers of a neural network to transform the output of the previous layer before feeding it to the next layer; without the nonlinearity introduced by activation functions, a neural network would be equivalent to the matrix multiplications of the original perceptron. Activation functions are nonlinear: when the activation function is nonlinear, a two-layer neural network can be proved to approximate arbitrarily complex functions. Activation functions should be continuously differentiable: since neural network training is a gradient-based optimization method whose mathematical foundation is continuous differentiability, the selected activation function should also be continuously differentiable; the step activation function is discontinuous at 0 and its derivative is 0 everywhere except at 0, so gradient-based methods cannot be used with it. When the value range of an activation function is bounded, gradient-based training methods tend to be more stable, because the representation of features is affected more significantly by the limited weights; when the range is unbounded, training is usually more efficient, because the representation of features significantly affects most of the weights, in which case a smaller learning rate is usually required. Activation functions are monotonic: when the activation function is monotonic, the error surface of a single-layer network is guaranteed to be convex.
If the current neural network operators cannot be fused, new neural network operators are obtained for fusion. After new operators are obtained, the neural network operators are arranged according to the preset rule, i.e., in the order convolution + activation function + pooling/upsampling + shortcut + activation function + global pooling, and it is judged whether the arranged operators can be fused; if they can, they are fused according to the fusion method of Embodiment One.
Step 220: Fuse the multiple neural network operators according to the preset rule to obtain a fused neural network operator.
Step 230: Combine the fused neural network operators into calculation instructions.
Step 240: Parse the calculation instructions.
In this embodiment, parsing a calculation instruction includes disassembling it into multiple neural network operators, determining one or more neural network data streams according to these operators, and allocating the corresponding calculation engines for computation according to the data streams.
Step 250: Judge whether the calculation instruction is the only data stream operation; in response to a judgment result that it is the only data stream operation, use the calculation engine to compute it; in response to a judgment result that it is not, recombine the calculation instructions according to the fused neural network operators.
In this embodiment, the neural network data streams obtained by parsing the calculation instruction are examined to judge whether there is only one data stream. In neural network computation, a calculation engine can generally process only one data stream at a time, so the simultaneous existence of multiple data streams affects the computation time of the neural network. First it is judged whether there is only one data stream: if so, a calculation engine is directly allocated to compute it; computing only one neural network data stream greatly saves the processing time of the neural network and improves the speed and efficiency of the data stream. If not, the neural network operators obtained by splitting the calculation instruction are re-fused with new neural network operators to form new calculation instructions; the multiple new calculation instructions are then recombined, and the check for a single data stream continues.
The technical solution of this embodiment of the present application obtains multiple neural network operators in the neural network model; judges whether the multiple neural network operators can be fused and, if so, fuses them according to a preset rule to obtain fused neural network operators, otherwise continues to obtain new neural network operators; fuses the multiple neural network operators according to the preset rule to obtain fused neural network operators; combines the fused neural network operators into calculation instructions; parses the calculation instructions; and judges whether each calculation instruction is the only data stream operation, computing it with a calculation engine if it is and recombining the calculation instructions according to the fused neural network operators if it is not. This speeds up the calculation process, solves the problem that data flow calculation cannot be realized quickly in the related art, and realizes that, after the algorithm instructions reach the driver, multiple operators are parsed into a single data flow process, with multiple data flow processes together implementing the data flow of one neural network, so that the data flow speed can reach the highest efficiency.
Embodiment Three
The data processing apparatus for a neural network model provided in this embodiment of the present application can execute the method provided in any embodiment of the present application and has the corresponding functional modules and effects for executing the method. FIG. 3 is a schematic structural diagram of a data processing apparatus 300 for a neural network model in an embodiment of the present application. Referring to FIG. 3, the data processing apparatus 300 for a neural network model provided in this embodiment of the present application may include:
an obtaining module 310, configured to obtain multiple neural network operators in the neural network model;
a fusion module 320, configured to fuse the multiple neural network operators according to a preset rule to obtain a fused neural network operator;
a combination module 330, configured to combine the fused neural network operators into calculation instructions; and
a calculation module 340, configured to use a calculation engine to compute the calculation instructions.
In one embodiment, the apparatus is further configured to: before fusing the multiple neural network operators according to the preset rule to obtain the fused neural network operator, judge whether the multiple neural network operators can be fused, and in response to a judgment result that they can be fused, fuse the multiple neural network operators according to the preset rule to obtain the fused neural network operator.
In one embodiment, the apparatus is further configured to: after judging whether the multiple neural network operators can be fused, in response to a judgment result that they cannot be fused, continue to obtain new neural network operators.
In one embodiment, the fusion module is configured to arrange the multiple neural network operators in the order of convolution, activation function, pooling/upsampling, shortcut, activation function, and global pooling, and fuse the arranged neural network operators to obtain the fused neural network operator.
In one embodiment, the calculation module is configured to judge whether the calculation instruction is the only data stream operation and, in response to a judgment result that it is, use the calculation engine to compute the calculation instruction.
In one embodiment, the calculation module is configured to, after judging whether the calculation instruction is the only data stream operation, in response to a judgment result that it is not, recombine the calculation instructions according to the fused neural network operators.
In one embodiment, the apparatus is further configured to parse the calculation instructions.
The neural network data processing apparatus provided in this embodiment of the present application obtains multiple neural network operators in a neural network model; fuses the multiple neural network operators according to a preset rule to obtain fused neural network operators; combines the fused neural network operators into calculation instructions; and uses a calculation engine to compute the calculation instructions. This speeds up the calculation process, solves the problem that data flow calculation cannot be realized quickly in the related art, and realizes that, after the algorithm instructions reach the driver, multiple operators are parsed into a single data flow process, with multiple data flow processes together implementing the data flow of one neural network, so that the data flow speed can reach the highest efficiency.
Embodiment Four
FIG. 4 is a schematic structural diagram of a computer device provided by an embodiment of the present application. As shown in FIG. 4, the computer device includes a memory 410 and a processor 420; the number of processors 420 in the computer device may be one or more, and one processor 420 is taken as an example in FIG. 4. The memory 410 and the processor 420 in the device may be connected through a bus or in other ways; connection through a bus is taken as an example in FIG. 4.
As a computer-readable storage medium, the memory 410 can be configured to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present application (for example, the obtaining module 310, the fusion module 320, the combination module 330, and the calculation module 340 in the data processing apparatus of the neural network model). By running the software programs, instructions, and modules stored in the memory 410, the processor 420 executes the various functional applications and data processing of the device/terminal, thereby implementing the above method.
The processor 420 is configured to run the computer program stored in the memory 410 and implement the following steps:
obtaining multiple neural network operators in the neural network model;
fusing the multiple neural network operators according to a preset rule to obtain a fused neural network operator;
combining the fused neural network operators into calculation instructions; and
using a calculation engine to compute the calculation instructions.
In one embodiment, the computer program of the computer device provided in this embodiment of the present application is not limited to the above method operations and can also execute the method provided in any embodiment of the present application.
The memory 410 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the terminal, and the like. In addition, the memory 410 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices. In some examples, the memory 410 may include memory located remotely relative to the processor 420, and such remote memory may be connected to the device/terminal through a network. Examples of such networks include the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The technical solution provided in the embodiments of the present application obtains multiple neural network operators in a neural network model; fuses the multiple neural network operators according to a preset rule to obtain fused neural network operators; combines the fused neural network operators into calculation instructions; and uses a calculation engine to compute the calculation instructions. This speeds up the calculation process, solves the problem that data flow calculation cannot be realized quickly in the related art, and realizes that, after the algorithm instructions reach the driver, multiple operators are parsed into a single data flow process, with multiple data flow processes together implementing the data flow of one neural network, so that the data flow speed can reach the highest efficiency.
Embodiment Five
Embodiment Five of the present application further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the above method, the method including:
obtaining multiple neural network operators in the neural network model;
fusing the multiple neural network operators according to a preset rule to obtain a fused neural network operator;
combining the fused neural network operators into calculation instructions; and
using a calculation engine to compute the calculation instructions.
In the storage medium containing computer-executable instructions provided by this embodiment of the present application, the computer-executable instructions are not limited to the method operations described above and can also execute the methods provided in any embodiment of the present application.
The computer-readable storage medium of the embodiments of the present application may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. Computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave that carries computer-readable program code. Such a propagated data signal may take many forms, including an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
The program code contained on the storage medium may be transmitted by any suitable medium, including wireless, wire, optical cable, radio frequency (RF), etc., or any suitable combination of the above.
Computer program code for performing the operations of this application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or terminal. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The storage medium provided by the embodiments of the present application obtains multiple neural network operators in a neural network model; fuses the multiple neural network operators according to a preset rule to obtain fused neural network operators; combines the fused neural network operators into calculation instructions; and uses a calculation engine to compute the calculation instructions. This speeds up the calculation process, solves the problem that data flow calculation cannot be realized quickly in the related art, and realizes that, after the algorithm instructions reach the driver, multiple operators are parsed into a single data flow process, with multiple data flow processes together implementing the data flow of one neural network, so that the data flow speed can reach the highest efficiency.

Claims (10)

  1. A data processing method for a neural network model, comprising:
    obtaining multiple neural network operators in the neural network model;
    fusing the multiple neural network operators according to a preset rule to obtain a fused neural network operator;
    combining the fused neural network operators into calculation instructions; and
    using a calculation engine to compute the calculation instructions.
  2. The method according to claim 1, wherein before fusing the multiple neural network operators according to the preset rule to obtain the fused neural network operator, the method further comprises: judging whether the multiple neural network operators can be fused, and in response to a judgment result that the multiple neural network operators can be fused, fusing the multiple neural network operators according to the preset rule to obtain the fused neural network operator.
  3. The method according to claim 2, wherein after judging whether the multiple neural network operators can be fused, the method further comprises: in response to a judgment result that the multiple neural network operators cannot be fused, continuing to obtain new neural network operators.
  4. The method according to claim 1, wherein fusing the multiple neural network operators according to the preset rule to obtain the fused neural network operator comprises: arranging the multiple neural network operators in the order of convolution, activation function, pooling/upsampling, shortcut, activation function, and global pooling, and fusing the arranged neural network operators to obtain the fused neural network operator.
  5. The method according to claim 1, wherein using the calculation engine to compute the calculation instruction comprises: judging whether the calculation instruction is the only data stream operation, and in response to a judgment result that the calculation instruction is the only data stream operation, using the calculation engine to compute the calculation instruction.
  6. The method according to claim 5, wherein after judging whether the calculation instruction is the only data stream operation, the method further comprises: in response to a judgment result that the calculation instruction is not the only data stream operation, recombining the calculation instructions according to the fused neural network operator.
  7. The method according to claim 1, wherein before using the calculation engine to compute the calculation instruction, the method further comprises: parsing the calculation instruction.
  8. A data processing apparatus for a neural network model, comprising:
    an acquisition module, configured to acquire multiple neural network operators in the neural network model;
    a fusion module, configured to fuse the multiple neural network operators according to a preset rule to obtain a fused neural network operator;
    a combination module, configured to combine the fused neural network operators into calculation instructions; and
    a calculation module, configured to use a calculation engine to compute the calculation instructions.
  9. A computing device, comprising:
    one or more processors; and
    a storage apparatus, configured to store one or more programs,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
  10. A computer-readable storage medium on which a computer program is stored, wherein the computer program comprises program instructions which, when executed by a processor, implement the method according to any one of claims 1-7.
PCT/CN2021/073758 2020-02-18 2021-01-26 Data processing method, apparatus, device and storage medium for neural network model WO2021164506A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/800,689 2021-01-26 Data processing method, apparatus, device and storage medium for neural network model

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010099460.X 2020-02-18
CN202010099460.XA 2020-02-18 2020-02-18 Data processing method, apparatus, device and storage medium for neural network model

Publications (1)

Publication Number Publication Date
WO2021164506A1 true WO2021164506A1 (zh) 2021-08-26

Family

ID=70951079

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073758 WO2021164506A1 (zh) Data processing method, apparatus, device and storage medium for neural network model

Country Status (3)

Country Link
US (1) US20240220765A1 (zh)
CN (1) CN111260019B (zh)
WO (1) WO2021164506A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115293340A * Data synchronization processing method and apparatus, computing device, and storage medium
CN115796228A * Operator fusion method, apparatus, device, and storage medium
CN116389786A * Video cloud storage method and apparatus based on node capacity expansion, and electronic device
CN116932092A * Method, apparatus, medium, and device for automatically generating operator invocation code
WO2024051377A1 * Model optimization method and apparatus, and computing device
CN118051234A * Method for software and hardware adaptation, computing apparatus, medium, and program product

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260019B (zh) Data processing method, apparatus, device and storage medium for neural network model
CN111737193B (zh) Data storage method, apparatus, device, and storage medium
TWI770629B Neural network optimization method, apparatus, and processor
CN112465133B (zh) Control-flow multi-core parallel method, computer device, and storage medium
CN112686378A (zh) Computation deployment method and apparatus for neural network, storage medium, and computer device
CN114692860A (zh) Node fusion method and device for a computation graph
CN112766483B (zh) Data processing method and apparatus for heterogeneous systems, and computer-readable storage medium
CN112884123B (zh) Neural network optimization method and apparatus, electronic device, and readable storage medium
CN112947933B (zh) Operator execution method and apparatus, computer device, and storage medium
CN113065639B (zh) Operator fusion method, system, device, and storage medium
CN113010469B (zh) Image feature extraction method and apparatus, and computer-readable storage medium
CN113835900B (zh) Neural network computation method, apparatus, device, and computer-readable storage medium
CN114970845A (zh) Unified computation method for a general-purpose neural network tensor processor
CN114118389B (zh) Neural network data processing method, device, and storage medium
CN115357626A (zh) Data processing method and apparatus, electronic device, medium, and product
CN118171683A (zh) Operator fusion method for neural networks and related apparatus
CN117576125B (zh) Partitioning method, apparatus, device, and storage medium for a neural network computation graph

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754073A * Data processing method and apparatus, electronic device, and readable storage medium
CN110321999A * Neural network computation graph optimization method
CN110689116A * Neural network pruning method and apparatus, computer device, and storage medium
CN111260019A * Data processing method, apparatus, device and storage medium for neural network model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11093826B2 (en) * 2016-02-05 2021-08-17 International Business Machines Corporation Efficient determination of optimized learning settings of neural networks
CN108229455B * Object detection method, neural network training method, apparatus, and electronic device
CN109740751B * Architecture fusion method for neural network model and related apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321999A * Neural network computation graph optimization method
CN109754073A * Data processing method and apparatus, electronic device, and readable storage medium
CN110689116A * Neural network pruning method and apparatus, computer device, and storage medium
CN111260019A * Data processing method, apparatus, device and storage medium for neural network model

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115293340A * Data synchronization processing method and apparatus, computing device, and storage medium
CN115293340B * Data synchronization processing method and apparatus, computing device, and storage medium
WO2024051377A1 * Model optimization method and apparatus, and computing device
CN115796228A * Operator fusion method, apparatus, device, and storage medium
CN115796228B * Operator fusion method, apparatus, device, and storage medium
CN116389786A * Video cloud storage method and apparatus based on node capacity expansion, and electronic device
CN116389786B * Video cloud storage method and apparatus based on node capacity expansion, and electronic device
CN116932092A * Method, apparatus, medium, and device for automatically generating operator invocation code
CN116932092B * Method, apparatus, medium, and device for automatically generating operator invocation code
CN118051234A * Method for software and hardware adaptation, computing apparatus, medium, and program product

Also Published As

Publication number Publication date
CN111260019A (zh) 2020-06-09
CN111260019B (zh) 2023-04-11
US20240220765A1 (en) 2024-07-04

Similar Documents

Publication Publication Date Title
WO2021164506A1 (zh) Data processing method, apparatus, device and storage medium for neural network model
Markakis et al. EXEGESIS: Extreme edge resource harvesting for a virtualized fog environment
CN112689828A Placing container workloads triggered by network traffic for efficient computation at network edge devices
US20170010673A1 (en) Gesture based sharing of user interface portion
KR102592036B1 (ko) Method and system for user-centered content streaming
CN114745317B Computing task scheduling method for computing power networks and related device
Haider et al. On the planning and design problem of fog computing networks
KR102613367B1 (ko) Method and apparatus for automatic model lightweighting for deep learning model serving optimization, and cloud inference service provision method using the same
KR102688854B1 (ko) Computing resource estimation for function implementation on a computing platform
Laroui et al. SO‐VMEC: service offloading in virtual mobile edge computing using deep reinforcement learning
CN115534939B Vehicle control method and apparatus, electronic device, and computer-readable medium
Liang et al. DNN surgery: Accelerating DNN inference on the edge through layer partitioning
Chen et al. AndroidOff: Offloading android application based on cost estimation
CN107133741B To-do task processing method and apparatus, readable storage medium, and electronic device
Toumi et al. Machine learning for service migration: a survey
Durkadevi et al. Generic method for SDN controller selection using AHP and TOPSIS methods
US11915122B2 (en) Gateway for distributing an artificial neural network among multiple processing nodes
CN114172817A Virtual network function deployment method and system for edge computing
Young et al. A governance architecture for self-adaption & control in IoT applications
CN115955685A Multi-agent collaborative routing method, device, and computer storage medium
WO2021077281A1 Adjustment method and apparatus for a deep learning framework, server, and storage medium
CN107018201A Method for dynamic instruction migration using critical path information in a data flow architecture
CN115883401B End-to-end network performance prediction method, system, and platform based on flow interaction graphs
US11868232B1 (en) Execution-time telemetry reporting using object model
CN111582482B Method, apparatus, device, and medium for generating network model information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21756558

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.01.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21756558

Country of ref document: EP

Kind code of ref document: A1