WO2021139633A1 - Conversion method and apparatus for deep learning model, server, and storage medium - Google Patents

Conversion method and apparatus for deep learning model, server, and storage medium

Info

Publication number
WO2021139633A1
WO2021139633A1 (PCT/CN2021/070223)
Authority
WO
WIPO (PCT)
Prior art keywords
intermediate expression
calculation graph
deep learning
operator
data flow
Prior art date
Application number
PCT/CN2021/070223
Other languages
English (en)
French (fr)
Inventor
熊超
蔡权雄
牛昕宇
Original Assignee
深圳鲲云信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳鲲云信息科技有限公司
Priority to US17/791,373 (published as US20230139106A1)
Publication of WO2021139633A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N3/10 Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • G06N3/105 Shells for specifying net layout
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The embodiments of the present application relate to deep learning technology, for example, to a conversion method and apparatus for a deep learning model, a server, and a storage medium.
  • Deep learning networks are usually obtained through algorithm training. In most cases, algorithm developers tend to use public deep learning frameworks for model training. One deep learning framework can be used to develop multiple deep learning models, and most public deep learning frameworks are designed for computing devices such as the Central Processing Unit/Graphics Processing Unit (CPU/GPU).
  • The CPU/GPU adopts the traditional instruction set architecture, which has low architectural efficiency and small operator granularity, and is therefore highly flexible.
  • With the development of deep learning technology, the requirements for computing power keep rising, and the efficiency limitations of the traditional instruction set architecture can no longer meet the needs of application scenarios. In contrast, the data flow architecture is more efficient and, from the perspective of the technical route, better suited to the development trend of deep learning technology.
  • However, the data expression of the data flow architecture differs greatly from that of the instruction set architecture: the operator granularity of the data flow architecture is much larger than that of the instruction set architecture, and a data flow architecture must determine the arrangement order of its computing modules according to data dependencies before computing. These differences mean that a model trained under the instruction set architecture cannot be directly deployed in the data flow architecture, which greatly hinders the application development of the data flow architecture.
  • The embodiments of the present application provide a conversion method and apparatus for a deep learning model, a server, and a storage medium, so as to convert a deep learning model developed based on an instruction set architecture to run under a data flow architecture.
  • An embodiment of the present application provides a conversion method for a deep learning model, including: parsing a target deep learning model into an instruction set calculation graph intermediate expression; converting the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression; adjusting the data flow calculation graph intermediate expression into a customized architecture intermediate expression; and obtaining, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model.
  • An embodiment of the present application provides a conversion apparatus for a deep learning model, including:
  • a target deep learning model parsing module, configured to parse a target deep learning model into an instruction set calculation graph intermediate expression;
  • an instruction set calculation graph intermediate expression conversion module, configured to convert the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression;
  • a data flow calculation graph intermediate expression adjustment module, configured to adjust the data flow calculation graph intermediate expression into a customized architecture intermediate expression; and
  • a target data flow network model generation module, configured to obtain, according to the customized architecture intermediate expression, the target data flow network model corresponding to the target deep learning model.
  • Optionally, the target deep learning model includes a first operator granularity, the instruction set calculation graph intermediate expression includes a second operator granularity, and the data flow calculation graph intermediate expression includes a third operator granularity.
  • Optionally, the first operator granularity is the same as the second operator granularity.
  • Optionally, the second operator granularity is smaller than the third operator granularity.
  • Optionally, the instruction set calculation graph intermediate expression further includes a second operator, and the data flow calculation graph intermediate expression further includes a third operator; the second operator forms the third operator through fusion.
  • An embodiment of the present application provides a server, including:
  • one or more processors; and
  • a storage device, configured to store one or more programs,
  • where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method provided in any embodiment of the present application.
  • An embodiment of the present application provides a computer-readable storage medium storing a computer program, where the program, when executed by a processor, implements the method provided in any embodiment of the present application.
  • In the embodiments of the present application, the target deep learning model is parsed into an instruction set calculation graph intermediate expression; the instruction set calculation graph intermediate expression is converted into a data flow calculation graph intermediate expression; the data flow calculation graph intermediate expression is adjusted into a customized architecture intermediate expression; and, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model is obtained.
  • In this way, a deep learning model developed based on the instruction set architecture is converted to run under the data flow architecture. Describing the deep learning model with the instruction set calculation graph intermediate expression, the data flow calculation graph intermediate expression, and the customized architecture intermediate expression allows trade-offs between legibility and execution efficiency according to actual needs, making the design more flexible.
  • FIG. 1 is a schematic flowchart of a conversion method for a deep learning model provided by Embodiment 1 of the present application;
  • FIG. 2 is a schematic structural diagram of a conversion apparatus for a deep learning model provided by Embodiment 2 of the present application;
  • FIG. 3 is a schematic structural diagram of a server provided by Embodiment 3 of the present application.
  • Some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes multiple steps as sequential processing, many of the steps can be implemented in parallel, concurrently, or simultaneously. In addition, the order of the steps can be rearranged. The processing may be terminated when its operations are completed, but it may also have additional steps not included in the drawings. The processing may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.
  • The terms "first", "second", etc. may be used herein to describe various directions, actions, steps, or elements, but these directions, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish a first direction, action, step, or element from another direction, action, step, or element.
  • For example, without departing from the scope of the present application, the first operator granularity may be referred to as the second operator granularity, and similarly, the second operator granularity may be referred to as the first operator granularity. Both the first operator granularity and the second operator granularity are operator granularities, but they are not the same operator granularity.
  • FIG. 1 is a schematic flowchart of a conversion method for a deep learning model provided by Embodiment 1 of the present application, which is suitable for inputting a deep learning model developed based on an instruction set architecture into a chip based on a data flow architecture for operation. The method may be executed by a conversion apparatus for a deep learning model, which can be implemented by software and/or hardware and can be integrated on a server.
  • As shown in FIG. 1, the conversion method for a deep learning model provided by Embodiment 1 of the present application includes:
  • S110. Parse the target deep learning model into an instruction set calculation graph intermediate expression.
  • In one embodiment, a deep learning framework is a large body of basic code with which algorithm developers perform model training, for example, TensorFlow, Caffe, MXNet, or Torch. A deep learning model is a neural network model developed under a deep learning framework to implement a specific algorithm, and one deep learning framework can be used to develop multiple deep learning models.
  • The set of all instructions that a CPU/GPU can execute is called an instruction set, and the instruction set architecture is an interface between the CPU/GPU physical hardware and the upper-layer software. Most published deep learning models are designed for computing devices such as the CPU/GPU; that is, most published deep learning models use the instruction set architecture.
  • The instruction set calculation graph intermediate expression defines the network structure of the deep learning model, that is, the types of operators and the connection relationships between them. An operator is composed of one or more minimal arithmetic units that can be executed by the target computing device. The connection relationships between operators represent the operation rules between them. Operator granularity represents the complexity of an operator and is usually expressed by the number of minimal arithmetic units the operator contains; an operator with larger granularity is called a large-granularity operator, and an operator with smaller granularity is called a small-granularity operator.
  • For example, in a CPU/GPU device whose minimal arithmetic units are A1, A2, A3, and A4, the operators are also A1, A2, A3, and A4, so the corresponding operator granularity is 1 and there are four operator types: A1, A2, A3, and A4. The connection relationship between the operators may be to run A1+A2 first, and then run A1+A2+A3+A4.
  • A deep learning model using the instruction set architecture generally contains small-granularity operators. Because the operator granularity is small, such a model is highly flexible but inefficient: when the amount of data to be computed is too large, a long computing time is required.
  • Parsing the target deep learning model into the instruction set calculation graph intermediate expression means parsing out the operator types and the operation rules between the operators in the target deep learning model, so that the operators in the target deep learning model developed based on the instruction set architecture can then be fused and converted, allowing the target deep learning model to run under the data flow framework.
  • The operator granularity in the target deep learning model is the first operator granularity, and the operator granularity in the instruction set calculation graph intermediate expression is the second operator granularity. Because parsing the target deep learning model into the instruction set calculation graph intermediate expression does not change the operator granularity, the first operator granularity is the same as the second operator granularity. The operators in the target deep learning model and the operators in the instruction set calculation graph intermediate expression are also the same, both being the first operator; that is, in the instruction set calculation graph intermediate expression, the second operator granularity is defined with respect to the first operator. In other words, the operators and operator granularity in the target deep learning model are consistent with those of the instruction set calculation graph intermediate expression, and the instruction set calculation graph intermediate expression is the expression closest to the original calculation graph of the target deep learning model.
  • In one embodiment, the first operator and the first operator granularity are closer to the design level of the neural network algorithm, are more legible, and make it easier for developers to interpret the network structure.
  • S120. Convert the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression.
  • In one embodiment, the data flow calculation graph intermediate expression represents the types and connection relationships of operators under the data flow architecture. The operator of the instruction set calculation graph intermediate expression is the first operator, and the operator of the data flow calculation graph intermediate expression is the second operator. Converting the instruction set calculation graph intermediate expression into the data flow calculation graph intermediate expression means reorganizing the instruction set calculation graph intermediate expression according to the operator granularity of the data flow, fusing the first operators of the instruction set calculation graph intermediate expression into the second operators of the data flow calculation graph intermediate expression according to the data flow operator granularity, that is, fusing the small-granularity operators of the instruction set calculation graph intermediate expression into large-granularity operators.
  • For example, suppose the four operators of the instruction set calculation graph intermediate expression are A1, A2, A3, and A4, and the connection relationship between the operators is to run A1+A2 first and then run A1+A2+A3+A4. When converting the instruction set calculation graph intermediate expression into the data flow calculation graph intermediate expression, A1+A2 (A1 and A2 being small-granularity operators) are fused into B (a large-granularity operator), and A3+A4 are fused into C. At this point, the operator granularity of B is 2, the operators in the data flow calculation graph intermediate expression are B and C, and the connection relationship between the operators is B+C.
  • In one embodiment, fusion here does not mean simple superposition; it includes the meanings of both fusion and transformation.
  • The data flow calculation graph intermediate expression includes the third operator granularity, and the third operator granularity contained in the data flow calculation graph intermediate expression is larger than the second operator granularity contained in the instruction set calculation graph intermediate expression.
  • S130. Adjust the data flow calculation graph intermediate expression into a customized architecture intermediate expression.
  • In one embodiment, the customized architecture intermediate expression represents the operators, and their connection relationships, of the data flow architecture that runs the target deep learning model. Adjusting the data flow calculation graph intermediate expression into the customized architecture intermediate expression means reorganizing and rewriting the operators of the data flow calculation graph intermediate expression according to the design principles of the data flow architecture that runs the target deep learning model. The customized architecture intermediate expression is close to the underlying operations and runs more efficiently.
  • An operator of the data flow calculation graph intermediate expression represents a minimal arithmetic unit that can be executed under the data flow architecture, and the customized architecture intermediate expression can divide these minimal arithmetic units into modules. For example, if the operators of the data flow calculation graph intermediate expression are of four types, B, C, D, and E, and the operation relationship between the operators is to compute B+C first and then compute B+C+D+E, then the customized architecture intermediate expression may have a first module run B+C and a second module run D+E. By design, the first module and the second module can perform their calculations at the same time, thereby reducing calculation time and achieving higher efficiency.
  • S140. Obtain, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model.
  • In one embodiment, the target data flow network model is a deep learning model that runs under the data flow architecture, and the customized architecture intermediate expression can be regarded as the calculation graph of the target data flow network model, which includes both the operator types and corresponding data parameters in the target data flow network model and the connection relationships between the operators in the target data flow network model. The target deep learning model can be run according to the customized architecture intermediate expression, thereby converting a deep learning model developed based on the instruction set architecture to run under the data flow architecture.
  • In the embodiment of the present application, the target deep learning model is parsed into an instruction set calculation graph intermediate expression; the instruction set calculation graph intermediate expression is converted into a data flow calculation graph intermediate expression; the data flow calculation graph intermediate expression is adjusted into a customized architecture intermediate expression; and, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model is obtained. This converts a deep learning model developed based on the instruction set architecture to run under the data flow architecture, and describing the deep learning model with the three intermediate expressions allows trade-offs between legibility and execution efficiency according to actual needs, making the design more flexible.
  • In one embodiment, the target deep learning model includes a first operator granularity, the instruction set calculation graph intermediate expression includes a second operator granularity, and the data flow calculation graph intermediate expression includes a third operator granularity.
  • In one embodiment, the first operator granularity is the same as the second operator granularity.
  • In one embodiment, the second operator granularity is smaller than the third operator granularity.
  • In one embodiment, the instruction set calculation graph intermediate expression further includes a first operator, and the data flow calculation graph intermediate expression further includes a second operator; the third operator granularity is defined with respect to the second operator.
  • In one embodiment, a plurality of the first operators form the second operator through fusion and transformation.
  • Embodiment 1 of the present application parses the target deep learning model into an instruction set calculation graph intermediate expression, converts the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression, adjusts the data flow calculation graph intermediate expression into a customized architecture intermediate expression, and obtains, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model. This converts a deep learning model developed based on the instruction set architecture to run under the data flow architecture, and describing the deep learning model with the instruction set calculation graph intermediate expression, the data flow calculation graph intermediate expression, and the customized architecture intermediate expression allows trade-offs between legibility and execution efficiency according to actual needs, making the design more flexible.
  • FIG. 2 is a schematic structural diagram of a conversion apparatus for a deep learning model provided by an embodiment of the present application. This embodiment is suitable for inputting a deep learning model developed based on an instruction set architecture into a chip based on a data flow architecture for operation. The apparatus can be implemented by software and/or hardware and can be integrated on a server. The conversion apparatus for a deep learning model provided by this embodiment can execute the conversion method for a deep learning model provided in any embodiment of the present application, and has the functional modules and effects corresponding to the executed method.
  • As shown in FIG. 2, the conversion apparatus 200 for a deep learning model includes a target deep learning model parsing module 210, an instruction set calculation graph intermediate expression conversion module 220, a data flow calculation graph intermediate expression adjustment module 230, and a target data flow network model generation module 240, where:
  • the target deep learning model parsing module 210 is configured to parse the target deep learning model into an instruction set calculation graph intermediate expression;
  • the instruction set calculation graph intermediate expression conversion module 220 is configured to convert the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression;
  • the data flow calculation graph intermediate expression adjustment module 230 is configured to adjust the data flow calculation graph intermediate expression into a customized architecture intermediate expression; and
  • the target data flow network model generation module 240 is configured to obtain, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model.
  • In one embodiment, the target deep learning model parsing module 210, the instruction set calculation graph intermediate expression conversion module 220, and the data flow calculation graph intermediate expression adjustment module 230 are all independent modules.
  • That these modules are independent means that modifying one of them does not affect the working logic of the other modules. For example, if the target deep learning model needs to be replaced, and the replacement target deep learning model and the original target deep learning model were developed based on different deep learning frameworks, then the relevant logic of the target deep learning model parsing module 210 is modified to correspond to the deep learning framework of the replacement model, while the instruction set calculation graph intermediate expression conversion module 220 and the data flow calculation graph intermediate expression adjustment module 230 can remain unchanged and continue to be used. If the target data flow network model needs to change, the relevant changes are made to the data flow calculation graph intermediate expression adjustment module 230, and the target deep learning model parsing module 210 and the instruction set calculation graph intermediate expression conversion module 220 can remain unchanged and continue to be used.
  • In one embodiment, the target deep learning model includes a first operator granularity, the instruction set calculation graph intermediate expression includes a second operator granularity, and the data flow calculation graph intermediate expression includes a third operator granularity.
  • In one embodiment, the first operator granularity is the same as the second operator granularity.
  • In one embodiment, the second operator granularity is smaller than the third operator granularity.
  • In one embodiment, the instruction set calculation graph intermediate expression further includes a first operator, and the data flow calculation graph intermediate expression further includes a second operator.
  • In one embodiment, a plurality of the first operators form the second operator through fusion and transformation.
  • Through the target deep learning model parsing module, the instruction set calculation graph intermediate expression conversion module, the data flow calculation graph intermediate expression adjustment module, and the target data flow network model generation module, the embodiment of the present application converts a deep learning model developed based on the instruction set architecture to run under the data flow architecture. Describing the deep learning model with the instruction set calculation graph intermediate expression, the data flow calculation graph intermediate expression, and the customized architecture intermediate expression allows trade-offs between legibility and execution efficiency according to actual needs, making the design more flexible. Because the parsing, conversion, and adjustment modules are all independent, the extensibility of the conversion apparatus is increased and development speed is improved.
  • FIG. 3 is a schematic structural diagram of a server provided by Embodiment 3 of the present application, and shows a block diagram of an exemplary server 312 suitable for implementing the embodiments of the present application. The server 312 shown in FIG. 3 is only an example. As shown in FIG. 3, the server 312 is represented in the form of a general-purpose server.
  • The components of the server 312 may include one or more processors 316, a storage device 328, and a bus 318 connecting different system components (including the storage device 328 and the processors 316).
  • The bus 318 represents one or more of several types of bus structures, including a storage device bus or storage device controller, a peripheral bus, an accelerated graphics port, a processor bus, or a local bus using any of a variety of bus structures. For example, these architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • The server 312 includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the server 312, including volatile and non-volatile media and removable and non-removable media.
  • The storage device 328 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 330 and/or a cache 332. The server 312 may include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, the storage system 334 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 3 and typically called a "hard drive"). Although not shown in FIG. 3, a disk drive for reading from and writing to a removable non-volatile magnetic disk (for example, a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (for example, a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc-Read Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to the bus 318 through one or more data media interfaces.
  • The storage device 328 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.
  • A program/utility 340 having a set of (at least one) program modules 342 may be stored, for example, in the storage device 328. Such program modules 342 include an operating system, one or more application programs, other program modules, and program data, and each or some combination of these examples may include an implementation of a network environment. The program modules 342 generally perform the functions and/or methods in the embodiments described in the present application.
  • The server 312 may also communicate with one or more external devices 314 (such as a keyboard, a pointing device, a display 324, etc.), with one or more devices that enable a user to interact with the server 312, and/or with any device (such as a network card or a modem) that enables the server 312 to communicate with one or more other computing devices. Such communication can take place through input/output (I/O) interfaces 322. In addition, the server 312 may communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 320. As shown in FIG. 3, the network adapter 320 communicates with the other modules of the server 312 through the bus 318. Although not shown in the figure, other hardware and/or software modules can be used in conjunction with the server 312, including: microcode, device drivers, redundant processors, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, data backup storage systems, and the like.
  • The processor 316 runs the programs stored in the storage device 328 to execute a variety of functional applications and data processing, for example, to implement the method provided in any embodiment of the present application. The method may include: parsing a target deep learning model into an instruction set calculation graph intermediate expression; converting the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression; adjusting the data flow calculation graph intermediate expression into a customized architecture intermediate expression; and obtaining, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model.
  • Embodiment 4 of the present application also provides a computer-readable storage medium storing a computer program, where the program, when executed by a processor, implements the method provided in any embodiment of the present application. The method may include: parsing a target deep learning model into an instruction set calculation graph intermediate expression; converting the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression; adjusting the data flow calculation graph intermediate expression into a customized architecture intermediate expression; and obtaining, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model.
  • The computer storage medium of the embodiments of the present application may use any computer-readable medium or combination of computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. Examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, a RAM, a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium containing or storing a program, where the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
  • The program code contained on a computer-readable medium may be transmitted by any suitable medium, including wireless, wire, optical cable, radio frequency (RF), etc., or any suitable combination of the above.
  • The computer program code for performing the operations of the present application can be written in one or more programming languages or a combination thereof, where the programming languages include object-oriented programming languages, such as Java, Smalltalk, and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or terminal. In cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Stored Programmes (AREA)

Abstract

A conversion method and apparatus for a deep learning model, a server, and a storage medium. The method includes: parsing a target deep learning model into an instruction set calculation graph intermediate expression (S110); converting the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression (S120); adjusting the data flow calculation graph intermediate expression into a customized architecture intermediate expression (S130); and obtaining, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model (S140).

Description

Conversion method and apparatus for deep learning model, server, and storage medium
This application claims priority to Chinese patent application No. 202010015495.0, filed with the Chinese Patent Office on January 7, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to deep learning technology, for example, to a conversion method and apparatus for a deep learning model, a server, and a storage medium.
Background
Deep learning networks are usually obtained through algorithm training. In most cases, algorithm developers tend to use public deep learning frameworks for model training. One deep learning framework can be used to develop multiple deep learning models, and most public deep learning frameworks are designed for computing devices such as the Central Processing Unit/Graphics Processing Unit (CPU/GPU). The CPU/GPU adopts the traditional instruction set architecture, which has low architectural efficiency and small operator granularity, and is therefore highly flexible.
With the development of deep learning technology, the requirements for computing power keep rising, and the efficiency limitations of the traditional instruction set architecture can no longer meet the needs of application scenarios. In contrast, the data flow architecture is more efficient and, from the perspective of the technical route, better suited to the development trend of deep learning technology. However, the data expression of the data flow architecture differs greatly from that of the instruction set architecture: the operator granularity of the data flow architecture is much larger than that of the instruction set architecture, and a data flow architecture must determine the arrangement order of its computing modules according to data dependencies before computing. These differences mean that a model trained under the instruction set architecture cannot be directly deployed in the data flow architecture, which greatly hinders the application development of the data flow architecture.
Summary
The embodiments of the present application provide a conversion method and apparatus for a deep learning model, a server, and a storage medium, so as to convert a deep learning model developed based on an instruction set architecture to run under a data flow architecture.
In an embodiment, an embodiment of the present application provides a conversion method for a deep learning model, including:
parsing a target deep learning model into an instruction set calculation graph intermediate expression;
converting the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression;
adjusting the data flow calculation graph intermediate expression into a customized architecture intermediate expression; and
obtaining, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model.
In an embodiment, an embodiment of the present application provides a conversion apparatus for a deep learning model, including:
a target deep learning model parsing module, configured to parse a target deep learning model into an instruction set calculation graph intermediate expression;
an instruction set calculation graph intermediate expression conversion module, configured to convert the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression;
a data flow calculation graph intermediate expression adjustment module, configured to adjust the data flow calculation graph intermediate expression into a customized architecture intermediate expression; and
a target data flow network model generation module, configured to obtain, according to the customized architecture intermediate expression, the target data flow network model corresponding to the target deep learning model.
Optionally, the target deep learning model includes a first operator granularity, the instruction set calculation graph intermediate expression includes a second operator granularity, and the data flow calculation graph intermediate expression includes a third operator granularity.
Optionally, the first operator granularity is the same as the second operator granularity.
Optionally, the second operator granularity is smaller than the third operator granularity.
Optionally, the instruction set calculation graph intermediate expression further includes a second operator, and the data flow calculation graph intermediate expression further includes a third operator.
Optionally, the second operator forms the third operator through fusion.
In an embodiment, an embodiment of the present application provides a server, including:
one or more processors; and
a storage device, configured to store one or more programs,
where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method provided in any embodiment of the present application.
In an embodiment, an embodiment of the present application provides a computer-readable storage medium storing a computer program, where the program, when executed by a processor, implements the method provided in any embodiment of the present application.
In the embodiments of the present application, the target deep learning model is parsed into an instruction set calculation graph intermediate expression; the instruction set calculation graph intermediate expression is converted into a data flow calculation graph intermediate expression; the data flow calculation graph intermediate expression is adjusted into a customized architecture intermediate expression; and, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model is obtained. This converts a deep learning model developed based on the instruction set architecture to run under the data flow architecture, and describing the deep learning model with the instruction set calculation graph intermediate expression, the data flow calculation graph intermediate expression, and the customized architecture intermediate expression allows trade-offs between legibility and execution efficiency according to actual needs, making the design more flexible.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a conversion method for a deep learning model provided by Embodiment 1 of the present application;
FIG. 2 is a schematic structural diagram of a conversion apparatus for a deep learning model provided by Embodiment 2 of the present application;
FIG. 3 is a schematic structural diagram of a server provided by Embodiment 3 of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It can be understood that the specific embodiments described herein are only used to explain the present application, not to limit it. It should also be noted that, for ease of description, the drawings show only some, rather than all, of the structures related to the present application.
Some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes multiple steps as sequential processing, many of the steps can be implemented in parallel, concurrently, or simultaneously. In addition, the order of the steps can be rearranged. The processing may be terminated when its operations are completed, but it may also have additional steps not included in the drawings. The processing may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.
In addition, the terms "first", "second", etc. may be used herein to describe various directions, actions, steps, or elements, but these directions, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish a first direction, action, step, or element from another direction, action, step, or element. For example, without departing from the scope of the present application, the first operator granularity may be referred to as the second operator granularity, and similarly, the second operator granularity may be referred to as the first operator granularity; both are operator granularities, but they are not the same operator granularity. The terms "first" and "second" should not be understood as indicating or implying relative importance or as implicitly indicating the number of the indicated technical features; thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present application, "multiple" means at least two, for example two or three, unless otherwise expressly limited.
Embodiment 1
FIG. 1 is a schematic flowchart of a conversion method for a deep learning model provided by Embodiment 1 of the present application. The method is suitable for inputting a deep learning model developed based on an instruction set architecture into a chip based on a data flow architecture for operation, and may be executed by a conversion apparatus for a deep learning model, which can be implemented by software and/or hardware and can be integrated on a server.
As shown in FIG. 1, the conversion method for a deep learning model provided by Embodiment 1 of the present application includes:
S110. Parse the target deep learning model into an instruction set calculation graph intermediate expression.
In an embodiment, a deep learning framework is a large body of basic code with which algorithm developers perform model training, for example, TensorFlow, Caffe, MXNet, or Torch. A deep learning model is a neural network model developed under a deep learning framework to implement a specific algorithm, and one deep learning framework can be used to develop multiple deep learning models. The set of all instructions that a CPU/GPU can execute is called an instruction set, and the instruction set architecture is an interface between the CPU/GPU physical hardware and the upper-layer software. Most published deep learning models are designed for computing devices such as the CPU/GPU; that is, most published deep learning models use the instruction set architecture.
The instruction set calculation graph intermediate expression defines the network structure of the deep learning model, that is, the types of operators and the connection relationships between them. An operator is composed of one or more minimal arithmetic units that can be executed by the target computing device. The connection relationships between operators represent the operation rules between them. Operator granularity represents the complexity of an operator and is usually expressed by the number of minimal arithmetic units the operator contains; an operator with larger granularity is called a large-granularity operator, and an operator with smaller granularity is called a small-granularity operator. For example, in a CPU/GPU device whose minimal arithmetic units are A1, A2, A3, and A4, the operators are also A1, A2, A3, and A4, so the corresponding operator granularity is 1 and there are four operator types: A1, A2, A3, and A4; the connection relationship between the operators may be to run A1+A2 first, and then run A1+A2+A3+A4. A deep learning model using the instruction set architecture generally contains small-granularity operators; because the operator granularity is small, the model is highly flexible but inefficient, and when the amount of data to be computed is too large, a long computing time is required. The sketch below gives one illustrative rendering of such an intermediate expression.
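As an illustration only, and not as part of the patent text, the following Python sketch shows one possible in-memory form of a calculation graph intermediate expression. All names (Operator, CalcGraph) and the edge encoding are hypothetical assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Operator:
    """An operator: a named composition of minimal arithmetic units."""
    name: str            # e.g. "A1", or a fused name such as "B"
    units: List[str]     # the minimal arithmetic units it contains

    @property
    def granularity(self) -> int:
        # Granularity = number of minimal arithmetic units in the operator.
        return len(self.units)

@dataclass
class CalcGraph:
    """A calculation graph intermediate expression: operators plus the
    connection relationships (operation rules) between them."""
    operators: List[Operator] = field(default_factory=list)
    # Simplified encoding: an edge (a, b) means operator a's result feeds b.
    edges: List[Tuple[str, str]] = field(default_factory=list)

# Instruction-set-style graph: four granularity-1 operators, where
# A1+A2 runs first and then A1+A2+A3+A4 runs.
a1, a2, a3, a4 = (Operator(n, [n]) for n in ("A1", "A2", "A3", "A4"))
isa_graph = CalcGraph(
    operators=[a1, a2, a3, a4],
    edges=[("A1", "A2"), ("A2", "A3"), ("A3", "A4")],
)
assert all(op.granularity == 1 for op in isa_graph.operators)
```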
Parsing the target deep learning model into the instruction set calculation graph intermediate expression means parsing out the operator types and the operation rules between the operators in the target deep learning model, so that the operators in the target deep learning model developed based on the instruction set architecture can then be fused and converted, allowing the target deep learning model to run under the data flow framework. A sketch of such a parsing step follows.
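Purely as a sketch of what this parsing step could look like under the assumptions of the previous example (the framework_graph format and the parse_model helper are hypothetical, not prescribed by the patent):

```python
def parse_model(framework_graph) -> CalcGraph:
    """Parse a framework-level model (e.g. a graph trained in TensorFlow
    or Caffe) into an instruction set calculation graph intermediate
    expression. Parsing only records operator types and operation rules;
    it does not change operator granularity.
    """
    graph = CalcGraph()
    for node in framework_graph.nodes:            # hypothetical node list
        # One intermediate-expression operator per framework node,
        # so the first and second operator granularity stay the same.
        graph.operators.append(Operator(node.name, [node.name]))
    for src, dst in framework_graph.connections:  # hypothetical edge list
        graph.edges.append((src, dst))
    return graph
```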
The operator granularity in the target deep learning model is the first operator granularity, and the operator granularity in the instruction set calculation graph intermediate expression is the second operator granularity. Because parsing the target deep learning model into the instruction set calculation graph intermediate expression does not change the operator granularity, the first operator granularity is the same as the second operator granularity, and the operators in the target deep learning model are the same as the operators in the instruction set calculation graph intermediate expression, both being the first operator; that is, in the instruction set calculation graph intermediate expression, the second operator granularity is defined with respect to the first operator. In other words, the operators and operator granularity in the target deep learning model are consistent with those of the instruction set calculation graph intermediate expression, and the instruction set calculation graph intermediate expression is the expression closest to the original calculation graph of the target deep learning model.
In an embodiment, the first operator and the first operator granularity are closer to the design level of the neural network algorithm, are more legible, and make it easier for developers to interpret the network structure.
S120. Convert the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression.
In an embodiment, the data flow calculation graph intermediate expression represents the types and connection relationships of operators under the data flow architecture. The operator of the instruction set calculation graph intermediate expression is the first operator, and the operator of the data flow calculation graph intermediate expression is the second operator. Converting the instruction set calculation graph intermediate expression into the data flow calculation graph intermediate expression means reorganizing the instruction set calculation graph intermediate expression according to the operator granularity of the data flow, fusing the first operators of the instruction set calculation graph intermediate expression into the second operators of the data flow calculation graph intermediate expression according to the data flow operator granularity, that is, fusing the small-granularity operators of the instruction set calculation graph intermediate expression into large-granularity operators. For example, if the operators of the instruction set calculation graph intermediate expression are of four types, A1, A2, A3, and A4, and the connection relationship between the operators is to run A1+A2 first and then run A1+A2+A3+A4, then when the instruction set calculation graph intermediate expression is converted into the data flow calculation graph intermediate expression, A1+A2 (A1 and A2 being small-granularity operators) are fused into B (a large-granularity operator) and A3+A4 are fused into C. At this point, the operator granularity of B is 2, the operators in the data flow calculation graph intermediate expression are B and C, and the connection relationship between the operators is B+C. A sketch of such a fusion pass is shown below.
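Continuing the hypothetical sketch above (the fusion table and the fuse helper are illustrative assumptions, not the patent's prescribed method), a fusion pass that merges small-granularity operators into large-granularity operators could look like this:

```python
def fuse(graph: CalcGraph, groups: dict) -> CalcGraph:
    """Fuse small-granularity operators into large-granularity operators.

    groups maps a fused operator name to the operators it absorbs, e.g.
    {"B": ["A1", "A2"], "C": ["A3", "A4"]}. Fusion is not a simple
    superposition: each fused operator is a new operator whose granularity
    is the total number of absorbed minimal arithmetic units.
    """
    by_name = {op.name: op for op in graph.operators}
    member_of = {m: fused for fused, members in groups.items() for m in members}
    fused_ops = [
        Operator(fused, [u for m in members for u in by_name[m].units])
        for fused, members in groups.items()
    ]
    # Rewrite edges to fused names and drop edges that became internal.
    fused_edges = []
    for src, dst in graph.edges:
        s, d = member_of.get(src, src), member_of.get(dst, dst)
        if s != d and (s, d) not in fused_edges:
            fused_edges.append((s, d))
    return CalcGraph(fused_ops, fused_edges)

dataflow_graph = fuse(isa_graph, {"B": ["A1", "A2"], "C": ["A3", "A4"]})
# B and C each have granularity 2; the remaining connection is B+C.
assert [op.granularity for op in dataflow_graph.operators] == [2, 2]
assert dataflow_graph.edges == [("B", "C")]
```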
In an embodiment, fusion here does not mean simple superposition; it includes the meanings of both fusion and transformation.
The data flow calculation graph intermediate expression includes the third operator granularity, and the third operator granularity contained in the data flow calculation graph intermediate expression is larger than the second operator granularity contained in the instruction set calculation graph intermediate expression.
S130. Adjust the data flow calculation graph intermediate expression into a customized architecture intermediate expression.
In an embodiment, the customized architecture intermediate expression represents the operators, and their connection relationships, of the data flow architecture that runs the target deep learning model. Adjusting the data flow calculation graph intermediate expression into the customized architecture intermediate expression means reorganizing and rewriting the operators of the data flow calculation graph intermediate expression according to the design principles of the data flow architecture that runs the target deep learning model. The customized architecture intermediate expression is close to the underlying operations and runs more efficiently.
An operator of the data flow calculation graph intermediate expression represents a minimal arithmetic unit that can be executed under the data flow architecture, and the customized architecture intermediate expression can divide these minimal arithmetic units into modules. For example, if the operators of the data flow calculation graph intermediate expression are of four types, B, C, D, and E, and the operation relationship between the operators is to compute B+C first and then compute B+C+D+E, then the customized architecture intermediate expression may have a first module run B+C and a second module run D+E. By design, the first module and the second module can perform their calculations at the same time, thereby reducing calculation time and achieving higher efficiency, as the sketch below illustrates.
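The sketch below illustrates only the module-partitioning idea; the ModulePlan structure and the fixed two-module assignment are assumptions made for this example:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class ModulePlan:
    """Customized architecture intermediate expression (sketch): operators
    assigned to hardware modules, plus the combination step between them."""
    modules: Dict[str, List[str]]  # module name -> operators it runs
    combine: Tuple[str, ...]       # module outputs combined afterwards

# B+C and D+E have no data dependency on each other, so they can be
# assigned to two modules that compute at the same time; their results
# are then combined to form B+C+D+E, reducing total calculation time.
plan = ModulePlan(
    modules={"module_1": ["B", "C"], "module_2": ["D", "E"]},
    combine=("module_1", "module_2"),
)
```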
S140. Obtain, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model.
In an embodiment, the target data flow network model is a deep learning model that runs under the data flow architecture, and the customized architecture intermediate expression can be regarded as the calculation graph of the target data flow network model, which includes both the operator types and corresponding data parameters in the target data flow network model and the connection relationships between the operators in the target data flow network model. The target deep learning model can be run according to the customized architecture intermediate expression, thereby converting a deep learning model developed based on the instruction set architecture to run under the data flow architecture.
In the embodiment of the present application, the target deep learning model is parsed into an instruction set calculation graph intermediate expression; the instruction set calculation graph intermediate expression is converted into a data flow calculation graph intermediate expression; the data flow calculation graph intermediate expression is adjusted into a customized architecture intermediate expression; and, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model is obtained. This converts a deep learning model developed based on the instruction set architecture to run under the data flow architecture, and describing the deep learning model with the instruction set calculation graph intermediate expression, the data flow calculation graph intermediate expression, and the customized architecture intermediate expression allows trade-offs between legibility and execution efficiency according to actual needs, making the design more flexible. The sketch below strings the four steps together end to end.
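Tying the previous sketches together (all helper names remain hypothetical), the four steps S110 to S140 form a simple pipeline:

```python
def build_network_model(custom_ir: ModulePlan):
    """Stand-in for S140: a real system would emit a deployable data flow
    network model here; this sketch simply returns the plan."""
    return custom_ir

def convert_model(framework_graph, fusion_groups, partition):
    """End-to-end conversion sketch following S110 to S140."""
    isa_ir = parse_model(framework_graph)        # S110: parse
    dataflow_ir = fuse(isa_ir, fusion_groups)    # S120: convert by fusion
    names = {op.name for op in dataflow_ir.operators}
    assert all(o in names for ops in partition.values() for o in ops), \
        "modules must run operators taken from the data flow graph"
    custom_ir = ModulePlan(modules=partition,    # S130: adjust
                           combine=tuple(partition))
    return build_network_model(custom_ir)        # S140: generate
```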
In an embodiment, the target deep learning model includes a first operator granularity, the instruction set calculation graph intermediate expression includes a second operator granularity, and the data flow calculation graph intermediate expression includes a third operator granularity.
In an embodiment, the first operator granularity is the same as the second operator granularity.
In an embodiment, the second operator granularity is smaller than the third operator granularity.
In an embodiment, the instruction set calculation graph intermediate expression further includes a first operator, and the data flow calculation graph intermediate expression further includes a second operator. In an embodiment, the third operator granularity is defined with respect to the second operator.
In an embodiment, a plurality of the first operators form the second operator through fusion and transformation.
Embodiment 1 of the present application parses the target deep learning model into an instruction set calculation graph intermediate expression, converts the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression, adjusts the data flow calculation graph intermediate expression into a customized architecture intermediate expression, and obtains, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model. This converts a deep learning model developed based on the instruction set architecture to run under the data flow architecture, and describing the deep learning model with the three intermediate expressions allows trade-offs between legibility and execution efficiency according to actual needs, making the design more flexible.
Embodiment 2
FIG. 2 is a schematic structural diagram of a conversion apparatus for a deep learning model provided by an embodiment of the present application. This embodiment is suitable for inputting a deep learning model developed based on an instruction set architecture into a chip based on a data flow architecture for operation. The apparatus can be implemented by software and/or hardware and can be integrated on a server. The conversion apparatus for a deep learning model provided by this embodiment can execute the conversion method for a deep learning model provided in any embodiment of the present application, and has the functional modules and effects corresponding to the executed method. For content not described in Embodiment 2, reference may be made to the description in any method embodiment of the present application.
As shown in FIG. 2, the conversion apparatus 200 for a deep learning model provided by the embodiment of the present application includes a target deep learning model parsing module 210, an instruction set calculation graph intermediate expression conversion module 220, a data flow calculation graph intermediate expression adjustment module 230, and a target data flow network model generation module 240, where:
the target deep learning model parsing module 210 is configured to parse the target deep learning model into an instruction set calculation graph intermediate expression;
the instruction set calculation graph intermediate expression conversion module 220 is configured to convert the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression;
the data flow calculation graph intermediate expression adjustment module 230 is configured to adjust the data flow calculation graph intermediate expression into a customized architecture intermediate expression; and
the target data flow network model generation module 240 is configured to obtain, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model.
In an embodiment, the target deep learning model parsing module 210, the instruction set calculation graph intermediate expression conversion module 220, and the data flow calculation graph intermediate expression adjustment module 230 are all independent modules.
In an embodiment, that the target deep learning model parsing module 210, the instruction set calculation graph intermediate expression conversion module 220, and the data flow calculation graph intermediate expression adjustment module 230 are all independent modules means that modifying one of these modules does not affect the working logic of the other modules. For example, if the target deep learning model needs to be replaced, and the replacement target deep learning model and the original target deep learning model were developed based on different deep learning frameworks, then the relevant logic of the target deep learning model parsing module 210 is modified to correspond to the deep learning framework of the replacement model, while the instruction set calculation graph intermediate expression conversion module 220 and the data flow calculation graph intermediate expression adjustment module 230 can remain unchanged and continue to be used. If the target data flow network model needs to change, the relevant changes are made to the data flow calculation graph intermediate expression adjustment module 230, and the target deep learning model parsing module 210 and the instruction set calculation graph intermediate expression conversion module 220 can remain unchanged and continue to be used. The sketch following this paragraph illustrates the decoupling.
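As a hypothetical illustration of this decoupling (the Parser protocol and the framework-specific classes below are invented for the example), each stage can be coded against an interface so that swapping one stage leaves the others untouched:

```python
from typing import Protocol

class Parser(Protocol):
    def parse(self, model) -> CalcGraph: ...

class TensorFlowParser:
    """Parsing module for TensorFlow-trained models (illustrative)."""
    def parse(self, model) -> CalcGraph:
        return parse_model(model)

class CaffeParser:
    """Drop-in replacement for Caffe-trained models (illustrative); only
    this class changes when the source framework changes, while the
    fusion and adjustment stages remain unchanged."""
    def parse(self, model) -> CalcGraph:
        return parse_model(model)  # Caffe-specific logic would go here

def convert(model, parser: Parser, fusion_groups, partition):
    isa_ir = parser.parse(model)               # independent module 210
    dataflow_ir = fuse(isa_ir, fusion_groups)  # independent module 220
    # Module 230 reorganizes the fused graph into a module plan.
    return ModulePlan(modules=partition,
                      combine=tuple(partition))
```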
In an embodiment, the target deep learning model includes a first operator granularity, the instruction set calculation graph intermediate expression includes a second operator granularity, and the data flow calculation graph intermediate expression includes a third operator granularity.
In an embodiment, the first operator granularity is the same as the second operator granularity.
In an embodiment, the second operator granularity is smaller than the third operator granularity.
In an embodiment, the instruction set calculation graph intermediate expression further includes a first operator, and the data flow calculation graph intermediate expression further includes a second operator.
In an embodiment, a plurality of the first operators form the second operator through fusion and transformation.
Through the target deep learning model parsing module, the instruction set calculation graph intermediate expression conversion module, the data flow calculation graph intermediate expression adjustment module, and the target data flow network model generation module, the embodiment of the present application converts a deep learning model developed based on the instruction set architecture to run under the data flow architecture. Describing the deep learning model with the instruction set calculation graph intermediate expression, the data flow calculation graph intermediate expression, and the customized architecture intermediate expression allows trade-offs between legibility and execution efficiency according to actual needs, making the design more flexible. Because the target deep learning model parsing module, the instruction set calculation graph intermediate expression conversion module, and the data flow calculation graph intermediate expression adjustment module are all independent modules, the extensibility of the conversion apparatus for the deep learning model is increased and the development speed is improved.
Embodiment 3
FIG. 3 is a schematic structural diagram of a server provided by Embodiment 3 of the present application, and shows a block diagram of an exemplary server 312 suitable for implementing the embodiments of the present application. The server 312 shown in FIG. 3 is only an example.
As shown in FIG. 3, the server 312 is represented in the form of a general-purpose server. The components of the server 312 may include one or more processors 316, a storage device 328, and a bus 318 connecting different system components (including the storage device 328 and the processors 316).
The bus 318 represents one or more of several types of bus structures, including a storage device bus or storage device controller, a peripheral bus, an accelerated graphics port, a processor bus, or a local bus using any of a variety of bus structures. For example, these architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The server 312 includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the server 312, including volatile and non-volatile media and removable and non-removable media.
The storage device 328 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 330 and/or a cache 332. The server 312 may include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, the storage system 334 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 3 and typically called a "hard drive"). Although not shown in FIG. 3, a disk drive for reading from and writing to a removable non-volatile magnetic disk (for example, a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (for example, a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc-Read Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to the bus 318 through one or more data media interfaces. The storage device 328 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.
A program/utility 340 having a set of (at least one) program modules 342 may be stored, for example, in the storage device 328. Such program modules 342 include an operating system, one or more application programs, other program modules, and program data, and each or some combination of these examples may include an implementation of a network environment. The program modules 342 generally perform the functions and/or methods in the embodiments described in the present application.
The server 312 may also communicate with one or more external devices 314 (such as a keyboard, a pointing device, a display 324, etc.), with one or more devices that enable a user to interact with the server 312, and/or with any device (such as a network card or a modem) that enables the server 312 to communicate with one or more other computing devices. Such communication can take place through input/output (I/O) interfaces 322. In addition, the server 312 may communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 320. As shown in FIG. 3, the network adapter 320 communicates with the other modules of the server 312 through the bus 318. Although not shown in the figure, other hardware and/or software modules can be used in conjunction with the server 312, including: microcode, device drivers, redundant processors, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, data backup storage systems, and the like.
The processor 316 runs the programs stored in the storage device 328 to execute a variety of functional applications and data processing, for example, to implement the method provided in any embodiment of the present application. The method may include:
parsing a target deep learning model into an instruction set calculation graph intermediate expression;
converting the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression;
adjusting the data flow calculation graph intermediate expression into a customized architecture intermediate expression; and
obtaining, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model.
Embodiment 4
Embodiment 4 of the present application also provides a computer-readable storage medium storing a computer program, where the program, when executed by a processor, implements the method provided in any embodiment of the present application. The method may include:
parsing a target deep learning model into an instruction set calculation graph intermediate expression;
converting the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression;
adjusting the data flow calculation graph intermediate expression into a customized architecture intermediate expression; and
obtaining, according to the customized architecture intermediate expression, the target data flow network model converted from the target deep learning model.
The computer storage medium of the embodiments of the present application may use any computer-readable medium or combination of computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. Examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, a RAM, a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium containing or storing a program, where the program may be used by or in combination with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted by any suitable medium, including wireless, wire, optical cable, radio frequency (RF), etc., or any suitable combination of the above.
The computer program code for performing the operations of the present application can be written in one or more programming languages or a combination thereof, where the programming languages include object-oriented programming languages, such as Java, Smalltalk, and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or terminal. In cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).

Claims (10)

  1. A conversion method for a deep learning model, comprising:
    parsing a target deep learning model into an instruction set calculation graph intermediate expression;
    converting the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression;
    adjusting the data flow calculation graph intermediate expression into a customized architecture intermediate expression; and
    obtaining, according to the customized architecture intermediate expression, a target data flow network model converted from the target deep learning model.
  2. The method according to claim 1, wherein the target deep learning model comprises a first operator granularity, the instruction set calculation graph intermediate expression comprises a second operator granularity, and the data flow calculation graph intermediate expression comprises a third operator granularity.
  3. The method according to claim 2, wherein the first operator granularity is the same as the second operator granularity.
  4. The method according to claim 2, wherein the second operator granularity is smaller than the third operator granularity.
  5. The method according to claim 2, wherein the instruction set calculation graph intermediate expression further comprises a first operator, and the data flow calculation graph intermediate expression further comprises a second operator.
  6. The method according to claim 5, wherein a plurality of the first operators form the second operator through fusion and transformation.
  7. A conversion apparatus for a deep learning model, comprising:
    a target deep learning model parsing module, configured to parse a target deep learning model into an instruction set calculation graph intermediate expression;
    an instruction set calculation graph intermediate expression conversion module, configured to convert the instruction set calculation graph intermediate expression into a data flow calculation graph intermediate expression;
    a data flow calculation graph intermediate expression adjustment module, configured to adjust the data flow calculation graph intermediate expression into a customized architecture intermediate expression; and
    a target data flow network model generation module, configured to obtain, according to the customized architecture intermediate expression, a target data flow network model converted from the target deep learning model.
  8. The apparatus according to claim 7, wherein the target deep learning model parsing module, the instruction set calculation graph intermediate expression conversion module, and the data flow calculation graph intermediate expression adjustment module are all independent modules.
  9. A server, comprising:
    one or more processors; and
    a storage device, configured to store one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the conversion method for a deep learning model according to any one of claims 1 to 6.
  10. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the conversion method for a deep learning model according to any one of claims 1 to 6.
PCT/CN2021/070223 2020-01-07 2021-01-05 Conversion method and apparatus for deep learning model, server, and storage medium WO2021139633A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/791,373 US20230139106A1 (en) 2020-01-07 2021-01-05 Conversion method and apparatus for deep learning model, server, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010015495.0A 2020-01-07 2020-01-07 Conversion method and apparatus for deep learning model, server, and storage medium
CN202010015495.0 2020-01-07

Publications (1)

Publication Number Publication Date
WO2021139633A1 (zh)

Family

ID=70828126

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/070223 WO2021139633A1 (zh) 2020-01-07 2021-01-05 Conversion method and apparatus for deep learning model, server, and storage medium

Country Status (3)

Country Link
US (1) US20230139106A1 (zh)
CN (1) CN111222636B (zh)
WO (1) WO2021139633A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111222636B (zh) * 2020-01-07 2023-06-06 深圳鲲云信息科技有限公司 Conversion method and apparatus for deep learning model, server, and storage medium
CN111723935A (zh) * 2020-06-24 2020-09-29 湖北亿咖通科技有限公司 Processing method for a neural network calculation graph, computer storage medium, and electronic device
CN113065639B (zh) * 2021-03-08 2023-06-13 深圳云天励飞技术股份有限公司 Operator fusion method, system, device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190079803A1 (en) * 2017-09-13 2019-03-14 Next Silicon, Ltd. Directed and interconnected grid dataflow architecture
CN110321999A (zh) * 2018-03-30 2019-10-11 北京深鉴智能科技有限公司 Neural network calculation graph optimization method
CN110377288A (zh) * 2018-04-13 2019-10-25 赛灵思公司 Neural network compression compiler and compilation and compression method thereof
CN110377340A (zh) * 2019-07-24 2019-10-25 北京中科寒武纪科技有限公司 Operation method, apparatus, and related products
CN110490309A (zh) * 2019-08-14 2019-11-22 北京中科寒武纪科技有限公司 Operator fusion method for a neural network and related products
CN111222636A (zh) * 2020-01-07 2020-06-02 深圳鲲云信息科技有限公司 Conversion method and apparatus for deep learning model, server, and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9032362B2 (en) * 2012-09-10 2015-05-12 Sap Se System and method for generating high performance calculators for calculation graphs
CN110032449A (zh) * 2019-04-16 2019-07-19 苏州浪潮智能科技有限公司 Method and apparatus for optimizing the performance of a GPU server

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190079803A1 (en) * 2017-09-13 2019-03-14 Next Silicon, Ltd. Directed and interconnected grid dataflow architecture
CN110321999A (zh) * 2018-03-30 2019-10-11 北京深鉴智能科技有限公司 Neural network calculation graph optimization method
CN110377288A (zh) * 2018-04-13 2019-10-25 赛灵思公司 Neural network compression compiler and compilation and compression method thereof
CN110377340A (zh) * 2019-07-24 2019-10-25 北京中科寒武纪科技有限公司 Operation method, apparatus, and related products
CN110490309A (zh) * 2019-08-14 2019-11-22 北京中科寒武纪科技有限公司 Operator fusion method for a neural network and related products
CN111222636A (zh) * 2020-01-07 2020-06-02 深圳鲲云信息科技有限公司 Conversion method and apparatus for deep learning model, server, and storage medium

Also Published As

Publication number Publication date
CN111222636A (zh) 2020-06-02
US20230139106A1 (en) 2023-05-04
CN111222636B (zh) 2023-06-06

Similar Documents

Publication Publication Date Title
WO2021139633A1 (zh) Conversion method and apparatus for deep learning model, server, and storage medium
WO2021208612A1 (zh) Data processing method and apparatus
JP2023520420A (ja) Batching techniques for handling unbalanced training data for a chatbot
Venkataramani et al. Computing approximately, and efficiently
WO2021259106A1 (zh) Optimization method, system, and device for a neural network chip, and storage medium
WO2021129645A1 (zh) Data parallelization processing method, system, device, and storage medium
WO2021136512A1 (zh) Scheduling method and device based on deep learning node computation, and storage medium
JP2022018095A (ja) Multimodal pre-training model acquisition method and apparatus, electronic device, and storage medium
US20210373799A1 (en) Method for storing data and method for reading data
US20220044678A1 (en) Speech processing method and method for generating speech processing model
KR102635800B1 (ko) Pre-training method and apparatus for a neural network model, electronic device, and medium
WO2019232980A1 (zh) Node configuration method and apparatus, computer-readable storage medium, and electronic device
WO2021259041A1 (zh) Sorting method, apparatus, device, and storage medium for an AI calculation graph
CN111985831A (zh) Scheduling method and apparatus for cloud computing resources, computer device, and storage medium
JP7383801B2 (ja) Image description generation method, apparatus, system, medium, and electronic device
Shaffer et al. Virtue: Performance visualization of parallel and distributed applications
JP7210830B2 (ja) Speech processing system, speech processing method, electronic device, and readable storage medium
CN111291882A (zh) Model conversion method, apparatus, device, and computer storage medium
CN110807111A (zh) Three-dimensional graphics processing method and apparatus, storage medium, and electronic device
CN112487790A (zh) Improved semantic parser including a coarse semantic parser and a fine semantic parser
JP2023036634A (ja) Access method, apparatus, electronic device, and computer storage medium
Liu et al. Analysis of teaching reform mode based on cognitive computing system–an example of dragon boat teaching
CN116360735A (zh) Form generation method, apparatus, device, and medium
WO2021077282A1 (zh) Neural network model conversion method and apparatus, server, and storage medium
WO2021077281A1 (zh) Adjustment method and apparatus for a deep learning framework, server, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21738547

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 07/12/2022)

122 Ep: pct application non-entry in european phase

Ref document number: 21738547

Country of ref document: EP

Kind code of ref document: A1