WO2023221406A1 - Running method and apparatus for a deep learning compiler, and electronic device - Google Patents

Running method and apparatus for a deep learning compiler, and electronic device

Info

Publication number
WO2023221406A1
Authority
WO
WIPO (PCT)
Prior art keywords
operator
deep learning
operators
run
compiler
Application number
PCT/CN2022/128369
Other languages
English (en)
French (fr)
Inventor
王震
姜程
郑辉煌
陈特峰
孙黎
刘益群
陈浩泽
王悦
石晓伟
蓝翔
Original Assignee
北京百度网讯科技有限公司 (Beijing Baidu Netcom Science and Technology Co., Ltd.)
Priority date
2022-05-19
Application filed by 北京百度网讯科技有限公司
Publication of WO2023221406A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • The present disclosure relates to the field of artificial intelligence technology, in particular to the field of deep learning technology, and specifically to a running method and apparatus for a deep learning compiler, and an electronic device.
  • A deep learning compiler is compiler software used to solve the problem of interfacing multiple hardware platforms with deep learning frameworks.
  • An operator of a deep learning framework usually corresponds to an operator of a deep learning compiler.
  • To run a deep learning framework on a deep learning compiler, all operators of the framework need to be supported. Because the operators of a deep learning framework are usually developed before those of the deep learning compiler, the time lag between the two can prevent the framework from running on the compiler.
  • The present disclosure provides a running method and apparatus for a deep learning compiler, and an electronic device.
  • According to a first aspect, a method for running a deep learning compiler is provided, including:
  • obtaining a first operator set of a deep learning framework, which includes a first operator and a second operator, and a second operator set of the deep learning compiler;
  • in response to the second operator set including the first operator, controlling the deep learning compiler to run the first operator to obtain target data;
  • transmitting the target data to the deep learning framework; and
  • controlling the deep learning framework to run the second operator in the first operator set based on the target data.
  • According to a second aspect, an apparatus for running a deep learning compiler is provided, including:
  • an acquisition module, configured to obtain a first operator set of a deep learning framework, which includes a first operator and a second operator, and a second operator set of the deep learning compiler;
  • a first running module, configured to, in response to the second operator set including the first operator, control the deep learning compiler to run the first operator to obtain target data;
  • a transmission module, configured to transmit the target data to the deep learning framework; and
  • a second running module, configured to control the deep learning framework to run the second operator in the first operator set based on the target data.
  • According to a third aspect, an electronic device is provided, including at least one processor and a memory communicatively connected to the at least one processor, wherein
  • the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method according to any implementation of the first aspect.
  • According to a fourth aspect, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to perform the method according to any implementation of the first aspect.
  • According to a fifth aspect, a computer program product is provided, including a computer program/instructions that, when executed by a processor, implement the steps of the method according to any implementation of the first aspect.
  • In embodiments of the present disclosure, the first operator set of the deep learning framework and the second operator set of the deep learning compiler are obtained; in response to the second operator set including a first operator, which exists in both the second operator set and the first operator set, the deep learning compiler is controlled to run the first operator to obtain target data; the target data is then transmitted to the deep learning framework, and the framework is controlled to run, based on the target data, the second operator, which exists in the first operator set but not in the second operator set.
  • Figure 1 is a schematic flowchart of a method for running a deep learning compiler according to a first embodiment of the present disclosure;
  • Figure 2 is a schematic diagram of a computation graph of a deep learning framework according to a second embodiment of the present disclosure;
  • Figure 3 is a schematic diagram of sub-operator sets according to the second embodiment of the present disclosure;
  • Figure 4 is a schematic diagram of subgraphs according to the second embodiment of the present disclosure;
  • Figure 5 is a block diagram of an apparatus for running a deep learning compiler used to implement embodiments of the present disclosure;
  • Figure 6 is a block diagram of an electronic device used to implement the method for running a deep learning compiler according to embodiments of the present disclosure.
  • In the technical solutions of the present disclosure, the acquisition, storage, and application of users' personal information involved all comply with relevant laws and regulations and do not violate public order and good morals.
  • At present, a deep learning compiler can already run in conjunction with a deep learning framework.
  • When the two are interfaced, each operator of the deep learning framework corresponds to one operator of the deep learning compiler, so running the framework on the compiler requires the deep learning compiler to support all operators of the deep learning framework.
  • However, the operators of the deep learning framework are usually developed first, and operators corresponding one-to-one to them are then developed in the deep learning compiler. Because of this time lag, the deep learning compiler may not fully support all operators of the deep learning framework, so the framework cannot run on the compiler.
  • In some embodiments, when the second operator set includes a first operator that exists in both the second operator set and the first operator set, the deep learning compiler is controlled to run the first operator to obtain target data, where the first operators are some of the operators in the first operator set;
  • the target data is then transmitted to the deep learning framework, and the framework is controlled to run, based on the target data, the second operator, which exists in the first operator set but not in the second operator set.
  • Figure 1 is a schematic flowchart of a method for running a deep learning compiler provided by an embodiment of the present disclosure. This method can be applied to a server. As shown in Figure 1, the method includes:
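  • S110: Obtain a first operator set of the deep learning framework, which includes a first operator and a second operator, and a second operator set of the deep learning compiler.
  • The operator set of the deep learning framework is the first operator set, and the operator set of the deep learning compiler is the second operator set.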
  • The first operator set is the set of all operators supported by the deep learning framework and may include first operators and second operators.
  • The second operator set is the set of all operators supported by the deep learning compiler.
  • The specific operators in the first operator set and the second operator set may be the same or different.
  • The second operator set includes at least some of the operators in the first operator set;
  • for example, the second operator set also includes the first operator.
  • S120: In response to the second operator set including the first operator, control the deep learning compiler to run the first operator to obtain target data.
  • The first operator is an operator that exists in both the second operator set and the first operator set, and the first operators are some of the operators in the first operator set. There may be one first operator or several.
  • That is, the second operator set includes some of the operators in the first operator set, namely the first operators.
  • For example, if the first operator set includes operators a, b, c, d, e, and f, and the second operator set includes operators a, b, c, and d, then the first operators, which exist in both the second operator set and the first operator set, are operators a, b, c, and d, and they are some of the operators in the first operator set.
  • As long as the second operator set includes the first operators, the first operators can be dispatched to the deep learning compiler, that is, the deep learning compiler is controlled to run them; running the first operators yields the resulting data, namely the target data.
  • In this way, even when the front-end operators of the deep learning compiler do not fully support all operators of the deep learning framework, the compiler and the framework can be run together, so that some operators of the framework gain the compiler's accelerated execution and hardware integration, improving running efficiency.
  • After the deep learning compiler is controlled to run the first operators and obtain the target data, the target data can be transmitted to the deep learning framework, so that the framework can continue to run, based on the target data, the operators in the first operator set other than the first operators.
  • For example, a data transmission interface can be set up between the deep learning compiler and the deep learning framework, and the target data can be transmitted to the framework through this interface.
  • The second operator is an operator that exists in the first operator set but not in the second operator set. There may be one second operator or several.
  • The second operators, which exist in the first operator set but not in the second operator set, can be determined. For example, if the first operator set includes operators a, b, c, d, e, and f, and the second operator set includes operators a, b, c, and d, then the second operators are e and f. The deep learning framework can then be controlled to continue running the second operators based on the target data.
  • Thus, when the second operator set includes a first operator that exists in both the second operator set and the first operator set, the deep learning compiler is controlled to run the first operator to obtain target data, where the first operators are some of the operators in the first operator set; the target data is then transmitted to the deep learning framework, and the framework is controlled to run, based on the target data, the second operator, which exists in the first operator set but not in the second operator set.
  • A specific implementation of controlling the deep learning compiler to run the first operators to obtain the target data may include:
  • when there are multiple first operators, a sub-operator set may first be constructed from all the first operators, for example by grouping all the first operators into one set, and the deep learning framework is then controlled to allocate running resources uniformly to the sub-operator set.
  • The running resources may include video memory, streams, and the like.
  • The deep learning compiler can then be controlled to run the sub-operator set based on the running resources allocated by the deep learning framework, to obtain the target data.
  • In this way, the deep learning framework allocates running resources to all first operators uniformly, without allocating them to each first operator separately, which improves the efficiency of resource allocation and thus further improves running efficiency.
  • A specific implementation of constructing sub-operator sets from all the first operators may be:
  • dividing the first operators into at least one sub-operator set according to the input-output relationships between the first operators.
  • That is, when the sub-operator sets are constructed, the first operators can be grouped according to the input-output relationships among them.
  • There may be one sub-operator set or several.
  • The first operator set of the deep learning framework can be understood as a computation graph, and a sub-operator set is in effect a sub-computation graph, i.e., a subgraph, of that computation graph. Take the computation graph of the deep learning framework shown in Figure 2 as an example.
  • In Figure 2, the smaller circular nodes are variable nodes and the larger circular nodes are operator nodes: a, b, c, d, e, and f all denote operators, and the edges between variable nodes and operator nodes denote the input-output relationships between them.
  • If the operators supported by the deep learning compiler are a, b, c, and d, then the first operators are a, b, c, and d, and the second operators are e and f.
  • Operators b and d can be grouped into one sub-operator set (Union 1 in Figure 3), i.e., one subgraph (subgraph 1 in Figure 4), and operators a and c can be grouped into another sub-operator set (Union 2 in Figure 3), i.e., subgraph 2 in Figure 4.
  • Alternatively, controlling the deep learning compiler to run the first operators to obtain the target data may include:
  • controlling the deep learning framework to allocate running resources to the i-th first operator according to the running order of the first operators, where i ∈ [1, n-1] and n is the number of first operators; controlling the deep learning compiler to run the i-th first operator; controlling the deep learning framework to allocate running resources to the (i+1)-th first operator; and controlling the deep learning compiler to run the (i+1)-th first operator.
  • That is, when there are multiple first operators, running resources can also be allocated to each first operator in turn according to their running order, and each first operator is run based on its running resources.
  • Specifically, the deep learning framework can be controlled to allocate running resources, such as video memory and a stream, to the i-th first operator according to the running order of the first operators,
  • and the deep learning compiler is then controlled to run the i-th first operator based on the running resources the framework allocated to it.
  • Take first operators a, b, and c with the running order a, b, c as an example.
  • First, the deep learning framework is controlled to allocate running resource a1 to operator a, and the deep learning compiler is controlled to run operator a based on a1; then the framework is controlled to allocate running resource b1 to operator b, and the compiler runs operator b based on b1; finally, the framework is controlled to allocate running resource c1 to operator c, and the compiler runs operator c based on c1.
  • Each time the deep learning framework is controlled to allocate running resources to the first operators, i can start from 1. When at least two first operators have no definite running order between them, running resources can be allocated to them simultaneously, and the deep learning compiler is controlled to run them based on their respective running resources.
  • Alternatively, running resources can be allocated to such first operators in random order, and they can be run in random order.
  • Allocating running resources to each first operator in turn and running each in turn thus provides another way of running the first operators through the deep learning compiler.
  • The method for running the deep learning compiler may further include:
  • in response to the second operator set including all operators in the first operator set, controlling the deep learning compiler to run all operators in the first operator set.
  • In some cases, the deep learning compiler already fully supports the first operator set of the deep learning framework. Therefore, after the first operator set and the second operator set are obtained, if the second operator set includes all operators in the first operator set, that is, the compiler supports all operators of the framework, the compiler can be controlled to run all operators in the first operator set, so that the framework runs entirely on the compiler. In other words, when the compiler has an operator corresponding to every operator of the framework, the framework can run completely on the compiler. In this way, the framework can be run on the compiler according to the operators that each of them supports.
  • The present disclosure further provides an apparatus for running a deep learning compiler.
  • The apparatus 500 for running a deep learning compiler may include:
  • an acquisition module 510, configured to obtain a first operator set of a deep learning framework, which includes a first operator and a second operator, and a second operator set of the deep learning compiler;
  • a first running module 520, configured to, in response to the second operator set including the first operator, control the deep learning compiler to run the first operator to obtain target data;
  • a transmission module 530, configured to transmit the target data to the deep learning framework; and
  • a second running module 540, configured to control the deep learning framework to run the second operator in the first operator set based on the target data.
  • The first running module 520 may include:
  • a building unit, configured to build a sub-operator set from all the first operators;
  • a first resource allocation unit, configured to control the deep learning framework to allocate running resources to the sub-operator set; and
  • a first running unit, configured to control the deep learning compiler to run the sub-operator set based on the running resources to obtain the target data.
  • The building unit may be configured to:
  • divide the first operators into at least one sub-operator set according to the input-output relationships between the first operators.
  • Alternatively, the first running module 520 may include:
  • a second resource allocation unit, configured to control the deep learning framework to allocate running resources to the i-th first operator according to the running order of the first operators, where i ∈ [1, n-1] and n is the number of first operators;
  • a second running unit, configured to control the deep learning compiler to run the i-th first operator;
  • a third resource allocation unit, configured to control the deep learning framework to allocate running resources to the (i+1)-th first operator; and
  • a third running unit, configured to control the deep learning compiler to run the (i+1)-th first operator.
  • The first running module 520 may also include:
  • a fourth resource allocation unit, configured to control the deep learning framework to allocate running resources to at least two first operators simultaneously; and
  • a fourth running unit, configured to control the deep learning compiler to run the at least two first operators based on their respective running resources.
  • The apparatus 500 for running the deep learning compiler may further include:
  • a third running module, configured to control the deep learning compiler to run all operators in the first operator set when the second operator set includes all operators in the first operator set.
  • The present disclosure further provides an electronic device, a readable storage medium, and a computer program product.
  • FIG. 6 shows a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure.
  • Electronic devices are intended to refer to various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
  • The device 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600.
  • Computing unit 601, ROM 602 and RAM 603 are connected to each other via bus 604.
  • An input/output (I/O) interface 605 is also connected to bus 604.
  • Multiple components in the device 600 are connected to the I/O interface 605, including: an input unit 606, such as a keyboard or mouse; an output unit 607, such as various types of displays and speakers; a storage unit 608, such as a magnetic disk or optical disk; and a communication unit 609, such as a network card, modem, or wireless communication transceiver.
  • The communication unit 609 allows the device 600 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunications networks.
  • The computing unit 601 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, and so on.
  • The computing unit 601 performs the methods and processes described above, such as the method for running the deep learning compiler.
  • The method for running the deep learning compiler may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608.
  • Part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609.
  • When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the method for running the deep learning compiler described above may be performed.
  • The computing unit 601 may also be configured to perform the method for running the deep learning compiler in any other suitable manner (e.g., by means of firmware).
  • Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
  • A machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer.
  • Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual, auditory, or tactile feedback), and input from the user may be received in any form (including acoustic, voice, or tactile input).
  • The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), the Internet, and blockchain networks.
  • A computer system may include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other.
  • The server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product in a cloud computing service system that addresses the difficult management and weak business scalability of traditional physical hosts and VPS ("Virtual Private Server") services.
  • The server can also be a server of a distributed system or a server combined with a blockchain.
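The grouping step described above (dividing the first operators into sub-operator sets according to their input-output relationships, then letting the framework allocate running resources once per set rather than once per operator) can be sketched with a union-find over the compiler-supported operators. This is a minimal sketch under stated assumptions, not the disclosed implementation: the `Op` record, the variable names `x0`, `v1` through `v5`, and the edges of `GRAPH` are hypothetical stand-ins for Figure 2, chosen so the result matches the groupings {a, c} and {b, d} described for Figures 3 and 4, and the "resource bundle" is just a placeholder stream index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical operator node: consumes and produces named variables."""
    name: str
    inputs: tuple
    outputs: tuple


def build_sub_operator_sets(ops, compiler_supported):
    """Group compiler-supported (first) operators into connected sub-operator sets.

    Two first operators land in the same set when one consumes a variable
    produced by the other. Connections that pass through framework-only
    (second) operators are deliberately ignored.
    """
    first = [op for op in ops if op.name in compiler_supported]
    parent = {op.name: op.name for op in first}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    # Map each variable to the first operator that produces it.
    producer = {var: op.name for op in first for var in op.outputs}
    for op in first:
        for var in op.inputs:
            if var in producer:
                union(producer[var], op.name)

    groups = {}
    for op in first:
        groups.setdefault(find(op.name), []).append(op.name)
    return list(groups.values())


def allocate_per_set(sub_operator_sets):
    """Hypothetical framework-side allocation: one resource bundle per set."""
    return {tuple(s): {"stream": i} for i, s in enumerate(sub_operator_sets)}


# Hypothetical stand-in for the Figure 2 computation graph.
GRAPH = [
    Op("a", ("x0",), ("v1",)),
    Op("b", ("x0",), ("v2",)),
    Op("c", ("v1",), ("v3",)),
    Op("d", ("v2",), ("v4",)),
    Op("e", ("v3", "v4"), ("v5",)),
    Op("f", ("v5",), ("y",)),
]

sets = build_sub_operator_sets(GRAPH, {"a", "b", "c", "d"})
print(sets)                    # [['a', 'c'], ['b', 'd']]
print(allocate_per_set(sets))  # {('a', 'c'): {'stream': 0}, ('b', 'd'): {'stream': 1}}
```

Union-find is one reasonable way to compute connected groups here; the disclosure only states that grouping follows input-output relationships, so any connected-components method would serve the same purpose.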

Abstract

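A method for running a deep learning compiler includes: obtaining a first operator set of a deep learning framework, which includes a first operator and a second operator, and a second operator set of the deep learning compiler; in response to the second operator set including the first operator, controlling the deep learning compiler to run the first operator to obtain target data; transmitting the target data to the deep learning framework; and controlling the deep learning framework to run the second operator in the first operator set based on the target data.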

Description

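Running method and apparatus for a deep learning compiler, and electronic device
Cross-Reference to Related Applications
This application is based on and claims priority to Chinese patent application No. 2022105732208, filed on May 19, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of artificial intelligence technology, in particular to the field of deep learning technology, and specifically to a running method and apparatus for a deep learning compiler, and an electronic device.
Background
A deep learning compiler is compiler software used to solve the problem of interfacing multiple hardware platforms with deep learning frameworks. An operator of a deep learning framework usually corresponds to an operator of a deep learning compiler, so running a deep learning framework on a deep learning compiler requires support for all operators of the framework. Because the operators of a deep learning framework are usually developed before those of the deep learning compiler, the time lag between the two can prevent the framework from running on the compiler.
Summary
The present disclosure provides a running method and apparatus for a deep learning compiler, and an electronic device.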
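According to a first aspect of the present disclosure, a method for running a deep learning compiler is provided, including:
obtaining a first operator set of a deep learning framework, which includes a first operator and a second operator, and a second operator set of the deep learning compiler;
in response to the second operator set including the first operator, controlling the deep learning compiler to run the first operator to obtain target data;
transmitting the target data to the deep learning framework; and
controlling the deep learning framework to run the second operator in the first operator set based on the target data.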
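The four steps of the first aspect can be illustrated end to end with a toy example. This is a minimal sketch under stated assumptions, not the disclosed implementation: operators are hypothetical Python functions over a dictionary of named values, both the "compiler" and the "framework" are simple interpreters, the handover is a plain dictionary copy, and every first operator is assumed to precede every second operator in execution order (as in the a to f example in the description). A graph that interleaves supported and unsupported operators would need several handovers, which this sketch does not attempt.

```python
from typing import Callable, Dict, List, Tuple

Env = Dict[str, float]
# Hypothetical operator representation: (name, function producing new named values).
Operator = Tuple[str, Callable[[Env], Env]]


def execute(ops: List[Operator], env: Env, trace: list, runner: str) -> Env:
    """Run operators in order, recording which side executed each one."""
    for name, fn in ops:
        env.update(fn(env))
        trace.append((runner, name))
    return env


def run_with_compiler(framework_ops: List[Operator], compiler_supported: set, feed: Env):
    # Split the framework's (first) operator set against the compiler's (second)
    # operator set, preserving execution order.
    first_ops = [op for op in framework_ops if op[0] in compiler_supported]
    second_ops = [op for op in framework_ops if op[0] not in compiler_supported]
    trace: list = []
    # The "compiler" runs the first operators and produces the target data.
    target_data = execute(first_ops, dict(feed), trace, "compiler")
    # Transmit the target data to the framework (here simply a copy).
    framework_env = dict(target_data)
    # The "framework" runs the remaining (second) operators on the target data.
    result = execute(second_ops, framework_env, trace, "framework")
    return result, trace


FRAMEWORK_OPS: List[Operator] = [
    ("a", lambda env: {"y_a": env["x"] + 1}),
    ("b", lambda env: {"y_b": env["x"] * 2}),
    ("c", lambda env: {"y_c": env["y_a"] * 3}),
    ("d", lambda env: {"y_d": env["y_b"] - 1}),
    ("e", lambda env: {"y_e": env["y_c"] + env["y_d"]}),
    ("f", lambda env: {"y_f": env["y_e"] * 10}),
]

result, trace = run_with_compiler(FRAMEWORK_OPS, {"a", "b", "c", "d"}, {"x": 1.0})
print(result["y_f"])  # 70.0
print(trace)
```

The trace makes the split visible: operators a to d are recorded as run by the compiler and e, f by the framework, while the numeric result is the same as running all six operators in one place.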
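According to a second aspect of the present disclosure, an apparatus for running a deep learning compiler is provided, including:
an acquisition module, configured to obtain a first operator set of a deep learning framework, which includes a first operator and a second operator, and a second operator set of the deep learning compiler;
a first running module, configured to, in response to the second operator set including the first operator, control the deep learning compiler to run the first operator to obtain target data;
a transmission module, configured to transmit the target data to the deep learning framework; and
a second running module, configured to control the deep learning framework to run the second operator in the first operator set based on the target data.
According to a third aspect of the present disclosure, an electronic device is provided, including:
at least one processor; and
a memory communicatively connected to the at least one processor, wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any implementation of the first aspect.
According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to perform the method according to any implementation of the first aspect.
According to a fifth aspect of the present disclosure, a computer program product is provided, including a computer program/instructions that, when executed by a processor, implement the steps of the method according to any implementation of the first aspect.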
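In embodiments of the present disclosure, a first operator set of the deep learning framework and a second operator set of the deep learning compiler are obtained; in response to the second operator set including a first operator that exists in both the second operator set and the first operator set, the deep learning compiler is controlled to run the first operator to obtain target data; the target data is then transmitted to the deep learning framework, and the framework is controlled to run, based on the target data, the second operator, which exists in the first operator set but not in the second operator set. In this way, even when the deep learning compiler does not fully support all operators of the deep learning framework, some operators of the framework can run on the compiler, avoiding the situation in which the time lag prevents the framework from running on the compiler, and improving running efficiency.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become easy to understand from the following description.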
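Brief Description of the Drawings
The drawings are provided for a better understanding of the present solution and do not constitute a limitation on the present disclosure. In the drawings:
FIG. 1 is a schematic flowchart of an operating method of a deep learning compiler according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a computation graph of a deep learning framework according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram of sub-operator sets according to the second embodiment of the present disclosure;
FIG. 4 is a schematic diagram of subgraphs according to the second embodiment of the present disclosure;
FIG. 5 is a block diagram of an operating apparatus for a deep learning compiler used to implement embodiments of the present disclosure;
FIG. 6 is a block diagram of an electronic device used to implement the operating method of a deep learning compiler according to embodiments of the present disclosure.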
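Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the drawings, including various details of the embodiments to aid understanding; they should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
In the technical solutions of the present disclosure, the acquisition, storage, and application of any user personal information involved comply with relevant laws and regulations and do not violate public order and good morals.
At present, deep learning compilers can already run in combination with deep learning frameworks. When a deep learning compiler is interfaced with a deep learning framework, each operator of the framework corresponds to one operator of the compiler, so running the framework on the compiler requires the compiler to support all of the framework's operators. In practice, however, an operator is usually developed first in the deep learning framework, and the one-to-one corresponding operator is developed in the deep learning compiler afterwards. Because of this time lag between operator development in the framework and in the compiler, the compiler may not fully support all operators of the framework, and the framework then cannot run on the compiler.
Based on this, in some embodiments of the present disclosure, a first operator set of the deep learning framework and a second operator set of the deep learning compiler are acquired; when the second operator set includes a first operator, i.e. an operator present in both the second operator set and the first operator set, the deep learning compiler is controlled to run the first operator to obtain target data, where the first operators are a subset of the operators in the first operator set; the target data is then transmitted to the deep learning framework, and the framework is controlled to run, based on the target data, a second operator, i.e. an operator present in the first operator set but not in the second operator set. In this way, even when the compiler does not fully support all operators of the framework, some of the framework's operators can run on the compiler. This avoids the situation where the development time lag prevents the framework from running on the compiler, and improves running efficiency.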
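The technical solutions provided by the embodiments of the present disclosure are described below with reference to the drawings.
FIG. 1 is a schematic flowchart of an operating method of a deep learning compiler according to an embodiment of the present disclosure. The method may be applied to a server. As shown in FIG. 1, the method includes:
S110: acquiring a first operator set of a deep learning framework, the first operator set including a first operator and a second operator, and a second operator set of a deep learning compiler.
In the embodiments of the present disclosure, the operator set of the deep learning framework, i.e. the first operator set, and the operator set of the deep learning compiler, i.e. the second operator set, may be acquired. The first operator set is the set of all operators supported by the deep learning framework and may include first operators and second operators. The second operator set is the set of all operators supported by the deep learning compiler. The specific operators in the two sets may be the same or different. The second operator set includes at least some of the operators in the first operator set; for example, the second operator set also includes the first operator.
S120: in response to the second operator set including the first operator, controlling the deep learning compiler to run the first operator to obtain target data.
Here, a first operator is an operator present in both the second operator set and the first operator set, and the first operators are a subset of the operators in the first operator set. It can be understood that there may be one or more first operators.
In the embodiments of the present disclosure, after the first operator set of the deep learning framework and the second operator set of the deep learning compiler are acquired, it may be determined whether the second operator set includes some of the operators in the first operator set, i.e. the first operators. For example, assuming the first operator set includes operators a, b, c, d, e, and f and the second operator set includes operators a, b, c, and d, the first operators are the operators present in both sets, namely a, b, c, and d, which are a subset of the first operator set. As long as the second operator set includes first operators (that is, as long as the compiler's second operator set includes some operators corresponding to operators in the framework's first operator set), the first operators can be dispatched to the deep learning compiler, i.e. the compiler is controlled to run the first operators, and the result data obtained after running them is the target data. In this way, even when the compiler's front-end operators do not fully support all operators of the framework, the compiler and the framework can run together, so that some of the framework's operators benefit from the compiler's accelerated execution and hardware interfacing, thereby improving running efficiency.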
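S130: transmitting the target data to the deep learning framework.
In the embodiments of the present disclosure, after the deep learning compiler is controlled to run the first operators and obtain the target data, the target data may be transmitted to the deep learning framework so that the framework can continue running, based on the target data, the operators in the first operator set other than the first operators. For example, a data transmission interface may be provided between the deep learning compiler and the deep learning framework, through which the target data is transmitted to the framework.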
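The S120 to S140 hand-off can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the element-wise kernels and the functions `compiler_run`, `transfer_to_framework`, `framework_run` and `run_model` are hypothetical stand-ins, and the sketch assumes a simple chain of operators in which all first operators precede the second operators in the running order.

```python
from typing import Callable, Dict, List, Set

Kernel = Callable[[List[float]], List[float]]

# Hypothetical element-wise kernels, keyed by operator name.
KERNELS: Dict[str, Kernel] = {
    "a": lambda x: [v + 1.0 for v in x],
    "b": lambda x: [v * 2.0 for v in x],
    "e": lambda x: [max(v, 0.0) for v in x],
}


def _apply(ops: List[str], data: List[float]) -> List[float]:
    # Run the operators one after another, feeding each output forward.
    for op in ops:
        data = KERNELS[op](data)
    return data


def compiler_run(ops: List[str], data: List[float]) -> List[float]:
    """Stand-in for the deep learning compiler running the first operators."""
    return _apply(ops, data)


def transfer_to_framework(data: List[float]) -> List[float]:
    """Stand-in for the data transmission interface (here just a copy)."""
    return list(data)


def framework_run(ops: List[str], data: List[float]) -> List[float]:
    """Stand-in for the deep learning framework running the second operators."""
    return _apply(ops, data)


def run_model(run_order: List[str], compiler_ops: Set[str],
              inputs: List[float]) -> List[float]:
    first = [op for op in run_order if op in compiler_ops]       # S120
    second = [op for op in run_order if op not in compiler_ops]  # S140
    target = compiler_run(first, inputs)     # target data from the compiler
    target = transfer_to_framework(target)   # S130
    return framework_run(second, target)


print(run_model(["a", "b", "e"], {"a", "b"}, [-3.0, 1.0]))  # [0.0, 4.0]
```

Here `a` and `b` run on the compiler stub and `e`, which the hypothetical compiler set lacks, runs on the framework stub after the hand-off.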
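S140: controlling the deep learning framework to run the second operator in the first operator set based on the target data.
Here, a second operator is an operator present in the first operator set but not in the second operator set. It can be understood that there may be one or more second operators.
In the embodiments of the present disclosure, after the target data is transmitted to the deep learning framework, the second operators, i.e. those present in the first operator set but not in the second operator set, may be determined. For example, still assuming the first operator set includes operators a, b, c, d, e, and f and the second operator set includes operators a, b, c, and d, the second operators are e and f. The deep learning framework can then be controlled to continue running the second operators based on the target data.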
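Determining the first and second operators amounts to a set intersection and a set difference. A minimal Python sketch using the a to f example above; `split_operators` is a hypothetical helper name, not something the text defines:

```python
from typing import Set, Tuple


def split_operators(framework_ops: Set[str],
                    compiler_ops: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (first operators, second operators) for the given operator sets."""
    first_ops = framework_ops & compiler_ops   # in both sets: run on the compiler
    second_ops = framework_ops - compiler_ops  # framework only: run on the framework
    return first_ops, second_ops


first_ops, second_ops = split_operators({"a", "b", "c", "d", "e", "f"},
                                        {"a", "b", "c", "d"})
print(sorted(first_ops), sorted(second_ops))  # ['a', 'b', 'c', 'd'] ['e', 'f']
```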
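In the embodiments of the present disclosure, a first operator set of the deep learning framework and a second operator set of the deep learning compiler are acquired; when the second operator set includes a first operator, i.e. an operator present in both the second operator set and the first operator set, the deep learning compiler is controlled to run the first operator to obtain target data, where the first operators are a subset of the operators in the first operator set; the target data is then transmitted to the deep learning framework, and the framework is controlled to run, based on the target data, a second operator, i.e. an operator present in the first operator set but not in the second operator set. In this way, even when the compiler does not fully support all operators of the framework, some of the framework's operators can run on the compiler. This avoids the situation where the development time lag prevents the framework from running on the compiler, and improves running efficiency.
In some embodiments of the present disclosure, when there are multiple first operators, controlling the deep learning compiler to run the first operators to obtain the target data may specifically include:
constructing a sub-operator set based on all the first operators;
controlling the deep learning framework to allocate running resources for the sub-operator set;
controlling the deep learning compiler to run the sub-operator set based on the running resources to obtain the target data.
In the embodiments of the present disclosure, when there are multiple first operators, a sub-operator set may first be constructed from all the first operators. For example, all the first operators may be gathered into one set, i.e. the sub-operator set, and the deep learning framework is then controlled to allocate running resources for the sub-operator set as a whole; the running resources may include, for example, GPU memory and streams. The deep learning compiler is then controlled to run the sub-operator set using the running resources allocated by the framework, obtaining the target data. In this way, the framework allocates running resources for all first operators at once instead of separately for each first operator, which improves the efficiency of resource allocation and thus further improves running efficiency.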
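A minimal sketch of the unified allocation, assuming running resources can be summarized as a memory size plus a stream id. `RunResources`, `StubFramework` and `run_sub_operator_set` are hypothetical names and the per-operator memory figures are invented for illustration; the point is that the framework's allocator is called once for the whole sub-operator set rather than once per first operator.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RunResources:
    """Hypothetical bundle of running resources (e.g. GPU memory and a stream)."""
    memory_bytes: int
    stream_id: int


class StubFramework:
    """Stand-in for the deep learning framework's resource allocator."""

    def __init__(self) -> None:
        self.alloc_calls = 0

    def allocate(self, memory_bytes: int) -> RunResources:
        self.alloc_calls += 1
        return RunResources(memory_bytes=memory_bytes, stream_id=self.alloc_calls)


def run_sub_operator_set(framework: StubFramework,
                         mem_needs: Dict[str, int]) -> List[Tuple[str, int]]:
    """Allocate once for the whole sub-operator set, then record each first
    operator as run by the (stubbed) compiler on the shared resources."""
    shared = framework.allocate(sum(mem_needs.values()))  # single allocation
    trace: List[Tuple[str, int]] = []
    for op in mem_needs:  # dicts keep insertion order, used as running order
        trace.append((op, shared.stream_id))  # op runs on the shared stream
    return trace


fw = StubFramework()
print(run_sub_operator_set(fw, {"a": 1024, "c": 2048}), fw.alloc_calls)
# [('a', 1), ('c', 1)] 1
```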
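In some embodiments of the present disclosure, constructing the sub-operator set based on all the first operators may specifically be:
dividing the first operators into at least one sub-operator set according to the input-output relationships between the first operators.
In the embodiments of the present disclosure, when the sub-operator set is constructed from all the first operators, the first operators may be divided into sub-operator sets according to their input-output relationships; there may be one or more sub-operator sets. It can be understood that the first operator set of the deep learning framework can be regarded as a computation graph, and correspondingly, a sub-operator set can be regarded as a sub-computation graph, i.e. a subgraph, of that computation graph. For example, take the computation graph of the deep learning framework shown in FIG. 2: the smaller circular nodes are variable nodes and the larger circular nodes are operator nodes, i.e. a, b, c, d, e, and f in FIG. 2 all denote operators, and the lines between variable nodes and operator nodes represent the input-output relationships between them. Assuming the operators supported by the deep learning compiler include a, b, c, and d, the first operators are a, b, c, and d, and the second operators are e and f. As can be seen from FIG. 2, there is an input-output relationship between operator a and operator c, and between operator b and operator d. Therefore, as shown in FIG. 3 and FIG. 4, operators b and d can be grouped into one sub-operator set, i.e. Union 1 in FIG. 3, which corresponds to subgraph 1 in FIG. 4; and operators a and c can be grouped into another sub-operator set, i.e. Union 2 in FIG. 3, which corresponds to subgraph 2 in FIG. 4.
Dividing sub-operator sets according to input-output relationships in this way gives each resulting sub-operator set clearer internal logic and makes it easier for the deep learning compiler to run, which can further improve running efficiency.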
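The "Union 1" and "Union 2" labels in FIG. 3 suggest a union-find style grouping, although the text does not name the algorithm. The sketch below assumes that approach: first operators joined by an input-output edge are merged into one sub-operator set, and edges touching a second operator are ignored. `partition_first_ops` is a hypothetical helper, and the edges to e and f are invented because FIG. 2 is not reproduced here. A production partitioner would also have to reject merges that create a cycle through a framework-only operator (for example a -> e -> c alongside a -> c); this sketch omits that check.

```python
from typing import Dict, Iterable, List, Set, Tuple


def partition_first_ops(first_ops: Iterable[str],
                        edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group first operators into sub-operator sets (subgraphs) using
    union-find over producer -> consumer edges between operators."""
    parent: Dict[str, str] = {op: op for op in first_ops}

    def find(op: str) -> str:
        # Follow parent links to the set representative, halving the path.
        while parent[op] != op:
            parent[op] = parent[parent[op]]
            op = parent[op]
        return op

    for src, dst in edges:
        # Only edges between two compiler-supported operators merge sets.
        if src in parent and dst in parent:
            root_src, root_dst = find(src), find(dst)
            if root_src != root_dst:
                parent[root_dst] = root_src

    groups: Dict[str, Set[str]] = {}
    for op in parent:
        groups.setdefault(find(op), set()).add(op)
    return list(groups.values())


# Hypothetical producer -> consumer edges standing in for FIG. 2.
edges = [("a", "c"), ("b", "d"), ("c", "e"), ("d", "f")]
subsets = partition_first_ops(["a", "b", "c", "d"], edges)
print(sorted(sorted(s) for s in subsets))  # [['a', 'c'], ['b', 'd']]
```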
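In some embodiments of the present disclosure, when there are multiple first operators, controlling the deep learning compiler to run the first operators to obtain the target data may specifically include:
in the running order of the first operators, controlling the deep learning framework to allocate running resources for the i-th first operator, where i ∈ [1, n-1] and n is the number of first operators;
controlling the deep learning compiler to run the i-th first operator based on the running resources of the i-th first operator;
controlling the deep learning framework to allocate running resources for the (i+1)-th first operator;
controlling the deep learning compiler to run the (i+1)-th first operator based on the running resources of the (i+1)-th first operator.
In the embodiments of the present disclosure, when there are multiple first operators, the deep learning compiler may also run them by allocating running resources for each first operator in turn, in their running order, and running each one on its own resources. For example, in the running order of the first operators, the deep learning framework is controlled to allocate running resources, such as GPU memory and a stream, for the i-th first operator, and the compiler is controlled to run the i-th first operator on the resources the framework allocated to it. The framework is then controlled to allocate running resources for the (i+1)-th first operator, and the compiler runs the (i+1)-th first operator on those resources. This continues until the compiler runs the n-th first operator on the running resources of the n-th first operator.
Specifically, take first operators a, b, and c as an example and assume their running order is a, c, b. The deep learning framework is first controlled to allocate running resource a1 for operator a, and the compiler runs operator a on a1; next, the framework allocates running resource c1 for operator c, and the compiler runs operator c on c1; finally, the framework allocates running resource b1 for operator b, and the compiler runs operator b on b1.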
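The per-operator scheme can be traced with a short sketch. Resource handles such as `a1` are plain labels standing in for GPU memory and a stream, and `run_in_order` is a hypothetical helper, not an API from the text:

```python
from typing import List, Tuple


def run_in_order(run_order: List[str]) -> List[Tuple[str, str]]:
    """Trace of the per-operator scheme: for the i-th first operator the
    framework allocates a resource, then the compiler runs the operator on it."""
    trace: List[Tuple[str, str]] = []
    for op in run_order:
        resource = f"{op}1"  # hypothetical handle (e.g. memory block + stream)
        trace.append(("framework allocates", resource))
        trace.append(("compiler runs", f"{op} on {resource}"))
    return trace


for step in run_in_order(["a", "c", "b"]):
    print(step)
```

With the running order a, c, b from the example, allocation and execution alternate: a1 then a, c1 then c, b1 then b.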
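It can be understood that each time the deep learning framework is controlled to allocate running resources for the first operators, i may start from 1. When there are at least two first operators with no explicit running order between them, running resources may be allocated for them simultaneously, and the compiler is controlled to run them on their respective running resources; alternatively, running resources may be allocated to them one after another in an arbitrary order, and they may be run one after another in an arbitrary order.
In this way, running resources can also be allocated to each first operator in turn and each first operator run in turn, providing another way of running the first operators through the deep learning compiler.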
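One way to find first operators with "no explicit running order" between them (an assumption; the text does not prescribe a method) is to group them into topological levels with Kahn's algorithm. Two operators in the same level have no dependency path between them, so the framework could allocate for them simultaneously, while the levels themselves run in order. `execution_levels` is a hypothetical helper, and the edges reuse the a-c and b-d relations of FIG. 2.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def execution_levels(ops: Iterable[str],
                     edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Kahn's algorithm grouped by level: operators in the same level have
    no dependency path between them, so they have no explicit running order."""
    op_list = list(ops)
    indegree: Dict[str, int] = {op: 0 for op in op_list}
    successors: Dict[str, List[str]] = defaultdict(list)
    for src, dst in edges:
        if src in indegree and dst in indegree:
            successors[src].append(dst)
            indegree[dst] += 1

    levels: List[List[str]] = []
    current = sorted(op for op in op_list if indegree[op] == 0)
    while current:
        levels.append(current)
        nxt: List[str] = []
        for op in current:
            for dst in successors[op]:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    nxt.append(dst)
        current = sorted(nxt)

    if sum(len(level) for level in levels) != len(op_list):
        raise ValueError("dependency cycle among first operators")
    return levels


for level in execution_levels(["a", "b", "c", "d"], [("a", "c"), ("b", "d")]):
    # Allocate for every operator in the level at once, then run them.
    print("allocate together:", level)
```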
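In some embodiments of the present disclosure, the operating method of the deep learning compiler may further include the following:
when the second operator set includes all operators in the first operator set, controlling the deep learning compiler to run all operators in the first operator set.
In the embodiments of the present disclosure, the deep learning compiler may already fully support the first operator set of the deep learning framework. Therefore, after the first operator set and the second operator set are acquired, if the second operator set includes all operators in the first operator set, i.e. the compiler supports all of the framework's operators, the compiler can be controlled to run all operators in the first operator set, i.e. all of the framework's operators run on the compiler. In other words, when the compiler has operators in one-to-one correspondence with the framework's operators, the framework can run entirely on the compiler. The framework can thus run on the compiler according to which operators each of them actually supports.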
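The choice between full and partial execution on the compiler reduces to a subset test on the two operator sets. A minimal sketch; `choose_execution` is hypothetical, and its last branch (no overlap at all) is an assumed fallback that the text does not describe:

```python
from typing import Set


def choose_execution(framework_ops: Set[str], compiler_ops: Set[str]) -> str:
    """Pick an execution plan from the first and second operator sets."""
    if framework_ops <= compiler_ops:
        # The second operator set contains every operator of the first set.
        return "compiler runs all operators"
    if framework_ops & compiler_ops:
        # Partial support: first operators on the compiler, second on the framework.
        return "compiler runs first operators, framework runs second operators"
    # Assumed fallback (not described in the text): no overlap at all.
    return "framework runs all operators"


print(choose_execution({"a", "b"}, {"a", "b", "c"}))
```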
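Based on the same inventive concept, the present disclosure further provides an operating apparatus for a deep learning compiler. As shown in FIG. 5, the operating apparatus 500 for a deep learning compiler may include:
an acquisition module 510, which may be configured to acquire a first operator set of a deep learning framework, the first operator set including a first operator and a second operator, and a second operator set of a deep learning compiler;
a first running module 520, which may be configured to, in response to the second operator set including the first operator, control the deep learning compiler to run the first operator to obtain target data;
a transmission module 530, which may be configured to transmit the target data to the deep learning framework;
a second running module 540, which may be configured to control the deep learning framework to run the second operator in the first operator set based on the target data.
In some embodiments of the present disclosure, when there are multiple first operators, the first running module 520 may include:
a construction unit, which may be configured to construct a sub-operator set based on all the first operators;
a first resource allocation unit, which may be configured to control the deep learning framework to allocate running resources for the sub-operator set;
a first running unit, which may be configured to control the deep learning compiler to run the sub-operator set based on the running resources to obtain the target data.
In some embodiments of the present disclosure, the construction unit may be specifically configured to:
divide the first operators into at least one sub-operator set according to the input-output relationships between the first operators.
In some embodiments of the present disclosure, when there are multiple first operators, the first running module 520 may include:
a second resource allocation unit, which may be configured to control, in the running order of the first operators, the deep learning framework to allocate running resources for the i-th first operator, where i ∈ [1, n-1] and n is the number of first operators;
a second running unit, which may be configured to control the deep learning compiler to run the i-th first operator;
a third resource allocation unit, which may be configured to control the deep learning framework to allocate running resources for the (i+1)-th first operator;
a third running unit, which may be configured to control the deep learning compiler to run the (i+1)-th first operator.
In some embodiments of the present disclosure, when there are multiple first operators and at least two of them can run independently of each other, the first running module 520 may include:
a fourth resource allocation unit, configured to control the deep learning framework to allocate running resources for the at least two first operators simultaneously;
a fourth running unit, configured to control the deep learning compiler to run the at least two first operators based on their respective running resources.
In some embodiments of the present disclosure, the operating apparatus 500 for a deep learning compiler may further include:
a third running module, which may be configured to, when the second operator set includes all operators in the first operator set, control the deep learning compiler to run all operators in the first operator set.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs operations has been described in detail in the embodiments of the method and will not be elaborated here.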
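According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.
FIG. 6 shows a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 may also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Multiple components of the device 600 are connected to the I/O interface 605, including: an input unit 606, such as a keyboard or a mouse; an output unit 607, such as various types of displays and speakers; a storage unit 608, such as a magnetic disk or an optical disc; and a communication unit 609, such as a network card, a modem, or a wireless communication transceiver. The communication unit 609 allows the device 600 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
The computing unit 601 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, and the like. The computing unit 601 performs the methods and processes described above, such as the operating method of the deep learning compiler. For example, in some embodiments, the operating method of the deep learning compiler may be implemented as a computer software program tangibly contained in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the operating method of the deep learning compiler described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the operating method of the deep learning compiler in any other appropriate manner (for example, by means of firmware).
Various implementations of the systems and techniques described above herein may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input apparatus, and at least one output apparatus, and transmit data and instructions to the storage system, the at least one input apparatus, and the at least one output apparatus.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display apparatus (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing apparatus (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of apparatuses may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual, auditory, or tactile feedback), and input from the user may be received in any form (including acoustic, voice, or tactile input).
The systems and techniques described herein may be implemented in a computing system that includes a back-end component (for example, as a data server), a computing system that includes a middleware component (for example, an application server), a computing system that includes a front-end component (for example, a user computer having a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), the Internet, and a blockchain network.
The computer system may include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that overcomes the defects of difficult management and weak business scalability found in traditional physical hosts and VPS ("Virtual Private Server", or "VPS" for short) services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above specific implementations do not constitute a limitation on the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the present disclosure shall fall within the protection scope of the present disclosure.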

Claims (15)

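  1. An operating method of a deep learning compiler, comprising:
    acquiring a first operator set of a deep learning framework, the first operator set comprising a first operator and a second operator, and a second operator set of a deep learning compiler;
    in response to the second operator set comprising the first operator, controlling the deep learning compiler to run the first operator to obtain target data;
    transmitting the target data to the deep learning framework;
    controlling the deep learning framework to run the second operator in the first operator set based on the target data.
  2. The method according to claim 1, wherein, when there are multiple first operators, the controlling the deep learning compiler to run the first operator to obtain target data comprises:
    constructing a sub-operator set based on all the first operators;
    controlling the deep learning framework to allocate running resources for the sub-operator set;
    controlling the deep learning compiler to run the sub-operator set based on the running resources to obtain the target data.
  3. The method according to claim 2, wherein the constructing a sub-operator set based on all the first operators comprises:
    dividing the first operators into at least one sub-operator set according to input-output relationships between the first operators.
  4. The method according to any one of claims 1 to 3, wherein, when there are multiple first operators, the controlling the deep learning compiler to run the first operator to obtain target data comprises:
    controlling, in a running order of the first operators, the deep learning framework to allocate running resources for an i-th first operator, where i ∈ [1, n-1] and n is the number of first operators;
    controlling the deep learning compiler to run the i-th first operator based on the running resources of the i-th first operator;
    controlling the deep learning framework to allocate running resources for an (i+1)-th first operator;
    controlling the deep learning compiler to run the (i+1)-th first operator based on the running resources of the (i+1)-th first operator.
  5. The method according to any one of claims 1 to 3, wherein there are multiple first operators and at least two first operators can run independently of each other, and the controlling the deep learning compiler to run the first operator to obtain target data comprises:
    controlling the deep learning framework to allocate running resources for the at least two first operators simultaneously;
    controlling the deep learning compiler to run the at least two first operators based on their respective running resources.
  6. The method according to any one of claims 1 to 5, further comprising:
    in response to the second operator set comprising all operators in the first operator set, controlling the deep learning compiler to run all operators in the first operator set.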
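  7. An operating apparatus for a deep learning compiler, comprising:
    an acquisition module, configured to acquire a first operator set of a deep learning framework, the first operator set comprising a first operator and a second operator, and a second operator set of a deep learning compiler;
    a first running module, configured to, in response to the second operator set comprising the first operator, control the deep learning compiler to run the first operator to obtain target data;
    a transmission module, configured to transmit the target data to the deep learning framework;
    a second running module, configured to control the deep learning framework to run the second operator in the first operator set based on the target data.
  8. The apparatus according to claim 7, wherein, when there are multiple first operators, the first running module comprises:
    a construction unit, configured to construct a sub-operator set based on all the first operators;
    a first resource allocation unit, configured to control the deep learning framework to allocate running resources for the sub-operator set;
    a first running unit, configured to control the deep learning compiler to run the sub-operator set based on the running resources to obtain the target data.
  9. The apparatus according to claim 8, wherein the construction unit is specifically configured to:
    divide the first operators into at least one sub-operator set according to input-output relationships between the first operators.
  10. The apparatus according to any one of claims 7 to 9, wherein, when there are multiple first operators, the first running module comprises:
    a second resource allocation unit, configured to control, in a running order of the first operators, the deep learning framework to allocate running resources for an i-th first operator, where i ∈ [1, n-1] and n is the number of first operators;
    a second running unit, configured to control the deep learning compiler to run the i-th first operator;
    a third resource allocation unit, configured to control the deep learning framework to allocate running resources for an (i+1)-th first operator;
    a third running unit, configured to control the deep learning compiler to run the (i+1)-th first operator.
  11. The apparatus according to any one of claims 7 to 9, wherein there are multiple first operators and at least two first operators can run independently of each other, and the first running module comprises:
    a fourth resource allocation unit, configured to control the deep learning framework to allocate running resources for the at least two first operators simultaneously;
    a fourth running unit, configured to control the deep learning compiler to run the at least two first operators based on their respective running resources.
  12. The apparatus according to any one of claims 7 to 11, further comprising:
    a third running module, configured to, in response to the second operator set comprising all operators in the first operator set, control the deep learning compiler to run all operators in the first operator set.
  13. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 6.
  14. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 6.
  15. A computer program product, comprising a computer program/instructions that, when executed by a processor, implement the steps of the method according to any one of claims 1 to 6.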
PCT/CN2022/128369 2022-05-19 2022-10-28 Operating method and apparatus for a deep learning compiler, and electronic device WO2023221406A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210573220.8A CN114924745A (zh) 2022-05-19 2022-05-19 Operating method and apparatus for a deep learning compiler, and electronic device
CN202210573220.8 2022-05-19

Publications (1)

Publication Number Publication Date
WO2023221406A1 (zh) 2023-11-23

Family

ID=82809802

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/128369 WO2023221406A1 (zh) 2022-05-19 2022-10-28 Operating method and apparatus for a deep learning compiler, and electronic device

Country Status (2)

Country Link
CN (1) CN114924745A (zh)
WO (1) WO2023221406A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
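CN114924745A (zh) * 2022-05-19 2022-08-19 Beijing Baidu Netcom Science and Technology Co., Ltd. Operating method and apparatus for a deep learning compiler, and electronic device
CN117009092B (zh) * 2023-10-07 2024-02-02 Zhejiang Lab Method and system for dynamic allocation of compilation-time resources based on multiple multi-armed bandits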

Citations (7)

Publication number Priority date Publication date Assignee Title
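CN110515626A (zh) * 2019-08-20 2019-11-29 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Code compilation method for a deep learning computing framework and related products
US20210056389A1 (en) * 2019-08-23 2021-02-25 Samsung Electronics Co., Ltd. Neural network computing method and system including the same
CN112598121A (zh) * 2020-12-21 2021-04-02 Beijing Times Minxin Technology Co., Ltd. Efficient operator optimization method for deep learning compilers
CN113342345A (zh) * 2021-05-17 2021-09-03 Beijing Baidu Netcom Science and Technology Co., Ltd. Operator fusion method and apparatus for a deep learning framework
CN113821208A (zh) * 2021-06-18 2021-12-21 Tsinghua University Compilation optimization method and system for deep learning operators
CN113918351A (zh) * 2021-12-08 2022-01-11 Zhejiang Lab Method and apparatus for adapting a deep learning framework to intra-card distributed training on AI accelerator cards
CN114924745A (zh) * 2022-05-19 2022-08-19 Beijing Baidu Netcom Science and Technology Co., Ltd. Operating method and apparatus for a deep learning compiler, and electronic device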

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
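CN104965687B (zh) * 2015-06-04 2017-12-08 Beijing Orient National Communication Science & Technology Co., Ltd. Big data processing method and apparatus based on instruction set generation
WO2020037475A1 (zh) * 2018-08-20 2020-02-27 Huawei Technologies Co., Ltd. Method and device for debugging an application program
CN109902819B (zh) * 2019-02-12 2023-04-18 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Neural network computing method and apparatus, mobile terminal, and storage medium
CN112947933A (zh) * 2021-02-24 2021-06-11 Shanghai SenseTime Intelligent Technology Co., Ltd. Operator execution method and apparatus, computer device, and storage medium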

Also Published As

Publication number Publication date
CN114924745A (zh) 2022-08-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22942430

Country of ref document: EP

Kind code of ref document: A1