CN112966824A - Deployment method and device of inference library and electronic equipment
- Publication number
- CN112966824A (application number CN202110119167.XA)
- Authority
- CN
- China
- Prior art keywords
- inference
- library
- deep learning
- format
- learning model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06N5/04: Inference or reasoning models (under G06N5/00, computing arrangements using knowledge-based models)
- G06F11/3604: Software analysis for verifying properties of programs (under G06F11/36, preventing errors by testing or debugging software)
- G06F16/252: Integrating or interfacing systems between a database management system and a front-end application (under G06F16/25, integrating or interfacing systems involving database management systems)
- G06F8/40: Transformation of program code (under G06F8/00, arrangements for software engineering)
- G06F8/61: Installation (under G06F8/60, software deployment)
- G06F9/547: Remote procedure calls [RPC]; web services (under G06F9/54, interprogram communication)
- G06N20/00: Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Stored Programmes (AREA)
Abstract
The disclosure provides a deployment method and device of an inference library and electronic equipment, and relates to the technical field of artificial intelligence, in particular to the field of deep learning. The specific implementation scheme is as follows: acquiring configuration information of the hardware environment where an inference engine is located, and generating an inference library request command based on the configuration information; executing the request command to generate an inference library request and sending the inference library request to a server; receiving a data packet, fed back by the server, of a target inference library matched with the hardware environment; and deploying the target inference library on the hardware environment according to the data packet. In this way, the data packet of the target inference library matched with the configuration information of the hardware environment where the inference engine is located can be obtained, and the target inference library can be deployed on the hardware environment according to the data packet. This ensures compatibility between the target inference library and the hardware environment where the inference engine is located, improves the reliability of deploying the target inference library, and avoids compiling source code to obtain the target inference library, so little time is consumed.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for deploying an inference library, an electronic device, a storage medium, and a computer program product.
Background
Currently, an inference engine is used to deploy a model to a hardware environment and to perform forward computation on the model through an inference library using the computing power of that hardware environment, so as to obtain the inference result of the model. However, in the related art, the inference library has to be obtained by compiling source code, which imposes high hardware requirements and takes a long time; in addition, the obtained inference library may be incompatible with the hardware environment.
Disclosure of Invention
A method, an apparatus, an electronic device, a storage medium, and a computer program product for deploying an inference library are provided.
According to a first aspect, there is provided a method for deploying an inference library, comprising: acquiring configuration information of a hardware environment where an inference engine is located, and generating an inference library request command based on the configuration information; executing the request command to generate an inference library request and sending the inference library request to a server; receiving a data packet, fed back by the server, of a target inference library matched with the hardware environment; and deploying the target inference library on the hardware environment according to the data packet.
According to a second aspect, there is provided an apparatus for deploying an inference library, comprising: an acquisition module, configured to acquire configuration information of a hardware environment where an inference engine is located and generate an inference library request command based on the configuration information; a request module, configured to execute the request command to generate an inference library request and send the inference library request to a server; a receiving module, configured to receive a data packet, fed back by the server, of a target inference library matched with the hardware environment; and a deployment module, configured to deploy the target inference library on the hardware environment according to the data packet.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of deploying an inference library according to the first aspect of the present disclosure.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of deploying an inference library according to the first aspect of the present disclosure.
According to a fifth aspect, there is provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of deploying an inference library of the first aspect of the disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flow diagram of a method of deployment of an inference library according to a first embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of converting the format of a deep learning model in a method for deploying an inference library according to a second embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of converting the format of a deep learning model in a method for deploying an inference library according to a third embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of converting the format of a deep learning model in a method for deploying an inference library according to a fourth embodiment of the present disclosure;
FIG. 5 is a block diagram of an inference library deployment apparatus according to a first embodiment of the present disclosure;
FIG. 6 is a block diagram of an inference library deployment apparatus according to a second embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device for implementing a method for deployment of an inference library of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
AI (Artificial Intelligence) is a technical science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence. At present, AI technology has the advantages of a high degree of automation, high accuracy and low cost, and is widely applied.
DL (Deep Learning) is a new research direction in the field of ML (Machine Learning). It learns the intrinsic regularities and representation levels of sample data, so that a machine can acquire an analytical learning capability like that of a human and can recognize data such as text, images and sounds; it is widely applied in speech and image recognition.
Fig. 1 is a flowchart illustrating a method for deploying an inference library according to a first embodiment of the present disclosure.
As shown in fig. 1, a method for deploying an inference library according to a first embodiment of the present disclosure includes:
s101, acquiring configuration information of a hardware environment where the inference engine is located, and generating an inference library request command based on the configuration information.
It should be noted that the execution subject of the deployment method of the inference library in the embodiments of the present disclosure may be an inference engine. Alternatively, the inference engine may take the form of an APP (Application).
In embodiments of the present disclosure, the inference engine may be pre-deployed on the hardware environment. Alternatively, the installation package of the inference engine may be a pip software package; for example, if the name of the inference engine is "referr_engine", the user may install the inference engine using the command pip install referr_engine.
Further, the configuration information of the hardware environment where the inference engine is located can be obtained, and the inference library request command is generated based on the configuration information.
It is understood that the configuration information of the hardware environment may differ between inference engines. For example, the operating system of the hardware environment of inference engine A is 32-bit and its memory is 2 GB, while the operating system of the hardware environment of inference engine B is 64-bit and its memory is 8 GB.
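As an illustration of how such configuration information might be gathered, the following Python sketch collects the operating system, word size and memory of the host. The platform and psutil calls are standard, but the dictionary field names and the use of psutil are assumptions of this sketch rather than details given in the disclosure.

```python
# Illustrative sketch of collecting hardware-environment configuration information.
# The keys (os, os_bits, cpu_arch, memory_gb) are hypothetical names, not names
# defined by the disclosure.
import platform
import psutil  # third-party package, assumed to be available in the environment

def collect_config_info() -> dict:
    """Gather basic configuration of the environment where the inference engine runs."""
    return {
        "os": platform.system(),                # e.g. "Linux"
        "os_bits": platform.architecture()[0],  # e.g. "64bit" or "32bit"
        "cpu_arch": platform.machine(),         # e.g. "x86_64"
        "memory_gb": round(psutil.virtual_memory().total / 2**30, 1),
    }
```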
It will be appreciated that the required inference library may differ for different hardware environment configuration information. In the embodiment of the disclosure, the inference library request command can be generated based on the configuration information, so that different configuration information can correspond to different inference library request commands, which provides greater flexibility.
Optionally, generating the inference library request command based on the configuration information may include pre-establishing a mapping relationship or a mapping table between configuration information and inference library request commands, and, after the configuration information is obtained, querying the mapping relationship or the mapping table to obtain the corresponding inference library request command. It should be noted that the mapping relationship or the mapping table may be set according to actual situations, and is not limited herein.
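A minimal sketch of such a mapping-table lookup is given below; the table entries and the command strings are invented for illustration and are not values prescribed by the disclosure.

```python
# Hypothetical mapping table from configuration information to an inference
# library request command; a real deployment would populate this table as needed.
REQUEST_COMMAND_TABLE = {
    ("Linux", "64bit", "x86_64"):  "request_lib --platform linux-x86_64",
    ("Linux", "64bit", "aarch64"): "request_lib --platform linux-aarch64",
    ("Windows", "32bit", "AMD64"): "request_lib --platform win32",
}

def build_request_command(config: dict) -> str:
    """Look up the inference library request command matching the configuration."""
    key = (config["os"], config["os_bits"], config["cpu_arch"])
    if key not in REQUEST_COMMAND_TABLE:
        raise ValueError(f"no inference library request command configured for {key}")
    return REQUEST_COMMAND_TABLE[key]
```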
And S102, executing the request command to generate an inference library request and sending the inference library request to the server.
In the embodiments of the present disclosure, the inference engine may execute the inference library request command to generate an inference library request and send the inference library request to the server. It will be appreciated that the inference library request carries the configuration information of the hardware environment where the inference engine resides.
In the embodiment of the present disclosure, the inference engine may establish a network connection with the server for data transmission; optionally, the network connection may be a mobile network, for example 3G, 4G or 5G.
Optionally, the server may be a cloud server.
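To illustrate S102, the following sketch sends an inference library request carrying the configuration information to the server over HTTP; the endpoint URL, JSON field names and the choice of the requests library are assumptions made only for this example.

```python
# Sketch of generating an inference library request that carries the
# configuration information and sending it to the server.
import requests

SERVER_URL = "https://example.com/inference-library/request"  # hypothetical endpoint

def send_inference_library_request(config: dict) -> bytes:
    """Send the request and return the data packet of the matched target inference library."""
    response = requests.post(SERVER_URL, json={"config": config}, timeout=60)
    response.raise_for_status()
    return response.content  # data packet fed back by the server
```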
And S103, receiving a data packet of the target inference library matched with the hardware environment and fed back by the server.
In the embodiment of the disclosure, the server may store the data packets of a plurality of inference libraries in advance, and different inference libraries may correspond to different data packets. After receiving the inference library request, the server can acquire, from the stored data packets, the data packet of the target inference library matched with the hardware environment according to the configuration information carried in the inference library request, and feed the data packet back to the inference engine. Accordingly, the inference engine can receive the data packet of the target inference library matched with the hardware environment, which is fed back by the server.
And S104, deploying a target inference library on the hardware environment according to the data packet.
It is to be understood that the inference engine may deploy the target inference library on the hardware environment according to the data packet. For example, the data packet may be stored in a storage space of the hardware, and the target inference library is then installed from the data packet in that storage space, thereby deploying the target inference library on the hardware environment.
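A possible realization of S104, assuming the data packet is a pip-installable wheel file, is sketched below; the storage path, file name and use of pip are illustrative assumptions, not requirements of the disclosure.

```python
# Sketch of deploying the target inference library from the received data packet:
# store the packet in the hardware's storage space, then install from it.
import pathlib
import subprocess
import sys

def deploy_inference_library(packet: bytes, storage_dir: str = "/opt/inference") -> None:
    storage = pathlib.Path(storage_dir)
    storage.mkdir(parents=True, exist_ok=True)
    wheel_path = storage / "target_inference_lib.whl"  # hypothetical file name
    wheel_path.write_bytes(packet)                     # store the data packet
    # Install the target inference library from the stored packet.
    subprocess.check_call([sys.executable, "-m", "pip", "install", str(wheel_path)])
```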
In summary, according to the deployment method of the inference library in the embodiment of the disclosure, the inference library request command may be generated based on the configuration information of the hardware environment where the inference engine is located, the request command is executed to generate the inference library request, which is sent to the server, the data packet of the target inference library matched with the hardware environment and fed back by the server is received, and the target inference library is deployed on the hardware environment according to the data packet. Therefore, the data packet of the target inference library matched with the configuration information of the hardware environment where the inference engine is located can be obtained, and the target inference library can be deployed on the hardware environment according to the data packet. This ensures the compatibility of the target inference library with the hardware environment where the inference engine is located, improves the reliability of deploying the target inference library, and avoids compiling source code to obtain the target inference library, so little time is consumed.
On the basis of any of the above embodiments, after the target inference library is deployed on the hardware environment according to the data packet in step S104, the method further includes converting the format of the deep learning model in response to the format of the inference engine not matching the format of the deep learning model.
It should be noted that, in the embodiment of the present disclosure, the format of the deep learning model may be set according to actual situations, and is not limited herein.
It can be understood that the deep learning model can be deployed in the hardware environment, and the format of the inference engine may not match the format of the deep learning model. In that case, the format of the deep learning model can be converted in response to the mismatch, so that it becomes a format matching the format of the inference engine; that is, the format of the converted deep learning model matches the format of the inference engine. This ensures compatibility between the converted deep learning model and the hardware environment where the inference engine is located, and improves the reliability of deploying the deep learning model.
Optionally, as shown in fig. 2, converting the format of the deep learning model may include:
s201, generating a model conversion program based on the first format information of the inference engine and the second format information of the deep learning model.
In the embodiment of the disclosure, the model conversion program can be generated based on the first format information of the inference engine and the second format information of the deep learning model, and the influence of the first format information of the inference engine and the second format information of the deep learning model on the model conversion program can be comprehensively considered, so that the model conversion program is more accurate.
In the embodiment of the present disclosure, the content, the type, and the like of the model conversion program are not limited. For example, the model conversion program may be written in a programming language such as Java or Python.
Optionally, generating the model conversion program based on the first format information of the inference engine and the second format information of the deep learning model may be implemented in either of the following two ways.
Mode 1, reading the first format information and the second format information from the configuration file, and writing the first format information and the second format information into a model conversion program template to generate a model conversion program.
In the embodiment of the disclosure, a configuration file may be preset, and the configuration file includes first format information of the inference engine and second format information of the deep learning model. It can be understood that the user can set the configuration file according to the actual situation.
Further, the first format information and the second format information may be read from the configuration file and written into the model conversion program template to generate the model conversion program. The model conversion program template is used for generating a model conversion program according to the first format information and the second format information, and can be preset according to actual conditions.
Therefore, the method can read the first format information and the second format information in the configuration file, and write the first format information and the second format information into the model conversion program template to obtain the model conversion program.
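A minimal sketch of Mode 1 is shown below, assuming a JSON configuration file and a string template; the configuration keys and the convert_model helper are hypothetical and only illustrate filling the format information into a model conversion program template.

```python
# Mode 1 sketch: read the first and second format information from a configuration
# file and write them into a model conversion program template.
import json
from string import Template

CONVERSION_TEMPLATE = Template(
    "from converter import convert_model  # hypothetical conversion helper\n"
    "convert_model(src_format='$second_format', dst_format='$first_format')\n"
)

def generate_conversion_program(config_path: str) -> str:
    with open(config_path) as f:
        cfg = json.load(f)
    return CONVERSION_TEMPLATE.substitute(
        first_format=cfg["inference_engine_format"],      # first format information
        second_format=cfg["deep_learning_model_format"],  # second format information
    )
```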
Mode 2, reading the storage locations of the first format information and the second format information from the configuration file, and writing the storage locations into the model conversion program template to generate the model conversion program.
In the embodiment of the disclosure, a configuration file may be preset, and the configuration file includes storage locations of the first format information of the inference engine and the second format information of the deep learning model. It can be understood that the user can set the configuration file according to the actual situation.
Further, the storage locations of the first format information and the second format information may be read from the configuration file and written into the model conversion program template to generate the model conversion program. The model conversion program template is used for generating a model conversion program according to the storage positions of the first format information and the second format information, and the model conversion program template can be preset according to actual conditions.
Optionally, after the storage location is written into the model conversion program template, the model conversion program template may obtain the first format information of the inference engine and the second format information of the deep learning model according to the storage location, and generate the model conversion program according to the first format information of the inference engine and the second format information of the deep learning model.
Therefore, the method can read the storage locations of the first format information and the second format information from the configuration file, and write the storage locations into the model conversion program template to obtain the model conversion program.
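Mode 2 can be sketched in the same style: only the storage locations are written into the template, and the generated program reads the actual format information from those locations when it runs. The file paths, JSON layout and convert_model helper are again assumptions of this sketch.

```python
# Mode 2 sketch: write the storage locations of the format information into the
# template; the generated program loads the format information at run time.
import json
from string import Template

LOCATION_TEMPLATE = Template(
    "import json\n"
    "from converter import convert_model  # hypothetical conversion helper\n"
    "first_format = json.load(open('$first_path'))['format']\n"
    "second_format = json.load(open('$second_path'))['format']\n"
    "convert_model(src_format=second_format, dst_format=first_format)\n"
)

def generate_conversion_program_by_location(config_path: str) -> str:
    with open(config_path) as f:
        cfg = json.load(f)
    return LOCATION_TEMPLATE.substitute(
        first_path=cfg["first_format_location"],    # where the first format information is stored
        second_path=cfg["second_format_location"],  # where the second format information is stored
    )
```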
And S202, executing a model conversion program to convert the format of the deep learning model.
Optionally, when the storage locations of the first format information and the second format information are read from the configuration file and written into the model conversion program template to generate the model conversion program, executing the model conversion program to convert the format of the deep learning model may include first obtaining the first format information of the inference engine and the second format information of the deep learning model according to those storage locations.
Therefore, the method can generate a model conversion program based on the first format information of the inference engine and the second format information of the deep learning model, and can convert the format of the deep learning model by executing the model conversion program.
On the basis of any of the above embodiments, as shown in fig. 3, converting the format of the deep learning model further includes:
s301, establishing verification data based on the model parameters of the deep learning model and the configuration information of the target inference library.
It will be appreciated that different deep learning models may correspond to different model parameters and different target inference libraries may correspond to different configuration information.
In the embodiment of the disclosure, the verification data can be constructed based on the model parameters of the deep learning model and the configuration information of the target inference library, and the influence of the model parameters of the deep learning model and the configuration information of the target inference library on the verification data can be comprehensively considered, so that the verification data is more accurate.
And S302, generating an executable file based on the verification data and executing the executable file, wherein the executable file is used for verifying the inference speed of the deep learning model under the target inference library.
In the embodiment of the disclosure, an executable file can be generated and executed based on the verification data, and the executable file can be used for verifying the inference speed of the deep learning model under the target inference library. It should be noted that an executable file refers to a file that can be loaded and executed by an operating system.
It should be noted that, in the embodiment of the present disclosure, the type of the executable file is not limited. For example, the executable file may be an exe, sys, com, etc. type file.
Therefore, the method can construct verification data based on the model parameters of the deep learning model and the configuration information of the target inference library, generate an executable file based on the verification data and execute the executable file, so that the inference speed of the deep learning model under the target inference library can be verified based on the executable file.
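The construction of verification data and the speed check can be illustrated as follows; the input_shape field, the predictor object and its run method are assumptions standing in for whatever model parameters and inference library API are actually used.

```python
# Sketch of building verification data from the model parameters and measuring
# the inference speed of the deep learning model under the target inference library.
import time
import numpy as np

def build_verification_data(model_params: dict, batch: int = 1) -> np.ndarray:
    shape = (batch, *model_params["input_shape"])   # e.g. (1, 3, 224, 224)
    return np.random.rand(*shape).astype("float32")

def measure_inference_speed(predictor, data: np.ndarray, repeats: int = 100) -> float:
    predictor.run(data)                              # hypothetical API; warm-up pass
    start = time.perf_counter()
    for _ in range(repeats):
        predictor.run(data)
    return (time.perf_counter() - start) / repeats   # average seconds per inference
```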
On the basis of any of the above embodiments, as shown in fig. 4, converting the format of the deep learning model further includes:
s401, acquiring a matched interface program template based on the model parameters of the deep learning model and the configuration information of the target inference library.
It will be appreciated that different deep learning models may correspond to different model parameters and different target inference libraries may correspond to different configuration information.
In the embodiment of the disclosure, the matched interface program template can be obtained based on the model parameters of the deep learning model and the configuration information of the target inference library, and the influence of the model parameters of the deep learning model and the configuration information of the target inference library on the interface program template can be comprehensively considered, so that the matching degree of the obtained interface program template is better.
Optionally, the obtaining of the matched interface program template based on the model parameters of the deep learning model and the configuration information of the target inference library may include pre-establishing a mapping relationship or a mapping table between the model parameters, the configuration information, and the interface program template, and after obtaining the model parameters and the configuration information, querying the mapping relationship or the mapping table to obtain the matched interface program template. It should be noted that the mapping relationship or the mapping table may be set according to actual situations, and is not limited herein.
In the embodiment of the present disclosure, the interface is not limited, and may be an Application Programming Interface (API), for example.
S402, writing the identification information of the deep learning model and the identification information of the target inference library into the matched interface program template to generate a matched calling interface program.
In the embodiment of the disclosure, corresponding identification information can be set for the deep learning model and the inference library in advance, so as to distinguish different deep learning models and inference libraries.
It should be noted that, in the embodiment of the present disclosure, the form of the identification information is not limited, for example, the identification information may be in the form of a text, a character string, and the like.
Furthermore, the identification information of the deep learning model and the identification information of the target inference library can be written into the matched interface program template to generate a matched calling interface program, and the influence of the identification information of the deep learning model and the identification information of the target inference library on the calling interface program can be comprehensively considered, so that the calling interface program is more accurate.
It is to be understood that after the matched calling interface program is generated, the calling interface program can be executed to convert the format of the deep learning model through the interface call.
Therefore, the method can acquire the matched interface program template based on the model parameters of the deep learning model and the configuration information of the target inference library, and write the identification information of the deep learning model and the identification information of the target inference library into the matched interface program template to generate the matched calling interface program.
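The interface program template lookup and the generation of the calling interface program might look like the sketch below; the table keys, template text and the load/run helpers of the target library are illustrative assumptions.

```python
# Sketch of selecting a matched interface program template and writing the
# identification information of the model and the target inference library into it.
from string import Template

INTERFACE_TEMPLATES = {
    # keyed by (model parameter signature, inference library configuration name)
    ("image_classification", "gpu_lib"): Template(
        "from target_lib import load, run  # hypothetical inference library API\n"
        "model = load(model_id='$model_id', library_id='$library_id')\n"
        "def infer(inputs):\n"
        "    return run(model, inputs)\n"
    ),
}

def generate_calling_interface(model_params: dict, lib_config: dict,
                               model_id: str, library_id: str) -> str:
    template = INTERFACE_TEMPLATES[(model_params["task"], lib_config["name"])]
    return template.substitute(model_id=model_id, library_id=library_id)
```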
Fig. 5 is a block diagram of a deployment apparatus of an inference library according to a first embodiment of the present disclosure.
As shown in fig. 5, the deployment apparatus 500 of the inference library of the embodiment of the present disclosure includes: an acquisition module 501, a request module 502, a receiving module 503, and a deployment module 504.
An obtaining module 501, configured to obtain configuration information of a hardware environment where an inference engine is located, and generate an inference library request command based on the configuration information;
a request module 502, configured to execute the request command to generate an inference library request, and send the inference library request to a server;
a receiving module 503, configured to receive a data packet of a target inference library matched with the hardware environment, where the data packet is fed back by the server;
a deployment module 504, configured to deploy the target inference library on the hardware environment according to the data packet.
In summary, the deployment apparatus of the inference library according to the embodiment of the present disclosure may generate the inference library request command based on the configuration information of the hardware environment where the inference engine is located, execute the request command to generate the inference library request, send the inference library request to the server, receive the data packet of the target inference library matched with the hardware environment and fed back by the server, and deploy the target inference library on the hardware environment according to the data packet. Therefore, the data packet of the target inference library matched with the configuration information of the hardware environment where the inference engine is located can be obtained, and the target inference library can be deployed on the hardware environment according to the data packet. This ensures the compatibility of the target inference library with the hardware environment where the inference engine is located, improves the reliability of deploying the target inference library, and avoids compiling source code to obtain the target inference library, so little time is consumed.
Fig. 6 is a block diagram of a deployment apparatus of an inference library according to a second embodiment of the present disclosure.
As shown in fig. 6, a deployment apparatus 600 of an inference library according to an embodiment of the present disclosure includes: an acquisition module 601, a request module 602, a receiving module 603, a deployment module 604, a conversion module 605, a first generating module 606 and a second generating module 607.
The acquiring module 601 and the acquiring module 501 have the same function and structure, the requesting module 602 and the requesting module 502 have the same function and structure, the receiving module 603 and the receiving module 503 have the same function and structure, and the deploying module 604 and the deploying module 504 have the same function and structure.
In an embodiment of the present disclosure, the deployment apparatus 600 of the inference library further includes a conversion module 605, configured to: in response to the format of the inference engine not matching the format of the deep learning model, convert the format of the deep learning model.
In an embodiment of the present disclosure, the conversion module 605 includes: a generating unit 6051 configured to generate a model conversion program based on the first format information of the inference engine and the second format information of the deep learning model; a conversion unit 6052, configured to execute the model conversion program to convert the format of the deep learning model.
In an embodiment of the present disclosure, the generating unit 6051 is specifically configured to: reading the first format information and the second format information from a configuration file; and writing the first format information and the second format information into a model conversion program template to generate the model conversion program.
In an embodiment of the present disclosure, the generating unit 6051 is specifically configured to: reading storage positions of the first format information and the second format information from a configuration file; writing the storage location into a model transformation program template to generate the model transformation program.
In an embodiment of the present disclosure, the deployment apparatus 600 of the inference library further includes a first generating module 606, configured to: establish verification data based on the model parameters of the deep learning model and the configuration information of the target inference library; and generate an executable file based on the verification data and execute the executable file, wherein the executable file is used for verifying the inference speed of the deep learning model under the target inference library.
In an embodiment of the present disclosure, the deployment apparatus 600 of the inference library further includes a second generating module 607, configured to: acquire a matched interface program template based on the model parameters of the deep learning model and the configuration information of the target inference library; and write the identification information of the deep learning model and the identification information of the target inference library into the matched interface program template to generate a matched calling interface program.
In summary, the deployment apparatus of the inference library according to the embodiment of the present disclosure may generate the inference library request command based on the configuration information of the hardware environment where the inference engine is located, execute the request command to generate the inference library request, send the inference library request to the server, receive the data packet of the target inference library matched with the hardware environment and fed back by the server, and deploy the target inference library on the hardware environment according to the data packet. Therefore, the data packet of the target inference library matched with the configuration information of the hardware environment where the inference engine is located can be obtained, and the target inference library can be deployed on the hardware environment according to the data packet. This ensures the compatibility of the target inference library with the hardware environment where the inference engine is located, improves the reliability of deploying the target inference library, and avoids compiling source code to obtain the target inference library, so little time is consumed.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the electronic device 700 includes a computing unit 701, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
A number of components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in the cloud computing service system that overcomes the defects of difficult management and weak service scalability in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
According to an embodiment of the present disclosure, there is also provided a computer program product including a computer program, where the computer program is executed by a processor to implement the deployment method of the inference library according to the above-mentioned embodiment of the present disclosure.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.
Claims (17)
1. A method of deployment of an inference library, comprising:
acquiring configuration information of a hardware environment where an inference engine is located, and generating an inference library request command based on the configuration information;
executing the request command to generate an inference library request and sending the inference library request to a server;
receiving a data packet of a target inference library matched with the hardware environment, which is fed back by the server;
deploying the target inference library on the hardware environment according to the data packet.
2. The inference library deployment method of claim 1, wherein after said deploying the target inference library on the hardware environment according to the data packet, the method further comprises:
in response to the inference engine's format not matching the deep learning model's format, converting the deep learning model's format.
3. The inference library deployment method of claim 2, wherein the converting a format of the deep learning model comprises:
generating a model conversion program based on the first format information of the inference engine and the second format information of the deep learning model;
and executing the model conversion program to convert the format of the deep learning model.
4. The deployment method of the inference library according to claim 3, wherein said generating a model conversion program based on the first format information of the inference engine and the second format information of the deep learning model comprises:
reading the first format information and the second format information from a configuration file;
and writing the first format information and the second format information into a model conversion program template to generate the model conversion program.
5. The deployment method of the inference library according to claim 3, wherein said generating a model conversion program based on the first format information of the inference engine and the second format information of the deep learning model comprises:
reading storage locations of the first format information and the second format information from a configuration file;
and writing the storage locations into a model conversion program template to generate the model conversion program.
6. The inference library deployment method of claim 2, further comprising:
establishing verification data based on the model parameters of the deep learning model and the configuration information of the target inference library;
generating an executable file based on the verification data and executing the executable file;
wherein the executable file is used for verifying the inference speed of the deep learning model under the target inference library.
7. The inference library deployment method of claim 2, further comprising:
acquiring a matched interface program template based on the model parameters of the deep learning model and the configuration information of the target inference library;
and writing the identification information of the deep learning model and the identification information of the target inference library into the matched interface program template to generate a matched calling interface program.
8. An apparatus for deploying an inference library, comprising:
an acquisition module, used for acquiring configuration information of a hardware environment where an inference engine is located and generating an inference library request command based on the configuration information;
the request module is used for executing the request command to generate an inference library request and sending the inference library request to the server;
the receiving module is used for receiving a data packet of a target inference library matched with the hardware environment, which is fed back by the server;
and the deployment module is used for deploying the target inference library on the hardware environment according to the data packet.
9. The inference library deployment apparatus of claim 8, further comprising: a conversion module configured to:
in response to the inference engine's format not matching the deep learning model's format, converting the deep learning model's format.
10. The inference library deployment apparatus of claim 9, wherein the conversion module comprises:
the generating unit is used for generating a model conversion program based on the first format information of the inference engine and the second format information of the deep learning model;
and the conversion unit is used for executing the model conversion program and converting the format of the deep learning model.
11. The inference library deployment apparatus according to claim 10, wherein the generating unit is specifically configured to:
reading the first format information and the second format information from a configuration file;
and writing the first format information and the second format information into a model conversion program template to generate the model conversion program.
12. The inference library deployment apparatus according to claim 10, wherein the generating unit is specifically configured to:
reading storage locations of the first format information and the second format information from a configuration file;
writing the storage locations into a model conversion program template to generate the model conversion program.
13. The inference library deployment apparatus according to claim 9, further comprising: a first generation module configured to:
establishing verification data based on the model parameters of the deep learning model and the configuration information of the target inference library;
generating an executable file based on the verification data and executing the executable file;
wherein the executable file is used for verifying the inference speed of the deep learning model under the target inference library.
14. The inference library deployment apparatus according to claim 9, further comprising: a second generation module configured to:
acquiring a matched interface program template based on the model parameters of the deep learning model and the configuration information of the target inference library;
and writing the identification information of the deep learning model and the identification information of the target inference library into the matched interface program template to generate a matched calling interface program.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of deploying an inference library of any of claims 1-7.
16. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of deploying an inference library of any of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements a method of deploying an inference library as claimed in any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110119167.XA CN112966824A (en) | 2021-01-28 | 2021-01-28 | Deployment method and device of inference library and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112966824A true CN112966824A (en) | 2021-06-15 |
Family
ID=76271953
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110119167.XA Pending CN112966824A (en) | 2021-01-28 | 2021-01-28 | Deployment method and device of inference library and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112966824A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113741863A (en) * | 2021-07-29 | 2021-12-03 | 南方电网深圳数字电网研究院有限公司 | Application program generation method based on algorithm model, electronic device and storage medium |
CN113988299A (en) * | 2021-09-27 | 2022-01-28 | 苏州浪潮智能科技有限公司 | Deployment method and system of inference server supporting multiple models and multiple chips and electronic equipment |
CN114881236A (en) * | 2022-06-02 | 2022-08-09 | 广联达科技股份有限公司 | Model reasoning system, method and equipment |
CN115269562A (en) * | 2022-09-26 | 2022-11-01 | 北京奥星贝斯科技有限公司 | Database management method and device, storage medium and electronic equipment |
CN116362336A (en) * | 2023-06-02 | 2023-06-30 | 之江实验室 | Model reasoning interaction method, electronic equipment and readable storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107480789A (en) * | 2017-08-07 | 2017-12-15 | 北京中星微电子有限公司 | The efficient conversion method and device of a kind of deep learning model |
US20190034784A1 (en) * | 2017-07-28 | 2019-01-31 | Beijing Deephi Intelligence Technology Co., Ltd. | Fixed-point training method for deep neural networks based on dynamic fixed-point conversion scheme |
CN111124713A (en) * | 2019-12-24 | 2020-05-08 | 北京安兔兔科技有限公司 | Equipment system function calling method and device, terminal equipment and storage medium |
CN111144571A (en) * | 2019-12-20 | 2020-05-12 | 深圳市金溢科技股份有限公司 | Deep learning reasoning operation method and middleware |
CN111967568A (en) * | 2020-06-29 | 2020-11-20 | 北京百度网讯科技有限公司 | Deep learning model adaptation method and device and electronic equipment |
CN112114892A (en) * | 2020-08-11 | 2020-12-22 | 北京奇艺世纪科技有限公司 | Deep learning model obtaining method, loading method and selecting method |
- 2021-01-28: CN application CN202110119167.XA filed; published as CN112966824A, status pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190034784A1 (en) * | 2017-07-28 | 2019-01-31 | Beijing Deephi Intelligence Technology Co., Ltd. | Fixed-point training method for deep neural networks based on dynamic fixed-point conversion scheme |
CN107480789A (en) * | 2017-08-07 | 2017-12-15 | 北京中星微电子有限公司 | The efficient conversion method and device of a kind of deep learning model |
CN111144571A (en) * | 2019-12-20 | 2020-05-12 | 深圳市金溢科技股份有限公司 | Deep learning reasoning operation method and middleware |
CN111124713A (en) * | 2019-12-24 | 2020-05-08 | 北京安兔兔科技有限公司 | Equipment system function calling method and device, terminal equipment and storage medium |
CN111967568A (en) * | 2020-06-29 | 2020-11-20 | 北京百度网讯科技有限公司 | Deep learning model adaptation method and device and electronic equipment |
CN112114892A (en) * | 2020-08-11 | 2020-12-22 | 北京奇艺世纪科技有限公司 | Deep learning model obtaining method, loading method and selecting method |
Non-Patent Citations (2)
Title |
---|
Yang Guosheng et al., "Data Fusion and Its Applications" (《数据融合及其应用》), Ordnance Industry Press, 28 February 2004 *
Chen Feilin et al., "Computer-Aided Art Design Course" (《计算机辅助艺术设计教程》), Jinan Publishing House, 31 May 2002 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113741863A (en) * | 2021-07-29 | 2021-12-03 | 南方电网深圳数字电网研究院有限公司 | Application program generation method based on algorithm model, electronic device and storage medium |
CN113988299A (en) * | 2021-09-27 | 2022-01-28 | 苏州浪潮智能科技有限公司 | Deployment method and system of inference server supporting multiple models and multiple chips and electronic equipment |
CN113988299B (en) * | 2021-09-27 | 2024-01-23 | 苏州浪潮智能科技有限公司 | Deployment method and system for reasoning server supporting multiple models and multiple chips and electronic equipment |
CN114881236A (en) * | 2022-06-02 | 2022-08-09 | 广联达科技股份有限公司 | Model reasoning system, method and equipment |
CN115269562A (en) * | 2022-09-26 | 2022-11-01 | 北京奥星贝斯科技有限公司 | Database management method and device, storage medium and electronic equipment |
CN115269562B (en) * | 2022-09-26 | 2023-02-28 | 北京奥星贝斯科技有限公司 | Database management method and device, storage medium and electronic equipment |
CN116362336A (en) * | 2023-06-02 | 2023-06-30 | 之江实验室 | Model reasoning interaction method, electronic equipment and readable storage medium |
CN116362336B (en) * | 2023-06-02 | 2023-08-22 | 之江实验室 | Model reasoning interaction method, electronic equipment and readable storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |