CN112015519A - Model online deployment method and device - Google Patents

Model online deployment method and device

Info

Publication number
CN112015519A
CN112015519A (application number CN202010886567.9A)
Authority
CN
China
Prior art keywords
model
container
alias
version number
tensorflow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010886567.9A
Other languages
Chinese (zh)
Inventor
李昌盛
禹平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Yincheng Network Technology Co Ltd
Original Assignee
Jiangsu Yincheng Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Yincheng Network Technology Co Ltd filed Critical Jiangsu Yincheng Network Technology Co Ltd
Priority to CN202010886567.9A
Publication of CN112015519A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 Hypervisors; Virtual machine monitors
    • G06F 9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/60 Software deployment
    • G06F 8/65 Updates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/70 Software maintenance or management
    • G06F 8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 Hypervisors; Virtual machine monitors
    • G06F 9/45558 Hypervisor-specific management and integration aspects
    • G06F 2009/45562 Creating, deleting, cloning virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 Hypervisors; Virtual machine monitors
    • G06F 9/45558 Hypervisor-specific management and integration aspects
    • G06F 2009/45595 Network integration; Enabling network access in virtual machine instances

Abstract

The invention provides a model online deployment method and device. Training data are received, and a model is trained based on TensorFlow according to the training data to obtain the trained model; the trained model is saved under a preset path and encapsulated, the model having a corresponding alias and version number; a corresponding container is generated based on the encapsulated model, the container calling the encapsulated model by version number and alias; and the trained model and the container are respectively deployed to the infrastructure of the Kubernetes-based PaaS online platform. Compared with the traditional mainstream approaches, algorithm engineers do not need to concern themselves with building the model service and can concentrate on the model itself; model updates and multi-model deployment are fast and efficient, making rapid launch and validation of more models possible; and, combined with Kubernetes, the scheduling and management of the model service is more dependable and adapts dynamically to changes in traffic, making the service more reliable.

Description

Model online deployment method and device
Technical Field
The invention relates to artificial intelligence technology, and in particular to a model online deployment method and device.
Background
AI services have become one of the most important dependencies of today's Internet companies and are a core technology for keeping companies growing fast and operating efficiently. The existing AI service pattern is mainly built on existing big data: a model is trained offline by an algorithm, and the results produced by the model are then served through an online service.
However, taking a model trained in an offline environment and deploying it online to achieve near-real-time inference has always been difficult in the industry. The current mainstream approaches are as follows:
1. Offline training + caching. Advantages: no online inference code is needed, the offline and online environments are decoupled, and online latency is low. Disadvantages: a large number of recommendation results must be stored, so the cache overhead is huge; user context information cannot be introduced; the cold-start problem cannot be solved; and accuracy and flexibility are limited.
2. A self-developed model training and serving platform. Advantages: it can be customized to the company's hardware environment and business requirements, balances model quality and efficiency, and can accommodate special model requirements. Disadvantages: the cost of implementation is too high, iteration cycles are long, large numbers of models cannot be deployed quickly, and tuning models and comparing them against one another is difficult.
3. Embedding generation + a lightweight online model. The industry commonly generates user and item Embedding data in the offline environment, stores it in a cache, and fits a lightweight model such as logistic regression to the optimization target online. This approach still has obvious drawbacks: the model training process is isolated, the model is simplistic, and comparison experiments still require a large amount of development work.
4. Deployment with PMML. Advantages: the trained model can be saved in an XML format, decoupling the model from the platform and acting as the medium connecting offline training with online deployment. Disadvantages: PMML's expressive power is insufficient for complex models, so occasional prediction errors occur; and the files generated for complex models are too large, making loading and use time-consuming.
The above approaches generally suffer from the following problems:
1. model update iteration is slow, and a large amount of time is spent writing server code;
2. hot updates of the service cannot be guaranteed, that is, completing a model update without interrupting the service;
3. in linking the offline-trained model with the online service, the model cannot be deployed to the online service quickly and effectively.
No existing technical solution solves all three problems at once.
Disclosure of Invention
The embodiments of the invention provide a model online deployment method and device that make model update iteration faster during online deployment without spending large amounts of time writing server code, that guarantee hot updates of the service so that a model update completes without interruption, and that deploy the offline-trained model to the online service quickly and effectively.
In a first aspect of the embodiments of the present invention, there is provided a model online deployment method, including:
receiving training data, and training a model based on TensorFlow according to the training data to obtain a trained model;
storing the trained model based on a preset path, and packaging the model, wherein the model has an alias and a version number corresponding to the model;
generating a corresponding container based on the encapsulated model, wherein the container calls the encapsulated model based on the version number and the alias;
and respectively deploying the trained model and the container to the infrastructure of the Kubernetes-based PaaS online platform.
Optionally, in a possible implementation manner of the first aspect, the receiving training data and training the model based on TensorFlow according to the training data to obtain the trained model includes:
saving the trained model as a pb file saved_model.pb and a variables folder, where the variables folder is not empty when the model contains Variable tensors during processing.
Optionally, in a possible implementation manner of the first aspect, the saving the trained model based on a preset path and packaging the model, where the model has an alias and a version number corresponding to the model includes:
generating a Docker environment of TensorFlow serving;
and building the model into an image and then encapsulating and processing it in the TensorFlow Serving Docker environment.
Optionally, in a possible implementation manner of the first aspect, the TensorFlow Serving provides two access modes, RESTful and gRPC.
Optionally, in a possible implementation manner of the first aspect, after the step of deploying the trained model and the container to the infrastructure of the Kubernetes-based PaaS platform, the method further includes:
receiving, by the PaaS platform, the data to be processed;
parsing, by the container, the data to obtain the version number and alias of the corresponding model;
calling, by the container, the encapsulated model based on the version number and alias, and processing the data to generate result data;
outputting, by the container, the result data via the PaaS platform.
In a second aspect of the embodiments of the present invention, there is provided a model online deployment apparatus, including:
the TensorFlow model training module is used for receiving training data, training the model based on the TensorFlow according to the training data and obtaining the trained model;
the packaging module is used for storing the trained model based on a preset path and packaging the model, wherein the model has an alias and a version number corresponding to the model;
a container generation module, configured to generate a corresponding container based on the encapsulated model, where the container calls the encapsulated model based on a version number and an alias;
and the online deployment module is used for respectively deploying the trained model and the container to the infrastructure of the Kubernetes-based PaaS online platform.
Optionally, in a possible implementation manner of the second aspect, the TensorFlow model training module is further configured to save the trained model as a pb file saved_model.pb and a variables folder.
Optionally, in a possible implementation manner of the second aspect, the encapsulation module further includes:
the first encapsulation submodule is used for generating a Docker environment of TensorFlow serving;
and the second encapsulation submodule is used for building the model into an image and then encapsulating and processing it in the TensorFlow Serving Docker environment.
Optionally, in a possible implementation manner of the second aspect, the apparatus further includes:
the receiving module, used by the PaaS platform to receive the data to be processed;
the processing module, used by the container to parse the data and obtain the version number and alias of the corresponding model;
the result generation module, used by the container to call the encapsulated model based on the version number and alias and process the data to generate result data;
and the output module, used by the container to output the result data via the PaaS platform.
In a third aspect of the embodiments of the present invention, a readable storage medium is provided, in which a computer program is stored; when executed by a processor, the computer program implements the method according to the first aspect of the present invention and its various possible designs.
The invention provides a model online deployment method and device with the following advantages:
the TensorFlow serving is provided with REST and GRPC interface services, a server code does not need to be written, and a plurality of models can be deployed simultaneously;
the TensorFlow serving can automatically detect and load a new model file, and can complete the hot update of the model under the condition that the service is not terminated;
the Kubernetes containerized deployment is convenient and efficient, can automatically perform boxing and self-repairing, automatically realize horizontal extension, automatically realize service discovery and load balancing, and also can realize automatic release and rollback, and a plurality of service shared memories (PV) can be realized through the Kubernetes, so that the rapid connection from the model training service to the ai online service is completed.
Drawings
FIG. 1 is a flow diagram of a first embodiment of the model online deployment method;
FIG. 2 is a diagram of the model connection structure used in the model online deployment method;
FIG. 3 is a diagram of the platform connection structure used in the model online deployment method;
FIG. 4 is a block diagram of a first embodiment of the model online deployment device.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
It should be understood that, in various embodiments of the present invention, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
It should be understood that in the present application, "comprising" and "having" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that, in the present invention, "a plurality" means two or more. "and/or" is merely an association describing an associated object, meaning that three relationships may exist, for example, and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "comprises A, B and C" and "comprises A, B, C" means that all three of A, B, C comprise, "comprises A, B or C" means that one of A, B, C comprises, "comprises A, B and/or C" means that any 1 or any 2 or 3 of A, B, C comprises.
It should be understood that in the present invention, "B corresponding to a", "a corresponds to B", or "B corresponds to a" means that B is associated with a, and B can be determined from a. Determining B from a does not mean determining B from a alone, but may be determined from a and/or other information. And the matching of A and B means that the similarity of A and B is greater than or equal to a preset threshold value.
As used herein, "if" may be interpreted as "at … …" or "when … …" or "in response to a determination" or "in response to a detection", depending on the context.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
The invention relates to the field of artificial intelligence; terms in the related fields are explained as follows:
TensorFlow Serving: an open-source software library and serving system for machine learning models;
Kubernetes: an open-source system for managing containerized applications across multiple hosts in a cloud platform. It aims to make deploying containerized applications simple, efficient and powerful, and provides mechanisms for deploying, planning, updating and maintaining applications;
PAAS: Platform as a Service (PaaS), a business model in which the running and development environment of an application service is provided as a service;
docker: docker is an open source application container engine, so that developers can pack their applications and dependency packages into a portable image, and then distribute the image to any popular Linux or Windows machine, and also realize virtualization. The containers are all using sandbox mechanism without any interface between them
File: the file_get_contents() function is the preferred method for reading the contents of a file into a string, but it is not recommended for reading large files; curl and the like can be considered instead;
Pod: the smallest deployable unit in Kubernetes; a Pod encapsulates one or more containers that share storage and network resources.
The present invention provides a model online deployment method, as shown in FIG. 1, including:
and S10, receiving training data, and training the model based on TensorFlow according to the training data to obtain the trained model.
In step S10, the method includes:
saving the trained model as a pb file saved_model.pb and a variables folder, where the variables folder is not empty when the model contains Variable tensors during processing.
Step S20: saving the trained model based on a preset path, and packaging the model, wherein the model has a corresponding alias and version number.
In step S20, the method includes:
generating a Docker environment for TensorFlow Serving;
and building the model into an image and then encapsulating and processing it in the TensorFlow Serving Docker environment. The TensorFlow Serving provides two access modes, RESTful and gRPC.
Step S30: generating a corresponding container based on the packaged model, wherein the container calls the packaged model based on the version number and the alias.
For steps S10 to S30, one embodiment, shown in FIG. 2, is as follows:
First, TensorFlow is used to train and construct a model meeting the user's requirements. After model training is completed, the model is saved with the SavedModelBuilder method under tf.saved_model to a path such as models/liner/1/, where liner is the model name and 1 is the version number (by default the model with the largest version number is loaded). Input and output parameters are specified for the model with the build_signature_def method; the signature_def encapsulates the information of the input and output tensors respectively and gives them custom aliases, which are used when the service is requested. The model is finally saved as a pb file saved_model.pb together with a variables folder.
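As an illustration of this step, a minimal sketch under TensorFlow 1.x is given below; the tensor shapes, the aliases param1/outputs, the signature name predict and the models/liner/1/ export path are illustrative choices, not requirements of the method:

    import tensorflow as tf  # assumes the TensorFlow 1.x API used by SavedModelBuilder

    # A trivial linear model y = x*w + b; the Variable tensors w and b ensure
    # that the exported variables folder is not empty.
    x = tf.placeholder(tf.float32, shape=[None, 1], name='x')
    w = tf.Variable([[2.0]], name='w')
    b = tf.Variable([0.5], name='b')
    y = tf.add(tf.matmul(x, w), b, name='y')

    # The export directory encodes the model name (liner) and version number (1);
    # it must not already exist when saving.
    builder = tf.saved_model.builder.SavedModelBuilder('models/liner/1/')

    # build_signature_def wraps the input/output tensor info and assigns each
    # tensor a custom alias used later when the service is requested.
    signature = tf.saved_model.signature_def_utils.build_signature_def(
        inputs={'param1': tf.saved_model.utils.build_tensor_info(x)},
        outputs={'outputs': tf.saved_model.utils.build_tensor_info(y)},
        method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        builder.add_meta_graph_and_variables(
            sess, [tf.saved_model.tag_constants.SERVING],
            signature_def_map={'predict': signature})
    builder.save()  # writes models/liner/1/saved_model.pb and variables/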
Then, a Docker environment for TensorFlow Serving is prepared. The tensorflow/serving image includes the server code for handling client requests; the image provides two access modes, RESTful and gRPC, served through ports 8501 and 8500 respectively, and when the container is run, the ports inside the container must be bound to ports on the host. TensorFlow Serving loads model files from the /models directory, so an external volume needs to be mounted at /models and the model files copied into the specified directory. For example, when the mounted host directory is /dw/rec-models, the model files generated in the previous step are copied into it, and the corresponding in-container directory is /models/liner/1/. A minimal sketch of running the container this way follows.
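The snippet below uses the Docker SDK for Python (docker-py) instead of the docker command line; the SDK choice and the container name are assumptions of this sketch, while the ports, mount point and host directory follow the description above:

    import docker  # Docker SDK for Python (pip install docker); an assumed alternative to the docker CLI

    client = docker.from_env()
    client.containers.run(
        'tensorflow/serving',
        detach=True,
        name='rec-serving',  # hypothetical container name
        # bind the in-container REST (8501) and gRPC (8500) ports to the host
        ports={'8501/tcp': 8501, '8500/tcp': 8500},
        # mount the host model directory at /models, where TensorFlow Serving
        # looks for model files
        volumes={'/dw/rec-models': {'bind': '/models', 'mode': 'ro'}},
        environment={'MODEL_NAME': 'liner'})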
Step S40: respectively deploying the trained model and the container to the infrastructure of the Kubernetes-based PaaS online platform.
For step S40, one embodiment, shown in FIG. 3, is as follows:
FIG. 3 shows the infrastructure of the Kubernetes-based PaaS platform; the model service runs in Pods inside Nodes. The specific steps for constructing the service are as follows: create a project on the Kubernetes-based PaaS platform, add an application, and set the release mode, branch name and so on.
Edit the Dockerfile, select tensorflow/serving as the base image, add port mappings and a custom external domain name, mount an external volume and declare that it is mounted at the directory /models so that different model files can be loaded conveniently, specify the number of Pods, specify the environment variable MODEL_NAME, and finally build and release. A service call accesses the external domain name + /v1/models/liner:predict; whether the service runs normally can be tested by posting parameters to the address with curl -d ... -X POST. The input and output are JSON data: the input parameters take the form {"inputs": {"param1": input1, "param2": input2}} and the output result takes the form {"outputs": output1} (a request sketch in Python follows). When the model needs to be updated, a folder named 2 (or a larger number, marking the model's version number) is created under the /dw/rec-models directory and the model files are copied into it; TensorFlow Serving automatically detects the change in the model directory, automatically loads the model whose version number is currently the largest (the newest version) and makes it available externally, thereby replacing the old version.
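The same smoke test can be sketched in Python with the requests library; the external domain name is hypothetical, and the alias param1 is the one assigned in the signature_def at export time:

    import requests

    # hypothetical external domain name configured at release time
    url = 'http://rec-models.example.com/v1/models/liner:predict'
    payload = {'inputs': {'param1': [[1.0]]}}  # keys are the signature aliases
    resp = requests.post(url, json=payload)
    print(resp.json())  # e.g. {'outputs': [[2.5]]} for the linear model above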
If multiple models need to be deployed, a multi-model configuration file (e.g., models.config) must be created, specifying each model's name, base_path and model_platform values, and this configuration file is passed as model_config_file when the container is started (a sketch follows). If model services of several versions need to run simultaneously, a model_version_policy: {all: {}} parameter is added for the model. When the volume of online service requests changes, Kubernetes dynamically adjusts the number of Pods according to the current access conditions: when traffic surges, more Pods are used, and when traffic falls, some Pod resources are released. When a failure makes it difficult for the online service to serve externally, Kubernetes quickly constructs a new Pod to provide the service, preventing the service from failing.
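A minimal sketch of such a multi-model configuration file in TensorFlow Serving's text-protobuf format; the second entry (ranker) is purely hypothetical:

    model_config_list {
      config {
        name: 'liner'
        base_path: '/models/liner'
        model_platform: 'tensorflow'
        model_version_policy { all {} }   # serve all versions simultaneously
      }
      config {
        name: 'ranker'                    # hypothetical second model
        base_path: '/models/ranker'
        model_platform: 'tensorflow'
      }
    }

The file is then passed to the server at container startup, e.g. via the --model_config_file flag of tensorflow_model_server.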
This model online deployment method uses Kubernetes containerization and TensorFlow Serving to construct an AI service platform: all AI services and model training services are containerized and deployed into the Kubernetes platform, and the link from model training to online service is completed through shared disk storage, completing the construction of the AI service platform.
After step S40, the method further includes:
step S50, receiving and processing data by the Paas platform;
step S60, the container processes the processing data to obtain the version number and the alias corresponding to the corresponding model;
step S70, the container processes the processing data to generate result data based on the version number corresponding to the model and the model after the alias calling and packaging;
step S80, the container outputs the result data based on Paas platform.
In steps S50 to S80, while handling the to-be-processed data input from the terminal, the platform selects the model according to the model's version number and alias, inputs the data into the selected model to obtain the corresponding result data, and outputs the result data to the terminal through the PaaS platform.
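As a sketch of how a request can pin both the version number and the alias described in these steps: TensorFlow Serving's REST API accepts a version segment in the URL and a signature_name in the request body. The domain name and parameter values here are hypothetical:

    import requests

    # hypothetical domain; explicitly request version 2 of the model
    url = 'http://rec-models.example.com/v1/models/liner/versions/2:predict'
    payload = {
        'signature_name': 'predict',  # the alias bundle assigned at export time
        'inputs': {'param1': [[1.0]], 'param2': [[3.0]]},
    }
    print(requests.post(url, json=payload).json())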
The embodiment of the present invention further provides a model online deployment apparatus, as shown in FIG. 4, including:
the TensorFlow model training module is used for receiving training data, training the model based on the TensorFlow according to the training data and obtaining the trained model;
the packaging module is used for storing the trained model based on a preset path and packaging the model, wherein the model has an alias and a version number corresponding to the model;
a container generation module, configured to generate a corresponding container based on the encapsulated model, where the container calls the encapsulated model based on a version number and an alias;
and the online deployment module is used for respectively deploying the trained model and the container to the infrastructure of the Kubernetes-based PaaS online platform.
Further, the TensorFlow model training module is also used to save the trained model as a pb file saved_model.pb and a variables folder.
Further, the package module further includes:
the first encapsulation submodule is used for generating a Docker environment of TensorFlow serving;
and the second encapsulation submodule is used for building the model into an image and then encapsulating and processing it in the TensorFlow Serving Docker environment.
Further, the apparatus further comprises:
the receiving module, used by the PaaS platform to receive the data to be processed;
the processing module, used by the container to parse the data and obtain the version number and alias of the corresponding model;
the result generation module, used by the container to call the encapsulated model based on the version number and alias and process the data to generate result data;
and the output module, used by the container to output the result data via the PaaS platform.
The readable storage medium may be a computer storage medium or a communication medium. Communication media includes any medium that facilitates transfer of a computer program from one place to another. Computer storage media may be any available media that can be accessed by a general purpose or special purpose computer. For example, a readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application-Specific Integrated Circuit (ASIC). Additionally, the ASIC may reside in user equipment. Of course, the processor and the readable storage medium may also reside as discrete components in a communication device. The readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The present invention also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the device may read the execution instructions from the readable storage medium, and the execution of the execution instructions by the at least one processor causes the device to implement the methods provided by the various embodiments described above.
In the above embodiments of the terminal or the server, it should be understood that the Processor may be a Central Processing Unit (CPU), other general-purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A model online deployment method, comprising:
receiving training data, and training a model based on TensorFlow according to the training data to obtain a trained model;
storing the trained model based on a preset path, and packaging the model, wherein the model has an alias and a version number corresponding to the model;
generating a corresponding container based on the encapsulated model, wherein the container calls the encapsulated model based on the version number and the alias;
and respectively deploying the trained model and the container to the infrastructure of the Kubernetes-based PaaS online platform.
2. The model online deployment method of claim 1, wherein
receiving training data and training a model based on TensorFlow according to the training data to obtain the trained model comprises:
saving the trained model as a pb file saved_model.pb and a variables folder, where the variables folder is not empty when the model contains Variable tensors during processing.
3. The model online deployment method of claim 1, wherein
saving the trained model based on a preset path and packaging the model, the model having a corresponding alias and version number, comprises:
generating a Docker environment for TensorFlow Serving;
and building the model into an image and then encapsulating and processing it in the TensorFlow Serving Docker environment.
4. The model online deployment method of claim 2,
wherein the TensorFlow Serving provides two access modes, RESTful and gRPC.
5. The model online deployment method of claim 1, wherein
after the step of deploying the trained model and the container to the infrastructure of the Kubernetes-based PaaS platform, the method further comprises:
receiving, by the PaaS platform, the data to be processed;
parsing, by the container, the data to obtain the version number and alias of the corresponding model;
calling, by the container, the encapsulated model based on the version number and alias, and processing the data to generate result data;
outputting, by the container, the result data via the PaaS platform.
6. A model online deployment apparatus, comprising:
the TensorFlow model training module is used for receiving training data, training the model based on the TensorFlow according to the training data and obtaining the trained model;
the packaging module is used for storing the trained model based on a preset path and packaging the model, wherein the model has an alias and a version number corresponding to the model;
a container generation module, configured to generate a corresponding container based on the encapsulated model, where the container calls the encapsulated model based on a version number and an alias;
and the online deployment module is used for respectively deploying the trained model and the container to the infrastructure of the Kubernetes-based PaaS online platform.
7. The model online deployment device of claim 6, wherein
the TensorFlow model training module is further configured to save the trained model as a pb file saved_model.pb and a variables folder.
8. The model online deployment device of claim 6, wherein
the package module further includes:
the first encapsulation submodule is used for generating a Docker environment of TensorFlow serving;
and the second encapsulation submodule is used for building the model into an image and then encapsulating and processing it in the TensorFlow Serving Docker environment.
9. The model online deployment device of claim 6, wherein
the device further comprises:
the receiving module, used by the PaaS platform to receive the data to be processed;
the processing module, used by the container to parse the data and obtain the version number and alias of the corresponding model;
the result generation module, used by the container to call the encapsulated model based on the version number and alias and process the data to generate result data;
and the output module, used by the container to output the result data via the PaaS platform.
10. A readable storage medium, in which a computer program is stored which, when executed by a processor, implements the method of any one of claims 1 to 5.
CN202010886567.9A 2020-08-28 2020-08-28 Model online deployment method and device Pending CN112015519A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010886567.9A CN112015519A (en) 2020-08-28 2020-08-28 Model online deployment method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010886567.9A CN112015519A (en) 2020-08-28 2020-08-28 Model online deployment method and device

Publications (1)

Publication Number Publication Date
CN112015519A true CN112015519A (en) 2020-12-01

Family

ID=73504132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010886567.9A Pending CN112015519A (en) 2020-08-28 2020-08-28 Model online deployment method and device

Country Status (1)

Country Link
CN (1) CN112015519A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380022A (en) * 2020-12-09 2021-02-19 中国船舶工业系统工程研究院 Unmanned ship autonomous learning system, method and computer readable storage medium
CN112817635A (en) * 2021-01-29 2021-05-18 北京九章云极科技有限公司 Model processing method and data processing system
CN112905204A (en) * 2021-02-23 2021-06-04 杭州推啊网络科技有限公司 Updating method and system of Tensorflow model
CN113076998A (en) * 2021-04-01 2021-07-06 重庆邮电大学 Distributed classification method based on kubernets deep neural network model
CN113222174A (en) * 2021-04-23 2021-08-06 万翼科技有限公司 Model management method and device
WO2024041035A1 (en) * 2022-08-23 2024-02-29 网络通信与安全紫金山实验室 Machine learning model management method and device, model management platform, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178517A (en) * 2020-01-20 2020-05-19 上海依图网络科技有限公司 Model deployment method, system, chip, electronic device and medium
CN111324379A (en) * 2020-01-15 2020-06-23 携程旅游网络技术(上海)有限公司 Model deployment system based on general SOA service
CN111399853A (en) * 2020-02-20 2020-07-10 四川新网银行股份有限公司 Templated deployment method of machine learning model and custom operator
CN111432022A (en) * 2020-04-07 2020-07-17 深圳中兴网信科技有限公司 Model deployment method, server, and computer-readable storage medium
CN111459610A (en) * 2020-03-19 2020-07-28 网宿科技股份有限公司 Model deployment method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324379A (en) * 2020-01-15 2020-06-23 携程旅游网络技术(上海)有限公司 Model deployment system based on general SOA service
CN111178517A (en) * 2020-01-20 2020-05-19 上海依图网络科技有限公司 Model deployment method, system, chip, electronic device and medium
CN111399853A (en) * 2020-02-20 2020-07-10 四川新网银行股份有限公司 Templated deployment method of machine learning model and custom operator
CN111459610A (en) * 2020-03-19 2020-07-28 网宿科技股份有限公司 Model deployment method and device
CN111432022A (en) * 2020-04-07 2020-07-17 深圳中兴网信科技有限公司 Model deployment method, server, and computer-readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
IYACONTROL: "Deploy your machine learning model with tensorflow serving and kubernetes" (通过tensorflow serving 和 kubernetes部署你的机器学习模型), pages 1 - 6, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/136169396> *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380022A (en) * 2020-12-09 2021-02-19 中国船舶工业系统工程研究院 Unmanned ship autonomous learning system, method and computer readable storage medium
CN112817635A (en) * 2021-01-29 2021-05-18 北京九章云极科技有限公司 Model processing method and data processing system
CN112817635B (en) * 2021-01-29 2022-02-08 北京九章云极科技有限公司 Model processing method and data processing system
CN112905204A (en) * 2021-02-23 2021-06-04 杭州推啊网络科技有限公司 Updating method and system of Tensorflow model
CN112905204B (en) * 2021-02-23 2024-05-07 杭州推啊网络科技有限公司 Tensorflow model updating method and system
CN113076998A (en) * 2021-04-01 2021-07-06 重庆邮电大学 Distributed classification method based on kubernets deep neural network model
CN113222174A (en) * 2021-04-23 2021-08-06 万翼科技有限公司 Model management method and device
CN113222174B (en) * 2021-04-23 2024-04-26 万翼科技有限公司 Model management method and device
WO2024041035A1 (en) * 2022-08-23 2024-02-29 网络通信与安全紫金山实验室 Machine learning model management method and device, model management platform, and storage medium

Similar Documents

Publication Publication Date Title
CN112015519A (en) Model online deployment method and device
KR102414096B1 (en) Create and deploy packages for machine learning on end devices
CN107766126B (en) Container mirror image construction method, system and device and storage medium
CN104541247B (en) System and method for adjusting cloud computing system
US6622175B1 (en) System and method for communications in a distributed processing environment
CN109240900A (en) Block chain network service platform and its intelligent contract detection method, storage medium
KR20080059561A (en) Extensible mechanism for object composition
US10846102B2 (en) Loading dependency library files from a shared library repository in an application runtime environment
WO2022037612A1 (en) Method for providing application construction service, and application construction platform, application deployment method and system
US10949216B2 (en) Support for third-party kernel modules on host operating systems
CN111984269A (en) Method for providing application construction service and application construction platform
CN114881233B (en) Distributed model reasoning service method based on container
CN111984270A (en) Application deployment method and system
US11321090B2 (en) Serializing and/or deserializing programs with serializable state
CN111459610A (en) Model deployment method and device
CN115309562A (en) Operator calling system, operator generating method and electronic equipment
US11610155B2 (en) Data processing system and data processing method
CN115357369A (en) CRD application integration calling method and device in k8s container cloud platform
CN116880928B (en) Model deployment method, device, equipment and storage medium
CN116401014A (en) Service release method, device, storage medium and server
CN114860202A (en) Project operation method, device, server and storage medium
Zhang et al. Tinyedge: Enabling rapid edge system customization for iot applications
CN111078263A (en) Hot deployment method, system, server and storage medium based on Drools rule engine
CN117032834B (en) Value stream plug-in operation method, device, equipment and storage medium
US11829741B2 (en) Instantiated deployment of microservices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination