CN111488197A - Deep learning model deployment method and system based on cloud server - Google Patents

Deep learning model deployment method and system based on cloud server

Info

Publication number
CN111488197A
CN111488197A
Authority
CN
China
Prior art keywords
data
deep learning
server
target
learning model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010292272.9A
Other languages
Chinese (zh)
Inventor
张奎
陈清梁
王超
吴磊磊
蔡巍伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Xinzailing Technology Co ltd
Original Assignee
Zhejiang Xinzailing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Xinzailing Technology Co ltd filed Critical Zhejiang Xinzailing Technology Co ltd
Priority to CN202010292272.9A
Publication of CN111488197A
Legal status: Pending

Classifications

    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F16/172 Caching, prefetching or hoarding of files
    • G06F16/182 Distributed file systems
    • G06F9/547 Remote procedure calls [RPC]; Web services
    • G06N3/045 Combinations of networks
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a deep learning model deployment method and a deep learning model deployment system based on a cloud server, wherein the deep learning model deployment method comprises the following steps: a. acquiring input image data, and preprocessing the input image data; b. detecting target information in the preprocessed data information; c. b, post-processing the candidate detection result in the step b; d. carrying out target cutting on the detection result after post-processing; e. and extracting the clipped target data, and performing attribute prediction on the extracted target data according to a target attribute extraction model. According to the scheme of the invention, a memory-based file system such as tmpfs is used as a data cache module, so that the data cache pressure of a client is reduced, and the high-efficiency data reading capability of a server is ensured; by using a gRPC remote procedure call framework, data can be serialized into binary codes through protobuf, which can greatly reduce the data volume to be transmitted, thereby greatly improving the performance and facilitating the support of streaming communication.

Description

Deep learning model deployment method and system based on cloud server
Technical Field
The invention relates to the field of computer vision, in particular to a deep learning model deployment method and system based on a cloud server.
Background
Deep learning is a new research direction in the field of machine learning and has achieved many results in related fields such as image recognition, speech recognition and natural language processing. However, deep learning models are computationally complex and inefficient, while a typical production environment has clear performance targets as well as space requirements, such as limited memory and other resources.
A cloud server mainly provides a simple, safe, reliable and efficient computing service with a certain processing capacity, is mainly oriented to small and medium enterprises and users, and provides Internet-based infrastructure services. Deploying deep learning application models to cloud servers avoids the space constraints of embedded devices. Cloud storage servers operate as clusters, and a distributed file system jointly provides data storage and service access, which ensures data safety and saves storage space.
The prior art, for example application 201910765317.7, uses cloud servers for model deployment but does not fully utilize server resources and is inefficient; for example:
the opencv algorithm library is used for image encoding, decoding and preprocessing;
the static restful API communication mode is still used, whose performance is poor relative to gRPC;
the image data does not use a tmpfs temporary file system, so data throughput is low;
the TensorRT int8 inference model is not used, which would be faster at similar accuracy.
Disclosure of Invention
The invention aims to solve the problems and provides a deep learning model deployment method and a deep learning model deployment system based on a cloud server.
In order to achieve the above object, the present invention provides a deep learning model deployment method based on a cloud server, including the following steps:
a. acquiring input image data, and preprocessing the input image data;
b. detecting target information in the preprocessed data information;
c. b, post-processing the candidate detection result in the step b;
d. carrying out target cutting on the detection result after post-processing;
e. extracting the clipped target data, and performing attribute prediction on the extracted target data according to a target attribute extraction model.
According to an aspect of the invention, in the step a, the functions of decoding, scaling and cropping the image are realized by defining a computation graph, implemented using an nvidia-dali library that can accelerate computer vision deep learning applications and supports a custom data input form.
According to one aspect of the invention, in the step b, target information detection is performed on the received preprocessed image information using a resnet18 network, accelerated with TensorRT, with data calculation performed at int8 precision.
According to an aspect of the present invention, in the step c, the post-processing of the candidate detection result includes the following steps:
c1. generating a candidate detection frame;
c2. confidence filtering;
c3. non-maxima suppression.
According to an aspect of the present invention, the step d is implemented by using an nvidia-dali library capable of accelerating computer vision deep learning application, and comprises the following steps:
d1. acquiring a detection frame and a corresponding image;
d2. decoding target data corresponding to the detection frame;
d3. target image data scaling and other pre-processing.
According to an aspect of the invention, in the step e, for the target image data obtained in the step d, other attribute information is obtained through an attribute extraction model;
the networks used to extract attribute information are all accelerated with TensorRT and computed at int8 precision.
In order to achieve the above object, the present invention further provides a deployment system for implementing the deep learning model deployment method based on a cloud server, including:
the gRPC client, used for storing data to a tmpfs file system, calling the gRPC server, informing the gRPC server of the storage location of the data in the tmpfs file system, and acquiring the processing result returned by the gRPC server through an interface;
the gRPC server, which defines two interfaces, one for inputting data and the other for acquiring a processing result;
the tmpfs file system, a temporary file system allocated from RAM or the SWAP partition; when calling the gRPC server, the gRPC client stores memory-heavy data such as images into the tmpfs file system, passes the storage address of the data as a parameter of the call, and the gRPC server reads from that path when using the data.
According to one aspect of the invention, the gRPC server comprises two callable interfaces, one of which, PushData, comprises two parameters: the storage location of a batch of image data in the tmpfs file system, and the size of each image contained in the current batch;
the other callable interface, PullResult, acquires the processing result and returns it in a streaming mode.
According to the scheme of the invention, a memory-based file system such as tmpfs is used as a data cache module, so that the data cache pressure of a client is reduced, and the high-efficiency data reading capability of a server is ensured;
using the gRPC remote procedure call framework, data can be serialized into binary code through protobuf, greatly reducing the data volume to be transmitted, thereby greatly improving performance and conveniently supporting streaming communication;
the nvidia-dali algorithm library is used for preprocessing data by using a cpu/gpu mixed mode, so that the performance is improved;
and the model is computed with TensorRT int8 acceleration, improving model inference speed.
In actual measurement, a 4-class target detection model using resnet18 reaches a processing speed of 1300 fps on a Tesla T4; even with 2 resnet34 and 2 resnet18 target classification and attribute extraction modules added, a processing speed of 800 fps can be achieved. The scheme is convenient to deploy on different servers in Docker mode; in addition, the scheme can be adapted to different tasks by replacing the deep learning model and the preprocessing scheme.
Drawings
FIG. 1 schematically represents a flow diagram of a cloud server-based deep learning model deployment method in accordance with the present invention;
fig. 2 schematically shows a composition diagram of a deployment system implementing the deployment method according to the invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
In describing embodiments of the present invention, the terms "longitudinal," "lateral," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in an orientation or positional relationship that is based on the orientation or positional relationship shown in the associated drawings, which is for convenience and simplicity of description only, and does not indicate or imply that the referenced device or element must have a particular orientation, be constructed and operated in a particular orientation, and thus, the above-described terms should not be construed as limiting the present invention.
The present invention is described in detail below with reference to the drawings and the specific embodiments, which are not repeated herein, but the embodiments of the present invention are not limited to the following embodiments.
Fig. 1 schematically shows a flowchart of a cloud server-based deep learning model deployment method according to the present invention. As shown in fig. 1, the cloud server based deep learning model deployment method according to the present invention includes the following steps:
a. acquiring input image data, and preprocessing the input image data;
b. detecting target information in the preprocessed data information;
c. b, post-processing the candidate detection result in the step b;
d. carrying out target cutting on the detection result after post-processing;
e. and extracting the clipped target data, and performing attribute prediction on the extracted target data according to a target attribute extraction model.
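The five steps above can be sketched as a minimal plain-Python pipeline. All function bodies, box coordinates and attribute values below are illustrative stand-ins, not the patent's actual implementation:

```python
# Hypothetical end-to-end sketch of steps a through e with dummy data.

def preprocess(image):                       # step a: decode/scale/crop input
    return {"pixels": image, "size": (416, 416)}

def detect(pre):                             # step b: produce candidate detections
    return [{"box": (10, 10, 50, 80), "score": 0.9},
            {"box": (12, 11, 51, 79), "score": 0.4}]

def postprocess(candidates, threshold=0.5):  # step c: confidence filter (NMS omitted)
    return [c for c in candidates if c["score"] >= threshold]

def crop(image, detections):                 # step d: cut targets from the image
    return [{"target": image, "box": d["box"]} for d in detections]

def extract_attributes(targets):             # step e: per-target attribute prediction
    return [{"box": t["box"], "attributes": {"category": "person"}} for t in targets]

def run_pipeline(image):
    pre = preprocess(image)
    detections = postprocess(detect(pre))
    return extract_attributes(crop(image, detections))

results = run_pipeline("raw-image-bytes")
print(results)
```

In the real scheme each stage runs as a separate process communicating through queues, as described below.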
According to an embodiment of the present invention, in the step a, the preprocessing process is implemented using the nvidia-dali library, an execution engine highly optimized to accelerate computer vision deep learning applications. It supports a user-defined data input form and realizes decoding, scaling, cropping and similar functions by defining a computation graph.
According to an embodiment of the present invention, in the step b, taking human body detection in images as an example, the process defines the following functions:
acquiring data from an input queue;
decoding, and performing mixed processing by using cpu/gpu;
scaling, unifying to the resolution specified for detection, which in this scheme is 416 × 416, computed on the gpu;
normalization: the (0-255) image data is normalized using mean and variance, computed on the gpu.
The tensor data output by this process is stored directly on the gpu and can be used directly by the subsequent detection process, reducing the memory and time overhead of copying data from the cpu to the gpu.
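The normalization step listed above can be sketched in plain Python. The mean and standard deviation values below are illustrative assumptions; in the scheme itself this runs on the gpu via nvidia-dali:

```python
# Minimal sketch of the normalization step: (0-255) pixel values are mapped
# to roughly [-1, 1] using a mean and standard deviation. The values 127.5
# are illustrative, not taken from the patent.

def normalize(pixels, mean=127.5, std=127.5):
    """Normalize a row of 0-255 pixel values."""
    return [(p - mean) / std for p in pixels]

row = [0, 127.5, 255]
print(normalize(row))  # [-1.0, 0.0, 1.0]
```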
In this embodiment, acquiring the human body screenshot information means receiving the preprocessed RGB image information and performing human body detection. The detection method includes, but is not limited to, general-purpose detection networks such as YOLOv3, Mask R-CNN and resnet; here a resnet18 network is adopted for human body detection. All deep learning models are accelerated using TensorRT, and data calculations are performed at int8 precision.
(TensorRT is a neural network inference acceleration engine based mainly on CUDA and cuDNN. Models are originally FP32 precision; TensorRT supports FP16 and INT8 calculation, reaching an ideal trade-off by reducing the amount of computation while maintaining accuracy. It can also fuse common network structures such as convolution + BN layer + activation layer for computation, achieving further acceleration.)
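The int8 idea mentioned above can be illustrated with the basic symmetric quantization arithmetic such engines rely on. This is a conceptual sketch only; the scale value is made up, and TensorRT chooses scales by calibration:

```python
# Illustrative symmetric int8 quantization: FP32 values are mapped to the
# int8 range via a scale factor, and de-quantized after computation.
# The scale 0.1 is an arbitrary example, not a calibrated value.

def quantize(x, scale):
    q = round(x / scale)
    return max(-128, min(127, q))   # clamp to the int8 range

def dequantize(q, scale):
    return q * scale

scale = 0.1
x = 3.14
q = quantize(x, scale)              # 31
approx = dequantize(q, scale)       # close to the original fp32 value
print(q, approx)
```

The small reconstruction error is the accuracy cost traded for 8-bit arithmetic and memory traffic.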
According to an embodiment of the present invention, in the step c, the detection post-processing procedure further processes the candidate detection results obtained from the queue. This can be done on the cpu or the gpu; in this embodiment, since gpu resources are limited (Tesla T4 @ 16GB, 70W), processing is performed on the cpu. The method includes the following steps:
c1. generating a candidate detection frame;
c2. confidence filtering;
c3. non-maxima suppression.
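Steps c2 and c3 can be sketched in plain Python over boxes given as (x1, y1, x2, y2, score) tuples. The thresholds and sample boxes are illustrative, not the patent's values:

```python
# Confidence filtering (c2) followed by greedy non-maximum suppression (c3).

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2, ...) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, score_thr=0.5, iou_thr=0.45):
    boxes = [b for b in boxes if b[4] >= score_thr]   # c2: drop low-confidence boxes
    boxes.sort(key=lambda b: b[4], reverse=True)
    kept = []
    for b in boxes:                                    # c3: suppress heavy overlaps
        if all(iou(b, k) < iou_thr for k in kept):
            kept.append(b)
    return kept

candidates = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.8),
              (50, 50, 60, 60, 0.7), (0, 0, 5, 5, 0.3)]
print(nms(candidates))  # the 0.8 box overlaps the 0.9 box and is suppressed
```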
According to an embodiment of the present invention, in the step d, target clipping is implemented using the nvidia-dali library capable of accelerating computer vision deep learning applications. Target clipping cuts the target out of the original image according to the detection frame obtained by the detection post-processing, and performs preprocessing for the subsequent attribute calculation. This process also uses nvidia-dali, which supports decoding only the data within a specified detection frame, which is more efficient than decoding the complete image. The method includes the following steps:
d1. acquiring a detection frame and a corresponding image;
d2. decoding target data corresponding to the detection frame;
d3. scaling of the target image data, and other pre-processing.
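The cropping in step d can be sketched on a toy image represented as nested lists; this is illustrative only, since nvidia-dali performs ROI decoding on the compressed data directly, which is what makes it efficient:

```python
# Hypothetical sketch of step d: cut a target region out of an image
# according to a detection box (x1, y1, x2, y2). The nested-list "image"
# is a stand-in for real pixel data.

def crop_box(image, box):
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]

image = [[y * 10 + x for x in range(10)] for y in range(10)]
patch = crop_box(image, (2, 3, 5, 6))
print(patch)  # 3 rows x 3 columns taken from the original image
```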
According to an embodiment of the present invention, in the step e, the target data extraction is to obtain other attribute information from the target image data obtained in the target detection process by using an attribute extraction model. In the embodiment, 4 networks are used together to extract attribute information, and all networks are accelerated by tensorrt and calculated with int8 precision:
1) gender age model, using the resnet34 network;
2) a personnel category model, using the resnet34 network;
3) a clothing style model, using the resnet18 network;
4) clothing rating model, using the resnet18 network.
Fig. 2 schematically shows a composition diagram of a deployment system implementing the deployment method according to the invention. As shown in fig. 2, the deployment system according to the present invention includes a gRPC client, a gRPC server, and a tmpfs file system. In this embodiment, the role of gRPC in this scheme is:
defining an input/output interface of a server (namely, a service for processing data by a deep learning model) by using protobuf 3;
the server side starts the service and specifies the port number, and any client side can use the service through the port number and the interface of the server side.
The client is an application defined by the user. It saves data (including but not limited to image data) in the tmpfs file system, calls the server, tells the server the storage location of the data in the tmpfs file system, and acquires the processing result returned by the server through an interface.
The tmpfs file system is a temporary file system allocated from RAM or the SWAP partition. When the client calls the server, memory-heavy data such as images can be stored in this file system, and only the storage address of the data needs to be passed to the server as a parameter; when the server uses the data, it reads it from that path. This improves data throughput and thus the processing efficiency of the cloud server.
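The pass-a-path-instead-of-bytes idea can be sketched as follows. `/dev/shm` is used here merely as a commonly available tmpfs mount on Linux; the patent does not specify a mount point, and the fallback to a temp directory is an assumption for portability:

```python
# Sketch of tmpfs-based data passing: the client writes image bytes to a
# memory-backed file system and only the path string crosses the RPC.
import os
import tempfile

BASE = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()

def client_store(name, data):
    """Client side: write the data and return the path to send to the server."""
    path = os.path.join(BASE, name)
    with open(path, "wb") as f:   # stays in RAM when BASE is a tmpfs mount
        f.write(data)
    return path

def server_read(path):
    """Server side: read the data back from the shared path."""
    with open(path, "rb") as f:
        return f.read()

p = client_store("batch_0.jpg", b"\xff\xd8fake-jpeg-bytes")
print(server_read(p) == b"\xff\xd8fake-jpeg-bytes")
os.remove(p)
```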
The service end refers to a service which can be used by any client conforming to the interface predefined by protobuf.
The following description mainly concerns the server, which uses Python multiprocessing/multithreading for parallel processing, with data communicated through a Queue. The hardware configuration used by the server in this scheme is an Intel(R) Xeon(R) Gold 6151 CPU @ 3.00GHz, a Tesla T4 (16GB, 70W), and 32GB RAM.
The service defines two interfaces that can be called, and the scheme takes human body detection as an example:
the calling interface PushData, with two parameters: the storage location of a batch of image data in the tmpfs file system, and the size in bytes of each image contained in the current batch;
the interface PullResult for acquiring the processing result, which returns the processing result (as file name + detection result) in a streaming mode.
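A protobuf3 definition of these two interfaces might look like the following. Only the names PushData and PullResult come from the description; all message names, field names and field types are hypothetical reconstructions:

```proto
// Hypothetical reconstruction of the protobuf3 service interface.
syntax = "proto3";

message DataLocation {
  string tmpfs_path = 1;            // storage location of the batch in tmpfs
  repeated int64 image_sizes = 2;   // size in bytes of each image in the batch
}

message Ack {
  bool ok = 1;
}

message Result {
  string file_name = 1;             // returned as file name + detection result
  string detection = 2;
}

service DeployService {
  rpc PushData(DataLocation) returns (Ack);
  rpc PullResult(Ack) returns (stream Result);   // streaming response
}
```

The `stream` keyword on PullResult's response is what gives gRPC the streaming return mode described above.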
According to the scheme of the invention, the memory-based file system of tmpfs is used as a data cache module, so that the data cache pressure of a client is reduced, and the high-efficiency data reading capability of a server is ensured;
using the gRPC remote procedure call framework, data can be serialized into binary code through protobuf, greatly reducing the data volume to be transmitted, thereby greatly improving performance and conveniently supporting streaming communication;
the nvidia-dali algorithm library is used for preprocessing data by using a cpu/gpu mixed mode, so that the performance is improved;
and calculating an acceleration model by using tensorrt acceleration int8, and improving the model reasoning speed.
In actual measurement, a 4-class target detection model using resnet18 reaches a processing speed of 1300 fps on a Tesla T4; even with 2 resnet34 and 2 resnet18 target classification and attribute extraction modules added, a processing speed of 800 fps can be achieved. The scheme is convenient to deploy on different servers in Docker mode; in addition, the scheme can be adapted to different tasks by replacing the deep learning model and the preprocessing scheme.
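The Docker packaging mentioned above might be sketched as follows. The base image, file names and port number are all assumptions for illustration, not from the patent:

```dockerfile
# Hypothetical Dockerfile sketch for packaging the inference service.
FROM nvcr.io/nvidia/tensorrt:20.03-py3

# gRPC for the service interface; nvidia-dali and TensorRT Python bindings
# would also be needed (install steps omitted here).
RUN pip install grpcio

COPY server/ /app/
WORKDIR /app
EXPOSE 50051                       # assumed gRPC service port
CMD ["python3", "server.py"]
```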
The foregoing is merely exemplary of particular aspects of the present invention; devices and structures not specifically described herein are understood by those of ordinary skill in the art and may be implemented in conventional ways.
The above description is only one embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A deep learning model deployment method based on a cloud server comprises the following steps:
a. acquiring input image data, and preprocessing the input image data;
b. detecting target information in the preprocessed data information;
c. b, post-processing the candidate detection result in the step b;
d. carrying out target cutting on the detection result after post-processing;
e. and extracting the clipped target data, and performing attribute prediction on the extracted target data according to a target attribute extraction model.
2. The cloud server-based deep learning model deployment method according to claim 1, wherein in the step a, the functions of decoding, scaling and cropping the image are realized by defining a computation graph, implemented using an nvidia-dali library that can accelerate computer vision deep learning applications and supports a custom data input form.
3. The cloud server-based deep learning model deployment method according to claim 1, wherein in the step b, target information detection is performed on the received preprocessed image information using a resnet18 network, accelerated with TensorRT, with data calculation performed at int8 precision.
4. The cloud server-based deep learning model deployment method according to claim 1, wherein in the step c, the post-processing of the candidate detection results comprises the following steps:
c1. generating a candidate detection frame;
c2. confidence filtering;
c3. non-maxima suppression.
5. The cloud server-based deep learning model deployment method according to claim 4, wherein the step d is implemented using an nvidia-dali library capable of accelerating computer vision deep learning applications, and comprises the following steps:
d1. acquiring a detection frame and a corresponding image;
d2. decoding target data corresponding to the detection frame;
d3. target image data scaling and other pre-processing.
6. The cloud server-based deep learning model deployment method according to claim 4, wherein in the step e, other attribute information is obtained for the target image data obtained in the step d through an attribute extraction model;
the networks used to extract attribute information are all accelerated with TensorRT and computed at int8 precision.
7. A deployment system for implementing the cloud server-based deep learning model deployment method according to any one of claims 1 to 6, comprising:
the gRPC client, used for storing data to a tmpfs file system, calling the gRPC server, informing the gRPC server of the storage location of the data in the tmpfs file system, and acquiring the processing result returned by the gRPC server through an interface;
the gRPC server, which defines two interfaces, one for inputting data and the other for acquiring a processing result;
the tmpfs file system, a temporary file system allocated from RAM or the SWAP partition; when calling the gRPC server, the gRPC client stores memory-heavy data such as images into the tmpfs file system, passes the storage address of the data as a parameter of the call, and the gRPC server reads from that path when using the data.
8. The deployment system of claim 7, wherein the gRPC server includes two callable interfaces, one of which, PushData, comprises two parameters: the storage location of a batch of image data in the tmpfs file system, and the size of each image contained in the current batch;
the other callable interface, PullResult, acquires the processing result and returns it in a streaming mode.
CN202010292272.9A, filed 2020-04-14, priority date 2020-04-14: Deep learning model deployment method and system based on cloud server, CN111488197A (pending)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010292272.9A CN111488197A (en) 2020-04-14 2020-04-14 Deep learning model deployment method and system based on cloud server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010292272.9A CN111488197A (en) 2020-04-14 2020-04-14 Deep learning model deployment method and system based on cloud server

Publications (1)

Publication Number Publication Date
CN111488197A 2020-08-04

Family

ID=71794992

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010292272.9A Pending CN111488197A (en) 2020-04-14 2020-04-14 Deep learning model deployment method and system based on cloud server

Country Status (1)

Country Link
CN (1) CN111488197A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073929A (en) * 2016-11-15 2018-05-25 北京三星通信技术研究有限公司 Object detecting method and equipment based on dynamic visual sensor
CN108897614A (en) * 2018-05-25 2018-11-27 福建天晴数码有限公司 A kind of memory method for early warning and server-side based on convolutional neural networks
CN110458138A (en) * 2019-08-19 2019-11-15 浙江新再灵科技股份有限公司 Object detection method in vertical ladder based on Cloud Server
CN110738148A (en) * 2019-09-29 2020-01-31 浙江新再灵科技股份有限公司 Cloud target detection algorithm based on heterogeneous platform
CN110909591A (en) * 2019-09-29 2020-03-24 浙江大学 Self-adaptive non-maximum value inhibition processing method for pedestrian image detection by using coding vector

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113608729A (en) * 2021-08-18 2021-11-05 山东新一代信息产业技术研究院有限公司 Method for realizing client end deployment
CN113608729B (en) * 2021-08-18 2023-07-04 山东新一代信息产业技术研究院有限公司 Method for realizing deployment client
CN113792704A (en) * 2021-09-29 2021-12-14 山东新一代信息产业技术研究院有限公司 Cloud deployment method and device of face recognition model
CN113792704B (en) * 2021-09-29 2024-02-02 山东新一代信息产业技术研究院有限公司 Cloud deployment method and device of face recognition model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination