CN117950765A - Cloud deep learning model calling method, reasoning method and device based on RPC framework

Publication number: CN117950765A
Authority: CN (China)
Prior art keywords: message, model, image data, data, channels
Legal status: Pending (assumed; not a legal conclusion)
Application number: CN202410202174.XA
Original language: Chinese (zh)
Inventor: 刘骁哲
Assignee (original and current): Kyland Technology Co Ltd
Application filed by Kyland Technology Co Ltd; priority to CN202410202174.XA; publication CN117950765A, legal status pending.


Abstract

The application relates to a cloud deep learning model calling method based on an RPC framework, applied to a client, comprising: acquiring image data, where the image data comprises data of n channels and n is an integer greater than or equal to 2; assembling the image data into a message according to a message structure definition file, where an identification field in the message indicates that the message uses n fields, and the image data of the n channels is loaded into those n fields; and serializing the assembled message and sending it to a server, so as to call a deep learning model deployed on a cloud server to perform inference on the image data. A corresponding RPC-framework-based cloud deep learning model reasoning method, calling device and reasoning device are also provided. The application is suited to the remote-call inference process for RGB data.

Description

Cloud deep learning model calling method, reasoning method and device based on RPC framework
Technical Field
The application relates to remote-call technology and to the field of artificial intelligence, and in particular to a cloud deep learning model calling method, a reasoning method, a calling device and a reasoning device based on an RPC framework.
Background
Remote procedure call (Remote Procedure Call, RPC) refers to requesting a service from a remote computer program over a network; in particular, RPC allows a program to call a remote service as if it were a local call, without knowledge of the underlying network protocols. The RPC framework is therefore well suited to deploying a neural network algorithm model (referred to in this application simply as an inference model) on a server to provide an inference service, so that a Client can remotely call, via RPC, the inference service provided by the corresponding inference model. Based on RPC, Google developed gRPC, an open-source, general-purpose RPC framework.
At present, when the RPC framework is applied to image-based inference (since a video consists of frames of images, image processing in this application includes video processing unless otherwise specified), the data exchanged between client and server is mostly the matrix data corresponding to a picture, especially the matrix data of the three RGB channels. For image data interaction, the RPC framework has the following technical problems:
When an inference model is remotely invoked based on the RPC framework, the transmission of RGB three-channel image data is not optimized: the client transmits the image as one whole block of data to the server, without distinguishing the R, G, and B channels for transmission. After serialization in transit, the server finds it harder to interpret the received, deserialized data and to separate the three RGB channels, so the server-side inference model adapts poorly to the received image data, which in turn reduces the accuracy or efficiency of inference on the image data.
Therefore, how to provide an RPC-framework-based inference scheme adapted to the transmission of multi-channel image data such as the three RGB channels, so that the invoked server-side inference model can conveniently receive each channel's data, is a technical problem to be solved.
Disclosure of Invention
In view of the above problems in the prior art, the application provides a cloud deep learning model calling method, a reasoning method, a calling device and a reasoning device based on an RPC framework, which adapt to the transmission of multi-channel image data and make it convenient for the server-side inference model to receive each channel's data.
To achieve the above objective, a first aspect of the present application provides a cloud deep learning model calling method based on an RPC framework, applied to a client, comprising:
acquiring image data, wherein the image data comprises data of n channels, and n is an integer greater than or equal to 2;
assembling the image data into a message according to a message structure definition file, wherein an identification field in the message indicates that the message uses n fields, and the image data of the n channels is loaded into the n fields of the message;
serializing the assembled message and sending it to a server, so as to call a deep learning model deployed on a cloud server to perform inference on the image data.
In this way, the identification field in the message indicates that the transmitted message uses n fields, and the n fields are filled with the n channels of image data, so the n channels contained in the image data are each transmitted to the server in their own field. The server can then parse the multi-channel data accordingly and feed it to the corresponding multi-channel input of the inference model on the server.
As a possible implementation manner of the first aspect, the message further includes: a name identifier of the called inference model and a data input interface identifier of the called inference model;
the name identifier of the called inference model enables the server to determine, from it, the inference model to be called;
the data input interface identifier of the called inference model enables the server to determine, from it, the input interface through which the called inference model acquires data, so as to receive the data of the n channels.
From the above, the message structure can be flexibly defined according to the content required by the inference model to be called remotely. For example, if multiple inference models are provided on the server, the invoked inference model can be determined from the name identifier, and the interface through which that inference model acquires data can be determined from the interface identifier, so that the data can be passed to the inference model.
As a possible implementation manner of the first aspect, when the deep learning inference model to be invoked by the client has multiple functions, the message further includes: a function identifier of the called inference model, which enables the server to determine from it the inference function to be used by the called inference model.
From the above, if an inference model provided on the server offers several functions at the same time, the function of the inference model to be used can be determined from the function identifier.
As a possible implementation manner of the first aspect, the serialized messages are sent within a topic, and each message is assigned a unique key, so that the server can receive, in order, the serialized messages of the topic to which it subscribes.
This achieves ordered, stream-style transmission of the serialized messages and improves the accuracy of data transmission. Specifically, the Jafka protocol can be used to transmit the serialized messages as an ordered stream. Compared with existing RPC or gRPC, which transmit data over a long-lived link, Jafka enables real-time bidirectional data interaction between client and server; this real-time interaction lets the client obtain the server's response, such as an inference result, in real time, which speeds up access to the inference model on the server.
The second aspect of the application provides a cloud deep learning model reasoning method based on an RPC framework, which is applied to a server and comprises the following steps:
receiving a serialized message sent by a client and deserializing it;
parsing the deserialized message according to a message structure definition file, wherein the identification field of the parsed message indicates that the message uses n fields, and the n fields of the message are filled with image data of n channels;
inputting the image data of the n channels into the n channels of the deep learning inference model called by the client, so as to call the deep learning model to perform inference on the image data.
In this way, the multi-channel data contained in the image data is transmitted in separate fields, so the server can parse out the multi-channel data and feed it to the corresponding multi-channel input of the inference model. Because it suits the multi-channel inference models used for image inference, this RPC-framework-based cloud deep learning model reasoning method is adapted to the transmission of multi-channel images, such as RGB three-channel images.
As a possible implementation manner of the second aspect, the message further includes a name identifier of the called inference model and a data input interface identifier of the called inference model;
inputting the image data of the n channels into the n channels of the deep learning inference model called by the client, so as to call the deep learning model to perform inference on the image data, comprises:
determining the deep learning inference model to be called by the client according to the name identifier of the inference model in the message; and determining, according to the data input interface identifier of the called inference model, the input interface through which the deep learning inference model to be called acquires data;
inputting the image data of the n channels into the input interface of the deep learning inference model corresponding to the name identifier, and calling the deep learning model to perform inference on the image data.
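The dispatch described above can be sketched in Python. This is an illustrative toy, not the patent's implementation: the registry, the FakeModel class, and all field names (model_name, input_interface, channel_fields) are hypothetical stand-ins for the name identifier, data input interface identifier, and channel fields.

```python
class FakeModel:
    """Stand-in for a deployed inference model with named input interfaces."""
    def __init__(self, name):
        self.name = name

    def input_interface(self, interface_id):
        # returns a callable that would feed n channel arrays into the model
        return lambda channels: f"{self.name}:{interface_id}:{len(channels)}ch"

MODEL_REGISTRY = {"CNN1": FakeModel("CNN1")}  # name identifier -> model

def dispatch(message):
    """Route a parsed message: pick the model by name, the interface by id, feed channels."""
    model = MODEL_REGISTRY[message["model_name"]]              # by name identifier
    feed = model.input_interface(message["input_interface"])   # by interface identifier
    channels = [message[f] for f in message["channel_fields"]]
    return feed(channels)

msg = {"model_name": "CNN1", "input_interface": "img_in",
       "channel_fields": ["r_group", "g_group", "b_group"],
       "r_group": [[1]], "g_group": [[2]], "b_group": [[3]]}
print(dispatch(msg))  # CNN1:img_in:3ch
```

A real server would replace FakeModel with the loaded deep learning model and hand the channel matrices to its multi-channel input.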
As a possible implementation manner of the second aspect, when the deep learning inference model to be called by the client has multiple functions, the message further includes a function identifier of the called inference model, and determining the deep learning inference model to be called by the client according to the name identifier of the inference model in the message comprises:
determining, according to the name identifier of the inference model and the function identifier of the inference model in the message, the deep learning inference model to be called by the client and the inference function to be used.
As a possible implementation manner of the first or second aspect, the image data includes n-channel data including one of the following:
The image data comprises two-channel data, wherein the two-channel data comprises a gray scale channel and an A channel;
the image data comprises three channels of data, wherein the three channels comprise RGB channels or YUV channels;
The image data includes four-channel data including RGBA channels or YUVA channels;
wherein the A channel is a channel indicating transparency.
In this way, the value of n can be set adaptively according to the number of channels of the client's image data, so that the server can adaptively use the corresponding number of channel arrays.
The third aspect of the present application provides a cloud deep learning model calling device based on an RPC framework, which is applied to a client, and includes:
an acquisition unit configured to acquire image data, where the image data includes n channels of data, and n is an integer greater than or equal to 2;
An assembling unit, configured to assemble the image data into a message according to a message structure definition file, where an identification field in the message indicates that n fields are used in the message, and n fields of the message are loaded with n channels of image data;
And the sending unit is used for serializing the assembled message and sending the serialized message to the server so as to call a deep learning model deployed at the cloud server to infer the image data.
The fourth aspect of the present application provides a cloud deep learning model reasoning device based on an RPC framework, which is applied to a server, and includes:
a receiving unit, configured to receive the serialized message sent by the client and deserialize it;
a parsing unit, configured to parse the deserialized message according to a message structure definition file, where the identification field of the parsed message indicates that the message uses n fields, and the n fields of the message are loaded with image data of n channels;
a calling unit, configured to input the image data of the n channels into the n channels of the deep learning inference model called by the client, so as to call the deep learning model to perform inference on the image data.
The fifth aspect of the present application provides a reasoning method based on an RPC framework, comprising:
executing, by the client, the RPC-framework-based cloud deep learning model calling method of the first aspect, to send image data comprising data of n channels;
executing, by the server, the RPC-framework-based cloud deep learning model reasoning method of the second aspect, to obtain the n channels of data included in the image data and provide them to the n-channel input of an inference model on the server for inference by the inference model.
A sixth aspect of the present application provides an RPC framework based reasoning system, comprising:
the client, configured to execute the RPC-framework-based cloud deep learning model calling method of the first aspect, so as to send image data comprising data of n channels;
the server, configured to execute the RPC-framework-based cloud deep learning model reasoning method of the second aspect, so as to receive the n channels of data included in the image data and provide them to the n-channel input of an inference model on the server for inference by the inference model.
Drawings
Fig. 1 is a flowchart of a cloud deep learning model calling method based on an RPC framework according to a first embodiment of the present application;
Fig. 2 is a flowchart of an inference method of a cloud deep learning model based on an RPC framework according to a second embodiment of the present application;
FIG. 3 is a flow chart of a reasoning method based on the RPC framework provided by the fourth embodiment of the application;
fig. 4 is a schematic diagram of a cloud deep learning model calling device based on an RPC framework according to a fifth embodiment of the present application;
fig. 5 is a schematic diagram of a cloud deep learning model reasoning device based on an RPC framework according to a sixth embodiment of the present application;
Fig. 6 is a schematic diagram of an RPC framework based reasoning system provided by a seventh embodiment of the present application;
FIG. 7 is a schematic diagram of a computing device provided by an eighth embodiment of the application.
It should be understood that in the foregoing structural schematic diagrams, the sizes and forms of the respective block diagrams are for reference only and should not constitute an exclusive interpretation of the embodiments of the present invention. The relative positions and inclusion relationships between the blocks presented by the structural diagrams are merely illustrative of structural relationships between the blocks, and are not limiting of the physical connection of embodiments of the present invention.
Detailed Description
The technical scheme provided by the application is further described below by referring to the accompanying drawings and examples. It should be understood that the system structure and the service scenario provided in the embodiments of the present application are mainly for illustrating possible implementation manners of the technical solutions of the present application, and should not be interpreted as the only limitation to the technical solutions of the present application. As one of ordinary skill in the art can know, with the evolution of the system structure and the appearance of new service scenarios, the technical scheme provided by the application is applicable to similar technical problems.
It should be understood that the reasoning scheme based on the RPC framework provided by the embodiment of the present application includes a cloud deep learning model calling method, a reasoning method, a calling device, a reasoning device, a system, a computing device and a storage medium based on the RPC framework. Because the principles of solving the problems in these technical solutions are the same or similar, in the following description of the specific embodiments, some repetition is not described in detail, but it should be considered that these specific embodiments have mutual references and can be combined with each other.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. If there is a discrepancy, the meaning described in the present specification or the meaning obtained from the content described in the present specification is used. In addition, the terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application. For the purpose of accurately describing the technical content of the present application, and for the purpose of accurately understanding the present application, the following explanation or definition is given for terms used in the present specification before the explanation of the specific embodiments:
1) RPC (Remote Procedure Call): a technique for requesting a service from a remote computer program over a network without requiring knowledge of the underlying network protocols. gRPC is an open-source, general-purpose RPC framework developed by Google. Unless specifically stated otherwise, RPC in the present application includes gRPC.
2) Basic calling procedure of RPC: when a client wants to initiate a remote call, it calls the client-stub in the manner of a local call; the client-stub is responsible for assembling the called interface, method and parameters into a message body suitable for network transmission and sending it to the server; the remote server receives the message body and decodes it with the server-stub; the server-stub then calls the local service of the server according to the decoding result, receives the result returned by the local service, packages the result into a message, and sends it back to the client.
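The stub round trip above can be sketched as a minimal Python toy. This is only an illustration of the pattern, not gRPC: JSON stands in for the wire format, a direct function call stands in for the network, and all names (client_stub, server_receive, LOCAL_SERVICES) are hypothetical.

```python
import json

def client_stub(method, params):
    """Client-stub: assemble the call into a transmittable message body and 'send' it."""
    message = json.dumps({"method": method, "params": params}).encode()  # serialize
    reply = server_receive(message)          # stands in for the network transport
    return json.loads(reply.decode())["result"]

def server_receive(message):
    """Server-stub: decode the message body, call the local service, pack the result."""
    call = json.loads(message.decode())
    result = LOCAL_SERVICES[call["method"]](*call["params"])
    return json.dumps({"result": result}).encode()

LOCAL_SERVICES = {"add": lambda a, b: a + b}  # a local service on the server

print(client_stub("add", [2, 3]))  # the caller sees an ordinary local call; prints 5
```

The point of the pattern is that the application code on the client only ever sees the local-call interface of client_stub; assembly, serialization and transport stay hidden inside the stubs.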
3) Serialization and deserialization: converting an object into a binary stream is called serialization, and converting a binary stream back into an object is called deserialization. Taking the process of a client passing parameter values to a remote server as an example: in a local call, the client only needs to push the parameters onto the stack, and the called function reads them from the stack; but in a remote procedure call, the client and the server are different processes, and parameters cannot be passed through memory. The client must therefore convert the parameters into a binary byte stream (only binary data can be transmitted over the network), that is, serialize the parameters; after they reach the server, the server converts the byte stream back into a format it can read, that is, deserializes it.
4) Protobuf protocol: a platform- and language-independent message serialization protocol. Compared with traditional JSON and XML, it produces a smaller serialized size, but it is not self-describing and must be deserialized with the help of an additional .proto definition file.
5) Proto definition file (.proto): Protobuf uses the .proto file to predefine the message format (for example, to define the structure of a message); data packets are encoded and decoded according to the message format defined by the .proto file.
6) Jafka: an open-source, cross-language, distributed messaging system that can be used to transmit ordered message streams; in the present application it is used to transmit protobuf-serialized messages. The basic transmission process of Jafka is as follows:
At the sending end, the data is sent as messages within a topic using a Jafka producer, and each message is assigned a unique key so that the messages can be processed sequentially at the receiving end.
At the receiving end, a Jafka consumer receives the messages of the topic to which it subscribes, and because of the keys carried in the messages, they can be restored to the order in which they were sent.
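The keyed-ordering idea above can be sketched in plain Python. This is a stand-in for the producer/consumer behavior, not Jafka's actual API (which the patent does not show); the list, function names, and monotonically increasing integer keys are all hypothetical.

```python
def produce(stream, payload, key):
    """Producer side: each message sent within the topic carries a unique key."""
    stream.append({"key": key, "payload": payload})

def consume_in_order(stream):
    """Consumer side: the keys let the receiver restore the sending order."""
    return [m["payload"] for m in sorted(stream, key=lambda m: m["key"])]

topic_stream = []  # stands in for the message stream of one topic
produce(topic_stream, "frame-chunk-2", key=2)  # chunks appended out of order
produce(topic_stream, "frame-chunk-1", key=1)
produce(topic_stream, "frame-chunk-3", key=3)
print(consume_in_order(topic_stream))
# ['frame-chunk-1', 'frame-chunk-2', 'frame-chunk-3']
```

In a real deployment the stream spans processes and hosts; the key is what survives the transport and lets the consumer process serialized message chunks sequentially.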
7) JSON (JavaScript Object Notation) format: a general-purpose, lightweight data exchange format with a compact description; it uses a completely language-independent text format to store and represent data. The JSON format is easy for humans to read and write, and easy for machines to parse and generate.
In the RPC-framework-based reasoning scheme, the RPC framework comprises a client and a server that provides the inference service. The client can obtain an image, such as an image entered or imported by a user, or an image transmitted to the client from a third-party device. The client can call an inference model on the server based on the RPC framework; concretely, the client remotely calls the inference service of an inference model on the server to perform inference on the image and obtain an inference result. The inference is, for example, image recognition, key-frame selection from images, text generation based on an image, or speech generation from an image; the specific inference function depends on the function of the invoked model on the server.
The present application will be described in detail below with reference to the accompanying drawings.
The first embodiment of the application provides a cloud deep learning model calling method based on an RPC framework, which is applied to a client. As shown in the flow chart of fig. 1, the method comprises the steps of:
S11: image data is acquired, wherein the image data comprises n channels of data, and n is an integer greater than or equal to 2.
In some embodiments, the ways in which the client obtains the image data include: image data directly input or imported by the user; image data received by the client from a third-party device (such as a mobile phone, tablet, or computer in communication with the client); or an image generated by the client itself (such as an image captured by the client's own camera).
In some embodiments, the image data comprising n-channel data comprises at least one of:
1) The image data includes two-channel data, the two channels being a gray-scale channel and an A channel, or a luminance channel and an A channel. The A channel is the channel indicating transparency, also called the Alpha channel. In some embodiments, the data of the A channel may be a single value representing the overall transparency of the image. In other embodiments, the data of the A channel may be a separate value for each pixel of the image, i.e., each pixel has its own transparency value, so that different positions of the image can have different transparency.
2) The image data includes three-channel data. In some embodiments, the three channels may be the RGB channels or the YUV channels. The RGB channels are the three channels formed by the three primary colors (Red-Green-Blue), and the YUV channels are the three channels formed by the luminance component (Y) and the two chrominance components (U and V). Other three-channel formats include HSB (Hue-Saturation-Brightness) three-channel data, and so on.
3) The image data includes four-channel data, the four channels being the RGBA channels or the YUVA channels. Other four-channel formats include CMYK (subtractive color mode, i.e., print mode) channel data, etc.
The above is merely exemplary and does not limit the number of channels; for example, the four CMYK channels combined with the A channel form five channels of data.
S13: and assembling the image data into a message according to a message structure definition file, wherein an identification field in the message indicates that the message uses n fields, and n channels of image data are loaded in the n fields of the message.
In some embodiments, the content defined by the definition file of the message structure includes a first structure used for transmitting image data, the first structure including n first fields and an identification field. The identification field is used to distinguish the first structure: for example, when n is 3 the identification field value may be 1, and when n is 4 the identification field value may be 2; an identification field value of 0 may indicate that the first structure is not used, or that n is 1, for example a single channel corresponding to a gray-scale-only image, or ordinary single-channel data.
In some embodiments, the image data is three-channel RGB data, n has the value 3, and the three first fields included in the first structure defined by the definition file of the message structure are the r_group, g_group, and b_group fields. The identification field is a field named default, and a default value of 1 indicates that the first structure is used.
In some embodiments, the content defined by the definition file of the message structure may further include at least one of:
1) A name identifier of the called inference model, which enables the server to determine from it the inference model to be called. If multiple inference models are provided on the server, the server can determine which inference model is invoked from the name identifier in the message sent by the client; for example, name identifiers such as TensorFlow, CNN1, and the like may respectively identify different inference models on the server.
2) A function identifier of the called inference model, which enables the server to determine from it the inference function to be used. When an inference model provided on the server offers several functions at the same time, the server can determine which specific function of the inference model is called from the function identifier in the message sent by the client. For example, a TensorFlow model may simultaneously offer functions including: identifying a target frame of key information in an image (the target frame containing the image's key information), identifying the image category within the target frame (such as person, cat, dog, tree, etc.), generating text based on the category, and so on; each function corresponds to a function identifier, and the server determines which function of the inference model is called based on the function identifier in the message sent by the client. In some embodiments, the content defined by the definition file of the message structure may include multiple function identifiers, i.e., the message sent by the client contains several function identifiers representing several functions of the inference model to call, so as to obtain several inference results returned by the server.
3) A data input interface identifier of the called inference model, which enables the server to determine the input interface through which the called inference model acquires data, so as to receive the data of the n channels. The interface identifier indicates the interface through which the inference model on the server acquires data; using the data input interface identifier sent by the client, the server can pass the n channels of data to the inference model through the interface corresponding to the identifier.
The above is merely exemplary; the content included in the message structure can be flexibly defined as needed.
In some embodiments, the definition file of the message structure may be a protobuf structure definition file described using JSON; describing it with JSON allows the structure to be defined flexibly.
In some embodiments, assembling into a message specifically includes: writing the data of the n channels included in the image data into the n first fields of a message respectively, and assigning a value to the identification field in the message to indicate that the n first fields are used.
In some embodiments, when the image data is assembled into the message, if the image data is three-channel RGB data and the three first fields defined by the definition file of the message structure are the r_group, g_group, and b_group fields, then the R, G, and B channel data of the image are written into the r_group, g_group, and b_group fields respectively, and the default field is assigned the value 1. In other embodiments, when the data is regular (non-image) data, the default field is assigned the value 0.
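The assembly step can be sketched as follows. The field names r_group, g_group, b_group and default come from the embodiment above; the function, the dict-based message, and the data field for the non-image case are hypothetical illustration, not the patent's protobuf implementation.

```python
def assemble_message(channels):
    """Write each channel into its own field and set the identification field."""
    field_names = {3: ("r_group", "g_group", "b_group")}
    if len(channels) not in field_names:
        return {"default": 0, "data": channels}   # regular (non-image) data
    message = dict(zip(field_names[len(channels)], channels))
    message["default"] = 1                        # 1 => the three-field first structure is used
    return message

# a toy 2x2 RGB image given as three channel matrices
r = [[255, 0], [0, 255]]
g = [[0, 255], [0, 0]]
b = [[0, 0], [255, 0]]
msg = assemble_message([r, g, b])
print(msg["default"], sorted(k for k in msg if k != "default"))
# 1 ['b_group', 'g_group', 'r_group']
```

With a real protobuf definition file the same logic becomes setting the repeated channel fields and the default field of the generated message class before serialization.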
In some embodiments, when the definition file of the message structure further includes a field for the name identifier of the inference model, a field for the function identifier, a field for the data input interface identifier, etc., assembling the message further includes filling the corresponding information into the corresponding fields of the message structure.
S15: and serializing the assembled message and sending the serialized message to a server so as to call a deep learning model deployed at a cloud server to infer the image data.
In some embodiments, the serialized messages are sent within a topic and each message is assigned a unique key, so that the server can receive, in order, the serialized messages of the topic to which it subscribes. In some embodiments, the Jafka protocol may be used to transmit the serialized messages as an ordered stream: the client uses a Jafka producer to send the serialized messages within a topic, and each message is assigned a unique key so that the server can receive, in order, the messages of the topic to which it subscribes.
In some embodiments, step S15 may further include: and receiving the reasoning result returned by the server, and displaying the reasoning result.
When the method is implemented based on the RPC framework, step S11 is implemented at the client, for example by an application (APP) of the client, and steps S13 to S15 may be implemented by a client stub (client_stub) of the client. For example, when an application of the client remotely invokes an inference function of the server, the application invokes the client stub in a local-call manner, and the client stub executes steps S13 to S15: it assembles the name identifier, function identifier and/or data input interface identifier of the invoked inference model together with the image data into a message, serializes the message, and transmits it to the server.
The second embodiment of the application provides a cloud deep learning model reasoning method based on an RPC framework, which is applied to a server side. As shown in the flow chart of fig. 2, the method comprises the steps of:
s21: and receiving the serialized information sent by the client and deserializing the serialized information.
In some embodiments, the server receives the serialized message sent by the client in step S15 in the first embodiment, and then the server deserializes the serialized message.
S23: and analyzing the deserialized message according to the definition file of the message structure, wherein the identification field of the analyzed deserialized message indicates that n fields are used for the message, and n fields of the message are filled with n channels of image data.
For the content defined by the definition file of the message structure, reference may be made to the description in the first embodiment, which is not repeated here. The definition file of the message structure at the server side is the same as that at the client side, so that the deserialized message can be parsed correctly.
In some embodiments, a definition file of a message structure may be sent to the server and the client in advance. In other embodiments, the client may acquire the definition file of the message structure and then send it to the server, or vice versa.
In some embodiments, during parsing, when the value of the parsed identification field (the default field) is 1, it is known on this basis that the three first fields are the r_group, g_group and b_group fields, so that the three-channel RGB data recorded in the r_group, g_group and b_group fields respectively is parsed out.
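The server-side parsing branch can be sketched as follows, again using a dict as a stand-in for the deserialized protobuf message; the helper name `parse_message` is illustrative.

```python
# Sketch of the parsing step: when the default field is 1, the three first
# fields are taken to be r_group/g_group/b_group and their channel data is
# extracted. The dict message format is an assumption of this sketch.

def parse_message(msg):
    if msg.get("default") == 1:
        # Image message: extract the three-channel RGB data in order.
        return [msg["r_group"], msg["g_group"], msg["b_group"]]
    return None  # conventional data; handled by a different branch

channels = parse_message(
    {"default": 1, "r_group": [1], "g_group": [2], "b_group": [3]}
)
```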
In some embodiments, other information may also be parsed from other content defined by the definition file of the message structure, such as name identification, function identification, data input interface identification, etc. of the inference model that is invoked.
After the analysis in this step, the image data with n channels of data is obtained.
S25: and inputting the image data of the n channels into the n channels of the deep learning reasoning model called by the client so as to call the deep learning model to reason the image data.
For example, according to the name identification, function identification, data input interface identification, etc. of the analyzed inference model, the corresponding inference model, or a certain function, data input interface, etc. of a certain inference model is called to provide n-channel data to the inference model, and the inference is performed by the inference model.
In some embodiments, the inference model on the server builds an array (or matrix) for each of the n channels to receive the n-channel data, and the n arrays together form the input of the inference model.
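Building the n-channel model input from the flat per-channel data can be sketched as follows. Pure-Python nested lists stand in for the framework's arrays, and `channels_to_input` is an illustrative helper; a real deployment would build framework tensors instead.

```python
# Sketch of building the n-channel input: each channel's flat data is
# reshaped into an h-by-w matrix, and the n matrices form the model input.

def channels_to_input(channels, h, w):
    arrays = []
    for flat in channels:
        assert len(flat) == h * w, "channel size must match h * w"
        # Slice the flat channel data into h rows of w pixels each.
        arrays.append([flat[row * w:(row + 1) * w] for row in range(h)])
    return arrays  # one h-by-w array per channel

inp = channels_to_input([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], 2, 2)
```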
In some embodiments, the inference model on the server may be a TensorFlow model. In some embodiments where n is 3, the TensorFlow model includes a function for inferring on three channels of image data; the three channels of data can be received through three arrays, which are input to the TensorFlow model. The inference function of the TensorFlow model is, for example: a function that identifies a target frame of key information in an image (the target frame containing the key information of the image), a function that identifies the category of the image in the target frame (for example, person, cat, dog, tree, etc.), a function that generates a piece of text based on the category, and the like.
In some embodiments, after step S25, the reasoning results are returned to the client.
When the method is implemented based on the RPC framework, steps S21 to S23 may be implemented by a server stub (server_stub), and the inference process of step S25 is implemented by the inference model of the server. For example, the server receives the serialized message, the server stub deserializes and parses it, the inference service of the server's local model is then invoked according to the parsing result, and the inference result of the local model is received, packaged into a message, serialized and sent to the client.
In some embodiments, the inference model on the corresponding server may be optimized in at least one of the following manners during deployment, so as to increase loading speed, reduce memory overhead, and improve inference efficiency.
1) The model quantization process may include any of the following:
A. Quantization of model parameters (also referred to as model weights): for example, the parameters (i.e., the weights) contained in the convolution layers, fully-connected layers, etc. of the model may be quantized into a low-precision form, e.g., converting floating-point parameters into integer representations. For example, float32 parameters are linearly mapped to a 16-bit or 8-bit integer range.
B. Quantization of the input data: for example, the image data input to the model is preprocessed, and the preprocessing quantizes the image data into a low-precision form, e.g., reducing pixel values or coarsening pixel intervals.
C. Quantization of the intermediate feature maps generated during model operation, such as converting the feature maps to a low-precision representation.
D. Quantization of the intermediate activations generated during model operation, such as converting the output of the activation functions to a low-precision representation.
2) And packaging the definition file and the related file of the message structure used by the application, and placing the definition file and the related file under a corresponding path of the model for calling, thereby avoiding calling all function libraries of the model.
For example, when a TensorFlow model is deployed on the server, the protobuf stub file used for data transmission is found in the tensorflow_serving source code, that file and the definition file of the message structure used by the present application are packaged into protos.tensorflow_serving.apis and placed under a folder accessible to the model service, which imports them through the gRPC tooling. This avoids the server calling the whole TensorFlow function library, avoids introducing many unnecessary function libraries, reduces a large number of nested dependencies on irrelevant functions, and reduces the latency overhead of inference operations.
3) An inference model supporting CUDA is used, and an NVIDIA graphics card supporting CUDA is used.
CUDA is a parallel computing platform and application programming interface (API) introduced by NVIDIA, which can utilize the parallel processing capability of the GPU of the graphics card, thereby providing higher computing efficiency and throughput. For example, a TensorFlow build supporting CUDA is used, and the environment variables of CUDA are configured so that the model can find the CUDA path, whereby computation is performed on the GPU according to the CUDA configuration.
The third embodiment of the application provides a reasoning method based on an RPC framework, which is applied to a system formed by a client and a server, and comprises the following steps: executing, by the client, the cloud deep learning model calling method based on the RPC framework described in the first embodiment and its optional embodiments to send image data comprising n-channel data; and executing, by the server, the cloud deep learning model reasoning method based on the RPC framework described in the second embodiment and its optional embodiments to obtain the n-channel data included in the image data, the n-channel data then being provided to the n-channel input of an inference model on the server for inference by the inference model.
For a better understanding of the present application, the following describes an reasoning method based on the RPC framework provided in connection with the fourth embodiment of the present application, and further describes the present application in connection with fig. 3. The embodiment comprises a client and a server, wherein the client and the server realize remote call of an inference model on the server by the client based on an RPC framework, and complete the inference of the image.
In this example, the inference model deployed on the server side is a TensorFlow model, which has strong generalization and adaptation capabilities and can be called from different programming languages through interfaces. The present application quantizes the model parameters (also called model weights) from 32-bit float to 8-bit int, which greatly reduces the computation of model inference and improves the inference efficiency several times over at the cost of a limited loss of precision. Specifically, the quantization method includes: 2% of the data is extracted from the training set as a calibration data set; a calibration table is obtained by running small-batch inference of the trained float32 model on the calibration data set; and the float32 model parameters are linearly mapped to the int8 range through the calibration table, so that the computation on the parameters becomes 1/4 of the original. Testing shows that the precision loss is limited while the inference efficiency is greatly improved.
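The calibration procedure just described can be sketched as follows. The surrogate "inference" function, the data distribution, and the max-abs calibration statistic are all assumptions of this sketch; a real calibration table would be built from the activation statistics of the actual float32 model.

```python
import random

# Sketch of calibration-based int8 mapping: 2% of the training set is sampled
# as a calibration set, small-batch "inference" is run on it, and the largest
# absolute value observed yields the linear int8 scale (the calibration table).

random.seed(0)
training_set = [random.uniform(-4.0, 4.0) for _ in range(1000)]
calib_set = training_set[: len(training_set) // 50]  # 2% calibration sample

def run_inference(x):
    return 0.5 * x  # stand-in for a forward pass producing activations

observed = [run_inference(x) for x in calib_set]
scale = max(abs(a) for a in observed) / 127.0  # calibration-derived scale

def to_int8(a):
    return max(-128, min(127, round(a / scale)))
```

Because the scale is fit to the calibration sample rather than the full training set, values outside the calibrated range are clamped, which is where the "limited precision loss" of the scheme comes from.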
In this example, in order to avoid the server calling the whole TensorFlow function library and to avoid introducing many unnecessary function libraries, only the definition file of the message structure defined by the present application and the stub file related to data transmission are introduced. Specifically, the protobuf stub file used for data transmission is found in the tensorflow_serving source code, and the definition file of the message structure used by the present application is packaged together with it into protos.tensorflow_serving.apis.
In this example, both the client and the server use the protobuf protocol to serialize and deserialize messages. A redefined definition file of the message structure (which may be abbreviated as the proto definition file) is used in serialization and deserialization; the proto definition file is described in json form, which makes its structure definition more flexible. The proto definition file is described later.
In this example, when the client and the server transmit the serialized messages directly, the Jafka protocol is adopted to realize ordered streaming of the serialized messages, replacing transmission over a long connection. The bidirectional data-flow mode of the Jafka protocol allows the client and the server to send data streams to each other, and both sides can send data simultaneously, realizing real-time interaction.
The proto definition file is redefined to suit the characteristic that most image data is three-channel RGB data. In particular, the fields of the data portion in the proto definition file are redefined as follows:
Table 1. Field definitions of the data portion of the proto definition file

Type value   Type name          Usage scenarios
0            varint             int32, int64, uint32, uint64, sint32, sint64, bool, enum
1            64-bit             fixed64, sfixed64, double
2            Length-delimited   string, bytes, embedded messages
3            r_group            image R-channel data
4            g_group            image G-channel data
5            b_group            image B-channel data
6            32-bit             fixed32, sfixed32, float
7            default            bool: 0 = conventional data type, 1 = image matrix
The r_group, g_group and b_group fields form the first fields used for transmitting image data, and the identification field is the default field: when the value of the default field is 1, the data type is an image; when it is 0, the data is of another type, and the first fields are not used (or only one of them is used).
The aforementioned fields are fields of the data portion of the proto definition file, and the proto definition file further includes other fields, such as a field for writing a name identifier of the called inference model, a function identifier field for writing the called inference model, a field for writing a data input interface identifier of the called inference model, and the like, which may be defined as needed.
The following describes the reasoning method based on the RPC framework of this example, taking as an example the client acquiring image data and remotely calling the TensorFlow model on the server to recognize the image. As shown in the flowchart of fig. 3, the method includes the following steps:
S110: a client application (APP) receives image data imported by a user; the image data is RGB image data.
S120: the client-side APP calls TensorFlow the reasoning services of the model by locally calling the client stub.
S130: the Client stub is responsible for assembling the called model name, function and image data into a message, and specifically includes:
The client stub reads the proto definition file in json form to obtain the defined message structure. In this example the message structure includes a model name identification field, a function identification field and a data field; the structure of the data field is shown in Table 1. Then, according to the message structure, the name identifier of the called model (for example, TensorFlow) is filled into the model name identification field, the identifier of the called model's function (for example, the identifier corresponding to the image recognition function) is filled into the function identification field, the data of the three RGB channels included in the image data is filled into the r_group, g_group and b_group fields of the data field respectively, and the default field is set to 1.
S140: the Client stub performs serialization processing on the messages filled in the fields based on the protobuf protocol, and transmits the message sequences to the server in an orderly streaming manner based on the jafka protocol.
S150: the server receives the serialized message based on jafka protocol, and the server stub deserializes the serialized message.
S160: the Server stub reads the json form proto definition file to obtain a defined message structure, and analyzes the deserialized message according to the message structure to obtain a called model name (such as TensorFlow), the function of the called model, three-channel image data and the like.
After deserialization, each field in the deserialized message can be identified by comparing the byte positions of the deserialized message against the proto definition file.
The server stores identifiers corresponding to the functions of its models, and the called function of the model can be determined from the function identification field by enumeration: the function identifier in the function identification field is compared one by one against the identifiers corresponding to the functions of the inference models stored on the server until the matching function is found. In this embodiment, the image recognition function is determined.
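The enumeration lookup just described can be sketched as follows. The function names and identifier values are purely illustrative assumptions; only the one-by-one comparison pattern comes from the text.

```python
# Sketch of the enumeration lookup: the identifier from the function
# identification field is compared one by one against the server's stored
# function identifiers. Names and identifier values are illustrative.

STORED_FUNCTIONS = [
    (1, "detect_target_frame"),
    (2, "image_recognition"),
    (3, "generate_caption"),
]

def resolve_function(function_id):
    for fid, name in STORED_FUNCTIONS:  # compare one by one (enumeration)
        if fid == function_id:
            return name
    return None  # no stored function matches the identifier

resolved = resolve_function(2)
```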
After the fields of the message have been parsed in turn according to the proto definition file, they can be processed in a queue manner, i.e., the unpacked data is sent in turn to a queue buffer and read out of the queue in turn for processing. The queue may be a FIFO (first in, first out) queue.
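The FIFO buffering above can be sketched with the standard-library queue; the field names enqueued here are illustrative.

```python
from queue import Queue

# Sketch of the FIFO queue handling: unpacked fields are buffered in a
# first-in-first-out queue and read out in the same order they were parsed.

buffer = Queue()
for field in ["model_name", "function_id", "r_group", "g_group", "b_group"]:
    buffer.put(field)  # unpacked data is enqueued in parse order

processed = []
while not buffer.empty():
    processed.append(buffer.get())  # dequeued in arrival order (FIFO)
```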
S170: and calling the image recognition function of the local TensorFlow model by the Server stub according to the analysis result.
S180: and receiving three-channel data by using a TensorFlow model on the server, specifically, after judging that the default value is 1, creating 3 arrays, respectively filling the data in r_group, g_group and b_group fields, and using the TensorFlow model to execute an image recognition function by taking the data of the 3 arrays as input data to generate an inference result.
S190: the server returns the reasoning result to the client. The server returns the reasoning result to the client by calling the local server stub, the client receives the reasoning result and returns the reasoning result to the client APP by the client stub, and the process is the same as the existing reasoning result returning process and is not repeated.
The fifth embodiment of the present application provides a cloud deep learning model calling device 10 based on an RPC framework, which is applied to a client and can be used to implement the cloud deep learning model calling method based on the RPC framework in the first embodiment. As shown in fig. 4, the apparatus includes:
An acquisition unit 11 configured to acquire image data including n-channel data, n being an integer of 2 or more;
An assembling unit 12, configured to assemble the image data into a message according to a message structure definition file, where an identification field in the message indicates that n fields are used in the message, and n fields of the message are loaded with n channels of image data;
And the sending unit 13 is used for serializing and sending the assembled message to a server so as to call a deep learning model deployed at a cloud server to infer the image data.
The sixth embodiment of the present application provides a cloud deep learning model reasoning device 20 based on an RPC framework, which is applied to a server and can be used to implement the cloud deep learning model reasoning method based on the RPC framework in the second embodiment. As shown in fig. 5, the apparatus includes:
a receiving unit 21, configured to receive the serialized message sent by the client and perform inverse serialization;
A parsing unit 22, configured to parse the deserialized message according to the message structure definition file, where an identification field of the parsed deserialized message indicates that n fields are used in the message, and the n fields of the message are loaded with the image data of n channels;
And the calling unit 23 is used for inputting the image data of the n channels into the n channels of the deep learning inference model called by the client so as to call the deep learning model to infer on the image data.
A seventh embodiment of the present application provides an RPC framework-based reasoning system 30 that can be used to implement the RPC framework-based reasoning method in the third embodiment described above. As shown in fig. 6, the reasoning system 30 based on the RPC framework includes a client 31, configured to execute the cloud deep learning model calling method based on the RPC framework according to the first embodiment to send image data including n-channel data; the server 32 is configured to execute the cloud deep learning model reasoning method based on the RPC framework according to the second embodiment, obtain n-channel data included in the image data, and provide the n-channel data to an n-channel input of a reasoning model on the server for reasoning by the reasoning model.
Fig. 7 is a schematic diagram of a computing device 900 according to an eighth embodiment of the present application. The computing device may be used as a cloud deep learning model calling device or an reasoning device based on the RPC framework, execute the method in the first embodiment and the optional embodiments thereof, or execute the method in the second embodiment and the optional embodiments thereof, and may be a terminal, or may be a chip or a chip system inside the terminal. As shown in fig. 7, the computing device 900 includes: processor 910, memory 920, and communication interface 930.
It should be appreciated that the communication interface 930 in the computing device 900 shown in fig. 7 may be used to communicate with other devices and may include, in particular, one or more transceiver circuits or interface circuits.
Wherein the processor 910 may be coupled to a memory 920. The memory 920 may be used to store the program codes and data. Accordingly, the memory 920 may be a storage unit internal to the processor 910, an external storage unit independent of the processor 910, or a component including a storage unit internal to the processor 910 and an external storage unit independent of the processor 910.
Optionally, computing device 900 may also include a bus. The memory 920 and the communication interface 930 may be connected to the processor 910 through a bus. The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The buses may be classified as address buses, data buses, control buses, etc. For ease of illustration, only one line is shown in FIG. 7, but this does not mean there is only one bus or one type of bus.
It should be appreciated that in embodiments of the present application, the processor 910 may employ a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. Alternatively, the processor 910 may employ one or more integrated circuits for executing associated programs to perform the techniques provided by embodiments of the present application.
The memory 920 may include read only memory and random access memory and provide instructions and data to the processor 910. A portion of the processor 910 may also include nonvolatile random access memory. For example, the processor 910 may also store information of the device type.
When the computing device 900 is running, the processor 910 executes computer-executable instructions in the memory 920 to perform any of the operational steps of the method of the first embodiment described above, and any of the alternative embodiments, or to perform any of the operational steps of the method of the second embodiment described above, and any of the alternative embodiments.
It should be understood that the computing device 900 according to the embodiment of the present application may correspond to the respective main bodies performing the methods according to the first or second embodiment of the present application, and that the above and other operations and/or functions of the respective modules in the computing device 900 are respectively for implementing the respective flows of the methods of the present embodiment, and are not described herein for brevity.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The embodiments of the present application also provide a computer-readable storage medium having stored thereon a computer program for executing the above-described method when executed by a processor, the method comprising at least one of the aspects described in the respective embodiments above.
The computer storage media of embodiments of the application may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In addition, the terms "first", "second", "third", etc., or module A, module B, module C, etc. in the description and the claims are used merely to distinguish similar objects and do not denote a specific ordering of the objects; it should be understood that, where allowed, the specific order or sequence may be interchanged so that the embodiments of the application described herein can be practiced in an order other than that illustrated or described.
In the above description, reference numerals indicating steps such as S110, S120 … …, etc. do not necessarily indicate that the steps are performed in this order, and the order of the steps may be interchanged or performed simultaneously as the case may be.
The term "comprising" as used in the description and claims should not be interpreted as being limited to what is listed thereafter; it does not exclude other elements or steps. Thus, it should be interpreted as specifying the presence of the stated features, integers, steps or components as referred to, but does not preclude the presence or addition of one or more other features, integers, steps or components, or groups thereof. Thus, the expression "a device comprising means a and B" should not be limited to a device consisting of only components a and B.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the application. Thus, appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment, but may be. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments, as would be apparent to one of ordinary skill in the art from this disclosure.
Note that the above is only a preferred embodiment of the present application and the technical principle applied. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, while the application has been described in connection with the above embodiments, the application is not limited to the above embodiments, but may include many other equivalent embodiments without departing from the spirit of the application, which fall within the scope of the application.

Claims (10)

1. A cloud deep learning model calling method based on an RPC framework, applied to a client, characterized by comprising the following steps:
acquiring image data, wherein the image data comprises n channels of data, and n is an integer greater than or equal to 2;
assembling the image data into a message according to a message structure definition file, wherein an identification field in the message indicates that n fields are used in the message, and the n fields of the message carry the n channels of image data;
and serializing the assembled message and sending the serialized message to a cloud server, so as to invoke a deep learning model deployed at the cloud server to infer the image data.
2. The method of claim 1, wherein the message further comprises: a name identifier of the invoked inference model and a data input interface identifier of the invoked inference model;
the name identifier of the invoked inference model is used to enable the server to determine the inference model to be invoked according to the name identifier;
the data input interface identifier of the invoked inference model is used to enable the server to determine, according to the identifier, the input interface through which the invoked inference model acquires data, so as to receive the n channels of data.
3. The method of claim 2, wherein, when the deep learning inference model to be invoked by the client has a plurality of functions, the message further comprises: a function identifier of the invoked inference model, which is used to enable the server to determine therefrom the inference function to be used by the invoked inference model.
4. The method according to claim 3, wherein the serialized messages are sent within a topic, and each message is assigned a unique key, so that the server receives the serialized messages of the topic to which it subscribes in order.
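Claims 1 to 4 describe the client side: load the n channels into n message fields per a message structure definition, tag the message with model, input-interface, and optional function identifiers, then serialize and send. A minimal, dependency-free sketch of that assembly step is given below; all field names (`model_name`, `channel_count`, etc.) are illustrative placeholders, since the claims do not disclose the actual message structure definition file, and JSON stands in for whatever serialization the RPC framework uses.

```python
import json

def assemble_message(channels, model_name, input_interface, function_id=None):
    """Assemble n channels of image data into a message laid out per a
    hypothetical message structure definition, then serialize it.

    `channels` is a list of n per-channel payloads (n >= 2).
    """
    n = len(channels)
    if n < 2:
        raise ValueError("image data must contain at least 2 channels")
    message = {
        "model_name": model_name,            # claim 2: which model to invoke
        "input_interface": input_interface,  # claim 2: which input interface
        "channel_count": n,                  # identification field: n fields used
    }
    if function_id is not None:
        message["function_id"] = function_id  # claim 3: which inference function
    for i, channel in enumerate(channels):
        message[f"channel_{i}"] = channel     # n fields carry the n channels
    return json.dumps(message).encode("utf-8")  # serialized message bytes
```

In practice an RPC framework of this kind would typically use a binary interface-definition language such as Protocol Buffers rather than JSON; JSON is used here only to keep the sketch self-contained. The ordering guarantee of claim 4 could then be layered on by publishing each serialized message to a topic under a unique key.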
5. A cloud deep learning model reasoning method based on an RPC framework, applied to a server, characterized by comprising the following steps:
receiving a serialized message sent by a client and deserializing it;
parsing the deserialized message according to a message structure definition file, wherein an identification field of the parsed message indicates that n fields are used in the message, and the n fields of the message carry n channels of image data;
and inputting the n channels of image data into the n channels of the deep learning inference model invoked by the client, so as to invoke the deep learning model to infer the image data.
6. The method of claim 5, wherein the message further comprises a name identifier of the invoked inference model and a data input interface identifier of the invoked inference model;
wherein inputting the n channels of image data into the n channels of the deep learning inference model invoked by the client, so as to invoke the deep learning model to infer the image data, comprises:
determining the deep learning inference model to be invoked by the client according to the name identifier of the inference model in the message; determining, according to the data input interface identifier of the invoked inference model, the input interface through which the deep learning inference model to be invoked by the client acquires data;
and inputting the n channels of image data to the input interface of the deep learning inference model corresponding to the name identifier, and invoking the deep learning model to infer the image data.
7. The method of claim 6, wherein, when the deep learning inference model to be invoked by the client has a plurality of functions, the message further includes a function identifier of the invoked inference model, and determining the deep learning inference model to be invoked by the client according to the name identifier of the inference model in the message comprises:
determining, according to the name identifier of the inference model and the function identifier of the inference model in the message, the deep learning inference model to be invoked by the client and the inference function to be used.
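Claims 5 to 7 describe the server side: deserialize the message, read the identification field to learn how many channel fields to collect, and dispatch the channels to the model and input interface the client named. A sketch under the same assumed field layout as the client side follows; the registry structure and all names are illustrative, not the patent's actual API.

```python
import json

# Registry of deployed models: model name -> {input interface -> callable}.
# The structure is illustrative; a real server would hold loaded model handles.
MODEL_REGISTRY = {}

def register_model(name, input_interface, infer_fn):
    """Register an inference callable under a model name and input interface."""
    MODEL_REGISTRY.setdefault(name, {})[input_interface] = infer_fn

def handle_message(raw_bytes):
    """Deserialize a message, parse it per the assumed field layout, and
    dispatch the n channels to the model the client asked for (claims 5-7)."""
    message = json.loads(raw_bytes.decode("utf-8"))       # deserialization
    n = message["channel_count"]                          # identification field
    channels = [message[f"channel_{i}"] for i in range(n)]
    infer = MODEL_REGISTRY[message["model_name"]][message["input_interface"]]
    # claim 7: an optional function identifier selects among several functions
    return infer(channels, message.get("function_id"))
```

A stub model registered under `("demo", "input0")` can then be exercised by crafting a message with the same field names and handing its bytes to `handle_message`.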
8. The method of any of claims 1 to 7, wherein the image data comprises n-channel data comprising one of:
the image data comprises two channels of data, wherein the two channels comprise a gray scale channel and an A channel;
the image data comprises three channels of data, wherein the three channels comprise RGB channels or YUV channels;
the image data comprises four channels of data, wherein the four channels comprise RGBA channels or YUVA channels;
wherein the A channel is a channel indicating transparency.
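The channel layouts enumerated in claim 8 map a channel count n directly to an interpretation. A small sketch of that mapping, with a hypothetical `color_space` parameter to resolve the RGB-vs-YUV ambiguity the claim leaves open:

```python
def channel_layout(n, color_space="RGB"):
    """Map a channel count n to the channel interpretation of claim 8."""
    if n == 2:
        return ["GRAY", "A"]              # gray scale + transparency (A) channel
    if n == 3:
        return list(color_space)          # "RGB" -> ["R","G","B"], or YUV
    if n == 4:
        return list(color_space) + ["A"]  # RGBA or YUVA
    raise ValueError(f"unsupported channel count: {n}")
```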
9. A cloud deep learning model calling device based on an RPC framework, applied to a client, characterized by comprising:
an acquisition unit, configured to acquire image data, where the image data includes n channels of data, and n is an integer greater than or equal to 2;
an assembling unit, configured to assemble the image data into a message according to a message structure definition file, where an identification field in the message indicates that n fields are used in the message, and the n fields of the message carry the n channels of image data;
and a sending unit, configured to serialize the assembled message and send the serialized message to the server, so as to invoke a deep learning model deployed at the cloud server to infer the image data.
10. A cloud deep learning model reasoning device based on an RPC framework, applied to a server, characterized by comprising:
a receiving unit, configured to receive the serialized message sent by the client and deserialize it;
a parsing unit, configured to parse the deserialized message according to a message structure definition file, where an identification field of the parsed message indicates that n fields are used in the message, and the n fields of the message carry n channels of image data;
and an invoking unit, configured to input the n channels of image data into the n channels of the deep learning inference model invoked by the client, so as to invoke the deep learning model to infer the image data.
CN202410202174.XA 2024-02-23 2024-02-23 Cloud deep learning model calling method, reasoning method and device based on RPC framework Pending CN117950765A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410202174.XA CN117950765A (en) 2024-02-23 2024-02-23 Cloud deep learning model calling method, reasoning method and device based on RPC framework


Publications (1)

Publication Number Publication Date
CN117950765A true CN117950765A (en) 2024-04-30

Family

ID=90797837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410202174.XA Pending CN117950765A (en) 2024-02-23 2024-02-23 Cloud deep learning model calling method, reasoning method and device based on RPC framework

Country Status (1)

Country Link
CN (1) CN117950765A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination