CN115756875A - Online service deployment method and system of machine learning model for streaming data


Info

Publication number
CN115756875A
CN115756875A (application CN202310015610.8A)
Authority
CN
China
Prior art keywords
service
gpc
data
streaming
message queue
Prior art date
Legal status
Granted
Application number
CN202310015610.8A
Other languages
Chinese (zh)
Other versions
CN115756875B (en)
Inventor
张田田
涂燕晖
程海博
Current Assignee
Shandong Future Network Research Institute Industrial Internet Innovation Application Base Of Zijinshan Laboratory
Original Assignee
Shandong Future Network Research Institute Industrial Internet Innovation Application Base Of Zijinshan Laboratory
Priority date
Filing date
Publication date
Application filed by Shandong Future Network Research Institute Industrial Internet Innovation Application Base Of Zijinshan Laboratory
Priority to CN202310015610.8A
Publication of CN115756875A
Application granted
Publication of CN115756875B
Legal status: Active (current)
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a streaming data-oriented machine learning model online service deployment method and system, and relates to the field of machine learning model deployment. The method comprises the following steps: constructing a streaming data-oriented machine learning model online service framework, wherein the framework comprises an external API interface for the model online service, a streaming data real-time processing channel and a distributed model prediction service, and the streaming data real-time processing channel comprises a gRPC service cluster and a message queue service; establishing a bidirectional communication connection between the client and a gRPC server node using gRPC bidirectional streaming; receiving streaming request data and storing it in the message queue service; and monitoring the message queue service, selecting the corresponding model for prediction when streaming request data is received, writing the prediction result into the message queue service, and pushing the prediction result to the client. The invention exposes an asynchronous WEB interface for the machine learning model online service, receives, caches and processes data in real time, and sends model prediction results, thereby avoiding needless blocking of the client.

Description

Online service deployment method and system of machine learning model for streaming data
Technical Field
The invention relates to the technical field of machine learning model deployment, in particular to a method and a system for deploying online services of a machine learning model for streaming data.
Background
Machine learning is being applied and deployed ever more widely, and the application life cycle of machine learning can only be closed by making models easy to use, efficient and convenient, and by building model services of real enterprise value. Common machine learning frameworks in the industry, such as TensorFlow and PyTorch, generally provide solutions for offline training and corresponding model serving, exposing the service through an HTTP or gRPC interface. Model online services are currently provided in a batch processing mode: a user sends a batch of request data to the model service, and the model service predicts and returns the results to the client. This mode focuses on training and prediction with a static model and historical static data; for example, when a user browses a website, news can be pushed according to the user's historical behavior data.
In most practical use cases, the user can continue subsequent operations only after the prediction result is available in the mobile application or displayed on a web page, so real-time machine learning is receiving more and more attention; examples include real-time recommendation models that encode recent session activity as features, and the surge pricing prediction algorithms used in concert ticketing and ride-hailing applications. Real-time machine learning processes streaming data, which is harder to handle: the data volume is unbounded and the rate of data arrival varies, so the traditional batch processing mode clearly cannot meet the demands of dynamic, real-time data processing.
Batch processing can be understood as a series of related tasks executed sequentially or in parallel, one after another; the input to a batch process is data collected over a period of time. In most cases, both the input and output data of a batch process are bounded. With the rapid development of the internet, information of all kinds is growing explosively and dynamic new data is generated continuously, so the batch deployment mode suffers from repeated data transmission, low processing speed, long response times and the inability to predict in real time. In applications such as ad ranking, Twitter's trending tag ranking, Facebook's news feed ranking and arrival time estimation, batch prediction degrades the user experience without catastrophic consequences. For other applications, however, failing to predict online can be catastrophic or render the application useless, such as high-frequency trading, autonomous driving, voice assistants, face/fingerprint unlocking of mobile phones, fall detection for the elderly, and fraud detection. For fraudulent transactions, real-time detection can prevent the event from happening at all. Batch processing remains a good choice for scenarios that do not need real-time analysis results, especially when the business logic is complex and the data volume is large, since useful information can then be mined from the data more easily. Therefore, when an application requires real-time analysis, or when the end time of data transmission and the data volume cannot be determined, a stream processing framework is needed.
Disclosure of Invention
The invention aims to provide a method and a system for deploying online services of a machine learning model for streaming data, which achieve near-real-time processing of streaming data by constructing a distributed service deployment architecture and developing a real-time processing channel with streaming technology, can process data in large volumes, and overcome the high processing latency, low computing performance and other defects of existing machine learning model online service deployment methods.
In order to achieve the above purpose, the invention provides the following technical scheme: a streaming data-oriented machine learning model online service deployment method comprises the following steps:
constructing a streaming data-oriented machine learning model online service framework, wherein the framework comprises a unified external API interface for the model online service, a streaming data real-time processing channel and a distributed model prediction service; the streaming data real-time processing channel comprises a gRPC service cluster and a message queue service; the distributed model prediction service comprises a plurality of machine learning models with model prediction functions; the gRPC service cluster comprises a plurality of gRPC server nodes;
receiving connection requests from all application clients, and establishing a bidirectional communication connection between each application client and a gRPC server node in the gRPC service cluster using gRPC bidirectional streaming;
each gRPC server node in the gRPC service cluster continuously receiving streaming request data from its application clients and storing the streaming request data in the message queue service in order;
monitoring the message queue service, selecting the corresponding machine learning model from the distributed model prediction service to perform model prediction when new request data is received, and writing the prediction result into the message queue service;
and monitoring the message queue service, and pushing the prediction result in real time to the application client that issued the request, through the unified external API interface of the model online service, when a new prediction result is received.
Further, the unified external API interface of the model online service is provided by a WEB API gateway;
the gRPC server nodes in the gRPC service cluster maintain long-lived connections with the application clients over which two-way communication is possible, and are used for receiving streaming data requests, caching them to the message queue service, monitoring the message queue service, acquiring prediction results, and then asynchronously pushing the prediction results to the application clients in real time;
the message queue service comprises a request message queue and a reply message queue, wherein the request message queue is used for caching streaming request data, and the reply message queue is used for caching prediction results.
Further, the application client is a gRPC client, and each gRPC server node in the gRPC service cluster registers its service information with the WEB API gateway;
when a gRPC client initiates a gRPC request, the process of establishing a bidirectional communication connection and interacting with a gRPC server node is as follows:
receiving the gRPC request of the gRPC client and determining the currently available gRPC server nodes;
selecting a connectable gRPC server node according to a preset load balancing strategy, and sending the node information to the gRPC client, so that the gRPC client establishes a bidirectional communication connection with the gRPC server node according to the node information;
and allocating a unique client ID to the gRPC client connected in bidirectional communication with the gRPC server node, so that the gRPC server node receives the streaming request data sent by the gRPC client corresponding to that client ID, thereby realizing the interaction between the gRPC server node and the gRPC client.
Further, the specific process by which the gRPC server node receives the streaming request data and stores it in the message queue service in order is as follows:
the gRPC server node generates a request ID for each piece of streaming request data according to the order in which the streaming request data is received;
each piece of streaming request data, with its request ID attached, is sent in turn to the request message queue identified by the client ID; the streaming request data comprises the model input data, the machine learning model to apply for prediction, and the model prediction parameters.
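For illustration only, the three kinds of information carried by each piece of streaming request data can be pictured as the minimal Python structure below; the patent does not define a concrete wire format, so every field name here is hypothetical.

# Illustrative sketch only: field names are hypothetical, mirroring the three
# kinds of information the text lists for each streaming request.
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class StreamingRequest:
    request_id: int                   # assigned by the gRPC server node in arrival order
    model_input: Any                  # model input data (required on every request)
    model_name: Optional[str] = None  # model to apply; may appear only on the
    model_params: dict = field(default_factory=dict)  # first request of a stream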
Further, the specific process from monitoring the message queue service to writing the prediction result into the message queue service is as follows:
monitoring the request message queue and, when new streaming request data is received, segmenting the unbounded data set into data sets suitable for the selected machine learning model, based on different window modes chosen according to the application scenario;
inputting the segmented data set into the machine learning model to obtain the prediction result;
and writing the prediction result into the reply message queue identified by the client ID, attaching to the prediction result the request ID of the corresponding streaming request data, so that prediction results can be reassembled if they arrive out of order.
The invention also discloses a streaming data-oriented machine learning model online service deployment system, which comprises:
a construction module, used for constructing a streaming data-oriented machine learning model online service framework, wherein the framework comprises a unified external API interface for the model online service, a streaming data real-time processing channel and a distributed model prediction service; the streaming data real-time processing channel comprises a gRPC service cluster and a message queue service; the distributed model prediction service comprises a plurality of machine learning models with model prediction functions; the gRPC service cluster comprises a plurality of gRPC server nodes;
a first receiving module, used for receiving connection requests from all application clients and establishing a bidirectional communication connection between each application client and a gRPC server node in the gRPC service cluster using gRPC bidirectional streaming;
a second receiving module, used by each gRPC server node in the gRPC service cluster to continuously receive streaming request data from its application clients and store the streaming request data in the message queue service in order;
a first monitoring module, used for monitoring the message queue service, selecting the corresponding machine learning model from the distributed model prediction service to perform model prediction when new request data is received, and writing the prediction result into the message queue service;
and a second monitoring module, used for monitoring the message queue service and pushing the prediction result in real time to the application client that issued the request, through the unified external API interface of the model online service, when a new prediction result is received.
Further, the unified external API interface of the model online service in the streaming data-oriented machine learning model online service framework is provided by a WEB API gateway;
the gRPC server nodes in the gRPC service cluster maintain long-lived connections with the application clients over which two-way communication is possible, and are used for receiving streaming data requests, caching them to the message queue service, monitoring the message queue service, acquiring prediction results, and then asynchronously pushing the prediction results to the application clients in real time;
the message queue service comprises a request message queue and a reply message queue, wherein the request message queue is used for caching streaming request data, and the reply message queue is used for caching prediction results.
Further, the execution units by which the first receiving module establishes the bidirectional communication connection between each application client and the gRPC service cluster include:
a receiving and judging unit, used for receiving a gRPC request initiated by a gRPC client and determining the currently available gRPC server nodes; the gRPC client is the application client, and each gRPC server node in the gRPC service cluster registers its service information with the WEB API gateway;
a selection unit, used for selecting a connectable gRPC server node according to a preset load balancing strategy and sending the node information to the gRPC client, so that the gRPC client establishes a bidirectional communication connection with the gRPC server node according to the node information;
and an allocation and interaction unit, used for allocating a unique client ID to the gRPC client connected in bidirectional communication with the gRPC server node, so that the gRPC server node receives the streaming request data sent by the gRPC client corresponding to that client ID, thereby realizing the interaction between the gRPC server node and the gRPC client.
Further, the specific execution units by which the second receiving module receives streaming request data through the gRPC server nodes and stores it in the message queue service in order include:
a generating unit, used by the gRPC server node to generate a request ID for each piece of streaming request data according to the order in which the streaming request data is received;
a sending unit, configured to send each piece of streaming request data, with its request ID attached, in turn to the request message queue identified by the client ID; the streaming request data comprises the model input data, the machine learning model to apply for prediction, and the model prediction parameters.
Further, the specific execution units by which the first monitoring module monitors the message queue service and finally writes the prediction result into the message queue service include:
a first monitoring unit, used for monitoring the request message queue and, when new streaming request data is received, segmenting the unbounded data set into data sets suitable for the selected machine learning model, based on different window modes chosen according to the application scenario;
a model prediction unit, used for inputting the segmented data set into the machine learning model to obtain the prediction result;
and a writing unit, used for writing the prediction result into the reply message queue identified by the client ID, attaching to the prediction result the request ID of the corresponding streaming request data, so that prediction results can be reassembled if they arrive out of order.
The technical scheme of the invention has the following beneficial effects:
The invention discloses a streaming data-oriented machine learning model online service deployment method and system, wherein the method comprises: constructing a streaming data-oriented machine learning model online service framework comprising a unified external API interface for the model online service, a streaming data real-time processing channel and a distributed model prediction service; the streaming data real-time processing channel comprises a gRPC service cluster and a message queue service, and the distributed model prediction service comprises a plurality of machine learning models; receiving the connection request of an application client, and establishing a bidirectional communication connection between the application client and a gRPC server node in the gRPC service cluster using gRPC bidirectional streaming; each gRPC server node in the gRPC service cluster receives streaming request data and stores it in the message queue service; and monitoring the message queue service, selecting the corresponding machine learning model for model prediction when new streaming request data is received, and writing the prediction result into the message queue service so as to push it in real time to the application client through the unified external API interface of the model online service.
The invention achieves near-real-time processing of streaming data through a distributed service deployment architecture and can process data in large volumes; the specific advantages include:
1) The online model service is converted from batch prediction to a stream processing, real-time prediction mode, effectively reducing processing latency. By developing a real-time processing channel with streaming technology, event data is cached in a message queue, the model service monitors the event data stored in the message queue, predicts in real time, writes the results back to the message queue, and then responds to the user, fully realizing the real-time character of the method; this real-time character keeps the data fresh and enables the model to respond to the latest changes.
2) Massive unbounded data sets are more and more common in services. The method processes data with a system designed for unbounded stream processing: input data is partitioned into windows of a specific size according to the application scenario, and each window is then processed as an independent finite data set, which greatly improves data processing efficiency and reduces processing latency.
3) The streaming data-oriented machine learning model online service framework adopts an architecture that supports load balancing of model computation and can be scaled out in parallel, effectively coping with ever-growing streaming data and model processing loads; computing capacity can be distributed evenly over time, and the computing performance is high.
It should be understood that all combinations of the foregoing concepts and additional concepts described in greater detail below can be considered as part of the inventive subject matter of this disclosure unless such concepts are mutually inconsistent.
The foregoing and other aspects, embodiments and features of the present teachings can be more fully understood from the following description taken in conjunction with the accompanying drawings. Additional aspects of the present invention, such as features and/or advantages of exemplary embodiments, will be apparent from the description which follows, or may be learned by practice of specific embodiments in accordance with the teachings of the present invention.
Drawings
The figures are not intended to be drawn to scale with true references. In the drawings, each identical or nearly identical component that is illustrated in various figures may be represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. Embodiments of various aspects of the present invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a streaming data-oriented machine learning model online service framework according to the present invention;
FIG. 2 is a schematic diagram of the present invention implementing load balancing of the gRPC servers through the external API gateway;
FIG. 3 is a diagram illustrating the preprocessing and caching of streaming request data by a gRPC server according to the present invention;
FIG. 4 is a schematic diagram of a distributed model prediction service subscribing to streaming request data in accordance with the present invention;
FIG. 5 is a diagram illustrating a distributed model prediction service writing a prediction result to a message queue in accordance with the present invention;
FIG. 6 is a diagram illustrating the predicted results of the present invention being pushed to an application client;
FIG. 7 is a flow chart of a streaming data-oriented machine learning model online service deployment method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention without any inventive step, are within the scope of protection of the invention. Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs.
The use of "first," "second," and similar terms in the description and claims of the present application do not denote any order, quantity, or importance, but rather the terms are used to distinguish one element from another. Similarly, the singular forms "a," "an," or "the" do not denote a limitation of quantity, but rather denote the presence of at least one, unless the context clearly dictates otherwise. The terms "comprises," "comprising," or the like, mean that the elements or items listed before "comprises" or "comprising" encompass the features, integers, steps, operations, elements, and/or components listed after "comprising" or "comprising," and do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Existing model online service deployment methods mostly process data in batch mode. Although data can be processed in batches, on the one hand most of the processed data is bounded and unbounded data cannot be handled; on the other hand the processing efficiency is not high, processing is delayed, and real-time prediction is impossible. A stream processing approach can effectively remedy these shortcomings of batch processing. Based on the characteristics of streaming data, the invention provides a streaming data-oriented machine learning model online service deployment method and system, which adopt a distributed architecture to decouple the model services from the WEB services and make them mutually independent, develop a real-time processing channel with streaming technology, realize real-time machine learning prediction on streaming data, provide more real-time data analysis capability, and achieve high availability of the services.
The following describes the streaming data-oriented machine learning model online service deployment method and system disclosed in the present invention in further detail with reference to the specific embodiments shown in the drawings.
The embodiment shown in fig. 7 discloses a flow of a streaming data-oriented machine learning model online service deployment method, and the flow specifically includes the following steps:
step S102, constructing a machine learning model online service framework facing to streaming data, wherein the framework comprises a unified model online service external API interface, a streaming data real-time processing channel and a distributed model prediction service; the flow data real-time processing channel comprises a gPC service cluster and a message queue service, and the distributed model prediction service comprises a plurality of machine learning models with a model prediction function; the gPRC service cluster comprises a plurality of gPRC service end nodes;
in implementation, the online service framework realizes continuous transmission and real-time machine learning analysis of streaming data. Specifically, as shown in fig. 1, the external API interface of the unified model online service is provided by the WEB API gateway, which can reduce the interaction complexity between the application client and the service end, and can perform a tangent plane task in a unified manner and balance the load of the service end. The gPC service cluster is characterized in that a plurality of gPC service end nodes are integrated together to provide the same service, so that the overall computing capacity of the service can be improved, and the application client looks like only one service node; the gPC server node and the application client are in long connection capable of realizing two-way communication, and are used for receiving a streaming data request, caching the streaming data request to a message queue service, monitoring the message queue service, acquiring a prediction result, and then asynchronously pushing the prediction result to the application client in real time; the message queue service is used for caching streaming request data and analysis results, realizing asynchronous processing of requests and peak clipping processing of data in a peak period, and comprises a request message queue and a reply message queue, wherein the request message queue is used for caching the streaming request data, and the reply message queue is used for caching prediction results. The online service framework realizes load balance and horizontal expansion of prediction requests among a plurality of machine learning models based on message queue service, and further has the characteristics of low delay, expandability and high throughput.
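As a concrete illustration of the framework's streaming entry point, the following minimal Python sketch shows what a bidirectional-streaming gRPC servicer of this kind could look like. It assumes the grpcio package and stubs generated from a hypothetical predict.proto (service PredictionService with rpc StreamPredict(stream PredictRequest) returns (stream PredictReply)); the in-process queue stands in for the message queue service, and the echoed payload stands in for a real model prediction.

# Minimal sketch of a bidirectional-streaming servicer; predict_pb2 and
# predict_pb2_grpc are hypothetical modules generated from predict.proto.
import queue
import threading
from concurrent import futures

import grpc
import predict_pb2
import predict_pb2_grpc

class PredictionService(predict_pb2_grpc.PredictionServiceServicer):
    def StreamPredict(self, request_iterator, context):
        replies = queue.Queue()  # stands in for the per-client reply message queue

        def consume():
            # Cache each incoming request; a real server node would write it
            # to the request message queue instead of answering inline.
            for req in request_iterator:
                replies.put(predict_pb2.PredictReply(payload=req.payload))
            replies.put(None)  # the client has finished sending

        threading.Thread(target=consume, daemon=True).start()
        while (reply := replies.get()) is not None:
            yield reply  # push results back over the same stream

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=16))
    predict_pb2_grpc.add_PredictionServiceServicer_to_server(PredictionService(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()

Because requests are consumed on one thread while replies are yielded on another, the client never has to block on one prediction before sending the next request, which is the asynchronous behavior the framework aims for.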
Step S104, receiving connection requests from all application clients, and establishing a bidirectional communication connection between each application client and a gRPC server node in the gRPC service cluster using gRPC bidirectional streaming;
in the balanced load diagram shown in fig. 2, an application client, i.e., a gRPC client, is a client that implements a gRPC protocol in this scheme, and the application client and a gRPC server node communicate via the gRPC protocol; when connection is established, the gPC client initiates a connection request to each gPC server node in the gPC service cluster. Specifically, all gRPC service end nodes in the gRPC service cluster register service information in advance in the WEB API gateway, and then, when a gRPC client initiates a gRPC request, the following procedure is implemented, that is: receiving a gPC request of a gPC client and judging a currently available gPC service end node; the method comprises the steps that a connectable gPC service end node is selected according to a preset load balancing strategy, and node information is sent to a gPC client, so that the gPC client establishes bidirectional communication connection with the gPC service end node according to the node information; a unique client ID is distributed to a gPC client which is in two-way communication connection with a gPC server node, so that the gPC server node receives streaming request data sent by the gPC client corresponding to the client ID, and interaction between the gPC server node and the gPC client is realized; as shown in fig. 2, the gRPC server node caches information of the client until the connection is disconnected during bidirectional communication.
The method comprises the steps that the currently available gPC service end node is obtained by accessing a WEB API gateway, specifically, service information of the gPC service end node is registered when the gPC service end node is started, the service information comprises IP and ports, the service information reaches the WEB API gateway, the WEB API gateway sends health check heartbeats of all the gPC service end nodes at regular time, and the gPC client side judges whether the service end node is available according to the returned states of the gPC service end nodes. In addition, different load balancing strategies differ in the way the gRPC server end node is selected. For example, under a polling policy, the WEB API gateway will cyclically distribute received gRPC requests for connection to each gRPC service end node in the gRPC service cluster; under a random strategy, the WEB API gateway randomly selects a gPC service end node from a gPC service cluster list according to a received gPC request for connection, and then forwards the connection request to the gPC service end node.
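The two selection policies named above can be pictured with the following Python sketch. It assumes the gateway keeps a simple in-memory registry of (IP, port) pairs that server nodes submit at startup, and that a heartbeat loop (not shown) keeps the healthy set current; this is an illustrative model, not the patent's implementation.

# Sketch of the round-robin and random balancing policies, assuming an
# in-memory registry and an externally maintained set of healthy nodes.
import itertools
import random

class GatewayRegistry:
    def __init__(self):
        self.nodes = []       # [(ip, port)] registered gRPC server nodes
        self.healthy = set()  # nodes whose last heartbeat succeeded
        self._rr = None

    def register(self, ip, port):
        self.nodes.append((ip, port))
        self.healthy.add((ip, port))
        self._rr = itertools.cycle(self.nodes)  # rebuild the round-robin cycle

    def pick(self, policy="round_robin"):
        candidates = [n for n in self.nodes if n in self.healthy]
        if not candidates:
            raise RuntimeError("no available gRPC server node")
        if policy == "random":
            return random.choice(candidates)
        while True:  # round robin, skipping nodes that failed their heartbeat
            node = next(self._rr)
            if node in self.healthy:
                return node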
Step S106, each gRPC server node in the gRPC service cluster continuously receives streaming request data from its application clients and stores the streaming request data in the message queue service in order;
Streaming request data initiated by an application client is sent to the corresponding gRPC server node over the bidirectional communication connection established between them, and the gRPC server node further processes the received streaming request data. In this embodiment, as shown in fig. 3, the gRPC server node allocates a thread, or uses a coroutine, to handle each gRPC client's requests, receiving request data from the gRPC client and writing it in order into the corresponding request message queue. Specifically, the gRPC server node generates a request ID for each piece of streaming request data according to the order in which the streaming request data is received; the gRPC server node then sends each piece of streaming request data, with its request ID attached, in turn to the request message queue identified by the client ID. The streaming request data carries three kinds of information: the model input data, the machine learning model to apply for prediction, and the model prediction parameters. Within a bidirectional communication link formed by a gRPC client and a gRPC server node, only the model input data must be recorded in every piece of streaming request data; the machine learning model and model prediction parameters may be marked only in the first piece of streaming request data, determined by request order, with all subsequent streaming request data using the same machine learning model and the same model prediction parameters.
Optionally, when the request message queue is a single shared queue, the gRPC server node, after generating the request ID as above, sends each piece of streaming request data to the message queue with both the request ID and the client ID attached, so that the result of processing the request can later be fed back to the corresponding application client.
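A minimal sketch of this caching step, under the assumption that the message queue service is backed by Redis lists (the patent names no concrete broker) and that each incoming request is already a plain Python dict: request IDs count up in arrival order, and each message lands in the request queue keyed by the client ID.

# Sketch of step S106 on the server node, assuming Redis as the queue backend.
import itertools
import json

import redis

r = redis.Redis()

def enqueue_requests(client_id, request_iterator):
    req_ids = itertools.count(1)  # monotonically increasing per connection
    for req in request_iterator:
        message = {
            "request_id": next(req_ids),
            "client_id": client_id,  # needed only for the single-shared-queue variant
            "model_input": req["model_input"],
            "model_name": req.get("model_name"),
            "model_params": req.get("model_params"),
        }
        # Append to the per-client request message queue, preserving order.
        r.rpush(f"request:{client_id}", json.dumps(message))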
Step S108, monitoring the message queue service, selecting the corresponding machine learning model from the distributed model prediction service for model prediction when new streaming request data is received, and writing the prediction result into the message queue service;
The monitoring work is performed by the distributed model prediction service; specifically, all model prediction services in the distributed model prediction service subscribe to the data in the request message queues and reply message queues in a group subscription mode. For streaming request data, the specific process from monitoring the message queue service to writing the prediction result into the message queue service is as follows: monitor the request message queue and, when new streaming request data is received, segment the unbounded data set into data sets suitable for the selected machine learning model, based on different window modes chosen according to the application scenario; input the segmented data set into the machine learning model to obtain the prediction result. As shown in fig. 4, when a distributed model prediction service receives a piece of streaming request data it has subscribed to, it parses the content of the streaming request data and determines which machine learning model to use, how to obtain the model input data, and how to slice the unbounded data set (including the window mode and window size determined by the application scenario), thereby obtaining the machine learning model and its corresponding input data and performing model prediction.
After model prediction is completed and the prediction result is obtained, the prediction result is written into the reply message queue identified by the client ID, with the request ID of the corresponding streaming request data attached, so that prediction results can be reassembled if they arrive out of order. As shown in fig. 5, when processing streaming request data, the distributed model prediction service obtains the client ID and request ID carried in the data, and accordingly writes the prediction result into the reply message queue of the corresponding client ID.
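The window-then-predict step might look like the following Python sketch, continuing the Redis assumption: a count-based tumbling window of fixed size slices the unbounded request stream into finite batches (the patent also allows other window modes), predict_fn stands in for whichever machine learning model was selected, and each result is written to the client's reply queue tagged with its request ID.

# Sketch of step S108: count-based windowing over an unbounded stream,
# batch prediction, and result write-back, assuming Redis as the broker.
import json

import redis

r = redis.Redis()

def serve_predictions(client_id, predict_fn, window_size=8):
    window = []
    while True:
        _, raw = r.blpop(f"request:{client_id}")  # block until new request data
        window.append(json.loads(raw))
        if len(window) < window_size:
            continue  # window not full yet; keep accumulating
        # The full window is treated as an independent finite data set.
        results = predict_fn([m["model_input"] for m in window])
        for msg, result in zip(window, results):
            reply = {"request_id": msg["request_id"], "result": result}
            r.rpush(f"reply:{client_id}", json.dumps(reply))
        window.clear()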
Step S110, monitoring the message queue service, and pushing the prediction result in real time to the application client that issued the request, through the unified external API interface of the model online service, when a new prediction result is received.
The reply message queue is then monitored for prediction results, which are pushed in real time to the corresponding application client whenever a new prediction result is received. As shown in fig. 6, each gRPC server node monitors the reply message queues of the clients it handles, sorts the messages by request ID after receiving prediction results, and then pushes the prediction results to the application clients in order.
In this scheme, after the gRPC server node receives the streaming request data sent by the corresponding gRPC client, the unique client ID identifies the request data; after the model service finishes processing, the same client ID identifies the prediction result for that request, so that upon receiving a prediction result the gRPC server node can push it precisely to the corresponding gRPC client.
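A sketch of this push-back step on the gRPC server node, again assuming Redis: the node blocks on the client's reply queue, parks any result that arrives out of order, and forwards results over the established stream strictly in request-ID order. stream_send is a placeholder for yielding on the gRPC response stream.

# Sketch of step S110: restore request order and push over the live stream.
import json

import redis

r = redis.Redis()

def push_replies(client_id, stream_send):
    next_expected = 1
    pending = {}  # out-of-order results parked until their turn
    while True:
        _, raw = r.blpop(f"reply:{client_id}")  # block until a new prediction result
        reply = json.loads(raw)
        pending[reply["request_id"]] = reply
        while next_expected in pending:  # flush every result that is now in order
            stream_send(pending.pop(next_expected))
            next_expected += 1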
Compared with existing model online service deployment modes, the streaming data-oriented machine learning model online service deployment method disclosed in this embodiment achieves faster, real-time data processing by developing a real-time processing channel with streaming technology. In particular, for streaming data, the message queue service avoids repeated uploading of data and the resulting wasted traffic, fully realizing real-time machine learning on streaming data. The model deployment mode in the method decouples the model services from the WEB services and makes them mutually independent, and the service deployment in this scheme can be scaled horizontally, supporting larger data volumes, providing more real-time data analysis capability and achieving high availability of the services.
In an embodiment of the present invention, an electronic device is further provided, where the electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the computer program is executed by the processor, the streaming data-oriented machine learning model online service deployment method disclosed in the above embodiment is implemented.
The programs described above may run on a processor, or may be stored in memory, i.e., a computer readable medium. Computer readable media include volatile and non-volatile, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.
These computer programs may also be loaded onto a computer or other programmable data processing apparatus, causing a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow(s) and/or block diagram block(s); the corresponding method steps may be implemented by different modules.
In this embodiment, an apparatus or system is provided, which may be called a streaming data-oriented machine learning model online service deployment system, comprising: a construction module, used for constructing a streaming data-oriented machine learning model online service framework comprising a unified external API interface for the model online service, a streaming data real-time processing channel and a distributed model prediction service; the streaming data real-time processing channel comprises a gRPC service cluster and a message queue service; the distributed model prediction service comprises a plurality of machine learning models with model prediction functions; the gRPC service cluster comprises a plurality of gRPC server nodes; a first receiving module, used for receiving connection requests from all application clients and establishing a bidirectional communication connection between each application client and a gRPC server node in the gRPC service cluster using gRPC bidirectional streaming; a second receiving module, used by each gRPC server node in the gRPC service cluster to continuously receive streaming request data from its application clients and store it in the message queue service in order; a first monitoring module, used for monitoring the message queue service, selecting the corresponding machine learning model from the distributed model prediction service for model prediction when new streaming request data is received, and writing the prediction result into the message queue service; and a second monitoring module, used for monitoring the message queue service and pushing the prediction result in real time to the application client that issued the request, through the unified external API interface of the model online service, when a new prediction result is received.
The steps of the system for implementing the online service deployment method of the machine learning model for streaming data disclosed in the above embodiments have already been described, and are not described herein again.
For example, the unified external API interface of the model online service in the streaming data-oriented machine learning model online service framework built by the construction module is provided by a WEB API gateway; the gRPC server nodes in the gRPC service cluster of the streaming data real-time processing channel maintain long-lived connections with the application clients over which two-way communication is possible, and are used for receiving streaming data requests, caching them to the message queue service, monitoring the message queue service, acquiring prediction results, and then asynchronously pushing the prediction results to the application clients in real time; the message queue service of the streaming data real-time processing channel comprises a request message queue for caching streaming request data and a reply message queue for caching prediction results.
For another example, the execution units by which the first receiving module establishes the bidirectional communication connection between each application client and the gRPC service cluster include:
a receiving and judging unit, used for receiving a gRPC request initiated by a gRPC client and determining the currently available gRPC server nodes; the gRPC client is the application client, and each gRPC server node in the gRPC service cluster registers its service information with the WEB API gateway; a selection unit, used for selecting a connectable gRPC server node according to a preset load balancing strategy and sending the node information to the gRPC client, so that the gRPC client establishes a bidirectional communication connection with the gRPC server node according to the node information; and an allocation and interaction unit, used for allocating a unique client ID to the gRPC client connected in bidirectional communication with the gRPC server node, so that the gRPC server node receives the streaming request data sent by the gRPC client corresponding to that client ID, thereby realizing the interaction between the gRPC server node and the gRPC client.
For another example, the specific execution units by which the second receiving module receives streaming request data through the gRPC server nodes and stores it in the message queue service in order include: a generating unit, used by the gRPC server node to generate a request ID for each piece of streaming request data according to the order in which the streaming request data is received; a sending unit, configured to send each piece of streaming request data, with its request ID attached, in turn to the request message queue identified by the client ID; the streaming request data comprises the model input data, the machine learning model to apply for prediction, and the model prediction parameters.
For another example, the specific execution units by which the first monitoring module monitors the message queue service and finally writes the prediction result into the message queue service include: a first monitoring unit, used for monitoring the request message queue and, when new streaming request data is received, segmenting the unbounded data set into data sets suitable for the selected machine learning model, based on different window modes chosen according to the application scenario; a model prediction unit, used for inputting the segmented data set into the machine learning model to obtain the prediction result; and a writing unit, used for writing the prediction result into the reply message queue identified by the client ID, attaching to the prediction result the request ID of the corresponding streaming request data, so that prediction results can be reassembled if they arrive out of order.
For another example, the execution units by which the second monitoring module monitors the message queue service and finally pushes the prediction result in real time to the application client that issued the request include: a second monitoring unit, used for monitoring the reply message queue; and a pushing unit, used for pushing the prediction result in real time to the application client corresponding to the client ID identifying the reply message queue, whenever the reply message queue is observed to receive a new prediction result.
The method and system disclosed by the invention enable a streaming data-oriented machine learning model online service framework to expose an asynchronous WEB interface, receive, cache and process data in real time, and send analysis results, thereby avoiding needless blocking of the client. The deployment scheme utilizes a distributed service deployment architecture, effectively solving the high processing latency, low computing performance and other problems of existing machine learning model online service deployment methods, achieving near-real-time processing of streaming data with the ability to process data in large volumes.
Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention should be defined by the appended claims.

Claims (10)

1. A streaming data-oriented machine learning model online service deployment method, characterized by comprising the following steps:
constructing a streaming data-oriented machine learning model online service framework, wherein the framework comprises a unified external API interface for the model online service, a streaming data real-time processing channel and a distributed model prediction service; the streaming data real-time processing channel comprises a gRPC service cluster and a message queue service; the distributed model prediction service comprises a plurality of machine learning models with model prediction functions; the gRPC service cluster comprises a plurality of gRPC server nodes;
receiving connection requests from all application clients, and establishing a bidirectional communication connection between each application client and a gRPC server node in the gRPC service cluster using gRPC bidirectional streaming;
each gRPC server node in the gRPC service cluster continuously receiving streaming request data from its application clients and storing the streaming request data in the message queue service in order;
monitoring the message queue service, selecting the corresponding machine learning model from the distributed model prediction service to perform model prediction when new request data is received, and writing the prediction result into the message queue service;
and monitoring the message queue service, and pushing the prediction result in real time to the application client that issued the request, through the unified external API interface of the model online service, when a new prediction result is received.
2. The streaming data-oriented machine learning model online service deployment method according to claim 1, wherein the unified external API interface of the model online service is provided by a WEB API gateway;
the gRPC server nodes in the gRPC service cluster maintain long-lived connections with the application clients over which two-way communication is possible, and are used for receiving streaming data requests, caching them to the message queue service, monitoring the message queue service, acquiring prediction results, and then asynchronously pushing the prediction results to the application clients in real time;
the message queue service comprises a request message queue and a reply message queue, wherein the request message queue is used for caching streaming request data, and the reply message queue is used for caching prediction results.
3. The streaming data-oriented machine learning model online service deployment method according to claim 2, wherein the application client is a gRPC client, and each gRPC server node in the gRPC service cluster registers its service information with the WEB API gateway;
when a gRPC client initiates a gRPC request, the process of establishing a bidirectional communication connection and interacting with a gRPC server node is as follows:
receiving the gRPC request of the gRPC client and determining the currently available gRPC server nodes;
selecting a connectable gRPC server node according to a preset load balancing strategy, and sending the node information to the gRPC client, so that the gRPC client establishes a bidirectional communication connection with the gRPC server node according to the node information;
allocating a unique client ID to the gRPC client connected in bidirectional communication with the gRPC server node, so that the gRPC server node receives the streaming request data sent by the gRPC client corresponding to that client ID, thereby realizing the interaction between the gRPC server node and the gRPC client.
4. The streaming data-oriented machine learning model online service deployment method according to claim 3, wherein the specific process by which the gRPC server node receives the streaming request data and stores it in the message queue service in order is as follows:
the gRPC server node generates a request ID for each piece of streaming request data according to the order in which the streaming request data is received;
each piece of streaming request data, with its request ID attached, is sent in turn to the request message queue identified by the client ID; the streaming request data comprises the model input data, the machine learning model to apply for prediction, and the model prediction parameters.
5. The streaming data-oriented machine learning model online service deployment method according to claim 4, wherein the specific process from monitoring the message queue service to writing the prediction result into the message queue service is as follows:
monitoring the request message queue and, when new streaming request data is received, segmenting the unbounded data set into data sets suitable for the selected machine learning model, based on different window modes chosen according to the application scenario;
inputting the segmented data set into the machine learning model to obtain the prediction result;
and writing the prediction result into the reply message queue identified by the client ID, attaching to the prediction result the request ID of the corresponding streaming request data, so that prediction results can be reassembled if they arrive out of order.
6. A streaming data-oriented machine learning model online service deployment system, characterized by comprising:
a construction module, used for constructing a streaming data-oriented machine learning model online service framework, wherein the framework comprises a unified external API interface for the model online service, a streaming data real-time processing channel and a distributed model prediction service; the streaming data real-time processing channel comprises a gRPC service cluster and a message queue service; the distributed model prediction service comprises a plurality of machine learning models with model prediction functions; the gRPC service cluster comprises a plurality of gRPC server nodes;
a first receiving module, used for receiving connection requests from all application clients and establishing a bidirectional communication connection between each application client and a gRPC server node in the gRPC service cluster using gRPC bidirectional streaming;
a second receiving module, used by each gRPC server node in the gRPC service cluster to continuously receive streaming request data from its application clients and store the streaming request data in the message queue service in order;
a first monitoring module, used for monitoring the message queue service, selecting the corresponding machine learning model from the distributed model prediction service for model prediction when new request data is received, and writing the prediction result into the message queue service;
and a second monitoring module, used for monitoring the message queue service and pushing the prediction result in real time to the application client that issued the request, through the unified external API interface of the model online service, when a new prediction result is received.
7. The online service deployment system of a machine learning model for streaming data according to claim 6, wherein the unified external API interface of the machine learning model online service is provided by a WEB API gateway;
the gRPC server-side nodes in the gRPC service cluster maintain long-lived, bidirectional connections with the application clients, and are configured to receive streaming data requests, cache them in the message queue service, listen to the message queue service to obtain prediction results, and then asynchronously push the prediction results to the application clients in real time;
the message queue service comprises a request message queue and a reply message queue, wherein the request message queue is used for caching streaming request data and the reply message queue is used for caching prediction results.
8. The streaming-data-oriented machine learning model online service deployment system according to claim 7, wherein the execution units by which the first receiving module establishes the bidirectional communication connections between the application clients and the gRPC service cluster comprise:
a receiving and judging unit, configured to receive a gRPC request initiated by a gRPC client and determine the currently available gRPC server-side nodes, the gRPC server-side nodes having registered their service information with the WEB API gateway;
a selection unit, configured to select a connectable gRPC server-side node according to a preset load balancing strategy and send the node information to the gRPC client, so that the gRPC client establishes a bidirectional communication connection with that gRPC server-side node according to the node information;
and an allocation and interaction unit, configured to allocate a unique client ID to each gRPC client that has established a bidirectional communication connection with a gRPC server-side node, so that the gRPC server-side node receives the streaming request data sent by the gRPC client corresponding to that client ID, thereby realizing interaction between the gRPC server-side node and the gRPC client.
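As a non-limiting sketch of the receiving/judging, selection, and allocation units in claim 8, the gateway below keeps a registry of node addresses and applies round-robin as one possible preset load balancing strategy; all class names and addresses are illustrative.

```python
import itertools
import uuid

class WebApiGateway:
    """Minimal registry-plus-balancer for gRPC server-side nodes."""

    def __init__(self) -> None:
        self._nodes: list[str] = []   # service info registered by each node
        self._round_robin = None

    def register(self, address: str) -> None:
        """Receiving unit: a gRPC server-side node registers its service info."""
        self._nodes.append(address)
        self._round_robin = itertools.cycle(self._nodes)

    def assign(self) -> tuple[str, str]:
        """Selection + allocation units: pick a connectable node and mint a
        unique client ID for the requesting gRPC client."""
        if not self._nodes:
            raise RuntimeError("no gRPC server-side node currently available")
        node = next(self._round_robin)
        client_id = str(uuid.uuid4())
        return node, client_id

gateway = WebApiGateway()
gateway.register("10.0.0.1:50051")   # illustrative node addresses
gateway.register("10.0.0.2:50051")
node, client_id = gateway.assign()   # returned to the gRPC client
```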
9. The streaming-data-oriented machine learning model online service deployment system according to claim 8, wherein the specific execution units by which the second receiving module receives streaming request data through the gRPC server-side nodes and sequentially stores it in the message queue service comprise:
a generating unit, configured for the gRPC server-side node to generate a request ID for each item of streaming request data according to the time order in which the data is received;
and a sending unit, configured to send each item of streaming request data, with its request ID attached, in sequence to the request message queue identified by the client ID; the streaming request data comprises the model input data, the machine learning model to be applied for prediction, and the model prediction parameters.
10. The streaming-data-oriented machine learning model online service deployment system according to claim 9, wherein the specific execution units by which the first listening module listens to the message queue service and finally writes the prediction result into the message queue service comprise:
a first listening unit, configured to listen to the request message queue and, when new streaming request data is received, segment the unbounded data set, based on a windowing mode chosen for the application scenario, into data sets that the selected machine learning model can process;
a model prediction unit, configured to input each segmented data set into the machine learning model to obtain a prediction result;
and a writing unit, configured to write the prediction result into the reply message queue identified by the client ID, attaching to the result the request IDs of the corresponding streaming request data, so that results can be reassembled if they arrive out of order.
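Finally, the reassembly that the request IDs make possible (claims 5 and 10) could look like this purely illustrative client-side helper, which buffers out-of-order replies and releases them in request-ID order; it assumes IDs start at 0 and are consecutive, as in the counter sketched under claim 4.

```python
import heapq
from typing import Iterable, Iterator, Tuple

def reorder(replies: Iterable[Tuple[int, object]]) -> Iterator[Tuple[int, object]]:
    """Yield (request_id, prediction) pairs in request-ID order, given
    replies that may arrive out of order."""
    pending: list = []
    expected = 0
    for item in replies:
        heapq.heappush(pending, item)
        # Release every buffered reply whose ID is next in sequence.
        while pending and pending[0][0] == expected:
            yield heapq.heappop(pending)
            expected += 1

out_of_order = [(2, "c"), (0, "a"), (1, "b")]
assert list(reorder(out_of_order)) == [(0, "a"), (1, "b"), (2, "c")]
```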
CN202310015610.8A 2023-01-06 2023-01-06 Online service deployment method and system of machine learning model for streaming data Active CN115756875B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310015610.8A CN115756875B (en) 2023-01-06 2023-01-06 Online service deployment method and system of machine learning model for streaming data


Publications (2)

Publication Number Publication Date
CN115756875A 2023-03-07
CN115756875B CN115756875B (en) 2023-05-05

Family

ID=85348261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310015610.8A Active CN115756875B (en) 2023-01-06 2023-01-06 Online service deployment method and system of machine learning model for streaming data

Country Status (1)

Country Link
CN (1) CN115756875B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111383659A (en) * 2018-12-28 2020-07-07 广州市百果园网络科技有限公司 Distributed voice monitoring method, device, system, storage medium and equipment
CN110300050A (en) * 2019-05-23 2019-10-01 中国平安人寿保险股份有限公司 Information push method, device, computer equipment and storage medium
CN111200606A (en) * 2019-12-31 2020-05-26 深圳市优必选科技股份有限公司 Deep learning model task processing method, system, server and storage medium
US20210281662A1 (en) * 2020-03-04 2021-09-09 Hewlett Packard Enterprise Development Lp Multiple model injection for a deployment cluster
CN111787066A (en) * 2020-06-06 2020-10-16 王科特 Internet of things data platform based on big data and AI
CN111752795A (en) * 2020-06-18 2020-10-09 多加网络科技(北京)有限公司 Full-process monitoring alarm platform and method thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
VANI PERUMAL: "Face Recognition in Video Streams and its Application in Freedom Fighters Discovery - A Machine Learning Approach" *
吴成伟: "Research on Road Traffic Congestion Prediction Based on Big Data" *
姜红玉; 汪朋; 封雷: "Research on a Real-time User Profiling System Based on Stream Computing" *

Also Published As

Publication number Publication date
CN115756875B (en) 2023-05-05

Similar Documents

Publication Publication Date Title
US11075982B2 (en) Scaling hosts in distributed event handling systems
CN111277848B (en) Method and device for processing interactive messages in live broadcast room, electronic equipment and storage medium
CN106131213A (en) A kind of service management and system
CN113010818A (en) Access current limiting method and device, electronic equipment and storage medium
WO2019206100A1 (en) Feature engineering programming method and apparatus
WO2021244473A1 (en) Frequency control method and apparatus
Srirama et al. Croudstag: social group formation with facial recognition and mobile cloud services
CN108471385B (en) Flow control method and device for distributed system
CN110866040A (en) User portrait generation method, device and system
Panchali Edge computing-background and overview
CN113127732A (en) Method and device for acquiring service data, computer equipment and storage medium
CN113285884A (en) Flow control method and system
CN113807926A (en) Recommendation information generation method and device, electronic equipment and computer readable medium
CN111177237B (en) Data processing system, method and device
CN111782473A (en) Distributed log data processing method, device and system
CN115756875B (en) Online service deployment method and system of machine learning model for streaming data
CN115665173B (en) MQ-based Websocket communication method, system and storage medium
US11783075B2 (en) Systems and methods of providing access to secure data
CN111737297B (en) Method and device for processing link aggregation call information
CN115914375A (en) Disaster tolerance processing method and device for distributed message platform
CN115187364A (en) Method and device for monitoring deposit risk under bank distributed scene
CN114490718A (en) Data output method, data output device, electronic equipment and computer readable medium
CN113407491A (en) Data processing method and device
CN112163176A (en) Data storage method and device, electronic equipment and computer readable medium
CN103856359A (en) Method and system for obtaining information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant