CN110728372A - Cluster design method and cluster architecture for dynamic loading of artificial intelligence model - Google Patents

Cluster design method and cluster architecture for dynamic loading of artificial intelligence model

Info

Publication number
CN110728372A
CN110728372A (application CN201910921147.7A)
Authority
CN
China
Prior art keywords
model
server
service
cluster
deployment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910921147.7A
Other languages
Chinese (zh)
Other versions
CN110728372B (en)
Inventor
顾嘉晟
李瀚清
王江
曾彦能
陈运文
纪达麒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Daerguan Information Technology (Shanghai) Co Ltd
Original Assignee
Daerguan Information Technology (Shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Daerguan Information Technology (Shanghai) Co Ltd
Priority to CN201910921147.7A
Publication of CN110728372A
Application granted
Publication of CN110728372B
Active legal status
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44: Arrangements for executing specific programs
    • G06F9/445: Program loading or initiating
    • G06F9/44521: Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a cluster design method for dynamically loading artificial intelligence models, in the field of artificial intelligence. The method designs a model discovery and service discovery mechanism, and achieves automatic deployment of neural network models in a service cluster through a server deployment architecture that loads models automatically in a distributed fashion. This deployment architecture provides fully automated, highly concurrent, and highly available deployment of neural network models with automatic resource allocation, so that server resources are used to the fullest and the waste of compute and memory resources is greatly reduced.

Description

Cluster design method and cluster architecture for dynamic loading of artificial intelligence model
Technical Field
The invention relates to the field of artificial intelligence, in particular to an architectural design pattern for service clusters serving large-scale neural networks.
Background
Deep learning neural network algorithms are currently the mainstream in the field of artificial intelligence. Because of their deep, complex model structures and the nature of back-propagation training, the most advanced models in academia, such as BERT or GPT-2 in natural language processing, contain more than one billion parameters, yielding model files of several gigabytes.
In practical engineering, if a model is fine-tuned online each time, or reloaded from disk into memory for every prediction, a large amount of time is consumed and the online user experience suffers badly. Yet if all models are kept resident in server memory, memory overflows occur once the number of models grows too large. For example, suppose a cluster of servers with 16 GB of memory each must serve 20 deep learning models ranging from 1 GB to 4 GB in size. Deploying one machine per model in the conventional way wastes a great deal of computing resources and drives up server cost. Distributing the models randomly across machines to fill memory brings its own problems: models differ in popularity, most customers access only a small subset of them, and if the few heavily used models land on the same machine, its GPU or CPU becomes congested. Moreover, once the service is online, if the administrator finds that resources cannot meet user demand and wants to add machines, the usual remedy is to restart the whole service and reload models at random, which severely hurts the availability of the service.
Disclosure of Invention
The invention aims to overcome these defects of existing platform architectures. By designing a brand-new model and service discovery mechanism, it provides a novel server deployment architecture that loads models automatically in a distributed fashion. This deployment architecture achieves fully automated, highly concurrent, highly available deployment of neural network models in a service cluster with automatic resource allocation, so that server resources are used to the fullest and the waste of compute and memory resources is greatly reduced.
To this end, the technical scheme provided by the invention is as follows:
a cluster design method for dynamic loading of an artificial intelligence model is characterized in that the method designs a model and a service discovery mechanism, and realizes automatic deployment of a neural network model in a service cluster through server deployment architecture design of a distributed automatic loading model, and the method comprises the following processing steps:
the model discovery service is provided with a first dictionary, the first dictionary comprises the corresponding relation between a current server and a deployed model, when a user clicks and deploys a certain new model by one key on a foreground page, the model discovery service receives an instruction, and automatically deploys the new model on a low-pressure server by calculating the server pressure condition and the memory occupation condition of the server in a historical period;
the machine discovery service is used for storing a second dictionary, wherein the second dictionary comprises the current online server state, the state of each server in the service cluster is checked in a traversing mode every 10 seconds, the second dictionary is updated, and when a new server is found to be added into the current service cluster, the highest frequency usage model is obtained and loaded into a low-pressure server under the condition that the memory is not exceeded;
designing a service health check mechanism, wherein the service monitoring check mechanism checks the state of the machine discovery service every 10 seconds, and if the machine discovery service is down or a task is stuck, sending alarm information or automatically restarting the service is selected according to the actual condition;
during service health check, the pressure condition of each field in each server is analyzed simultaneously, if the pressure of individual servers in a service cluster is too high due to a large number of backlog tasks of a certain field, an idle server is automatically searched for loading the model, the model is unloaded on a high-pressure server, and the task is returned to a message queue for redistribution;
the message queues are used as a prediction task propagation medium, the message queues are used as a transmission medium for distributing tasks, prediction tasks of different models are separately stored in different queues, and when the prediction task queues of a certain model are too congested, the model discovery service automatically selects an idle server to load the model.
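The model discovery step above can be sketched as follows. This is a minimal illustration of the "first dictionary" and the lightly-loaded-server selection, not the patented implementation; all names (`ServerStats`, `deploy_model`) and the load metric are assumptions.

```python
# Hypothetical sketch of the model discovery service's first dictionary
# and its deployment decision: pick the least-loaded server that still
# has enough free memory for the new model.
from dataclasses import dataclass, field

@dataclass
class ServerStats:
    mem_total_gb: float
    mem_used_gb: float
    avg_load: float  # average load over a historical window

@dataclass
class ModelDiscoveryService:
    # first dictionary: server id -> list of deployed model names
    deployed: dict[str, list[str]] = field(default_factory=dict)
    stats: dict[str, ServerStats] = field(default_factory=dict)

    def deploy_model(self, model_name: str, model_size_gb: float) -> str:
        """Deploy on the least-loaded server with enough free memory."""
        candidates = [
            (s.avg_load, sid)
            for sid, s in self.stats.items()
            if s.mem_total_gb - s.mem_used_gb >= model_size_gb
        ]
        if not candidates:
            raise RuntimeError("no server has enough free memory")
        _, sid = min(candidates)
        self.deployed.setdefault(sid, []).append(model_name)
        self.stats[sid].mem_used_gb += model_size_gb
        return sid
```

A one-click deployment from the foreground page would then reduce to a single `deploy_model` call against this service.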
A cluster architecture for dynamically loading artificial intelligence models comprises a user input port, a preprocessing server, a message queue server, a hard disk, a service cluster and a Redis database. Several neural network models are stored on the hard disk; the service cluster comprises several model deployment servers, into which the neural network models are loaded to perform artificial intelligence computation. A user uploads or inputs data through the user input port; the preprocessing server converts it into code the machines can recognize and process, the message queue server distributes it to the different model deployment servers, the corresponding models process it, and the results are output and stored in the Redis database. The architecture is characterized in that it further comprises a model/machine discovery mechanism module, which resides on a single server in the service cluster in the form of a primary service and a standby service, and which comprises a model discovery service submodule, a machine discovery service submodule, a service health check mechanism submodule and a model unloading submodule:
the model discovery service submodule is deployed independently in the service cluster and internally exists as a primary service and a standby service; it holds a first dictionary recording the correspondence between current servers and the models deployed on them; when a user deploys a new model with one click on the foreground page, the submodule receives the instruction and automatically deploys the new model on a lightly loaded server, chosen by evaluating each server's load and memory occupancy over a recent historical period;
the machine discovery service submodule is deployed on the same server as the model discovery service submodule and stores a second dictionary recording the state of the servers currently online; every 10 seconds it traverses and checks the state of each model deployment server in the service cluster and updates the second dictionary, and when a new model deployment server joins the cluster, it loads the most frequently used models onto that lightly loaded server, without exceeding its memory;
the service health check mechanism submodule checks the state of the machine discovery service submodule every 10 seconds and, if that submodule is down or a task is stuck, either sends an alarm or automatically restarts the service, depending on the actual situation; during the health check, the load produced by each field on each model deployment server is analyzed, and if a backlog of tasks for some field drives the load on an individual model deployment server too high, the model unloading submodule automatically finds an idle model deployment server to load that model, unloads it from the overloaded server, and returns the pending tasks to the message queue for redistribution;
message queues serve as the transmission medium for prediction tasks: tasks for different models are stored in separate queues, and when the prediction queue of some model becomes too congested, the model discovery service submodule automatically selects an idle model deployment server to load that model.
In practical application, the cluster design method and cluster architecture for dynamic loading of artificial intelligence models disclosed by the invention yield the following technical effects:
1. One-click deployment of online neural network models, reducing human intervention and the occurrence of human error.
2. Once the service is online, the load of each individual server is detected automatically, and the model discovery mechanism automatically assigns idle servers to load heavily used models.
3. When a new server joins the cluster, it is discovered automatically, and models are loaded onto it dynamically according to current or historical usage frequency.
4. High concurrency and high availability of the neural network service.
Drawings
FIG. 1 is a schematic diagram of a cluster architecture for dynamic loading of an artificial intelligence model according to the present invention.
FIG. 2 is a flow chart of a cluster design method for dynamic loading of an artificial intelligence model according to the present invention.
Detailed Description
In the following, the cluster design method and cluster architecture for dynamic loading of an artificial intelligence model are described in further detail with reference to the drawings and specific embodiments, so that their structural composition and workflow can be understood clearly; this is not intended to limit the scope of the invention.
The invention first describes a cluster design method for dynamically loading an artificial intelligence model: the method designs a model discovery and service discovery mechanism and achieves automatic deployment of neural network models in a service cluster through the server deployment architecture of a distributed automatic model loader.
Concretely, the cluster design method of the invention comprises the following processing steps.
The model discovery service maintains a first dictionary recording the correspondence between current servers and the models deployed on them. When a user deploys a new model with one click on the foreground page, the model discovery service receives the instruction and automatically deploys the new model on a lightly loaded server, chosen by evaluating each server's load and memory occupancy over a recent historical period.
The machine discovery service stores a second dictionary recording the state of the servers currently online. Every 10 seconds it traverses and checks the state of each server in the service cluster and updates the second dictionary; when it finds a new server joining the cluster, it loads the most frequently used models onto that lightly loaded server, without exceeding its memory.
A service health check mechanism is designed which checks the state of the machine discovery service every 10 seconds. If the machine discovery service is down or a task is stuck, it either sends an alarm or automatically restarts the service, depending on the actual situation.
During the health check, the load produced by each field on each server is analyzed at the same time. If a backlog of tasks for some field drives the load on individual servers too high, an idle server is automatically found to load that model, the model is unloaded from the overloaded server, and the pending tasks are returned to the message queue for redistribution.
Message queues serve as the transmission medium for prediction tasks: tasks for different models are stored in separate queues, and when the prediction queue of some model becomes too congested, the model discovery service automatically selects an idle server to load that model.
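One pass of the machine discovery service described above can be sketched as follows. The `Cluster` class is a toy stand-in, and the greedy most-frequent-first loading is an assumption; the patent only states that the pass runs every 10 seconds and that newly joined servers receive the most frequently used models within their memory limit.

```python
# Hypothetical sketch of one 10-second pass of the machine discovery
# service: refresh the "second dictionary" of online servers, then load
# the most frequently used models onto any newly joined server while
# its memory allows.

class Cluster:
    """Toy stand-in for the real service cluster."""
    def __init__(self, servers, model_sizes):
        self.servers = servers          # server id -> free memory in GB
        self.model_sizes = model_sizes  # model name -> size in GB
        self.placement = {}             # server id -> list of models

    def server_ids(self):
        return list(self.servers)

    def probe(self, sid):
        return {"free_memory_gb": self.servers[sid]}

    def load(self, sid, model):
        self.placement.setdefault(sid, []).append(model)
        self.servers[sid] -= self.model_sizes[model]

def discovery_tick(cluster, second_dict, usage_counts):
    """One pass: refresh server states, populate newly joined servers."""
    online = {sid: cluster.probe(sid) for sid in cluster.server_ids()}
    new_servers = [sid for sid in online if sid not in second_dict]
    second_dict.clear()
    second_dict.update(online)
    for sid in new_servers:
        # most frequently used models first, while memory allows
        for model in sorted(usage_counts, key=usage_counts.get, reverse=True):
            if cluster.model_sizes[model] <= cluster.servers[sid]:
                cluster.load(sid, model)
```

In the architecture itself this pass would be wrapped in a loop sleeping 10 seconds between iterations.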
As shown in FIG. 1, the present invention also relates to a cluster architecture for dynamic loading of artificial intelligence models. In general, the cluster architecture of the invention comprises a user input port, a preprocessing server, a message queue server, a hard disk, a service cluster and a Redis database. Several neural network models are stored on the hard disk; the service cluster comprises several model deployment servers, into which the neural network models are loaded to perform artificial intelligence computation. A user uploads or inputs data through the user input port; the preprocessing server converts it into code the machines can recognize and process, the message queue server distributes it to the different model deployment servers, the corresponding models process it, and the results are output and stored in the Redis database. The innovation of the present application is that the cluster architecture further includes a model/machine discovery mechanism module, which resides on a separate server in the service cluster in the form of a primary service and a standby service, and which includes a model discovery service submodule, a machine discovery service submodule, a service health check mechanism submodule and a model unloading submodule.
The model discovery service submodule is deployed independently in the service cluster and internally exists as a primary service and a standby service. It holds a first dictionary recording the correspondence between current servers and the models deployed on them. When a user deploys a new model with one click on the foreground page, the submodule receives the instruction and automatically deploys the new model on a lightly loaded server, chosen by evaluating each server's load and memory occupancy over a recent historical period.
The machine discovery service submodule is deployed on the same server as the model discovery service submodule and stores a second dictionary recording the state of the servers currently online. Every 10 seconds it traverses and checks the state of each model deployment server in the service cluster and updates the second dictionary; when a new model deployment server joins the cluster, it loads the most frequently used models onto that lightly loaded server, without exceeding its memory.
The service health check mechanism submodule checks the state of the machine discovery service submodule every 10 seconds and, if that submodule is down or a task is stuck, either sends an alarm or automatically restarts the service, depending on the actual situation. During the health check, the load produced by each field on each model deployment server is analyzed; if a backlog of tasks for some field drives the load on a model deployment server too high, the model unloading submodule automatically finds an idle model deployment server to load that model, unloads it from the overloaded server, and returns the pending tasks to the message queue for redistribution.
The invention also uses message queues as the transmission medium for prediction tasks: tasks for different models are stored in separate queues, and when the prediction queue of some model becomes too congested, the model discovery service submodule automatically selects an idle model deployment server to load that model.
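The per-model queue routing and the congestion check described above can be sketched as follows. The congestion threshold and the class names are assumptions; the patent does not specify how "too congested" is measured.

```python
# Hypothetical sketch of per-model task queues with a congestion check:
# each model gets its own queue, and models whose queue length exceeds
# a (assumed) threshold are flagged so the discovery service can load
# another replica on an idle server.
from collections import deque

class QueueRouter:
    def __init__(self, congestion_threshold=100):
        self.queues = {}  # model name -> deque of pending tasks
        self.threshold = congestion_threshold

    def submit(self, model, task):
        self.queues.setdefault(model, deque()).append(task)

    def congested_models(self):
        """Models whose backlog exceeds the threshold."""
        return [m for m, q in self.queues.items() if len(q) > self.threshold]
```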
The composition of the cluster architecture and the dynamic model loading method implemented on it are described in detail below with a specific embodiment.
Example 1
This embodiment is a text mining task based on the above method and system. The requirement is as follows: a user uploads a batch of contract files and wants fields such as Party A information, Party B information, contract effective date, and specific contract terms to be extracted. Each field in the text corresponds to one deep learning neural network model. Assume the task has 40 fields, i.e. 40 neural network models. On the hardware side, 4 servers are required. The processing flow is shown in FIG. 2:
Step 1: the user clicks "model online" on the page. The model/machine discovery mechanism automatically detects the total number of models and the number of servers, and performs initial loading by computing memory occupancy. For example, models 1-10 are loaded into server A, models 11-20 into server B, models 21-30 into server C, and models 31-40 into server D.
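The initial loading in step 1 can be sketched as an even split of the model list across servers. This even-chunk strategy is an assumption matching the example (models 1-10 to A, 11-20 to B, and so on); the patent only says initial loading is computed from memory occupancy.

```python
# Hypothetical sketch of step 1's initial placement: divide the model
# list into contiguous chunks, one chunk per server, as in the example.
def initial_placement(models, servers):
    """Split the model list into len(servers) contiguous chunks."""
    per_server = len(models) // len(servers)
    placement = {}
    for i, server in enumerate(servers):
        placement[server] = models[i * per_server : (i + 1) * per_server]
    return placement
```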
Step 2: after the user uploads a batch of files, the preprocessing server performs a series of text processing steps to clean the text into a structure suitable for model input. The text is split into small tasks of roughly 5000 characters each and fed into the message queues in order, one queue per field. At this point there should be 40 queues in the message queue service, one per field, and every queue mines all documents uploaded by the user.
Step 3: each model deployment server pulls its tasks from the corresponding message queue and begins executing the mining task.
Step 4: the administrator notices that the load on the 4 servers is too high and decides to add 4 more. Within 10 seconds the machine discovery service discovers the 4 machines newly added to the cluster and, by analyzing the current total number of tasks per field, decides to load the field models with the most remaining tasks onto the new machines.
Step 5: suppose model 10 predicts more slowly than the other fields and machine A is overloaded. When the service health check mechanism detects this, it loads model 10 on a machine that does not yet hold it, which then pulls model-10 tasks from the message queue. At the same time, model 10 on machine A is unloaded; if a task is in progress, it is automatically returned to the message queue for other servers to consume.
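The rebalancing in steps 4-5 can be sketched as one operation: move the most backlogged field model from an overloaded server to an idle one, returning in-flight tasks to that model's queue. All structures and names here are hypothetical stand-ins for illustration.

```python
# Hypothetical sketch of steps 4-5: relocate the model with the largest
# task backlog from an overloaded server to an idle one, and requeue any
# in-flight tasks so other servers can consume them.
def rebalance(backlogs, overloaded, idle, placement, queues, in_flight):
    """Move the model with the largest backlog from `overloaded` to `idle`."""
    model = max(backlogs, key=backlogs.get)
    # load a replica on the idle server
    placement.setdefault(idle, []).append(model)
    # unload from the overloaded server
    if model in placement.get(overloaded, []):
        placement[overloaded].remove(model)
    # return in-flight tasks to the model's queue for redistribution
    queues.setdefault(model, []).extend(in_flight.pop((overloaded, model), []))
    return model
```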
The above is a specific application case of the dynamically loading cluster architecture of this patent, illustrating the practical value of the new system architecture and implementation. In summary, the scope of the invention also covers other modifications and alternatives apparent to those skilled in the art.

Claims (2)

1. A cluster design method for dynamic loading of artificial intelligence models, characterized in that the method designs a model discovery and service discovery mechanism and, through the server deployment architecture of a distributed automatic model loader, achieves automatic deployment of neural network models in a service cluster, the method comprising the following processing steps:
the model discovery service maintains a first dictionary recording the correspondence between current servers and the models deployed on them; when a user deploys a new model with one click on the foreground page, the model discovery service receives the instruction and automatically deploys the new model on a lightly loaded server, chosen by evaluating each server's load and memory occupancy over a recent historical period;
the machine discovery service stores a second dictionary recording the state of the servers currently online; every 10 seconds it traverses and checks the state of each server in the service cluster and updates the second dictionary, and when it finds a new server joining the cluster, it loads the most frequently used models onto that lightly loaded server, without exceeding its memory;
a service health check mechanism is designed which checks the state of the machine discovery service every 10 seconds; if the machine discovery service is down or a task is stuck, it either sends an alarm or automatically restarts the service, depending on the actual situation;
during the health check, the load produced by each field on each server is analyzed at the same time; if a backlog of tasks for some field drives the load on individual servers too high, an idle server is automatically found to load that model, the model is unloaded from the overloaded server, and the pending tasks are returned to the message queue for redistribution;
message queues serve as the transmission medium for prediction tasks: tasks for different models are stored in separate queues, and when the prediction queue of some model becomes too congested, the model discovery service automatically selects an idle server to load that model.
2. A cluster structure for dynamically loading artificial intelligence models comprises a user input port, a preprocessing server, a message queue server, a hard disk, a service cluster and a Redis database, wherein a plurality of neural network models are stored in the hard disk, the service cluster comprises a plurality of model deployment servers, the neural network models are loaded in the model deployment servers in the service cluster to carry out artificial intelligence operation, a user uploads or inputs signals through the user input port, the preprocessing server forwards codes which can be identified and processed by a machine, the codes are distributed to different model deployment servers through the message queue server, the corresponding models in the model deployment servers are used for processing, and processing results are output and stored in the Redis database, the cluster structure is characterized by further comprising a model/machine discovery mechanism module, the model/machine discovery mechanism module is stored in a single server in a service cluster and exists in a master-standby service mode in the server, and comprises a model discovery service submodule, a machine discovery service submodule, a service health check mechanism submodule and a model unloading submodule:
the model discovery service sub-module is independently deployed in a service cluster, the interior of the model discovery service sub-module exists in a main service and standby service mode, a first dictionary is arranged in the model discovery service sub-module, the first dictionary comprises the corresponding relation between a current server and a deployed model, a user clicks one key on a foreground page to deploy a certain new model, the model discovery service sub-module receives an instruction, and the new model is automatically deployed on a low-pressure server by calculating the server pressure condition and the memory occupation condition of the server in a historical period;
the machine discovery service submodule and the model discovery service submodule are deployed in the same server, a second dictionary is stored in the machine discovery service submodule, the second dictionary comprises the current online server state, the state of each model deployment server in a service cluster is checked in a traversing mode every 10 seconds, the second dictionary is updated, and when a new model deployment server is added into the current service cluster, the model used at the highest frequency is obtained and loaded into a low-pressure model deployment server under the condition that the memory does not exceed;
the service monitoring and checking mechanism submodule checks the state of the machine discovery service submodule every 10 seconds, if the machine discovery service submodule is down or a task is stuck, alarm information is sent or the service is restarted automatically according to the actual situation, during the service health check, the pressure situation of each field in each model deployment server is analyzed, if a large number of overstocked tasks of a certain field are found to cause the pressure of an individual model deployment server in a service cluster to be overlarge, the unloading model submodule automatically searches for an idle model deployment server to load the model, unloads the model on the high-pressure model deployment server and returns the task to a message queue for redistribution;
a message queue serves as the propagation and transmission medium for distributing prediction tasks: prediction tasks for different models are stored in separate queues, and when the prediction task queue of a given model becomes too congested, the model discovery service sub-module automatically selects an idle model deployment server to load that model.
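A minimal sketch of the one-queue-per-model arrangement, using Python's in-process `queue` module as a stand-in for a real message broker; the congestion threshold and the `scale_out` callback (which would ask the model discovery service to load the model on an idle server) are illustrative assumptions:

```python
import queue


class PredictionBroker:
    """Sketch: one queue per model as the task transmission medium.
    When a model's queue grows past a threshold, a scale-out callback
    is fired so the model can be loaded on an idle deployment server."""

    def __init__(self, scale_out, congestion_threshold=100):
        self.queues: dict[str, queue.Queue] = {}
        self.scale_out = scale_out          # called with the congested model
        self.threshold = congestion_threshold

    def submit(self, model: str, task) -> None:
        q = self.queues.setdefault(model, queue.Queue())
        q.put(task)
        if q.qsize() > self.threshold:
            # Queue too congested: request another server for this model.
            self.scale_out(model)

    def next_task(self, model: str):
        """Worker side: take the next prediction task for a model."""
        return self.queues[model].get()
```

In a deployed cluster the per-model queues would live in an external broker (e.g. RabbitMQ or Kafka topics) rather than in-process, and `scale_out` would debounce repeated triggers for the same model.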
CN201910921147.7A 2019-09-27 2019-09-27 Cluster design method and cluster system for dynamic loading of artificial intelligence model Active CN110728372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910921147.7A CN110728372B (en) 2019-09-27 2019-09-27 Cluster design method and cluster system for dynamic loading of artificial intelligence model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910921147.7A CN110728372B (en) 2019-09-27 2019-09-27 Cluster design method and cluster system for dynamic loading of artificial intelligence model

Publications (2)

Publication Number Publication Date
CN110728372A true CN110728372A (en) 2020-01-24
CN110728372B CN110728372B (en) 2023-04-25

Family

ID=69218463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910921147.7A Active CN110728372B (en) 2019-09-27 2019-09-27 Cluster design method and cluster system for dynamic loading of artificial intelligence model

Country Status (1)

Country Link
CN (1) CN110728372B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080312982A1 (en) * 2007-06-15 2008-12-18 International Business Machine Corporation Dynamic Creation of a Service Model
WO2016127756A1 (en) * 2015-02-15 2016-08-18 北京京东尚科信息技术有限公司 Flexible deployment method for cluster and management system
CN110149396A (en) * 2019-05-20 2019-08-20 华南理工大学 A kind of platform of internet of things construction method based on micro services framework

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU Wenjie et al., "An autonomic computing model for server clusters", Journal of Computer Applications (《计算机应用》) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113791798A (en) * 2020-06-28 2021-12-14 北京沃东天骏信息技术有限公司 Model updating method and device, computer storage medium and electronic equipment
CN113190344A (en) * 2021-03-26 2021-07-30 中国科学院软件研究所 Method and device for dynamic reconfiguration and deployment of neural network for software-defined satellite
CN113190344B (en) * 2021-03-26 2023-12-15 中国科学院软件研究所 Method and device for dynamic reconfiguration deployment of neural network for software defined satellite

Also Published As

Publication number Publication date
CN110728372B (en) 2023-04-25

Similar Documents

Publication Publication Date Title
CN105808334B MapReduce short-job optimization system and method based on resource reuse
CN110427284B (en) Data processing method, distributed system, computer system, and medium
US8694638B2 (en) Selecting a host from a host cluster to run a virtual machine
US20160162309A1 (en) Virtual machine packing method using scarcity
CN111966453B (en) Load balancing method, system, equipment and storage medium
CN103810048A Automatic thread-count adjustment method and device for optimizing resource utilization
US20150295970A1 (en) Method and device for augmenting and releasing capacity of computing resources in real-time stream computing system
CN112559182B (en) Resource allocation method, device, equipment and storage medium
CN114861911B (en) Deep learning model training method, device, system, equipment and medium
CN103310460A (en) Image characteristic extraction method and system
CN105446653A (en) Data merging method and device
US20220137876A1 (en) Method and device for distributed data storage
CN110728372A (en) Cluster design method and cluster architecture for dynamic loading of artificial intelligence model
US10114438B2 (en) Dynamic power budgeting in a chassis
CN113961353A (en) Task processing method and distributed system for AI task
CN112860532B (en) Performance test method, device, equipment, medium and program product
US20200142803A1 (en) Hyper-converged infrastructure (hci) log system
CN107179998A Method and device for configuring a peripheral core buffer
CN114461407B (en) Data processing method, data processing device, distribution server, data processing system, and storage medium
CN116248689A (en) Capacity expansion method, device, equipment and medium for cloud native application
CN114416357A (en) Method and device for creating container group, electronic equipment and medium
US8607245B2 (en) Dynamic processor-set management
CN111090627B (en) Log storage method and device based on pooling, computer equipment and storage medium
CN110377398B (en) Resource management method and device, host equipment and storage medium
CN108984271A Load balancing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant