CN116775047A - Deployment method, device and medium of AI model service cluster architecture - Google Patents

Deployment method, device and medium of AI model service cluster architecture

Info

Publication number
CN116775047A
Authority
CN
China
Prior art keywords
process program
prediction
prediction engine
program
model
Prior art date
Legal status
Pending
Application number
CN202311041399.3A
Other languages
Chinese (zh)
Inventor
陈天友
陶征霖
常雷
姚佳丽
霍瑞龙
刘大伟
宋宜旭
Current Assignee
Beijing Even Number Technology Co ltd
Original Assignee
Beijing Even Number Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Even Number Technology Co ltd filed Critical Beijing Even Number Technology Co ltd
Priority to CN202311041399.3A
Publication of CN116775047A
Legal status: Pending

Landscapes

  • Stored Programmes (AREA)

Abstract

The application discloses a deployment method, apparatus and medium for an AI model service cluster architecture. The method comprises the following steps: a client program sends an AI model service cluster architecture deployment request to a process program on the management server; the process program obtains a prediction engine process program file according to the deployment request, sends the prediction engine process program contained in the file to the prediction server cluster, and starts it; the prediction server cluster returns a deployment success signal to the process program, which adds the prediction engine process program to a pre-established prediction engine list; the process program sends a heartbeat check to the prediction engine process programs of the prediction server cluster at predetermined intervals; each prediction engine process program returns a heartbeat response proving its health; finally, the process program returns a deployment success message for the AI model service cluster architecture to the client program, completing the deployment.

Description

Deployment method, device and medium of AI model service cluster architecture
Technical Field
The present application relates to the field of computer technologies, and in particular, to a deployment method, apparatus, and medium for an AI model service cluster architecture.
Background
With the rapid development of AI technology, the accelerating digitization of various industries, and the accumulation of large data volumes, artificial intelligence platforms are increasingly used in many fields. Applying AI algorithms to big data makes it possible to train AI models of commercial value, and once a model is available, intelligent applications can be developed more efficiently; this raises the problem of AI model serving and deployment. The prior art is largely divided into two types: public deployments and private deployments. Most existing AI model deployment frameworks in the industry are written in the Java language, and the execution efficiency of Java programs has been overtaken by a new generation of development languages; Golang, known for efficient concurrency, is particularly outstanding in this field.
When multi-node cluster deployment needs to be supported, conventional industry schemes generate a plurality of processes and deploy them to each node in the cluster separately, which makes dynamic expansion and later operation and maintenance difficult.
In addition, the development of new AI frameworks is also accelerating; widely used examples include TensorFlow, Spark and SKLearn. Every time a new AI framework appears in the industry, it poses a challenge to the already-deployed model service framework, and how to integrate the new framework into the existing architecture through simple and rapid iterative development becomes a technical problem to be solved. The technical problems existing at present are therefore: Java programs do not run efficiently enough; when the cluster is expanded to multiple nodes, too many processes are required, making deployment and operation difficult; and when a new framework appears, upgrading is difficult.
Disclosure of Invention
Aiming at the defects of the prior art, the application provides a deployment method, a device and a medium of an AI model service cluster architecture.
According to one aspect of the present application, there is provided a deployment method of an AI model service cluster architecture, including:
the client program sends an AI model service cluster architecture deployment request to a process program of the management server, wherein the process program is developed in the Golang language;
the process program obtains a prediction engine process program file according to the deployment request, sends the prediction engine process program contained in the file to the prediction server cluster, and starts it, wherein the prediction engine process program comprises a plurality of machine learning frameworks;
the prediction server cluster returns a deployment success signal to the process program, and the process program adds the prediction engine process program into a pre-established prediction engine list;
the process program sends a heartbeat check to the prediction engine process programs of the prediction server cluster at predetermined intervals;
the prediction engine process program returns a heartbeat response proving its health to the process program;
the process program returns a deployment success message of the AI model service cluster architecture to the client program to complete the deployment of the AI model service cluster architecture.
Optionally, the process program registers the IP addresses, user names and passwords of all nodes in the prediction server cluster, and the prediction engine process programs and the process program communicate with each other over a network.
Optionally, the method further comprises: and carrying out batch prediction of the AI model according to the AI model service cluster architecture.
Optionally, performing batch prediction of AI models according to an AI model service cluster architecture includes:
the client program sends an AI model prediction request to the process program;
the process program sends the AI model to all the prediction engine process programs;
the process program sends the predicted data to all the prediction engine process programs;
all the prediction engine process programs return the prediction results to the process programs;
the process program sends an AI model prediction completion signal to the client program.
Optionally, the method further comprises:
the client program sends a request for hot replacement of the new AI model to the process program;
the process program sends the new AI model to all the prediction engine process programs;
the prediction engine process program uninstalls the old version of AI model, installs the new version of AI model, and returns a hot replacement success signal to the process program;
the process program returns a new version AI model replacement success signal to the client program.
Optionally, the method further comprises:
the client program sends a request for hot replacement of the new version of prediction engine process program to the process program;
the process program deploys a new version of the prediction engine process program to the prediction server cluster;
the process program deletes the old version of the prediction engine process program from the prediction engine list, and adds the new version of the prediction engine process program into the prediction engine list;
the process program deletes the old version of prediction engine process program from the prediction server cluster;
the process program returns a new version prediction engine process program replacement success signal to the client program.
According to another aspect of the present application, there is provided a deployment apparatus of an AI model service cluster architecture, including:
the first sending module is used for the client program to send an AI model service cluster architecture deployment request to a process program of the management server, wherein the process program is developed in the Golang language;
the second sending module is used for the process program to obtain a prediction engine process program file according to the deployment request, send the prediction engine process program in the file to the prediction server cluster, and start it, wherein the prediction engine process program comprises a plurality of machine learning frameworks;
the first return module is used for returning a deployment success signal to the process program by the prediction server cluster, and the process program adds the prediction engine process program into a pre-established prediction engine list;
the third sending module is used for the process program to send a heartbeat check to the prediction engine process programs of the prediction server cluster at predetermined intervals;
the second return module is used for the prediction engine process program to return a heartbeat response proving its health to the process program;
and the third return module is used for returning a deployment success message of the AI model service cluster architecture to the client program by the process program to complete the deployment of the AI model service cluster architecture.
According to a further aspect of the present application there is provided a computer readable storage medium storing a computer program for performing the method according to any one of the above aspects of the present application.
According to still another aspect of the present application, there is provided an electronic device including: a processor; a memory for storing the processor-executable instructions; the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method according to any of the above aspects of the present application.
Therefore, the application develops the process program in the Golang language, which greatly surpasses Java programs in running efficiency and concurrency. Each node (server) only needs to run one process, and this process contains multiple technical frameworks and can simultaneously support the running of different AI models; if a node is added to the cluster, the process simply needs to be run on the new node. The AI model service cluster architecture realized by this technical scheme is designed for quick compatibility with new frameworks: a whole set of interfaces is designed in an object-oriented manner at the code level, and when a new framework needs to be added, quick access can be achieved through the existing interfaces.
Drawings
Exemplary embodiments of the present application may be more completely understood in consideration of the following drawings:
FIG. 1 is a flow chart of a deployment method of an AI model service cluster architecture provided by an exemplary embodiment of the application;
FIG. 2 is a flow chart of a deployment method of an AI model service cluster architecture according to an exemplary embodiment of the application;
FIG. 3 is a flowchart illustrating a deployment method of an AI model service cluster architecture according to an exemplary embodiment of the application;
FIG. 4 is a flowchart illustrating a deployment method of an AI model service cluster architecture according to an exemplary embodiment of the application;
FIG. 5 is a flowchart illustrating a deployment method of an AI model service cluster architecture according to an exemplary embodiment of the application;
FIG. 6 is a schematic diagram of a deployment device of an AI model service cluster architecture according to an exemplary embodiment of the application;
Fig. 7 shows the structure of an electronic device provided in an exemplary embodiment of the present application.
Detailed Description
Hereinafter, exemplary embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present application unless it is specifically stated otherwise.
It will be appreciated by those of skill in the art that the terms "first," "second," etc. in embodiments of the present application are used merely to distinguish between different steps, devices or modules, etc., and do not represent any particular technical meaning nor necessarily logical order between them.
It should also be understood that in embodiments of the present application, "plurality" may refer to two or more, and "at least one" may refer to one, two or more.
It should also be appreciated that any component, data, or structure referred to in an embodiment of the application may be generally understood as one or more without explicit limitation or the contrary in the context.
In addition, the term "and/or" in the present application is merely an association relationship describing the association object, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone. In the present application, the character "/" generally indicates that the front and rear related objects are an or relationship.
It should also be understood that the description of the embodiments of the present application emphasizes the differences between the embodiments, and that the same or similar features may be referred to each other, and for brevity, will not be described in detail.
Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.
The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the application, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, the techniques, methods, and apparatus should be considered part of the specification.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Embodiments of the application are operational with numerous other general purpose or special purpose computing system environments or configurations with electronic devices, such as terminal devices, computer systems, servers, etc. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with the terminal device, computer system, server, or other electronic device include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments that include any of the foregoing, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.
Exemplary method
Fig. 1 is a flowchart of a deployment method of an AI model service cluster architecture according to an exemplary embodiment of the application. The present embodiment can be applied to an electronic device, as shown in fig. 1, a deployment method 100 of an AI model service cluster architecture includes the following steps:
step 101, the client program sends an AI model service cluster architecture deployment request to a process program of the management server, wherein the process program is developed in Golang language.
Step 102, referring to fig. 2, a process program obtains a prediction engine process program file according to a deployment request, and sends the prediction engine process program in the prediction engine process program file to a prediction server cluster and starts the prediction engine process program, where the prediction engine process program includes a plurality of machine learning frameworks.
Step 103, the prediction server cluster returns a deployment success signal to the process program, and the process program adds the prediction engine process program into a pre-established prediction engine list.
Step 104, the process program sends a heartbeat check to the prediction engine process programs of the prediction server cluster at predetermined intervals.
Step 105, the prediction engine process program returns a heartbeat response proving its health to the process program.
And 106, returning a deployment success message of the AI model service cluster architecture to the client program by the process program to complete the deployment of the AI model service cluster architecture.
Optionally, the process program registers the IP addresses, user names and passwords of all nodes in the prediction server cluster, and the prediction engine process programs and the process program communicate with each other over a network.
Specifically, the technical scheme in the embodiment of the application is as follows:
The process program (LB Server): a program process developed in Golang that, functionally, centrally manages the automatic deployment and current availability of a plurality of LB Workers.
Prediction engine process program (LB Worker): the prediction engine is the actual process that provides the AI model service; it contains a plurality of machine learning frameworks (the model serving part), and a new machine learning framework can be integrated simply and quickly to support more kinds of models.
LB: abbreviation for artificial intelligence modeling platform LittleBoy.
As shown in FIG. 2, which is a technical framework diagram of the present application, the LBServer process is deployed on one server, called the management server, in which information such as the IP addresses, user names and passwords of all nodes in the prediction cluster is registered; the other servers provide the "model services" and are called prediction servers. On each prediction server, an LBWorker process is deployed; these processes and the LBServer communicate with each other over the network, together forming a cluster.
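The registration relationship above can be sketched in Go, the language the patent names for the process program. All type and field names here are illustrative assumptions; the patent only states that the IP address, user name and password of each node are registered and that a prediction engine list is kept in memory.

```go
package main

import "fmt"

// Node holds the credentials the LBServer registers for each
// prediction server in the cluster. Field names are assumptions;
// the patent only says IP address, user name and password are kept.
type Node struct {
	IP       string
	User     string
	Password string
}

// LBServer is the management-server process: a registry of cluster
// nodes plus the "prediction engine list" of running LBWorkers.
type LBServer struct {
	Nodes   []Node
	Engines []string // addresses of live LBWorker processes
}

// Register adds a prediction server to the management server's registry.
func (s *LBServer) Register(n Node) {
	s.Nodes = append(s.Nodes, n)
}

func main() {
	s := &LBServer{}
	s.Register(Node{IP: "10.0.0.1", User: "lb", Password: "secret"})
	s.Register(Node{IP: "10.0.0.2", User: "lb", Password: "secret"})
	fmt.Println(len(s.Nodes)) // 2
}
```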
FIG. 3 is a schematic diagram of a single LBWorker process. Its main point is that a single LBWorker process covers a plurality of machine learning frameworks and can therefore support the prediction services of multiple AI models. Each LBWorker runs alone on one server; when prediction actions run, the LBWorker invokes the computing resources of the server on which it is located, such as the CPU or graphics card.
Further, referring to FIG. 4 and FIG. 5, in the automatic deployment phase, after the automatic deployment instruction is issued, the LBServer first obtains the LBWorker program file and automatically sends the LBWorker to the corresponding nodes to run. As shown in FIG. 3, the blocks represent the prediction servers; when there are multiple empty prediction servers on which to deploy LBWorkers, the sending strategy grows exponentially: two copies are sent first, these two copy themselves automatically to become four in the second step, and in the third step they continue to copy themselves to become eight, so the program is deployed to all servers at the fastest speed.
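The doubling dissemination described above can be sketched as follows. `deployWaves` is a hypothetical helper that merely computes the coverage after each wave; the actual file transfer between servers is not detailed in the patent and is not modelled here.

```go
package main

import "fmt"

// deployWaves returns how many of the n prediction servers hold the
// LBWorker program after each wave of the doubling strategy: the
// LBServer seeds `seed` servers, then in every later wave each
// already-deployed server copies the package to one empty server,
// so coverage doubles until all n servers are reached.
func deployWaves(n, seed int) []int {
	covered := seed
	if covered > n {
		covered = n
	}
	waves := []int{covered}
	for covered < n {
		covered *= 2 // each deployed server replicates itself once
		if covered > n {
			covered = n
		}
		waves = append(waves, covered)
	}
	return waves
}

func main() {
	// The patent's example: 2 copies, then 4, then 8.
	fmt.Println(deployWaves(8, 2)) // [2 4 8]
}
```

Note the last wave is clamped to n, since a cluster size need not be a power of two.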
Further, the LBServer adds each running LBWorker to a "prediction engine list", which is simply a data structure stored in the LBServer's memory that records all running LBWorkers.
Further, the LBServer performs periodic heartbeat detection on the LBWorkers to ensure that they are healthy and usable; if an LBWorker is found to have a problem, the LBServer tries to redeploy and restart it. The heartbeat is sent over RPC once every 5 seconds.
Optionally, referring to fig. 4, further includes: and carrying out batch prediction of the AI model according to the AI model service cluster architecture.
Optionally, referring to fig. 4, the batch prediction of AI models according to the AI model service cluster architecture includes:
the client program sends an AI model prediction request to the process program;
the process program sends the AI model to all the prediction engine process programs;
the process program sends the predicted data to all the prediction engine process programs;
all the prediction engine process programs return the prediction results to the process programs;
the process program sends an AI model prediction completion signal to the client program.
Specifically, when batch prediction is needed, the LBServer is responsible for distributing the data to be predicted equally among the LBWorkers; the LBServer sends the prediction data in JSON form, and each LBWorker returns its prediction result in JSON form.
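The equal distribution step might look like the following sketch, which splits the rows as evenly as possible and serialises each share as JSON. `splitBatch` and the row representation are assumptions for illustration; the patent only states that data is sent and returned in JSON form.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// splitBatch divides the rows to be predicted as evenly as possible
// across nWorkers LBWorkers and serialises each share as JSON.
func splitBatch(rows []map[string]any, nWorkers int) ([][]byte, error) {
	per := (len(rows) + nWorkers - 1) / nWorkers // ceiling division
	shares := make([][]byte, 0, nWorkers)
	for i := 0; i < len(rows); i += per {
		end := i + per
		if end > len(rows) {
			end = len(rows)
		}
		b, err := json.Marshal(rows[i:end])
		if err != nil {
			return nil, err
		}
		shares = append(shares, b)
	}
	return shares, nil
}

func main() {
	rows := []map[string]any{{"x": 1}, {"x": 2}, {"x": 3}}
	shares, _ := splitBatch(rows, 2)
	fmt.Println(len(shares)) // 2 shares: two rows and one row
}
```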
Optionally, referring to fig. 4, further includes:
the client program sends a request for hot replacement of the new AI model to the process program;
the process program sends the new AI model to all the prediction engine process programs;
the prediction engine process program uninstalls the old version of AI model, installs the new version of AI model, and returns a hot replacement success signal to the process program;
the process program returns a new version AI model replacement success signal to the client program.
Specifically, when a model is already serving online and the old version needs to be replaced by a new version, the LBServer sends the new model to the LBWorkers; each LBWorker then automatically replaces the model internally and returns a success signal.
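The in-place replacement inside an LBWorker can be sketched with a read-write lock, so in-flight predictions never observe a half-replaced model. The lock and the method names are implementation assumptions; the patent only says the LBWorker "automatically replaces" the model and returns a success signal.

```go
package main

import (
	"fmt"
	"sync"
)

// Worker sketches the hot-replacement step of an LBWorker. The model
// field is a string stand-in for whichever framework-specific model
// object is actually loaded.
type Worker struct {
	mu    sync.RWMutex
	model string
}

// Replace unloads the old version and installs the new one under an
// exclusive lock, then reports the hot-replacement success signal.
func (w *Worker) Replace(newModel string) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.model = ""       // unload old version
	w.model = newModel // install new version
	return true
}

// Predict reads the current model under a shared lock.
func (w *Worker) Predict(x int) string {
	w.mu.RLock()
	defer w.mu.RUnlock()
	return fmt.Sprintf("%s(%d)", w.model, x)
}

func main() {
	w := &Worker{model: "v1"}
	ok := w.Replace("v2")
	fmt.Println(ok, w.Predict(7)) // true v2(7)
}
```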
Optionally, referring to fig. 4, further includes:
the client program sends a request for hot replacement of the new version of prediction engine process program to the process program;
the process program deploys a new version of the prediction engine process program to the prediction server cluster;
the process program deletes the old version of the prediction engine process program from the prediction engine list, and adds the new version of the prediction engine process program into the prediction engine list;
the process program deletes the old version of prediction engine process program from the prediction server cluster;
the process program returns a new version prediction engine process program replacement success signal to the client program.
Specifically, when the LBWorker needs to be updated, a programmer quickly adds the new framework and generates a new program package; the LBServer then automatically replaces the old LBWorker to achieve seamless switching, and at the same time updates the "prediction engine list".
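The "whole set of interfaces" that makes new frameworks quick to add might be structured as in the sketch below: each machine learning framework (TensorFlow, Spark, SKLearn, ...) is wrapped behind one common interface and registered by name, so adding a framework means writing one more implementation. The method names and the registry are assumptions, not taken from the patent.

```go
package main

import "fmt"

// Engine is the assumed common interface each framework wrapper
// implements inside an LBWorker.
type Engine interface {
	Load(modelPath string) error
	Predict(jsonRows []byte) ([]byte, error)
}

// registry maps framework names to their Engine implementations.
var registry = map[string]Engine{}

// RegisterEngine makes a framework available by name; a newly
// integrated framework only needs to call this once.
func RegisterEngine(name string, e Engine) { registry[name] = e }

// echoEngine is a trivial stand-in implementation used for the demo:
// it accepts any model and echoes the input rows back as its "result".
type echoEngine struct{}

func (echoEngine) Load(string) error                { return nil }
func (echoEngine) Predict(b []byte) ([]byte, error) { return b, nil }

func main() {
	RegisterEngine("sklearn", echoEngine{})
	out, _ := registry["sklearn"].Predict([]byte(`[{"x":1}]`))
	fmt.Println(string(out)) // [{"x":1}]
}
```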
Therefore, the application develops the process program in the Golang language, which greatly surpasses Java programs in running efficiency and concurrency. Each node (server) only needs to run one process, and this process contains multiple technical frameworks and can simultaneously support the running of different AI models; if a node is added to the cluster, the process simply needs to be run on the new node. The AI model service cluster architecture realized by this technical scheme is designed for quick compatibility with new frameworks: a whole set of interfaces is designed in an object-oriented manner at the code level, and when a new framework needs to be added, quick access can be achieved through the existing interfaces.
Exemplary apparatus
Fig. 6 is a schematic structural diagram of a deployment device of an AI model service cluster architecture according to an exemplary embodiment of the present application. As shown in fig. 6, the apparatus 600 includes:
a first sending module 610, configured to send, by a client program, an AI model service cluster architecture deployment request to a process program of a management server, where the process program is developed in a Golang language;
the second sending module 620 is configured to obtain a prediction engine process program file according to the deployment request by the process program, send the prediction engine process program in the prediction engine process program file to the prediction server cluster, and start the prediction engine process program, where the prediction engine process program includes a plurality of machine learning frameworks;
a first return module 630, configured to return a deployment success signal to the process program from the prediction server cluster, where the process program adds the prediction engine process program to a pre-established prediction engine list;
a third sending module 640, configured to send, by the process program, a heartbeat check to the prediction engine process programs of the prediction server cluster at predetermined intervals;
a second return module 650, configured to return, by the prediction engine process program, a heartbeat response proving its health to the process program;
and a third return module 660, configured to return, by the process program, a deployment success message of the AI model service cluster architecture to the client program, thereby completing deployment of the AI model service cluster architecture.
Optionally, the process program registers the IP addresses, user names and passwords of all nodes in the prediction server cluster, and the prediction engine process programs and the process program communicate with each other over a network.
Optionally, the apparatus 600 further comprises: and the prediction module is used for carrying out batch prediction of the AI model according to the AI model service cluster architecture.
Optionally, the prediction module includes:
the first sending submodule is used for sending an AI model prediction request to the process program by the client program;
the second sending submodule is used for sending the AI model to all prediction engine process programs by the process program;
the third sending sub-module is used for sending the predicted data to all the prediction engine process programs by the process program;
the return sub-module is used for returning the prediction result to the process program by all the prediction engine process programs;
and the fourth sending submodule is used for sending the AI model prediction completion signal to the client program by the process program.
Optionally, the apparatus 600 further comprises:
a fourth sending module, configured to send a hot-replacement new version AI model request to the process program by the client program;
the fifth sending module is used for sending the new AI model to all the prediction engine process programs by the process program;
the uninstall-and-install module is used for the prediction engine process program to uninstall the old version AI model, install the new version AI model, and return a hot replacement success signal to the process program;
and the fourth return module is used for returning the new version of AI model replacement success signal to the client program by the process program.
Optionally, the apparatus 600 further comprises:
a sixth sending module, configured to send, to the process program, a request for hot replacement of the new version of the prediction engine process program by the client program;
the deployment module is used for deploying a new version of prediction engine process program to the prediction server cluster by the process program;
the deletion adding module is used for deleting the old version of prediction engine process program from the prediction engine list by the process program and adding the new version of prediction engine process program into the prediction engine list;
the deleting module is used for deleting the old version of prediction engine process program from the prediction server cluster by the process program;
and the fifth return module is used for the process program to return a new version prediction engine process program replacement success signal to the client program.
Exemplary electronic device
Fig. 7 shows the structure of an electronic device provided in an exemplary embodiment of the present application. As shown in fig. 7, the electronic device 70 includes one or more processors 71 and a memory 72.
The processor 71 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
Memory 72 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disks, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 71 to implement the methods of the various embodiments of the present application described above and/or other desired functions. In one example, the electronic device may further include an input device 73 and an output device 74, interconnected by a bus system and/or another form of connection mechanism (not shown).
In addition, the input device 73 may also include, for example, a keyboard, a mouse, and the like.
The output device 74 can output various information to the outside. The output device 74 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for simplicity, only some of the components of the electronic device relevant to the present application are shown in fig. 7; components such as buses and input/output interfaces are omitted. In addition, the electronic device may include any other suitable components depending on the particular application.
Exemplary computer program product and computer readable storage Medium
In addition to the methods and apparatus described above, embodiments of the application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform steps in a method according to various embodiments of the application described in the "exemplary methods" section of this specification.
The computer program product may include program code, written in any combination of one or more programming languages, for performing the operations of embodiments of the present application, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the steps in a method according to various embodiments of the present application described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disk Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present application have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present application are merely examples and not intended to be limiting, and these advantages, benefits, effects, etc. are not to be considered as essential to the various embodiments of the present application. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, as the application is not necessarily limited to practice with the above described specific details.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the system embodiments essentially correspond to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
The block diagrams of the devices, systems, and apparatuses according to the present application are merely illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, the devices, systems, and apparatuses may be connected, arranged, and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including but not limited to" and are used interchangeably therewith. The terms "or" and "and" as used herein refer to, and are used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
The method and system of the present application may be implemented in a number of ways. For example, the methods and systems of the present application may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present application are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present application may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present application. Thus, the present application also covers a recording medium storing a program for executing the method according to the present application.
It is also noted that in the systems, devices and methods of the present application, components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered as equivalent aspects of the present application. The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the application to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (9)

1. A method for deploying an AI model service cluster architecture, comprising:
the client program sends an AI model service cluster architecture deployment request to a process program of a management server, wherein the process program is developed in the Golang language;
the process program obtains a prediction engine process program file according to the deployment request, sends the prediction engine process program in the prediction engine process program file to a prediction server cluster, and starts the prediction engine process program, wherein the prediction engine process program comprises a plurality of machine learning frameworks;
the prediction server cluster returns a deployment success signal to the process program, and the process program adds the prediction engine process program into a pre-established prediction engine list;
the process program sends heartbeat verification to the prediction engine process program of the prediction server cluster at intervals of a preset time;
the prediction engine process program returns a heartbeat response indicating health to the process program;
and the process program returns a deployment success message of the AI model service cluster architecture to the client program to complete the deployment of the AI model service cluster architecture.
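Read as a protocol, the steps of claim 1 map onto a small set of interactions between the client program, the management server's process program, and the prediction server cluster. Below is a minimal Go sketch of that flow (the claim states the process program is developed in Golang); every type and function name here is a hypothetical illustration, not the patent's actual implementation, and the network transport is simulated with in-memory calls:

```go
package main

import "fmt"

// DeployRequest is a hypothetical AI model service cluster deployment request
// sent by the client program to the management server's process program.
type DeployRequest struct {
	EngineFile string // the prediction engine process program file
}

// Manager stands in for the management server's process program.
type Manager struct {
	EngineList []string // the pre-established prediction engine list
}

// Deploy obtains the prediction engine process program from the request,
// "sends" it to every node of the prediction server cluster, starts it,
// and, once each node returns a deployment success signal, records the
// engine in the prediction engine list.
func (m *Manager) Deploy(req DeployRequest, nodes []string) string {
	for _, node := range nodes {
		// sending and starting the engine over the network is simulated here
		engine := fmt.Sprintf("%s@%s", req.EngineFile, node)
		m.EngineList = append(m.EngineList, engine)
	}
	return "deploy success" // the message returned to the client program
}

// Heartbeat models the periodic heartbeat verification: each engine in
// the list is expected to answer and prove its health.
func (m *Manager) Heartbeat() bool {
	return len(m.EngineList) > 0
}

func main() {
	m := &Manager{}
	msg := m.Deploy(DeployRequest{EngineFile: "engine-v1"}, []string{"node-1", "node-2"})
	fmt.Println(msg, m.EngineList, m.Heartbeat())
}
```

In a real deployment the loop body would copy and start the engine binary on each node (claim 2 registers node IP addresses and credentials in the process program for this purpose), and the heartbeat would run on a timer at the claimed preset interval.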
2. The method according to claim 1, wherein the IP addresses and user name and password information of all nodes in the prediction server cluster are registered in the process program, and each prediction engine process program and the process program communicate with each other via a network.
3. The method as recited in claim 1, further comprising: carrying out batch prediction of the AI model according to the AI model service cluster architecture.
4. The method of claim 3, wherein performing batch prediction of AI models from the AI model services cluster architecture comprises:
the client program sends an AI model prediction request to the process program;
the process program sends the AI model to all prediction engine process programs;
the process program sends the predicted data to all the prediction engine process programs;
all prediction engine process programs return prediction results to the process programs;
the process program sends an AI model prediction completion signal to the client program.
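The batch prediction flow of claim 4 is a fan-out/fan-in pattern: the process program pushes the model and the prediction data to every engine, then collects all the results. A minimal Go sketch follows; the names are hypothetical and the per-engine prediction is simulated, since the real engine would invoke one of its machine learning frameworks:

```go
package main

import "fmt"

// Engine models one prediction engine process program in the cluster.
type Engine struct{ Node string }

// Predict applies the AI model to a batch of prediction data; this sketch
// only tags each row with the node and model name instead of running a
// real machine learning framework.
func (e Engine) Predict(model string, batch []string) []string {
	out := make([]string, 0, len(batch))
	for _, row := range batch {
		out = append(out, fmt.Sprintf("%s/%s:%s", e.Node, model, row))
	}
	return out
}

// BatchPredict sends the model and the data to all engines concurrently
// and gathers the prediction results they return to the process program.
func BatchPredict(engines []Engine, model string, data []string) []string {
	ch := make(chan []string, len(engines))
	for _, e := range engines {
		go func(e Engine) { ch <- e.Predict(model, data) }(e)
	}
	var results []string
	for range engines {
		part := <-ch
		results = append(results, part...)
	}
	return results
}

func main() {
	engines := []Engine{{Node: "node-1"}, {Node: "node-2"}}
	// two engines x two rows -> four results
	fmt.Println(len(BatchPredict(engines, "model-v1", []string{"a", "b"})))
}
```

The channel fan-in mirrors the claim's "all prediction engine process programs return prediction results to the process program"; ordering across engines is not guaranteed, only completeness.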
5. The method as recited in claim 1, further comprising:
the client program sends a request for hot replacement of a new AI model to the process program;
the process program sends the new AI model to all prediction engine process programs;
the prediction engine process program uninstalls the old version of the AI model, installs the new version of the AI model, and returns a hot replacement success signal to the process program;
the process program returns the new version of AI model replacement success signal to the client program.
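Claim 5's hot replacement keeps the engine processes running while the model underneath them is swapped. A minimal Go sketch under the same hypothetical names (not the patent's actual implementation):

```go
package main

import "fmt"

// EngineProc holds the AI model currently installed in one running
// prediction engine process program.
type EngineProc struct{ Model string }

// HotReplaceModel sends the new AI model to all engines; each engine
// uninstalls its old version and installs the new version, and the
// resulting success signal is relayed back to the client program.
func HotReplaceModel(engines []*EngineProc, newModel string) string {
	for _, e := range engines {
		// uninstall the old version of the AI model and install the new
		// one; the engine process itself is never restarted
		e.Model = newModel
	}
	return "model hot-replace success"
}

func main() {
	engines := []*EngineProc{{Model: "model-v1"}, {Model: "model-v1"}}
	fmt.Println(HotReplaceModel(engines, "model-v2"), engines[0].Model)
}
```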
6. The method as recited in claim 1, further comprising:
the client program sends a request for hot replacement of a new version of prediction engine process program to the process program;
the process program deploys the new version of prediction engine process program to the prediction server cluster;
the process program deletes the old version of the prediction engine process program from the prediction engine list, and adds the new version of the prediction engine process program into the prediction engine list;
the process program deletes the old version of the prediction engine process program from the prediction server cluster;
the process program returns a replacement success signal for the new version of the prediction engine process program to the client program.
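Claim 6 replaces the engine itself rather than the model, and the claimed order matters: deploy the new version first, update the prediction engine list, then delete the old version, so the list never points at an engine that is not running. A hypothetical Go sketch of the list bookkeeping (the actual cluster-side deploy and delete are assumed to have happened out of band):

```go
package main

import "fmt"

// HotReplaceEngine mirrors the claimed order for swapping the prediction
// engine process program: the new version is assumed already deployed to
// the prediction server cluster; the old version is removed from the
// prediction engine list and the new version is added, after which the
// old process would be deleted from the cluster.
func HotReplaceEngine(engineList []string, oldVer, newVer string) ([]string, string) {
	updated := make([]string, 0, len(engineList)+1)
	for _, e := range engineList {
		if e == oldVer {
			continue // delete the old version from the list
		}
		updated = append(updated, e)
	}
	updated = append(updated, newVer) // add the new version to the list
	return updated, "engine hot-replace success" // signal for the client program
}

func main() {
	list, sig := HotReplaceEngine([]string{"engine-v1"}, "engine-v1", "engine-v2")
	fmt.Println(list, sig)
}
```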
7. A deployment apparatus for an AI model service cluster architecture, comprising:
the first sending module is used for the client program to send an AI model service cluster architecture deployment request to a process program of the management server, wherein the process program is developed in the Golang language;
the second sending module is used for the process program to acquire a prediction engine process program file according to the deployment request, send the prediction engine process program in the prediction engine process program file to a prediction server cluster, and start the prediction engine process program, wherein the prediction engine process program comprises a plurality of machine learning frameworks;
the first return module is used for returning a deployment success signal to the process program by the prediction server cluster, and the process program adds the prediction engine process program into a pre-established prediction engine list;
the third sending module is used for the process program to send heartbeat verification to the prediction engine process program of the prediction server cluster at intervals of a preset time;
the second return module is used for the prediction engine process program to return a heartbeat response indicating health to the process program;
and the third return module is used for returning the successful deployment message of the AI model service cluster architecture to the client program by the process program to complete the deployment of the AI model service cluster architecture.
8. A computer readable storage medium, characterized in that the storage medium stores a computer program for executing the method of any of the preceding claims 1-6.
9. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any of the preceding claims 1-6.
CN202311041399.3A 2023-08-18 2023-08-18 Deployment method, device and medium of AI model service cluster architecture Pending CN116775047A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311041399.3A CN116775047A (en) 2023-08-18 2023-08-18 Deployment method, device and medium of AI model service cluster architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311041399.3A CN116775047A (en) 2023-08-18 2023-08-18 Deployment method, device and medium of AI model service cluster architecture

Publications (1)

Publication Number Publication Date
CN116775047A true CN116775047A (en) 2023-09-19

Family

ID=87993415

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311041399.3A Pending CN116775047A (en) 2023-08-18 2023-08-18 Deployment method, device and medium of AI model service cluster architecture

Country Status (1)

Country Link
CN (1) CN116775047A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140279788A1 (en) * 2013-03-15 2014-09-18 Tibco Software Inc. Predictive System for Designing Enterprise Applications
CN110555550A (en) * 2019-08-22 2019-12-10 阿里巴巴集团控股有限公司 Online prediction service deployment method, device and equipment
CN111340232A (en) * 2020-02-17 2020-06-26 支付宝(杭州)信息技术有限公司 Online prediction service deployment method and device, electronic equipment and storage medium
CN115174551A (en) * 2022-05-31 2022-10-11 青岛海尔科技有限公司 Program deployment method and device, storage medium and electronic device
CN116069341A (en) * 2022-12-07 2023-05-05 厦门石头城软件技术有限公司 Automatic deployment method, equipment and storage medium for application program


Similar Documents

Publication Publication Date Title
CN112416524A (en) Implementation method and device of cross-platform CI/CD (compact disc/compact disc) based on docker and kubernets offline
CN113971095A (en) KUBERNETES application program interface in extended process
KR20200080296A (en) Create and distribute packages for machine learning on end devices
CN111061487A (en) Container-based load balancing distributed compiling system and method
US20100312879A1 (en) Plug-in provisioning integration in a clustered environment
WO2024002243A1 (en) Application management method, application subscription method, and related device
CN114968406B (en) Plug-in management method and device, electronic equipment and storage medium
CN111679888A (en) Deployment method and device of agent container
US20170329700A1 (en) Executing Multi-Version Tests Against a Multi-Version Application
CN115291946A (en) Hongmong system transplanting method, device, electronic equipment and readable medium
CN115051846A (en) Deployment method of K8S cluster based on super fusion platform and electronic equipment
CN112256287A (en) Application deployment method and device
CN116382694A (en) Method for improving compiling speed of Maven engineering in container environment
CN116775047A (en) Deployment method, device and medium of AI model service cluster architecture
US20180341475A1 (en) Just In Time Deployment with Package Managers
US11366648B2 (en) Compiling monoglot function compositions into a single entity
CN116830079A (en) Method and system for customizing a native build environment image
CN114461249A (en) Micro-service deployment method, device, code server and storage medium
CN113568623A (en) Application deployment method and device and electronic equipment
CN115604101B (en) System management method and related equipment
CN111274211A (en) Application file storage method, device and system
CN114240265B (en) Product deployment method and device based on mixed environment
US20240054000A1 (en) Container scheduling and deployment method and apparatus, and domain controller system
CN111722866B (en) OpenStack code repairing method, device, equipment and storage medium
CN112416390B (en) Method and device for updating npm package, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230919