CN117785224A - Model deployment method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117785224A
CN117785224A (application CN202311702833.8A)
Authority
CN
China
Prior art keywords: service, model, container, deployment, target container
Prior art date
Legal status
Pending
Application number
CN202311702833.8A
Other languages
Chinese (zh)
Inventor
谢章良
章峰
曾智勇
胡小元
刘俊
Current Assignee
Dark Matter Beijing Intelligent Technology Co ltd
Original Assignee
Dark Matter Beijing Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Dark Matter Beijing Intelligent Technology Co ltd filed Critical Dark Matter Beijing Intelligent Technology Co ltd
Priority to CN202311702833.8A
Publication of CN117785224A


Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application provides a model deployment method and apparatus, an electronic device, and a storage medium. The method comprises: obtaining a model configuration file for model deployment; creating a model service based on the model configuration file; and deploying the model service into at least one target container in a pre-created container cluster and starting the model service in the target container, so that an external service accesses the model service through a pre-deployed service application architecture and a model access interface pre-allocated to the model service on the target container. By means of containerization technology, the method and apparatus provide a more convenient and efficient model deployment service and ensure that model deployment is highly reliable, high-performing, and easy to manage.

Description

Model deployment method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a model deployment method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of artificial intelligence and deep learning, deep learning models are widely used across many fields. Deploying a trained deep learning model into a production environment is a critical step in putting it to practical use.
Model deployment must take many factors into account, such as whether the deployment approach is reliable, whether performance is adequate, and whether the deployment is easy to manage. How to deploy a deep learning model so as to ensure high reliability, high performance, and easy manageability is therefore a problem to be solved.
Disclosure of Invention
In view of the above shortcomings in the prior art, the present application aims to provide a model deployment method and apparatus, an electronic device, and a storage medium that solve the problem of how to deploy a deep learning model while ensuring high reliability, high performance, and easy manageability.
In order to achieve the above purpose, the technical solution adopted in the embodiment of the present application is as follows:
in a first aspect, an embodiment of the present application provides a model deployment method, where the method includes:
obtaining a model configuration file for model deployment;
creating a model service based on the model configuration file;
deploying the model service into at least one target container in a pre-created container cluster, and starting the model service in the target container, so that an external service accesses the model service through a pre-deployed service application architecture and a model access interface pre-allocated to the model service on the target container.
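The three steps above can be sketched end to end. The following is a minimal, self-contained illustration; all names here (`load_model_config`, the JSON layout, `registry.local`, the container fields) are hypothetical stand-ins, not details from the patent:

```python
import json

def load_model_config(config_text: str) -> dict:
    # Parse the model configuration file (a hypothetical JSON layout).
    return json.loads(config_text)

def create_model_service(config: dict) -> dict:
    # Build a model-service description from the configuration.
    return {
        "name": config["model"]["name"],
        "image": config["image"]["name"],
        "replicas": config["service"].get("replicas", 1),
    }

def deploy_model_service(service: dict, cluster: list) -> list:
    # Deploy the service into the first `replicas` idle containers
    # and start it there.
    targets = [c for c in cluster if not c["running"]][: service["replicas"]]
    for container in targets:
        container["service"] = service["name"]
        container["running"] = True  # start the model service in the container
    return targets

config_text = json.dumps({
    "model": {"name": "demo-model"},
    "image": {"name": "registry.local/demo-model:1.0"},
    "service": {"replicas": 2},
})
cluster = [{"id": f"c{i}", "running": False} for i in range(4)]
targets = deploy_model_service(create_model_service(load_model_config(config_text)), cluster)
print([c["id"] for c in targets])  # ['c0', 'c1']
```

In a real deployment the "cluster" would be managed by a container orchestrator rather than a Python list; the sketch only shows the sequence of the three claimed steps.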
As a possible implementation manner, before the deploying the model service into at least one target container in the pre-created container cluster, the method further includes:
acquiring a predefined container configuration file, wherein the container configuration file comprises image parameter information and the definitions and parameters of containers;
and creating the container cluster according to the container configuration file.
As a possible implementation manner, the deploying the model service into at least one target container in the pre-created container cluster includes:
acquiring predefined service node information;
determining the node number corresponding to the model service and the node type required by the model service according to the service node information;
and screening at least one target container from the container cluster according to the node number and the node type, and respectively deploying the model service into each target container.
As a possible implementation manner, after the deploying the model service into at least one target container in the pre-created container cluster, the method further includes:
acquiring a service access address allocated to the model service during creation of the model service for access by an external service, and pushing the service access address to the service application architecture by using a message queue;
and allocating a model access interface to the model service, and pushing the model access interface to the service application architecture.
As a possible implementation manner, before the obtaining the service access address allocated to the model service in the process of creating the model service, the method further includes:
acquiring a predefined service mapping rule and a predefined routing rule, wherein the service mapping rule is used for representing the corresponding relation between a gateway domain name and a model service;
determining a gateway domain name corresponding to the model service based on the service mapping rule, and determining a routing address corresponding to the model service based on the routing rule;
and determining the service access address of the model service according to the gateway domain name and the routing address.
As a possible implementation manner, after the running of the model service is started in the target container, the method further includes:
sending, by an external service, a service inference request to the service application architecture, wherein the service inference request comprises the service access address of a model service to be accessed;
and accessing, by the service application architecture, the model service to be accessed through the model access interface and based on the service access address.
As a possible implementation manner, the method further includes:
acquiring a plurality of probes, and adding each probe to its corresponding container in the container cluster;
and detecting a port of the target container by using the target probe corresponding to the target container, thereby monitoring the working state of the target container.
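A port-probing health check of the kind this step describes can be sketched with a plain TCP connect. The probe and the local listener below are illustrative stand-ins (the patent does not specify the probe protocol), with the listener playing the role of the service port inside a target container:

```python
import socket

def probe_port(host: str, port: int, timeout: float = 1.0) -> bool:
    # Liveness-style probe: does a TCP connection to the container port succeed?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo: a local listener stands in for the service port inside a target container.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
alive = probe_port("127.0.0.1", port)
listener.close()
print(alive)  # True
```

Kubernetes-style liveness/readiness probes follow the same idea: a periodic check against a container port whose failure marks the container unhealthy.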
In a second aspect, embodiments of the present application provide a model deployment apparatus, the apparatus including:
the acquisition module is used for acquiring a model configuration file for model deployment;
the creation module is used for creating a model service based on the model configuration file;
the deployment module is used for deploying the model service into at least one target container in a pre-created container cluster, and starting the model service in the target container, so that an external service accesses the model service through a pre-deployed service application architecture and a model access interface pre-allocated to the model service on the target container.
As a possible implementation manner, the creating module is further configured to:
acquiring a predefined container configuration file, wherein the container configuration file comprises image parameter information and the definitions and parameters of containers;
And creating the container cluster according to the container configuration file.
As a possible implementation manner, the deployment module is specifically configured to:
acquiring predefined service node information;
determining the node number corresponding to the model service and the node type required by the model service according to the service node information;
and screening at least one target container from the container cluster according to the node number and the node type, and respectively deploying the model service into each target container.
As a possible implementation manner, the deployment module is further configured to:
acquiring a service access address allocated to the model service during creation of the model service for access by an external service, and pushing the service access address to the service application architecture by using a message queue;
and allocating a model access interface to the model service, and pushing the model access interface to the service application architecture.
As a possible implementation manner, the deployment module is further configured to:
acquiring a predefined service mapping rule and a predefined routing rule, wherein the service mapping rule is used for representing the corresponding relation between a gateway domain name and a model service;
Determining a gateway domain name corresponding to the model service based on the service mapping rule, and determining a routing address corresponding to the model service based on the routing rule;
and determining the service access address of the model service according to the gateway domain name and the routing address.
As a possible implementation manner, the deployment module is further configured to:
sending, by an external service, a service inference request to the service application architecture, wherein the service inference request comprises the service access address of a model service to be accessed;
and accessing, by the service application architecture, the model service to be accessed through the model access interface and based on the service access address.
As a possible implementation manner, the deployment module is further configured to:
acquiring a plurality of probes, and correspondingly adding each probe into each container in the container cluster;
and detecting a port of the target container by using a target probe corresponding to the target container, and monitoring the working state of the target container.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor in communication with the storage medium via the bus when the electronic device is running, the processor executing the machine-readable instructions to perform the steps of the model deployment method according to any of the first aspects above.
In a fourth aspect, embodiments of the present application provide a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the model deployment method according to any of the first aspects described above.
According to the model deployment method and apparatus, electronic device, and storage medium provided herein, a model configuration file for model deployment is obtained, a model service is created based on the model configuration file, the model service is then deployed into at least one target container in a pre-created container cluster, and the model service is started in the target container, so that an external service accesses the model service through a pre-deployed service application architecture and a model access interface pre-allocated to the model service on the target container. On this basis, a model service is created from the model configuration file; after the model service is created, a container cluster is created using an open-source container orchestration tool, and the model service is deployed into target containers of the cluster using containerization technology so that it can be started and run there. In addition, a model access interface is assigned to the model service so that external services can access it through the pre-deployed service application architecture and that interface. In this way, container orchestration and containerization provide users with a more convenient and efficient deep learning model deployment service, ensure high reliability, high performance, and easy manageability, and thereby improve the efficiency and reliability of model deployment.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a flow diagram of a model deployment method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a model service deployment method according to an embodiment of the present application;
fig. 3 is a schematic flow chart of a pushing method according to an embodiment of the present application;
FIG. 4 shows a schematic diagram of model service access and inference provided by an embodiment of the present application;
fig. 5 is a schematic flow chart of a method for monitoring a container status according to an embodiment of the present application;
fig. 6 shows a schematic structural diagram of a model deployment device according to an embodiment of the present application;
fig. 7 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the accompanying drawings in the present application are only for the purpose of illustration and description, and are not intended to limit the protection scope of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this application, illustrates operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to the flow diagrams and one or more operations may be removed from the flow diagrams as directed by those skilled in the art.
In addition, the described embodiments are only some, but not all, of the embodiments of the present application. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
In order to enable one skilled in the art to utilize the present disclosure, the following embodiments are presented in connection with a particular application scenario "model service". It will be apparent to those having ordinary skill in the art that the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present application. Although the present application is primarily described in terms of a model deployment method, it should be understood that this is but one exemplary embodiment.
It should be noted that the term "comprising" will be used in the embodiments of the present application to indicate the presence of the features stated hereinafter, but not to exclude the addition of other features.
Based on the containerization technology, application programs and services can be packaged into lightweight and portable containers, and further the technical characteristics of more efficient and flexible deployment and management are achieved.
Fig. 1 shows a flow chart of a model deployment method according to an embodiment of the present application. Referring to fig. 1, the method specifically includes the following steps:
s101, obtaining a model configuration file for model deployment.
Optionally, for a deep learning model, model deployment refers to the process of making a trained model run in a specific environment, converting the model from the research stage to practical application so that it can provide services for users or solve real problems. Deep learning models are usually written with deep learning frameworks, which, because of their size and environment dependencies, are not well suited to direct installation in a production environment. Moreover, because the structure of a deep learning model is typically large and substantial computing power is required to run it in real time, the model structure and model parameters are converted into a model configuration file, so that containerization technology can provide a more convenient and efficient deployment service based on that file.
Illustratively, the model configuration file contains the user's configuration of the deployed model, including but not limited to model configuration information, image configuration information, resource configuration information, and service configuration information.
S102, creating a model service based on the model configuration file.
Optionally, a model service is a technique that deploys a machine learning or deep learning model into a production environment and provides services through application programming interfaces (Application Programming Interface, API) to deliver intelligent prediction or decision support to users or applications.
Illustratively, the model service is created according to the configured model configuration information, image configuration information, resource configuration information, and service configuration information of the deep learning model, so that the deep learning model is deployed into the production environment to run. In addition, to enable service access after deployment, the ClusterIP service type of the open-source container orchestration tool Kubernetes (k8s) is used when creating the model service, and an Ingress gateway acts as a proxy. Ingress is the k8s abstraction of a reverse proxy: a set of rules defining how a request is forwarded to the model service. To avoid repeated resolution when using a domain name server, all model services are accessed through the same Ingress domain name.
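For illustration, a ClusterIP-backed service exposed through an Ingress rule can be described by a manifest like the sketch below. The host, path, and service names are invented for the example; only the manifest shape follows the standard `networking.k8s.io/v1` Ingress schema:

```python
def ingress_manifest(host: str, path: str, service_name: str, service_port: int) -> dict:
    # Minimal Ingress resource: forward requests for host/path to a
    # ClusterIP service inside the cluster.
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": f"{service_name}-ingress"},
        "spec": {
            "rules": [{
                "host": host,  # the shared ingress domain name
                "http": {"paths": [{
                    "path": path,          # per-service routing path
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service_name,
                                            "port": {"number": service_port}}},
                }]},
            }],
        },
    }

manifest = ingress_manifest("models.example.com", "/demo-model", "demo-model-svc", 80)
print(manifest["spec"]["rules"][0]["host"])  # models.example.com
```

Because every model service shares the same host and differs only in its routing path, the domain name is resolved once and the Ingress gateway fans requests out to the right ClusterIP service.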
S103, deploying the model service into at least one target container in the pre-created container cluster, and starting the running model service in the target container, so that the external service accesses the model service through a pre-deployed service application architecture and a model access interface on the target container pre-allocated for the model service.
Optionally, the embodiment of the application selects containers as the deployment environment for the model service. A container can run on any machine provided with a container runtime and gives a program an independent resource space at run time. Therefore, after the created model service is deployed into at least one target container in the pre-created container cluster, the model service can run normally in the target container once started.
In addition, when the number of containers in the container cluster reaches a certain scale, the open source container orchestration tool k8s is required to schedule and manage the containers in the container cluster.
Therefore, with the model deployment method provided by the embodiment of the application, a model configuration file for model deployment is obtained, a model service is created based on the model configuration file, the model service is then deployed into at least one target container in a pre-created container cluster, and the model service is started in the target container, so that an external service accesses the model service through a pre-deployed service application architecture and a model access interface pre-allocated to the model service on the target container. On this basis, a model service is created from the model configuration file; after the model service is created, a container cluster is created using an open-source container orchestration tool, and the model service is deployed into target containers of the cluster using containerization technology so that it can be started and run there. In addition, a model access interface is assigned to the model service so that external services can access it through the pre-deployed service application architecture and that interface. In this way, container orchestration and containerization provide users with a more convenient and efficient deep learning model deployment service, ensure high reliability, high performance, and easy manageability, and thereby improve the efficiency and reliability of model deployment.
As a possible implementation manner, before the deploying of the model service into at least one target container in the pre-created container cluster in the step S103, the method further includes: a predefined container configuration file is obtained, and a container cluster is created according to the container configuration file.
Illustratively, the container configuration file includes, but is not limited to, image parameter information and the definitions and parameters of containers. After the model service is created successfully, it is brought online and the containers are then created; bringing a service online means releasing the developed service into the formal (production) environment, which users access through a designated service access entry. The open-source container orchestration tool Kubernetes is deployed on the electronic device that runs the model, and it learns the container creation requirements by reading the container configuration file. Because Kubernetes provides automatic bin packing, it can determine the resource allocation requirements of the containers for the runtime environment from the predefined container configuration file, and then automatically deploy the containers to the application runtime environment based on those requirements, thereby creating the container cluster.
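Automatic bin packing places containers on nodes according to their resource requests. A toy first-fit version of the idea (a sketch only, not Kubernetes' actual scheduler, which weighs many more signals) looks like this:

```python
def pack_containers(cpu_requests: dict, node_capacity: dict) -> dict:
    # First-fit sketch of automatic bin packing: place each container on the
    # first node with enough free CPU (values in millicores).
    free = dict(node_capacity)
    placement = {}
    for container, cpu in cpu_requests.items():
        for node, avail in free.items():
            if avail >= cpu:
                placement[container] = node
                free[node] = avail - cpu
                break
    return placement

placement = pack_containers({"a": 500, "b": 700, "c": 400},
                            {"node1": 1000, "node2": 1000})
print(placement)  # {'a': 'node1', 'b': 'node2', 'c': 'node1'}
```

Container "b" does not fit on node1 after "a" is placed, so it goes to node2, while "c" fills the remaining capacity on node1; this is the resource-driven placement the configuration file makes possible.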
Based on this, the containers in the cluster are isolated from one another, and processes in different containers cannot affect each other; therefore, after the model service is deployed into at least one target container in the pre-created container cluster, the model services in the target containers can run normally in the same production environment.
As a possible implementation manner, as shown in fig. 2, the step S103 deploys a model service into at least one target container in the pre-created container cluster, and specifically includes the following steps:
s201, obtaining predefined service node information.
For example, in the process of creating the model service, besides setting the environment variable information of the image so as to conveniently specify variables needed while the image runs, service node information may be predefined; this information may include the number of nodes for the service, the node types required by the service, and the like.
S202, determining the node number corresponding to the model service and the node type required by the model service according to the service node information.
Illustratively, the number of nodes of the created model service, i.e., whether it corresponds to a single node or multiple nodes, is determined from the predefined service node information. The number of nodes corresponds to the number of target containers, i.e., whether the model service is deployed into one target container or into several.
Illustratively, the node type required by the model service is a central processing unit (Central Processing Unit, CPU) or a graphics processing unit (Graphics Processing Unit, GPU). The node type required by the created model service is determined from the predefined service node information; that is, the deployment environment corresponding to the model service is selected, taking into account the selection and configuration of the container hardware.
S203, screening at least one target container from the container cluster according to the node number and the node type, and respectively deploying the model service into each target container.
For example, if the determined number of nodes for the model service is 1 (a single node), the model service is deployed into one target container in the container cluster; if the number is greater than 1 (multiple nodes), for example 3, the model service is deployed into three target containers in the cluster.
It should be noted that the chosen relationship between containers and nodes affects performance and stability, and that relationship takes two different forms. In the first, container A is on node1 and container B is on node2, and the two containers exchange data and information over the network. In the second, containers A and B are both on node1 and exchange data and information through memory or a non-physical network. In the first form, container A has node1's computing power and container B has node2's, so the overall computing power is greater than in the second form; in the second form, data exchange between containers A and B does not traverse a physical network, so communication is more efficient.
Based on this, the number of target containers for model service deployment is determined from the node number and node type, an appropriate container-node deployment relationship is selected according to actual requirements, and the model service is deployed into the corresponding target containers according to the selected relationship.
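The screening step, picking the required number of idle containers on nodes of the required type, can be sketched as follows; the container fields (`node_type`, `busy`) are illustrative, not taken from the patent:

```python
def screen_target_containers(cluster: list, node_type: str, node_count: int) -> list:
    # Select node_count idle containers that sit on nodes of the required type.
    candidates = [c for c in cluster
                  if c["node_type"] == node_type and not c["busy"]]
    if len(candidates) < node_count:
        raise RuntimeError("container cluster cannot satisfy the request")
    return candidates[:node_count]

cluster = [
    {"id": "c0", "node_type": "gpu", "busy": False},
    {"id": "c1", "node_type": "cpu", "busy": False},
    {"id": "c2", "node_type": "gpu", "busy": True},
    {"id": "c3", "node_type": "gpu", "busy": False},
]
targets = screen_target_containers(cluster, "gpu", 2)
print([c["id"] for c in targets])  # ['c0', 'c3']
```

With `node_count` greater than 1 the model service is then deployed into each returned target container in turn, matching step S203.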
As a possible implementation manner, as shown in fig. 3, after the deploying the model service into at least one target container in the pre-created container cluster in the step S103, the method further includes the following steps:
s301, obtaining a service access address allocated for the model service in the process of creating the model service by an external service, and pushing the service access address to a service application architecture by using a message queue.
Illustratively, a unique service access address is generated for the model service during its creation; a user can access the model service and perform inference via this address. It should be noted that the service access address is not affected by redeployment: the address corresponding to a given model service does not change when the same service is deployed multiple times, so the address is created once and can be used stably thereafter. The service access address may be composed of an inference gateway address and a routing address.
S302, allocating a model access interface to the model service, and pushing the model access interface to the service application architecture.
The model access interface is the entry through which a user accesses the model service. After the container is created successfully, the routing information of the service access address is sent to a Kafka message queue and pushed to the service application architecture Nacos; at the same time, the model access interface allocated to the model service is also pushed to Nacos. An external service can then enter the access channel through the model access interface, based on Nacos, and look up the corresponding model service by its service access address. Nacos is a platform that makes it easier to build dynamic service registration and discovery, configuration management, and service management for cloud-native applications.
Based on this, the container is monitored in real time; once it is created successfully, both the service access address and the model access interface are pushed to the service application architecture, so that the model service can be accessed for inference via that address and interface.
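The push path (message queue in, service registry out) can be mimicked with an in-process queue standing in for Kafka and a dict standing in for the Nacos registry. This is a sketch of the flow only, not the patent's implementation or the real Kafka/Nacos APIs:

```python
import queue

bus = queue.Queue()   # stands in for the Kafka message queue
registry = {}         # stands in for the Nacos service registry

def publish_service(name: str, access_address: str, model_interface: str) -> None:
    # After the container is created successfully, push the service access
    # address and the model access interface onto the message queue.
    bus.put({"service": name,
             "address": access_address,
             "interface": model_interface})

def consume_and_register() -> None:
    # The service application architecture drains the queue and registers
    # each model service for later lookup by external services.
    while not bus.empty():
        msg = bus.get()
        registry[msg["service"]] = {"address": msg["address"],
                                    "interface": msg["interface"]}

publish_service("demo-model", "http://models.example.com/1234", "/v1/predict")
consume_and_register()
print(registry["demo-model"]["interface"])  # /v1/predict
```

Decoupling the producer (container creation) from the consumer (the registry) through a queue is what lets the push happen asynchronously as containers come up.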
As a possible implementation manner, before the service access address allocated for the model service in the process of creating the model service is obtained in step S301, the method further includes:
A predefined service mapping rule and routing rule are obtained; a gateway domain name corresponding to the model service is determined based on the service mapping rule, and a routing address corresponding to the model service is determined based on the routing rule; the service access address of the model service is then determined from the gateway domain name and the routing address.
Illustratively, when the ingress resource of the open source container orchestration tool k8s is created, the configured service access address is "ingress domain name + UUID", where the ingress domain name is an object created in k8s through which the model service can be accessed, and the UUID in the routing rule is a universally unique identifier (Universally Unique Identifier, UUID) corresponding to the model service. The service mapping rule represents the correspondence between the gateway domain name and the model service; since the service access address of a model service does not change and its routing UUID is likewise fixed, the corresponding model service can be determined through the ingress gateway domain name.
In this way, the gateway domain name and routing address corresponding to the model service are determined, and from them the service access address of the model service. Because the service access address does not change and is unaffected by redeployment of the model service, it can be established once per model service and then used stably.
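The "ingress domain name + UUID" composition described above can be sketched as follows; a minimal illustration in which the gateway domain, URL scheme, and UUID value are hypothetical, not taken from the embodiment.

```python
import uuid

def build_service_access_address(gateway_domain, service_uuid=None):
    """Compose the stable service access address as 'ingress domain name + UUID'.
    The UUID is fixed at service-creation time and never changes afterwards,
    so the same service always maps to the same address across redeployments."""
    route_uuid = service_uuid or uuid.uuid4().hex  # generate once at creation
    return f"http://{gateway_domain}/{route_uuid}"

# Passing the stored UUID reproduces the identical address on every redeploy.
addr = build_service_access_address("infer-gw.example.com", "1f2d3c4b5a697887")
```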
As a possible implementation manner, after the model service is started and run in the target container in step S103, the method further includes:
sending a service inference request to the service application architecture by an external service, wherein the service inference request includes the service access address of the model service to be accessed; the model service to be accessed is then accessed by the service application architecture through the model access interface and based on the service access address.
For example, as shown in fig. 4, the preceding model service creation, deployment, scheduling, container cluster creation, and container state monitoring have completed the deployment of the model services, and the service application architecture Nacos holds the service access address and model access interface corresponding to each model service. On this basis, an external service can send a service inference request toward the service application architecture Nacos. Specifically, the external service does not send the request to Nacos directly; instead, the inference gateway pulls the routing address and service access address from Nacos in real time and forwards the service inference request, and the model service to be accessed is then accessed through the model access interface and based on the service access address to perform inference. During inference, relevant information of each service inference request, such as input parameters, output parameters, and inference execution duration, is recorded in real time.
In this way, the inference path to the model service provides authentication, rate limiting, and logging functions, ensuring the security and integrity of the model service.
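The real-time recording of request details mentioned above (input parameters, output parameters, execution duration) can be sketched as follows. The field names and the stand-in handler are illustrative assumptions; the real gateway would wrap the forwarded model call instead.

```python
import time

inference_log = []  # in-memory stand-in for the gateway's request log

def record_inference(service_address, params_in, handler):
    """Invoke an inference handler and record the request details in real time:
    input parameters, output parameters, and inference execution duration."""
    start = time.perf_counter()
    params_out = handler(params_in)  # stand-in for forwarding to the model service
    duration_ms = (time.perf_counter() - start) * 1000.0
    entry = {
        "serviceAccessAddress": service_address,
        "paramsIn": params_in,
        "paramsOut": params_out,
        "durationMs": round(duration_ms, 3),
    }
    inference_log.append(entry)
    return entry

entry = record_inference(
    "http://infer-gw.example.com/1f2d3c4b",
    {"text": "hello"},
    lambda p: {"label": "greeting"},  # hypothetical model handler
)
```

Keeping such a log per request is what enables the auditing and rate-limiting checks described for the inference service.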
As a possible implementation manner, as shown in fig. 5, the method further includes:
S501, a plurality of probes are acquired, and each probe is added to a corresponding container in the container cluster.
Illustratively, a container cluster is made up of multiple containers, each corresponding to an application, and a container may crash during operation due to some unexpected situation. Using probes to monitor the stability of the container state, so that the model service does not fail at runtime and is restarted when a problem does occur, is therefore an extremely important concern.
Illustratively, the open source container orchestration tool k8s makes heavy use of asynchronous mechanisms and decoupled object-relation designs; when the number of containers increases or decreases, or the application version changes, the container configuration cannot be guaranteed to refresh in time. In general, only the newly added containers complete their self-initialization, while some of the originally existing old containers are deleted immediately, leaving the model service temporarily unavailable. Therefore, liveness probes are added in the open source container orchestration tool k8s and attached to each container in the container cluster to monitor the working state of each container.
S502, detecting a port of the target container by using a target probe corresponding to the target container, and monitoring the working state of the target container.
Illustratively, in the open source container orchestration tool k8s, a Pod is the smallest deployable computing unit: a collection of a group of containers sharing the same network namespace, storage resources, and so on. A Pod in k8s is mainly in one of the following states: Pending, Running, Succeeded, Failed, Unknown. Pending indicates that the container image of the Pod has not yet been pulled or the node resources required by the Pod are not yet satisfied, so the Pod cannot be scheduled. Running means the Pod has been scheduled onto a node and all containers have been created, with at least one container still running or starting up. Succeeded indicates that all containers in the Pod have terminated normally and will not be restarted. Failed means that at least one container in the Pod has exited with a non-zero status. Unknown indicates a state in which the Pod's status cannot be acquired.
In addition to the above states, a Pod has some special states that record its detailed information, such as whether the Pod is being scheduled and whether the container image has been pulled successfully. Illustratively, a Pod also has the following special states: PodScheduled, ContainersReady, Initialized, Ready, CreateContainerError, ContainerCreating, and so on. PodScheduled indicates whether the Pod has been scheduled onto a node; ContainersReady indicates whether all containers in the Pod are ready; Initialized indicates whether all containers in the Pod have been initialized; Ready indicates whether the Pod is ready, i.e., all containers have started and can receive traffic; CreateContainerError indicates that creating a container failed; ContainerCreating indicates that a container is being created.
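The reduction of these phases and conditions to the status the monitoring step cares about can be sketched as follows; a minimal illustration with hypothetical function and field names, not part of the k8s API itself.

```python
def classify_pod(phase, conditions):
    """Reduce a Pod's phase and condition map to the coarse status used by
    the monitoring step: 'ready', 'starting', or 'failed'."""
    if phase in ("Failed", "Unknown"):
        return "failed"      # at least one container exited non-zero, or state unknown
    if phase == "Running" and conditions.get("Ready") == "True":
        return "ready"       # all containers started and able to receive traffic
    return "starting"        # Pending, or Running but not yet Ready

status = classify_pod("Running", {"PodScheduled": "True", "Initialized": "True",
                                  "ContainersReady": "True", "Ready": "True"})
```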
It should be noted that detecting the port of the target container with its corresponding target probe and monitoring the working state of the target container mainly means monitoring the container's Ready state, i.e., whether the container has started and can receive traffic, thereby ensuring that the model service is truly available once it comes online successfully.
In this way, the working state of the container is monitored: during monitoring, the container's port is probed and states such as Ready are watched, so the working state of the container is known in real time while the model service is guaranteed to be truly available after coming online successfully, the stability of the container state is monitored, and problems during the running of the model service are avoided.
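A probe of this kind is attached to a container by declaring it in the Pod spec. Below is a minimal sketch of a k8s readinessProbe fragment that checks the container's service port; the port number and timing values are illustrative assumptions, not taken from the embodiment.

```python
def readiness_probe(port):
    """Build a k8s readinessProbe spec fragment that probes the container's
    TCP port; the container receives traffic only once the probe passes."""
    return {
        "readinessProbe": {
            "tcpSocket": {"port": port},   # detect the target container's port
            "initialDelaySeconds": 10,     # allow time for the model to load
            "periodSeconds": 5,            # re-check the working state periodically
            "failureThreshold": 3,         # mark NotReady after 3 failed checks
        }
    }

probe = readiness_probe(8080)
```

Merging such a fragment into each container definition in the cluster is one way the "add a probe to each container" step of S501 could be realized.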
Therefore, the embodiment of the application provides a Kubernetes-based deep learning model deployment method that achieves fast and efficient deployment of and access to deep learning models, and delivers efficiency, scalability, flexibility, security, and usability of model deployment through operations such as model service configuration, resource scheduling, container creation, and dynamic routing. Efficiency is reflected in leveraging Kubernetes' efficient scheduling and containerization technology to rapidly deploy and manage multiple deep learning model services, improving the efficiency and stability of model deployment. Scalability is reflected in the Kubernetes cluster's good horizontal scaling capability, which can dynamically scale services out or in according to business demand to meet the requirements of large-scale data processing and computation. Flexibility is reflected in the rich configuration options and service deployment management tools provided, which users can configure and adjust flexibly according to their own business needs. Security is reflected in the inference service providing authentication, rate limiting, and logging functions, ensuring the security and integrity of the service. Usability is reflected in a model deployment process that is simple and easy to understand, enabling users to get started quickly and deploy and manage deep learning models conveniently.
Based on the same inventive concept, the embodiment of the present application further provides a model deployment device corresponding to the model deployment method. Since the principle by which the device solves the problem is similar to that of the model deployment method in the embodiment of the present application, the implementation of the device may refer to the implementation of the method, and repeated descriptions are omitted.
Referring to fig. 6, a schematic structural diagram of a model deployment device according to an embodiment of the present application is shown, where the model deployment device 600 includes: an acquisition module 601, a creation module 602, and a deployment module 603, wherein:
an obtaining module 601, configured to obtain a model configuration file for model deployment;
a creation module 602 for creating a model service based on the model configuration file;
a deployment module 603, configured to deploy the model service into at least one target container in the pre-created container cluster, and to start running the model service in the target container, so that an external service accesses the model service through the pre-deployed service application architecture and the model access interface on the target container pre-allocated for the model service.
Thus, the model deployment device of the embodiment of the application obtains a model configuration file for model deployment, creates a model service based on the model configuration file, deploys the model service into at least one target container in a pre-created container cluster, and starts running the model service in the target container, so that an external service accesses the model service through a pre-deployed service application architecture and the model access interface on the target container pre-allocated for the model service. On this basis, a model service is created from the model configuration file; after the model service is created, a container cluster is created using an open source container orchestration tool, and the model service is deployed into target containers of the cluster based on containerization technology so as to start running there. In addition, a model access interface is assigned to the model service so that external services can access it through the pre-deployed service application architecture and the model access interface. Therefore, based on an open source container orchestration tool and container technology, a more convenient and efficient deep learning model deployment service is provided to users, high reliability, high performance, and easy management of model deployment are ensured, and the efficiency and reliability of model deployment are further improved.
In a possible implementation manner, the creation module 602 is further configured to:
acquiring a predefined container configuration file, wherein the container configuration file includes image parameter information and the definition and parameters of the container;
a container cluster is created from the container configuration file.
In a possible implementation manner, the deployment module 603 is specifically configured to:
acquiring predefined service node information;
determining the node number corresponding to the model service and the node type required by the model service according to the service node information;
and screening at least one target container from the container cluster according to the node number and the node type, and respectively deploying the model service into each target container.
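The screening step performed by the deployment module can be sketched as follows; a minimal illustration in which the container record fields ("nodeType", "state") and the sample cluster are hypothetical assumptions.

```python
def screen_target_containers(cluster, node_type, node_number):
    """Screen target containers from the container cluster: keep idle containers
    on nodes of the required type, then take as many as the service needs."""
    candidates = [c for c in cluster
                  if c["nodeType"] == node_type and c["state"] == "idle"]
    if len(candidates) < node_number:
        raise RuntimeError("not enough matching containers in the cluster")
    return candidates[:node_number]  # one deployment per target container

cluster = [
    {"name": "c1", "nodeType": "gpu", "state": "idle"},
    {"name": "c2", "nodeType": "cpu", "state": "idle"},
    {"name": "c3", "nodeType": "gpu", "state": "busy"},
    {"name": "c4", "nodeType": "gpu", "state": "idle"},
]
targets = screen_target_containers(cluster, "gpu", 2)
```

The model service would then be deployed into each returned container in turn, matching the "deploy into each target container respectively" step above.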
In a possible implementation manner, the deployment module 603 is further configured to:
acquiring a predefined service mapping rule and a predefined routing rule, wherein the service mapping rule is used for representing the corresponding relation between a gateway domain name and a model service;
determining a gateway domain name corresponding to the model service based on the service mapping rule, and determining a routing address corresponding to the model service based on the routing rule;
and determining the service access address of the model service according to the gateway domain name and the routing address.
In a possible implementation manner, the deployment module 603 is further configured to:
sending a service inference request to the service application architecture by an external service, wherein the service inference request includes the service access address of the model service to be accessed;
accessing the model service to be accessed by the service application architecture through the model access interface and based on the service access address.
In a possible implementation manner, the deployment module 603 is further configured to:
acquiring a plurality of probes, and correspondingly adding each probe into each container in the container cluster;
and detecting the port of the target container by using a target probe corresponding to the target container, and monitoring the working state of the target container.
The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
The embodiment of the application further provides an electronic device 700. As shown in fig. 7, which is a schematic structural diagram of the electronic device 700 provided in the embodiment of the application, the electronic device includes: a processor 701 and a memory 702, and optionally a bus 703. The memory 702 stores machine-readable instructions executable by the processor 701; when the electronic device 700 is in operation, the processor 701 communicates with the memory 702 via the bus 703, and the machine-readable instructions, when executed by the processor 701, perform the steps of the model deployment method described above.
Embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the model deployment method according to any of the preceding claims.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the method embodiments, which are not described in detail in this application. In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, and the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, and for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, indirect coupling or communication connection of devices or modules, electrical, mechanical, or other form.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied, in essence or in the part contributing to the prior art or in part, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disk, or other various media capable of storing program code.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes or substitutions are covered in the protection scope of the present application.

Claims (10)

1. A method of model deployment, comprising:
obtaining a model configuration file for model deployment;
creating a model service based on the model configuration file;
deploying the model service into at least one target container in a pre-created container cluster, and starting to run the model service in the target container, so that external services access the model service through a pre-deployed service application architecture and a model access interface on the target container pre-allocated for the model service.
2. The method of claim 1, wherein prior to deploying the model service into at least one target container in a pre-created container cluster, further comprising:
acquiring a predefined container configuration file, wherein the container configuration file includes image parameter information and the definition and parameters of the container;
and creating the container cluster according to the container configuration file.
3. The method of claim 1, wherein deploying the model service into at least one target container in a pre-created cluster of containers comprises:
acquiring predefined service node information;
Determining the node number corresponding to the model service and the node type required by the model service according to the service node information;
and screening at least one target container from the container cluster according to the node number and the node type, and respectively deploying the model service into each target container.
4. The method of claim 1, wherein after deploying the model service into at least one target container in a pre-created container cluster, further comprising:
acquiring a service access address allocated for the model service in the process of creating the model service, and pushing the service access address to the service application architecture using a message queue;
and distributing a model access interface for the model service, and pushing the model access interface to the service application architecture.
5. The method of claim 4, wherein before the acquiring of the service access address allocated for the model service in the process of creating the model service, the method further comprises:
acquiring a predefined service mapping rule and a predefined routing rule, wherein the service mapping rule is used for representing the corresponding relation between a gateway domain name and a model service;
Determining a gateway domain name corresponding to the model service based on the service mapping rule, and determining a routing address corresponding to the model service based on the routing rule;
and determining the service access address of the model service according to the gateway domain name and the routing address.
6. The method of claim 1, wherein after initiating running the model service in the target container, further comprising:
sending a service inference request to the service application architecture by an external service, wherein the service inference request includes the service access address of the model service to be accessed;
and accessing the model service to be accessed by the service application architecture through the model access interface and based on the service access address.
7. The method according to claim 1, wherein the method further comprises:
acquiring a plurality of probes, and correspondingly adding each probe into each container in the container cluster;
and detecting a port of the target container by using a target probe corresponding to the target container, and monitoring the working state of the target container.
8. A model deployment apparatus, comprising:
The acquisition module is used for acquiring a model configuration file for model deployment;
the creation module is used for creating a model service based on the model configuration file;
the deployment module is used for deploying the model service into at least one target container in a pre-created container cluster, and starting to run the model service in the target container so that an external service accesses the model service through a pre-deployed service application architecture and a model access interface on the target container which is pre-allocated for the model service.
9. An electronic device, comprising: a processor and a memory storing machine readable instructions executable by the processor to perform the steps of the model deployment method according to any one of claims 1 to 7 when the electronic device is running.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the model deployment method according to any of claims 1 to 7.
CN202311702833.8A 2023-12-12 2023-12-12 Model deployment method and device, electronic equipment and storage medium Pending CN117785224A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311702833.8A CN117785224A (en) 2023-12-12 2023-12-12 Model deployment method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117785224A true CN117785224A (en) 2024-03-29

Family

ID=90386346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311702833.8A Pending CN117785224A (en) 2023-12-12 2023-12-12 Model deployment method and device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination