CN113222174B - Model management method and device - Google Patents

Model management method and device

Info

Publication number
CN113222174B
CN113222174B · Application CN202110444274.XA
Authority
CN
China
Prior art keywords
model
container
version
gpu
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110444274.XA
Other languages
Chinese (zh)
Other versions
CN113222174A (en)
Inventor
李作伟
杨军
陈挺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wanyi Technology Co Ltd
Original Assignee
Wanyi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wanyi Technology Co Ltd filed Critical Wanyi Technology Co Ltd
Priority to CN202110444274.XA priority Critical patent/CN113222174B/en
Publication of CN113222174A publication Critical patent/CN113222174A/en
Application granted granted Critical
Publication of CN113222174B publication Critical patent/CN113222174B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/65 Updates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45583 Memory management, e.g. access or allocation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application discloses a model management method and device. The method may include the following steps: obtaining a storage resource, wherein at least one model is stored in the storage resource, the model storage address of each of the at least one model is mounted in a corresponding container, and each container stores one model; and when a first model in the at least one model stored in the storage resource is updated from a first version to a second version, updating the first version of the first model stored in the first container to the second version according to a preset update frequency and the model storage address mounted in the first container corresponding to the first model. By implementing the embodiment of the application, models can be upgraded in batches and the update flow is simplified.

Description

Model management method and device
Technical Field
The present application relates to the field of machine learning, and in particular, to a model management method and apparatus.
Background
TensorFlow is an end-to-end open-source machine learning platform. It has a comprehensive and flexible ecosystem of tools, libraries, and community resources that helps researchers advance machine learning techniques and enables developers to easily build and deploy applications powered by machine learning. A model trained with TensorFlow can be deployed by means of TensorFlow-serving. In the prior art, when multiple models coexist and their versions need to be updated, each model is deployed in its own container, the containers are mounted to the host disk, and the operator has to log in to the host repeatedly to replace the model in each container in order to update the versions of the multiple models. This makes the update flow overly cumbersome.
Disclosure of Invention
The embodiment of the application provides a model management method and device, which can update models in batches in real time and simplify the update flow.
In a first aspect, an embodiment of the present application provides a method for managing a model, where the method includes:
Obtaining a storage resource, wherein at least one model is stored in the storage resource, the model storage addresses of the at least one model are respectively mounted in corresponding containers, and each container stores one model;
When a first model in the at least one model stored in the storage resource is updated from a first version to a second version, updating the first version of the first model stored in a first container to the second version according to a preset update frequency and the model storage address mounted in the first container corresponding to the first model.
In a second aspect, an embodiment of the present application provides a model management apparatus, including:
An acquisition module, configured to acquire a storage resource, wherein at least one model is stored in the storage resource, the model storage addresses of the at least one model are respectively mounted in corresponding containers, and each container stores one model;
A processing module, configured to, when a first model in the at least one model stored in the storage resource is updated from a first version to a second version, update the first version of the first model stored in a first container to the second version according to a preset update frequency and the model storage address mounted in the first container corresponding to the first model.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory, the processor and the memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions which, when executed by the processor, cause the processor to perform the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program comprising instructions for performing the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising: computer program code which, when run on a computer, causes the computer to perform the method according to the first aspect.
It can be seen that, in the embodiment of the application, in addition to being stored in a container, each model is stored in the storage resource, and the storage address of the model in the storage resource is mounted in the container that stores the same model. When the model versions in the storage resource are updated in batches, the corresponding models in the containers are updated synchronously according to the preset update frequency and the model storage addresses mounted in the containers, so that the update flow is simplified.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and that other drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a schematic flow chart of a model management method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a K8s infrastructure to which the model management method according to the embodiment of the present application may be applied;
FIG. 3 is a schematic diagram of a model management device according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
The terms "comprising" and "having" and any variations thereof in the description and claims of the application and in the foregoing drawings are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The electronic device may include a terminal or a server, and embodiments of the present application are not limited in this respect. Terminals include various handheld devices, vehicle-mounted devices, wearable devices (e.g., smart watches, smart bracelets, pedometers, etc.), computing devices, or other processing devices connected to a wireless modem and having wireless communication capability, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and the like. For convenience of description, the above-mentioned devices are collectively referred to as electronic devices.
The technical terms and concepts related to the embodiments of the present application will be described first.
(1)TensorFlow
TensorFlow is an end-to-end open source machine learning platform. It has a comprehensive and flexible ecosystem that contains various tools, libraries and community resources that can assist researchers in the development of advanced machine learning techniques and enable developers to easily build and deploy applications supported by machine learning.
(2)TensorFlow-serving
TensorFlow-serving is used to bring models that have been trained, validated, and used for prediction with TensorFlow directly online and to provide them as services. That is, TensorFlow-serving is used to deploy well-trained TensorFlow models.
(3) Container
Containerization is a method of software development by which an application or service, its dependencies, and its configuration (abstracted as a deployment manifest file) are packaged together into a container image. The containerized application can be tested as a unit and deployed to a host operating system (OS) as a container image instance. That is, a container contains a complete runtime environment: the application itself, plus all the dependencies, class libraries, other binaries, configuration files, and so on that it requires, are packed into a single package called a container image. By containerizing the application and its dependencies, differences between operating system releases and other underlying environments are abstracted away. It should be noted that, unless otherwise specified, the containers in the embodiments of the present application are Docker containers by default.
(4)Docker
Docker is a tool for creating containers. It is an open-source application container engine that allows developers to package their applications and dependencies into a portable image and then release it onto any popular Linux or Windows machine, and it can also implement virtualization. Docker creates containers from images, and the containers are fully sandboxed and have no interfaces to each other. Application scenarios of Docker include automated packaging and publishing of Web applications, automated testing and continuous integration and publishing, deploying and tuning databases or other background applications in a service-like environment, and so on. With Docker, an application can be decoupled from the infrastructure, enabling rapid software delivery.
(5)kubernetes
Kubernetes, K8s for short, is an open-source system for managing containerized applications on multiple hosts in a cloud platform. The goal of Kubernetes is to make deploying containerized applications simple and efficient; K8s provides mechanisms for application deployment, planning, updating, and maintenance. K8s can be used to deploy applications rapidly, scale applications rapidly, roll out new application features seamlessly, save resources, optimize the use of hardware resources, and so on.
(6) Node
A node is the smallest unit of computing hardware in Kubernetes. It is the representation of a single machine in the Kubernetes cluster. In most systems, nodes include, but are not limited to, physical machines in a data center or virtual machines hosted on a cloud platform. Nodes in Kubernetes fall into two classes: master nodes (Master Node) and compute nodes (Node).
(7)Pod
Pod is the most basic operating unit of K8s. One Pod represents a process running in the cluster and internally encapsulates one or more closely related containers. Pod is also the replication unit of Kubernetes: when a single Pod instance cannot carry the load, Kubernetes can be configured to deploy additional copies of the Pod into the cluster as necessary.
(8)Nginx
Nginx is a free, open-source, high-performance HTTP server and reverse proxy server; it is also an IMAP, POP3, and SMTP proxy server. Nginx can be used as an HTTP server for publishing websites, and it can also be used as a reverse proxy to implement load balancing. A proxy is a representative or intermediary involving two roles, a proxy role and a target role; the process in which the target role is accessed through the proxy to accomplish a task is referred to as the proxy operation process. For example, when a user purchases a product from a brand specialty store, the store acts as the proxy, the proxied role is the brand vendor, and the target role is the user. Before describing the reverse proxy, the forward proxy is described first. A forward proxy is a server located between the client that sends a request and the origin server; to retrieve content from the origin server, the client sends a request to the proxy specifying the target (the origin server), and the proxy forwards the request to the origin server and returns the retrieved content to the client. A reverse proxy, in contrast, acts on behalf of the server side: the proxy receives requests instead of the server, and it is mainly used to hide the information of the servers when a server cluster is deployed in a distributed manner. For example, after multiple clients send requests to the server side, the Nginx server receives the requests and distributes them to back-end service processing servers according to certain rules. In this process, the source of a request, i.e., the client, is explicit, but it is not clear which server actually handles the request; here Nginx plays the reverse proxy role. When Nginx is used as a reverse proxy server, the requests it receives from clients, i.e., the load, are distributed to different servers for processing according to a certain rule, namely a balancing rule. The process of distributing the requests received by the server side according to rules is called load balancing.
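As a minimal sketch of the balancing rule described above, the following Python snippet distributes incoming requests across a set of back-end servers in round-robin fashion; the server addresses and the simulated request loop are assumptions used purely for illustration.

```python
from itertools import cycle

# Hypothetical back-end servers behind the reverse proxy (illustrative only).
BACKENDS = ["10.0.0.1:8501", "10.0.0.2:8501", "10.0.0.3:8501"]

def round_robin(backends):
    """Yield back-end addresses one by one, cycling forever (a simple balancing rule)."""
    return cycle(backends)

if __name__ == "__main__":
    picker = round_robin(BACKENDS)
    # Simulate ten incoming client requests being spread across the back ends.
    for request_id in range(10):
        backend = next(picker)
        print(f"request {request_id} -> {backend}")
```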
(9)GPU
A GPU, i.e., a graphics processing unit, also known as a display core, visual processor, or display chip, is a microprocessor that performs image- and graphics-related operations on personal computers, workstations, game consoles, and some mobile devices (e.g., tablet computers, smartphones, etc.).
Fig. 1 is a schematic flow chart of a model management method according to an embodiment of the present application.
Step S101: and obtaining a storage resource, wherein at least one model is stored in the storage resource, the model storage addresses of the at least one model are respectively mounted in corresponding containers, and each container is stored with one model.
The storage resource may be a local storage resource, such as a hard disk, or a network storage resource, such as Cloud Object Storage (COS), Object Storage Service (OBS), Alibaba Cloud Object Storage Service (OSS), or the like.
FIG. 2 shows a K8s infrastructure to which the model management method according to the embodiment of the present application can be applied.
The Kubernetes cluster 203 consists of a master node, namely Master Node 210, and at least one compute node, namely Node. Master Node 210 includes an API Server component 2101, a Scheduler component 2102, and a Controller Manager component 2103. The API Server component 2101 mainly provides functions such as authentication and authorization, running a group of admission controllers, and managing API versions; it provides services externally as the external interface of the whole system for clients and other components to call, and allows the various components to create, read, write, update, and monitor resources (e.g., Pod). The Scheduler component 2102 is configured to select a suitable Node on which to create a Pod according to the cluster resources and state, that is, to schedule resources in the cluster. The Controller Manager component 2103 acts as the management and control center of the cluster and is responsible for managing nodes, Pod replicas, service endpoints (Endpoints), namespaces (Namespace), service accounts (ServiceAccount), resource quotas (ResourceQuota), and the like; when a Node goes down unexpectedly, it discovers the failure in time and executes an automatic repair flow. The Controller Manager contains a plurality of controllers, each of which is responsible for a particular control flow. The Nodes in FIG. 2 include Node 204X, Node 204Y, and Node 204Z, and the embodiment of the present application is explained in detail using Node 204X as an example. Node 204X includes one or more Pods (e.g., Pod 205A, Pod 205B), Docker 209, and Kubelet 211. Docker 209 is used to create containers; Docker contains an image repository in which there are one or more different Docker images, and different Docker images are used to create different containers. Kubelet 211 is primarily responsible for managing the Pods assigned to the Node on which it resides (here Node 204X), including their creation, modification, monitoring, deletion, and the like. A Pod includes one or more containers, such as container 206A in Pod 205A and container 206B in Pod 205B. A container includes one or more models and TensorFlow-serving, such as model one 208A and TensorFlow-serving 207A in container 206A, and model two 208B and TensorFlow-serving 207B in container 206B. The TensorFlow-serving in a container may come from a Docker image containing TensorFlow-serving.
The storage resource 201 is mounted to Kubernetes 203, so that Kubernetes 203 can obtain the contents stored in the storage resource 201 through the mount; for example, Kubernetes 203 can obtain model one 202A and model two 202B stored in the storage resource 201. Each model in the storage resource has a corresponding alias and version number. The storage address of model one 202A in the storage resource 201 is mounted into the corresponding container 206A, for example under the /models directory of container 206A; the storage address of model two 202B in the storage resource 201 is mounted into the corresponding container 206B, for example under the /models directory of container 206B.
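A minimal sketch of this layout, assuming a hypothetical shared storage path and model names, is shown below: each model gets its own directory in the storage resource with numbered version subdirectories (the layout TensorFlow-serving scans for versions), and that directory is what gets mounted at /models/<model name> inside the corresponding container.

```python
from pathlib import Path

# Hypothetical root of the shared storage resource (assumption for illustration).
STORAGE_ROOT = Path("/data/shared_models")

def prepare_model_dir(model_name, version):
    """Create <storage>/<model_name>/<version>/ — the versioned layout that
    TensorFlow Serving scans when the model directory is mounted at /models/<model_name>."""
    version_dir = STORAGE_ROOT / model_name / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    return version_dir

if __name__ == "__main__":
    # e.g. /data/shared_models/model_one/1 mounted into container 206A at /models/model_one
    print(prepare_model_dir("model_one", 1))
    print(prepare_model_dir("model_two", 1))
```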
It should be noted that one Node, such as Node 204X, may include one or more Pods; as shown in FIG. 2, Node 204X includes Pod 205A and Pod 205B. One Pod may contain one or more containers, and one container may contain one or more models. The embodiment of the present application takes one Pod containing one model as an example, which does not constitute any limitation.
Step S102: when a first model in at least one model stored in the storage resource is updated from a first version to a second version, updating the first version of the first model stored in the first container to the second version according to a preset updating frequency and a model storage address in the first container corresponding to the first model.
When a first model in the storage resource needs to be updated, a new version of the first model, i.e., the second version of the first model, is uploaded to the storage resource. After the TensorFlow-serving in the container storing the first model detects that the model version has changed, the first version of the first model stored in the container is updated to the second version according to the model storage address, mounted in the container, of the first model in the storage resource.
If the first model stored in the container is in a running state when its first version is updated to the second version, then after the first model in the storage resource has been updated from the first version to the second version, the first model stored in the container is also updated from the first version to the second version. At this point, however, the version of the first model that is actually running is still the first version, even though both the copy in the storage resource and the copy in the container have been updated to the second version. Therefore, the running first model may be updated from the first version to the second version according to the preset update frequency. The preset update frequency can be set by assigning a value to the update-frequency variable; for example, setting --file_system_poll_wait_seconds=60 indicates that the configuration is refreshed every 60 seconds, where refreshing the configuration includes updating the model. That is, according to the preset update frequency, the running model is updated to the model stored in the container, thereby realizing real-time updating of the model.
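As a sketch of how such a polling interval could be set when the serving process is started, the snippet below launches tensorflow_model_server with the flag mentioned above; the model name, ports, and base path are assumptions for illustration, and in the embodiment the process would normally be started inside the container image rather than by a separate script.

```python
import subprocess

# Assumed values for illustration; in the embodiment the directory under /models
# is the mount point of the model's storage address in the storage resource.
MODEL_NAME = "model_one"
MODEL_BASE_PATH = "/models/model_one"

cmd = [
    "tensorflow_model_server",
    "--rest_api_port=8501",
    f"--model_name={MODEL_NAME}",
    f"--model_base_path={MODEL_BASE_PATH}",
    # Poll the (mounted) file system every 60 seconds so that a new numbered
    # version directory is picked up and the running model is switched over.
    "--file_system_poll_wait_seconds=60",
]

# Start the serving process; it keeps running and reloads any new versions it finds.
subprocess.Popen(cmd)
```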
For example, if multiple models in the storage resource need to be updated at the same time, the second versions of the multiple models can be uploaded to the storage resource in batches. As shown in FIG. 2, if both model 208A and model 208B need to be updated and both are running, the second version of model 208A and the second version of model 208B may be uploaded to the storage resource 201 at the same time. For ease of description, the second version of model 208A is referred to as model 208A' and the second version of model 208B is referred to as model 208B' in this example. After models 208A' and 208B' are stored in the storage resource 201, TensorFlow-serving 207A in container 206A and TensorFlow-serving 207B in container 206B detect that the model versions have changed; model 208A is therefore updated to 208A' based on the storage address of model 202A mounted in container 206A, and likewise model 208B is updated to 208B' based on the storage address of model 202B mounted in container 206B, where model 208A' is identical to model 202A' and model 208B' is identical to model 202B'. The running models 208A and 208B are then updated to 208A' and 208B' according to the preset update frequency, thereby enabling batch real-time updating of the multiple models. By storing the models in the storage resource and mounting each model's storage address in the container that stores the same model, when the model versions in the storage resource are updated in batches, the corresponding models in the containers can be updated synchronously according to the preset update frequency and the model storage addresses mounted in the containers, so that the update flow is simplified.
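One way to verify that a batch of containers has actually switched over is to poll TensorFlow-serving's model status REST endpoint (GET /v1/models/<model name>), which reports the versions currently loaded. The sketch below assumes each container exposes REST port 8501 at the listed addresses; the addresses, model names, and expected version number are illustrative assumptions.

```python
import requests

# Hypothetical serving endpoints, one per container (assumptions for illustration).
SERVING_ENDPOINTS = {
    "model_one": "http://10.0.0.1:8501",
    "model_two": "http://10.0.0.2:8501",
}

def loaded_versions(host, model_name):
    """Return the version numbers TensorFlow Serving reports as AVAILABLE for a model."""
    resp = requests.get(f"{host}/v1/models/{model_name}", timeout=5)
    resp.raise_for_status()
    states = resp.json().get("model_version_status", [])
    return [int(s["version"]) for s in states if s.get("state") == "AVAILABLE"]

if __name__ == "__main__":
    expected_version = 2  # the "second version" in the example above
    for name, host in SERVING_ENDPOINTS.items():
        versions = loaded_versions(host, name)
        print(f"{name}: loaded versions {versions}, updated={expected_version in versions}")
```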
In one possible implementation, when the model is a GPU model, the corresponding container is created using a Docker image carrying GPU information.
A GPU model refers to a model that requires a GPU for training, loading, and running. Training, loading, and running the model with a GPU can speed up model inference. Docker contains an image repository in which there are a plurality of different Docker images, and a user can select different Docker images according to their own requirements to create different containers. Therefore, when the model is a GPU model, a Docker image carrying GPU information can be selected to create a container carrying GPU information, so that the container can better serve the corresponding model.
For example, as shown in FIG. 2, there is an image repository in Docker 209, and there are multiple different Docker images in that repository. If model one is a GPU model, then when a Docker image is selected from the image repository of Docker 209 to create container 206A, a Docker image carrying GPU information will be selected; for example, a Docker image of the version tensorflow/serving:1.14.0-gpu may be adopted, which indicates that the image carries GPU information and that TensorFlow-serving is included in the image. For other, non-GPU models, besides a Docker image carrying GPU information, a non-GPU version, i.e., a Docker image not carrying GPU information, may also be used. After container 206A is created using the Docker image carrying GPU information, model one 208A, which is a GPU model, can be added to container 206A.
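A minimal sketch of creating such a container with the Docker SDK for Python is shown below, assuming the tensorflow/serving:1.14.0-gpu image, a hypothetical host path for the model, and GPU access through the NVIDIA container runtime; in the embodiment the containers are orchestrated by Kubernetes rather than created by hand like this.

```python
import docker  # Docker SDK for Python (docker-py)

client = docker.from_env()

# Hypothetical host path where the model's storage address is available (assumption).
HOST_MODEL_PATH = "/data/shared_models/model_one"

container = client.containers.run(
    "tensorflow/serving:1.14.0-gpu",           # GPU-enabled serving image
    detach=True,
    environment={"MODEL_NAME": "model_one"},    # the image serves /models/${MODEL_NAME}
    volumes={HOST_MODEL_PATH: {"bind": "/models/model_one", "mode": "ro"}},
    ports={"8501/tcp": 8501},                   # REST API port
    device_requests=[                           # request GPU access (needs NVIDIA runtime)
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    ],
)
print(container.short_id)
```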
In one possible implementation, the query rate per second of the Nginx server is obtained, and the Pod processing type is determined based on the comparison result of the query rate per second and a preset threshold, wherein the Pod processing type includes a capacity expansion operation, a capacity contraction operation, and an operation of keeping the number of Pods unchanged, and the Pod is the carrier of the container.
In the prior art, an HPA plug-in may be included in a Node. The HPA plug-in monitors the usage of memory resources and CPU resources, so that whether to expand, contract, or keep the number of Pods unchanged can be determined according to the usage of memory and CPU resources, thereby ensuring the stability of the system service and reducing resource waste. However, for a model that uses a GPU, that is, for containers and Pods that use a GPU, the fluctuation in demand for CPU and memory resources is negligible, and what is mainly consumed are GPU resources and graphics card memory, i.e., video memory resources; therefore, elastic scaling of Pods based on monitoring memory and CPU usage cannot meet the actual needs. In the prior art, the usage of GPU resources cannot be obtained directly, so in the embodiment of the present application, the query rate per second of the Nginx server, i.e., the number of requests received by the Nginx server per second, is obtained and compared with a preset threshold to determine the Pod operation type. The query rate per second indirectly stands in for the usage of GPU resources and determines the elastic scaling operation. When the number of requests is smaller than the preset threshold or below the preset threshold range, a capacity contraction operation is performed on the Pods; when the number of requests is larger than the preset threshold or above the preset threshold range, a capacity expansion operation is performed on the Pods; in other cases, the number of Pods is kept unchanged.
By way of example, an Nginx server is added above TensorFlow-serving; that is, when a user wants to access a certain GPU model through TensorFlow-serving, the request must pass through the Nginx server, so the Nginx server can obtain the number of requests per second and provide it to the HPA plug-in. The HPA plug-in can then automatically perform elastic scaling of Pods based on the query rate per second provided by Nginx and the preset threshold or preset threshold range, thereby realizing elastic scaling oriented to GPU usage.
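The sketch below illustrates this scaling decision in Python: it compares an observed query rate per second against an assumed threshold range and, using the Kubernetes Python client, patches the replica count of a Deployment accordingly. The deployment name, namespace, thresholds, and observed QPS value are assumptions; in practice this logic would be driven by the HPA plug-in consuming a custom metric exported from Nginx.

```python
from kubernetes import client, config

# Illustrative thresholds and scaling target (assumptions, not values from the patent).
QPS_LOW, QPS_HIGH = 50, 200
DEPLOYMENT, NAMESPACE = "model-one-serving", "default"

def decide(qps):
    """Return +1 to scale out, -1 to scale in, 0 to keep the Pod count unchanged."""
    if qps > QPS_HIGH:
        return 1
    if qps < QPS_LOW:
        return -1
    return 0

def apply(qps):
    config.load_kube_config()  # or load_incluster_config() when running inside the cluster
    apps = client.AppsV1Api()
    scale = apps.read_namespaced_deployment_scale(DEPLOYMENT, NAMESPACE)
    replicas = max(1, scale.spec.replicas + decide(qps))
    apps.patch_namespaced_deployment_scale(
        DEPLOYMENT, NAMESPACE, {"spec": {"replicas": replicas}}
    )
    print(f"qps={qps}, replicas -> {replicas}")

if __name__ == "__main__":
    apply(qps=250.0)  # hypothetical observed QPS from the Nginx access statistics
```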
In one possible implementation, after the GPU model is loaded successfully, the amount of GPU resources used by the GPU model is limited by setting a first threshold.
For example, as shown in FIG. 2, model one 208A is a GPU model. If the first threshold is set to a GPU resource value (e.g., 1 G), model one 208A uses at most 1 G of GPU resources in the K8s cluster; or, if the first threshold is set to a percentage (e.g., 20%) of the GPU resources, the TensorFlow-serving running model one 208A can use at most 20% of the GPU resources in the K8s cluster. If the TensorFlow-serving 207A running model one 208A were not limited in its use of GPU resources, TensorFlow-serving 207A would by default occupy all the GPU resources in the K8s cluster. The embodiment of the application prevents TensorFlow-serving from occupying all the GPU resources by setting the first threshold, so that the GPU can be used to deploy multiple TensorFlow-serving instances at the same time, and the utilization of GPU resources is improved.
In one possible implementation, after the GPU model is loaded successfully, the amount of video memory resources used by the GPU model is limited by setting a second threshold.
For example, as shown in FIG. 2, model one 208A is a GPU model. If the second threshold is set to a video memory value (e.g., 1 G), model one 208A uses at most 1 G of video memory in the K8s cluster; or, if the second threshold is set to a percentage (e.g., 20%) of the video memory resources, the TensorFlow-serving running model one 208A can use at most 20% of the video memory resources in the K8s cluster. If the TensorFlow-serving 207A running model one 208A were not limited in its use of video memory, TensorFlow-serving 207A would by default occupy all the video memory in the K8s cluster. The embodiment of the application prevents TensorFlow-serving from occupying all the video memory by setting the second threshold, so that the video memory can be used to deploy multiple TensorFlow-serving instances at the same time, and the utilization of video memory resources is improved.
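One concrete mechanism that matches the second-threshold idea is TensorFlow-serving's --per_process_gpu_memory_fraction flag, which caps the share of a card's video memory that one serving process may allocate so that several processes can coexist on the same GPU. The sketch below starts a server with a 20% cap; the model name and paths are assumptions, and the patent does not tie its thresholds to this particular flag.

```python
import subprocess

# Assumed model and mount point (illustrative only).
MODEL_NAME = "model_one"
MODEL_BASE_PATH = "/models/model_one"

cmd = [
    "tensorflow_model_server",
    "--rest_api_port=8501",
    f"--model_name={MODEL_NAME}",
    f"--model_base_path={MODEL_BASE_PATH}",
    # Cap this process at roughly 20% of the GPU's video memory so that
    # multiple TensorFlow-serving processes can share the same card.
    "--per_process_gpu_memory_fraction=0.2",
]

subprocess.Popen(cmd)
```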
The following describes an apparatus according to an embodiment of the present application with reference to the drawings.
Fig. 3 is a schematic diagram of a model management device according to an embodiment of the present application. The model management apparatus 300 includes:
The obtaining module 301 is configured to obtain a storage resource, where at least one model is stored in the storage resource, and model storage addresses of the at least one model are respectively mounted in corresponding containers, and each container stores one model;
The processing module 302 is configured to, when a first model in the at least one model stored in the storage resource is updated from a first version to a second version, update the first version of the first model stored in the first container to the second version according to a preset update frequency and the model storage address mounted in the first container corresponding to the first model.
Optionally, in the case that the model is a GPU model, the corresponding container is created using a Docker image carrying GPU information.
Optionally, the obtaining module 301 is further configured to obtain the query rate per second of the Nginx server;
the processing module 302 is further configured to determine a Pod processing type based on the comparison result of the query rate per second and a preset threshold, where the Pod processing type includes a capacity expansion operation, a capacity contraction operation, and an operation of keeping the number of Pods unchanged, and the Pod is a carrier of the container.
Optionally, the processing module 302 is further configured to limit, after the GPU model is loaded successfully, the amount of GPU resources used by the GPU model by setting a first threshold.
Optionally, the processing module 302 is further configured to limit, after the GPU model is loaded successfully, the amount of video memory resources used by the GPU model by setting a second threshold.
For the specific implementation of the functions of the model management apparatus 300, reference may be made to the corresponding method steps in FIG. 1, which are not described herein again.
Fig. 4 is a schematic diagram of an electronic device according to an embodiment of the present application.
The electronic device 100 may include a processor 110, a memory 120, and a communication interface 130; the processor 110, the memory 120, and the communication interface 130 are connected through a bus 140, where the memory 120 is configured to store instructions, and the processor 110 is configured to execute the instructions stored in the memory 120 to implement the method steps corresponding to FIG. 1 above.
The processor 110 is configured to execute the instructions stored in the memory 120 to control the communication interface 130 to receive and transmit signals, thereby completing the steps in the method. The memory 120 may be integrated into the processor 110 or may be provided separately from the processor 110.
As an implementation, the functions of the communication interface 130 may be considered to be implemented by a transceiver circuit or a dedicated chip for transceiving. The processor 110 may be considered to be implemented by a dedicated processing chip, a processing circuit, a processor, or a general-purpose chip.
As another implementation, a general-purpose computer may be used to implement the electronic device provided by the embodiment of the present application. That is, program code implementing the functions of the processor 110 and the communication interface 130 is stored in the memory 120, and a general-purpose processor implements the functions of the processor 110 and the communication interface 130 by executing the code in the memory 120.
For the concepts, explanations, detailed descriptions, and other steps of the technical solutions provided by the embodiments of the present application that relate to the apparatus, reference may be made to the descriptions of these contents in the foregoing methods or in other embodiments, which are not repeated herein.
As another implementation of this embodiment, a computer-readable storage medium is provided, on which instructions are stored, which when executed perform the method in the method embodiment described above.
As another implementation of this embodiment, a computer program product is provided that contains instructions that, when executed, perform the method of the method embodiment described above.
Those skilled in the art will appreciate that only one memory and processor is shown in fig. 4 for ease of illustration. In an actual electronic device, there may be multiple processors and memories. The memory may also be referred to as a storage medium or storage device, etc., and embodiments of the present application are not limited in this respect.
It should be appreciated that, in embodiments of the present application, the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
It should also be understood that the memory referred to in embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
It should be noted that when the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, the memory (storage module) is integrated into the processor.
It should be noted that the memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The bus may include a power bus, a control bus, a status signal bus, and the like in addition to the data bus. But for clarity of illustration, the various buses are labeled as buses in the figures.
It should also be understood that the terms "first", "second", "third", "fourth", and the various numerals referred to herein are merely for convenience of description and are not intended to limit the scope of the application.
It should be understood that the term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided herein.
In various embodiments of the present application, the sequence number of each process does not mean the sequence of execution, and the execution sequence of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative logical blocks (illustrative logical block, abbreviated ILBs) and steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), etc.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (6)

1. A model management method, comprising:
Obtaining a storage resource, wherein at least one model is stored in the storage resource, the model storage addresses of the at least one model are respectively mounted in corresponding containers, and each container stores one model;
when the container detects that a first model in at least one model stored in the storage resource is updated from a first version to a second version, updating the first version of the first model stored in the first container to the second version according to a preset updating frequency and a model storage address in the first container corresponding to the first model;
under the condition that the model is a GPU model, creating a corresponding container by using a Docker image carrying GPU information;
Acquiring the query rate of the Nginx server per second;
and determining a Pod processing type based on a comparison result of the query rate per second and a preset threshold, wherein the Pod processing type comprises a capacity expansion operation, a capacity contraction operation, and an operation of keeping the number of Pods unchanged, and the Pod is a carrier of the container.
2. The method according to claim 1, wherein the method further comprises:
after the GPU model is loaded successfully, limiting the amount of GPU resources used by the GPU model by setting a first threshold.
3. The method according to claim 1, wherein the method further comprises:
after the GPU model is loaded successfully, limiting the amount of video memory resources used by the GPU model by setting a second threshold.
4. An apparatus for model management, comprising:
an acquisition module, configured to acquire a storage resource, wherein at least one model is stored in the storage resource, the model storage addresses of the at least one model are respectively mounted in corresponding containers, and each container stores one model;
the processing module is used for updating the first version of the first model stored in the first container into the second version according to the preset updating frequency and the model storage address in the first container corresponding to the first model when the first model in the at least one model stored in the storage resource is updated from the first version into the second version;
the processing module is further configured to create a corresponding container by using a Docker image carrying GPU information under the condition that the model is a GPU model;
the acquisition module is also used for acquiring the query rate per second of the Nginx server;
the processing module is further configured to determine a Pod processing type based on a comparison result of the query rate per second and a preset threshold, wherein the Pod processing type comprises a capacity expansion operation, a capacity contraction operation, and an operation of keeping the number of Pods unchanged, and the Pod is a carrier of the container.
5. An electronic device, comprising:
a processor and a memory, the processor and the memory being interconnected, wherein the memory is adapted to store a computer program, the computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-3.
6. A computer-readable storage medium, comprising:
Stored in the computer readable storage medium are instructions which, when run on a computer, implement the method according to any one of claims 1-3.
CN202110444274.XA 2021-04-23 2021-04-23 Model management method and device Active CN113222174B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110444274.XA CN113222174B (en) 2021-04-23 2021-04-23 Model management method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110444274.XA CN113222174B (en) 2021-04-23 2021-04-23 Model management method and device

Publications (2)

Publication Number Publication Date
CN113222174A CN113222174A (en) 2021-08-06
CN113222174B true CN113222174B (en) 2024-04-26

Family

ID=77088566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110444274.XA Active CN113222174B (en) 2021-04-23 2021-04-23 Model management method and device

Country Status (1)

Country Link
CN (1) CN113222174B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3346379A1 (en) * 2017-01-09 2018-07-11 Sap Se Database management system with dynamic allocation of database requests
CN109508238A (en) * 2019-01-05 2019-03-22 咪付(广西)网络技术有限公司 A kind of resource management system and method for deep learning
CN109582441A (en) * 2018-11-30 2019-04-05 北京百度网讯科技有限公司 For providing system, the method and apparatus of container service
CN110414687A (en) * 2019-07-12 2019-11-05 苏州浪潮智能科技有限公司 A kind of method and apparatus for the training of deep learning frame distribution
CN111414233A (en) * 2020-03-20 2020-07-14 京东数字科技控股有限公司 Online model reasoning system
CN111522636A (en) * 2020-04-03 2020-08-11 安超云软件有限公司 Application container adjusting method, application container adjusting system, computer readable medium and terminal device
CN111580926A (en) * 2020-03-25 2020-08-25 中国平安财产保险股份有限公司 Model publishing method, model deploying method, model publishing device, model deploying device, model publishing equipment and storage medium
CN112015519A (en) * 2020-08-28 2020-12-01 江苏银承网络科技股份有限公司 Model online deployment method and device
CN112685069A (en) * 2019-10-20 2021-04-20 辉达公司 Real-time updating of machine learning models

Also Published As

Publication number Publication date
CN113222174A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
CN108958796B (en) Service request processing method and device and service request processing system
CN111527474B (en) Dynamic delivery of software functions
WO2017166447A1 (en) Method and device for loading kernel module
US10789111B2 (en) Message oriented middleware with integrated rules engine
US20210011649A1 (en) Apparatus, systems, articles of manufacture, and methods for data lifecycle management in an edge environment
CN111984269A (en) Method for providing application construction service and application construction platform
CN110968331B (en) Method and device for running application program
CN111984270A (en) Application deployment method and system
EP4113298A1 (en) Task scheduling method, computing device and storage medium
CN112015448A (en) System upgrading method and device based on over-the-air technology
CN107798064A (en) Page processing method, electronic equipment and computer-readable recording medium
CN116028163A (en) Method, device and storage medium for scheduling dynamic link library of container group
CN110083366B (en) Application running environment generation method and device, computing equipment and storage medium
US9501303B1 (en) Systems and methods for managing computing resources
CN113222174B (en) Model management method and device
CN114756261B (en) Container cluster upgrading method and system, electronic equipment and medium
CN116303309A (en) File mounting method and device and electronic equipment
US11847611B2 (en) Orchestrating and automating product deployment flow and lifecycle management
CN115016862A (en) Kubernetes cluster-based software starting method, device, server and storage medium
US11163622B1 (en) Web application implementing a shared set of identifiers for handling links to web application plugins
CN112597406A (en) File transmission method and device, terminal equipment and storage medium
CN114253655A (en) Security container isolation method and device based on MIPS64 instruction set
CN110874238A (en) Online service updating method and device
US20210049025A1 (en) Electronic apparatus and object sharing method thereof
EP3821336B1 (en) Feature installer for software programs

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant