CN116302555A - Instance management method, system, physical host, device and storage medium - Google Patents

Instance management method, system, physical host, device and storage medium

Info

Publication number
CN116302555A
Authority
CN
China
Prior art keywords
cluster
target instance
instance cluster
target
instances
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310320751.0A
Other languages
Chinese (zh)
Inventor
徐浩广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202310320751.0A priority Critical patent/CN116302555A/en
Publication of CN116302555A publication Critical patent/CN116302555A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer And Data Communications (AREA)

Abstract

An embodiment of the invention provides an instance management method, system, physical host, device and storage medium. A target instance cluster for providing a service is deployed on a physical host, and the instances in the cluster use the GPU computing resources of the physical host to provide the service. On this basis, a control device first determines the GPU utilization of the target instance cluster, and then adjusts the number of instances in the cluster according to the GPU utilization and the busy and idle characteristics of the service provided by the target instance cluster. The GPU utilization of the target instance cluster reflects the workload of the cluster, and the busy and idle characteristics of the service reflect the data traffic generation rule corresponding to the service provided by the cluster. The method thus expands and contracts the cluster with reference to both the load pressure of the cluster and the type of service it provides, so that the number of instances adjusted and the timing of the adjustment are more reasonable, and the impact of cluster expansion and contraction on service quality is reduced.

Description

Instance management method, system, physical host, device and storage medium
Technical Field
The present invention relates to the field of cloud computing, and in particular, to an instance management method, system, physical host, device, and storage medium.
Background
Artificial intelligence based on machine learning models is being applied in a growing number of scenarios. For example, a machine learning model can translate between multiple languages. A machine learning model can also detect target objects in images or video, which is widely used in intelligent traffic and intelligent security scenarios. As another example, man-machine conversation can be implemented using machine learning. Each of these functions may be provided to users as a service.
In practice, instance clusters deployed with machine learning models may be used to provide the services described above, and service quality can be ensured by expanding and contracting the instance cluster. However, expansion and contraction usually take a certain amount of time, so reducing their impact on service quality has become a problem to be solved urgently.
Disclosure of Invention
In view of the above, embodiments of the present invention provide an instance management method, system, physical host, device and storage medium for reducing the impact of cluster expansion and contraction on the quality of service.
In a first aspect, an embodiment of the present invention provides an instance management method, including:
determining the graphics processor (GPU) utilization of a target instance cluster, wherein the target instance cluster is deployed on a physical host, and the instances in the target instance cluster provide services by using GPU computing resources provided by the physical host;
and adjusting the number of the instances in the target instance cluster according to the GPU utilization rate and the busy and idle characteristics of the service provided by the target instance cluster.
In a second aspect, an embodiment of the present invention provides an instance management method, including:
in response to operation of a target service provided by a target instance cluster, determining graphics processor GPU utilization of the target instance cluster, the target instance cluster deployed on a physical host, an instance in the target instance cluster providing the target service using GPU computing resources provided by the physical host;
and according to the GPU utilization rate and the busy and idle characteristics of the target service, adjusting the number of the instances in the target instance cluster so that the target instance cluster with the adjusted number provides the target service, wherein the target service comprises at least one of online translation, automatic driving, video identification and man-machine conversation.
In a third aspect, an embodiment of the present invention provides an instance management system, including: the control device and a physical host deployed with a target instance cluster, wherein the instances in the target instance cluster provide services by using GPU computing resources provided by the physical host;
the control device is used for determining the GPU utilization rate of the target instance cluster; and determining whether to adjust the number of the instances in the target instance cluster according to the GPU utilization rate and the busy and idle characteristics of the service provided by the target instance cluster.
In a fourth aspect, an embodiment of the present invention provides a physical host, including:
a control component and a target instance cluster;
the control component is configured to determine a GPU utilization of the target instance cluster, where the instances in the target instance cluster provide services using GPU computing resources provided by the physical host; and determining whether to adjust the number of the instances in the target instance cluster according to the GPU utilization rate and the busy and idle characteristics of the service provided by the target instance cluster.
In a fifth aspect, an embodiment of the present invention provides an instance management apparatus, including:
a utilization determining module, configured to determine the graphics processor (GPU) utilization of a target instance cluster, wherein the target instance cluster is deployed on a physical host, and the instances in the target instance cluster provide services by using GPU computing resources provided by the physical host;
and an adjusting module, configured to adjust the number of the instances in the target instance cluster according to the GPU utilization and the busy and idle characteristics of the service provided by the target instance cluster.
In a sixth aspect, an embodiment of the present invention provides an instance management apparatus, including:
a determining module, configured to determine, in response to operation of a target service provided by a target instance cluster, a graphics processor GPU utilization of the target instance cluster, where the target instance cluster is deployed on a physical host, and an instance in the target instance cluster provides the target service using GPU computing resources provided by the physical host;
and the quantity adjusting module is used for adjusting the quantity of the instances in the target instance cluster according to the GPU utilization rate and the busy and idle characteristics of the target service so that the target instance cluster with the quantity adjusted provides the target service, and the target service comprises at least one of online translation, automatic driving, video identification and man-machine conversation.
In a seventh aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory is configured to store one or more computer instructions, and the one or more computer instructions implement the method for managing an instance in the first aspect or the second aspect when executed by the processor. The electronic device may also include a communication interface for communicating with other devices or communication systems.
In an eighth aspect, embodiments of the present invention provide a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to at least implement an instance management method as in the first or second aspects above.
According to the instance management method provided by the embodiments of the present invention, a target instance cluster for providing a service is deployed on the physical host, and the instances in the cluster use the GPU computing resources of the physical host to provide the service. On this basis, the control device first determines the GPU utilization of the target instance cluster, and then adjusts the number of instances in the cluster according to the GPU utilization and the busy and idle characteristics of the service provided by the target instance cluster. The GPU utilization of the target instance cluster reflects the workload of the cluster. The busy and idle characteristics of the service reflect the data traffic generation rule corresponding to the service provided by the cluster; this rule is also related to the service content, and the data traffic is processed by the instances in the cluster.
The method therefore expands and contracts the cluster with reference to information of two different dimensions, namely the load pressure of the cluster and the type of service it provides. This makes the number of instances adjusted and the timing of the adjustment more reasonable, reduces the impact of cluster expansion and contraction on service quality, and speeds up service restart after expansion or contraction. It can also improve the utilization of GPU resources and reduce the cost of providing the service normally.
From another point of view, the instances in the embodiments of the present invention use GPU computing resources provided by physical devices, and creating such instances generally takes a long time, which lengthens service expansion and restart. Creating a suitable number of instances lets the cluster complete expansion as soon as possible and further reduces the impact of cluster expansion and contraction on service quality; that is, it reduces the service instability caused by spending too long on unreasonably creating a large number of instances.
Drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of an instance management method according to an embodiment of the present invention;
FIG. 2 is a flowchart of the judgment logic performed before obtaining the GPU utilization according to an embodiment of the present invention;
FIG. 3 is a flowchart of another instance management method according to an embodiment of the present invention;
FIG. 4 is a flowchart of another instance management method according to an embodiment of the present invention;
FIG. 5 is a flowchart of another instance management method according to an embodiment of the present invention;
FIG. 6 is a flowchart of another instance management method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an instance management system according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a physical host according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a method according to an embodiment of the present invention applied to a translation scenario;
FIG. 10 is a schematic structural diagram of an instance management device according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of another instance management device according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of another electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The term "plurality" generally includes at least two, but does not exclude the case of at least one.
It should be understood that the term "and/or" as used herein merely describes an association between the associated objects and indicates that three relationships are possible. For example, "A and/or B" may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
The word "if", as used herein, may be interpreted as "when" or "upon" or "in response to a determination" or "in response to an identification", depending on the context. Similarly, the phrase "if determined" or "if identified (stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when identified (stated condition or event)" or "in response to an identification (stated condition or event)", depending on the context.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present invention are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a product or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such product or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in the product or system comprising that element.
Some embodiments of the invention will now be described in detail with reference to the accompanying drawings. In the case where there is no conflict between the embodiments, the following embodiments and features in the embodiments may be combined with each other. In addition, the sequence of steps in the method embodiments described below is only an example and is not strictly limited.
Before the following embodiments of the present invention are described in detail, the background of cluster expansion and contraction is explained further:
The machine learning model can be applied to various scenarios such as translation, intelligent traffic, intelligent security and man-machine conversation. For example, online translation services can be provided to users using machine learning models. Each machine learning model may have one language translation capability, i.e., translating from language A to language B. A machine learning model can also be used to identify target objects in captured security video, where the identified target object may be a person acting illegally. The recognition result can serve as a basis for providing intelligent security services. Captured road video may also be processed by a machine learning model, where the target object may be a vehicle on the road. The recognition result can serve as a basis for providing an automatic driving service in an intelligent traffic scenario.
Optionally, the machine learning model may be deployed in the instances mentioned in the following embodiments of the present invention, and the service provided by an instance corresponds to the function of the machine learning model. Optionally, the instance may specifically be a containerized instance. An instance deployed with a machine learning model can process the data traffic generated when users use a service, so as to ensure that the service is provided normally. To ensure high availability of the service, the service may be provided by an instance cluster comprising a plurality of instances.
In practice, to further guarantee the quality of service, the instance cluster may also be expanded and contracted, i.e., the number of instances in the instance cluster may be adjusted. The method provided by the following embodiments of the present invention can then be used to expand and contract the instance cluster while reducing the influence of the expansion and contraction process on the service quality.
It should be noted that an instance cluster providing any kind of service may be deployed in a physical host. Since the instances mentioned in the following embodiments of the present invention are deployed with a computationally intensive machine learning model, the instances can use the graphics processor (Graphics Processing Unit, GPU for short) provided by the physical host to perform computation; that is, the instances mentioned in the following embodiments are GPU computing instances. Optionally, the program deployed in a GPU computing instance may be the aforementioned computationally intensive machine learning model, or may be another computationally intensive algorithm.
Optionally, the devices or components used for managing the instance clusters in the embodiments of the present invention may be cloud computing nodes in a cloud computing environment, that is, the instance management method provided by the present invention may be performed in the cloud. A computing device used by a user may communicate with a cloud computing node. Cloud computing nodes may also communicate with each other. The computing devices used by the user may include, among others, personal digital assistants (Personal Digital Assistant, PDA for short) or cellular telephones, desktop computers, laptop computers, and/or automobile computer systems. At least one cloud computing node in the cloud computing environment may be physically or virtually grouped in one or more networks, such as a private cloud, a public cloud, a hybrid cloud, or a combination thereof. This allows the cloud computing environment to provide infrastructure, platforms, and/or software as services, without the cloud consumer having to maintain resources on a local computing device.
Based on the above description, FIG. 1 is a flowchart of an instance management method according to an embodiment of the present invention. The method provided by this embodiment can be executed by a control device. As shown in FIG. 1, the method may include the following steps:
S101, determining the graphics processor (GPU) utilization of a target instance cluster, wherein the target instance cluster is deployed on a physical host, and the instances in the target instance cluster provide services by using GPU computing resources provided by the physical host.
At the current time, the control device may obtain the GPU utilization of the target instance cluster deployed on the physical host. The GPU utilization reflects the workload of the cluster. Optionally, the target instance cluster may provide a translation service, an intelligent security service, an intelligent traffic service, or any other service that needs to be implemented with GPU computing instances; the present invention does not limit the service provided by the instance cluster.
S102, adjusting the number of instances in the target instance cluster according to the GPU utilization and the busy and idle characteristics of the service provided by the target instance cluster.
The control device can then expand or contract the target instance cluster according to the GPU utilization of the target instance cluster and the busy and idle characteristics of the service provided by the cluster, that is, adjust the number of instances in the target instance cluster.
If the GPU utilization is greater than a preset upper utilization limit, the workload of the target instance cluster is too large and the cluster has too few instances, so new instances can be created for the target instance cluster; that is, the target instance cluster is expanded. Conversely, if the GPU utilization is less than a preset lower utilization limit, the workload of the target instance cluster is too small and the cluster has more instances than it needs, so some of the instances in the target instance cluster can be deleted; that is, the target instance cluster is contracted.
The number of instances created or deleted may be determined based on the busy and idle characteristics of the service. The busy and idle characteristics may specifically include the data traffic generation rule corresponding to the service provided by the target instance cluster, and this rule is in turn related to the service content.
For example, for a translation service, more users use the service during working periods and fewer during non-working periods, so the data traffic generation rule corresponding to the translation service matches users' working periods. For a man-machine conversation service in daily family life, more users use the service on weekends and during non-working hours on working days, and fewer use it during working hours on working days, so the data traffic generation rule corresponding to the man-machine conversation service matches users' rest periods. Here the user may converse with an online customer service agent or a voice assistant. For an automatic driving service, more users use the service during the morning and evening peaks of working days and on rest days, and fewer use it at other times, so the data traffic generation rule corresponding to the automatic driving service matches users' travel periods.
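The three traffic patterns described above can be sketched as a simple lookup. This is only an illustrative sketch: the function name `is_busy`, the service identifiers, and the concrete hours (09:00 to 18:00 as working hours, 07:00 to 09:00 and 17:00 to 19:00 as rush hours) are assumptions and do not come from the embodiments.

```python
from datetime import datetime

# Assumed hour ranges; the embodiments do not specify concrete times.
WORKING_HOURS = range(9, 18)   # 09:00-17:59 on working days
RUSH_HOURS = (7, 8, 17, 18)    # morning and evening peaks on working days


def is_busy(service: str, when: datetime) -> bool:
    """Return True if `when` falls in the assumed busy period of `service`."""
    working_day = when.weekday() < 5  # Monday=0 .. Friday=4
    if service == "translation":
        # Traffic follows users' working periods.
        return working_day and when.hour in WORKING_HOURS
    if service == "man_machine_conversation":
        # Traffic follows users' rest periods: weekends and off-hours.
        return not (working_day and when.hour in WORKING_HOURS)
    if service == "automatic_driving":
        # Traffic follows users' travel periods: rush hours and rest days.
        return (not working_day) or when.hour in RUSH_HOURS
    raise ValueError(f"unknown service: {service}")
```

A control device could consult such a profile when choosing how many instances to create or delete; how the profile is obtained in practice is left open here.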
Optionally, the control device may periodically determine whether to expand or contract the target instance cluster. That is, the control device may periodically obtain the GPU utilization of the target instance cluster through its own communication interface and periodically adjust the number of instances; the current time in the above steps is then the time at which a utilization acquisition period is reached.
It is easy to understand that, in practice, the following case can also occur: if the GPU utilization lies between the preset upper utilization limit and the preset lower utilization limit, the workload of the target instance cluster is moderate and the number of instances in the target instance cluster need not be adjusted.
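The three utilization cases (above the upper limit, below the lower limit, in between) can be sketched as follows. This is a minimal sketch under stated assumptions: the threshold values, the step sizes, the `minimum` floor, and the name `scaling_delta` are all hypothetical. The embodiments state only that the number of instances created or deleted depends on the busy and idle characteristics of the service, not how it is computed.

```python
UPPER_LIMIT = 0.80  # assumed preset upper utilization limit
LOWER_LIMIT = 0.30  # assumed preset lower utilization limit


def scaling_delta(gpu_utilization: float, busy: bool,
                  current: int, minimum: int = 1) -> int:
    """Return the change in instance count.

    Positive values mean instances to create, negative values mean
    instances to delete, and 0 means no adjustment.
    """
    if gpu_utilization > UPPER_LIMIT:
        # Expand. Assumed policy: add more when the service is in a busy period.
        return 2 if busy else 1
    if gpu_utilization < LOWER_LIMIT:
        # Contract. Assumed policy: remove fewer during a busy period,
        # and never go below the minimum instance count.
        step = 1 if busy else 2
        return -min(step, max(current - minimum, 0))
    # Utilization is between the limits: workload is moderate.
    return 0
```

For instance, under these assumed values `scaling_delta(0.9, True, 4)` asks for two new instances, while `scaling_delta(0.5, True, 4)` leaves the cluster unchanged.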
In this embodiment, the control device determines the GPU utilization of the target instance cluster, and then adjusts the number of instances in the target instance cluster according to the GPU utilization and the busy and idle characteristics of the service provided by the target instance cluster. The GPU utilization of the target instance cluster reflects the workload of the cluster. The busy and idle characteristics of the service reflect the rule by which data traffic is generated for the service provided by the cluster; this rule is also related to the service content, and the data traffic is processed by the instances in the cluster.
The method therefore expands and contracts the cluster with reference to information of two different dimensions, namely the load pressure of the cluster and the type of service it provides. This makes the number of instances adjusted and the timing of the adjustment more reasonable, reduces the impact of cluster expansion and contraction on service quality, and speeds up service restart after expansion or contraction. It can also improve the utilization of GPU resources and reduce the cost of providing the service normally.
From another point of view, the instances in the embodiments of the present invention use GPU computing resources provided by physical devices, and creating such instances generally takes a long time, which lengthens service expansion and restart. Creating a suitable number of instances lets the cluster complete expansion as soon as possible, which reduces the impact of cluster expansion and contraction on service quality; that is, it reduces the service instability caused by spending too long on unreasonably creating a large number of instances.
In practice, at least one instance cluster comprising the target instance cluster described above may be deployed on a physical host, wherein each instance cluster may provide a service. The control device may also first obtain first state information of the physical host before obtaining the GPU utilization, and determine, according to the first state information, whether there is authority to adjust the number of instances in at least one instance cluster in the physical host at the current time. The first state information is used for reflecting whether the physical host opens the adjusting authority of the instance number to the control device at the current time. It can be seen that the first state information of the physical host may be regarded as a global switch of the adjustment authority corresponding to the at least one instance cluster. The control device may be provided with the capability to control a plurality of instance clusters by means of a global switch.
Alternatively, the control device may also periodically acquire the first status information described above, and periodically determine its own adjustment authority. Alternatively, the period of acquiring the first state information may be the same as the period of determining whether to perform the expansion or contraction, for example, 5 minutes.
In practice, a service may have a special period, for example a period in which data traffic surges in an online shopping scenario, or a period in which the service must be stress-tested with the maximum data traffic. In such a period, service availability needs to be ensured with the maximum GPU computing resources, and adjusting the number of GPU computing instances in any instance cluster takes a certain amount of time. The physical host may therefore close the control device's authority to adjust the number of instances during the special period.
In this embodiment, before the control device obtains the GPU utilization of the target instance cluster, the control device may further determine whether the physical host has opened the global switch to the control device, and determine whether to adjust the number of instances in the target instance cluster according to the on/off state of the global switch. The global switch allows the control device to expand and contract the instance clusters in the physical host only in the periods in which this is permitted.
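The ordering described here, checking the global switch before reading the GPU utilization, can be sketched as below. The class `HostState`, its field `adjustment_open`, and the function `poll_utilization` are hypothetical names introduced for illustration; how the first state information is actually represented or transported is not specified in the embodiments.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HostState:
    """Hypothetical stand-in for the first state information of a physical host."""
    adjustment_open: bool  # True if the host opens instance-count adjustment


def poll_utilization(state: HostState,
                     read_gpu_utilization: Callable[[], float]) -> Optional[float]:
    """Read GPU utilization only when the global switch is on.

    Returns None when the host has closed the adjustment authority,
    in which case the utilization is not queried at all in this cycle.
    """
    if not state.adjustment_open:
        return None
    return read_gpu_utilization()
```

A periodic loop (for example every 5 minutes, the period given as an example above) could call `poll_utilization` once per cycle and skip the scaling decision whenever it returns `None`.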
Following the above embodiment, if at the current time the physical host already allows the control device to adjust the number of instances in the at least one instance cluster, that is, the global switch is turned on for the control device, then before obtaining the GPU utilization the control device may further obtain second state information of any instance cluster in the at least one instance cluster, and determine, according to the second state information, whether that instance cluster allows its number of instances to be adjusted at the current time. That instance cluster may be the target instance cluster in the embodiment shown in fig. 1. The second state information reflects whether that instance cluster has opened the adjustment authority for the number of instances to the control device at the current time. It can be seen that the second state information can be regarded as a local switch of the adjustment authority corresponding to that instance cluster.
Optionally, the control device may sequentially traverse the respective second state information of at least one instance cluster deployed in the physical host, and sequentially determine whether the local switch of each instance cluster is turned on.
Optionally, the control device may also periodically acquire the second state information described above to periodically determine its own adjustment authority. Optionally, the period for acquiring the second state information may be the same as the period for acquiring the first state information and the period for determining whether to perform expansion or contraction, for example, 5 minutes.
In practice, since the at least one instance cluster provides different services, some services have a special period of data traffic surge and others do not. Moreover, for services that do have a special period, the corresponding special periods are not necessarily the same. For example, the special period corresponding to a man-machine conversation service with online customer service may be the several promotion periods in a year, while a man-machine conversation service with a voice assistant, or the translation service, the automatic driving service and the like mentioned above, have no obvious special period. Therefore, by setting a local switch for each instance cluster, the granularity at which the control device performs expansion and contraction can be refined to the level of the instance cluster.
It should be noted that, according to actual needs, the global switch and the local switch may both be set, or only one of them may be set. In order to ensure the service quality, in the special period corresponding to an instance cluster, the instance cluster usually closes the adjustment authority for the number of instances to the control device, so as to reduce the influence of expansion and contraction on the service quality.
In this embodiment, before the GPU utilization of the target instance cluster is obtained, the control device may further determine whether the target instance cluster has opened the local switch to the control device, and determine whether to adjust the number of instances in the target instance cluster according to the on/off state of the local switch. That is, the local switch lets the control device expand and contract the instance cluster in different periods, and this more precise timing of expansion and contraction can reduce the influence of cluster scaling on the service quality.
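The two-level permission check described above can be illustrated with a minimal Python sketch. It assumes the first state information reduces to one boolean per physical host and the second state information to one boolean per instance cluster; the class name `PhysicalHost`, the function `adjustable_clusters` and the cluster names are hypothetical and are not part of the described method.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PhysicalHost:
    # Hypothetical stand-in for the first state information (global switch).
    global_switch_on: bool
    # Hypothetical stand-in for the second state information:
    # one local switch per instance cluster deployed on the host.
    local_switch_on: Dict[str, bool] = field(default_factory=dict)


def adjustable_clusters(host: PhysicalHost) -> List[str]:
    """Return the clusters whose instance count may be adjusted right now."""
    if not host.global_switch_on:
        # Global switch closed: no cluster on this host may be scaled.
        return []
    # Traverse the local switch of each cluster in turn.
    return [name for name, is_on in host.local_switch_on.items() if is_on]


host = PhysicalHost(
    global_switch_on=True,
    local_switch_on={"translation": True, "customer_service_chat": False},
)
print(adjustable_clusters(host))
```

In this sketch a closed global switch overrides every local switch, which matches the order in which the text checks the first and then the second state information.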
Still further, the target instance cluster may also be provided with an adjustable period, which may correspond to the busy/idle feature of the service that the target instance cluster provides. The adjustable period may also be understood in conjunction with the following:
The adjustable period is usually a period in which data traffic increases. Continuing the example of the busy/idle feature in the embodiment shown in fig. 1, for the translation service, the adjustable period may be the busy period of the translation service, that is, the working period of the user, and the non-adjustable period may be the idle period of the translation service, that is, the non-working period of the user.
To ensure quality of service, the adjustable period may alternatively be a period that includes the working period but is longer than it. For example, if the working period is 8 a.m. to 5 p.m., the adjustable period may be 7 a.m. to 7 p.m. The data traffic of the translation service is small at night but quickly reaches a peak at 8 a.m., so the control device can, according to the adjustable period, start to expand the instance cluster providing the translation service at 7 a.m. This prevents the expansion and the growth in data traffic from happening at the same time and affecting the quality of the translation service.
Similarly, for a daily man-machine conversation service, the adjustable period may be the weekend, when there are more users, and the non-working period of a workday. Alternatively, the adjustable period may be a longer period that includes the user's rest period. For an automatic driving service, the adjustable period may be the morning and evening peaks of a workday as well as rest days. Alternatively, the adjustable period may be a longer period that includes the travel period. As with the translation service, setting a longer adjustable period allows the instance cluster to be expanded in advance, thereby ensuring the quality of service.
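The widened adjustable period for the translation service can be sketched as a simple time-window test. This is only an illustration: the constant names, the function `in_adjustable_period` and the choice of a half-open interval (7 a.m. inclusive, 7 p.m. exclusive) are assumptions; only the hours themselves come from the example above.

```python
from datetime import datetime, time

# Working period of the user in the translation example: 8 a.m. to 5 p.m.
WORK_START, WORK_END = time(8, 0), time(17, 0)
# Widened adjustable period from the same example: 7 a.m. to 7 p.m.
ADJUST_START, ADJUST_END = time(7, 0), time(19, 0)


def in_adjustable_period(now: datetime) -> bool:
    """True if `now` falls inside the adjustable period.

    Assumption: the window is half-open, so 7 p.m. itself is outside.
    """
    return ADJUST_START <= now.time() < ADJUST_END


# 7:30 a.m. is before the working period but already adjustable,
# so expansion can finish before the 8 a.m. traffic peak.
early = datetime(2023, 3, 29, 7, 30)
print(in_adjustable_period(early), WORK_START <= early.time() < WORK_END)
```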
It should be noted that the control device may expand or contract the target instance cluster during the adjustable period. In order to ensure the service quality, in practice, the control device may also perform expansion on the target instance cluster during the non-adjustable period. That is, the adjustable period places a limit on the scaling of the target instance cluster.
If the target instance cluster is provided with an adjustable period and the cluster also opens its own adjustment authority to the control device, the control device may further determine whether the current time is in the adjustable period corresponding to the target instance cluster. If the current time is in the adjustable period corresponding to the target instance cluster, step S101 in fig. 1 is executed to determine the GPU utilization of the target instance cluster. In another case, if the current time is not in the adjustable period corresponding to the target instance cluster, the number of instances of the target instance cluster can be restored from the current number to the preset number. The preset number may correspond to busy and idle characteristics of the target instance cluster, and the instance clusters providing different services may have different preset numbers.
Optionally, the preset number may be the upper limit of a preset number range corresponding to the target instance cluster, where the preset number range gives the upper and lower limits for expanding and contracting the target instance cluster. With the number of instances adjusted to the upper limit of the preset number range, even a sudden surge in data traffic during the non-adjustable period can be handled, so the service quality is ensured.
Alternatively, the lower limit of the preset number range may be set to 2, so that even after one instance in the target instance cluster fails, another instance may be used to provide the service, i.e., guarantee high availability of the service. Optionally, the control device may obtain a preset number range corresponding to the target instance cluster through the communication interface.
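The branch taken outside the adjustable period can be sketched as follows. The sketch assumes the preset number is the upper limit of the preset number range, as in the optional case above; the class `PresetRange`, the function `plan` and the upper limit of 10 are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PresetRange:
    lower: int = 2    # a lower limit of 2 leaves one instance if another fails
    upper: int = 10   # hypothetical upper limit for this cluster


def plan(in_adjustable_period: bool, current_count: int,
         preset: PresetRange) -> Tuple[str, int]:
    """Return (action, signed change in instance count)."""
    if in_adjustable_period:
        # Inside the period the decision is left to the GPU-utilization
        # check (step S101), so no change is planned here.
        return ("check_gpu_utilization", 0)
    # Outside the period: restore the count to the preset number,
    # assumed here to be the upper limit of the preset number range.
    return ("restore_preset", preset.upper - current_count)


print(plan(False, 6, PresetRange()))
```

Restoring to the upper limit trades some idle GPU capacity for the ability to absorb an unexpected surge while scaling is disabled.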
In this embodiment, before the GPU utilization of the target instance cluster is obtained, for a target instance cluster that has already opened the local switch to the control device, the control device may further determine whether the current time is in the adjustable period corresponding to the target instance cluster, and determine whether to adjust the number of instances in the target instance cluster according to the determination result. That is, by means of the adjustable period, the control device can expand and contract the target instance cluster in different periods, which further refines the timing of expansion and contraction and reduces the influence of cluster scaling on the service quality.
The multiple determination process before obtaining the GPU utilization described above may also be understood in conjunction with fig. 2.
After the various judgments provided in the foregoing embodiments, if at the current time the target instance cluster allows the control device to adjust the number of instances in the cluster, the control device may expand or contract the target instance cluster. For capacity expansion, fig. 3 is a flowchart of another instance management method according to an embodiment of the present invention. As shown in fig. 3, the method may include the following steps:
S201, determining the graphics processing unit (GPU) utilization of a target instance cluster, wherein the target instance cluster is deployed on a physical host, and the instances in the target instance cluster provide services by using GPU computing resources provided by the physical host.
The specific implementation process of the above step S201 may refer to the specific description of the related steps in the embodiment shown in fig. 1, which is not repeated herein.
S202, if the GPU utilization rate is greater than the preset utilization rate upper limit at the current time, the number of the instances contained in the target instance cluster is obtained.
S203, if the number of instances does not exceed the preset number range, creating a first number of new instances for the target instance cluster according to the degree to which the GPU utilization is greater than the preset utilization upper limit and according to the busy/idle feature, wherein the preset number range corresponds to the busy/idle feature.
If the target instance cluster has turned on the local switch for the control device and the GPU utilization of the target instance cluster at the current time is greater than the preset utilization upper limit, which indicates that the workload of the target instance cluster is currently high, the control device can further acquire, by means of the communication interface, the number of instances contained in the target instance cluster at the current time. This number includes both the instances that are running normally and the instances that are being created.
If the number of instances exceeds the preset number range, which indicates that the capacity of the target instance cluster has reached its maximum at the current time, further expansion may affect the normal provision of other services, and the control device does not expand the target instance cluster.
If the number of the instances does not exceed the preset number range, which indicates that the target instance cluster has further capacity expansion space at the current time, the control device may create a first number of newly-added instances for the target instance cluster according to the degree that the GPU utilization rate is greater than the preset utilization rate upper limit and the busy/idle feature of the service provided by the target instance cluster, that is, the first number of instances are expanded for the target instance cluster.
For example, for a target instance cluster that provides a translation service, if the GPU utilization exceeds the preset utilization upper limit by 5%, the excess is small; if, in addition, the current time is the noon break within the working period, when a surge in translation traffic is unlikely, the first number may be determined to be 4. If the GPU utilization exceeds the preset utilization upper limit by 10%, the first number may be determined to be 6. If the GPU utilization exceeds the preset utilization upper limit by 20%, the excess is large; if, in addition, the current time is in the working period after the noon break, when a surge in translation traffic is more likely, the first number may be determined to be 10.
It can be seen that, as the degree to which the GPU utilization exceeds the preset utilization upper limit grows in equal proportion, the first number does not grow in equal proportion; however, the larger the excess, the larger the first number. This reduces the number of times the target instance cluster has to be expanded and thus reduces the influence of cluster scaling on the service quality.
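The stepped choice of the first number can be sketched as a small lookup. The step sizes 4, 6 and 10 are taken from the translation-service example; applying each step to a whole band of excess values, ignoring the time-of-day part of the busy/idle feature, and capping the result at the upper limit of the preset number range are simplifying assumptions. The names `EXPANSION_TIERS` and `first_number` are hypothetical.

```python
from typing import List, Tuple

# (minimum excess in percentage points, instances to add), largest tier first.
# The step sizes come from the example; the band boundaries are an assumption.
EXPANSION_TIERS: List[Tuple[float, int]] = [(20.0, 10), (10.0, 6), (0.0, 4)]


def first_number(gpu_utilization: float, upper_limit: float,
                 current_count: int, max_count: int) -> int:
    """Return how many instances to create in this judgment period."""
    excess = gpu_utilization - upper_limit
    if excess <= 0 or current_count >= max_count:
        # Not overloaded, or the cluster is already at its upper limit.
        return 0
    # Pick the first (largest) tier whose threshold the excess reaches.
    step = next(n for threshold, n in EXPANSION_TIERS if excess >= threshold)
    # Assumption: never grow past the upper limit of the preset number range.
    return min(step, max_count - current_count)


for utilization in (85, 90, 100):
    print(utilization, first_number(utilization, 80, current_count=4, max_count=30))
```

A real control device would presumably also weigh the busy/idle feature (for example, noon break versus afternoon working hours) when choosing the step, which this sketch leaves out.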
In addition, as can be seen from the related description in the above embodiments, a machine learning model with a large amount of computation can be deployed in each instance, and the functions of the model can be used to provide the corresponding service to users. For the generation of new instances, one optional way is therefore to first create a first number of initial instances, where an initial instance has generic instance execution logic but cannot yet provide the corresponding service. Thereafter, the control device may download the machine learning model and deploy it to each of the first number of initial instances to complete the creation of the new instances.
In practice, since the machine learning model is large, downloading it and deploying it to the initial instances takes a certain amount of time. During capacity expansion, creating too many instances may therefore slow down the completion of cluster expansion, slow down service start-up, and affect the service quality. In this embodiment, an appropriate number of new instances can be determined according to the degree to which the GPU utilization is greater than the preset utilization upper limit and the busy/idle feature of the service provided by the target instance cluster, so that the cluster completes expansion as soon as possible and the influence of cluster scaling on the service quality is reduced.
Compared with determining whether to expand the target instance cluster by referring to the GPU utilization alone, in this embodiment the control device can flexibly determine whether to expand the target instance cluster by means of the global switch and/or the local switch and the adjustable period. Meanwhile, in this embodiment, when the control device determines that the GPU utilization of the target instance cluster is higher than the preset utilization upper limit, it can create an appropriate number of instances according to the busy/idle feature, so that the cluster completes expansion as soon as possible and the influence of cluster expansion on the service quality is reduced. That is, the situation in which the service starts slowly after expansion, because unreasonably creating a large number of instances takes too long, occurs less often.
In addition, the details of the embodiment which are not described in detail and the technical effects which can be achieved can be referred to the description of the above embodiment, and are not described herein.
According to the above embodiment, it is known that a certain period of time is required to create the new instance, and the period of time required to create the instance may be longer than the period of time for judging whether to perform the expansion and contraction. Thus, the following may occur:
At the previous time before the current time, the control device goes through the multiple judgments and determines, according to the obtained GPU utilization, that the target instance cluster needs to be expanded, but the creation of the instances has not finished when the current time is reached. At this moment, the GPU utilization of the target instance cluster obtained by the control device at the current time is still greater than the preset utilization upper limit. In this case, if the control device again chooses to create instances for the target instance cluster, too many instances are created for the cluster, which is clearly unreasonable and wastes GPU resources.
In order to avoid the above situation, if the GPU utilization of the target instance cluster obtained by the control device at the current time is still greater than the preset utilization upper limit, the control device further determines whether the target instance cluster contains instances that are being created at the current time. If it does, the control device does not create further instances, that is, it does not expand the target instance cluster. Optionally, the control device may acquire the number of instances being created by means of the communication interface. At the next time after the current time, the control device may determine whether to expand or contract the target instance cluster according to the GPU utilization of the target instance cluster acquired at that next time. If the GPU utilization acquired at the next time is smaller than the preset utilization lower limit, this indicates that the expansion of the target instance cluster at the previous time is able to guarantee the service quality. The previous time, the current time and the next time belong to different judgment periods for deciding whether to perform expansion or contraction.
Therefore, the capacity expansion waiting mechanism can ensure the service quality while reducing the waste of GPU computing resources in the physical host.
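The waiting mechanism can be sketched as one extra guard in the expansion check. The function name `should_expand` and the sample numbers are hypothetical; in the described method, the count of instances being created would be obtained through a communication interface.

```python
def should_expand(gpu_utilization: float, upper_limit: float,
                  creating_count: int) -> bool:
    """Decide whether to start a new expansion in this judgment period."""
    if gpu_utilization <= upper_limit:
        return False  # the cluster is not overloaded
    # Still overloaded: wait if an earlier expansion has not finished yet,
    # otherwise more instances than needed would be created.
    return creating_count == 0


# Previous period: overloaded, nothing in flight, so expansion starts.
print(should_expand(90.0, 80.0, creating_count=0))
# Current period: still overloaded, but 4 instances are still being created.
print(should_expand(90.0, 80.0, creating_count=4))
```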
Corresponding to the capacity expansion case shown in fig. 3, fig. 4 is a flowchart of another instance management method provided in an embodiment of the present invention for the capacity reduction case. As shown in fig. 4, the method may include the following steps:
S301, determining the graphics processing unit (GPU) utilization of a target instance cluster, wherein the target instance cluster is deployed on a physical host, and the instances in the target instance cluster provide services by using GPU computing resources provided by the physical host.
The specific implementation process of the above step S301 may refer to the specific description of the related steps in the embodiment shown in fig. 1, which is not repeated herein.
S302, if the GPU utilization rate at the current time is smaller than the preset utilization rate lower limit, the number of the instances contained in the current time target instance cluster is obtained.
S303, if the number of the instances does not exceed the preset number range, deleting the second number of the instances in the target instance cluster according to the degree that the GPU utilization rate is smaller than the preset utilization rate lower limit and the busy and idle feature, wherein the preset number range corresponds to the busy and idle feature.
If the target instance cluster turns on the local switch for the control device and the GPU utilization rate of the target instance cluster at the current time is smaller than the preset utilization rate lower limit, which indicates that the workload of the target instance cluster at the current time is smaller and the instances in the cluster are more sufficient, the control device can further acquire the number of the instances contained in the target instance cluster at the current time by means of the communication interface.
If the number of instances exceeds the preset number range, which indicates that the target instance cluster has already been contracted to its minimum at the current time, further contraction may affect the normal provision of the service, and the control device does not contract the target instance cluster.
If the number of instances does not exceed the preset number range, which indicates that the target instance cluster still has room for capacity reduction at the current time, the control device may delete a second number of instances from the target instance cluster according to the degree to which the GPU utilization is less than the preset utilization lower limit and the busy/idle feature of the service provided by the target instance cluster, that is, the target instance cluster is contracted by the second number of instances.
For example, for a target instance cluster providing a translation service, if the GPU utilization is below the preset utilization lower limit by less than 20%, the shortfall is small; if, in addition, the current time is the noon break within the working period, when a surge in translation traffic is unlikely, the second number may be determined to be 1. If the GPU utilization is below the preset utilization lower limit by 20%, the second number may be determined to be 2. If the GPU utilization is below the preset utilization lower limit by 30%, the shortfall is large; if, in addition, the current time is in the noon break period, when a surge in translation traffic is less likely, the second number may be determined to be 4.
Similar to capacity expansion, it can be seen that, as the degree to which the GPU utilization falls below the preset utilization lower limit grows in equal proportion, the second number does not grow in equal proportion; however, the larger the shortfall, the larger the second number. This reduces the number of times the target instance cluster has to be contracted and thus reduces the influence of cluster contraction on the service quality.
Comparing the first number set in the capacity expansion process with the second number set in the capacity reduction process: if, at different times, the degree to which the GPU utilization is below the preset utilization lower limit equals the degree to which it is above the preset utilization upper limit, the first number of newly added instances is larger than the second number of deleted instances.
Taking the above examples, when the GPU utilization is greater than the preset utilization upper limit by 20%, the first number is 10, and when the GPU utilization is less than the preset utilization lower limit by 20%, the second number is 2. That is, capacity is expanded in large steps and reduced in small steps. This reduces the number of times the target instance cluster is expanded and contracted, and reduces the probability that data traffic surges right after a large-step contraction and the cluster has to be expanded again.
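The small-step contraction can be sketched in the same way as the expansion example. The step sizes 1, 2 and 4 come from the translation-service example; the band boundaries, the floor at the lower limit of the preset number range, and the names `CONTRACTION_TIERS` and `second_number` are assumptions, and the time-of-day part of the busy/idle feature is again left out.

```python
from typing import List, Tuple

# (minimum shortfall in percentage points, instances to delete), largest first.
CONTRACTION_TIERS: List[Tuple[float, int]] = [(30.0, 4), (20.0, 2), (0.0, 1)]

# From the expansion example: an excess of 20 points adds 10 instances.
FIRST_NUMBER_AT_20_POINTS = 10


def second_number(gpu_utilization: float, lower_limit: float,
                  current_count: int, min_count: int = 2) -> int:
    """Return how many instances to delete in this judgment period."""
    shortfall = lower_limit - gpu_utilization
    if shortfall <= 0 or current_count <= min_count:
        # Not underloaded, or the cluster is already at its lower limit.
        return 0
    step = next(n for threshold, n in CONTRACTION_TIERS if shortfall >= threshold)
    # Assumption: never shrink below the lower limit of the preset range.
    return min(step, current_count - min_count)


at_20_points = second_number(10, 30, current_count=10)
# Large-step expansion versus small-step contraction for the same deviation.
print(FIRST_NUMBER_AT_20_POINTS, at_20_points)
```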
Compared with determining whether to contract the target instance cluster by referring to the GPU utilization alone, in this embodiment the control device can flexibly determine whether to contract the target instance cluster by means of the global switch and/or the local switch and the adjustable period. Meanwhile, in this embodiment, the control device can also delete an appropriate number of instances according to the busy/idle feature, so that the cluster completes capacity reduction as soon as possible and the influence of cluster contraction on the service quality is reduced.
In addition, the details of the embodiment which are not described in detail and the technical effects which can be achieved can be referred to the description of the above embodiment, and are not described herein.
It should be noted that, in the above embodiments, the GPU utilization ratio corresponding to the target instance cluster, the number of instances of the target instance cluster, the number of instances being created in the target instance cluster, and the preset number range corresponding to the target instance cluster may be obtained by using different communication interfaces in the physical host.
In addition, in order to mitigate the slow service start-up caused by frequent expansion and contraction, the control device may also set an effective duration for each instance obtained by capacity expansion. The effective duration of an instance is longer than the judgment period in which the control device decides whether to perform expansion or contraction. It may happen that an instance has not yet reached its effective duration but, at the current time, the control device determines that the target instance cluster needs to be contracted. In that case the control device may keep the instance and not contract the target instance cluster. When, at the next time, the control device determines that the target instance cluster needs to be expanded, the instance that was kept at the current time and has not reached its effective duration can be used directly without creating a new instance. This saves the time required to create an instance, speeds up the expansion of the target instance cluster, and speeds up the restart of the service after expansion.
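The effective-duration rule can be sketched as a filter applied before instances are deleted. The 30-minute value, the `Instance` class and the function `select_for_deletion` are hypothetical; the text only requires the effective duration to be longer than the judgment period (for example, 5 minutes).

```python
from dataclasses import dataclass
from typing import List

JUDGMENT_PERIOD_MINUTES = 5
# Hypothetical value; it only has to be longer than the judgment period.
EFFECTIVE_DURATION_MINUTES = 30


@dataclass
class Instance:
    name: str
    age_minutes: int  # time since the instance was created by an expansion


def select_for_deletion(instances: List[Instance], wanted: int) -> List[Instance]:
    """Pick up to `wanted` instances that may be deleted right now.

    Instances younger than the effective duration are kept, so a later
    expansion can reuse them instead of creating new ones.
    """
    expired = [i for i in instances if i.age_minutes >= EFFECTIVE_DURATION_MINUTES]
    return expired[:wanted]


cluster = [Instance("a", 120), Instance("b", 10), Instance("c", 45)]
print([i.name for i in select_for_deletion(cluster, wanted=3)])
```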
Summarizing the above embodiments, a complete expansion and contraction process for the target instance cluster may also be described as follows. At the current time, the control device judges, according to the first state information, whether the physical host on which the target instance cluster is deployed has turned on the global switch for the control device. If the global switch is not turned on, the control device waits for the next period. If the global switch is turned on, the control device further judges, according to the second state information, whether the target instance cluster has turned on the local switch for the control device. If the local switch is not turned on, the control device waits for the next period. If the local switch is also turned on, the control device judges whether the current time is in the adjustable period corresponding to the target instance cluster. If the current time is in the adjustable period, the GPU utilization of the target instance cluster is obtained, and whether to expand or contract the cluster is determined according to the difference between the GPU utilization and the preset utilization upper and lower limits and according to the busy/idle feature of the service provided by the target instance cluster. When determining whether to perform expansion or contraction, the control device also considers whether the number of instances in the target instance cluster at the current time exceeds the preset number range corresponding to the target instance cluster.
The above procedure can also be understood in connection with fig. 5. Optionally, in fig. 5, each judgment of whether to perform expansion or contraction may start from determining whether the global switch is turned on. In practice, however, once the global switch is turned on it is unlikely to be turned off, so after the global switch has been determined to be on, subsequent judgments may start directly from determining whether the local switch is turned on.
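The overall judgment order can be condensed into one function. This is a sketch under several assumptions: the utilization limits (30 and 80) and the count limits (2 and 10) are hypothetical defaults, "does not exceed the preset number range" is read as being strictly inside the limits, and the action outside the adjustable period follows the earlier description of restoring the preset number. The names `Snapshot` and `decide` do not come from the described method.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    global_switch_on: bool       # from the first state information
    local_switch_on: bool        # from the second state information
    in_adjustable_period: bool
    gpu_utilization: float       # percent
    instance_count: int          # running plus being created


def decide(s: Snapshot, lower_util: float = 30.0, upper_util: float = 80.0,
           min_count: int = 2, max_count: int = 10) -> str:
    """Return the action for one judgment period."""
    if not s.global_switch_on or not s.local_switch_on:
        return "wait"                      # no adjustment authority
    if not s.in_adjustable_period:
        return "restore_preset"            # restore the preset number
    if s.gpu_utilization > upper_util:
        return "expand" if s.instance_count < max_count else "hold"
    if s.gpu_utilization < lower_util:
        return "contract" if s.instance_count > min_count else "hold"
    return "hold"                          # utilization is inside the limits


print(decide(Snapshot(True, True, True, 92.0, 4)))
```

The sketch always starts from the global switch; skipping that check once the switch is known to be on, as the text suggests, would be an optimization on top of it.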
The service provided by the target instance cluster may be referred to as the target service. As can be appreciated from the descriptions in the above embodiments, the target instance cluster may provide, for example, online translation, automatic driving, video recognition, man-machine conversation, and so on. Through instance management, the service quality of the target service provided by the target instance cluster can be ensured.
Fig. 6 is a flowchart of yet another instance management method according to an embodiment of the present invention. The present embodiment may have the same execution subject as the above embodiments, that is, it may be executed by the control device. As shown in fig. 6, the method may include the following steps:
S401, in response to operation of the target service provided by the target instance cluster, determining the graphics processing unit (GPU) utilization of the target instance cluster, wherein the target instance cluster is deployed on a physical host, and the instances in the target instance cluster provide the target service by using GPU computing resources provided by the physical host.
S402, according to the GPU utilization rate and busy and idle characteristics of the target service, the number of the instances in the target instance cluster is adjusted, so that the target instance cluster with the adjusted number provides the target service, and the target service comprises at least one of online translation, automatic driving, video identification and man-machine conversation.
In the process of operating the target service, the GPU utilization rate of the target instance cluster providing the target service can be determined in real time, the number of instances in the target instance cluster is adjusted in real time according to the GPU utilization rate and the busy characteristic of the target service provided by the target instance cluster, and the target instance cluster with the adjusted number of instances can continue to provide the target service for users.
The target service may be at least one of the online translation, automatic driving, video recognition and man-machine conversation mentioned in the above embodiments, or any other service that is implemented by a model or algorithm with a large amount of computation.
In addition, for the way of obtaining the GPU utilization, the timing of obtaining it, and the way of adjusting the number of instances in the target instance cluster in this embodiment, reference may be made to the related descriptions in the above embodiments, which are not repeated here.
In this embodiment, the control device determines the GPU utilization of the target instance cluster, and then adjusts the number of instances in the target instance cluster according to the GPU utilization and the busy/idle characteristics of the service provided by the target instance cluster. The GPU utilization rate of the target instance cluster can reflect the workload of the cluster, the busy and idle feature of the service provided by the target instance cluster can reflect the rule of data flow generated by the service provided by the cluster, the busy and idle feature is also related to service content, and the data flow can be processed by the instances in the cluster.
Therefore, the method expands and contracts the clusters by referring to the information of different dimensions, namely the load pressure of the clusters and the type of the service provided by the clusters, so that the adjustment quantity and adjustment time of the examples in the clusters are more reasonable, the influence of the expansion and contraction of the clusters on the service quality is reduced, and the restarting speed after the expansion and contraction of the service is improved. Meanwhile, the utilization rate of GPU resources can be improved, and the cost required by normal service provision is reduced. In addition, other technical effects achieved in the present embodiment may be referred to the description in the above embodiments, and will not be described herein.
On the basis of the above method embodiments, the process of instance management can also be described from the system point of view. Fig. 7 is a schematic structural diagram of an instance management system according to an embodiment of the present invention. As shown in fig. 7, the system may include a control device and a physical host on which a target instance cluster is deployed, where the instances in the target instance cluster provide services using GPU computing resources provided by the physical host.
The control device may determine the GPU utilization of the target instance cluster, and further determine whether to adjust the number of instances in the target instance cluster according to the GPU utilization and the busy/idle feature of the service provided by the target instance cluster.
In this embodiment, for the specific working process of each part of the system and the technical effects that can be achieved, reference may be made to the related descriptions in the foregoing embodiments, which are not repeated here.
Optionally, the control device and the instance cluster providing the service mentioned in the above embodiments may be deployed on the same physical host, in which case the process of instance management can also be described from the perspective of the device. Fig. 8 is a schematic structural diagram of a physical host according to an embodiment of the present invention. As shown in fig. 8, the physical host may include a control component and a target instance cluster.
The control component is configured to determine the GPU utilization of the target instance cluster, where the instances in the target instance cluster provide services using GPU computing resources provided by the physical host. The control component then determines whether to adjust the number of instances in the target instance cluster according to the obtained GPU utilization and the busy/idle feature of the service provided by the target instance cluster.
Optionally, at least one instance cluster including the target instance cluster may be deployed on the physical host; only the target instance cluster is shown in the figure.
In this embodiment, for the specific working process of each part of the physical host and the technical effects that can be achieved, reference may be made to the related descriptions in the foregoing embodiments, which are not repeated here.
The specific implementation of the method provided in the foregoing embodiments is described below, taking as an example a physical host on which an instance cluster providing a translation service is deployed.
A control component and an instance cluster A may be deployed on the physical host. Each instance in instance cluster A is deployed with a machine learning model capable of Chinese-English translation, so that instance cluster A can provide a Chinese-English translation service for users. The number of instances in instance cluster A is 5, and the preset number range corresponding to instance cluster A gives the upper and lower limits on the number of instances when the cluster is scaled.
Based on this, the control component in the physical host may periodically determine whether the instance cluster needs to be scaled. The decision proceeds as follows:
When the judging period P1 is reached at time T1, the control component may determine, according to the first state information of the physical host, that the physical host has turned on the global switch for the control component, which indicates that the physical host allows the control component to scale each instance cluster deployed on it. The control component may then determine, according to the second state information of instance cluster A, that instance cluster A has turned on its own local switch for the control component, which indicates that instance cluster A allows the control component to scale it.
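The two-level switch check can be sketched as follows. This is only an illustrative sketch: the names `HostState`, `ClusterState`, `global_switch_on`, `local_switch_on` and `may_scale` are hypothetical stand-ins for the first and second state information, whose concrete representation the description does not specify.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """First state information (hypothetical shape): the host-wide global switch."""
    global_switch_on: bool


@dataclass
class ClusterState:
    """Second state information (hypothetical shape): the cluster's local switch."""
    local_switch_on: bool


def may_scale(host: HostState, cluster: ClusterState) -> bool:
    """Return True only if both the host and the cluster permit scaling."""
    # The global switch is checked first; if the host forbids scaling,
    # the per-cluster switch is not consulted at all.
    if not host.global_switch_on:
        return False
    return cluster.local_switch_on
```

Checking the global switch before the local one mirrors the order given in the description, so a host that forbids scaling short-circuits the check for every cluster deployed on it.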
Further, since time T1 is 6 p.m. on a working day, it falls within the adjustable period corresponding to the translation service, which may be from 7 a.m. to 7 p.m. At this time, the control component may further obtain the GPU utilization of instance cluster A, which is 60% and exceeds the preset upper utilization limit of 50% by 10 percentage points. The control component then expands instance cluster A, that is, it creates 6 new instances for the cluster, where the number of new instances may be determined according to the busy/idle feature of the translation service. This completes the expansion operation for instance cluster A.
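The check made at time T1 can be illustrated with a short sketch. The numeric values (7 a.m. to 7 p.m., a 50% upper limit, 60% utilization at 6 p.m.) come from the example above; the function and constant names are hypothetical, and treating the end of the period as exclusive is an assumption.

```python
from datetime import time

# Values taken from the worked example; the names are hypothetical.
ADJUSTABLE_START = time(7, 0)
ADJUSTABLE_END = time(19, 0)
UPPER_LIMIT_PCT = 50


def in_adjustable_period(now: time) -> bool:
    """Assumption: the period is half-open, [start, end)."""
    return ADJUSTABLE_START <= now < ADJUSTABLE_END


def excess_over_upper_limit(utilization_pct: int) -> int:
    """Percentage points by which utilization exceeds the upper limit (0 if none)."""
    return max(0, utilization_pct - UPPER_LIMIT_PCT)


def needs_expansion(now: time, utilization_pct: int) -> bool:
    """Expansion is considered only inside the adjustable period."""
    return in_adjustable_period(now) and excess_over_upper_limit(utilization_pct) > 0
```

For the example, `needs_expansion(time(18, 0), 60)` holds and the excess is 10 percentage points; how that excess maps to 6 new instances depends on the busy/idle feature and is not modelled here.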
During the adjustment, the following situation may arise: when the judging period P1 is reached again at time T2, the GPU utilization obtained by the control component is still greater than the preset upper utilization limit, and the control component determines that the 6 new instances whose creation started at time T1 are still being created at time T2. In this case, the control component may execute a waiting mechanism, that is, it does not continue to create new instances for instance cluster A, so as to avoid creating too many instances for the cluster, which would reduce the restart speed of the service after expansion.
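A minimal sketch of the waiting mechanism follows, assuming the control component can count the instances that are still being created. The function name and parameters are hypothetical.

```python
def instances_to_create(utilization_pct: int, upper_limit_pct: int,
                        pending_creations: int, planned: int) -> int:
    """Return how many new instances to start creating in this judging period."""
    if utilization_pct <= upper_limit_pct:
        return 0  # no expansion is needed
    if pending_creations > 0:
        # Waiting mechanism: earlier instances are still being created,
        # so no further instances are requested in this period.
        return 0
    return planned
```

At time T2 in the example, utilization is still above 50% but 6 instances are pending, so `instances_to_create(60, 50, 6, 6)` returns 0; once nothing is pending, the same call with `pending_creations=0` returns the planned 6.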
The expansion and waiting process described above can also be understood in connection with fig. 9.
Then, when the judging period P3 is reached at time T3, the control component may again determine that both the global switch and the local switch are on. However, since time T3 is 8 p.m. on a working day, it is no longer within the adjustable period corresponding to the translation service, and the control component may therefore not scale the instance cluster.
Within the adjustable period, the control component can dynamically adjust the number of instances in instance cluster A as the utilization rises and falls. In this dynamic adjustment, the combined use of the global switch, the local switch, the adjustable period, the GPU utilization and the busy/idle feature of the translation service allows the instance cluster to be scaled quickly, which reduces the influence of scaling on service quality. A mechanism of expanding in large steps and contracting in small steps may also be adopted, so as to minimize the influence of scaling on service quality.
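The large-step expansion and small-step contraction idea can be sketched as below. The description does not give a formula, so the rule used here (one step per started 10 percentage points of deviation) and the parameter names are assumptions made purely for illustration; in practice the step sizes would be derived from the busy/idle feature of the service.

```python
import math


def instance_delta(util_pct: int, lower_pct: int, upper_pct: int,
                   expand_step: int, shrink_step: int) -> int:
    """Return how many instances to add (positive) or delete (negative)."""
    if util_pct > upper_pct:
        # One expansion step per started 10 percentage points above the upper limit.
        return math.ceil((util_pct - upper_pct) / 10) * expand_step
    if util_pct < lower_pct:
        # One contraction step per started 10 percentage points below the lower limit.
        return -math.ceil((lower_pct - util_pct) / 10) * shrink_step
    return 0
```

With hypothetical steps of 6 for expansion and 2 for contraction, and a hypothetical lower limit of 20%, an overshoot of 10 points adds 6 instances while an undershoot of 10 points deletes only 2, so the same deviation triggers a larger expansion than contraction.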
An instance management apparatus of one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these instance management apparatuses may be constructed from commercially available hardware components configured through the steps taught in this solution.
Fig. 10 is a schematic structural diagram of an instance management apparatus according to an embodiment of the present invention. As shown in fig. 10, the apparatus includes:
The utilization determining module 11 is configured to determine the graphics processing unit (GPU) utilization of a target instance cluster, where the target instance cluster is deployed on a physical host, and the instances in the target instance cluster provide a service using GPU computing resources provided by the physical host.
The adjusting module 12 is configured to adjust the number of instances in the target instance cluster according to the GPU utilization and the busy/idle feature of the service provided by the target instance cluster.
Optionally, the apparatus further includes a permission determining module 13, configured to determine, according to first state information of the physical host, whether the physical host allows the number of instances in the at least one instance cluster to be adjusted at the current time, where the first state information reflects whether the physical host has opened, to the control device, the permission to adjust the number of instances at the current time.
At least one instance cluster including the target instance cluster is deployed on the physical host.
Optionally, the permission determining module 13 is further configured to determine, if the physical host at the current time allows adjustment of the number of instances in the at least one instance cluster, whether the target instance cluster at the current time allows adjustment of the number of instances according to second state information of the target instance cluster, where the second state information reflects whether the target instance cluster at the current time opens the permission for adjustment of the number of instances to the control device.
Optionally, the apparatus further comprises: a period determining module 14, configured to determine whether the current time is in an adjustable period corresponding to the target instance cluster.
The utilization determining module 11 is configured to determine a GPU utilization of the target instance cluster if the current time is in the adjustable period, where the adjustable period corresponds to the busy and idle feature.
Optionally, the adjusting module 12 is configured to restore the number of instances of the target instance cluster to a preset number, where the preset number corresponds to the busy and idle feature, if the current time is not in the adjustable period.
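The restore behaviour outside the adjustable period can be sketched as follows. The default period of 7 to 19 hours is borrowed from the translation example elsewhere in the description; the function name, the hour-based clock and the preset value used in the usage note are hypothetical.

```python
def instance_count_after_check(hour: int, current_count: int, preset_count: int,
                               period_start: int = 7, period_end: int = 19) -> int:
    """Return the instance count the cluster should have after the period check."""
    if period_start <= hour < period_end:
        # Inside the adjustable period the count is left to utilization-based scaling.
        return current_count
    # Outside the period the cluster is restored to its preset number of instances.
    return preset_count
```

For instance, with a hypothetical preset of 5, a cluster that grew to 11 instances during the day would be restored to 5 at 20:00, while the same cluster at 18:00 keeps its 11 instances.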
Optionally, the adjusting module 12 is configured to obtain the number of instances included in the target instance cluster if the GPU utilization rate is greater than a preset upper utilization limit at the current time; if the number of the instances does not exceed the preset number range, a first number of newly-added instances are created for the target instance cluster according to the degree that the GPU utilization rate is larger than the preset utilization rate upper limit and the busy and idle features, and the preset number range corresponds to the busy and idle features.
Optionally, the adjusting module 12 is configured to create the first number of initial instances, and to deploy the machine learning model into each of the first number of initial instances to complete the creation of the newly added instances.
A machine learning model is deployed in each instance, and the service provided by the instance corresponds to the function of the machine learning model.
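The two-phase creation (create initial instances, then deploy the model into each) can be sketched as follows. The `Instance` class, its fields and the model name are hypothetical placeholders; a real system would create containers or processes and load model weights onto the GPU.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    name: str
    model: Optional[str] = None  # None until a model has been deployed

    @property
    def ready(self) -> bool:
        """An instance can serve requests only after its model is deployed."""
        return self.model is not None


def create_new_instances(first_number: int, model_name: str) -> List[Instance]:
    # Phase 1: create the initial (empty) instances.
    initial = [Instance(name=f"instance-{i}") for i in range(first_number)]
    # Phase 2: deploy the machine learning model into each initial instance.
    for inst in initial:
        inst.model = model_name
    return initial
```

Separating the two phases makes the "being created" state explicit: an instance that exists but has no model deployed is not yet ready, which is the state the waiting logic needs to detect.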
Optionally, the adjusting module 12 is further configured to stop instance creation if the target instance cluster includes an instance that is being created at the current time, and to determine whether to adjust the number of instances in the target instance cluster according to the GPU utilization of the target instance cluster at a next time, the next time being after the current time.
Optionally, the adjusting module 12 is configured to obtain the number of instances included in the target instance cluster at the current time if the GPU utilization at the current time is less than a preset lower utilization limit; and, if the number of instances does not exceed the preset number range, to delete a second number of instances from the target instance cluster according to the degree by which the GPU utilization is below the preset lower utilization limit and the busy/idle feature, where the preset number range corresponds to the busy/idle feature, and the current time belongs to an adjustable period corresponding to the target instance cluster.
At different times, if the degree by which the GPU utilization is below the preset lower utilization limit is the same as the degree by which it is above the preset upper utilization limit, the first number of newly added instances is greater than the second number.
The apparatus shown in fig. 10 may perform the method of the embodiment shown in fig. 1 to 5, and reference is made to the relevant description of the embodiment shown in fig. 1 to 5 for a part of this embodiment that is not described in detail. The implementation process and technical effects of this technical solution are described in the embodiments shown in fig. 1 to 5, and are not described herein.
In one possible design, the instance management method provided in the foregoing embodiments may be applied to an electronic device. As shown in fig. 11, the electronic device may include a first processor 21 and a first memory 22. The first memory 22 is configured to store a program that supports the electronic device in executing the instance management method provided in the embodiments shown in fig. 1 to 5, and the first processor 21 is configured to execute the program stored in the first memory 22.
The program comprises one or more computer instructions, wherein the one or more computer instructions, when executed by the first processor 21, are capable of performing the steps of:
determining the graphics processing unit (GPU) utilization of a target instance cluster, wherein the target instance cluster is deployed on a physical host, and the instances in the target instance cluster provide services using GPU computing resources provided by the physical host;
and adjusting the number of instances in the target instance cluster according to the GPU utilization and the busy/idle feature of the service provided by the target instance cluster.
Optionally, the first processor 21 is further configured to perform all or part of the steps in the embodiments shown in fig. 1 to 5.
The structure of the electronic device may further include a first communication interface 23, used by the electronic device to communicate with other devices or communication systems.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions used by the electronic device, which include a program for executing the instance management method shown in fig. 1 to 5.
Fig. 12 is a schematic structural diagram of another instance management apparatus according to an embodiment of the present invention. As shown in fig. 12, the apparatus includes:
a determining module 31, configured to determine, in response to operation of a target service provided by a target instance cluster, the graphics processing unit (GPU) utilization of the target instance cluster, where the target instance cluster is deployed on a physical host, and the instances in the target instance cluster provide the target service using GPU computing resources provided by the physical host.
The number adjustment module 32 is configured to adjust the number of instances in the target instance cluster according to the GPU utilization and the busy/idle feature of the target service, so that the target instance cluster with the adjusted number provides the target service, where the target service includes at least one of online translation, autopilot, video recognition, and man-machine conversation.
The apparatus shown in fig. 12 may perform the method of the embodiment shown in fig. 6, and reference is made to the relevant description of the embodiment shown in fig. 6 for parts of this embodiment not described in detail. The implementation process and the technical effect of this technical solution are described in the embodiment shown in fig. 6, and are not described herein.
In one possible design, the instance management method provided in the foregoing embodiments may be applied to another electronic device. As shown in fig. 13, the electronic device may include a second processor 41 and a second memory 42. The second memory 42 is configured to store a program that supports the electronic device in executing the instance management method provided in the embodiment shown in fig. 6, and the second processor 41 is configured to execute the program stored in the second memory 42.
The program comprises one or more computer instructions, wherein the one or more computer instructions, when executed by the second processor 41, are capable of performing the steps of:
in response to operation of a target service provided by a target instance cluster, determining graphics processor GPU utilization of the target instance cluster, the target instance cluster deployed on a physical host, an instance in the target instance cluster providing the target service using GPU computing resources provided by the physical host;
and adjusting the number of instances in the target instance cluster according to the GPU utilization and the busy/idle feature of the target service, so that the target instance cluster with the adjusted number of instances provides the target service, wherein the target service comprises at least one of online translation, automatic driving, video recognition and man-machine conversation.
Optionally, the second processor 41 is further configured to perform all or part of the steps in the embodiment shown in fig. 6.
The electronic device may further include a second communication interface 43 in its structure for communicating with other devices or communication systems.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions used by the above electronic device, which include a program for executing the instance management method shown in fig. 6.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (14)

1. An instance management method, comprising:
determining the graphics processing unit (GPU) utilization of a target instance cluster, wherein the target instance cluster is deployed on a physical host, and the instances in the target instance cluster provide services by using GPU computing resources provided by the physical host;
and adjusting the number of the instances in the target instance cluster according to the GPU utilization rate and the busy and idle characteristics of the service provided by the target instance cluster.
2. The method of claim 1, applied to a control device, the physical host having disposed thereon at least one instance cluster including the target instance cluster;
before determining the GPU utilization of the target instance cluster, the method further includes:
and determining whether the physical host allows the number of the instances in the at least one instance cluster to be adjusted at the current time according to first state information of the physical host, wherein the first state information reflects whether the physical host opens the adjusting authority of the number of the instances to the control equipment at the current time.
3. The method of claim 1, applied to a control device, the physical host having disposed thereon at least one instance cluster including the target instance cluster;
The method further comprises the steps of:
if the physical host allows the number of instances in the at least one instance cluster to be adjusted at the current time, determining whether the target instance cluster allows the number of instances to be adjusted at the current time according to second state information of the target instance cluster, wherein the second state information reflects whether the target instance cluster opens an adjustment authority of the number of instances to the control device at the current time.
4. A method according to claim 1 or 3, wherein prior to said determining graphics processor GPU utilization for a target instance cluster, the method further comprises:
determining whether the current time is in an adjustable period corresponding to the target instance cluster;
and if the current time is in the adjustable period, determining the GPU utilization rate of the target instance cluster, wherein the adjustable period corresponds to the busy and idle feature.
5. The method according to claim 4, wherein the method further comprises:
and if the current time is not in the adjustable period, restoring the number of the instances of the target instance cluster to a preset number, wherein the preset number corresponds to the busy and idle feature.
6. The method according to any one of claims 1 to 5, wherein said adjusting the number of instances in the target instance cluster according to the GPU utilization and busy and idle characteristics of the services provided by the target instance cluster comprises:
if the GPU utilization rate is greater than a preset utilization rate upper limit at the current time, acquiring the number of instances contained in the target instance cluster;
if the number of the instances does not exceed the preset number range, creating a first number of newly-added instances for the target instance cluster according to the degree that the GPU utilization rate is larger than the preset utilization rate upper limit and the busy and idle features, wherein the preset number range corresponds to the busy and idle features.
7. The method of claim 6, wherein a machine learning model is deployed in an instance, the instance providing services corresponding to functions of the machine learning model;
the creating a first number of newly added instances for the target instance cluster includes:
creating the first number of initial instances;
and deploying the machine learning model into each of the first number of initial instances to finish the creation of the newly added instances.
8. The method of claim 6, wherein prior to creating the first number of newly added instances for the target instance cluster, the method further comprises:
if the target instance cluster includes an instance being created at the current time, stopping instance creation;
determining whether to adjust the number of instances in the target instance cluster according to the GPU utilization of the target instance cluster at a next time, the next time being after the current time.
9. The method according to any one of claims 1 to 5, wherein the current time belongs to an adjustable period corresponding to the target instance cluster;
the adjusting the number of the instances in the target instance cluster according to the GPU utilization and the busy and idle characteristics of the service provided by the target instance cluster includes:
if the GPU utilization rate at the current time is smaller than the preset utilization rate lower limit, acquiring the number of instances contained in the target instance cluster at the current time;
and if the number of the instances does not exceed the preset number range, deleting the second number of the instances in the target instance cluster according to the degree that the GPU utilization rate is smaller than the preset utilization rate lower limit and the busy and idle feature, wherein the preset number range corresponds to the busy and idle feature.
10. An instance management method, comprising:
In response to operation of a target service provided by a target instance cluster, determining graphics processor GPU utilization of the target instance cluster, the target instance cluster deployed on a physical host, an instance in the target instance cluster providing the target service using GPU computing resources provided by the physical host;
and according to the GPU utilization rate and the busy and idle characteristics of the target service, adjusting the number of the instances in the target instance cluster so that the target instance cluster with the adjusted number provides the target service, wherein the target service comprises at least one of online translation, automatic driving, video recognition and man-machine conversation.
11. An instance management system, comprising: a control device and a physical host deployed with a target instance cluster, wherein the instances in the target instance cluster provide services by using GPU computing resources provided by the physical host;
the control device is used for determining the GPU utilization rate of the target instance cluster; and determining whether to adjust the number of the instances in the target instance cluster according to the GPU utilization rate and the busy and idle characteristics of the service provided by the target instance cluster.
12. A physical host comprising: a control component and a target instance cluster;
the control component is configured to determine a GPU utilization of the target instance cluster, where the instances in the target instance cluster provide services using GPU computing resources provided by the physical host; and determining whether to adjust the number of the instances in the target instance cluster according to the GPU utilization rate and the busy and idle characteristics of the service provided by the target instance cluster.
13. An electronic device, comprising: a memory, a processor; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the instance management method of any of claims 1-10.
14. A non-transitory machine-readable storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform the instance management method of any of claims 1-10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310320751.0A CN116302555A (en) 2023-03-28 2023-03-28 Instance management method, system, physical host, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310320751.0A CN116302555A (en) 2023-03-28 2023-03-28 Instance management method, system, physical host, device and storage medium

Publications (1)

Publication Number Publication Date
CN116302555A true CN116302555A (en) 2023-06-23

Family

ID=86828640

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310320751.0A Pending CN116302555A (en) 2023-03-28 2023-03-28 Instance management method, system, physical host, device and storage medium

Country Status (1)

Country Link
CN (1) CN116302555A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116665670A (en) * 2023-07-28 2023-08-29 深圳博瑞天下科技有限公司 Speech recognition task management method and system based on resource configuration analysis
CN116665670B (en) * 2023-07-28 2023-10-31 深圳博瑞天下科技有限公司 Speech recognition task management method and system based on resource configuration analysis

Similar Documents

Publication Publication Date Title
US10489476B2 (en) Methods and devices for preloading webpages
CN110704177B (en) Computing task processing method and device, computer equipment and storage medium
CN116302555A (en) Instance management method, system, physical host, device and storage medium
CN113190282A (en) Android operating environment construction method and device
CN112463290A (en) Method, system, apparatus and storage medium for dynamically adjusting the number of computing containers
CN113282580A (en) Method, storage medium and server for executing timed task
CN114168262B (en) Cloud platform mirror image cache management method based on LRU replacement algorithm
US20130132681A1 (en) Temporal standby list
CN113946491A (en) Microservice data processing method, microservice data processing device, computer equipment and storage medium
CN107493312B (en) Service calling method and device
CN112286559A (en) Upgrading method and device for vehicle-mounted intelligent terminal
CN112363828A (en) Memory fragment management method and device, vehicle-mounted system and vehicle
CN116527590A (en) Distributed current limiting implementation method and device for cloud native gateway
CN116302448A (en) Task scheduling method and system
CN112905699B (en) Full data comparison method, device, equipment and storage medium
CN111240805A (en) Cloud operating system user switching processing method and device
CN110109775A (en) Virtual machine restoration methods, device, terminal device and storage medium
US11256607B1 (en) Adaptive resource management for instantly provisioning test environments via a sandbox service
CN111338668B (en) Method and device for upgrading code in real-time computing
CN116719632B (en) Task scheduling method, device, equipment and medium
CN110377381B (en) List refreshing method and device for information system
CN116820332A (en) Data migration method and device
CN107741810B (en) List management method, device and computer readable storage medium
CN116974695A (en) Pod resource capacity shrinking method, device, equipment and storage medium
CN117492922A (en) Virtual machine fault recovery method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination