CN117707731A - Scheduling method, device, equipment and storage medium for model deployment

Info

Publication number: CN117707731A
Application number: CN202311702784.8A
Authority: CN
Legal status: Pending
Prior art keywords: chip, team, deployed, model, task
Other languages: Chinese (zh)
Inventors: 胡小刚, 戴�峰, 成念
Assignee: Chongqing Changan Automobile Co Ltd
Priority/filing date: 2023-12-12
Publication date: 2024-03-15
Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a scheduling method, a device, equipment and a storage medium for model deployment, wherein the method comprises the following steps: acquiring a model to be deployed and the chip type corresponding to the model to be deployed, the model to be deployed being cached in a to-be-deployed model pool of a cloud platform; constructing tasks to be deployed based on the models to be deployed, sorting the tasks to be deployed corresponding to the same chip type according to priority, and generating a task queue, wherein the tasks to be deployed comprise team tasks and personal tasks; and scheduling the tasks to be deployed to chips in the corresponding chip resource pool according to the task queue, so that each model to be deployed is deployed from the model pool to its corresponding chip, wherein the task queues, the chip resource pools and the chip types correspond to one another one to one, and the chips comprise shared chips and team exclusive chips. A team task is scheduled to a team exclusive chip belonging to the same team as the task, and a personal task is scheduled to a shared chip, so that the scheduling of model deployment is more reliable, and both the rationality of chip resource allocation and the efficiency of model deployment are improved.

Description

Scheduling method, device, equipment and storage medium for model deployment
Technical Field
The invention relates to the technical field of model deployment, in particular to a scheduling method, a scheduling device, scheduling equipment and a storage medium for model deployment.
Background
With the development of artificial intelligence, deep learning models are applied ever more widely, but a trained deep learning model must be deployed in an actual production environment before it generates real application value. For example, an image recognition model is deployed on camera equipment to implement functions such as face recognition and license plate recognition, and a natural language model is deployed in intelligent customer service to implement functions such as language understanding and sentiment analysis. During model deployment, a suitable deployment mode must be selected and the model deployed onto an AI (artificial intelligence) chip. Different chip types require different framework tools: before deployment, the framework tool is used to convert the model into a format the chip can deploy, and the model is quantized and compiled so as to simplify it and reduce its occupation of chip resources; the model is then scheduled to the corresponding chip for deployment. In the related art, when a cloud platform performs model deployment, it generally monitors in real time whether each chip is idle. If an idle chip matching the model's type exists, the model is scheduled to that idle chip for deployment; otherwise, the platform waits for a chip of the corresponding type to finish its current task before deploying the model to it.
However, when multiple teams perform model deployment on the cloud platform at the same time, chip resources are easily monopolized by the model deployment tasks of a few teams, leaving the other teams with no idle chips to use. Chip resources are thus allocated unreasonably, deployment efficiency is low, and the user experience is poor.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, the present invention provides a scheduling method, apparatus, device and storage medium for model deployment, so as to solve at least one of the above-mentioned technical problems.
In a first aspect, the present invention provides a scheduling method for model deployment, including: acquiring a model to be deployed and a chip type corresponding to the model to be deployed, wherein the model to be deployed is cached in a model pool to be deployed of a cloud platform; constructing a task to be deployed based on the model to be deployed, sequencing the tasks to be deployed corresponding to the same chip type according to priority, and generating a task queue, wherein the task to be deployed comprises a team task and a personal task; dispatching the task to be deployed to a chip in a corresponding chip resource pool according to the task queue, enabling the model to be deployed from the model pool to the corresponding chip, wherein the task queue, the chip resource pool and the chip type are in one-to-one correspondence with each other, and the chip comprises a shared chip and a team exclusive chip; and dispatching the team task to the team exclusive chip belonging to the same team as the team task, and dispatching the personal task to the shared chip.
In an embodiment of the present invention, the constructing a task to be deployed based on the model to be deployed, and sequencing the tasks to be deployed corresponding to the same chip type according to priorities, to generate a task queue, includes: setting the priority of each task to be deployed, wherein the setting of the priority of the team task comprises uniformly setting the same priority for each team task corresponding to any team, or independently setting the priority for each team task corresponding to any team; and sequencing each team task and each personal task corresponding to the same chip type from high to low according to priority, determining the execution sequence of each task to be deployed, and generating the task queue based on the execution sequence.
In an embodiment of the present invention, the scheduling the task to be deployed to the chip in the corresponding chip resource pool according to the task queue includes: determining team information of the team tasks to be scheduled and team labels of each team exclusive chip in the chip resource pool; matching team information of the team tasks with team labels of each team exclusive chip, determining the team exclusive chips belonging to the same team with the team tasks, and taking the team exclusive chips as target team chips; if any of the target team chips is idle, scheduling the team task to the idle target team chip; if all the target team chips are not idle, scheduling the team tasks to the idle shared chips; and if the target team chip and the shared chip are not idle, stopping scheduling the model to be deployed corresponding to the team task in the pool to be deployed until the idle target team chip or the idle shared chip exists.
In an embodiment of the present invention, before the scheduling the team task to the team exclusive chip belonging to the same team as the team task, the method further includes: counting the number of chips used by each user in each team; if the number of chips used by any user in the team is larger than a preset first threshold, taking the user as a target user, and stopping scheduling the team task corresponding to the target user until the number of chips used by the target user is smaller than the preset first threshold; if the total number of the chips obtained by adding the numbers of the chips used by all the users in the team is larger than a preset second threshold, all the team tasks corresponding to the team stop scheduling until the total number of the chips used by the team is smaller than the second threshold.
In an embodiment of the present invention, before the task to be deployed is scheduled to the chip in the corresponding chip resource pool according to the task queue, the method includes: deploying a chip program on each chip to acquire chip information of each chip through the chip program, wherein the chip information at least comprises the chip type, the chip IP and a chip driving version; and configuring the same resource address for the chips with the same chip type by utilizing the chip program so as to allocate each chip to the corresponding chip resource pool, wherein the resource addresses are in one-to-one correspondence with the chip resource pools.
In an embodiment of the present invention, before the task to be deployed is scheduled to the chip in the corresponding chip resource pool according to the task queue, the method further includes: monitoring the use state of the chip by using the chip program; if the chip is in an offline state, stopping scheduling the task to be deployed to the chip; and if the chip is in an idle state, scheduling the task to be deployed to the chip.
In an embodiment of the present invention, before the obtaining the model to be deployed and the chip type corresponding to the model to be deployed, the method further includes: acquiring an initial model and the chip type on which the initial model is to be pre-deployed; screening a framework tool corresponding to that chip type from a preset framework toolkit, converting the initial model according to the framework tool, and determining an intermediate model; and quantizing and compiling the intermediate model to obtain the model to be deployed, wherein the model to be deployed is shared by the team to which the user belongs.
In a second aspect, the present invention further provides a scheduling apparatus for model deployment, including: the acquisition module is used for acquiring a model to be deployed and a chip type corresponding to the model to be deployed, wherein the model to be deployed is cached in a model pool to be deployed of the cloud platform; the queue generating module is used for constructing tasks to be deployed based on the models to be deployed, sequencing the tasks to be deployed corresponding to the same chip type according to priority, and generating a task queue, wherein the tasks to be deployed comprise team tasks and personal tasks; the scheduling module is used for scheduling the task to be deployed to a chip in a corresponding chip resource pool according to the task queue, so that the model to be deployed is deployed from the model pool to be deployed to the corresponding chip, the task queue, the chip resource pool and the chip type are in one-to-one correspondence with each other, the chip comprises a shared chip and a team exclusive chip, wherein the team task is scheduled to the team exclusive chip belonging to the same team as the team task, and the personal task is scheduled to the shared chip.
In a third aspect, the present invention also provides an electronic device, including: one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the scheduling method of model deployment as described in the above embodiments.
In a fourth aspect, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor of a computer, causes the computer to perform a scheduling method of model deployment as described in the above embodiments.
The invention has the beneficial effects that: the invention provides a scheduling method, a device, equipment and a storage medium for model deployment. On the one hand, the model to be deployed and its chip type are acquired, the model is cached in the to-be-deployed model pool of the cloud platform, tasks to be deployed are constructed from the models, and the tasks corresponding to the same chip type are sorted by priority to generate a task queue. Models to be deployed therefore queue for deployment, which reduces the deployment pressure on the cloud platform and improves the reliability of the cloud platform. On the other hand, team tasks are scheduled to team exclusive chips belonging to the same team as the task, and personal tasks are scheduled to shared chips, so each team is configured with exclusive chips. This largely avoids the situation in which chip resources are fully occupied, improves the rationality of chip resource allocation and the efficiency of model deployment, and makes the scheduling of model deployment highly reliable.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:
FIG. 1 is a schematic diagram of a model deployment system shown in an exemplary embodiment of the present invention;
FIG. 2 is a flow chart of a scheduling method of model deployment, shown in an exemplary embodiment of the invention;
FIG. 3 is a schematic diagram of chip management shown in an exemplary embodiment of the invention;
FIG. 4 is a schematic diagram illustrating a model deployment schedule to a chip resource pool in accordance with an exemplary embodiment of the present invention;
FIG. 5 is a schematic diagram of acquiring a model to be deployed, shown in accordance with an exemplary embodiment of the present invention;
FIG. 6 is a block diagram of a scheduling apparatus for model deployment in accordance with an exemplary embodiment of the present invention;
fig. 7 is a schematic diagram of a computer system suitable for implementing the electronic device of the present invention, shown in accordance with an exemplary embodiment of the present invention.
Detailed Description
Further advantages and effects of the present invention will become readily apparent to those skilled in the art from the disclosure herein, by referring to the accompanying drawings and the preferred embodiments. The invention may also be practiced or carried out in other, different embodiments, and the details of this description may be modified or varied in various respects without departing from the spirit and scope of the present invention. It should be understood that the preferred embodiments are presented by way of illustration only and not by way of limitation.
It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present invention. The drawings show only the components related to the present invention and are not drawn according to the number, shape and size of the components in an actual implementation; the form, quantity and proportion of the components in an actual implementation may vary arbitrarily, and the component layout may be more complicated.
In the following description, numerous details are set forth in order to provide a more thorough explanation of embodiments of the present invention. It will be apparent, however, to one skilled in the art that embodiments of the present invention may be practiced without these specific details. In other embodiments, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the embodiments of the present invention.
Referring to FIG. 1, a schematic diagram of a model deployment system is shown in accordance with an exemplary embodiment of the present invention. The scheduling method for model deployment of the present invention can be implemented in the model deployment system 100 of fig. 1. The model deployment system 100 comprises a model management module 110 for managing initial models uploaded to the cloud by users; multiple versions can be created under each initial model, and initial models uploaded by users in the same team can be shared. The model conversion module 120 is configured to convert an initial model using framework tools pre-configured for the various chip types to obtain a model to be deployed; the model deployment module 130 is configured to send the model to be deployed to the to-be-deployed model resource pool; the chip resource management module 140 is configured to manage the chips in each chip resource pool; and the scheduling center 150 is configured to schedule the model to be deployed from the to-be-deployed resource pool to chips in the chip resource pools for deployment.
Referring to FIG. 2, a flow chart of a scheduling method for model deployment is shown in accordance with an exemplary embodiment of the present invention. As shown in fig. 2, in an exemplary embodiment, the scheduling method of model deployment at least includes steps S210 to S240, which are described in detail below:
step S210, a model to be deployed and chip types corresponding to the model to be deployed are obtained, and the model to be deployed is cached in a model pool to be deployed of the cloud platform.
Specifically, the model to be deployed may be created by a user in a team or by an individual user. To avoid instability of the cloud platform caused by too many models awaiting deployment at the same time, a to-be-deployed resource pool is introduced: when there are multiple models to be deployed, the models corresponding to different chip types queue separately while waiting to be scheduled, which reduces the pressure of deploying models on the cloud platform and improves the stability of the cloud platform.
Step S220, tasks to be deployed are constructed based on the models to be deployed, and the tasks to be deployed corresponding to the same chip type are sorted according to priority to generate a task queue, wherein the tasks to be deployed comprise team tasks and personal tasks.
Specifically, setting the priority of each task to be deployed, wherein setting the priority of team tasks comprises uniformly setting the same priority for each team task corresponding to any team, or independently setting the priority for each team task corresponding to any team; and sequencing each team task and each person task corresponding to the same chip type from high to low according to the priority, determining the execution sequence of each task to be deployed, and generating a task queue based on the execution sequence.
In one embodiment of the present invention, if a task to be deployed is urgent, its priority may be modified to adjust its execution order, so that it is scheduled preferentially. For example, priorities are denoted by the numerals 1 to 7, where 1 is the lowest priority and 7 the highest; if the priorities of two tasks to be deployed are 5 and 3, the task with priority 5 is placed before the task with priority 3. If two tasks to be deployed have the same priority, for example both are 1, they are ordered by their generation time, earliest first. The priorities of team tasks can be set individually, or uniformly for all team tasks belonging to the same team. Therefore, when an urgent task to be deployed exists, it can be scheduled preferentially, making the scheduling of model deployment more reasonable.
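For illustration only, the ordering described above can be sketched in a few lines of Python; the Task record, the build_task_queue helper and the example values are assumptions made for this sketch and are not prescribed by the embodiment.

```python
# Minimal sketch of the queue generation in step S220, assuming a simple
# Task record; field and function names are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    name: str
    chip_type: str
    priority: int             # 1 (lowest) .. 7 (highest), per this embodiment
    created_at: float         # generation time of the task
    team: Optional[str] = None  # None marks a personal task

def build_task_queue(tasks, chip_type):
    """One queue per chip type: higher priority first, earlier creation first."""
    same_type = [t for t in tasks if t.chip_type == chip_type]
    return sorted(same_type, key=lambda t: (-t.priority, t.created_at))

# The priority-5 task is placed before the priority-3 task; the two
# priority-1 tasks are ordered by their generation time, earliest first.
queue = build_task_queue(
    [Task("m1", "type-A", 5, 10.0, team="perception"),
     Task("m2", "type-A", 3, 5.0),
     Task("m3", "type-A", 1, 8.0),
     Task("m4", "type-A", 1, 2.0)],
    "type-A")
print([t.name for t in queue])  # ['m1', 'm2', 'm4', 'm3']
```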
Step S230, the tasks to be deployed are scheduled to the chips in the corresponding chip resource pool according to the task queue, so that each model to be deployed is deployed to the corresponding chip from the to-be-deployed model pool; the task queues, the chip resource pools and the chip types are in one-to-one correspondence with one another, and the chips comprise shared chips and team exclusive chips.
Specifically, a chip program is deployed on each chip so that chip information of each chip is collected through the chip program, where the chip information at least comprises the chip type, the chip IP and the chip driver version; the chip program is then used to configure the same resource address for chips of the same chip type, so as to allocate each chip to the corresponding chip resource pool, the resource addresses corresponding one to one with the chip resource pools.
Referring to fig. 3, a schematic diagram of chip management according to an exemplary embodiment of the present invention is shown. As shown in fig. 3, the chip-side program 32, that is, the chip program, is deployed on the chip 33, and the chip resource management module 30 interacts with the chip through it; in other words, the cloud platform and the chip interact through the chip program. When a new chip is added, only the chip program needs to be deployed on it: the chip information of the new chip is collected through the chip program and reported to the cloud platform, a resource address corresponding to its chip type is configured for the new chip, and the chip is brought into the chip resource pool of that type. If no chip resource pool corresponding to the new chip's type exists, an independent resource address is configured for the new chip and a chip resource pool for that chip type is created. Through this mode of interaction between the chip program and the cloud platform, new chip types can be configured quickly, without additional code development on the cloud platform or configuring numerous APIs (interfaces) adapted to the various chip types, which greatly shortens the time for integrating chips into the cloud platform and improves the extensibility of chip resources.
In one embodiment of the invention, the resource address (a URL address) of a chip is the interface address through which the chip receives tasks to be deployed, and chips of the same chip type use the same interface address; a task to be deployed issued by the cloud platform therefore only needs to be scheduled to the chip IP of the corresponding chip.
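For illustration only, the following sketch shows how such registration might be organized, with one pool and one resource address per chip type; the class names and the URL scheme are assumptions, since the embodiment does not prescribe a concrete address format.

```python
# Sketch of chip registration: chips of the same type share one resource
# address (interface URL) and one resource pool; tasks are then delivered
# to an individual chip by its chip IP. Names and URL format are assumed.
from dataclasses import dataclass

@dataclass
class ChipInfo:
    chip_type: str       # e.g. "type-A"
    chip_ip: str         # where a scheduled task is actually delivered
    driver_version: str

class ChipRegistry:
    def __init__(self):
        self.pools = {}          # chip_type -> list of ChipInfo
        self.resource_urls = {}  # chip_type -> shared resource address

    def register(self, info):
        if info.chip_type not in self.pools:
            # First chip of a new type: create its pool and resource address
            # without any extra code development on the platform side.
            self.pools[info.chip_type] = []
            self.resource_urls[info.chip_type] = f"/deploy/{info.chip_type}"
        self.pools[info.chip_type].append(info)
        return self.resource_urls[info.chip_type]

registry = ChipRegistry()
registry.register(ChipInfo("type-A", "10.0.0.11", "1.2"))
registry.register(ChipInfo("type-A", "10.0.0.12", "1.2"))  # same address/pool
registry.register(ChipInfo("type-B", "10.0.0.21", "2.0"))  # new pool created
```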
Specifically, before the task to be deployed is scheduled to the corresponding chip, the method further comprises: monitoring the use state of the chip by using the chip program; if the chip is in an offline state, stopping scheduling tasks to be deployed to the chip; and if the chip is in an idle state, scheduling the task to be deployed to the chip.
In one embodiment of the invention, the use state of a chip can be monitored in real time by the chip program, where the use state includes whether the chip is online and the progress of the task being executed. The use state is reported to the cloud platform periodically; if the cloud platform does not receive the chip's use state within a preset time, it considers the chip offline and does not issue tasks to be deployed to it.
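For illustration only, a minimal sketch of this heartbeat logic follows; the 30-second window and the state labels are assumptions, since the embodiment only requires a preset reporting time.

```python
# Sketch of heartbeat-based state tracking; the window length and state
# labels are assumptions made for this sketch.
import time

HEARTBEAT_TIMEOUT = 30.0  # seconds; the preset reporting window

class ChipStateTable:
    def __init__(self):
        self.last_seen = {}  # chip_ip -> timestamp of the last report
        self.state = {}      # chip_ip -> "idle" or "busy"

    def report(self, chip_ip, state):
        """Called when the chip program periodically reports its use state."""
        self.last_seen[chip_ip] = time.time()
        self.state[chip_ip] = state

    def schedulable(self, chip_ip):
        """A chip receives tasks only if it is online and idle."""
        seen = self.last_seen.get(chip_ip)
        if seen is None or time.time() - seen > HEARTBEAT_TIMEOUT:
            return False  # considered offline: stop scheduling to this chip
        return self.state.get(chip_ip) == "idle"
```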
Referring to FIG. 4, a schematic diagram of scheduling model deployment to a chip resource pool is shown in accordance with an exemplary embodiment of the present invention. As shown in fig. 4, there are N chip resource pools, where N is a positive integer. A task to be deployed generated from a model to be deployed is scheduled to the resource pool of the corresponding chip type, so that the model is deployed from the to-be-deployed model pool 40 to a chip in the resource pool of that chip type; that is, a model to be deployed of type A should be deployed to a chip in the chip resource pool of type A.
Step S240, the team task is scheduled to the team exclusive chip belonging to the same team as the team task, and the personal task is scheduled to the shared chip.
Specifically, team information of team tasks to be scheduled and team labels of each team exclusive chip in a chip resource pool are determined; matching team information of the team tasks with team labels of each team exclusive chip, determining team exclusive chips belonging to the same team with the team tasks, and taking the team exclusive chips as target team chips; if any target team chip is idle, scheduling the team task to the idle target team chip; if all the target team chips are not idle, scheduling the team tasks to idle shared chips; and if the target team chip and the shared chip are not idle, stopping scheduling of the model to be deployed corresponding to the team task in the pool to be deployed until the idle target team chip or the idle shared chip exists.
In one embodiment of the invention, to prevent chip resources from being monopolized by the tasks to be deployed of some teams so that the remaining teams and individual users cannot use them, a dedicated team exclusive chip is configured for each team; it cannot be occupied by other teams or individual users, which makes the allocation of chip resources among teams more reasonable. In addition, a team's deployment workload may be so large that its own team exclusive chips cannot meet demand; in that case, when its team exclusive chips are all occupied, the team's tasks may use the shared chips. Use of the shared chips is not restricted to particular users: every team can use them, which avoids shortages of chip resources and improves the utilization rate of the chips.
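For illustration only, the dispatch rule for a single team task can be sketched as below: an idle same-team exclusive chip is preferred, an idle shared chip is the fallback, and otherwise the task stays queued. The Chip attributes are assumptions; a personal task would skip the first loop and search only the shared chips.

```python
# Sketch of the team-task dispatch rule in step S240; attribute names are
# assumptions. Returning None means the model stays in the pool and waits.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Chip:
    chip_ip: str
    idle: bool
    team_label: Optional[str] = None  # None marks a shared chip

def pick_chip_for_team_task(task_team, pool):
    """Return an idle chip for a team task, or None to leave it queued."""
    # Target team chips: exclusive chips whose team label matches the task.
    for chip in pool:
        if chip.team_label == task_team and chip.idle:
            return chip
    # All target team chips busy: fall back to an idle shared chip.
    for chip in pool:
        if chip.team_label is None and chip.idle:
            return chip
    return None  # neither kind is idle: scheduling stops until one frees up

pool = [Chip("10.0.0.11", idle=False, team_label="perception"),
        Chip("10.0.0.12", idle=True)]  # shared chip
assert pick_chip_for_team_task("perception", pool).chip_ip == "10.0.0.12"
```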
Specifically, the number of chips used by each user in each team is counted. If the number of chips used by any user in a team is larger than a preset first threshold, that user is taken as a target user, and scheduling of the team tasks corresponding to the target user stops until the number of chips used by the target user is smaller than the preset first threshold. If the total number of chips used by all users in a team is larger than a preset second threshold, scheduling of all the team tasks of that team stops until the total number of chips used by the team is smaller than the second threshold.
In one embodiment of the invention, when each team uses the cloud platform to perform model deployment, a limit on the number of chip resources used by the team is configured, i.e. the total number of chips used by the team must remain smaller than the preset second threshold, and each member of the team is likewise configured with a limit on the number of chips used simultaneously, i.e. the number of chips used by each user in the team must remain smaller than the preset first threshold. The preset first threshold configured for each user and the preset second threshold configured for each team can be adjusted according to the actual situation, so as to limit each team's occupation of chip resources and make the allocation of chip resources more reasonable.
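For illustration only, the two quota checks can be sketched as below; the threshold values are assumptions, since the embodiment only requires that they be configurable.

```python
# Sketch of the two quota checks applied before a team task is dispatched;
# the concrete threshold values are assumed for illustration.
FIRST_THRESHOLD = 4    # max chips a single user may occupy simultaneously
SECOND_THRESHOLD = 16  # max chips a whole team may occupy simultaneously

def may_schedule(user, team_members, chips_in_use):
    """chips_in_use maps each user to the number of chips they occupy."""
    if chips_in_use.get(user, 0) > FIRST_THRESHOLD:
        return False  # target user over quota: hold this user's team tasks
    if sum(chips_in_use.get(u, 0) for u in team_members) > SECOND_THRESHOLD:
        return False  # whole team over quota: hold all of its team tasks
    return True

usage = {"alice": 5, "bob": 2}
assert not may_schedule("alice", ["alice", "bob"], usage)  # user over quota
assert may_schedule("bob", ["alice", "bob"], usage)
```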
Referring to fig. 5, a schematic diagram of obtaining a model to be deployed according to an exemplary embodiment of the present invention is shown. As shown in fig. 5, before step S210, the acquisition flow of the model to be deployed includes: acquiring an initial model and the chip type on which the initial model is to be pre-deployed; screening the framework tool corresponding to that chip type from a preset framework toolkit, converting the initial model according to the framework tool, and determining an intermediate model; and quantizing and compiling the intermediate model to obtain the model to be deployed, where the model to be deployed is shared by the team to which the user belongs.
Specifically, the initial model 510 is a trained model and may be uploaded to the cloud platform through a web browser, a client tool, an SDK (software development kit), an API, and the like, which is not limited here. After the initial model is converted with the framework tool corresponding to its type, an Onnx (Open Neural Network Exchange) intermediate model 520 is obtained. It should be understood that the intermediate format is not limited to Onnx; the initial model may be converted into another intermediate format according to actual requirements, for example Caffe (a convolutional neural network framework). The Onnx intermediate model 520 is then quantized, i.e. compressed to reduce its scale and thus its occupation of chip resources, yielding a quantized model 530; finally the quantized model 530 is compiled to obtain a trt (TensorRT) model 540, i.e. the model to be deployed.
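For illustration only, the FIG. 5 pipeline can be sketched as below; every function body is a placeholder standing in for the vendor framework tool, the quantizer and the compiler, none of whose concrete APIs are fixed by the embodiment.

```python
# Hedged sketch of the FIG. 5 pipeline: initial model 510 -> Onnx
# intermediate model 520 -> quantized model 530 -> compiled trt model 540.
# All calls are placeholders; a real system would invoke the chip vendor's
# framework tool, quantizer and compiler here.
def convert_to_intermediate(initial_model, framework_tool):
    """Convert the trained model into an intermediate format such as Onnx."""
    return {"format": "onnx", "source": initial_model, "tool": framework_tool}

def quantize(intermediate_model):
    """Compress the intermediate model to shrink its chip-resource footprint."""
    return {"format": "quantized", "model": intermediate_model}

def compile_for_chip(quantized_model, chip_type):
    """Compile the quantized model into the chip's deployable (e.g. trt) format."""
    return {"format": "trt", "model": quantized_model, "target": chip_type}

def build_deployable_model(initial_model, chip_type, toolkit):
    tool = toolkit[chip_type]  # framework tool screened from the preset toolkit
    intermediate = convert_to_intermediate(initial_model, tool)
    return compile_for_chip(quantize(intermediate), chip_type)

toolkit = {"type-A": "vendor-tool-A"}  # preset framework toolkit (assumed form)
model_540 = build_deployable_model("initial-model-510", "type-A", toolkit)
```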
In one embodiment of the invention, the preset framework toolkit integrates the framework tools of the mainstream chip types, so an initial model can be converted quickly through the cloud platform; and because the model is quantized, it occupies fewer chip resources, which improves the efficiency of model deployment.
Optionally, if the preset framework toolkit contains no framework tool corresponding to the chip type on which the initial model is to be pre-deployed, the framework tool corresponding to that chip type is taken as the target framework tool, the target framework tool is packaged into an image, a container for executing the target framework tool is started from the image, and the container is used to convert the initial model into the intermediate model. Framework tools for chip types not integrated in the preset framework toolkit can likewise be uploaded to the cloud platform in the form of images.
On the one hand, the model to be deployed and its chip type are acquired, the model is cached in the to-be-deployed model pool of the cloud platform, tasks to be deployed are constructed from the models, and the tasks corresponding to the same chip type are sorted by priority to generate a task queue, so that models to be deployed queue for deployment; this reduces the deployment pressure on the cloud platform and improves the reliability of the cloud platform. On the other hand, team tasks are scheduled to team exclusive chips belonging to the same team as the task, and personal tasks are scheduled to shared chips, so each team is configured with exclusive chips; this largely avoids chip resources being fully occupied, improves the rationality of chip resource allocation and the efficiency of model deployment, and makes the scheduling of model deployment highly reliable.
Referring to FIG. 6, a block diagram of a scheduling apparatus for model deployment is shown in accordance with an exemplary embodiment of the present invention. As shown in fig. 6, the exemplary scheduling apparatus for model deployment includes: an acquisition module 610, a queue generation module 620, and a scheduling module 640.
The obtaining module 610 is configured to obtain a model to be deployed and a chip type corresponding to the model to be deployed, where the model to be deployed is cached in a model pool to be deployed of the cloud platform;
the queue generating module 620 is configured to construct a task to be deployed based on the model to be deployed, and the task to be deployed corresponding to the same chip type is ordered according to priority, so as to generate a task queue, where the task to be deployed includes a team task and a personal task;
the scheduling module 640 is configured to schedule the task to be deployed to a chip in the corresponding chip resource pool according to the task queue, so that the model to be deployed is deployed from the model pool to be deployed to the corresponding chip, the task queue, the chip resource pool and the chip types are in one-to-one correspondence with each other, the chip includes a shared chip and a team exclusive chip, wherein the team task is scheduled to the team exclusive chip belonging to the same team as the team task, and the personal task is scheduled to the shared chip.
In one embodiment of the present invention, the function of the acquisition module in the scheduling apparatus for model deployment may be implemented by a combination of the model management module, the model conversion module and the model deployment module in fig. 1; the function of the queue generation module may be implemented by the scheduling center in fig. 1; and the function of the scheduling module may be implemented by a combination of the scheduling center and the chip resource management module in fig. 1.
It should be noted that the scheduling apparatus for model deployment provided by the above embodiment and the scheduling method for model deployment provided by the earlier embodiments belong to the same concept; the specific manner in which each step performs its operations has been described in detail in the method embodiment and is not repeated here.
The embodiment of the invention also provides electronic equipment, which comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the scheduling method of model deployment provided in the above embodiments.
Referring to FIG. 7, a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention is shown. It should be noted that, the computer system 700 of the electronic device shown in fig. 7 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present invention.
As shown in fig. 7, the computer system 700 includes a central processing unit (Central Processing Unit, CPU) 701 that can perform various appropriate actions and processes, such as performing the methods in the above-described embodiments, according to a program stored in a read-only memory (Read-Only Memory, ROM) 702 or a program loaded from a storage section 708 into a random access memory (Random Access Memory, RAM) 703. In the RAM 703, various programs and data required for system operation are also stored. The CPU 701, the ROM 702 and the RAM 703 are connected to one another through a bus 704. An input/output (Input/Output, I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a cathode ray tube (CRT) or a liquid crystal display (Liquid Crystal Display, LCD), a speaker, and the like; a storage section 708 including a hard disk or the like; and a communication section 709 including a network interface card such as a LAN (local area network) card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read therefrom is installed into the storage section 708 as needed.
In particular, according to embodiments of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the various functions defined in the system of the present invention are performed.
Embodiments of the present invention also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor of a computer, causes the computer to perform a scheduling method of model deployment as described above. The computer-readable storage medium may be included in the electronic device described in the above embodiment or may exist alone without being incorporated in the electronic device.
It should be noted that the computer readable medium shown in the embodiments of the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. A computer program embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, and the like, or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. Each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustrations, and combinations of blocks in the block diagrams or flowchart illustrations, can be implemented by special-purpose hardware-based systems which perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
The above embodiments merely illustrate the principles of the present invention and its effects, and are not intended to limit the invention. Modifications and variations may be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the invention. It is therefore intended that all equivalent modifications and changes made by those skilled in the art without departing from the spirit and technical scope of the present invention shall be covered by the appended claims.

Claims (10)

1. A scheduling method for model deployment, comprising:
acquiring a model to be deployed and a chip type corresponding to the model to be deployed, wherein the model to be deployed is cached in a model pool to be deployed of a cloud platform;
constructing a task to be deployed based on the model to be deployed, sequencing the tasks to be deployed corresponding to the same chip type according to priority, and generating a task queue, wherein the task to be deployed comprises a team task and a personal task;
dispatching the task to be deployed to a chip in a corresponding chip resource pool according to the task queue, enabling the model to be deployed from the model pool to the corresponding chip, wherein the task queue, the chip resource pool and the chip type are in one-to-one correspondence with each other, and the chip comprises a shared chip and a team exclusive chip;
and dispatching the team task to the team exclusive chip belonging to the same team as the team task, and dispatching the personal task to the shared chip.
2. The method for scheduling model deployment according to claim 1, wherein the constructing tasks to be deployed based on the model to be deployed, and sequencing the tasks to be deployed corresponding to the same chip type according to priorities, generating a task queue, includes:
setting the priority of each task to be deployed, wherein the setting of the priority of the team task comprises uniformly setting the same priority for each team task corresponding to any team, or independently setting the priority for each team task corresponding to any team;
and sequencing each team task and each personal task corresponding to the same chip type from high to low according to priority, determining the execution sequence of each task to be deployed, and generating the task queue based on the execution sequence.
3. The method for scheduling model deployment according to claim 1, wherein the scheduling the task to be deployed to the chip in the corresponding chip resource pool according to the task queue comprises:
determining team information of the team tasks to be scheduled and team labels of each team exclusive chip in the chip resource pool;
matching team information of the team tasks with team labels of each team exclusive chip, determining the team exclusive chips belonging to the same team with the team tasks, and taking the team exclusive chips as target team chips;
if any of the target team chips is idle, scheduling the team task to the idle target team chip;
if all the target team chips are not idle, scheduling the team tasks to the idle shared chips;
and if the target team chip and the shared chip are not idle, stopping scheduling the model to be deployed corresponding to the team task in the pool to be deployed until the idle target team chip or the idle shared chip exists.
4. The method of scheduling model deployment of claim 3, further comprising, prior to said scheduling the team task to the team exclusive chip of the same team to which the team task belongs:
counting the number of chips used by each user in each team;
if the number of chips used by any user in the team is larger than a preset first threshold, taking the user as a target user, and stopping scheduling the team task corresponding to the target user until the number of chips used by the target user is smaller than the preset first threshold;
if the total number of the chips obtained by adding the numbers of the chips used by all the users in the team is larger than a preset second threshold, all the team tasks corresponding to the team stop scheduling until the total number of the chips used by the team is smaller than the second threshold.
5. The method for scheduling model deployment according to claim 1, comprising, before said scheduling the task to be deployed to a chip in a corresponding chip resource pool according to the task queue:
deploying a chip program on each chip to acquire chip information of each chip through the chip program, wherein the chip information at least comprises the chip type, the chip IP and a chip driving version;
and configuring the same resource address for the chips with the same chip type by utilizing the chip program so as to allocate each chip to the corresponding chip resource pool, wherein the resource addresses are in one-to-one correspondence with the chip resource pools.
6. The method for scheduling model deployment according to claim 5, further comprising, before said scheduling the task to be deployed to a chip in a corresponding chip resource pool according to the task queue:
monitoring the use state of the chip by using the chip program;
if the chip is in an offline state, stopping scheduling the task to be deployed to the chip;
and if the chip is in an idle state, scheduling the task to be deployed to the chip.
7. The scheduling method of model deployment according to any one of claims 1 to 6, further comprising, before the obtaining a model to be deployed and a chip type corresponding to the model to be deployed:
acquiring an initial model and the chip type pre-deployed by the initial model;
screening a framework tool corresponding to the chip type on which the initial model is to be pre-deployed from a preset framework toolkit, converting the initial model according to the framework tool, and determining an intermediate model;
and quantizing and compiling the intermediate model to obtain the model to be deployed, wherein the model to be deployed is shared by the team to which the user belongs.
8. A scheduling apparatus for model deployment, comprising:
the acquisition module is used for acquiring a model to be deployed and a chip type corresponding to the model to be deployed, wherein the model to be deployed is cached in a model pool to be deployed of the cloud platform;
the queue generating module is used for constructing tasks to be deployed based on the models to be deployed, sequencing the tasks to be deployed corresponding to the same chip type according to priority, and generating a task queue, wherein the tasks to be deployed comprise team tasks and personal tasks;
the scheduling module is used for scheduling the task to be deployed to a chip in a corresponding chip resource pool according to the task queue, so that the model to be deployed is deployed from the model pool to be deployed to the corresponding chip, the task queue, the chip resource pool and the chip type are in one-to-one correspondence with each other, the chip comprises a shared chip and a team exclusive chip, wherein the team task is scheduled to the team exclusive chip belonging to the same team as the team task, and the personal task is scheduled to the shared chip.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the scheduling method of model deployment of any of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a computer program for causing a computer to execute the scheduling method of model deployment according to any one of claims 1 to 7.
CN202311702784.8A 2023-12-12 Scheduling method, device, equipment and storage medium for model deployment Pending CN117707731A (en)

Priority Applications (1)

Application Number  Priority Date  Filing Date  Title
CN202311702784.8A   2023-12-12     2023-12-12   Scheduling method, device, equipment and storage medium for model deployment

Publications (1)

Publication Number  Publication Date
CN117707731A (en)   2024-03-15

Family ID: 90145545

Country Status (1)

Country  Link
CN       CN117707731A (en)
Similar Documents

Publication Publication Date Title
US11436050B2 (en) Method, apparatus and computer program product for resource scheduling
CN111367679A (en) Artificial intelligence computing power resource multiplexing method and device
US10977076B2 (en) Method and apparatus for processing a heterogeneous cluster-oriented task
CN109117252B (en) Method and system for task processing based on container and container cluster management system
US20190213040A1 (en) Workflow scheduling system, workflow scheduling method, and electronic apparatus
CN112395736B (en) Parallel simulation job scheduling method of distributed interactive simulation system
CN111343288B (en) Job scheduling method and system and computing device
CN111190712A (en) Task scheduling method, device, equipment and medium
CN113946431B (en) Resource scheduling method, system, medium and computing device
CN110177146A (en) A kind of non-obstruction Restful communication means, device and equipment based on asynchronous event driven
CN107832130A (en) A kind of job stream scheduling of banking system performs method, apparatus and electronic equipment
CN115686805A (en) GPU resource sharing method and device, and GPU resource sharing scheduling method and device
CN106648831A (en) Cloud workflow scheduling method based on firefly algorithm and dynamic priority algorithm
CN114968567A (en) Method, apparatus and medium for allocating computing resources of a compute node
CN105933136B (en) A kind of resource regulating method and system
CN116820714A (en) Scheduling method, device, equipment and storage medium of computing equipment
CN115373826B (en) Task scheduling method and device based on cloud computing
CN117707731A (en) Scheduling method, device, equipment and storage medium for model deployment
CN111190731A (en) Cluster task scheduling system based on weight
CN114896049A (en) Method, system, equipment and medium for scheduling operation tasks of electric power artificial intelligence platform
CN114298313A (en) Artificial intelligence computer vision reasoning method
CN114625512A (en) Task scheduling method and device, electronic equipment and storage medium
CN114020414A (en) Symbiotic method and device of Android system and bottom layer Linux, electronic equipment and storage medium
KR102642396B1 (en) Batch scheduling device for deep learning inference model using limited gpu resources
TWI777695B (en) Method for automatic scheduling tasks, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination