CN115129463A - Computing power scheduling method, device, system and storage medium - Google Patents

Computing power scheduling method, device, system and storage medium

Info

Publication number
CN115129463A
CN115129463A (application CN202110333330.2A)
Authority
CN
China
Prior art keywords
computing power
round
equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110333330.2A
Other languages
Chinese (zh)
Inventor
倪茂
周婷
崔芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202110333330.2A priority Critical patent/CN115129463A/en
Publication of CN115129463A publication Critical patent/CN115129463A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F9/5038: Allocation of resources to service a request, the resource being a machine, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

An embodiment of the invention provides a computing power scheduling method, device, system, and storage medium. The method comprises: if the current working mode is the inference mode, performing the following operations: acquiring computing power weight information for each computing power device in a target computing power device group; determining a task allocation order from each device's computing power weight information; and allocating each task in the task flow, in that order, to the corresponding computing power device in the target group. The method addresses the prior-art problem of wasted computing resources, in which some computing power devices sit idle waiting for others to finish computing: tasks are allocated dynamically according to each device's real-time computing power attributes, so that every device's computing power is used effectively.

Description

Computing power scheduling method, device, system and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a computing power scheduling method, device, system, and storage medium.
Background
In the prior art, after a user submits demand information, a computing power network system can schedule idle computing power devices according to that demand and provide the corresponding computing service. In the existing scheduling approach, however, during computation some of the devices providing the service are still computing while others have already finished and sit idle for a period of time, waiting for the devices still computing to complete the current round's task. The existing computing power scheduling approach therefore wastes computing power device resources.
Disclosure of Invention
Embodiments of the present invention provide a computing power scheduling method, device, system, and storage medium. They address the prior-art problem that computing resources are wasted when some computing power devices sit idle waiting for others to finish computing: tasks are allocated dynamically according to each device's real-time computing power attributes, so that every device's computing power is used effectively.
An embodiment of the invention provides a computing power scheduling method comprising: if the current working mode is the inference mode, performing the following operations: acquiring computing power weight information for each computing power device in a target computing power device group; determining a task allocation order from each device's computing power weight information; and allocating each task in the task flow, in that order, to the corresponding computing power device in the target group.
Further, determining the task allocation order from each device's computing power weight information comprises: executing multiple rounds of device selection, obtaining the sequence in which devices were selected across the rounds, and taking that selection sequence as the task allocation order. Each round of device selection comprises: taking the device with the highest weight in the current round's first weight information as the device selected this round, where the weight information contains the computing power weight value of every device in the target group, and the first weight information of the first round contains each device's initial computing power weight value; subtracting the sum of all devices' weights from the selected device's current weight value to obtain a corrected value, and updating the first weight information with it to obtain second weight information; and adding each device's initial weight value to its current weight value in the second weight information to obtain third weight information, which serves as the first weight information of the next round of device selection.
Further, before performing the multiple rounds of device selection, the method further comprises configuring a target selection rule as the selection rule of the first round, where the first-round selection rule comprises: taking the device with the highest weight in the first round's weight information as the device selected that round; and/or randomly selecting a target weight from the first round's weight information and taking the corresponding device as the device selected that round.
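The multi-round selection described above corresponds to a smooth weighted round-robin. A minimal Python sketch under that reading, assuming positive integer computing power weights (function and variable names are illustrative, not from the patent):

```python
def schedule_order(initial_weights, num_tasks):
    """Produce a task allocation order by repeated weighted selection.

    Each round: pick the device with the highest current weight (first
    weight information), subtract the total of all weights from it (the
    correction, giving second weight information), then add every device's
    initial weight back (third weight information, which becomes the next
    round's first weight information).
    """
    current = dict(initial_weights)          # round 1 uses the initial weights
    total = sum(initial_weights.values())
    order = []
    for _ in range(num_tasks):
        selected = max(current, key=current.get)
        order.append(selected)
        current[selected] -= total           # correction step
        for dev, w in initial_weights.items():
            current[dev] += w                # restore toward initial ratios
    return order
```

Over any window whose length equals the sum of the weights, each device is selected in proportion to its weight, so higher-computing-power devices receive proportionally more tasks without any device idling indefinitely.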
Further, before acquiring the computing power weight information of each device in the target computing power device group, the method further comprises: acquiring task description information and determining the current working mode from it, where the working mode comprises a training mode and an inference mode.
Further, after the current working mode is determined from the task description information, if it is the training mode, the following operations are performed: acquiring computing power ratio information for each device in the target computing power device group; dividing the first-round training data according to that ratio information and distributing the divided data to the corresponding devices; and determining the distribution information of round-(i+1) training data from each device's round-i training time, dividing the round-(i+1) training data accordingly, and distributing the divided data to the corresponding devices, where i is an integer greater than or equal to 1.
Further, dividing the first-round training data according to each device's computing power ratio information and distributing the divided data to the corresponding devices comprises: according to the computing power ratio information {P_1, P_2, P_3, ..., P_N} of the devices, dividing the first-round training data proportionally into training sub-data {D_1^1, D_2^1, ..., D_N^1}, where P_n denotes the computing power of the nth device, D_n^1 denotes the data size of the sub-data divided for the nth device in the first training round, n ranges over {1, 2, 3, ..., N}, and N denotes the number of devices in the target computing power device group; and assigning each training sub-data D_n to the device with the corresponding computing power P_n.
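The proportional first-round split can be sketched as follows: a minimal illustration assuming the training data is an indexable list and the ratios P_n are positive integers (names are hypothetical):

```python
def split_first_round(data, powers):
    """Divide first-round training data among devices in proportion to
    their computing power ratios {P_1, ..., P_N}."""
    total = sum(powers)
    # integer share per device; any rounding remainder goes to the last one
    sizes = [len(data) * p // total for p in powers]
    sizes[-1] += len(data) - sum(sizes)
    chunks, start = [], 0
    for s in sizes:
        chunks.append(data[start:start + s])
        start += s
    return chunks
```

Each chunk is then sent to the device whose computing power ratio produced it, so a device with three times the computing power receives roughly three times the data.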
Further, determining the distribution information of round-(i+1) training data from each device's round-i training time, dividing the round-(i+1) training data accordingly, and distributing the divided data to the respective devices comprises: obtaining each device's round-i training time {t_1^i, t_2^i, ..., t_N^i}, where t_n^i denotes the round-i training time of the nth device; obtaining the longest of those times, t_max^i; determining as the devices to be reconfigured those whose time difference from t_max^i is greater than or equal to a preset time, where the preset time is that device's training time for one unit of training data; obtaining the round-i distribution information {D_1^i, D_2^i, ..., D_N^i}, where D_n^i denotes the data size of the training sub-data divided to the nth device in round i; updating, in the round-i distribution information, the data amount of each device to be reconfigured by adding one unit of training data to its divided sub-data, thereby obtaining the round-(i+1) distribution information {D_1^{i+1}, D_2^{i+1}, ..., D_N^{i+1}}; and dividing the round-(i+1) training data according to that distribution information and assigning the divided data to the respective devices.
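Read literally, the round-to-round correction grants one extra unit of data to every device that finished at least one preset time ahead of the slowest device. A sketch under that reading, with the per-device unit training times supplied as assumed inputs:

```python
def next_round_allocation(alloc, times, unit_times):
    """Compute the round-(i+1) data allocation D_n^{i+1} from round-i values.

    alloc      -- D_n^i: data units assigned to each device in round i
    times      -- t_n^i: each device's round-i training time
    unit_times -- each device's training time for one unit of data
                  (the 'preset time' of the text)
    """
    t_max = max(times)                        # slowest device this round
    return [
        d + 1 if t_max - t >= u else d        # ahead by >= one unit: +1 unit
        for d, t, u in zip(alloc, times, unit_times)
    ]
```

Repeating this each round shifts data toward devices that consistently finish early, narrowing the idle gap between the fastest and slowest devices.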
In a second aspect, an embodiment of the present application further provides a computing power scheduling apparatus comprising a processor and a memory storing at least one instruction, which is loaded and executed by the processor to implement the computing power scheduling method of the first aspect.
In a third aspect, an embodiment of the present application further provides a distributed system comprising: a plurality of computing power devices for providing computing services; a user interface for acquiring a user's demand information and determining task description information from it, the task description information comprising one or more of: task type, delay requirement, precision requirement, memory occupation, and working mode; a gateway for acquiring the task description information provided by the user interface and matching a target algorithm model that satisfies it; a computing power device matching module for matching, among the plurality of computing power devices, a target computing power device group that satisfies the task description information; and the computing power scheduling apparatus of the second aspect.
Further, matching a target computing power device group that satisfies the task description information among the plurality of computing power devices comprises: screening, from the idle devices among them, multiple candidate devices that satisfy the task description information; and selecting some of those candidates as the target computing power device group.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the computing power scheduling method of the first aspect.
According to the technical scheme, in the inference stage tasks are allocated dynamically across the different computing power resources with certain weights, so that no computing power device remains permanently idle. This solves the prior-art problem of computing resources wasted while some devices idle waiting for others to finish computing and, by allocating tasks according to each device's real-time computing power attributes, makes effective use of every device's computing power.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a comparison of edge computing and a computing power network;
FIG. 2 is a flow diagram of user requirement processing provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a user requirement analysis provided in an embodiment of the present application;
FIG. 4 is a flowchart of a method for computing power scheduling in a training mode according to an embodiment of the present application;
FIG. 5 is a diagram illustrating dynamic correction of task allocation in a training mode according to an embodiment of the present application;
FIG. 6 is a flowchart of a method for computing power scheduling in an inference mode according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a computing power scheduling apparatus according to yet another embodiment of the present application;
fig. 8 is a diagram of a distributed system architecture according to yet another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
With the development of artificial intelligence, AI deployment will inevitably be in wide demand across industries. Edge computing is in essence the extension and evolution of cloud computing toward where the computing demand arises. One of its core capabilities is the network: networking, computing, storage, and application capabilities are integrated at the network edge, so that low-computing-power tasks are completed directly on the end side, while computation with high computing power requirements is arranged in the cloud.
Fig. 1 compares edge computing with a computing power network. As shown in Fig. 1, in the edge computing mode each end-side device processes low-computing-power tasks locally and hands tasks requiring large computing power to the cloud. In the computing power network mode, various computing devices are networked into a unified computing network, and computing tasks can be automatically scheduled onto devices in the network according to the task type and the user's computing power requirements.
When a computing task is handled by edge computing, the edge machine room is generally resource-constrained, so the computation-intensive work of model training cannot be done at the edge and must be uploaded to a cloud platform. Uploading user data to the cloud for training, however, risks leaking the user's private data. Moreover, training in the cloud and inferring at the edge occupies a large amount of network bandwidth, so when the training data is large or users are many, the central server's lines are easily congested. And since an edge-side computing power network must not only run inference but also learn models online, the computing power demand on the edge side is higher, requiring the comprehensive use of the various computing devices in the computing power network.
At present, the edge-computing-plus-cloud-platform mode faces problems of privacy, network bandwidth congestion, and difficult computing power scheduling. These can be addressed by siting the computing center locally and building a local computing power network. Some problems remain in applying computing power, however, such as scheduling within the computing power network: existing computing power scheduling wastes computing power device resources.
To overcome at least the computing power scheduling problem in computing power networks, the embodiments of the present application provide the following technical solutions:
fig. 2 is a user requirement processing flow chart provided in an embodiment of the present application, and as shown in fig. 2, the user requirement processing flow includes the following steps:
step S1: acquiring user demand information of a user, and determining task description information by analyzing the user demand information, wherein the task description information comprises one or more of the following information: task type, delay requirement, precision requirement, memory occupation and working mode. In one embodiment, the task description information includes a task type, a delay requirement, a precision requirement, a memory occupation, and a working mode. Fig. 3 is a schematic diagram of user requirement analysis according to an embodiment of the present application, and as shown in fig. 3, the user requirement information sent by the user may include the following requirements: target detection, face recognition, difference detection or voice recognition, and analyzing the needs of the user. For example, two pieces of requirement information (requirement 1 and requirement 2) of a user are obtained, and the two pieces of requirement information are analyzed respectively, so that task description information 1 corresponding to requirement 1 and task description information 2 corresponding to requirement 2 are obtained respectively. As shown in fig. 3, the task description information 1 includes task types: detecting a target; the time delay requirement is as follows: 0.5 s; the working mode is as follows: and (4) reasoning mode. The task description information 2 includes task types: detecting a target; the time delay requirement is as follows: none; an algorithm model: YOLOv 4; the working mode is as follows: a training mode; a data source: path/to/source.
Step S2: an appropriate AI algorithm may be selected from the algorithm library based on the task description information from step S1, and its trained model parameters loaded to the computing power device. The selection criteria may include user requirements and hardware resource limits; for example, an algorithm whose delay and memory requirements satisfy the user's is selected from the model library and loaded on the parameter server.
Before step S2 is executed, the following preset step may also be performed. Specifically, before the plurality of computing power devices are networked, a network discovery operation may be run over the currently existing heterogeneous computing power devices, combining ICMP (Internet Control Message Protocol), ARP (Address Resolution Protocol), and SNMP (Simple Network Management Protocol): the network is probed for active devices to obtain all of them, the devices' basic information is then obtained through SNMP, each device's type is determined from that basic information, the detailed information of each device is obtained according to its type, and the devices are then managed uniformly.
A centralized search method is used to find resource information across the whole network and then locate each resource's position in the grid. When network resources are discovered, the system defines a uniform interface and protocol for devices of different types, manufacturers, and models, so that it can manage the devices with uniform names and descriptors.
In one embodiment, a resource descriptor may be assigned to each newly discovered computing power device when it joins the network. Descriptors are organized in Key-Value form, so new description features can easily be added by extending them. A descriptor has four basic features: the device name; the device's computing efficiency under double-precision, single-precision, half-precision, and integer arithmetic; the delay from the data sender to the device; and the device memory. The newly accessed device is assigned a descriptor and then handed to S10 to be managed uniformly in the heterogeneous resource pool. Through the network, the system can manage devices in different physical spaces, greatly improving network management efficiency.
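A descriptor in the Key-Value form just described might look like the following sketch; the field names, units, and example values are assumptions for illustration, not the patent's schema:

```python
# Hypothetical resource descriptor for a newly discovered device,
# covering the four basic features: name, per-precision compute
# efficiency, sender-to-device delay, and device memory.
descriptor = {
    "device_name": "edge-gpu-01",             # assumed example name
    "compute_efficiency": {                   # throughput per precision
        "fp64": 9.7, "fp32": 19.5, "fp16": 78.0, "int8": 156.0,
    },
    "latency_ms": 4.2,                        # data sender -> device delay
    "memory_gb": 40,                          # device memory
}

def register(pool, desc):
    """Add a descriptor to the heterogeneous resource pool, keyed by name."""
    pool[desc["device_name"]] = desc
    return pool
```

Because the pool is keyed by uniform names, devices from different manufacturers and physical locations can be looked up and managed through one interface.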
Step S3: configuring a corresponding parameter file on the parameter server according to the algorithm model selected in step S2, where the corresponding parameter file may include one or more of the following: task ID, deep learning framework, required hardware resources, network structure, network parameters, and task-related hyper-parameters. In one embodiment, the parameter file may include all of the information described above (task ID, deep learning framework, required hardware resources, network structure, network parameters, and task-related hyper-parameters).
Step S4: determine the working mode, training or inference, from the task description information of step S1. The parameter server stores the model's training parameters, so when the working mode is the training mode, the parameter server and the computing power devices in the resource pool synchronize parameters across devices; when it is the inference mode, the parameter server supplies the network parameters to the computing power devices for inference. Specifically, when the network hyper-parameters are configured, training samples and model parameters can be serialized with a flexible, efficient, automated structured-data serialization mechanism such as Protocol Buffers.
Step S5: according to the working mode determined in step S4, when it is the training mode, poll the computing power resources in the resource pool, screen out from the idle devices multiple candidates that satisfy the task description information, and select some of the candidates as the target computing power device group, reserving part of the computing power for other tasks. Whether a candidate satisfies the task description information is judged from aspects such as whether the memory requirement is met (memory at least 1.5 times the model size) and a tiered selection over the heterogeneous devices' computing power.
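The screening in step S5 can be sketched as below, taking "memory at least 1.5 times the model size" as the memory criterion and a ranking by computing power as the tiered selection; the device fields are hypothetical:

```python
def screen_candidates(devices, model_size_gb, group_size):
    """Pick idle devices whose memory exceeds 1.5x the model size,
    preferring higher computing power, and keep only `group_size` of
    them so some capacity stays reserved for other tasks."""
    idle = [d for d in devices if d["idle"]]
    fit = [d for d in idle if d["memory_gb"] >= 1.5 * model_size_gb]
    fit.sort(key=lambda d: d["compute"], reverse=True)
    return fit[:group_size]
```

Capping the group at `group_size` is one way to leave part of the pool's computing power free for other users' tasks, as the step requires.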
After the target computing power device group is selected, the corresponding computing power scheduling is performed. In the training mode, each device's batch sample size can be adjusted dynamically according to its memory and computing power. Specifically, Fig. 4 is a flowchart of the computing power scheduling method in the training mode according to an embodiment of the present application; as shown in Fig. 4, the method comprises the following steps:
step 401: and acquiring the calculation power ratio information of each calculation power device in the target calculation power device group.
Step 402: dividing first-round training data according to the information of the force calculation ratio of each force calculation device, and distributing the divided first-round training data to the corresponding force calculation devices respectively.
Step 403: determining distribution information of (i +1) th round training data according to the ith round training time of each computing force device, dividing the (i +1) th round training data according to the distribution information, and distributing the divided (i +1) th round training data to the corresponding computing force devices respectively, wherein i is an integer greater than or equal to 1.
It may be assumed that training time is roughly inversely proportional to computing power; therefore, when resources are first allocated, data is apportioned to each device according to the ratio of the devices' computing powers. Let the computing power ratio information of the devices in the selected target group be {P_1, P_2, P_3, ..., P_N}, where P_n denotes the computing power of the nth device. The ratios are obtained by comparing the devices' computing power at the same calculation accuracy.
The step 402 of dividing the first-round training data according to the computing power ratio information of each computing power device and allocating the divided first-round training data to the corresponding computing power devices may specifically include: dividing the first-round training data D in equal proportion, according to the computing power ratio information {P_1, P_2, P_3, ..., P_n} of each computing power device, into a plurality of pieces of training sub-data

D_n^1 = P_n / (P_1 + P_2 + ... + P_N) · D,

wherein D_n^1 represents the data size of the training sub-data divided for the nth computing power device in the first round of training, the value range of n is {1, 2, 3, ..., N}, and N represents the number of computing power devices in the target computing power device group. Further, each piece of training sub-data D_n^1 among the plurality of pieces of training sub-data is respectively assigned to the computing power device with the corresponding computing power P_n.
For example, the target computing power device group includes 3 computing power devices, and the computing power ratio information of the three devices is {P_1, P_2, P_3}. The first-round training data is divided in equal proportion, according to the computing power ratio information of the three devices, into training sub-data D_1^1, D_2^1 and D_3^1, wherein the training sub-data D_1^1 = P_1 / (P_1 + P_2 + P_3) · D is assigned to the computing power device whose computing power is P_1, and so on.
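The proportional first-round split of step 402 can be illustrated with a short Python sketch (not part of the patent text; the function name and the rule of handing the integer-division remainder to the strongest device are illustrative assumptions):

```python
def split_first_round(data, powers):
    """Divide the first-round training data among devices in proportion
    to their computing powers P_1..P_N, using integer division."""
    total = sum(powers)
    sizes = [len(data) * p // total for p in powers]
    # Hand any remainder left by integer division to the most powerful device.
    strongest = max(range(len(powers)), key=lambda i: powers[i])
    sizes[strongest] += len(data) - sum(sizes)
    shards, start = [], 0
    for s in sizes:
        shards.append(data[start:start + s])
        start += s
    return shards

# 800 samples split among three devices with computing powers 5 : 2 : 1.
shards = split_first_round(list(range(800)), [5, 2, 1])
print([len(s) for s in shards])  # → [500, 200, 100]
```

Every sample is assigned to exactly one device, so no data is dropped even when the total is not divisible by the power sum.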
The amount of training data assigned by computing power in step 402 is only an initial value, which is dynamically adjusted according to the actual training time. The network parameters are updated after all devices have completed a round of computation on their assigned data. The network parameters are calculated as:

W_{i+1} = Σ_{j=1}^{N} λ(D_j) · W_i^j,

wherein λ(D_j) is a quantity related to the size of the training data amount of each batch,

λ(D_j) = D_j / (D_1 + D_2 + ... + D_N),

and W_i represents the parameters of the parameter server node at the time of the ith round of synchronization, W_i^j denoting the parameters computed by the jth device in that round. During initial training, the initial training data amount is determined according to the proportion of each training device's computing power in the computing power pool, and the allocated training amount is then dynamically adjusted according to the actual training time.
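As a sketch of this synchronization step, assuming the normalized weighting λ(D_j) = D_j / Σ_k D_k (the exact form of λ cannot be recovered from the original formula image, so this normalization is an assumption), the weighted parameter merge could look like:

```python
def aggregate_parameters(device_params, data_sizes):
    """Parameter-server style merge: W_{i+1} = sum_j lambda(D_j) * W_i^j,
    where lambda(D_j) = D_j / sum_k D_k weights each device's parameters
    by its share of the round's training data."""
    total = sum(data_sizes)
    merged = [0.0] * len(device_params[0])
    for params, d in zip(device_params, data_sizes):
        lam = d / total
        for k, p in enumerate(params):
            merged[k] += lam * p
    return merged

# Two devices; the second trained on 3x more data, so it dominates the merge.
print(aggregate_parameters([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # → [2.5, 3.5]
```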
Fig. 5 is a schematic diagram of dynamically modifying the task allocation amount in the training mode according to an embodiment of the present application. As shown in fig. 5, in each round of training, each device completes training in approximately the same time, so that the parameters in the parameter server can be updated uniformly. This overcomes the drawback that, when the training data amount of each batch is fixed, the efficiency of multi-device parameter updating depends on the slowest device, and it also avoids the message-delay problem that arises when devices update asynchronously. After each round of computation finishes, the scheduling module automatically and dynamically corrects the allocated task amounts according to the time differences of the devices in the previous round, so that the computation times of the devices stay close to one another.
That is, the above dynamic adjustment is realized by the operation of step 403. Specifically, determining the allocation information of the (i+1)th round of training data according to the ith-round training time of each computing power device, dividing the (i+1)th round of training data according to the allocation information, and respectively allocating the divided (i+1)th round of training data to the corresponding computing power devices includes the following.

The ith-round training times {T_1^i, T_2^i, T_3^i, ..., T_n^i} of the computing power devices are obtained, wherein T_n^i represents the ith-round training time of the nth computing power device. The longest training time among the ith-round training times, T_max^i = max{T_1^i, ..., T_n^i}, is obtained, and the computing power devices to be configured are determined as those whose time difference between their ith-round training time and the longest training time T_max^i is greater than or equal to a preset time, wherein the preset time comprises the training time of the computing power device to be configured on training sub-data of unit data volume. Specifically, for each device j of the n computing power devices, it is checked in turn whether the inequality

T_max^i − T_j^i ≥ T_j^i / D_j^i

holds. If the inequality holds, the data amount for the next round of training of device j is increased by 1 (one unit of training sub-data), that is,

D_j^{i+1} = D_j^i + 1.

The specific implementation steps include: obtaining the allocation information {D_1^i, D_2^i, ..., D_n^i} of the ith round of training data, wherein D_n^i represents the data size of the training sub-data divided to the nth computing power device in the ith round of training; updating, in the allocation information of the ith round of training data, the data amount corresponding to each computing power device to be configured, so as to obtain the allocation information {D_1^{i+1}, D_2^{i+1}, ..., D_n^{i+1}} of the (i+1)th round of training data, wherein the updating operation comprises adding training sub-data of unit data volume on the basis of the training sub-data already divided to the computing power device to be configured; and dividing the (i+1)th round of training data according to the allocation information of the (i+1)th round, and respectively allocating the divided (i+1)th round of training data to the corresponding computing power devices.
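A minimal Python sketch of this per-round correction (illustrative, not part of the patent text; the per-unit training time of device j is estimated as T_j^i / D_j^i, consistent with the definition of the preset time above):

```python
def adjust_allocation(round_times, alloc):
    """After round i, give one extra unit of training data to every device
    whose round time lags the slowest device by at least its own
    per-unit training time, i.e. T_max^i - T_j^i >= T_j^i / D_j^i."""
    t_max = max(round_times)
    new_alloc = list(alloc)
    for j, (t, d) in enumerate(zip(round_times, alloc)):
        unit_time = t / d              # estimated time per unit of data
        if t_max - t >= unit_time:     # device j finished early enough
            new_alloc[j] = d + 1       # D_j^{i+1} = D_j^i + 1
    return new_alloc

# Device 1 finished 4 time units before device 0, so it takes one more unit.
print(adjust_allocation([10.0, 6.0], [10, 6]))  # → [10, 7]
```

Raising the faster devices' quotas (rather than cutting the slowest device's quota) keeps the total throughput growing while the per-round times converge.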
According to the operation of confirming the working mode, when the current working mode is confirmed to be the inference mode, the computing power resources in the resource pool are polled, a plurality of candidate computing power devices meeting the task description information are screened out from the idle computing power devices in the resource pool, and some of the candidate computing power devices are selected as the target computing power device group, while a portion of computing power is reserved for other tasks. Whether the requirements are met (that is, whether the task description information is satisfied) is judged by whether the delay requirement is satisfied.
And after selecting the target computing power device group, the corresponding computing power scheduling operation is performed. In the inference mode, tasks may be assigned to the computing power devices based on the devices' weights. Specifically, fig. 6 is a flowchart of a computing power scheduling method in the inference mode according to an embodiment of the present application, and as shown in fig. 6, the computing power scheduling method in the inference mode includes the following steps:
step 601: and acquiring computing power weight information of each computing power device in the target computing power device group.
Step 602: and determining a task distribution sequence according to the calculation weight information of each calculation device.
Step 603: and sequentially distributing each task in the task flow to corresponding force computing equipment in the target force computing equipment group according to the task distribution sequence information.
Let the set of computing power devices selected to work in the inference mode be {P_1, P_2, P_3, ..., P_n}, wherein P_n represents the computing power of the nth computing power device. Correspondingly, the computing power weight set of the computing power devices selected to work in the inference mode is {w_1, w_2, w_3, ..., w_n}, wherein w_n represents the computing power weight of the computing power device whose computing power is P_n.
When the task flow is transmitted from the dispatching center, tasks are distributed according to the following principle:
i. Select the device with the largest weight among the current weights of the candidate devices, and assign the task to that device.
ii. Subtract the sum of all devices' weights from the weight of the selected device, and then add each device's initial weight value to its current weight.
iii. Repeat steps i and ii until all weights are 0.
iv. Reset all weights to their initial values, and continue assigning tasks from step i.
In other words, the allocation operation of the above allocation principle can be implemented by the operation of step 602. Specifically, determining the task distribution sequence according to the computing power weight information of each computing power device in step 602 includes: performing multiple rounds of computing power device selection operations, obtaining a computing power device selection sequence after the multiple rounds of selection operations, and taking this selection sequence as the task distribution sequence. Each round of computing power device selection operation includes: taking the computing power device corresponding to the highest weight in the current round's first weight information as the computing power device selected in the current round, wherein the weight information comprises the computing power weight value of each computing power device in the target computing power device group, and the first weight information in the first round of selection operations comprises the initial computing power weight value of each computing power device in the target computing power device group; obtaining a correction value by subtracting the sum of the weights of all computing power devices of the target computing power device group from the current computing power weight value of the computing power device selected in the current round, and updating the first weight information according to the correction value to obtain second weight information; and summing the current computing power weight value of each computing power device in the second weight information with the corresponding device's initial computing power weight value to obtain third weight information, which serves as the first weight information of the next round of computing power device selection.
For example, the target computing power device group described in step 601 includes three devices: computing power device A, computing power device B and computing power device C, whose initial computing power weights are 5, 2 and 1, respectively. The device selection is as shown in Table 1.
Table 1

Round | First weight info | Selected device | Second weight info | Third weight info
  1   | (5, 2, 1)         | A               | (-3, 2, 1)         | (2, 4, 2)
  2   | (2, 4, 2)         | B               | (2, -4, 2)         | (7, -2, 3)
  3   | (7, -2, 3)        | A               | (-1, -2, 3)        | (4, 0, 4)
  4   | (4, 0, 4)         | A               | (-4, 0, 4)         | (1, 2, 5)
  5   | (1, 2, 5)         | C               | (1, 2, -3)         | (6, 4, -2)
  6   | (6, 4, -2)        | A               | (-2, 4, -2)        | (3, 6, -1)
  7   | (3, 6, -1)        | B               | (3, -2, -1)        | (8, 0, 0)
  8   | (8, 0, 0)         | A               | (0, 0, 0)          | (5, 2, 1)

The second weight information subtracts the weight sum (8) from the selected device's weight; the third weight information adds each device's initial weight and serves as the next round's first weight information. After round 8 all weights reach 0 before the reset, so the cycle repeats, and over one cycle the tasks are distributed in the ratio A:B:C = 5:2:1.
In the first round of device selection, the computing power device with the highest weight in the current round's first weight information (5, 2, 1) is computing power device A, whose current computing power weight is 5. Further, the sum of the three devices' weights (5+2+1) is subtracted from the current weight of device A, that is, 5 − 8 = −3, and this difference gives the correction value (−3) of the current round. The current round's first weight information (5, 2, 1) is updated with the correction value (−3); the updating operation may include replacing the selected device A's weight value in the first weight information with the correction value, i.e., updating the "5" in (5, 2, 1) to "−3", giving the current round's second weight information (−3, 2, 1). Further, the current weight value of each computing power device in the second weight information (−3, 2, 1) is summed with the corresponding device's initial weight value to obtain the third weight information. Specifically, the sums are (−3)+5 = 2, 2+2 = 4 and 1+1 = 2, so the third weight information of the current round is (2, 4, 2), which serves as the first weight information for the next round of device selection, and so on.
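The selection rule above is a smooth weighted round-robin. A Python sketch of the procedure (illustrative naming, not part of the patent text), run for one full cycle with initial weights 5, 2 and 1:

```python
def task_order(init_weights, rounds):
    """Smooth weighted round-robin: each round, pick the device with the
    highest current weight, subtract the total weight sum from it, then
    add every device's initial weight back to its current weight."""
    current = list(init_weights)
    total = sum(init_weights)
    order = []
    for _ in range(rounds):
        j = max(range(len(current)), key=lambda k: current[k])
        order.append(j)
        current[j] -= total                                   # correction value
        current = [c + w for c, w in zip(current, init_weights)]
    return order

# Devices A, B, C with initial weights 5, 2, 1 over one full cycle of 8 tasks:
print([("A", "B", "C")[j] for j in task_order([5, 2, 1], 8)])
# → ['A', 'B', 'A', 'A', 'C', 'A', 'B', 'A']
```

Over the 8-task cycle, A receives 5 tasks, B receives 2 and C receives 1, matching the initial weights, while consecutive tasks are spread across devices rather than batched onto the heaviest one.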
In one implementation, under highly concurrent tasks, in order to avoid multiple tasks being assigned to the highest-weight device at the same time, a weight may be selected at random for the first assignment, after which assignment continues by weight. In other words, before performing the multiple rounds of computing power device selection operations, the method further includes: configuring a target selection rule as the selection rule of the first round of computing power device selection; wherein the selection rule of the first round of computing power device selection includes: taking the computing power device corresponding to the highest weight in the first round's first weight information as the device selected in the first round; and/or randomly selecting a target weight from the first round's first weight information and taking the computing power device corresponding to the target weight as the device selected in the first round.
Fig. 7 is a schematic structural diagram of a computational power scheduling apparatus according to yet another embodiment of the present application, and as shown in fig. 7, the computational power scheduling apparatus includes a processor 701 and a memory 702, where the memory 702 is used to store at least one instruction, and the instruction is loaded and executed by the processor 701 to implement the computational power scheduling method according to the embodiment shown in fig. 4 and/or the computational power scheduling method according to the embodiment shown in fig. 6.
Fig. 8 is a distributed system provided in another embodiment of the present application, and as shown in fig. 8, the distributed system may include a plurality of computing power devices, a user interface, a gateway, a computing power device matching module, and a computing power scheduling apparatus (e.g., the computing power scheduling module shown in fig. 8) provided in the embodiment shown in fig. 7.
The user interface (user API) is configured to receive user requirement information sent by a user end and to determine the task description information accordingly, wherein the task description information includes one or more of the following: task type, delay requirement, precision requirement, memory occupation and working mode. In one implementation, the task description information includes all of these items. The user passes the requirements into the computing power network of the distributed system through the given user API, and the network is then deployed automatically.
In one implementation, the distributed system may further include an AI algorithm library (SOTA library), which may include algorithm libraries for a plurality of deep learning fields, such as natural language processing, computer vision, recommendation systems and knowledge graphs, each algorithm having associated network parameters and models, so that a user with such a requirement can supply custom training data for training. Each model in the AI algorithm library also has corresponding description information, which mainly includes: the model type, the inputs and outputs of the model, the operation mode (accuracy priority / calculation speed priority), the average accuracy, the calculation time per unit input amount, and the like. After the user passes the requirements to the computing power network, the algorithm selection can automatically select a candidate model group according to the model type, and then an optimal algorithm is selected for deployment according to the user's specific requirements, such as delay, precision requirement and data size.
The gateway (e.g., the algorithm selection shown in the figure) may be a computing power network gateway, configured to obtain the task description information provided by the user interface and to match, in the AI algorithm library, a target algorithm model satisfying the task description information. After the computing power devices are selected, the gateway can also load the trained model parameters onto each computing power device. In particular, OpenVINO or LibTorch is used to transform the model into an easily deployed model description file.

OpenVINO is a toolkit developed by Intel for its existing hardware platforms; it accelerates the development of high-performance computer vision and deep learning vision applications, supports deep learning on the hardware accelerators of various Intel platforms, and allows direct heterogeneous execution. It is supported on Windows and Linux and in Python/C++. The official C++ API provided by PyTorch is named LibTorch; Windows has been supported since PyTorch 1.0, and a PyTorch model can be deployed directly using LibTorch.
And the computing power equipment matching module is used for matching a target computing power equipment group meeting the task description information in the plurality of computing power equipment.
In one implementation, the distributed network further includes a resource discovery and management module, configured to perform an active-device check on the network via SNMP to obtain all active devices, then obtain basic information of each device through SNMP, determine the device type from the basic information, obtain detailed information of the device according to its type, summarize the detailed information into a descriptor, and send the descriptor to the heterogeneous computing power. When the scheduling module requests the use of computing resources, it performs addressing and invocation accordingly. A network administrator can easily manage the devices in a network that support the SNMP protocol. The uniform interface provided by the SNMP protocol masks the differences between devices, so that neither the device type nor the manufacturer needs to be considered, thereby realizing automatic network management.
The computing power devices form the resource pool; real heterogeneous resource devices are entities of various resources, such as computers, mobile phones, routers and intelligent terminals, and may also be NAS (network attached storage), private clouds and the like. In one implementation, the computing power devices in the resource pool may provide heterogeneous computing power; specifically, the heterogeneous computing power stores information on the various computing powers in the network, including the computing power at double, single and half precision, the delay for data to reach each computing power device, the memory of each computing power device, and the MAC address. Regarding double, single and half precision: floating point numbers are among the most common data types on computers; double and single precision are standard, while half precision exists mainly to reduce data transmission and storage costs. Double precision has 64 bits, single precision 32 bits, and half precision 16 bits. In distributed training, using half precision saves half the transmission cost relative to single precision.
In one implementation, the distributed system may further include a network scheduling module. When the network scale is huge, requiring each router to acquire whole-network information and independently compute a path for every application service makes the maintenance workload of the whole network unacceptable. Therefore, to make operation of the computing power network feasible, the computing power network is managed uniformly: information synchronization and path computation are centralized, the service routing table entries are computed and then delivered to the routers, and each router is only responsible for forwarding the service messages of the data layer.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the computational power scheduling method provided in the embodiment shown in fig. 4 and/or the computational power scheduling method provided in the embodiment shown in fig. 6.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a Processor (Processor) to execute some steps of the methods according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A method for computing power scheduling, the method comprising:
if the current working mode is the inference mode, executing the following operations:
acquiring computing power weight information of each computing power device in a target computing power device group;
determining a task distribution sequence according to the calculation weight information of each calculation device; and
and sequentially distributing each task in the task flow to corresponding force computing equipment in the target force computing equipment group according to the task distribution sequence information.
2. The method of claim 1, wherein determining the task assignment order according to the computational power weight information of the computational power devices comprises:
executing multiple rounds of calculation force equipment selection operations, obtaining a calculation force equipment selection sequence after the multiple rounds of calculation force equipment selection operations, and taking the calculation force equipment selection sequence as the task distribution sequence;
wherein each round of computing force device selection operation comprises:
taking the computing force equipment corresponding to the highest weight in the first weight information of the current round as the computing force equipment selected in the current round, wherein the weight information comprises the computing force weight value of each computing force equipment in the target computing force equipment group, and the first weight information in the first round of computing force equipment selection operation comprises the initial computing force weight value of each computing force equipment in the target computing force equipment group;
obtaining a correction value after the current computing power weight value of the computing power equipment selected in the current round is different from the sum of the weights of all the computing power equipment of the target computing power equipment group, and updating the first weight information according to the correction value to obtain second weight information; and
and respectively summing the current force weight value of each force calculation device in the second weight information with the initial force weight value of the corresponding force calculation device to obtain third weight information, and using the third weight information as the first weight information of the next force calculation device selection operation.
3. The method of claim 2, further comprising, prior to performing the multi-turn force device selection operation:
configuring a target selection rule according to a selection rule of a first calculation force equipment selection operation;
wherein the selection rule of the first round of computing force device selection operation comprises:
the force calculation equipment corresponding to the highest weight in the first weight information of the current round is used as the force calculation equipment selected in the current round; and/or
And randomly selecting a target weight in the first round of first weight information, and taking the force calculation equipment corresponding to the target weight as the force calculation equipment selected by the first round.
4. The method of claim 1, wherein prior to obtaining computing power weight information for each computing power device in the target computing power device group, further comprising:
acquiring task description information, and determining a current working mode according to the task description information, wherein the working mode comprises a training mode and an inference mode.
5. The method of claim 4, wherein after determining the current working mode according to the task description information, if the current working mode is a training mode, performing the following operations:
acquiring the computing power ratio information of each computing power device in the target computing power device group;
dividing first-round training data according to the computing power ratio information of each computing power device, and respectively distributing the divided first-round training data to corresponding computing power devices;
determining distribution information of (i +1) th round training data according to the ith round training time of each computing force device, dividing the (i +1) th round training data according to the distribution information, and distributing the divided (i +1) th round training data to the corresponding computing force devices respectively, wherein i is an integer greater than or equal to 1.
6. The method of claim 5, wherein the dividing the first-round training data according to the calculated power ratio information of each calculated power device and respectively allocating the divided first-round training data to the corresponding calculated power devices comprises:
according to the computing power ratio information {P_1, P_2, P_3, ..., P_n} of each computing power device, dividing the first-round training data D in equal proportion into a plurality of pieces of training sub-data

D_n^1 = P_n / (P_1 + P_2 + ... + P_N) · D,

wherein P_n represents the computing power of the nth computing power device, D_n^1 represents the data size of the training sub-data divided to the nth computing power device in the first round of training, the value range of n is {1, 2, 3, ..., N}, and N represents the number of computing power devices in the target computing power device group;

respectively assigning each piece of training sub-data D_n^1 among the plurality of pieces of training sub-data to the computing power device with the corresponding computing power P_n.
7. The method according to claim 6, wherein the determining allocation information of the (i +1) th round of training data according to the i-th round of training time of each power computing device, dividing the (i +1) th round of training data according to the allocation information, and allocating the divided (i +1) th round of training data to the respective power computing devices comprises:
obtaining the ith round of training time { T ] of each computing power device 1 i ,T 2 i ,T 3 i ,...,T n i In which T is n i Representing the ith round of training time of the nth computing power device;
obtaining the longest training time in the ith round of training time of each computing device
Figure FDA0002996331770000023
And determining the ith round of training time and the longest training time of each computing device
Figure FDA0002996331770000024
The time difference is more than or equal to the preset time, wherein the preset time comprises the training time of the calculation force equipment to be configured on the training subdata with unit data volume;
obtaining distribution information of ith round of training data
Figure FDA0002996331770000025
Wherein the content of the first and second substances,
Figure FDA0002996331770000026
representing the data size of the training subdata divided to the nth computing equipment in the ith round of training;
distribution information to the ith round of training data
Figure FDA0002996331770000027
The data volume corresponding to the computing power equipment to be configured is updated to obtain the distribution information of the training data of the (i +1) th round
Figure FDA0002996331770000028
Wherein the updating operation comprises calculating a force setting at the to-be-configuredAdding training subdata with unit data volume on the basis of the divided training subdata;
dividing the (i+1)-th round training data according to the allocation information {D_1^(i+1), D_2^(i+1), ..., D_n^(i+1)}, and allocating the divided (i+1)-th round training data to the respective computing power devices.
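The update step of claim 7 can be sketched as a single round of rebalancing: devices that finish well before the slowest device receive one more unit of data in the next round. This is an illustrative sketch under assumptions (names, list layout, and per-device unit times are not specified by the patent):

```python
def rebalance_allocation(alloc, round_times, unit_times):
    """One (i+1)-th round update in the spirit of claim 7.

    alloc: data volumes {D_1^i, ..., D_n^i} of the current round.
    round_times: training times {T_1^i, ..., T_n^i} of the current round.
    unit_times: per-device training time for one unit of data
                (the "preset time" of the claim).
    """
    t_max = max(round_times)  # longest training time T_max^i
    next_alloc = list(alloc)
    for n, (t, u) in enumerate(zip(round_times, unit_times)):
        # A device is "to be configured" when its gap to the slowest
        # device is at least its own per-unit training time.
        if t_max - t >= u:
            next_alloc[n] += 1  # add training subdata of unit data volume
    return next_alloc

# Device 0 finished 4 s before the slowest device and needs 1 s per unit,
# so it is the only device whose allocation grows for the next round.
print(rebalance_allocation([30, 40, 30], [6.0, 10.0, 9.5], [1.0, 1.0, 1.0]))
# → [31, 40, 30]
```

Iterating this per round nudges the allocation toward equal finishing times without ever recomputing the split from scratch.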
8. A computing power scheduling apparatus, comprising:
a processor and a memory for storing at least one instruction, wherein the at least one instruction is loaded and executed by the processor to implement the computing power scheduling method according to any one of claims 1-7.
9. A distributed system, the system comprising:
a plurality of computing devices for providing computing services;
a user interface for acquiring user requirement information of a user and determining task description information according to the user requirement information, wherein the task description information comprises one or more of the following: task type, delay requirement, precision requirement, memory occupation and working mode;
a gateway for acquiring the task description information provided by the user interface and matching a target algorithm model satisfying the task description information;
a computing power device matching module for matching, among the plurality of computing power devices, a target computing power device group satisfying the task description information; and
the computing power scheduling apparatus of claim 8.
10. The system of claim 9, wherein matching, among the plurality of computing power devices, a target computing power device group satisfying the task description information comprises:
screening out a plurality of candidate computing power devices which meet the task description information from the idle computing power devices of the plurality of computing power devices; and
selecting a portion of the candidate computing power devices among the plurality of candidate computing power devices as the target computing power device group.
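The two-stage matching of claim 10 (screen idle devices against the task description information, then select a subset as the target group) can be sketched as follows. All field names and the highest-power selection policy are assumptions for illustration; the patent does not specify the screening criteria or the selection rule:

```python
def match_target_group(devices, task, group_size):
    """Screen idle candidate devices that satisfy the task description
    information, then select a portion of them as the target group."""
    # Stage 1: screen candidates among the idle computing power devices.
    candidates = [d for d in devices
                  if d["idle"] and d["memory_gb"] >= task["memory_gb"]]
    # Stage 2: one plausible policy - prefer the highest computing power.
    candidates.sort(key=lambda d: d["power"], reverse=True)
    return candidates[:group_size]

devices = [
    {"id": 0, "idle": True,  "memory_gb": 8,  "power": 5},
    {"id": 1, "idle": False, "memory_gb": 16, "power": 9},  # busy: excluded
    {"id": 2, "idle": True,  "memory_gb": 16, "power": 7},
    {"id": 3, "idle": True,  "memory_gb": 4,  "power": 6},  # too little memory
]
group = match_target_group(devices, {"memory_gb": 8}, group_size=2)
print([d["id"] for d in group])  # → [2, 0]
```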
11. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the computing power scheduling method according to any one of claims 1-7.
CN202110333330.2A 2021-03-29 2021-03-29 Computing power scheduling method, device, system and storage medium Pending CN115129463A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110333330.2A CN115129463A (en) 2021-03-29 2021-03-29 Computing power scheduling method, device, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110333330.2A CN115129463A (en) 2021-03-29 2021-03-29 Computing power scheduling method, device, system and storage medium

Publications (1)

Publication Number Publication Date
CN115129463A true CN115129463A (en) 2022-09-30

Family

ID=83375575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110333330.2A Pending CN115129463A (en) 2021-03-29 2021-03-29 Computing power scheduling method, device, system and storage medium

Country Status (1)

Country Link
CN (1) CN115129463A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116402318A (en) * 2023-06-07 2023-07-07 北京智芯微电子科技有限公司 Multi-stage computing power resource distribution method and device for power distribution network and network architecture
CN116402318B (en) * 2023-06-07 2023-12-01 北京智芯微电子科技有限公司 Multi-stage computing power resource distribution method and device for power distribution network and network architecture
CN116886404A (en) * 2023-08-04 2023-10-13 中国电子信息产业集团有限公司第六研究所 Satellite internet key management system and method

Similar Documents

Publication Publication Date Title
US11429449B2 (en) Method for fast scheduling for balanced resource allocation in distributed and collaborative container platform environment
US10460241B2 (en) Server and cloud computing resource optimization method thereof for cloud big data computing architecture
Dam et al. Genetic algorithm and gravitational emulation based hybrid load balancing strategy in cloud computing
Téllez et al. A tabu search method for load balancing in fog computing
Javadpour Improving resources management in network virtualization by utilizing a software-based network
CN110609742B (en) Method and device for configuring queues of Kubernetes scheduler
Mechalikh et al. PureEdgeSim: A simulation framework for performance evaluation of cloud, edge and mist computing environments
US11816509B2 (en) Workload placement for virtual GPU enabled systems
CN110166507B (en) Multi-resource scheduling method and device
CN115129463A (en) Computing power scheduling method, device, system and storage medium
WO2023124947A1 (en) Task processing method and apparatus, and related device
CN105491150A (en) Load balance processing method based on time sequence and system
CN111597043A (en) Method, device and system for calculating edge of whole scene
US20220318071A1 (en) Load balancing method and related device
CN115134371A (en) Scheduling method, system, equipment and medium containing edge network computing resources
JP7330602B2 (en) Intelligent load balancer
CN111131486A (en) Load adjustment method and device of execution node, server and storage medium
Chen et al. Latency minimization for mobile edge computing networks
CN112379985A (en) Computing task allocation method and device in cloud edge computing environment
CN110167031B (en) Resource allocation method, equipment and storage medium for centralized base station
CN110198267A (en) A kind of traffic scheduling method, system and server
Xu et al. A meta reinforcement learning-based virtual machine placement algorithm in mobile edge computing
CN102929693B (en) Performance evaluation method and device for servers of whole equipment cabinet
WO2023066035A1 (en) Resource allocation method and resource allocation apparatus
CN114780228B (en) Hybrid cloud resource creation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination