CN116048773B - Distributed collaborative task assignment method and system based on wave function collapse

Info

Publication number
CN116048773B
Authority
CN
China
Prior art keywords
node
task
resource
value
nodes
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211309545.1A
Other languages
Chinese (zh)
Other versions
CN116048773A (en)
Inventor
张彤
白洋
徐锋
贺婧媛
王海鑫
郝创博
隋悦
郭小星
姚帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jinghang Computing Communication Research Institute
Original Assignee
Beijing Jinghang Computing Communication Research Institute
Application filed by Beijing Jinghang Computing Communication Research Institute
Priority to CN202211309545.1A
Publication of CN116048773A
Application granted
Publication of CN116048773B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083: Techniques for rebalancing the load in a distributed system
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a distributed collaborative task assignment method and a system based on wave function collapse, wherein the method comprises the following steps: performing resource evaluation on each node in the cluster to obtain a storage resource value, a calculation resource value and a communication response frequency value of each node; determining the number of nodes required by the task to be executed according to the number of idle nodes in the current cluster and the type of the task to be executed; and acquiring a storage resource predicted value, a calculation resource predicted value and a communication resource predicted value of the task to be executed, and determining an execution node of the task to be executed based on a wave function collapse algorithm according to the number of nodes required by the task to be executed, the resource predicted value of the task to be executed, the storage resource value, the calculation resource value and the communication response frequency value of each node.

Description

Distributed collaborative task assignment method and system based on wave function collapse
Technical Field
The invention relates to the technical field of task assignment methods, in particular to a distributed collaborative task assignment method and system based on wave function collapse.
Background
Building distributed big data or cloud computing systems over public networks has become the mainstream direction for real-time Internet applications, and it can improve data analysis and design operations. However, every distributed AI system faces the problem of localizing tasks in time and space.
The essence of this problem is that, because AI datasets are bulky and AI model training is long-running, no single computer node can perform the complete operation. Both the dataset and the AI model can be split and recombined to some degree, but it is difficult to combine computer nodes in a way that is both efficient and sustainable. On the one hand, the more nodes take part in providing the dataset and training the AI model, the greater the additional network loss and resource redundancy. On the other hand, the more fixed a working combination is, the less effort it requires, but the less likely it is to deliver good working efficiency and global performance.
A typical solution for combining and distributing tasks in conventional distributed systems is load balancing, which is commonly applied to request-forwarding service tasks in large websites. Common methods include random selection, round-robin polling and hash-based selection; enhancements such as connection liveness, local area network region weighting and response-time weighting were added because random selection alone cannot guarantee sustainable service capability. These methods have been widely used in cloud servers and data clusters. However, on the one hand, the tasks handled are simple atomic tasks, each of which can be completed by a single server in the distributed system; such load balancing only needs a central scheduling server, and the subsequent tasks involve no multi-machine cooperation. On the other hand, the load is not truly balanced: although later algorithms take the network into account, the performance and real-time efficiency of a work machine cannot be adjusted once it has been assigned.
Disclosure of Invention
In view of the above analysis, the embodiment of the invention aims to provide a distributed collaborative task assignment method and system based on wave function collapse, which are used for solving the problems that the existing distributed cluster has redundancy or low balancing efficiency and cannot be dynamically adjusted according to tasks.
In one aspect, an embodiment of the present invention provides a method for assigning distributed collaborative tasks based on wave function collapse, including the following steps:
Performing resource evaluation on each node in the cluster to obtain a storage resource value, a calculation resource value and a communication response frequency value of each node;
determining the number of nodes required by the task to be executed according to the number of idle nodes in the current cluster and the type of the task to be executed;
and acquiring a storage resource predicted value, a calculation resource predicted value and a communication resource predicted value of the task to be executed, and determining an execution node of the task to be executed based on a wave function collapse algorithm according to the number of nodes required by the task to be executed, the resource predicted value of the task to be executed, the storage resource value, the calculation resource value and the communication response frequency value of each node.
Based on the further improvement of the technical scheme, determining the number of nodes needed by the task to be executed according to the number of idle nodes in the current cluster and the type of the task to be executed comprises the following steps:
Judging, according to the task description of the task to be executed, whether a task of the same type has been executed before; if so, determining the number of nodes required by the current task to be executed according to the number of execution nodes of the most recent task of the same type;
otherwise, setting the number of nodes required by the current task to be executed to half of the number of currently idle nodes.
Further, the number of nodes required by the current task to be executed is determined from the number of execution nodes of the most recent task of the same type as follows:
If the most recent task of the same type lacked resources, the number of nodes required by the current task to be executed is count = ε + ε/2; if the most recent task of the same type had resource redundancy, the number is count = ε - ε/2; otherwise, the number is count = ε, where ε represents the number of execution nodes of the most recent task of the same type;
If count is larger than the number of idle nodes in the current cluster, count is set to the number of idle nodes in the current cluster.
Further, the following method is adopted to judge whether resource deficiency or resource redundancy exists or not:
during task execution, performing resource evaluation multiple times on each node executing the task, and making the following judgment according to the evaluation results:
if, among all the nodes executing the task, there is a resource type whose occupancy rate exceeds a third threshold, judging that the task lacks resources; if the occupancy rates of all resource types on all the nodes executing the task are smaller than the third threshold, judging that the task has resource redundancy.
Further, performing resource evaluation on each node in the cluster to obtain a storage resource value, a calculation resource value and a communication response frequency value of each node, including:
performing storage resource and computing resource evaluation on each node in the cluster by adopting an evaluation model to obtain a storage resource evaluation value and a computing resource evaluation value of each node;
According to the formula [formula missing], respectively calculating a storage resource value and a calculation resource value of each node; wherein $S_i$ represents the storage resource value of node $i$, $S_{c\_i}$ represents the storage resource evaluation value of node $i$, [symbol missing] represents the average storage resource evaluation value of all nodes, $C_i$ represents the calculation resource value of node $i$, $C_{c\_i}$ represents the computing resource evaluation value of node $i$, [symbol missing] represents the average computing resource evaluation value of all nodes, and $N$ represents the total number of nodes in the cluster;
According to the formula [formula missing], calculating a communication response frequency value $R_i$ of each node, where $M$ represents the number of nodes that can communicate with node $i$, and $time_{i,k}$ represents the communication response time between node $i$ and node $k$.
Further, an evaluation model is adopted to test, for each node, the number of read/write operations per second $\mathrm{IOPS}_i$, the storage capacity $ST_i$ and the number of floating point operations per second $\mathrm{FLOPS}_i$, and the storage resource evaluation value of each node is calculated according to $S_{c\_i} = \mathrm{IOPS}_i \times ST_i$; the number of floating point operations per second $\mathrm{FLOPS}_i$ of each node is used as the computing resource evaluation value of the node.
Further, performing resource evaluation on each node in the cluster to obtain a storage resource value, a calculation resource value and a communication response frequency value of each node, including:
performing storage resource and computing resource evaluation on each node in the cluster by adopting an evaluation model to obtain a storage resource evaluation value and a computing resource evaluation value of each node;
According to the formula [formula missing], calculating a storage resource value $S_i$ of each node; wherein $S_{i,j}$ represents the storage resource value occupied by node $i$ during its $j$-th task execution, $\theta_i$ represents the number of times node $i$ has executed tasks, $S_{i,last}$ represents the storage resource value occupied by node $i$ during its most recent task execution, $S_{c\_i}$ represents the storage resource evaluation value of node $i$, [symbol missing] represents the average storage resource evaluation value of all nodes, and $N$ represents the total number of nodes in the cluster;
According to the formula [formula missing], calculating a calculation resource value $C_i$ of each node; wherein $C_{i,j}$ represents the calculation resource value occupied by node $i$ during its $j$-th task execution, $C_{i,last}$ represents the calculation resource value occupied by node $i$ during its most recent task execution, $C_{c\_i}$ represents the computing resource evaluation value of node $i$, and [symbol missing] represents the average computing resource evaluation value of all nodes;
According to the formula [formula missing], calculating the communication response frequency evaluation value of each node, and according to the formulas $R_i = R_{c\_i} + RL_i + RH_i$ and [formula missing], calculating a communication response frequency value $R_i$ of each node; wherein $R_{i,j}$ represents the communication response frequency value of node $i$ during its $j$-th task execution, $R_{i,last}$ represents the communication response frequency value of node $i$ during its most recent task execution, $M$ represents the number of nodes that can communicate with node $i$, and $time_{i,k}$ represents the communication response time between node $i$ and node $k$.
Further, the following steps are adopted to screen the execution nodes for executing the tasks:
S31, randomly selecting a node from the idle node set and adding it to the execution node set;
S32, sorting the storage resource, the computing resource and the communication resource in descending order of the corresponding predicted values of the task to be executed, and taking the first-ranked resource as the current resource dimension;
S33, if the number of nodes in the current execution node set equals the number of nodes required by the task to be executed, ending the screening; otherwise, if the number of random selections under the current resource dimension has reached a second threshold, taking the next resource dimension as the current resource dimension and returning to step S33; otherwise, randomly selecting a node from the idle node set, adding it to the execution node set if its capacity in the current resource dimension is comparable to that of the node most recently added to the execution node set, and returning to step S33.
Further, if the ratio of the resource value of the currently selected node in the current resource dimension to the resource value, in the same dimension, of the node most recently added to the execution node set is within a first threshold range, the currently selected node is considered comparable in capacity, in the current resource dimension, to the node most recently added to the execution node set.
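The screening in steps S31 to S33 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `select_nodes`, the dictionary layout, the default threshold values, the resetting of the attempt counter when the resource dimension changes, the removal of a selected node from the candidate pool, and the decision to stop once every resource dimension has been tried are all hypothetical choices that the text does not specify.

```python
import random


def select_nodes(idle, task_pred, count, ratio_range=(0.8, 1.25),
                 max_tries=20, rng=None):
    """Pick up to `count` execution nodes from `idle`.

    idle:        {node_id: {"storage": v, "compute": v, "comm": v}} (positive values)
    task_pred:   {"storage": v, "compute": v, "comm": v} predicted task demand
    ratio_range: the "first threshold range" (values here are assumed)
    max_tries:   the "second threshold" (value here is assumed)
    """
    rng = rng or random.Random()
    pool = list(idle)

    # S31: start from one randomly chosen idle node.
    first = rng.choice(pool)
    pool.remove(first)
    chosen = [first]

    # S32: order resource dimensions by predicted demand, largest first.
    dims = sorted(task_pred, key=task_pred.get, reverse=True)
    dim_idx, tries = 0, 0

    # S33: keep drawing until enough nodes are found.
    # Assumption: stop early if the pool or the dimensions run out.
    while len(chosen) < count and pool and dim_idx < len(dims):
        if tries >= max_tries:
            dim_idx += 1          # move to the next resource dimension
            tries = 0             # assumption: counter restarts per dimension
            continue
        candidate = rng.choice(pool)
        tries += 1
        dim = dims[dim_idx]
        ratio = idle[candidate][dim] / idle[chosen[-1]][dim]
        if ratio_range[0] <= ratio <= ratio_range[1]:
            chosen.append(candidate)   # comparable to the most recently added node
            pool.remove(candidate)
    return chosen
```

Calling `select_nodes(idle, task_pred, count)` returns a list of node identifiers; under these assumptions the list may be shorter than `count` when no comparable node is found in any dimension.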
In another aspect, an embodiment of the present invention provides a distributed collaborative task assignment system based on wave function collapse, including the following modules:
The node resource evaluation module is used for evaluating the resources of each node in the cluster to obtain a storage resource value, a calculation resource value and a communication response frequency value of each node;
the task node number determining module is used for determining the number of nodes required by the task to be executed according to the number of idle nodes in the current cluster and the type of the task to be executed;
the task assignment module is used for acquiring a storage resource predicted value, a calculation resource predicted value and a communication resource predicted value of the task to be executed, and determining an execution node of the task to be executed based on a wave function collapse algorithm according to the number of nodes required by the task to be executed, the resource predicted value of the task to be executed, the storage resource value of each node, the calculation resource value and the communication response frequency value.
Compared with the prior art, the method and the device have the advantages that the nodes meeting the task resource requirements are selected from the clusters by adopting the wave function collapse algorithm according to the resource evaluation value of the task to be executed and the resources of the nodes in the clusters, so that the execution nodes are dynamically adjusted according to the actual tasks, the occurrence of cluster resource deficiency or redundant virtual consumption is avoided, and the overall working efficiency and performance are improved.
In the invention, the technical schemes can be mutually combined to realize more preferable combination schemes. Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
FIG. 1 is a flow chart of a distributed collaborative task assignment method based on wave function collapse in accordance with an embodiment of the present invention;
FIG. 2 is a block diagram of a distributed collaborative task assignment system based on wave function collapse in accordance with an embodiment of the invention.
Detailed Description
The following detailed description of preferred embodiments of the application is made in connection with the accompanying drawings, which form a part hereof, and together with the description of the embodiments of the application, are used to explain the principles of the application and are not intended to limit the scope of the application.
The invention discloses a distributed collaborative task assignment method based on wave function collapse, which is shown in fig. 1 and comprises the following steps:
S1, carrying out resource evaluation on each node in a cluster to obtain a storage resource value, a calculation resource value and a communication response frequency value of each node;
S2, determining the number of nodes required by the task to be executed according to the number of idle nodes in the current cluster and the type of the task to be executed;
S3, acquiring a storage resource predicted value, a calculation resource predicted value and a communication resource predicted value of the task to be executed, and determining an execution node of the task to be executed based on a wave function collapse algorithm according to the number of nodes required by the task to be executed, the resource predicted value of the task to be executed, the storage resource value, the calculation resource value and the communication response frequency value of each node.
According to the resource evaluation value of the task to be executed and the resources of the nodes in the cluster, a wave function collapse algorithm is adopted to select the nodes meeting the task resource requirements from the cluster, so that the execution nodes are dynamically adjusted according to the actual task, the occurrence of cluster resource deficiency or redundant virtual consumption is avoided, and the overall working efficiency and performance are improved.
Specifically, in step S1, performing resource evaluation on each node in the cluster to obtain a storage resource value, a calculation resource value and a communication response frequency value of each node, including:
S11, performing storage resource and computing resource evaluation on each node in the cluster by adopting an evaluation model to obtain a storage resource evaluation value and a computing resource evaluation value of each node;
In implementation, the evaluation model may be an existing computer evaluation model, for example, an evaluation model of computer evaluation software such as a robustst.
Specifically, the evaluation model is adopted to test, for each node, the number of read/write operations per second $\mathrm{IOPS}_i$, the storage capacity $ST_i$ and the number of floating point operations per second $\mathrm{FLOPS}_i$, and the storage resource evaluation value of each node is calculated according to $S_{c\_i} = \mathrm{IOPS}_i \times ST_i$; the number of floating point operations per second $\mathrm{FLOPS}_i$ of each node is used as the computing resource evaluation value of the node.
IOPS (Input/Output Operations Per Second), i.e. the number of read/write operations per second, is one of the main indicators of disk performance. It is the number of I/O requests, typically read or write operations, that a system can handle per unit time, usually expressed in I/O requests per second. The storage resource evaluation value of each node is calculated as the product of the node's read/write operations per second and its storage capacity, so as to measure the storage resources the node owns.
FLOPS (floating point operations per second) is the number of floating point operations executed per second. It can be understood as computing speed and is an index of hardware computing performance; it is used as the computing resource evaluation value of each node.
The evaluation value of each node is a result obtained by performing evaluation when it is idle.
S12, according to the formula [formula missing], respectively calculating a storage resource value and a calculation resource value of each node; wherein $S_i$ represents the storage resource value of node $i$, $S_{c\_i}$ represents the storage resource evaluation value of node $i$, [symbol missing] represents the average storage resource evaluation value of all nodes, $C_i$ represents the calculation resource value of node $i$, $C_{c\_i}$ represents the computing resource evaluation value of node $i$, [symbol missing] represents the average computing resource evaluation value of all nodes, and $N$ represents the total number of nodes in the cluster;
In order to improve the expandability of the method, after the storage resource evaluation value and the computing resource evaluation value of each node are calculated, they are converted into dimensionless proportions to obtain the storage resource value and the calculation resource value of each node. Specifically, the storage resource value and the calculation resource value of each node are calculated separately according to the formula [formula missing].
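The evaluation and normalization steps can be illustrated with a short sketch. The product $S_{c\_i} = \mathrm{IOPS}_i \times ST_i$ comes from the text; the normalization formula itself did not survive in this text, so dividing each evaluation value by the cluster mean is only an assumption that matches the description "dimensionless proportions" and the listed symbols (per-node value, average over all $N$ nodes). The function names are hypothetical.

```python
def storage_eval(iops, capacity):
    """Storage resource evaluation value: S_c_i = IOPS_i * ST_i (from the text)."""
    return iops * capacity


def normalize(evals):
    """ASSUMED normalization: each evaluation value divided by the cluster mean.

    The source formula is missing; this ratio-to-mean form is a guess that
    yields a dimensionless proportion from the symbols the text defines.
    """
    mean = sum(evals) / len(evals)   # average over all N nodes
    return [value / mean for value in evals]


# Three hypothetical nodes: (IOPS, capacity) pairs in arbitrary units.
storage_evals = [storage_eval(100, 1), storage_eval(100, 2), storage_eval(100, 3)]
storage_values = normalize(storage_evals)   # [0.5, 1.0, 1.5]
```

Under this assumption a value above 1 marks a node with more than the cluster-average resource, and the same `normalize` call applies unchanged to the FLOPS-based computing evaluation values.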
S13, according to the formula [formula missing], calculating a communication response frequency value $R_i$ of each node, where $M$ represents the number of nodes that can communicate with node $i$, and $time_{i,k}$ represents the communication response time between node $i$ and node $k$.
Specifically, the communication response frequency reflects the network communication performance of a node, and the communication response frequency of each node can be calculated from the network communication response times between nodes. In implementation, the network communication response time between node i and node k can be obtained, using any network communication protocol, from a single real-time test or from the mean of multiple tests.
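As a rough illustration of how a response frequency could be derived from response times, the sketch below takes the reciprocal of the mean response time over the M reachable peers. The formula for $R_i$ is missing from this text, so this particular form is purely an assumption, as are the function names; only the inputs (M peers, one response time per peer, optional averaging over several tests) come from the description.

```python
def mean_response_time(samples):
    """Average several measured response times (seconds) for one peer."""
    return sum(samples) / len(samples)


def response_frequency(times):
    """ASSUMED form: R_i = M / sum_k(time_i_k), i.e. 1 / mean response time.

    `times` holds one response time per reachable peer, so M = len(times).
    The source formula is not available; this is an illustrative stand-in.
    """
    m = len(times)
    return m / sum(times)


# Node i reaches two peers; the second peer was measured twice.
times_i = [0.5, mean_response_time([0.25, 0.75])]
r_i = response_frequency(times_i)   # 2 / (0.5 + 0.5) = 2.0
```

With this stand-in, a node that answers its peers faster gets a larger value, which is the property the description relies on.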
Short-term test values and nominal resource values are error-prone: hardware aging over time reduces actual values, temporary network or voltage faults change test values, and such changes can occur in every resource dimension. Therefore, in order to combine resources accurately and provide more suitable execution nodes for the task to be executed, the long-term running weights of the nodes are taken into account when calculating the resource data of each node.
In a specific embodiment, in step S1, performing resource evaluation on each node in the cluster to obtain a storage resource value, a calculation resource value and a communication response frequency value of each node, including:
S14, performing storage resource and calculation resource evaluation on each node in the cluster by adopting an evaluation model to obtain a storage resource evaluation value and a calculation resource evaluation value of each node;
The implementation process of step S14 can refer to step S11, and will not be described herein.
S15, according to the formula [formula missing], calculating a storage resource value $S_i$ of each node; wherein $S_{i,j}$ represents the storage resource value occupied by node $i$ during its $j$-th task execution, $\theta_i$ represents the number of times node $i$ has executed tasks, $S_{i,last}$ represents the storage resource value occupied by node $i$ during its most recent task execution, $S_{c\_i}$ represents the storage resource evaluation value of node $i$, [symbol missing] represents the average storage resource evaluation value of all nodes, and $N$ represents the total number of nodes in the cluster;
S16, according to the formula [formula missing], calculating a calculation resource value $C_i$ of each node; wherein $C_{i,j}$ represents the calculation resource value occupied by node $i$ during its $j$-th task execution, $C_{i,last}$ represents the calculation resource value occupied by node $i$ during its most recent task execution, $C_{c\_i}$ represents the computing resource evaluation value of node $i$, and [symbol missing] represents the average computing resource evaluation value of all nodes;
S17, according to the formula [formula missing], calculating the communication response frequency evaluation value of each node, and according to the formulas $R_i = R_{c\_i} + RL_i + RH_i$ and [formula missing], calculating a communication response frequency value $R_i$ of each node; wherein $R_{i,j}$ represents the communication response frequency value of node $i$ during its $j$-th task execution, $R_{i,last}$ represents the communication response frequency value of node $i$ during its most recent task execution, $M$ represents the number of nodes that can communicate with node $i$, and $time_{i,k}$ represents the communication response time between node $i$ and node $k$.
By taking into account the resource values that a node actually used when executing historical tasks, both the long-term and the short-term resource usage of the node can be evaluated. Including this long-term and short-term usage information in the calculation of the resource values further improves the accuracy of the resource calculation and provides a data basis for subsequently determining the execution nodes of the task to be executed from the resource values.
It should be noted that $S_{i,j}$ may be the maximum storage resource value occupied by node $i$ during its $j$-th task execution, $C_{i,j}$ may be the maximum calculation resource value occupied by node $i$ during its $j$-th task execution, and $R_{i,j}$ may be the maximum communication response frequency value of node $i$ during its $j$-th task execution.
After calculating the storage, calculation and communication resource value of each node in the cluster, for the task to be executed, determining the number of nodes required by the task to be executed according to the number of idle nodes in the current cluster and the type of the task to be executed in step S2 specifically includes:
Judging, according to the task description of the task to be executed, whether a task of the same type has been executed before; if so, determining the number of nodes required by the current task to be executed according to the number of execution nodes of the most recent task of the same type;
otherwise, setting the number of nodes required by the current task to be executed to half of the number of currently idle nodes.
Specifically, whether a task of the same type has been executed is judged from the task description, either by keyword matching or by the task type listed by the developer in the task description. For example, if the number of keywords that the task to be executed shares with a certain historical task exceeds 90% of that historical task's keywords, the current task to be executed and the historical task are regarded as tasks of the same type. Alternatively, if the task type listed by the developer for the task to be executed is the same as that of a certain historical task, the two are regarded as tasks of the same type.
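The keyword-matching rule in the example above can be sketched as a set comparison. The function name `same_type`, the use of plain string sets for keywords, and the handling of an empty history are hypothetical; the 90% figure and the choice of the historical task's keyword count as the denominator follow the example in the text.

```python
def same_type(current_keywords, history_keywords, threshold=0.9):
    """Return True if the shared keywords exceed `threshold` of the
    historical task's keywords (strictly more than 90% by default)."""
    history = set(history_keywords)
    if not history:
        return False              # assumption: nothing to compare against
    shared = len(history & set(current_keywords))
    return shared / len(history) > threshold


history = ["train", "image", "resnet", "gpu", "batch",
           "augment", "sgd", "epoch", "label", "eval"]
print(same_type(history, history))            # all 10 keywords shared
print(same_type(history[:8], history))        # only 8 of 10 shared
```

Keyword extraction itself (tokenization, stop words, synonyms) is outside what the text describes and is left out of the sketch.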
If the cluster has not executed a task of the same type, the number of nodes required by the task to be executed is set to half of the number of idle nodes; if the number of idle nodes is 1, the number of nodes required by the task to be executed is 1.
If the cluster has executed a task of the same type, the number of nodes required by the current task to be executed can be determined from the number of execution nodes of the most recent task of the same type.
For most tasks requiring cluster computation, an experienced system architect can estimate a basic workload proportion, but cluster redundancy and collaboration efficiency are hard to estimate; that is, the number of computers that should take part in the collaboration cannot be set accurately. Too many nodes cause waste, and too few degrade overall efficiency and working time. In order to provide a suitable number of nodes for the task to be executed, so that resources are neither wasted nor blocked, the number of execution nodes of the current task to be executed is determined iteratively from the number of execution nodes of the most recent task of the same type, drawing on the Lagrange mean value theorem. The steps are as follows:
If the most recent task of the same type lacked resources, the number of nodes required by the current task to be executed is count = ε + ε/2; if the most recent task of the same type had resource redundancy, the number is count = ε - ε/2; otherwise, the number is count = ε, where ε represents the number of execution nodes of the most recent task of the same type;
If count is larger than the number of idle nodes in the current cluster, count is set to the number of idle nodes in the current cluster. If count is less than 1, count is set to 1.
That is, if the most recent task of the same type lacked resources, the number of nodes is increased relative to that task's number of execution nodes; if it had resource redundancy, the number of nodes is decreased; and if it had neither, meaning the number of nodes was appropriate, that number is kept. The number of nodes for the task to be executed is thus determined from the resource usage of the most recent task of the same type, and continuous iterative adjustment gradually approaches the optimal resource combination for the task, so that the task runs without wasting resources and without lacking them, improving resource utilization efficiency and cluster performance.
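The node-count rule can be written as a small function. This is a sketch under assumptions: the name `required_nodes`, the status strings, and the use of integer (floor) division for ε/2 and for "half of the idle nodes" are hypothetical, since the text does not say how fractional results are rounded. It also assumes at least one idle node exists.

```python
def required_nodes(idle_count, last_count=None, last_status="ok"):
    """Number of nodes for the task to be executed.

    idle_count:  idle nodes currently in the cluster (assumed >= 1)
    last_count:  epsilon, nodes used by the most recent same-type task,
                 or None if no task of the same type has been executed
    last_status: "lacking", "redundant" or "ok" for that earlier task
    """
    if last_count is None:
        count = idle_count // 2                  # no history: half of idle nodes
    elif last_status == "lacking":
        count = last_count + last_count // 2     # count = eps + eps/2
    elif last_status == "redundant":
        count = last_count - last_count // 2     # count = eps - eps/2
    else:
        count = last_count                       # count = eps
    count = min(count, idle_count)               # cap at the idle node count
    return max(count, 1)                         # never fewer than one node


print(required_nodes(10))                  # 5
print(required_nodes(10, 4, "lacking"))    # 6
print(required_nodes(10, 4, "redundant"))  # 2
print(required_nodes(5, 8, "lacking"))     # 5 (capped by idle nodes)
```

Repeated calls across successive runs of the same task type give the iterative adjustment described above: the count grows by half after a shortage and shrinks by half after redundancy.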
Specifically, the following method is used to judge whether there is a lack of resources or resource redundancy:
During task execution, resource evaluation is performed multiple times on each node executing the task, and the following judgment is made from the evaluation results: if there is a resource type whose occupancy rate exceeds a third threshold on all nodes executing the task, the task is judged to lack resources; if the occupancy rates of all resource types on all nodes executing the task are smaller than the third threshold, the task is judged to have resource redundancy.
In implementation, the method in step S1 may be used to evaluate each node multiple times during task execution, yielding multiple storage resource evaluation values, calculation resource evaluation values and communication response frequency evaluation values for the node. The storage resource occupancy rate of the node while executing the task is: (storage resource evaluation value when idle - storage resource evaluation value during task execution) / storage resource evaluation value when idle; the computing resource occupancy rate of the node while executing the task is: (calculation resource evaluation value when idle - calculation resource evaluation value during task execution) / calculation resource evaluation value when idle; the communication resource occupancy rate of the node while executing the task is: (communication response frequency evaluation value when idle - communication response frequency evaluation value during task execution) / communication response frequency evaluation value when idle. Averaging the multiple storage resource occupancy rates gives the average storage resource occupancy rate of the node while executing the task; averaging the multiple computing resource occupancy rates gives the average computing resource occupancy rate; and averaging the multiple communication resource occupancy rates gives the average communication resource occupancy rate.
If, in any one of the storage, computing or communication resource dimensions, the occupancy rate of all nodes executing the task exceeds the third threshold, the cooperating nodes lack resources in that dimension, and the task is therefore judged to lack resources during execution.
If the occupancy rates of the storage resources, the computing resources and the communication resources of all nodes executing the task do not exceed the third threshold, all cooperating nodes have redundancy in every resource dimension, and the task is therefore judged to have resource redundancy during execution.
If there is neither a lack of resources nor resource redundancy when the task is executed, the number of allocated nodes is considered appropriate.
In practice, the third threshold may be chosen according to the required judgment accuracy; for example, the third threshold may be 98%.
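The occupancy calculation and the three-way judgment above can be sketched as follows. This is an illustrative reading under stated assumptions: the names `occupancy` and `classify` and the dimension keys `"storage"`, `"compute"` and `"comm"` are hypothetical, and the judgment follows the interpretation that a lack exists when some resource type exceeds the threshold on every node, while redundancy exists when every resource type is below the threshold on every node.

```python
from statistics import mean

DIMS = ("storage", "compute", "comm")  # hypothetical dimension names


def occupancy(idle_value, busy_values):
    """Average occupancy rate: mean of (idle - busy) / idle over repeated evaluations."""
    return mean((idle_value - b) / idle_value for b in busy_values)


def classify(node_occupancies, threshold=0.98):
    """Judge resource usage from per-node average occupancy rates.

    node_occupancies -- list with one dict per executing node, mapping each
                        dimension in DIMS to an average occupancy rate
    threshold        -- the third threshold (98% is the example in the text)
    """
    # Lack: some resource type exceeds the threshold on all nodes.
    if any(all(n[d] > threshold for n in node_occupancies) for d in DIMS):
        return "lack"
    # Redundancy: every resource type is below the threshold on all nodes.
    if all(n[d] < threshold for n in node_occupancies for d in DIMS):
        return "redundant"
    # Otherwise the allocated node count is considered appropriate.
    return "ok"
```

Under this reading, a cluster in which only some nodes exceed the threshold in a dimension is judged neither lacking nor redundant, so the node count is kept.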
In this way, the number of nodes required by the task to be executed can be determined quickly and accurately from the resource usage of tasks of the same type during their execution.
Before the execution nodes of the task to be executed are determined based on the wave function collapse algorithm from the number of nodes required by the task and the storage resource value, calculation resource value and communication response frequency value of each node, the resource predicted values of the task to be executed, namely the storage resource predicted value, the calculation resource predicted value and the communication resource predicted value, need to be acquired.
Specifically, the storage resource predicted value, the calculation resource predicted value and the communication resource predicted value of the task to be executed are ratios obtained by dividing the amounts of storage, computing and communication resources that a system architect estimates the task will use by the storage resource reference value, the calculation resource reference value and the communication resource reference value, respectively.
Alternatively, they may be obtained by dividing the average amounts of storage, computing and communication resources used by multiple historical tasks of the same type during execution by the storage resource reference value, the calculation resource reference value and the communication resource reference value, respectively.
In a specific embodiment, the following steps are adopted to obtain a storage resource predicted value, a calculation resource predicted value and a communication resource predicted value of a task to be executed:
If the cluster has executed tasks of the same type, an average storage resource value, an average calculation resource value and an average communication response frequency value used by tasks of that type are obtained from the storage resource occupation value, the calculation resource occupation value and the communication response frequency occupation value of each execution node during the execution of each such task; the average storage resource value, the average calculation resource value and the average communication resource value are then divided by the storage resource reference value, the calculation resource reference value and the communication resource reference value, respectively, to obtain the storage resource predicted value, the calculation resource predicted value and the communication resource predicted value of the task to be executed;
Otherwise, the storage resource predicted value, the calculation resource predicted value and the communication response frequency predicted value of the task to be executed, as evaluated by the developer, are acquired.
In implementation, the storage resource reference value, the calculation resource reference value and the communication resource reference value may be empirically set, where the reference values are set to convert each resource value into a dimensionless value, so as to screen an optimal set in which a resource allocation proportion accords with a resource demand proportion of a task to be executed.
In implementation, whether a task is of the same type may be determined by the same method as in step S2.
If the cluster has executed tasks of the same type, multiple groups of storage resource occupation values, calculation resource occupation values and communication response frequency occupation values of each execution node during the execution of each such task are obtained, and the groups are averaged to obtain the storage resource occupation value, the calculation resource occupation value and the communication response frequency occupation value of each execution node.
And summing the storage resource occupation value, the calculation resource occupation value and the communication response frequency occupation value of all the execution nodes of each similar task to obtain the storage resource value, the calculation resource value and the communication response frequency value of each similar task.
And averaging the storage resource values, the calculation resource values and the communication response frequency values of the similar tasks, and obtaining the average storage resource values, the average calculation resource values and the average communication response frequency values of the similar tasks.
Dividing the average storage resource value, the average calculation resource value and the average communication resource value by the storage resource reference value, the calculation resource reference value and the communication resource reference value respectively to obtain a storage resource predicted value, a calculation resource predicted value and a communication resource predicted value of the task to be executed.
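The history-based estimate described above (sum over execution nodes, average over tasks of the same type, divide by the reference values) can be sketched as follows. The function name `predicted_values` and the tuple layout (storage, calculation, communication) are assumptions made for illustration; the per-node occupation values are taken to be already averaged over the repeated measurements.

```python
def predicted_values(history, reference):
    """Estimate dimensionless predicted values from tasks of the same type.

    history   -- list of past tasks; each task is a list of per-node tuples
                 (storage, calculation, communication) occupation values
    reference -- tuple of (storage, calculation, communication) reference values
    """
    # Sum the occupation values of all execution nodes of each task.
    totals = [
        tuple(sum(node[d] for node in task) for d in range(3))
        for task in history
    ]
    # Average the per-task totals over all tasks of the same type.
    averages = [sum(t[d] for t in totals) / len(totals) for d in range(3)]
    # Divide by the reference values to obtain dimensionless predicted values.
    return tuple(a / r for a, r in zip(averages, reference))
```

For instance, two past tasks with per-task totals (40, 6, 8) and (60, 14, 8) average to (50, 10, 8); with reference values (10, 5, 4) the predicted values are (5.0, 2.0, 2.0).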
Specifically, the following steps are adopted to screen the execution nodes for executing the tasks:
S31, randomly screening a node from the idle node set, and adding the node into the executing node set;
initially, the executing node set is empty, and one node is randomly screened from the idle node set to be added into the executing node set.
S32, sorting the storage resources, the computing resources and the communication resources in descending order of the storage resource predicted value, the calculation resource predicted value and the communication resource predicted value of the task to be executed, and taking the first-ranked resource as the current resource dimension;
In order to ensure that the screened nodes have consistent capability in the core dimension, the storage, computing and communication resources are first sorted in descending order of the corresponding predicted values of the task to be executed. For example, if the predicted values of the task to be executed are 100 for storage resources, 10 for computing resources and 30 for communication resources, the three resources are ranked as storage resources, communication resources, computing resources, and the first-ranked storage resources are taken as the current resource dimension.
Starting from the second node, each screened node is evaluated for whether it is comparable, in the current resource dimension, to the node most recently added to the executing node set; if so, it is added to the executing node set, which keeps the executing node set consistent in that resource dimension.
S33, if the number of nodes in the current executing node set equals the number of nodes required by the task to be executed, the screening ends; otherwise, if the number of random screenings in the current resource dimension has reached a second threshold, the next resource dimension is taken as the current resource dimension and the process returns to step S33; otherwise, a node is randomly screened from the idle node set and, if it is comparable in the current resource dimension to the node most recently added to the executing node set, it is added to the executing node set, and the process returns to step S33.
The executing node set comprises at least one node.
Before screening the second node, firstly judging whether the number of the nodes of the current executing node set meets the number of the nodes of the task to be executed, and if the number of the nodes of the executing node set is equal to the number of the nodes required by the task to be executed, ending the screening.
In order to improve screening efficiency, a limit on the number of random screenings, namely the second threshold, is set for each resource dimension; if the number of screenings in the current resource dimension reaches the second threshold, the next resource dimension is taken as the current resource dimension. In practice, the second threshold may be set to the number of nodes required by the task to be executed.
And for the currently screened nodes, if the capacity of the currently screened nodes is equivalent to that of the nodes which are newly added into the executing node set in the current resource dimension, adding the currently screened nodes into the executing node set. Otherwise, discarding the node which is currently screened out, and randomly screening the next node again.
Specifically, whether the current screening node is equivalent to the node which is newly added into the executing node set in the current resource dimension or not is judged by the following method:
If the ratio of the resource value of the current screening node in the current resource dimension to the resource value, in the same dimension, of the node most recently added to the executing node set is within a first threshold range, the current screening node is comparable in capability, in the current resource dimension, to the node most recently added to the executing node set.
In practice, the first threshold range may be set according to the required consistency of node capability; for example, it may be set to [0.9, 10]. For example, if the current resource dimension is storage resources, the storage resource value of the node most recently added to the executing node set is 10, and the storage resource value of the current screening node is 11, then the ratio in the storage resource dimension is 1.1, which is within the first threshold range, so the current screening node can be added to the executing node set.
Following the idea of the wave function collapse algorithm, screening by local capability comparability ensures that each subsequent node has a resource amount at least roughly equal to, and possibly higher than, that of the previously added node, thereby generating an executing node set that meets the resource requirements.
In implementation, it may happen that, after the number of screenings in all three dimensions has reached the second threshold, the number of nodes in the executing node set is still smaller than the number of nodes required by the task to be executed. In this case, the executing node set may be emptied and the screening restarted from step S31 until an executing node set satisfying the node number requirement is found.
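Steps S31 to S33, together with the restart described above, can be sketched as follows. This is a hedged illustration rather than the patented implementation: the function name `screen_nodes`, the `max_restarts` cap and the dictionary layout are hypothetical; skipping a candidate that is already in the set is an assumption the text does not state; and all resource values are assumed to be positive so that the ratio is defined.

```python
import random


def screen_nodes(idle, count, predicted, ratio_range=(0.9, 10.0),
                 rng=random, max_restarts=100):
    """Screen `count` executing nodes from the idle node set.

    idle        -- dict: node id -> dict of resource dimension -> positive value
    count       -- number of nodes required by the task to be executed
    predicted   -- dict: resource dimension -> predicted value of the task
    ratio_range -- the first threshold range ([0.9, 10] is the example in the text)
    Returns a list of node ids, or None if no set is found within max_restarts
    (max_restarts is a hypothetical safeguard, not part of the source).
    """
    # S32: sort the dimensions by predicted value, largest first.
    dims = sorted(predicted, key=predicted.get, reverse=True)
    ids = list(idle)
    limit = count  # second threshold, set to the required node count
    low, high = ratio_range
    for _ in range(max_restarts):
        chosen = [rng.choice(ids)]  # S31: random first node
        for dim in dims:
            tries = 0
            # S33: random screening within the current resource dimension.
            while len(chosen) < count and tries < limit:
                tries += 1
                cand = rng.choice(ids)
                if cand in chosen:  # assumption: a node is used at most once
                    continue
                ratio = idle[cand][dim] / idle[chosen[-1]][dim]
                if low <= ratio <= high:
                    chosen.append(cand)
            if len(chosen) == count:
                return chosen
        # All dimensions exhausted: empty the set and restart from S31.
    return None
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which can be convenient when comparing several candidate sets.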
The screening may be repeated to obtain multiple executing node sets, and the best set is selected from them as the final executing node set of the task to be executed.
Specifically, the following method may be used to select and determine the best set from a plurality of execution node sets:
For any executing node set, the sum of the storage resource values, the sum of the calculation resource values and the sum of the communication response frequency values of all nodes in the set are calculated respectively.
The storage resource dimension, the computing resource dimension and the communication resource dimension are sorted according to the storage resource predicted value, the calculation resource predicted value and the communication resource predicted value of the task to be executed.
For each executing node set, the successive division result of the three resource totals is calculated in the sorted order. For example, if the sorted order of the three dimensions is computing resource dimension, communication resource dimension, storage resource dimension, then the value calculation resource total / communication resource total / storage resource total is calculated for each executing node set.
The successive division result of the three resource predicted values of the task to be executed is calculated in the same order, namely calculation resource predicted value / communication resource predicted value / storage resource predicted value.
The executing node set whose successive division result is closest to that of the task to be executed is selected as the executing node set of the task to be executed.
That is, the executing node set whose successive division result over the sorted resource dimensions is closest to the successive division result of the task to be executed over the same dimensions is the optimal node set.
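The selection of the best set by successive division can be sketched as follows. The names `successive_ratio` and `best_set` are hypothetical, and sorting the dimensions in descending order of predicted value is an assumption carried over from step S32, since this passage does not state the sort direction.

```python
def successive_ratio(values, order):
    """Divide the first value by each following value in the given order."""
    result = values[order[0]]
    for dim in order[1:]:
        result = result / values[dim]
    return result


def best_set(candidate_sets, nodes, predicted):
    """Pick the executing node set whose successive division result is closest
    to that of the task to be executed.

    candidate_sets -- list of lists of node ids
    nodes          -- dict: node id -> dict of resource dimension -> value
    predicted      -- dict: resource dimension -> predicted value of the task
    """
    # Assumption: dimensions are sorted from largest to smallest predicted value.
    order = sorted(predicted, key=predicted.get, reverse=True)
    target = successive_ratio(predicted, order)

    def distance(node_ids):
        # Total resource value of the set in each dimension.
        totals = {d: sum(nodes[n][d] for n in node_ids) for d in order}
        return abs(successive_ratio(totals, order) - target)

    return min(candidate_sets, key=distance)
```

With predicted values of 100 (storage), 30 (communication) and 10 (calculation), the target is 100 / 30 / 10, so a set whose totals are in the same proportion scores a distance near zero and is selected.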
In the prior art, load balancing suffers from a large amount of redundancy and from limits on task scale, which drag down the operating efficiency of the whole system or block the system when a large task arrives, and it cannot cope with dynamic changes in the cluster system. Here, when information is unclear during operation, basic measurements are provided through estimation and idle-run measurement, and the task estimates and the boundary of the cooperative group are then corrected continuously through ongoing cooperative operation, so that higher operating efficiency is obtained.
In one embodiment of the present invention, a distributed collaborative task assignment system based on wave function collapse is disclosed, as shown in fig. 2, comprising the following modules:
The node resource evaluation module is used for evaluating the resources of each node in the cluster to obtain a storage resource value, a calculation resource value and a communication response frequency value of each node;
the task node number determining module is used for determining the number of nodes required by the task to be executed according to the number of idle nodes in the current cluster and the type of the task to be executed;
the task assignment module is used for acquiring a storage resource predicted value, a calculation resource predicted value and a communication resource predicted value of the task to be executed, and determining an execution node of the task to be executed based on a wave function collapse algorithm according to the number of nodes required by the task to be executed, the resource predicted value of the task to be executed, the storage resource value of each node, the calculation resource value and the communication response frequency value.
The method embodiment and the system embodiment are based on the same principle, and the related parts can be mutually referred to and can achieve the same technical effect. The specific implementation process refers to the foregoing embodiment, and will not be described herein.
Those skilled in the art will appreciate that all or part of the flow of the methods of the embodiments described above may be accomplished by way of a computer program to instruct associated hardware, where the program may be stored on a computer readable storage medium. Wherein the computer readable storage medium is a magnetic disk, an optical disk, a read-only memory or a random access memory, etc.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention.

Claims (8)

1. The distributed collaborative task assignment method based on wave function collapse is characterized by comprising the following steps of:
performing resource evaluation on each node in the cluster to obtain a storage resource value, a calculation resource value and a communication response frequency value of each node; the calculation resource value is the floating point operation times of each second of the node;
determining the number of nodes required by the task to be executed according to the number of idle nodes in the current cluster and the type of the task to be executed;
Acquiring a storage resource predicted value, a calculation resource predicted value and a communication resource predicted value of a task to be executed, and determining an execution node of the task to be executed based on a wave function collapse algorithm according to the number of nodes required by the task to be executed, the resource predicted value of the task to be executed, the storage resource value, the calculation resource value and the communication response frequency value of each node;
Determining the number of nodes needed by the task to be executed according to the number of idle nodes in the current cluster and the type of the task to be executed, including:
Judging whether similar tasks are executed according to task description of the tasks to be executed, if yes, determining the number of nodes required by the tasks to be executed currently according to the number of executing nodes of the tasks of the same type last time;
otherwise, the number of the nodes needed by the current task to be executed is half of the number of the current idle nodes;
The method comprises the following steps of determining the number of nodes needed by a current task to be executed according to the number of executing nodes of the last task of the same type:
If the last task of the same type lacked resources, the number of nodes required by the current task to be executed is count = epsilon + epsilon/2; if the last task of the same type had resource redundancy, the number of nodes required by the current task to be executed is count = epsilon - epsilon/2, where epsilon denotes the number of executing nodes of the last task of the same type; otherwise, the number of nodes required by the current task to be executed is count = epsilon;
If the number count of the nodes needed by the current task to be executed is larger than the number of the idle nodes of the current cluster, the number of the nodes needed by the current task to be executed is set as the number of the idle nodes of the current cluster.
2. The method for assigning distributed collaborative tasks based on wave function collapse according to claim 1, wherein the following method is employed to determine whether there is a lack of resources or redundancy of resources:
and during task execution, carrying out multiple resource evaluation on each node executing the task, and carrying out the following judgment according to an evaluation result:
if there is a resource type whose occupancy rate exceeds a third threshold on all nodes executing the task, judging that the task lacks resources; if the occupancy rates of all resource types on all nodes executing the task are smaller than the third threshold, judging that the task has resource redundancy.
3. The method for assigning distributed collaborative tasks based on wave function collapse according to claim 1, wherein performing resource evaluation on each node in a cluster to obtain a storage resource value, a calculation resource value and a communication response frequency value of each node includes:
performing storage resource and computing resource evaluation on each node in the cluster by adopting an evaluation model to obtain a storage resource evaluation value and a computing resource evaluation value of each node;
According to the formula [formula not recoverable from the source text], a storage resource value and a calculation resource value of each node are calculated respectively; wherein S_i represents the storage resource value of node i, S_c_i represents the storage resource evaluation value of node i, [symbol lost in the source text] represents the average storage resource evaluation value of all nodes, C_i represents the calculation resource value of node i, C_c_i represents the calculation resource evaluation value of node i, [symbol lost in the source text] represents the average calculation resource evaluation value of all nodes, and N represents the total number of nodes in the cluster;
According to the formula [formula not recoverable from the source text], a communication response frequency value R_i of each node is calculated, where M represents the number of nodes that can communicate with node i, and time_{i,k} represents the communication response time between node i and node k.
4. The method for assigning distributed collaborative tasks based on wave function collapse according to claim 3,
Adopting an evaluation model to test the number of read/write operations per second IOPS_i, the storage capacity ST_i and the number of floating point operations per second FLOPS_i of each node, and calculating the storage resource evaluation value of each node according to S_c_i = IOPS_i × ST_i; the number of floating point operations per second FLOPS_i of each node is used as the calculation resource evaluation value of the node.
5. The method for assigning distributed collaborative tasks based on wave function collapse according to claim 1, wherein performing resource evaluation on each node in a cluster to obtain a storage resource value, a calculation resource value and a communication response frequency value of each node includes:
performing storage resource and computing resource evaluation on each node in the cluster by adopting an evaluation model to obtain a storage resource evaluation value and a computing resource evaluation value of each node;
According to the formula [formula not recoverable from the source text], the storage resource value S_i of each node is calculated; wherein S_{i,j} represents the storage resource value occupied by the j-th task executed by node i, θ_i represents the number of times node i has executed tasks, S_{i,last} represents the storage resource value occupied by the last task executed by node i, S_c_i represents the storage resource evaluation value of node i, [symbol lost in the source text] represents the average storage resource evaluation value of all nodes, and N represents the total number of nodes in the cluster;
According to the formula [formula not recoverable from the source text], the calculation resource value C_i of each node is calculated; wherein C_{i,j} represents the calculation resource value occupied by the j-th task executed by node i, C_{i,last} represents the calculation resource value occupied by the last task executed by node i, C_c_i represents the calculation resource evaluation value of node i, and [symbol lost in the source text] represents the average calculation resource evaluation value of all nodes;
According to the formula [formula not recoverable from the source text], the communication response frequency evaluation value of each node is calculated; according to the formula R_i = R_c_i + RL_i + RH_i, together with further formulas not recoverable from the source text, the communication response frequency value R_i of each node is calculated; wherein R_{i,j} represents the communication response frequency value when node i executes the j-th task, R_{i,last} represents the communication response frequency value when node i last executed a task, M represents the number of nodes that can communicate with node i, and time_{i,k} represents the communication response time between node i and node k.
6. The method for assigning distributed collaborative tasks based on wave function collapse according to claim 1 wherein the following steps are employed to screen the executing nodes of the task to be executed:
s31, randomly screening a node from the idle node set, and adding the node into the executing node set;
S32, sorting the storage resources, the computing resources and the communication resources according to the sequence of the storage resources, the computing resources and the communication resource predicted values of the tasks to be executed from large to small, and taking the first-ranked resources as the current resource dimension;
S33, if the number of the nodes of the current execution node set is equal to the number of the nodes required by the task to be executed, finishing screening; otherwise, if the random screening times under the current resource dimension reaches the second threshold, taking the next resource dimension as the current resource dimension, and returning to the step S33; otherwise, randomly screening a node from the idle node set, if the current screening node is equivalent to the node newly added into the executing node set in the current resource dimension, adding the current screening node into the executing node set, and returning to the step S33.
7. The method for assigning distributed collaborative tasks based on wave function collapse according to claim 6, wherein if the ratio of the resource value of the current screening node in the current resource dimension to the resource value, in the current resource dimension, of the node most recently added to the executing node set is within a first threshold range, the current screening node is comparable in capability, in the current resource dimension, to the node most recently added to the executing node set.
8. A distributed collaborative task assignment system based on wave function collapse, comprising the following modules:
the node resource evaluation module is used for evaluating the resources of each node in the cluster to obtain a storage resource value, a calculation resource value and a communication response frequency value of each node; the calculation resource value is the floating point operation times of each second of the node;
the task node number determining module is used for determining the number of nodes required by the task to be executed according to the number of idle nodes in the current cluster and the type of the task to be executed;
The task assignment module is used for acquiring a storage resource predicted value, a calculation resource predicted value and a communication resource predicted value of a task to be executed, and determining an execution node of the task to be executed based on a wave function collapse algorithm according to the number of nodes required by the task to be executed, the resource predicted value of the task to be executed, the storage resource value of each node, the calculation resource value and the communication response frequency value;
Determining the number of nodes needed by the task to be executed according to the number of idle nodes in the current cluster and the type of the task to be executed, including:
Judging whether similar tasks are executed according to task description of the tasks to be executed, if yes, determining the number of nodes required by the tasks to be executed currently according to the number of executing nodes of the tasks of the same type last time;
otherwise, the number of the nodes needed by the current task to be executed is half of the number of the current idle nodes;
The method comprises the following steps of determining the number of nodes needed by a current task to be executed according to the number of executing nodes of the last task of the same type:
If the last task of the same type lacked resources, the number of nodes required by the current task to be executed is count = epsilon + epsilon/2; if the last task of the same type had resource redundancy, the number of nodes required by the current task to be executed is count = epsilon - epsilon/2, where epsilon denotes the number of executing nodes of the last task of the same type; otherwise, the number of nodes required by the current task to be executed is count = epsilon;
If the number count of the nodes needed by the current task to be executed is larger than the number of the idle nodes of the current cluster, the number of the nodes needed by the current task to be executed is set as the number of the idle nodes of the current cluster.
CN202211309545.1A 2022-10-25 2022-10-25 Distributed collaborative task assignment method and system based on wave function collapse Active CN116048773B (en)

Publications (2)

Publication Number Publication Date
CN116048773A CN116048773A (en) 2023-05-02
CN116048773B true CN116048773B (en) 2024-05-28

Family

ID=86117000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211309545.1A Active CN116048773B (en) 2022-10-25 2022-10-25 Distributed collaborative task assignment method and system based on wave function collapse

Country Status (1)

Country Link
CN (1) CN116048773B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117112242B (en) * 2023-10-24 2024-01-26 纬创软件(武汉)有限公司 Resource node allocation method and system in cloud computing system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107291539A (en) * 2017-06-19 2017-10-24 山东师范大学 Cluster program scheduler method based on resource significance level
CN112241321A (en) * 2020-09-24 2021-01-19 北京影谱科技股份有限公司 Computing power scheduling method and device based on Kubernetes
CN112328399A (en) * 2020-11-17 2021-02-05 中国平安财产保险股份有限公司 Cluster resource scheduling method and device, computer equipment and storage medium
CN114880748A (en) * 2022-05-27 2022-08-09 北京邮电大学 Layout design method and related device thereof

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN108268318A (en) * 2016-12-30 2018-07-10 华为技术有限公司 A kind of method and apparatus of distributed system task distribution

Non-Patent Citations (1)

Title
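Multi-UAV task assignment based on an improved quantum particle swarm algorithm; Deng Ke et al.; Command Control & Simulation; 2018-10-15; Vol. 40 (No. 05); pp. 32-36 *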

Also Published As

Publication number Publication date
CN116048773A (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN115562870B (en) Task node resource construction method of cluster
US9749208B2 (en) Integrated global resource allocation and load balancing
US7610425B2 (en) Approach for managing interrupt load distribution
US8463971B2 (en) Approach for distributing interrupts from high-interrupt load devices
US9201690B2 (en) Resource aware scheduling in a distributed computing environment
US7581052B1 (en) Approach for distributing multiple interrupts among multiple processors
Kumar et al. ARPS: An autonomic resource provisioning and scheduling framework for cloud platforms
US10353745B1 (en) Assessing performance of disparate computing environments
Rathore Dynamic threshold based load balancing algorithms
Patni et al. Load balancing strategies for grid computing
CN116048773B (en) Distributed collaborative task assignment method and system based on wave function collapse
Ziafat et al. A hierarchical structure for optimal resource allocation in geographically distributed clouds
CN103997515A (en) Distributed cloud computing center selection method and application thereof
Abdullah et al. A reliable, TOPSIS-based multi-criteria, and hierarchical load balancing method for computational grid
CN113127173B (en) Heterogeneous sensing cluster scheduling method and device
Selvi et al. Popularity (hit rate) based replica creation for enhancing the availability in cloud storage
Javanmardi et al. An architecture for scheduling with the capability of minimum share to heterogeneous Hadoop systems
CN115357401B (en) Task scheduling and visualization method and system based on multiple data centers
Garg et al. Optimal virtual machine scheduling in virtualized cloud environment using VIKOR method
CN116302327A (en) Resource scheduling method and related equipment
Wan et al. Utility-driven share scheduling algorithm in hadoop
El-Zoghdy et al. A threshold-based load balancing algorithm for grid computing systems
Cao et al. A novel and one-stage embedding algorithm for mapping virtual networks
Wei et al. A novel scheduling mechanism for hybrid cloud systems
CN113689103B (en) Mining and shunting intelligent scheduling management method, device and system for self-adaptive load balancing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant