CN115454649A - Dynamic task scheduling system for calculation of space control simulation model - Google Patents


Info

Publication number
CN115454649A
Authority
CN
China
Prior art keywords
node
task
scheduler
load
load migration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211221165.2A
Other languages
Chinese (zh)
Inventor
覃润楠
谢文明
惠建江
彭晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Space Science Center of CAS
Original Assignee
National Space Science Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Space Science Center of CAS
Priority to CN202211221165.2A
Publication of CN115454649A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • G06F9/5088Techniques for rebalancing the load in a distributed system involving task migration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/547Remote procedure calls [RPC]; Web services

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a dynamic task scheduling system for space manipulation simulation model calculation, comprising a scheduling center, a plurality of scheduler nodes and a plurality of executor nodes, the scheduler nodes being connected with the executor nodes through a network. The scheduling center dynamically adjusts task scheduling based on a dynamic load migration technology according to the task scheduling requirements of the space control simulation service, and distributes tasks to the registered scheduler nodes. The scheduler node produces and caches tasks, triggers the actuator center by Remote Procedure Call (RPC), and performs dynamic elastic capacity expansion of the actuator nodes for tasks that cannot be satisfied. The actuator center registers with the scheduling center by heartbeat registration to enable remote communication between the two; it also decomposes the tasks and triggers the corresponding actuator nodes. The actuator nodes execute the corresponding space control simulation model calculation tasks.

Description

Dynamic task scheduling system for calculation of space control simulation model
Technical Field
The invention relates to the field of computer system space control simulation, in particular to a dynamic task scheduling system for space control simulation model calculation.
Background
In space exploration research, many countries have studied the simulation of space control tasks such as full-space orbit maintenance, camera payload detection, debris avoidance and clearing, and on-orbit operation, in order to improve spacecraft performance and reduce spacecraft development risk and cost. Simulating a space control task requires virtualized activities such as task deduction, evaluation, control and decision-making in a simulation environment, and involves complex task processing with multidisciplinary coupled modeling across aircraft dynamics, hydrodynamics and computer vision.
Server architectures have iterated from monolithic to distributed to microservice designs. Microservices bring strong elastic scalability, but they also raise problems of cluster deployment and scheduling optimization. Many distributed task scheduling systems have emerged to optimize cluster resource scheduling. Their bottleneck is how to control server resources optimally, that is, how to assign thread/process quotas of different service classes to each computing node. The Elastic-Job task scheduling system developed by Dangdang provides task sharding and elastic scale-out and scale-in, but does not support dynamic management of timed tasks or workflow-type timed tasks; the XXL-JOB task scheduling system from Dianping provides lightweight dynamic management of task scheduling, but its service registration is slow under massive business processing; the TBSchedule task scheduling system released by Alibaba is highly extensible, but supports only a single task type; the Spring Cloud Scheduler task scheduling system has strong community support, but its framework is complex and places certain demands on hardware resources.
In actual space manipulation simulation tasks, because the computing nodes in a server cluster differ in performance, many task scheduling systems suffer from load imbalance, which reduces system processing speed and increases network delay. For example: 1) task node resources have fixed or single roles, and excessive node backups waste a large amount of resources; 2) during load migration, a large number of tasks are moved onto lightly loaded task nodes, which triggers load balancing on those nodes again, producing periodic oscillation and degrading server system performance.
Therefore, for the space manipulation simulation task, how to design a task scheduling system to achieve the purposes of optimizing resource use, minimizing system response time, maximizing model calculation efficiency and avoiding overload of a server becomes a problem to be solved urgently at present.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a dynamic task scheduling system for space manipulation simulation model calculation.
The method mainly aims at the challenge brought to the traditional service architecture by the high concurrent calling of the massive multi-source heterogeneous space manipulation simulation model in the large-scale space manipulation simulation test, and establishes the dynamic task scheduling system taking the load migration model as the core.
In order to achieve the above object, the present invention provides a dynamic task scheduling system for spatial manipulation simulation model calculation, which includes: the system comprises a dispatching center, a plurality of dispatcher nodes and a plurality of executor nodes; the dispatcher node and the executor node are respectively deployed on different server clusters, wherein,
the scheduling center is used for dynamically adjusting task scheduling based on a dynamic load migration technology according to task scheduling requirements of the space control simulation service, and allocating tasks to the registered scheduler nodes;
the scheduler node is used for producing and caching tasks, triggering the actuator center by adopting a Remote Procedure Call (RPC) technology, and performing dynamic elastic expansion on the actuator node aiming at the tasks which cannot be met;
the actuator center is used for logging in the dispatching center in a heartbeat registration mode to realize remote communication between the actuator center and the dispatching center; the system is also used for carrying out specific calculation flow and calculation task decomposition on the tasks and triggering corresponding actuator nodes to realize high-concurrency execution of task scheduling;
and the actuator node is used for executing a corresponding space control simulation model calculation task.
As an improvement of the above system, the dynamic load migration technique is to adopt a dynamic load migration algorithm to establish an optimally controlled load migration model; wherein the load migration model comprises:
the state information returned by scheduler node j consists of the task scheduling queue length $Q_i$ and the number $T_j$ of idle threads in the scheduler thread pool, which form N two-tuples $\langle Q_i, T_j \rangle$, $i, j = 1, 2, \ldots, N$; the scheduler node load balancing time [formula image BDA0003878264430000021] is proportional to the task scheduling queue length and inversely proportional to the number of threads concurrently processing the tasks, and satisfies the following formulas:
[formula image BDA0003878264430000022]
[formula image BDA0003878264430000023]
wherein [formula image BDA0003878264430000024] denotes the idleness of scheduler node j when serving the scheduling of task i, α is the transmission attenuation coefficient, a constant related to the service-side switch and controller, and $E[\Delta L_{i,j}(t)]$ is the mathematical expectation of the change in scheduler node load.
As an improvement of the above system, the establishing of the optimally controlled load migration model specifically includes:
defining the real-time load (RLP) index $L_{i,j}(t)$, which satisfies the following formula:
$L_{i,j}(t) = (a N_c + b C_c + c M_c) / (N_r + C_r + M_r)$
wherein a, b and c respectively represent the proportion coefficients of network bandwidth, CPU resources and memory resources in the overall resources of the server; subscript r denotes the overall resources of the server and subscript c denotes the resources currently in use; N denotes network bandwidth, C denotes CPU computing resources and M denotes memory size;
defining the task scheduling distribution weight (TSW) index $W_{i,j}(t)$, which satisfies the following formula:
$W_{i,j}(t) = h_{i,j}(t-1)\,\xi_i\,(L_{i,j}(t) - L_{i,j}(t-1)) / Q_i$
wherein $\xi_i$ is the task scheduling complexity and $h_{i,j}(t-1)$ is the load migration marking coefficient of the current scheduler node: if the scheduler node did not perform a load migration operation at the previous sampling moment, $h_{i,j}(t-1)$ defaults to 1; if load has been migrated, $h_{i,j}(t-1)$ is 0.5;
defining the load migration (TLT) index $U(t)$, which satisfies the following formula:
[formula image BDA0003878264430000031]
wherein $u_{i,j}(t)$ is the single load migration amount;
obtaining the scheduler node load balancing time [formula image BDA0003878264430000032] as:
[formula image BDA0003878264430000033]
while satisfying the scheduler node load balancing time [formula image BDA0003878264430000034] and reducing the rate of change of the TLT index $U(t)$ as far as possible, so as to lower the resource adjustment frequency of the server, the optimally controlled load migration amount $U^*(t)$ at the t-th sampling moment is:
$U^*(t) = U(t-1) - E_{i,j}\{\alpha / 2W_{i,j}(t) Q_i T_j\}$
wherein $U(t-1)$ is the load migration TLT index at the (t-1)-th sampling moment and $E_{i,j}$ denotes the mathematical expectation.
As an improvement of the foregoing system, the dynamic load migration algorithm specifically includes:
inputting the task scheduling queue length $Q_i(t)$ and the number of threads in the scheduler thread pool $T_j(t)$ at the current time t;
adding the scheduler nodes to a set $\Phi\{desc\}$ in descending order of their RLP indexes, calculating the load balancing threshold $u_{thr}(t)$ at the current time, and stipulating that a scheduler node responds to only one load migration request at a time and can accept another only after that request has completed;
and, following a greedy approach, traversing the optimal load migration node of each scheduler node at each sampling moment, selecting iteratively, and calculating the optimally controlled load migration amount.
As an improvement of the above system, the greedy traversal, iterative selection and calculation of the optimally controlled load migration amount specifically comprise:
calculating the TLT index, and selecting the scheduler node with the minimum idleness as the load migration object;
if several scheduler nodes have the same idleness, calculating the TSW index of each such scheduler node, selecting the scheduler node with the higher weight as the first load migration object, and calculating the single load migration amount $u_{i,j}(t)$ according to the resource state of that scheduler node; when the single load migration amount exceeds the load balancing threshold $u_{thr}(t)$, removing the scheduler node from the set and selecting a subsequent scheduler node to recalculate whether the task scheduling request can be accepted;
calculating the optimally controlled total load migration amount $U^*(t)$, marking the scheduler node, and setting its load migration marking coefficient $h_{i,j}(t)$ to 0.5.
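The selection steps above can be sketched in Python as follows. This is only an illustrative reading of the text, not the patented implementation: the `SchedulerNode` fields, the `pick_migration_target` name and the idea of passing the single migration amount and the threshold in as precomputed numbers are all assumptions, since the source does not say how $u_{i,j}(t)$ or the threshold is calculated.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality so list.remove() drops the exact node
class SchedulerNode:
    name: str
    rlp: float              # real-time load index L_ij(t)
    idleness: float         # idleness of the node
    tsw: float              # task scheduling distribution weight W_ij(t)
    proposed_amount: float  # hypothetical: single migration amount u_ij(t), computed elsewhere
    busy: bool = False      # a node serves only one migration request at a time
    mark: float = 1.0       # load migration marking coefficient h_ij


def pick_migration_target(nodes: List[SchedulerNode],
                          threshold: float) -> Optional[SchedulerNode]:
    """Greedy choice of one load migration object, or None if no node qualifies."""
    # Candidate set in descending RLP order; busy nodes cannot accept a request.
    candidates = sorted((n for n in nodes if not n.busy),
                        key=lambda n: n.rlp, reverse=True)
    while candidates:
        least_idle = min(n.idleness for n in candidates)
        tied = [n for n in candidates if n.idleness == least_idle]
        best = max(tied, key=lambda n: n.tsw)  # higher TSW wins among equal idleness
        if best.proposed_amount <= threshold:
            best.busy = True   # node is now serving this migration request
            best.mark = 0.5    # record that the node took part in a migration
            return best
        candidates.remove(best)  # over the threshold: drop it and try the next node
    return None
```

Marking the chosen node with 0.5 halves its TSW at the next sampling moment, which is how the text damps repeated migrations onto the same node.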
As an improvement of the above system, the system further comprises a global resource control module, a monitor, an elastic expansion module and a container orchestration tool, wherein,
the global resource control module is used for receiving the suspension state of the tasks in the server cluster through a heartbeat mechanism;
the monitor is used for periodically monitoring task suspension in the server cluster where the executor node is located by subscribing to the heartbeat information of the global resource control module;
the elastic expansion module is used for creating a new actuator node by the dynamic elastic expansion algorithm Auto Scale(·) according to the task suspension condition and storing the new actuator node in the capacity expansion queue expansion Q, and is also used for sending an instruction for creating the new actuator node to the container orchestration tool;
and the container orchestration tool is used for reading the new executor node information in the capacity expansion queue, creating the executor node and adding it to the container cluster.
As an improvement of the above system, the processing procedure of the elastic expansion module specifically includes:
setting the type of actuator node according to the resource information of each suspended task; if the capacity expansion queue expansion Q already contains that actuator node type, continuing to the next suspended task; otherwise, adding the actuator node to the capacity expansion queue expansion Q.
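A minimal sketch of this de-duplicating pass over suspended tasks is given below. The `PendingTask` fields and the choice to model a node type as a `(cpu, memory_gb)` tuple are assumptions made for illustration; the source only states that the node type is derived from each task's resource information.

```python
from collections import deque
from typing import Deque, Iterable, NamedTuple, Tuple

NodeType = Tuple[int, int]  # hypothetical node type: (CPU cores, memory in GB)


class PendingTask(NamedTuple):
    name: str
    cpu: int
    memory_gb: int


def auto_scale(pending: Iterable[PendingTask],
               expansion_queue: Deque[NodeType]) -> Deque[NodeType]:
    """Queue one new actuator node type per distinct resource requirement."""
    for task in pending:
        node_type = (task.cpu, task.memory_gb)
        if node_type in expansion_queue:
            continue  # this node type is already queued: move to the next suspended task
        expansion_queue.append(node_type)
    return expansion_queue
```

A container orchestration tool would then read the queue and create one node per entry, so two suspended tasks with identical requirements trigger a single new node.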
As an improvement of the above system, the processing of the scheduler node further comprises:
when the scheduler detects a newly added actuator node, the task in the suspended state is awakened again, and whether the actuator node meets the resource requirement of the task is judged again: if the executor node meets the task resource requirement, the task is dispatched to the executor node, and if the executor node still does not meet the task resource requirement, the task is converted into the suspended state again.
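The wake-and-recheck step can be illustrated with a short sketch. The dictionary-based resource model and the `on_new_executor` name are hypothetical, and the sketch only checks each task against the new node's resources, as the text describes; it does not deduct resources for tasks already dispatched.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, float]


def meets(node: Resources, demand: Resources) -> bool:
    """True when the node offers at least the demanded amount of every resource."""
    return all(node.get(key, 0) >= amount for key, amount in demand.items())


def on_new_executor(node: Resources,
                    suspended: List[Tuple[str, Resources]]) -> Tuple[List[str], List[str]]:
    """Re-evaluate suspended tasks against a newly added executor node."""
    dispatched: List[str] = []
    still_suspended: List[str] = []
    for name, demand in suspended:
        if meets(node, demand):
            dispatched.append(name)       # woken and dispatched to the new node
        else:
            still_suspended.append(name)  # returns to the suspended state
    return dispatched, still_suspended
```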
Compared with the prior art, the invention has the advantages that:
1. the method breaks through the key technologies of dynamic task scheduling, dynamic load migration, dynamic elastic capacity expansion and the like, optimizes the resource allocation of the server side to improve the task processing capacity, and provides technical support for the calculation requirement of high-concurrency tasks in a large-scale space control simulation test;
2. the invention promotes the development of a task scheduling platform and a load balancing technology based on a micro-service technology, and solves the technical problems of slow service registration capability, large expenditure of server side resources and oscillation of load migration period under high concurrent task response in the prior art;
3. according to the invention, a dynamic task scheduling system taking a load migration model as a core is established to realize optimal server resource allocation, and a dynamic load migration algorithm is designed for avoiding load balancing oscillation, so that the load migration behavior is effectively restrained, the server stability is improved, and finally the massive task processing performance is improved;
4. the invention establishes a generalization platform system suitable for large-scale space control simulation tests, supports the activities of space control simulation deduction, evaluation, control, decision and the like relating to multidisciplinary coupling modeling and complex task processing, and has high application value.
Drawings
FIG. 1 is a technical route diagram of the dynamic task scheduling system for space manipulation simulation model calculation according to the present invention;
FIG. 2 is a timing diagram of fully asynchronous distributed dynamic task scheduling;
FIG. 3 is a flow description of the dynamic load migration algorithm;
FIG. 4 is a description of the dynamic elastic capacity expansion technique;
FIG. 5 shows the hardware environment deployment;
FIG. 6 shows the real-time load status for a 40G manipulation object interaction data packet;
FIG. 7(a) shows the real-time load status for task scheduling of different magnitudes;
FIG. 7(b) shows the load balancing time consumption for task scheduling of different magnitudes.
Detailed Description
The invention relates to a dynamic task scheduling system for space manipulation simulation model calculation, and the technical route is shown in figure 1:
the system comprises: the system comprises a scheduling center, a plurality of scheduler nodes and a plurality of actuator nodes; the dispatcher node and the executor node are respectively deployed on different server clusters, wherein,
the scheduling center is used for dynamically adjusting task scheduling based on a dynamic load migration technology according to task scheduling requirements of the space control simulation service, and distributing tasks to registered scheduler nodes;
the scheduler node is used for producing and caching tasks, triggering the actuator center by adopting a Remote Procedure Call (RPC) technology, and dynamically and elastically expanding the capacity of the actuator node aiming at the tasks which cannot be met;
the actuator center is used for logging in the dispatching center in a heartbeat registration mode to realize remote communication between the actuator center and the dispatching center; the system is also used for carrying out specific calculation flow and calculation task decomposition on the tasks and triggering corresponding actuator nodes to realize high-concurrency execution of task scheduling;
and the actuator node is used for executing a corresponding space control simulation model calculation task.
The technical solution of the present invention will be described in detail below with reference to the accompanying drawings and examples.
Example 1
The embodiment of the invention provides a dynamic task scheduling system for space manipulation simulation model calculation, which mainly comprises three parts of task scheduling, remote execution and data management, and mainly relates to three key technologies in the dynamic task scheduling system: the dynamic task scheduling method comprises a full-asynchronous distributed dynamic task scheduling technology, a dynamic load migration technology based on a load migration model and a dynamic elastic capacity expansion technology.
To meet the scheduling requirements of various space control simulation tasks, such as spacecraft orbit maneuver deduction, stepping attitude adjustment, inter-satellite-to-earth communication and space operation, the dynamic task scheduling system performs highly concurrent calculation of massive simulation tasks through the fully asynchronous distributed dynamic task scheduling technology. First, the scheduling center dynamically activates each scheduler node. Each scheduler node then issues a message to the executor center by Remote Procedure Call (RPC); the executor center breaks the message down into the calculation tasks of the calculation flow and assigns an executor node to each calculation task. Finally, the calculation results are synchronized to a relational database and a non-relational database to provide fast caching, storage, retrieval and management of data. The scheduling center allocates scheduler node resources adaptively through the dynamic load migration technology based on the load migration model. If the current server cluster resources cannot meet the resource consumption required by task allocation, the server cluster is expanded vertically through the dynamic elastic capacity expansion technology, adding node resources to complete the calculation task.
(1) Full-asynchronous distributed dynamic task scheduling technology
The working sequence of the fully asynchronous distributed dynamic task scheduling technology is shown in FIG. 2. When a new external task scheduling requirement arrives, the task is assigned to the scheduling center through the dynamic load migration algorithm. The scheduling center does not take part in executing the calculation task; it reassigns the task to a registered scheduler node, which produces the task and writes it to a consumption queue for caching at the queue interaction layer. Then, according to the task allocation result of the scheduler node, the tasks in the consumption queue are handled in the order in which they were written, and the actuator center in the computing layer is triggered by Remote Procedure Call (RPC). The actuator center can be deployed as a distributed cluster within the server cluster and registers with the scheduling center by heartbeat registration to enable remote communication between the actuator center and the scheduling center. Finally, the actuator center decomposes the task into its specific calculation flow and calculation tasks and triggers the corresponding actuator nodes, achieving highly concurrent execution of task scheduling.
The dynamic task scheduling system takes a scheduling center as a control center, triggers the scheduling center to distribute jobs by quickly analyzing related calculation requirements of users, carries out actual processing work of the jobs by an actuator center, supports registration or removal of corresponding actuators to carry out dynamic elastic expansion, and realizes full-asynchronous design and distributed deployment of task scheduling.
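The producer/consumer shape of this flow can be sketched with Python's `asyncio`. This is a single-process illustration under stated assumptions: an in-memory `asyncio.Queue` stands in for the consumption queue, a direct coroutine call stands in for the RPC trigger, and the `"+"`-separated task strings and upper-casing "calculation" are invented purely for the example.

```python
import asyncio
from typing import List, Optional


async def scheduler_node(queue: "asyncio.Queue[Optional[str]]", tasks: List[str]) -> None:
    """Produce tasks and cache them in the consumption queue in write order."""
    for task in tasks:
        await queue.put(task)
    await queue.put(None)  # sentinel: no more tasks


async def executor_node(subtask: str) -> str:
    """Stand-in for one simulation model calculation."""
    await asyncio.sleep(0)  # yield control, as a real calculation or RPC would
    return subtask.upper()


async def executor_center(queue: "asyncio.Queue[Optional[str]]") -> List[List[str]]:
    """Consume tasks in FIFO order, decompose them, and run subtasks concurrently."""
    results: List[List[str]] = []
    while True:
        task = await queue.get()
        if task is None:
            break
        subtasks = task.split("+")  # hypothetical decomposition rule
        results.append(await asyncio.gather(*(executor_node(s) for s in subtasks)))
    return results


async def run_pipeline(tasks: List[str]) -> List[List[str]]:
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    _, results = await asyncio.gather(scheduler_node(queue, tasks), executor_center(queue))
    return results
```

Calling `asyncio.run(run_pipeline(["orbit+attitude", "comm"]))` runs the producer and the consumer concurrently; neither side blocks the other, which is the point of the fully asynchronous design.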
The core of the fully asynchronous distributed dynamic task scheduling technology is how to allocate tasks optimally across scheduler nodes. The invention therefore provides a dynamic task scheduling algorithm Task Schedule(·): all scheduler nodes in the server cluster are traversed to obtain the set of nodes that meet the resource requirement, and task scheduling is adjusted dynamically using the dynamic load migration technology based on the load migration model. For tasks that cannot be satisfied under this scheduling strategy, the server cluster is expanded vertically using the dynamic elastic capacity expansion technology, adding node resources to complete the scheduled tasks. The overall idea is shown in Table 1.
TABLE 1 dynamic Task scheduling Algorithm Task Schedule (·)
Figure BDA0003878264430000071
Figure BDA0003878264430000081
Specifically, the server cluster resources are initialized first, and task scheduling requests submitted by users then enter the scheduling queue $Q_i$, $i = \{1, 2, \ldots, n\}$. All scheduler nodes in the server cluster are traversed. Based on the resource requirement $Q_i$.Dem carried by the task, the local scheduler first checks whether the resources of its own node, localnode.Res, are sufficient. If so, the task is scheduled to that node. If not, the dynamic load migration algorithm Load Balance(·) is executed: all nodes in the cluster are traversed to obtain the set of nodes that meet the resource requirement, and the optimal task scheduling allocation over that set is obtained by the dynamic load balancing algorithm. If no node has resources that meet the task's requirement, the task is suspended ($Q_i$.pending), a capacity expansion operation is carried out on the server cluster, and the dynamic elastic capacity expansion algorithm Auto Scale(·) is executed. Capacity expansion is either horizontal or vertical: horizontal expansion can only copy already-configured resource nodes, whereas the vertical expansion mechanism is flexible and can re-create resource nodes according to the resource requirements of tasks. After a new node is added to the server cluster, the suspended tasks are traversed again and their resource requirements checked; when the node meets the requirement, the suspended task is returned to the ready state and rescheduled to the newly added node.
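Because Table 1 is not reproduced here, the three-way decision described in this paragraph can be sketched as follows. The function and field names are hypothetical, nodes are modeled as plain dictionaries, and `load_balance` and `auto_scale` are caller-supplied stand-ins for Load Balance(·) and Auto Scale(·), whose internals the sketch does not attempt to reproduce.

```python
from typing import Callable, Dict, List, Optional, Tuple

Resources = Dict[str, float]
Node = Dict[str, object]  # hypothetical shape: {"name": str, "res": Resources, ...}


def fits(available: Resources, demand: Resources) -> bool:
    """True when every demanded resource is available in sufficient quantity."""
    return all(available.get(key, 0) >= amount for key, amount in demand.items())


def task_schedule(demand: Resources,
                  local_node: Node,
                  cluster: List[Node],
                  load_balance: Callable[[List[Node]], Node],
                  auto_scale: Callable[[Resources], None]) -> Tuple[str, Optional[str]]:
    """Return (decision, node name) for one task's resource demand."""
    # 1. Try the local node first.
    if fits(local_node["res"], demand):
        return "local", local_node["name"]
    # 2. Otherwise collect every node that meets the demand and let the
    #    load balancing step choose among them.
    eligible = [node for node in cluster if fits(node["res"], demand)]
    if eligible:
        return "migrated", load_balance(eligible)["name"]
    # 3. No node fits: the task stays pending and a scale-out is requested.
    auto_scale(demand)
    return "pending", None
```

Keeping the two algorithms as injected callables makes the control flow testable on its own, for example with a stand-in that simply picks the least loaded eligible node.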
(2) Dynamic load migration technology based on load migration model
1) Load migration model definition
The bottleneck of the fully-asynchronous distributed dynamic task scheduling technology lies in how to optimally control server resources, namely how to allocate different computing nodes to each computing task, and the load balancing strategy ensures that each computing node is uniformly scheduled, so that the nodes are prevented from being in an overload or idle state. Most current load balancing strategies are mainly based on the assumption that the client access amount and the service response time are exponentially distributed, but the space manipulation simulation task comprises the synchronous calling of large-scale computing service and the requirements of high-frequency data interaction and read-write operation, and the service response time is difficult to perform fitting calculation. In order to accurately evaluate the load condition of each node on a server and further efficiently distribute new task requests, and meanwhile, to avoid the problems of server overload caused by rapid increase of calculation tasks or server dormancy caused by rapid decrease of the tasks and the like, the invention establishes a load migration model and realizes optimal node resource distribution.
The state information returned by server node j consists of the task scheduling queue length $Q_i$ and the number $T_j$ of idle threads in the server thread pool, forming N two-tuples $\langle Q_i, T_j \rangle$, $i, j = 1, 2, \ldots, N$. The load balancing time of each server node [formula image BDA0003878264430000082] is proportional to the task scheduling queue length and inversely proportional to the number of threads concurrently processing the tasks:
[formula image BDA0003878264430000091]
[formula image BDA0003878264430000092] denotes the idleness of server node j when serving the scheduling of task i, and is inversely proportional to the load balancing time of the server node. α is the transmission attenuation coefficient, a constant related to system components such as the server switch and controller. $E[\Delta L_{i,j}(t)]$ is the mathematical expectation of the change in server node load.
Definition 1. Real-time Load Probability (RLP): the workload state of the server depends on factors such as network bandwidth N, CPU computing resources C and memory size M. RLP is calculated as shown in formula (2):
$L_{i,j}(t) = (a N_c + b C_c + c M_c) / (N_r + C_r + M_r)$ (2)
where a, b and c are the resource weights of network bandwidth, CPU resources and memory resources in the whole service component, subscript r denotes the overall resources of the service component, and subscript c denotes the resources currently in use.
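As a worked example of the RLP definition, the sketch below evaluates the index in the form stated in the disclosure, L = (a·N_c + b·C_c + c·M_c) / (N_r + C_r + M_r). The `Resources` class and function name are illustrative, and the sketch assumes the three quantities have already been normalized to comparable units, which the source does not discuss.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    network: float  # N: network bandwidth
    cpu: float      # C: CPU computing resources
    memory: float   # M: memory size


def real_time_load(used: Resources, total: Resources,
                   a: float, b: float, c: float) -> float:
    """RLP index: weighted resources in use divided by overall resources."""
    overall = total.network + total.cpu + total.memory
    if overall <= 0:
        raise ValueError("overall resources must be positive")
    weighted_used = a * used.network + b * used.cpu + c * used.memory
    return weighted_used / overall
```

With all weights set to 1, a node using 2, 4 and 8 units out of 10, 10 and 20 has an index of 14/40 = 0.35.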
Definition 2. Task scheduling assignment weight (TSW): the TSW is the priority with which the server distributes scheduled tasks, and is directly proportional to the idleness of the current node. The smaller the TSW, the more the corresponding traffic flow is "discriminated against" in resource scheduling.
W_{i,j}(t) = h_{i,j}(t-1)ξ_i(L_{i,j}(t) - L_{i,j}(t-1))/Q_i    (3)
where ξ_i is the task scheduling complexity and h_{i,j}(t-1) is the load migration flag coefficient of the current node: it defaults to 1 if the node performed no load migration at the previous sampling instant, and is 0.5 if load was migrated.
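As a hedged illustration, the TSW can be computed from the expression given in claim 3, W_{i,j}(t) = h_{i,j}(t-1)ξ_i(L_{i,j}(t) - L_{i,j}(t-1))/Q_i. The function and parameter names below are hypothetical, as are the sample values.

```python
def task_scheduling_weight(load_now: float, load_prev: float,
                           queue_length: int, complexity: float,
                           migrated_last_sample: bool) -> float:
    """TSW W = h * xi * (L(t) - L(t-1)) / Q, following claim 3."""
    if queue_length <= 0:
        raise ValueError("queue length must be positive")
    # Flag coefficient h: 0.5 if load was migrated at the previous
    # sampling instant, otherwise the default of 1.
    h = 0.5 if migrated_last_sample else 1.0
    return h * complexity * (load_now - load_prev) / queue_length


# Hypothetical sample values, for illustration only.
w_default = task_scheduling_weight(0.6, 0.4, queue_length=4,
                                   complexity=2.0,
                                   migrated_last_sample=False)
w_migrated = task_scheduling_weight(0.6, 0.4, queue_length=4,
                                    complexity=2.0,
                                    migrated_last_sample=True)
```

The sketch makes the effect of the flag coefficient visible: a node that migrated load at the previous sampling instant receives half the weight it would otherwise have.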
Definition 3. Load migration indicator (Total Load Transform, TLT): the TLT is the total amount of load migrated between server nodes and idle threads, and describes the total overhead of the system.
Figure BDA0003878264430000095
Wherein u is i,j (t) is the single load migration amount, and the server load balancing time obtained by combining the formulas (1) and (4) is:
Figure BDA0003878264430000096
In summary, the thread/process scheduling of the server can be formulated as an optimally controlled load migration model. On the one hand, the load balancing time of each server node should be as small as possible; on the other hand, the rate of change of the TLT should be kept as low as possible to reduce system overhead. The loss function is constructed as follows:
Figure BDA0003878264430000101
where E_{i,j} denotes the mathematical expectation. When Loss is at its minimum, its derivative with respect to U(t) is zero:
Figure BDA0003878264430000102
Solving formula (6) gives the optimally controlled load migration amount at the t-th sampling instant:
U*(t) = U(t-1) - E_{i,j}{α/2W_{i,j}(t)Q_iT_j}    (8)
2) Dynamic load migration algorithm
When load is migrated between server threads or processes, a sudden increase in tasks can overload the receiving server and cause the load to be migrated again, producing oscillating migration. The Load Balance(·) algorithm is therefore designed on the basis of the load migration model; it effectively constrains load migration behavior and avoids load balancing oscillation.
TABLE 2 dynamic Load migration Algorithm Load Balance (·)
Figure BDA0003878264430000103
Figure BDA0003878264430000111
The overall idea of the Load Balance(·) dynamic load migration algorithm is given in Table 2, and its flow chart in Fig. 3. The inputs are the task scheduling queue length Q_i(t) and the number of threads T_j(t) in the server thread pool at the current time t, where i, j ∈ {1, 2, ..., N}. First, the resource nodes are sorted in descending order of RLP and added to the set Φ{desc}, and the load balancing threshold u_thr(t) at the current time is calculated. At any given time a resource node responds to only one load migration request, and can accept another only after that request has completed. A greedy strategy then determines which resource node can accept the request: (1) the real-time load is calculated according to formula (2), and the node with the minimum idleness is selected as the load migration object; (2) if several nodes have the same idleness, the task scheduling assignment weight TSW of each is calculated according to formula (3), and the node with the higher weight is selected as the first load migration object. The single load migration amount u_{i,j}(t) is calculated from the resource state of the node; when it exceeds the load balancing threshold u_thr(t), the resource node is removed from the set and the next resource node in order is checked to determine whether it can accept the task scheduling request. Finally, the optimally controlled total load migration amount U*(t) is calculated according to formula (8), which is derived from the loss function of the optimally controlled load migration model, and the resource node is marked by setting its load migration flag coefficient h_{i,j}(t) to 0.5. The optimal load migration node at each subsequent sampling instant is selected iteratively in the same way, and the optimally controlled load migration amount is calculated.
Because the algorithm preferentially selects the node with the lower task scheduling distribution weight TSW as the load migration object, the probability that the resource node is selected again at the next sampling time is improved, so that the overall load migration frequency is reduced as much as possible.
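The greedy selection step described for Load Balance(·) can be sketched roughly as follows. This is not the algorithm of Table 2 itself: the idleness, TSW, single migration amount and threshold are taken as precomputed inputs because their formulas are not reproduced here, the tie-break follows the "higher weight" rule of step (2), and the `Node` type, its `busy` flag and all sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Node:
    name: str
    rlp: float          # real-time load, formula (2)
    idleness: float     # idleness of the node, assumed precomputed
    tsw: float          # task scheduling assignment weight, formula (3)
    migration: float    # single load migration amount u_ij(t)
    h: float = 1.0      # load migration flag coefficient
    busy: bool = False  # assumed flag: already serving a migration request


def select_migration_target(nodes: List[Node],
                            threshold: float) -> Optional[Node]:
    """Pick one load migration object, or None if no node qualifies."""
    # Candidate set in descending order of RLP; busy nodes are skipped
    # because a node serves only one migration request at a time.
    candidates = sorted((n for n in nodes if not n.busy),
                        key=lambda n: n.rlp, reverse=True)
    while candidates:
        # Greedy choice: minimum idleness, ties broken by higher TSW.
        best = min(candidates, key=lambda n: (n.idleness, -n.tsw))
        if best.migration > threshold:
            candidates.remove(best)  # exceeds threshold: try the next node
            continue
        best.h = 0.5      # mark the node as having migrated load
        best.busy = True
        return best
    return None


# Hypothetical cluster state, for illustration only.
nodes = [
    Node("a", rlp=0.9, idleness=0.2, tsw=0.1, migration=5.0),
    Node("b", rlp=0.5, idleness=0.2, tsw=0.3, migration=1.0),
    Node("c", rlp=0.4, idleness=0.6, tsw=0.9, migration=1.0),
]
first = select_migration_target(nodes, threshold=2.0)
second = select_migration_target(nodes, threshold=2.0)
third = select_migration_target(nodes, threshold=2.0)
```

In this example nodes a and b tie on idleness, so b is chosen first on its higher TSW; on the second call a is rejected because its migration amount exceeds the threshold, and c is chosen instead.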
(3) Dynamic elastic capacity expansion technology
Capacity expansion is divided into horizontal and vertical expansion. Horizontal expansion can only replicate already configured resource nodes, whereas vertical expansion is flexible and can re-create resource nodes according to the resource requirements of the tasks. For tasks that cannot be satisfied under dynamic load migration scheduling, the invention provides a dynamic elastic capacity expansion technique that expands the server cluster vertically and adds node resources to complete the scheduling: the suspended tasks are traversed again and their resource requirements checked; when a node meets the requirements, the suspended task is returned to the ready state and rescheduled to the newly added node. Together with horizontal expansion, this reduces the load on the server cluster.
As shown in Fig. 4, the dynamic elastic capacity expansion mechanism uses a monitor to manage the expansion process and an elastic scaling module to carry out the expansion. The specific flow is as follows:
(1) When the server cluster nodes cannot meet the resource requirements of a task, the task is placed in the suspended state, and the required resources Resource B are sent to the global resource control module via the heartbeat mechanism;
(2) The monitor observes the cluster by subscribing to heartbeats, receiving the information sent by the global resource control module once per heartbeat interval, including the resource requirement Resource B of suspended tasks. It then triggers vertical expansion: the elastic scaling module creates a new resource node according to the dynamic elastic capacity expansion algorithm Auto Scale(·) and stores it in the expansion queue Expansion Q. The algorithm is shown in Table 3;
(3) After receiving the instruction to create a new node, the container orchestration tool reads the node information from the expansion queue, creates the new node and adds it to the container cluster. Once the new node has been created, the corresponding entry is removed from the expansion queue;
(4) Finally, the scheduler detects the newly added node, wakes the suspended tasks and checks again whether the node meets their resource requirements: if it does, the task is dispatched to the node; if it still does not, the task is returned to the suspended state.
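Step (4) of the flow above can be illustrated with a short Python sketch. It is only one possible reading: resource requirements are reduced to CPU cores and memory, and the `Task`, `NewNode` and `reschedule_suspended` names, like the sample figures, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class TaskState(Enum):
    SUSPENDED = "suspended"
    READY = "ready"
    DISPATCHED = "dispatched"


@dataclass
class Task:
    name: str
    cpu: int
    memory_gb: int
    state: TaskState = TaskState.SUSPENDED


@dataclass
class NewNode:
    name: str
    free_cpu: int
    free_memory_gb: int


def reschedule_suspended(tasks: List[Task], node: NewNode) -> List[str]:
    """Wake suspended tasks and dispatch those the new node can host."""
    dispatched = []
    for task in tasks:
        if task.state is not TaskState.SUSPENDED:
            continue
        task.state = TaskState.READY  # wake the task for a new check
        fits = (task.cpu <= node.free_cpu
                and task.memory_gb <= node.free_memory_gb)
        if fits:
            node.free_cpu -= task.cpu
            node.free_memory_gb -= task.memory_gb
            task.state = TaskState.DISPATCHED
            dispatched.append(task.name)
        else:
            task.state = TaskState.SUSPENDED  # still unmet: suspend again
    return dispatched


# Hypothetical tasks and node, for illustration only.
tasks = [Task("orbit", cpu=4, memory_gb=8),
         Task("attitude", cpu=16, memory_gb=64)]
node = NewNode("n1", free_cpu=8, free_memory_gb=16)
dispatched = reschedule_suspended(tasks, node)
```

Here the first task fits the new node and is dispatched, while the second remains suspended and would trigger a further expansion round at the next heartbeat.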
TABLE 3 dynamic elastic capacity expansion algorithm Auto Scale (·)
Figure BDA0003878264430000121
In the dynamic elastic capacity expansion algorithm Auto Scale(·), a node type is determined from the resource information of each suspended task. If the expansion queue already contains that node type, the algorithm moves on to the next suspended task; otherwise, the node type is added to the expansion queue. The expansion queue shortens node creation time and also prevents a single suspended task from creating excessive nodes.
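The deduplication behavior described for Auto Scale(·) can be sketched as below. How a node type is derived from a task's resource information is not specified in the text, so the sketch simply assumes that the node type equals the requested CPU and memory; `NodeType` and `auto_scale` are hypothetical names.

```python
from collections import deque
from typing import Deque, Iterable, NamedTuple, Tuple


class NodeType(NamedTuple):
    cpu: int
    memory_gb: int


def auto_scale(suspended_tasks: Iterable[Tuple[int, int]],
               expansion_queue: Deque[NodeType]) -> Deque[NodeType]:
    """Queue one node type per distinct resource requirement."""
    for cpu, memory_gb in suspended_tasks:
        # Assumption: the node type mirrors the task's resource request.
        node_type = NodeType(cpu, memory_gb)
        if node_type in expansion_queue:
            continue  # already queued: move on to the next suspended task
        expansion_queue.append(node_type)
    return expansion_queue


# Hypothetical (cpu, memory_gb) requirements of three suspended tasks.
queue = auto_scale([(4, 8), (4, 8), (16, 64)], deque())
```

Two of the three suspended tasks share a requirement, so only two node types are queued for creation.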
Simulation example:
(1) Preparation of the Experimental Environment
The dynamic task scheduling system and method provided by the invention were field-tested in a large-scale spacecraft test simulation system, which has two types of cloud computing virtual node clusters: 1) the task scheduling central control cluster, consisting of 3 servers on which cloud computing nodes are virtualized using cloud platform technology, responsible for deploying the scheduling center that receives massive simulation task scheduling requests from users; and 2) the executor processing cluster, consisting of 5 servers, responsible for deploying the executor center and the database for efficient task processing and data storage. The hardware deployment diagram of the system is shown in Fig. 5. Following the principle of autonomous control using domestic products, the basic environment is built on a Loongson rack server TL621 cluster, the Galaxy Kylin server operating system V4.0 and the China general domestic database V7.0; the detailed configuration parameters are shown in Table 4.
TABLE 4 software and hardware environmental parameter configuration table
Figure BDA0003878264430000131
(2) Analysis of Experimental results
The actual performance of the dynamic task scheduling system was verified experimentally against a variety of space manipulation simulation task scheduling requirements, including spacecraft orbit maneuver deduction, stepping attitude adjustment, satellite-ground and inter-satellite communication, and space operations. Comparative analysis was carried out on two aspects, architecture response delay and load balancing performance. The experimental results and analysis are as follows.
1) Architectural response delay experiment
1000 space manipulation simulation tasks, such as spacecraft orbit deduction, spacecraft on-orbit attitude calculation, and satellite-ground and inter-satellite communication period calculation, were computed simultaneously, and the response delay of the dynamic task scheduling system was compared with that of current mainstream open-source microservice architectures in China and abroad. The results are shown in Table 5. As a lightweight architecture, XXL-JOB has a shorter overall delay, whereas Spring Cloud has a longer service response delay because of its complex architecture. The dynamic task scheduling system of the invention performs better than the architectures compared, shortening the single-task response delay by 0.10 s, 0.25 s and 0.92 s on average relative to the other three task scheduling architectures.
2) Load balance performance test
The following observation indicators were selected for calculating the real-time load RLP in the load balancing test: (1) CPU usage; (2) memory usage; (3) hard disk transfer volume; and (4) network traffic. The load balancing time is obtained by recording these indicators every 0.2 s from the start of server operation and, for each task scheduling workload, calculating the maximum time interval between the load stabilizing before and after task scheduling. As shown in Fig. 6, indicators (1)-(4) first reach the equilibrium state at 6.02 s, 6.05 s, 0.12 s and 0.10 s respectively; while processing 4000 units of 10M manipulation-object interaction data packets, they reach the equilibrium state again at 24.34 s, 24.10 s, 22.26 s and 26.15 s respectively. The intervals between the two equilibrium states are 18.32 s, 18.05 s, 22.14 s and 26.05 s respectively, so the average load balancing time of the dynamic task scheduling system of the invention is 21.56 s.
Space manipulation simulation tasks such as spacecraft orbit deduction, spacecraft on-orbit attitude calculation, and satellite-ground and inter-satellite communication period calculation were processed separately. The load balancing time as the number of tasks increases is shown in Fig. 7(a): it is positively correlated with the number of scheduled tasks, and the more complex the algorithm associated with a space manipulation simulation task, the longer the load balancing time. Taking one type of task as a specific example, the real-time load state as the number of spacecraft orbit deduction tasks increases is shown in Fig. 7(b): the CPU utilization and the disk read-write rate of the system are directly proportional to the number of scheduled tasks, and the memory utilization rises with the number of scheduled tasks until it saturates. When scheduling on the order of 2000 tasks, the memory utilization approaches its limit; however, thanks to the dynamic load migration technique based on the load migration model, the scheduler takes on only the amount of computing tasks that matches the current effective load, and the remaining tasks wait in the queue, so problems such as server crashes caused by blocked scheduling do not occur. With subsequent upgrades and capacity expansion of the software and hardware environment, the dynamic task scheduling system is expected to support space manipulation simulation task processing at scales of tens of thousands to millions of tasks.
It should be noted that the space manipulation simulation model is not limited to spacecraft; it also covers calls to simulation models for the communication links of ground application systems. The system can also be applied to any model computation involving large-scale, highly concurrent calls.
The innovation points are as follows:
(1) Existing task scheduling systems in China and abroad suffer from complex structure, a single task type, and relatively slow service registration under massive business processing, whereas the fully asynchronous distributed dynamic task scheduling system offers fast task response computation and high throughput stability.
(2) In practice, performance differences among the computing nodes of a server cluster lead to load imbalance in the task scheduling system, and current load balancing techniques in China and abroad have limitations such as the large resource overhead caused by node backup and periodic oscillation of load migration.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (8)

1. A dynamic task scheduling system for space manipulation simulation model computation, the system comprising: a scheduling center, a plurality of scheduler nodes and a plurality of executor nodes; the scheduler nodes and the executor nodes are deployed on different server clusters, wherein,
the scheduling center is used for dynamically adjusting task scheduling based on a dynamic load migration technique according to the task scheduling requirements of the space manipulation simulation service, and allocating tasks to the registered scheduler nodes;
the scheduler node is used for producing and caching tasks, triggering the executor center by remote procedure call (RPC), and dynamically and elastically expanding the capacity of the executor nodes for tasks that cannot be satisfied;
the executor center is used for registering with the scheduling center by heartbeat registration to realize remote communication between the executor center and the scheduling center; it is also used for decomposing tasks into specific computation flows and computing tasks, and triggering the corresponding executor nodes to achieve highly concurrent execution of task scheduling;
and the executor node is used for executing the corresponding space manipulation simulation model computation task.
2. The dynamic task scheduling system for space manipulation simulation model computation according to claim 1, wherein the dynamic load migration technique establishes an optimally controlled load migration model using a dynamic load migration algorithm; wherein the load migration model comprises:
the state information returned by scheduler node j consists of the task scheduling queue length Q_i and the number of idle threads T_j in the scheduler thread pool, forming N two-tuples <Q_i, T_j>, i, j = 1, 2, ..., N; the scheduler node load balancing time
Figure FDA0003878264400000011
is proportional to the length of the task scheduling queue and inversely proportional to the number of threads concurrently processing tasks, and satisfies the following formula:
Figure FDA0003878264400000012
Figure FDA0003878264400000013
wherein
Figure FDA0003878264400000014
represents the idleness of scheduler node j when serving the scheduling of i tasks, α is the transmission attenuation coefficient, a constant related to the server-side switch and controller, and E[ΔL_{i,j}(t)] is the mathematical expectation of the change in scheduler node load.
3. The dynamic task scheduling system for space manipulation simulation model computation according to claim 2, wherein establishing the optimally controlled load migration model specifically comprises:
defining the real-time load RLP index L_{i,j}(t), which satisfies the following formula:
L_{i,j}(t) = (aN_c + bC_c + cM_c)/(N_r + C_r + M_r)
wherein a, b and c respectively represent the proportion coefficients of network bandwidth, CPU resources and memory resources in the overall resources of the server, the subscript r represents the overall resources of the server, the subscript c represents the resources currently in use, N represents the network bandwidth, C represents the CPU computing resource and M represents the memory size;
defining the task scheduling assignment weight TSW index W_{i,j}(t), which satisfies the following formula:
W_{i,j}(t) = h_{i,j}(t-1)ξ_i(L_{i,j}(t) - L_{i,j}(t-1))/Q_i
wherein ξ_i is the task scheduling complexity and h_{i,j}(t-1) is the load migration flag coefficient of the current scheduler node: if the scheduler node performed no load migration at the previous sampling instant, h_{i,j}(t-1) defaults to 1, and when load has been migrated, h_{i,j}(t-1) is 0.5;
defining the load migration TLT index U(t), which satisfies the following formula:
Figure FDA0003878264400000021
wherein u_{i,j}(t) is the single load migration amount;
obtaining the scheduler node load balancing time
Figure FDA0003878264400000022
as:
Figure FDA0003878264400000023
while satisfying the scheduler node load balancing time
Figure FDA0003878264400000024
and reducing the rate of change of the TLT index U(t) as far as possible to lower the resource adjustment frequency of the server, the optimally controlled load migration amount U*(t) at the t-th sampling instant is:
U*(t) = U(t-1) - E_{i,j}{α/2W_{i,j}(t)Q_iT_j}
wherein U(t-1) represents the load migration TLT index at the (t-1)-th sampling instant, and E_{i,j} denotes the mathematical expectation.
4. The dynamic task scheduling system for space manipulation simulation model computation according to claim 3, wherein the dynamic load migration algorithm specifically comprises:
inputting the task scheduling queue length Q_i(t) and the number of threads T_j(t) in the scheduler thread pool at the current time t;
adding the scheduler nodes to the set Φ{desc} in descending order of RLP index, calculating the load balancing threshold u_thr(t) at the current time, and stipulating that at any given time a scheduler node responds to only one load migration request and can accept another request only after that request has completed;
and using a greedy strategy to iteratively select the optimal load migration node among the scheduler nodes at each sampling instant, and calculating the optimally controlled load migration amount.
5. The dynamic task scheduling system for space manipulation simulation model computation according to claim 4, wherein using a greedy strategy to iteratively select the optimal load migration node among the scheduler nodes at each sampling instant and calculating the optimally controlled load migration amount specifically comprises:
calculating the TLT index, and selecting the scheduler node with the minimum idleness as the load migration object;
if several scheduler nodes have the same idleness, calculating the TSW index of each scheduler node, selecting the scheduler node with the higher weight as the first load migration object, and calculating the single load migration amount u_{i,j}(t) from the resource state of the scheduler node; when the single load migration amount exceeds the load balancing threshold u_thr(t), removing the scheduler node from the set and checking the next scheduler node in order to determine whether it can accept the task scheduling request;
calculating the optimally controlled total load migration amount U*(t), and marking the scheduler node by setting its load migration flag coefficient h_{i,j}(t) to 0.5.
6. The dynamic task scheduling system for space manipulation simulation model computation according to claim 1, further comprising: a global resource control module, a monitor, an elastic scaling module and a container orchestration tool; wherein,
the global resource control module is used for receiving the suspended state of tasks in the server cluster through a heartbeat mechanism;
the monitor is used for periodically monitoring task suspension in the server cluster where the executor nodes are located by subscribing to the heartbeat information of the global resource control module;
the elastic scaling module is used for creating a new executor node using the dynamic elastic capacity expansion algorithm Auto Scale(·) according to the task suspension status and storing it in the expansion queue Expansion Q, and is also used for sending an instruction to create the new executor node to the container orchestration tool;
and the container orchestration tool is used for reading the new executor node information from the expansion queue, creating the executor node and adding it to the container cluster.
7. The dynamic task scheduling system for space manipulation simulation model computation according to claim 6, wherein the processing of the elastic scaling module specifically comprises:
setting the executor node type according to the resource information of each suspended task; if the expansion queue Expansion Q already contains that executor node type, continuing to the next suspended task; otherwise, adding the executor node type to the expansion queue Expansion Q.
8. The dynamic task scheduling system for space manipulation simulation model computation according to claim 7, wherein the processing of the scheduler node further comprises:
when the scheduler detects a newly added executor node, waking the suspended tasks and checking again whether the executor node meets the resource requirements of each task: if the executor node meets the task resource requirements, the task is dispatched to the executor node; if it still does not, the task is returned to the suspended state.
CN202211221165.2A 2022-10-08 2022-10-08 Dynamic task scheduling system for calculation of space control simulation model Pending CN115454649A (en)


Publications (1)

Publication Number Publication Date
CN115454649A true CN115454649A (en) 2022-12-09

Family

ID=84309498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211221165.2A Pending CN115454649A (en) 2022-10-08 2022-10-08 Dynamic task scheduling system for calculation of space control simulation model

Country Status (1)

Country Link
CN (1) CN115454649A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116149794A (en) * 2023-03-07 2023-05-23 北京创奇视界科技有限公司 Cloud simulation method based on container architecture
CN116149794B (en) * 2023-03-07 2023-09-08 北京创奇视界科技有限公司 Cloud simulation method based on container architecture
CN116795517A (en) * 2023-08-25 2023-09-22 中国人民解放军国防科技大学 Multi-strategy self-adaptive asynchronous task scheduling method, system and device
CN116795517B (en) * 2023-08-25 2023-11-07 中国人民解放军国防科技大学 Multi-strategy self-adaptive asynchronous task scheduling method, system and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination