CN115454649A

CN115454649A - Dynamic task scheduling system for calculation of space control simulation model

Info

Publication number: CN115454649A
Application number: CN202211221165.2A
Authority: CN
Inventors: 覃润楠; 谢文明; 惠建江; 彭晓东
Original assignee: National Space Science Center of CAS
Current assignee: National Space Science Center of CAS
Priority date: 2022-10-08
Filing date: 2022-10-08
Publication date: 2022-12-09

Abstract

The invention discloses a dynamic task scheduling system for space manipulation simulation model calculation, which comprises a scheduling center, a plurality of scheduler nodes and a plurality of executor nodes, wherein the scheduler nodes are connected with the executor nodes through a network; the scheduling center is used for dynamically adjusting task scheduling based on a dynamic load migration technology according to the task scheduling requirements of the space control simulation service, and for distributing tasks to the registered scheduler nodes; each scheduler node is used for producing and caching tasks, triggering the executor center via remote procedure call (RPC), and dynamically and elastically expanding the executor nodes for tasks whose resource requirements cannot be met; the executor center is used for registering with the scheduling center by heartbeat so as to realize remote communication between the executor center and the scheduling center, and is also used for decomposing the tasks and triggering the corresponding executor nodes; and the executor nodes are used for executing the corresponding space control simulation model calculation tasks.

Description

Dynamic task scheduling system for calculation of space control simulation model

Technical Field

The invention relates to the field of computer system space control simulation, in particular to a dynamic task scheduling system for space control simulation model calculation.

Background

To support exploration and research in the space field, many countries are studying the simulation of space control tasks such as full-space orbit maintenance, camera payload detection, debris avoidance and clearing, and on-orbit operation, in order to improve spacecraft performance and reduce spacecraft development risk and cost. Simulating a space control task requires virtualized activities such as task deduction, evaluation, control and decision-making in a simulation environment, and involves complex task processing with multidisciplinary coupled modeling, including aircraft dynamics, fluid dynamics and computer vision.

Server architectures have been continuously upgraded from monolithic to distributed to microservice architectures. While microservices bring highly flexible scaling, they also raise problems such as cluster deployment and scheduling optimization. Many distributed task scheduling systems have been created to optimize cluster resource scheduling; their bottleneck is how to optimally control server resources, that is, how to assign thread/process quotas of different service classes to each computing node. The Elastic-Job task scheduling system developed by one group provides task sharding and elastic scale-out and scale-in, but does not support dynamic management of timed tasks or workflow-type timed tasks; the XXL-JOB task scheduling system from Dianping provides lightweight dynamic management of task scheduling, but its service registration is slow under massive business processing; the TBSchedule task scheduling system released by Alibaba is highly extensible, but supports only a single task type; the Spring Cloud Scheduler task scheduling system has a strong community, but its framework is complex and places certain demands on hardware resources.

In actual space manipulation simulation tasks, because the computing nodes in a server cluster differ in performance, many task scheduling systems suffer from load imbalance, which reduces system processing speed and increases network delay. For example: 1) the roles of task node resources are fixed or limited to a single role, while excessive node backups waste a large amount of resources; 2) during load migration, a large number of tasks are moved to the more lightly loaded task nodes, which causes those nodes to trigger load balancing again, producing periodic oscillation and degrading server system performance.

Therefore, for the space manipulation simulation task, how to design a task scheduling system to achieve the purposes of optimizing resource use, minimizing system response time, maximizing model calculation efficiency and avoiding overload of a server becomes a problem to be solved urgently at present.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a dynamic task scheduling system for space manipulation simulation model calculation.

The invention mainly addresses the challenge that highly concurrent invocation of massive multi-source heterogeneous space manipulation simulation models in large-scale space manipulation simulation tests poses to traditional service architectures, and establishes a dynamic task scheduling system with a load migration model at its core.

In order to achieve the above object, the present invention provides a dynamic task scheduling system for space manipulation simulation model calculation, which comprises: a scheduling center, a plurality of scheduler nodes and a plurality of executor nodes; the scheduler nodes and the executor nodes are deployed on different server clusters, wherein,

the scheduling center is used for dynamically adjusting task scheduling based on a dynamic load migration technology according to the task scheduling requirements of the space control simulation service, and for allocating tasks to the registered scheduler nodes;

the scheduler node is used for producing and caching tasks, triggering the executor center via remote procedure call (RPC), and performing dynamic elastic expansion of the executor nodes for tasks whose resource requirements cannot be met;

the executor center is used for registering with the scheduling center by heartbeat to realize remote communication between the executor center and the scheduling center, and is also used for decomposing each task into a specific calculation flow and calculation tasks and triggering the corresponding executor nodes to realize highly concurrent execution of task scheduling;

and the executor node is used for executing the corresponding space control simulation model calculation task.

As an improvement of the above system, the dynamic load migration technique adopts a dynamic load migration algorithm to establish an optimally controlled load migration model, wherein the load migration model comprises:

the state information returned by scheduler node j consists of the task scheduling queue length $Q_i$ and the number of idle threads $T_j$ in the scheduler thread pool, forming N two-tuples $\langle Q_i, T_j \rangle$, $i, j = 1, 2, \ldots, N$; the scheduler node load balancing time is proportional to the length of the task scheduling queue and inversely proportional to the number of threads concurrently processing tasks, and satisfies the following formula:

wherein the idleness term represents the idleness of scheduler node j serving task scheduling i, $\alpha$ is the transmission attenuation coefficient, a constant related to the server-side switch and controller, and $E[\Delta L_{i,j}(t)]$ is the mathematical expectation of the scheduler node load variation.

As an improvement of the above system, the establishing of the optimally controlled load migration model specifically includes:

defining a real-time load RLP index $L_{i,j}(t)$ that satisfies the following formula:

$$L_{i,j}(t) = \frac{a N_c + b C_c + c M_c}{N_r + C_r + M_r}$$

wherein a, b and c respectively represent the proportion coefficients of network bandwidth, CPU resources and memory resources in the overall resources of the server, subscript r denotes the overall resources of the server, subscript c denotes the resources currently in use, N denotes network bandwidth, C denotes CPU computing resources and M denotes memory size;

defining a task scheduling distribution weight TSW index $W_{i,j}(t)$ that satisfies the following formula:

$$W_{i,j}(t) = \frac{h_{i,j}(t-1)\,\xi_i\,\bigl(L_{i,j}(t) - L_{i,j}(t-1)\bigr)}{Q_i}$$

wherein $\xi_i$ is the task scheduling complexity and $h_{i,j}(t-1)$ is the load migration marking coefficient of the current scheduler node: if the scheduler node did not perform a load migration operation at the last sampling moment, $h_{i,j}(t-1)$ defaults to 1; if load has been migrated, $h_{i,j}(t-1)$ is 0.5;

defining the load migration TLT index $U(t)$, which satisfies the following formula:

wherein $u_{i,j}(t)$ is the single load migration amount;

obtaining the scheduler node load balancing time as:

while keeping the scheduler node load balancing time as small as possible and reducing the change rate of the TLT index $U(t)$ as much as possible to lower the resource adjustment frequency of the server, the optimally controlled load migration amount $U^*(t)$ at the t-th sampling moment is:

$$U^*(t) = U(t-1) - E_{i,j}\{\alpha / 2 W_{i,j}(t) Q_i T_j\}$$

wherein $U(t-1)$ represents the load migration TLT index at the (t-1)-th sampling moment and $E_{i,j}$ denotes the mathematical expectation.

As an improvement of the foregoing system, the dynamic load migration algorithm specifically includes:

inputting the task scheduling queue length $Q_i(t)$ and the number of threads $T_j(t)$ in the scheduler thread pool at the current time t;

adding the scheduler nodes to a set $\Phi_{desc}$ in descending order of RLP index, calculating the load balancing threshold $u_{thr}(t)$ at the current time, and stipulating that each scheduler node responds to only one load migration request at a time and can accept another request only after the current one is completed;

and, following a greedy approach, traversing the scheduler nodes at each sampling moment to iteratively select the optimal load migration node, and calculating the optimally controlled load migration amount.

As an improvement of the above system, traversing the scheduler nodes at each sampling moment with a greedy approach to iteratively select the optimal load migration node, and calculating the optimally controlled load migration amount, specifically comprises:

calculating the TLT index, and selecting the scheduler node with the minimum idleness as the load migration object;

if several scheduler nodes have the same idleness, calculating the TSW index of each such scheduler node and selecting the scheduler node with the higher weight as the first load migration object; calculating the single load migration amount $u_{i,j}(t)$ according to the resource state of the scheduler node; when the single load migration amount exceeds the load balancing threshold $u_{thr}(t)$, removing the scheduler node from the set and selecting the next scheduler node to re-evaluate whether the task scheduling request can be accepted;

calculating the optimally controlled total load migration amount $U^*(t)$, marking the scheduler node, and setting its load migration marking coefficient $h_{i,j}(t)$ to 0.5.

As an improvement of the above system, the system further comprises: a global resource control module, a monitor, an elastic expansion module and a container orchestration tool; wherein,

the global resource control module is used for receiving, through a heartbeat mechanism, the suspension status of tasks in the server cluster;

the monitor is used for periodically monitoring task suspension in the server cluster where the executor nodes are located by subscribing to the heartbeat information of the global resource control module;

the elastic expansion module is used for creating new executor nodes with the dynamic elastic expansion algorithm Auto Scale(·) according to the task suspension status and storing them in the expansion queue expansion Q, and is also used for sending the instruction to create the new executor nodes to the container orchestration tool;

and the container orchestration tool is used for reading the new executor node information from the expansion queue, creating the executor nodes and adding them to the container cluster.

As an improvement of the above system, the processing procedure of the elastic expansion module specifically includes:

setting the executor node type according to the resource information of each suspended task; if the expansion queue expansion Q already contains that executor node type, continuing to iterate to the next suspended task; otherwise, adding the executor node to the expansion queue expansion Q.
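The de-duplication step described above can be sketched as follows. This is only an illustration under stated assumptions: the `ExecutorType` record, its two fields and the function name `auto_scale` are hypothetical, and the patent does not specify how a node type is derived from a task's resource information.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ExecutorType:
    """Hypothetical executor node type, identified by its resource shape."""
    cpu_cores: int
    memory_gb: int


def auto_scale(pending_demands: Iterable[Tuple[int, int]],
               expansion_queue: List[ExecutorType]) -> List[ExecutorType]:
    """Queue one executor node type per distinct suspended-task demand."""
    for cpu_cores, memory_gb in pending_demands:
        node_type = ExecutorType(cpu_cores, memory_gb)
        if node_type in expansion_queue:
            # This type is already queued, move on to the next suspended task.
            continue
        expansion_queue.append(node_type)
    return expansion_queue


queue = auto_scale([(4, 8), (2, 4), (4, 8)], [])
```

Here two suspended tasks with the same demand result in a single queued node type, which is the behavior the paragraph describes.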

As an improvement of the above system, the processing of the scheduler node further comprises:

when the scheduler detects a newly added executor node, the tasks in the suspended state are woken up again and it is checked once more whether the executor node meets the resource requirement of each task: if the executor node meets the task's resource requirement, the task is dispatched to the executor node; if it still does not, the task returns to the suspended state.
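A minimal sketch of this wake-and-recheck step is given below. The task tuple layout and the function name `wake_pending` are hypothetical, and deducting resources from the new node after each dispatch is an added assumption that the text does not state.

```python
from typing import List, Tuple

# Hypothetical task record: (task id, CPU cores needed, memory GB needed).
Task = Tuple[str, int, int]


def wake_pending(pending: List[Task], cpu_free: int,
                 mem_free: int) -> Tuple[List[str], List[str]]:
    """Re-check suspended tasks against a newly added executor node."""
    dispatched: List[str] = []
    still_pending: List[str] = []
    for task_id, cpu, mem in pending:
        if cpu <= cpu_free and mem <= mem_free:
            # The node meets the requirement: dispatch the task to it.
            dispatched.append(task_id)
            # Assumption: a dispatched task consumes the node's free resources.
            cpu_free -= cpu
            mem_free -= mem
        else:
            # Requirement still not met: the task goes back to suspended.
            still_pending.append(task_id)
    return dispatched, still_pending


done, waiting = wake_pending([("t1", 2, 4), ("t2", 8, 16), ("t3", 2, 4)], 4, 8)
```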

Compared with the prior art, the invention has the advantages that:

1. the invention achieves breakthroughs in key technologies such as dynamic task scheduling, dynamic load migration and dynamic elastic capacity expansion, optimizes server-side resource allocation to improve task processing capacity, and provides technical support for the highly concurrent computing demands of large-scale space control simulation tests;

2. the invention advances task scheduling platforms and load balancing technology based on microservices, and addresses the prior-art problems of slow service registration, high server-side resource overhead and periodic load migration oscillation under highly concurrent task response;

3. the invention establishes a dynamic task scheduling system with a load migration model at its core to achieve optimal server resource allocation, and designs a dynamic load migration algorithm to avoid load balancing oscillation, which effectively constrains load migration behavior, improves server stability and ultimately improves the processing performance for massive tasks;

4. the invention establishes a general-purpose platform system suitable for large-scale space control simulation tests, supports space control simulation activities such as deduction, evaluation, control and decision-making that involve multidisciplinary coupled modeling and complex task processing, and has high application value.

Drawings

FIG. 1 is a technical route diagram of a dynamic task scheduling system for space manipulation simulation model calculation according to the present invention;

FIG. 2 is a timing diagram of fully asynchronous distributed dynamic task scheduling;

FIG. 3 is a dynamic load migration algorithm flow description;

FIG. 4 is a description of a dynamic elastic capacity expansion technique;

FIG. 5 is a hardware environment deployment;

FIG. 6 is a real-time load status of a 40G manipulation object interaction data packet;

FIG. 7 (a) is a real-time load status for different magnitude task scheduling;

FIG. 7 (b) is the load balancing time consumption for task scheduling of different magnitudes.

Detailed Description

The invention relates to a dynamic task scheduling system for space manipulation simulation model calculation; the technical route is shown in figure 1.

The system comprises: a scheduling center, a plurality of scheduler nodes and a plurality of executor nodes; the scheduler nodes and the executor nodes are deployed on different server clusters, wherein,

the scheduling center is used for dynamically adjusting task scheduling based on a dynamic load migration technology according to the task scheduling requirements of the space control simulation service, and for distributing tasks to the registered scheduler nodes;

the scheduler node is used for producing and caching tasks, triggering the executor center via remote procedure call (RPC), and dynamically and elastically expanding the executor nodes for tasks whose resource requirements cannot be met.

The technical solution of the present invention will be described in detail below with reference to the accompanying drawings and examples.

Example 1

The embodiment of the invention provides a dynamic task scheduling system for space manipulation simulation model calculation, which mainly comprises three parts (task scheduling, remote execution and data management) and involves three key technologies: a fully asynchronous distributed dynamic task scheduling technology, a dynamic load migration technology based on a load migration model, and a dynamic elastic capacity expansion technology.

To meet the scheduling requirements of various space control simulation tasks, such as spacecraft orbit maneuver deduction, stepping attitude adjustment, inter-satellite and satellite-to-ground communication, and space operation, the dynamic task scheduling system performs highly concurrent computation of massive simulation tasks through the fully asynchronous distributed dynamic task scheduling technology. First, the scheduling center dynamically activates each scheduler node. Each scheduler node then sends a message to the executor center via remote procedure call (RPC); the executor center breaks the message down into the computing tasks of the corresponding calculation flow and assigns an executor node to each computing task. Finally, the results are synchronized to a relational database and a non-relational database for fast caching, storage, retrieval and management of data. The scheduling center allocates scheduler node resources adaptively through the dynamic load migration technology based on the load migration model; if the current server cluster resources do not meet the resource consumption requirement of the task allocation, the server cluster is scaled vertically through the dynamic elastic capacity expansion technology, adding node resources to complete the computing task.

(1) Full-asynchronous distributed dynamic task scheduling technology

The working sequence of the fully asynchronous distributed dynamic task scheduling technology is shown in fig. 2. When a new external task scheduling request arrives, the task is assigned to the scheduling center via the dynamic load migration algorithm. The scheduling center does not take part in executing the computation; it in turn assigns the task to a registered scheduler node, which produces the task and writes it into a consumption queue in the queue interaction layer for caching. Then, according to the task allocation result of the scheduler node, the tasks in the consumption queue are consumed stably in the order in which they were written, and the executor center in the computing layer is triggered via remote procedure call (RPC). The executor center can be deployed as a distributed cluster within the server cluster and registers with the scheduling center by heartbeat to realize remote communication between the executor center and the scheduling center. Finally, the executor center decomposes the task into a specific calculation flow and calculation tasks, and triggers the corresponding executor nodes to realize highly concurrent execution of task scheduling.

The dynamic task scheduling system takes the scheduling center as its control center: by quickly analyzing the users' computing requirements, it triggers the scheduling center to distribute jobs, while the executor center carries out the actual processing of the jobs. Executors can be registered or removed to support dynamic elastic expansion, which realizes a fully asynchronous design and distributed deployment of task scheduling.
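Heartbeat-based registration and removal of executors can be illustrated with the sketch below. It is not the patent's implementation: the class `HeartbeatRegistry`, its method names and the timeout rule are hypothetical, and timestamps are passed in explicitly so that the example stays deterministic.

```python
from typing import Dict, List


class HeartbeatRegistry:
    """Tracks which executors have sent a heartbeat recently."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, executor_id: str, now_s: float) -> None:
        # The first heartbeat registers the executor; later ones refresh it.
        self._last_seen[executor_id] = now_s

    def evict_stale(self, now_s: float) -> List[str]:
        # Remove executors whose last heartbeat is older than the timeout.
        stale = sorted(e for e, seen in self._last_seen.items()
                       if now_s - seen > self.timeout_s)
        for executor_id in stale:
            del self._last_seen[executor_id]
        return stale

    def registered(self) -> List[str]:
        return sorted(self._last_seen)


registry = HeartbeatRegistry(timeout_s=10.0)
registry.heartbeat("exec-a", now_s=0.0)
registry.heartbeat("exec-b", now_s=5.0)
removed = registry.evict_stale(now_s=12.0)
```

Under these assumptions the scheduling center only needs the executors' periodic heartbeats to keep its view of available executors current, so executors can join or leave without a separate registration protocol.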


The core of the fully asynchronous distributed dynamic task scheduling technology is how to allocate tasks optimally across scheduler nodes. The invention therefore provides a dynamic task scheduling algorithm Task Schedule(·): a set of nodes that meet the resource requirement is obtained by traversing all scheduler nodes in the server cluster, and task scheduling is adjusted dynamically using the dynamic load migration technology based on the load migration model; for tasks that cannot be satisfied under this scheduling strategy, the server cluster is scaled vertically using the dynamic elastic capacity expansion technology, adding node resources to complete the scheduling task. The overall idea is shown in table 1.

TABLE 1 dynamic Task scheduling Algorithm Task Schedule (·)

Specifically, the resources of the server cluster are first initialized. A task scheduling request submitted by a user then enters a scheduling queue $Q_i$, $i = 1, 2, \ldots, N$. Based on the resource requirement Q_i.dem carried by the task, the local scheduler first checks whether the resources localnode.res of its own node are sufficient. If so, the task is scheduled to that node; if not, the dynamic load migration algorithm Load Balance(·) is executed: all nodes in the cluster are traversed to obtain the set of nodes that meet the resource requirement, and the optimal task scheduling allocation is computed over that set by the dynamic load balancing algorithm. If no node has enough resources for the task, the task is suspended (Q_i.pending), a capacity expansion operation is performed on the server cluster, and the dynamic elastic capacity expansion algorithm Auto Scale(·) is executed. Capacity expansion is divided into horizontal and vertical expansion: horizontal expansion can only copy already configured resource nodes, whereas the vertical expansion mechanism is flexible and can create resource nodes anew according to the resource requirements of tasks. After a new node joins the server cluster, the suspended tasks are traversed again and their resource requirements checked; when the node meets the requirement, the suspended task is returned to the ready state and rescheduled to the newly added node.
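The three-way decision in this paragraph (local node, another node, or suspension plus expansion) can be sketched as below. The scalar resource model, the `Outcome` enum and the rule of picking the eligible node with the most free resources are simplifying assumptions that stand in for the Load Balance(·) algorithm, whose details are given in table 2 and are not reproduced here.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Outcome(Enum):
    LOCAL = "scheduled on local node"
    MIGRATED = "scheduled on another node"
    PENDING = "suspended, expansion requested"


def task_schedule(demand: float, local_free: float,
                  cluster_free: Dict[str, float]) -> Tuple[Outcome, Optional[str]]:
    """Decide where a task with resource demand `demand` should go."""
    # Step 1: the local scheduler checks its own node first.
    if demand <= local_free:
        return Outcome.LOCAL, None
    # Step 2: collect the cluster nodes that meet the resource requirement.
    eligible = {name: free for name, free in cluster_free.items()
                if free >= demand}
    if eligible:
        # Assumed stand-in for Load Balance: take the node with most headroom.
        target = max(sorted(eligible), key=lambda name: eligible[name])
        return Outcome.MIGRATED, target
    # Step 3: no node fits, so the task is suspended until the cluster expands.
    return Outcome.PENDING, None


cluster = {"n1": 2.0, "n2": 6.0, "n3": 4.0}
result = task_schedule(3.0, 2.0, cluster)
```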

(2) Dynamic load migration technology based on load migration model

1) Load migration model definition

The bottleneck of the fully asynchronous distributed dynamic task scheduling technology is how to optimally control server resources, that is, how to allocate computing nodes to each computing task. A load balancing strategy ensures that computing nodes are scheduled evenly, so that no node is overloaded or idle. Most current load balancing strategies assume that client access volume and service response time follow an exponential distribution, but space manipulation simulation tasks involve synchronous invocation of large-scale computing services together with high-frequency data interaction and read-write operations, so the service response time is difficult to fit. To evaluate the load of each server node accurately and thus distribute new task requests efficiently, while avoiding server overload caused by a rapid increase in computing tasks or server dormancy caused by a rapid decrease, the invention establishes a load migration model to achieve optimal allocation of node resources.

The state information returned by server node j consists of the task scheduling queue length $Q_i$ and the number of idle threads $T_j$ in the server thread pool, forming N two-tuples $\langle Q_i, T_j \rangle$, $i, j = 1, 2, \ldots, N$. The load balancing time of each server node is proportional to the task scheduling queue length and inversely proportional to the number of threads concurrently processing tasks, as expressed by formula (1).

The idleness of server node j serving task scheduling i is inversely proportional to the load balancing time of the server node. $\alpha$ is the transmission attenuation coefficient, a constant related to system components such as the server switch and controller. $E[\Delta L_{i,j}(t)]$ is the mathematical expectation of the server node load change.

Definition 1: Real-time Load Probability (RLP). The workload state of the server depends on factors such as network bandwidth N, CPU computing resources C and memory size M. The RLP is calculated as shown in formula (2):

$$L_{i,j}(t) = \frac{a N_c + b C_c + c M_c}{N_r + C_r + M_r} \qquad (2)$$

wherein a, b and c respectively represent the resource weights of network bandwidth, CPU resources and memory resources in the whole service component, the subscript r denotes the overall resources of the service component, and the subscript c denotes the resources currently in use.
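As a worked illustration of the RLP definition, the sketch below evaluates the ratio of weighted used resources to total resources. The `Resources` record and the function name are hypothetical, and the example assumes the three quantities are expressed in comparable (for instance normalized) units, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource triple: N (bandwidth), C (CPU), M (memory)."""
    network: float
    cpu: float
    memory: float


def real_time_load(used: Resources, total: Resources,
                   a: float, b: float, c: float) -> float:
    """RLP = (a*N_c + b*C_c + c*M_c) / (N_r + C_r + M_r)."""
    denominator = total.network + total.cpu + total.memory
    if denominator <= 0:
        raise ValueError("total resources must be positive")
    numerator = a * used.network + b * used.cpu + c * used.memory
    return numerator / denominator


# With unit weights: (1 + 2 + 3) / (2 + 4 + 4) = 0.6
rlp = real_time_load(Resources(1, 2, 3), Resources(2, 4, 4), 1.0, 1.0, 1.0)
```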

Definition 2: Task Scheduling assignment Weight (TSW). The TSW is defined as the priority with which the server assigns task scheduling, and is directly proportional to the idleness of the current node. The smaller the TSW, the more the corresponding traffic flow is "discriminated against" in resource scheduling.

$$W_{i,j}(t) = \frac{h_{i,j}(t-1)\,\xi_i\,\bigl(L_{i,j}(t) - L_{i,j}(t-1)\bigr)}{Q_i} \qquad (3)$$

wherein $\xi_i$ is the task scheduling complexity and $h_{i,j}(t-1)$ is the load migration marking coefficient of the current node: if the node did not perform a load migration operation at the last sampling moment, the coefficient defaults to 1; if load has been migrated, the coefficient is 0.5.
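The effect of the marking coefficient can be seen in a small numeric sketch. The function name and argument names are hypothetical; the computation follows the TSW definition, with the coefficient taking the value 1 or 0.5 as described above.

```python
def task_scheduling_weight(load_now: float, load_prev: float,
                           queue_length: float, complexity: float,
                           migrated_last_sample: bool) -> float:
    """TSW = h * xi * (L(t) - L(t-1)) / Q, with h = 0.5 after a migration."""
    if queue_length <= 0:
        raise ValueError("queue length must be positive")
    # Marking coefficient: 1 by default, 0.5 if the node migrated load
    # at the previous sampling moment.
    h = 0.5 if migrated_last_sample else 1.0
    return h * complexity * (load_now - load_prev) / queue_length


unmarked = task_scheduling_weight(0.6, 0.4, 10, 2.0, migrated_last_sample=False)
marked = task_scheduling_weight(0.6, 0.4, 10, 2.0, migrated_last_sample=True)
```

All else being equal, a node that migrated load at the previous sampling moment carries half the weight of an unmarked node at the current one.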

Definition 3: Load migration index (Total Load Transform, TLT). The TLT is defined as the total amount of load migrated between server nodes and idle threads, and describes the total overhead of the system.

wherein $u_{i,j}(t)$ is the single load migration amount. Combining formulas (1) and (4) gives the server load balancing time:

In summary, thread/process scheduling on the server can be formulated as an optimally controlled load migration model: on the one hand, the load balancing time of each server node should be as small as possible; on the other hand, the rate of change of the TLT should be reduced as much as possible to lower system overhead. The loss function is constructed as follows:

wherein $E_{i,j}$ denotes the mathematical expectation. When Loss takes its minimum, its derivative with respect to $U(t)$ is zero:

Solving formula (6) gives the optimally controlled load migration amount at the t-th sampling moment:

$$U^*(t) = U(t-1) - E_{i,j}\{\alpha / 2 W_{i,j}(t) Q_i T_j\} \qquad (8)$$

2) Dynamic load migration algorithm

When load is migrated between server threads or processes, one must consider how to prevent a sudden increase in tasks from overloading the server and causing the load to be migrated again, which produces an oscillating migration phenomenon. The Load Balance(·) algorithm is therefore designed on the basis of the load migration model; it effectively constrains load migration behavior and avoids load balancing oscillation.

TABLE 2 dynamic Load migration Algorithm Load Balance (·)

The overall idea of the dynamic load migration algorithm Load Balance(·) is shown in table 2, and its flow chart in fig. 3. The algorithm takes as input the task scheduling queue length $Q_i(t)$ and the number of threads $T_j(t)$ in the server thread pool at the current time t, where $i, j = 1, 2, \ldots, N$. First, the resource nodes are added to a set $\Phi_{desc}$ in descending order of RLP, and the load balancing threshold $u_{thr}(t)$ at the current time is calculated; each resource node responds to only one load migration request at a time and can accept another request only after the current one is completed. The resource nodes able to accept requests are then determined with a greedy approach: (1) the real-time load is calculated according to formula (2), and the node with the minimum idleness is selected as the load migration object; (2) if several nodes have the same idleness, the task scheduling distribution weight TSW of each is calculated according to formula (3), and the node with the higher weight is selected as the first load migration object. The single load migration amount $u_{i,j}(t)$ is calculated from the resource state of the node; when it exceeds the load balancing threshold $u_{thr}(t)$, the resource node is removed from the set and the next resource node in order is examined to determine whether it can accept the task scheduling request. Finally, the optimally controlled total load migration amount $U^*(t)$ is calculated according to formula (8), derived from the loss function of the optimally controlled load migration model, and the resource node is marked by setting its load migration marking coefficient $h_{i,j}(t)$ to 0.5. The optimal load migration node at each subsequent sampling moment is selected iteratively in the same way, and the optimally controlled load migration amount is calculated. Because the algorithm preferentially selects the node with the lower task scheduling distribution weight TSW as the load migration object, the probability that the resource node is selected again at the next sampling time is increased, so that the overall load migration frequency is reduced as much as possible.
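The greedy selection steps can be sketched roughly as follows. This is an illustrative reading rather than the algorithm of table 2: the `SchedulerNode` fields are hypothetical, idleness, TSW and the single migration amount are taken as precomputed inputs because their formulas are not reproduced in this text, and the threshold is supplied by the caller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SchedulerNode:
    name: str
    rlp: float               # real-time load, formula (2)
    idleness: float          # assumed precomputed
    tsw: float               # task scheduling weight, formula (3)
    single_migration: float  # u_ij(t), assumed precomputed
    busy: bool = False       # already serving a migration request
    h: float = 1.0           # load migration marking coefficient


def select_migration_target(nodes: List[SchedulerNode],
                            u_thr: float) -> Optional[SchedulerNode]:
    """Pick one load migration object, or None if no node qualifies."""
    # A node answers only one migration request at a time; order by RLP.
    candidates = sorted((n for n in nodes if not n.busy),
                        key=lambda n: n.rlp, reverse=True)
    while candidates:
        # Minimum idleness first; ties are broken by the higher TSW.
        best = min(candidates, key=lambda n: (n.idleness, -n.tsw))
        if best.single_migration > u_thr:
            # Migration amount over the threshold: drop it, try the next.
            candidates.remove(best)
            continue
        best.busy = True
        best.h = 0.5  # mark the node as having migrated load
        return best
    return None


nodes = [SchedulerNode("A", 0.9, 0.2, 0.1, 5.0),
         SchedulerNode("B", 0.5, 0.2, 0.3, 2.0),
         SchedulerNode("C", 0.7, 0.4, 0.9, 1.0)]
chosen = select_migration_target(nodes, u_thr=3.0)
```

In this example A and B share the lowest idleness, B wins the tie on TSW, and its migration amount is under the threshold, so B is selected and marked.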

(3) Dynamic elastic capacity expansion technology

Capacity expansion is divided into horizontal expansion and vertical expansion. Horizontal expansion can only replicate already configured resource nodes, whereas the vertical expansion mechanism is flexible: resource nodes can be created anew according to the resource requirements of the tasks. For tasks that cannot be satisfied under scheduling by the dynamic load migration technology, the invention provides a dynamic elastic capacity expansion technology that performs vertical expansion of the server cluster and adds node resources to complete the scheduling: the suspended tasks are traversed again and their resource requirements are checked; when a node meets the requirements, the suspended task is returned to the ready state and rescheduled to the newly added node. Together with horizontal expansion, this reduces the load of the server cluster.

As shown in Fig. 4, in the dynamic elastic capacity expansion mechanism a monitor manages the expansion process and an elastic expansion module carries out the expansion itself. The specific flow is as follows:

(1) When the server cluster nodes cannot meet the resource requirement of a task, the task is placed in the suspended state, and the resource requirement Resource B is transmitted to the global resource control module by means of the heartbeat mechanism;

(2) The monitor monitors the cluster status by subscribing to heartbeats and, once every heartbeat interval, receives the information transmitted by the global resource control module, including the resource requirement Resource B of the tasks in the suspended state. It then triggers vertical expansion: the elastic expansion module creates a new resource node according to the dynamic elastic capacity expansion algorithm Auto Scale(·) and stores it in the expansion queue expansion Q. The dynamic elastic capacity expansion algorithm is shown in Table 3;

(3) After receiving the instruction to create a new node, the container orchestration tool reads the node information from the expansion queue, creates the new node and adds it to the container cluster. Once the new node has been created, the corresponding entry is removed from the expansion queue;

(4) Finally, the scheduler detects the newly added node, re-awakens the tasks in the suspended state and checks again whether the node meets the task resource requirement: if it does, the task is dispatched to the node; if it still does not, the task is returned to the suspended state.
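Step (4) of the flow above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the scheduler of the invention: the `Task` and `State` types, the resource dictionaries and the function `on_node_added` are hypothetical, and deducting the dispatched task's demand from the node's free resources is an assumption made so that several suspended tasks can be checked against the same new node.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class State(Enum):
    READY = "ready"
    SUSPENDED = "suspended"
    DISPATCHED = "dispatched"


@dataclass
class Task:
    """Hypothetical task record with its resource demand, e.g. {"cpu": 2}."""
    name: str
    demand: Dict[str, float]
    state: State = State.READY
    node: str = ""


def fits(demand: Dict[str, float], free: Dict[str, float]) -> bool:
    """True if every demanded resource is available on the node."""
    return all(free.get(key, 0.0) >= amount for key, amount in demand.items())


def on_node_added(node_name: str, free: Dict[str, float],
                  suspended: List[Task]) -> List[Task]:
    """Re-wake suspended tasks for a new node; return those still suspended."""
    still_suspended = []
    for task in suspended:
        task.state = State.READY              # re-awaken the task
        if fits(task.demand, free):
            for key, amount in task.demand.items():
                free[key] -= amount           # assumed bookkeeping of free resources
            task.state = State.DISPATCHED
            task.node = node_name
        else:
            task.state = State.SUSPENDED      # requirement still not met
            still_suspended.append(task)
    return still_suspended
```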

Table 3: Dynamic elastic capacity expansion algorithm Auto Scale(·)

In the dynamic elastic capacity expansion algorithm Auto Scale(·), a node type is set according to the resource information of each suspended task. If the expansion queue already contains that node type, the algorithm moves on to the next suspended task; otherwise the node type is added to the expansion queue. The expansion queue reduces node creation time and prevents a single suspended task from creating an excessive number of nodes.
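The deduplication logic just described can be sketched in Python as below. Table 3 is not reproduced in this text, so this is only a hedged reading of the paragraph: the representation of a node type as a sorted tuple of resource items, and the names `node_type_for` and `auto_scale`, are assumptions introduced for illustration.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Tuple

# Assumed representation: a node type is the sorted tuple of resource items.
NodeType = Tuple[Tuple[str, float], ...]


def node_type_for(resource: Dict[str, float]) -> NodeType:
    """Derive a hashable node type from a suspended task's resource information."""
    return tuple(sorted(resource.items()))


def auto_scale(suspended_resources: Iterable[Dict[str, float]],
               expansion_q: Deque[NodeType]) -> Deque[NodeType]:
    """Queue one node type per distinct resource requirement."""
    for resource in suspended_resources:
        node_type = node_type_for(resource)
        if node_type in expansion_q:
            continue                      # type already queued: next suspended task
        expansion_q.append(node_type)     # otherwise request a node of this type
    return expansion_q
```

Because membership is checked before appending, repeated heartbeats that report the same suspended task do not enqueue additional nodes while the first request is still pending.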

Simulation example:

(1) Preparation of the Experimental Environment

The dynamic task scheduling system and method provided by the invention were field-tested in a large-scale spacecraft test simulation system equipped with two types of cloud computing virtual node clusters: 1) the task scheduling central control cluster, consisting of 3 servers on which cloud computing nodes are virtualized using cloud platform building technology; it hosts the scheduling center, which receives users' massive simulation task scheduling requests; 2) the actuator processing cluster, consisting of 5 servers; it hosts the actuator center and the database for efficient task processing and data storage. The hardware deployment diagram of the system is shown in Fig. 5. Following the principle of autonomous and controllable domestic products, the basic environment is built on a Loongson rack server TL621 cluster, the Galaxy Kylin server operating system V4.0 and the China general domestic database V7.0; the detailed configuration parameters are shown in Table 4.

Table 4: Software and hardware environment parameter configuration

(2) Analysis of Experimental results

To address the scheduling requirements of various space control simulation tasks, such as spacecraft orbit maneuver deduction, stepping attitude adjustment, satellite-ground and inter-satellite communication, and space operations, the actual performance of the dynamic task scheduling system was verified experimentally. Comparative analysis was carried out in two respects, architecture response delay and load balancing performance; the experimental results and analysis are as follows.

1) Architectural response delay experiment

In this experiment, 1000 space manipulation simulation tasks, such as spacecraft orbit deduction, spacecraft on-orbit attitude calculation, and satellite-ground and inter-satellite communication period calculation, were computed simultaneously, and the response delay of the dynamic task scheduling system was compared with that of the current mainstream open-source microservice architectures at home and abroad. The experimental results are shown in Table 5. As a lightweight architecture, XXL-JOB has a relatively short overall delay, while Spring Cloud has a longer service response delay because of its complex architecture. The dynamic task scheduling system of the invention performs better than the other architectures: its average single-task response delay is 0.10 s, 0.25 s and 0.92 s shorter than that of the other three task scheduling architectures, respectively.

2) Load balance performance test

The following observation indicators were selected for calculating the real-time load RLP when testing load balancing: (1) CPU usage; (2) memory usage; (3) hard disk transfer volume; and (4) network traffic. Load balancing time is calculated by recording the observation indicators once every 0.2 s from the start of server operation and, for each task scheduling workload, taking the maximum time interval between the moments at which the load becomes stable before and after task scheduling. As shown in Fig. 6, observation indicators (1) to (4) first reach equilibrium at 6.02 s, 6.05 s, 0.12 s and 0.10 s, respectively; while processing 4000 control-object interaction data packets of 10 M each, they reach equilibrium again at 24.34 s, 24.10 s, 22.26 s and 26.15 s, respectively. The time intervals between the two equilibrium states are 18.32 s, 18.05 s, 22.14 s and 26.05 s, respectively, so the average load balancing time of the dynamic task scheduling system of the invention is 21.56 s.

Space manipulation simulation tasks such as spacecraft orbit deduction, spacecraft on-orbit attitude calculation, and satellite-ground and inter-satellite communication period calculation were processed separately. The load balancing time as the number of tasks increases is shown in Fig. 7(a): load balancing time is positively correlated with the number of scheduled tasks, and the more complex the algorithm associated with a space manipulation simulation task, the longer the load balancing time. Taking one type of space manipulation simulation task for closer analysis, the real-time load state as the number of spacecraft orbit deduction tasks increases is shown in Fig. 7(b): the CPU utilization and the disk read/write rate of the system are proportional to the number of scheduled tasks, and memory utilization rises with the number of scheduled tasks until it saturates. When 2000 task schedulings are processed, memory utilization approaches its limit; however, owing to the dynamic load migration technology based on the load migration model, the computing scheduler fetches only the amount of computing tasks that matches the current effective load, and the remaining computing tasks wait in the waiting queue, so problems such as server crashes caused by blocked computing scheduling do not occur. With subsequent upgrading and expansion of the software and hardware environment, the dynamic task scheduling system is expected to support the processing of large-scale space control simulation tasks numbering in the tens of thousands or millions.
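The queuing behavior described in this analysis, in which only as many computing tasks are fetched as the current effective load allows while the rest remain in the waiting queue, can be illustrated with a short Python sketch. The function name `dispatch_batch` and the integer `effective_capacity` parameter are hypothetical; how the effective load is converted into a task count is not specified here and is left as an input.

```python
from collections import deque
from typing import Deque, List


def dispatch_batch(waiting: Deque[int], effective_capacity: int) -> List[int]:
    """Take at most `effective_capacity` tasks from the front of the waiting queue."""
    batch: List[int] = []
    while waiting and len(batch) < effective_capacity:
        batch.append(waiting.popleft())   # first in, first out
    return batch                          # remaining tasks stay queued
```

Bounding each batch by the capacity means that a burst of requests lengthens the queue instead of overloading the server, which is the property the paragraph above attributes to the load migration model.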

It should be noted that the space manipulation simulation models are not limited to spacecraft; they also include simulation model calls for the communication links of ground application systems. The system can also be applied to any model calculation involving large-scale, highly concurrent calls.

The innovation points are as follows:

(1) Existing task scheduling systems at home and abroad suffer from complex structure, a single task type and relatively slow service registration under massive business processing. The fully asynchronous, distributed dynamic task scheduling system of the invention offers advantages such as fast task response and computation and stable high throughput.

(2) In practical applications, performance differences among the computing nodes of a server cluster lead to load imbalance in the task scheduling system, and current load balancing technologies at home and abroad are limited by the large resource overhead caused by node backup and by oscillation of the load migration period.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A dynamic task scheduling system for space manipulation simulation model calculation, characterized in that the system comprises a scheduling center, a plurality of scheduler nodes and a plurality of executor nodes; the scheduler nodes and the executor nodes are deployed on different server clusters, wherein,

2. The dynamic task scheduling system for space manipulation simulation model calculation according to claim 1, wherein the dynamic load migration technology establishes an optimally controlled load migration model using a dynamic load migration algorithm; wherein the load migration model comprises:

the state information returned by scheduler node j consists of the task scheduling queue length $Q_i$ and the number of idle threads $T_j$ in the scheduler thread pool, forming N two-tuples $\langle Q_i, T_j \rangle$, $i, j = 1, 2, \dots, N$; scheduler node load balancing time

wherein,

representing the idleness of the scheduler node j serving the scheduling of i tasks; $\alpha$ is a transmission attenuation coefficient, a constant related to the server-side switch and controller; and $E[\Delta L_{i,j}(t)]$ is the mathematical expectation of the scheduler node load variation.

3. The dynamic task scheduling system for space manipulation simulation model calculation according to claim 2, wherein establishing the optimally controlled load migration model specifically comprises:

defining the real-time load RLP index $L_{i,j}(t)$ to satisfy the following formula:

$$L_{i,j}(t) = \frac{aN_c + bC_c + cM_c}{N_r + C_r + M_r}$$

defining the task scheduling distribution weight TSW index $W_{i,j}(t)$ to satisfy the following formula:

$$W_{i,j}(t) = \frac{h_{i,j}(t-1)\,\xi_i\,\bigl(L_{i,j}(t) - L_{i,j}(t-1)\bigr)}{Q_i}$$

wherein $\xi_i$ is the task scheduling complexity and $h_{i,j}(t-1)$ is the load migration marking coefficient of the current scheduler node: if the scheduler node performed no load migration operation at the previous sampling moment, $h_{i,j}(t-1)$ defaults to 1; if load has been migrated, $h_{i,j}(t-1)$ is 0.5;

defining the load migration TLT index $U(t)$ to satisfy the following formula:

wherein $u_{i,j}(t)$ is the single load migration;

obtaining the scheduler node load balancing time as:

the optimally controlled total load migration amount that meets the scheduler node load balancing time is:

$$U^*(t) = U(t-1) - E_{i,j}\{\alpha/2W_{i,j}(t)Q_iT_j\}$$

wherein $U(t-1)$ represents the load migration TLT index at the (t-1)-th sampling moment, and $E_{i,j}$ denotes the mathematical expectation.

4. The dynamic task scheduling system for space manipulation simulation model calculation according to claim 3, wherein the dynamic load migration algorithm specifically comprises:

5. The dynamic task scheduling system for space manipulation simulation model calculation according to claim 4, wherein a greedy approach is used to traverse the scheduler nodes, iteratively select the optimal load migration node at each sampling time, and calculate the optimally controlled load migration amount; specifically comprising:

if a plurality of scheduler nodes have the same idleness, the TSW index of each scheduler node is calculated, the scheduler node with the higher weight is selected as the first load migration object, and the single load migration $u_{i,j}(t)$ is calculated according to the resource state of the scheduler node; when the single load migration exceeds the load balancing threshold $u_{i,j}(t)$, the scheduler node is removed from the set, and the next scheduler node is selected in turn to recalculate whether the task scheduling request can be accepted;

calculating the optimally controlled total load migration amount $U^*(t)$ and marking the scheduler node, i.e. setting the load migration marking coefficient $h_{i,j}(t)$ to 0.5.

6. The dynamic task scheduling system for space manipulation simulation model calculation according to claim 1, further comprising: a global resource control module, a monitor, an elastic expansion module and a container orchestration tool; wherein,

7. The dynamic task scheduling system for space manipulation simulation model calculation according to claim 6, wherein the processing procedure of the elastic expansion module specifically comprises:

setting the actuator node type according to the resource information of each suspended task; if the expansion queue expansion Q already contains that actuator node type, continuing iteratively to the next suspended task; otherwise, adding the actuator node type to the expansion queue expansion Q.

8. The dynamic task scheduling system for space manipulation simulation model calculation according to claim 7, wherein the processing of the scheduler node further comprises: