CN113835896B - Iterative computation-oriented parallelism dynamic adjustment method in Gaia system - Google Patents


Info

Publication number
CN113835896B
CN113835896B (application CN202111149214.1A)
Authority
CN
China
Prior art keywords
parallelism
heartbeat
iteration
jobmaster
taskexecutor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111149214.1A
Other languages
Chinese (zh)
Other versions
CN113835896A (en)
Inventor
季航旭
韦刘国
赵宇海
王国仁
吴刚
李博扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202111149214.1A priority Critical patent/CN113835896B/en
Publication of CN113835896A publication Critical patent/CN113835896A/en
Application granted granted Critical
Publication of CN113835896B publication Critical patent/CN113835896B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5077Logical partitioning of resources; Management or configuration of virtualized resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/547Remote procedure calls [RPC]; Web services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/4557Distribution of virtual machine instances; Migration and load balancing

Abstract

The invention provides a method for dynamically adjusting parallelism for iterative computation in the Gaia system, and relates to the technical fields of distributed big data computing systems and iterative computation. The method does not predict the resources required by a job in advance; instead, it dynamically adjusts the iteration resources while the job is executing. If Slot resources are insufficient or their occupancy is too high during job execution, the Slot resources are expanded according to a resource utilization target preset by the user, so that the iteration resources required by the job are met. If Slot resources are wasted during job execution, the Slot resources are contracted according to the preset resource utilization target, so that the number of Slots occupied by the iterative job is reduced accordingly. This dynamic parallelism adjustment method requires neither executing similar jobs beforehand, nor running special short instances of the job, nor spending extra prediction time.

Description

Iterative computation-oriented parallelism dynamic adjustment method in Gaia system
Technical Field
The invention relates to the technical field of distributed big data computing systems and iterative computation, and in particular to a method for dynamically adjusting parallelism for iterative computation in the Gaia system.
Background
Gaia is a new-generation, highly timely and extensible big data computing system in which multiple computing models coexist. It provides full-cycle, multi-scale optimization and a unified computing engine for mixed batch-stream tasks. Existing big data computing systems either simulate the behavior of another type of framework on top of their own computing engine, or define a set of general interfaces to hide the differences between the underlying computing engines, so their support for batch-stream fusion is weak. Moreover, most of them optimize only at a specific moment or level of execution and are not tuned for highly complex tasks. To address these problems, a high-performance batch-stream-fusion big data computing engine based on a unified computing engine and full-cycle, multi-scale optimization was developed. The engine provides unified expression logic for batch-stream fusion processing; by unifying the computation model, data model, transformation model and action model of batch and stream processing, it achieves a true fusion of batch and stream processing. For job characteristics such as diversity, persistence and iteration, it provides targeted optimization strategies for multi-job, multi-task, iterative and persistent computation. It also provides full-cycle optimization before and during execution, subdivided into several scales such as the job level, task level and transformation level, so as to achieve very fast response and massive throughput.
Iterative computation is one of the most common computation models in data processing and is widely used in fields such as big data machine learning and large-scale graph computation. For example, the web page ranking algorithm PageRank determines page weights by iteratively computing over the massive hyperlink relationships of the Internet; in the field of community discovery, community detection algorithms partition different communities through continuous iteration; in machine learning, clustering algorithms such as K-Means and DBSCAN are typical algorithms that approach an optimal solution through iterative computation. As a computation model, an iterative computation first takes an initial value, applies a given algorithm or formula to obtain an intermediate result, repeatedly feeds the intermediate result back in as the input of the next step, and returns the final result once a given condition is satisfied.
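As a concrete illustration of this pattern (a minimal sketch, not part of the patent; class and method names are assumptions), the loop below takes an initial value, applies a step function to the previous intermediate result, and stops once a given condition is met:

import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public final class IterativeLoop {

    /** Repeatedly applies stepFunction until converged tests true or maxIterations is reached. */
    public static <T> T iterate(T initialValue,
                                UnaryOperator<T> stepFunction,
                                Predicate<T> converged,
                                int maxIterations) {
        T state = initialValue;
        for (int i = 0; i < maxIterations && !converged.test(state); i++) {
            state = stepFunction.apply(state); // same logic every step, different input data
        }
        return state;
    }

    public static void main(String[] args) {
        // Toy usage: approximate sqrt(2) by Newton iteration.
        double result = iterate(1.0,
                x -> (x + 2.0 / x) / 2.0,
                x -> Math.abs(x * x - 2.0) < 1e-9,
                100);
        System.out.println(result);
    }
}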
Because the number of iterations is usually large, a job containing iterative computation is time-consuming. Especially with large data volumes, the job takes even longer, and the iterative part accounts for most of the execution time of the whole job. Therefore, to obtain results faster, jobs containing iterative computation are mostly run on big data distributed computing systems (such as Hadoop, Spark and Gaia), where adding physical nodes yields higher execution efficiency. This also drives researchers to study iterative optimization techniques in distributed computing systems in order to keep reducing the running time of iterative computation.
A characteristic of iterative computation is that every iteration step runs the same computation logic on different input data, so the load each iteration step places on the distributed cluster differs. The basic unit of computing resources in Gaia is the task Slot (Slot for short) inside a task executor (TaskExecutor). The number of Slots represents the parallel processing capability of the Gaia cluster, and the resource manager in Gaia is mainly responsible for allocating and managing Slot resources. Slot resources in the existing Gaia system cannot be adjusted dynamically, which causes two problems when executing iterative tasks. First, if the configured parallelism is far from sufficient for the iterative task, the high load of the iterative task puts excessive pressure on every Slot and on the CPU, which slows program processing or causes conditions such as memory overflow, so the job runs inefficiently or even fails. Second, if the configured parallelism is too high, system resource utilization becomes very low, which wastes and idles resources; in large-scale data processing this leaves resources idle for long periods.
In summary, a dynamically adjustable Slot resource allocation mechanism oriented to iterative computation is needed, one that performs the corresponding resource optimization by combining system resource statistics with the iterative characteristics of the algorithm.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a method for dynamically adjusting parallelism for iterative computation in the Gaia system, so that the various iterative algorithms running in the Gaia system execute more efficiently.
To solve the above technical problems, the invention adopts the following technical scheme: a method for dynamically adjusting parallelism for iterative computation in the Gaia system, comprising the following steps:
presetting a target resource utilization rate according to the usage of the distributed cluster;
collecting statistical data for parallelism adjustment;
dynamically adjusting parallelism resources while the iterative job runs, and calculating the parallelism;
saving the iteration state;
the method specifically comprises the following steps:
Step 1: through a parallelism adjustment interface, the user presets a target resource utilization rate according to the usage of the distributed cluster;
Step 2: collecting heartbeat information based on a heartbeat mechanism; while the heartbeat monitor TaskExecutor of the Gaia system reports heartbeats to the job manager JobMaster, reporting of non-connection-state statistics is added to the Payload information carried by the periodic heartbeats; the non-connection-state statistics include: 1) resource utilization information of the currently running job containing iterative computation: the amount used and the total amount of resources such as the number of CPU cores in use, the CPU usage percentage, the memory occupancy percentage, the disk occupancy percentage and the network bandwidth of the distributed cluster; 2) characteristics of the data set input when the Gaia system executes the job: including the size of the data set, the number of elements, and the distribution of data key values;
the heartbeat mechanism collects the statistical data used for adjusting the parallelism;
the heartbeat reporting process from the TaskExecutor to the JobMaster comprises an initialization stage, a registration stage and a heartbeat stage;
The initialization stage: on startup, the JobMaster calls the createHeartBeatManagerSender() method of the HeartBeatServices class; this method creates a taskExecutorHeartBeatManager object that is responsible for heartbeat management of all TaskExecutors connected to it, periodically starts a timer, periodically scans the managed objects and then sends heartbeat requests to them; symmetrically, the TaskExecutor creates a jobMasterHeartBeatManager object to manage the heartbeat information of the JobMaster;
The registration stage: after the ResourceManager allocates a TaskExecutor to the corresponding JobMaster, the allocated TaskExecutor actively registers its information with the JobMaster by calling the registerTaskExecutor() method of the JobMaster through a remote procedure call (RPC); upon receiving the remote RPC call, the JobMaster first executes a local method to accept the registration of the TaskExecutor, then adds the TaskExecutor to its monitoring targets through the monitorTarget() method of the TaskExecutorHeartBeatManager; finally, the monitored object is wrapped as a HeartBeatMonitor, and a heartbeat timer with a timeout is started; after the TaskExecutor is registered, the JobMaster sends a registration-success message to the TaskExecutor, and the TaskExecutor side monitors the heartbeat of the JobMaster in the same manner;
The heartbeat stage: heartbeat detection between the JobMaster and the TaskExecutor is bidirectional; the TaskExecutor collects load information and reports it to the JobMaster through the heartbeat mechanism; first, the JobMaster periodically makes a remote call to the HeartBeatFromTaskExecutor() method of the TaskExecutor through RPC; after receiving the RPC request, the TaskExecutor calls the reportHeartBeat() method of the corresponding HeartBeatMonitor class, then actively calls the reportPayload() method and sends the collected load information to the JobMaster;
Step 3: dynamically adjusting the parallelism based on the iteration-oriented parallelism mechanism, taking the target resource utilization rate preset by the user as the goal;
Based on the iterative characteristics of data-flow jobs, parallelism resources are adjusted by the parallelism mechanism while the iterative job runs; computing-system resources are dynamically adjusted by means of parallelism according to the system statistics collected during the iteration; the parallelism of the next iteration is calculated according to formula 1:
where n is the number of statistical-information categories; P_i is the parallelism of the current iteration of the iterative job and P_{i+1} is the target parallelism of the next iteration; targetCPUAvg and targetMemoryAvg are the CPU resource utilization and memory resource utilization preset by the user, respectively;
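The formula itself appears as an image in the original publication and is not reproduced in this text. From the surrounding variable definitions, one plausible reconstruction (an assumption, not the verbatim formula of the filing) is:

P_{i+1} = \left\lceil \frac{P_i}{n}\left(\frac{\mathrm{cpuAvg}}{\mathrm{targetCPUAvg}} + \frac{\mathrm{memoryAvg}}{\mathrm{targetMemoryAvg}}\right)\right\rceil    (Formula 1, reconstructed)

where cpuAvg and memoryAvg denote the average CPU and memory utilization of the Slots measured from the heartbeat statistics of the current iteration: when the measured utilization exceeds the target, the ratio is greater than 1 and the parallelism grows; when it falls below the target, the ratio is less than 1 and the parallelism shrinks.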
The iteration-oriented parallelism mechanism comprises a parallelism capacity-reduction mechanism and a parallelism capacity-expansion mechanism. The capacity-reduction mechanism: if, in some rounds of the iterative computation, the resource utilization of each Slot in the Gaia system is lower than the target value, the iteration parallelism is reduced in the next iteration, which raises the average Slot resource utilization until it reaches the target value preset by the user. The capacity-expansion mechanism: if, in some rounds of the iterative computation, the utilization of each Slot is higher than the target value, the iteration parallelism is increased in the next iteration, which increases the resources available to the iterative job and lowers the average Slot utilization until the preset resource utilization target is reached;
Step 4: iteration state saving: using the characteristics of Gaia iterative computation, the iteration state is saved at the iteration synchronization barrier.
The beneficial effects of the above technical scheme are as follows: the method for dynamically adjusting parallelism for iterative computation in the Gaia system does not predict the resources required by a job in advance, but dynamically adjusts the iteration resources during job execution. If Slot resources are insufficient or their occupancy is too high during execution, the mechanism expands the Slot resources according to the resource utilization target preset by the user, so that the iteration resources required by the job are met, execution efficiency is improved, execution time is reduced, and job failures caused by memory overflow and the like are avoided. If Slot resources are wasted during execution (each Slot is used too little), the method contracts the Slot resources according to the preset resource utilization target, so that the number of Slots occupied by the iterative job decreases accordingly; the remaining Slot resources in the Gaia cluster increase and can be used by other programs. Compared with predicting load from frequently executed jobs, or learning and predicting the resources a job needs from short job instances, this dynamic parallelism adjustment method requires neither executing similar jobs beforehand, nor running special short job instances, nor spending extra prediction time.
The method exploits the fact that iterative computation executes the same logic repeatedly on different input data, and either reduces the time cost of the iterative computation or releases more iteration resources at a very small time cost. When iteration resources are insufficient, the iteration resource expansion mechanism saves more running time as the data set grows. When the utilization of iteration resources is too low, the iteration resource reduction mechanism frees iteration resources at a very small time cost.
Drawings
Fig. 1 is a flowchart of the method for dynamically adjusting parallelism for iterative computation in the Gaia system according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the iteration-oriented parallelism capacity-reduction mechanism according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the iteration-oriented parallelism capacity-expansion mechanism according to an embodiment of the present invention;
Fig. 4 is a diagram of the execution process of the iterative step function in the Gaia system according to an embodiment of the present invention;
Fig. 5 is a diagram of the specific process by which the heartbeat information collection component gathers Gaia system resource information according to an embodiment of the present invention.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
In this embodiment, the method for dynamically adjusting parallelism for iterative computation in the Gaia system, as shown in Fig. 1, specifically includes the following steps:
Step 1: through a parallelism adjustment interface, the user presets a target resource utilization rate according to the usage of the distributed cluster;
in this embodiment, a heartbeat data acquisition module is first added to the Gaia system, and the module is responsible for collecting statistical information data for adjusting parallelism. And simultaneously, a parallelism adjustment interface is provided for a user, through the parallelism adjustment interface, the user can preset the target resource utilization rate according to the service condition of the distributed cluster, and the Gaia system can automatically adjust the parallelism according to the data of the heartbeat data acquisition module in the next iteration execution. The implementation of the iterative resource dynamic parallelism mechanism in the embodiment meets the requirements of a design mode in the design and specific implementation, and particularly pays attention to the reliability and the expandability of the system code.
Step 2: collecting heartbeat information based on a heartbeat mechanism; while the heartbeat monitor TaskExecutor of the Gaia system reports heartbeats to the job manager JobMaster, reporting of non-connection-state statistics is added to the Payload information carried by the periodic heartbeats; the non-connection-state statistics include: 1) resource utilization information of the currently running job containing iterative computation: the amount used and the total amount of resources such as the number of CPU cores (threads) in use, the CPU usage percentage, the memory occupancy percentage, the disk occupancy percentage and the network bandwidth of the distributed cluster; 2) characteristics of the data set input when the Gaia system executes the job: including the size of the data set, the number of elements, and the distribution of data key values;
the newly added heartbeat mechanism is a mechanism for determining whether another party in the network is alive by periodically sending a request. In the newly added heartbeat mechanism, three components of a resource manager (resource manager), a JobMaster and a TaskExecutor perform heartbeat detection and heartbeat reporting. For example, the TaskExecutor may report its own status to the JobMaster heartbeat, so that the JobMaster may determine whether the TaskExecutor survives, and the JobMaster may report its own status to the TaskExecutor heartbeat, so that the TaskExecutor may determine whether the JobMaster survives, and further determine whether the task executed thereon enters a failure state. The dynamic resource parallelism mechanism is realized based on a heartbeat mechanism in a newly added Gaia system, and when the TaskExecutor responds to the self survival information of the JobMaster in a heartbeat stage, the acquisition of Slot load information is realized, so that the utilization condition of Slot resources can be perceived in the execution process of a distributed operation iteration step. The heartbeat detection between JobMaster and TaskExecutor in the newly added heartbeat mechanism mainly comprises three stages: an initialization phase, a registration phase and a heartbeat phase.
The initialization stage: on startup, the JobMaster calls the createHeartBeatManagerSender() method of the HeartBeatServices class; this method creates a taskExecutorHeartBeatManager object that is responsible for heartbeat management of all TaskExecutors connected to it, periodically starts a timer, periodically scans the managed objects and then sends heartbeat requests to them; the initialization of heartbeat detection on the TaskExecutor is similar to the JobMaster initialization stage: the TaskExecutor creates a jobMasterHeartBeatManager object to manage the heartbeat information of the JobMaster;
The registration stage: after the ResourceManager allocates a TaskExecutor to the corresponding JobMaster, the allocated TaskExecutor actively registers its information with the JobMaster by calling the registerTaskExecutor() method of the JobMaster through a remote procedure call (RPC); upon receiving the remote RPC call, the JobMaster first executes a local method to accept the registration of the TaskExecutor, then adds the TaskExecutor to its monitoring targets through the monitorTarget() method of the TaskExecutorHeartBeatManager; finally, the monitored object is wrapped as a HeartBeatMonitor, and a heartbeat timer with a timeout is started; after the TaskExecutor is registered, the JobMaster sends a registration-success message to the TaskExecutor, and the TaskExecutor side monitors the heartbeat of the JobMaster in the same manner;
The heartbeat stage: heartbeat detection between the JobMaster and the TaskExecutor is bidirectional; the TaskExecutor collects load information and reports it to the JobMaster through the heartbeat mechanism; first, the JobMaster periodically makes a remote call to the HeartBeatFromTaskExecutor() method of the TaskExecutor through RPC; after receiving the RPC request, the TaskExecutor calls the reportHeartBeat() method of the corresponding HeartBeatMonitor class, then actively calls the reportPayload() method and sends the collected load information, such as CPU and memory usage, to the JobMaster;
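The following is a hypothetical sketch of the non-connection-state statistics that such a heartbeat Payload could carry; all class and field names are assumptions for illustration, not the actual Gaia classes:

import java.io.Serializable;

public final class SlotLoadPayload implements Serializable {
    // Resource utilization reported by the TaskExecutor for its Slots
    public int    cpuCoresUsed;         // number of CPU cores (threads) in use
    public double cpuUsagePercent;      // CPU usage percentage
    public double memoryUsagePercent;   // memory occupancy percentage
    public double diskUsagePercent;     // disk occupancy percentage
    public double networkBandwidthMbps; // network bandwidth currently in use

    // Characteristics of the input data set of the running iterative job
    public long   dataSetSizeBytes;     // size of the data set
    public long   elementCount;         // number of elements
    public double keySkewFactor;        // summary of the data key-value distribution
}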
Step 3: dynamically adjusting the parallelism based on the iteration-oriented parallelism mechanism, taking the target resource utilization rate preset by the user as the goal;
Based on the iterative characteristics of data-flow jobs, parallelism resources are adjusted by the parallelism capacity-reduction and capacity-expansion mechanisms while the iterative job runs; computing-system resources are dynamically adjusted by means of parallelism according to the system statistics collected during the iteration, so that the job executes economically and efficiently; the parallelism of the next iteration is calculated according to formula 1:
where n is the number of statistical-information categories; in this formula n is 2, because the statistical categories are the average CPU utilization and the average memory utilization; P_i is the parallelism of the current iteration of the iterative job and P_{i+1} is the target parallelism of the next iteration; targetCPUAvg and targetMemoryAvg are the CPU resource utilization and memory resource utilization preset by the user, respectively; from the above, the parallelism that the next iterative computation must be set to in order to reach the resource utilization target preset by the user can be calculated;
The iteration-oriented parallelism mechanism comprises a parallelism capacity-reduction mechanism and a parallelism capacity-expansion mechanism. The capacity-reduction mechanism, shown in Fig. 2, works as follows: if, in some rounds of the iterative computation, the resource utilization of each Slot in the Gaia system is lower than the target value, the iteration parallelism is reduced in the next iteration, which raises the average Slot resource utilization until it reaches the target value preset by the user; reducing the parallelism prevents the resource waste from continuing, releases the Slot resources occupied by some tasks so that other waiting jobs can execute, and thus improves the overall benefit of the system. The capacity-expansion mechanism, shown in Fig. 3, works as follows: if, in some rounds of the iterative computation, the utilization of each Slot is higher than the target value, the excessive resource utilization may cause memory overflow or insufficient caches and hence network congestion; therefore the iteration parallelism is increased in the next iteration, which increases the resources available to the iterative job and lowers the average Slot resource utilization until the preset target value is reached, avoiding inefficient execution of the iterative job;
In this embodiment, the DataCollection component of the JobMaster is responsible for collecting the load information of all TaskExecutors associated with it, then merging the load information of each TaskExecutor and computing the average resource utilization of each index on each node. The DataCollection then sends the heartbeat statistics to the Client of the Gaia system, on which the user has preset the corresponding target resource utilization. After receiving the heartbeat statistics sent by the JobMaster, the Client calculates the parallelism of the next iteration according to formula (1). If the parallelism of the next iteration differs from that of the current one, the Gaia system adjusts the parallelism accordingly and continues the iterative computation; if it does not change, the iterative job continues to execute with the original parallelism.
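A minimal sketch of this Client-side calculation, assuming the reconstructed reading of formula 1 given above (class and method names are illustrative, not the actual Gaia implementation):

public final class NextParallelismCalculator {

    /**
     * @param currentParallelism P_i, parallelism of the current iteration
     * @param cpuAvg             measured average CPU utilization of the Slots (0..1)
     * @param memoryAvg          measured average memory utilization of the Slots (0..1)
     * @param targetCpuAvg       user-preset target CPU utilization (0..1)
     * @param targetMemoryAvg    user-preset target memory utilization (0..1)
     * @return P_{i+1}, target parallelism of the next iteration
     */
    public static int nextParallelism(int currentParallelism,
                                      double cpuAvg, double memoryAvg,
                                      double targetCpuAvg, double targetMemoryAvg) {
        final int n = 2; // two statistical categories: CPU and memory
        double scale = (cpuAvg / targetCpuAvg + memoryAvg / targetMemoryAvg) / n;
        // scale > 1: Slots are above the target, so the parallelism is expanded;
        // scale < 1: Slots are below the target, so the parallelism is reduced.
        return Math.max(1, (int) Math.ceil(currentParallelism * scale));
    }
}

For example, with targetCpuAvg = 0.70, targetMemoryAvg = 0.75, measured averages of 0.90 and 0.80, and a current parallelism of 4, this sketch yields ceil(4 x (0.9/0.7 + 0.8/0.75) / 2) = 5, i.e. the parallelism is expanded by one Slot for the next iteration.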
Step 4: iteration state saving: using the characteristics of Gaia iterative computation, the iteration state is saved at the iteration synchronization barrier, because no intermediate state needs to be handled there.
When the parallelism is adjusted dynamically during iterative computation, the iteration state must be saved; otherwise the iterative task cannot be restored correctly after the adjustment. The iterative computation model in the Gaia system keeps invoking a step function until the iteration operator receives the iteration-termination signal; the step function is embedded in the iteration operator, which is how the iteration runs. The execution of the iterative step function in the Gaia system is shown in Fig. 4. A superstep is one complete execution of an iteration and is divided into three phases: a local computation phase, a message passing phase and a synchronization barrier (Barrier Synchronization) phase. The synchronization barrier means that all parallel tasks of an iteration must finish before the next iteration superstep begins; each synchronization barrier marks the end of the previous superstep and the start of the next superstep of this distributed iterative computation. The termination condition of a job's iteration process in Gaia is generally defined in two ways: the iteration reaches a preset maximum number of iterations, or the iteration satisfies a preset convergence criterion; the whole iteration process terminates as soon as either condition is met (a sketch of this check follows).
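A minimal sketch of the two termination conditions (names are assumptions, not the actual Gaia code):

public final class IterationTerminationCheck {

    /** Returns true when either the maximum superstep count or the convergence criterion is reached. */
    public static boolean shouldTerminate(int completedSupersteps, int maxIterations,
                                          double delta, double convergenceThreshold) {
        boolean maxReached = completedSupersteps >= maxIterations;
        boolean converged  = delta < convergenceThreshold;
        return maxReached || converged;
    }
}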
Iteration state saving mainly exploits the characteristics of Gaia iterative computation to save the iteration state at the iteration synchronization barrier. Using the iteration barrier has two main advantages. 1) More detailed statistics: at the iteration barrier, detailed statistics reflecting all elements processed in the previous iteration can be obtained. Compared with collecting statistics at each task stage for the next stage, this approach suits both the execution model in which batch tasks run stage by stage and the execution model in which streaming tasks run as parallel task pipelines. 2) No intermediate task state: at the iteration barrier, all tasks of the previous iteration have finished and the tasks of the next iteration have not yet started. Therefore, at the iteration barrier the data flow can be adjusted without handling intermediate task state, which makes the implementation of dynamic parallelism scaling simpler.
During iterative computation in the Gaia system, the iteration head task (IterationHeadTask), the iteration intermediate task (IterationIntermediateTask) and the iteration tail task (IterationTailTask) correspond to different execution stages of the Gaia iteration process. The iteration head task reads the initial input and establishes a blocking back-channel (feedback channel) with the iteration tail task; the iteration intermediate task is responsible for updating the iteration state of the WorkSet and the SolutionSet; after the iteration state is updated, the iteration tail task transmits the output of the local task to the iteration head through the feedback channel, which marks the completion of one round of iteration logic. The IterationSynchronizationSinkTask implements the iteration barrier; it is used only to coordinate and synchronize all iteration head tasks and does not take part in any data processing. When the IterationSynchronizationSinkTask receives the AllWorkersDoneEvent, one superstep has completed; at this moment the iteration state information in the feedback channel is saved by calling the saveIterationState() method, for use when the job is restored after the iteration parallelism is adjusted.
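A hypothetical sketch of saving the iteration state when the barrier observes that all head tasks have finished a superstep; the event and method names follow the description above, but the class shape is an assumption rather than the actual Gaia source:

public final class IterationBarrierStateHook {

    private final FeedbackChannelSnapshotter snapshotter;

    public IterationBarrierStateHook(FeedbackChannelSnapshotter snapshotter) {
        this.snapshotter = snapshotter;
    }

    /** Called when every iteration head task has reported completion of the current superstep. */
    public void onAllWorkersDone() {
        // At the barrier, no task of the next superstep has started yet, so the
        // feedback-channel contents fully describe the iteration state to be saved.
        snapshotter.saveIterationState();
    }

    /** Assumed abstraction over the blocking back-channel between tail and head tasks. */
    public interface FeedbackChannelSnapshotter {
        void saveIterationState();
    }
}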
In this embodiment, based on the parallelism adjustment method of the invention, at the Gaia system level the job manager JobMaster aggregates the various kinds of heartbeat information from the Slot resource-utilization data of the heartbeat monitor TaskExecutor through the heartbeat data acquisition (DataCollection) component. The process by which the heartbeat information collection component gathers system resource information is shown in Fig. 5: the Gaia system starts a thread to handle the heartbeat-timeout event, and the thread runs once the configured heartbeat timeout has elapsed; whenever statistical data for adjusting parallelism is received, the thread is first cancelled and then restarted, resetting the heartbeat-timeout trigger; the final statistics are passed to the Client of the Gaia system, which calculates the parallelism of the next iteration through the parallelism supervisor (ParallelismRegulator) component, then adjusts the parallelism and continues with the next iteration.
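A minimal sketch of this cancel-and-restart timeout handling using a ScheduledExecutorService; the class and method names are assumptions, not the actual Gaia implementation:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public final class HeartbeatTimeoutMonitor {

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final long timeoutMillis;
    private final Runnable onTimeout;
    private ScheduledFuture<?> pendingTimeout;

    public HeartbeatTimeoutMonitor(long timeoutMillis, Runnable onTimeout) {
        this.timeoutMillis = timeoutMillis;
        this.onTimeout = onTimeout;
        arm(); // schedule the initial timeout trigger
    }

    /** Called whenever statistical data for parallelism adjustment is received. */
    public synchronized void onStatisticsReceived() {
        if (pendingTimeout != null) {
            pendingTimeout.cancel(false); // cancel the pending timeout trigger ...
        }
        arm();                            // ... and reset it for the next interval
    }

    private void arm() {
        pendingTimeout = scheduler.schedule(onTimeout, timeoutMillis, TimeUnit.MILLISECONDS);
    }
}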
Finally, it should be noted that the above embodiments only illustrate the technical scheme of the invention and do not limit it; although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical schemes described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical schemes, which are defined by the scope of the appended claims.

Claims (5)

1. A method for dynamically adjusting parallelism for iterative computation in a Gaia system, characterized in that the method specifically comprises the following steps:
Step 1: through a parallelism adjustment interface, the user presets a target resource utilization rate according to the usage of the distributed cluster;
Step 2: collecting heartbeat information based on a heartbeat mechanism; while the heartbeat monitor TaskExecutor of the Gaia system reports heartbeats to the job manager JobMaster, reporting of non-connection-state statistics is added to the Payload information carried by the periodic heartbeats;
the heartbeat reporting process from the TaskExecutor to the JobMaster comprises an initialization stage, a registration stage and a heartbeat stage;
The initialization stage: on startup, the JobMaster calls the createHeartBeatManagerSender() method of the HeartBeatServices class; this method creates a taskExecutorHeartBeatManager object that is responsible for heartbeat management of all TaskExecutors connected to it, periodically starts a timer, periodically scans the managed objects and then sends heartbeat requests to them; symmetrically, the TaskExecutor creates a jobMasterHeartBeatManager object to manage the heartbeat information of the JobMaster;
The registration stage: after the ResourceManager allocates a TaskExecutor to the corresponding JobMaster, the allocated TaskExecutor actively registers its information with the JobMaster by calling the registerTaskExecutor() method of the JobMaster through a remote procedure call (RPC); upon receiving the remote RPC call, the JobMaster first executes a local method to accept the registration of the TaskExecutor, then adds the TaskExecutor to its monitoring targets through the monitorTarget() method of the TaskExecutorHeartBeatManager; finally, the monitored object is wrapped as a HeartBeatMonitor, and a heartbeat timer with a timeout is started; after the TaskExecutor is registered, the JobMaster sends a registration-success message to the TaskExecutor, and the TaskExecutor side monitors the heartbeat of the JobMaster in the same manner;
The heartbeat stage: heartbeat detection between the JobMaster and the TaskExecutor is bidirectional; the TaskExecutor collects load information and reports it to the JobMaster through the heartbeat mechanism; first, the JobMaster periodically makes a remote call to the HeartBeatFromTaskExecutor() method of the TaskExecutor through RPC; after receiving the RPC request, the TaskExecutor calls the reportHeartBeat() method of the corresponding HeartBeatMonitor class, then actively calls the reportPayload() method and sends the collected load information to the JobMaster;
Step 3: dynamically adjusting the parallelism based on the iteration-oriented parallelism mechanism, taking the target resource utilization rate preset by the user as the goal;
Based on the iterative characteristics of data-flow jobs, parallelism resources are adjusted by the parallelism mechanism while the iterative job runs; computing-system resources are dynamically adjusted by means of parallelism according to the system statistics collected during the iteration;
Step 4: iteration state saving: using the characteristics of Gaia iterative computation, the iteration state is saved at the iteration synchronization barrier.
2. The method for dynamically adjusting parallelism for iterative computation in a Gaia system according to claim 1, characterized in that the heartbeat mechanism collects the statistical data used for adjusting the parallelism.
3. The method for dynamically adjusting parallelism for iterative computation in a Gaia system according to claim 2, characterized in that the non-connection-state statistics in step 2 include: 1) resource utilization information of the currently running job containing iterative computation: the amount used and the total amount of resources such as the number of CPU cores (threads) in use, the CPU usage percentage, the memory occupancy percentage, the disk occupancy percentage and the network bandwidth of the distributed cluster; 2) characteristics of the data set input when the Gaia system executes the job: including the size of the data set, the number of elements, and the distribution of data key values.
4. The method for dynamically adjusting parallelism for iterative computation in a Gaia system according to claim 3, characterized in that the iteration-oriented parallelism mechanism comprises a parallelism capacity-reduction mechanism and a parallelism capacity-expansion mechanism; the capacity-reduction mechanism: if, in some rounds of the iterative computation, the resource utilization of each Slot in the Gaia system is lower than the target value, the iteration parallelism is reduced in the next iteration, which raises the average Slot resource utilization until it reaches the target value preset by the user; the capacity-expansion mechanism: if, in some rounds of the iterative computation, the utilization of each Slot is higher than the target value, the iteration parallelism is increased in the next iteration, which increases the resources available to the iterative job and lowers the average Slot utilization until the preset resource utilization target is reached.
5. The method for dynamically adjusting parallelism for iterative computation in a Gaia system according to claim 4, characterized in that in step 3, the parallelism of the next iteration is calculated according to formula 1:
where n is the number of statistical-information categories; P_i is the parallelism of the current iteration of the iterative job and P_{i+1} is the target parallelism of the next iteration; targetCPUAvg and targetMemoryAvg are the CPU resource utilization and memory resource utilization preset by the user, respectively.
CN202111149214.1A 2021-09-29 2021-09-29 Iterative computation-oriented parallelism dynamic adjustment method in Gaia system Active CN113835896B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111149214.1A CN113835896B (en) 2021-09-29 2021-09-29 Iterative computation-oriented parallelism dynamic adjustment method in Gaia system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111149214.1A CN113835896B (en) 2021-09-29 2021-09-29 Iterative computation-oriented parallelism dynamic adjustment method in Gaia system

Publications (2)

Publication Number Publication Date
CN113835896A CN113835896A (en) 2021-12-24
CN113835896B true CN113835896B (en) 2024-03-22

Family

ID=78967473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111149214.1A Active CN113835896B (en) 2021-09-29 2021-09-29 Iterative computation-oriented parallelism dynamic adjustment method in Gaia system

Country Status (1)

Country Link
CN (1) CN113835896B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104991830A (en) * 2015-07-10 2015-10-21 山东大学 YARN resource allocation and energy-saving scheduling method and system based on service level agreement
CN107621978A (en) * 2017-09-29 2018-01-23 郑州云海信息技术有限公司 A kind of High Availabitity task processing Controlling model under parallel computation environment
CN110891083A (en) * 2019-11-05 2020-03-17 北京理工大学 Agent method for supporting multi-job parallel execution in Gaia
CN110908796A (en) * 2019-11-04 2020-03-24 北京理工大学 Multi-operation merging and optimizing system and method in Gaia system
CN112764893A (en) * 2019-11-04 2021-05-07 华为技术有限公司 Data processing method and data processing system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012166106A1 (en) * 2011-05-31 2012-12-06 Hewlett-Packard Development Company, L.P. Estimating a performance parameter of a job having map and reduce tasks after a failure
US11176129B2 (en) * 2018-09-30 2021-11-16 Microsoft Technology Licensing, Llc Methods for automatic selection of degrees of parallelism for efficient execution of queries in a database system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104991830A (en) * 2015-07-10 2015-10-21 山东大学 YARN resource allocation and energy-saving scheduling method and system based on service level agreement
CN107621978A (en) * 2017-09-29 2018-01-23 郑州云海信息技术有限公司 A kind of High Availabitity task processing Controlling model under parallel computation environment
CN110908796A (en) * 2019-11-04 2020-03-24 北京理工大学 Multi-operation merging and optimizing system and method in Gaia system
CN112764893A (en) * 2019-11-04 2021-05-07 华为技术有限公司 Data processing method and data processing system
CN110891083A (en) * 2019-11-05 2020-03-17 北京理工大学 Agent method for supporting multi-job parallel execution in Gaia

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Continuously Improving the Resource Utilization of Iterative Parallel Dataflows";Lauritz Thamsen等;《2016 IEEE 36th International Conference on Distributed Computing Systems Workshops (ICDCSW)》;全文 *
基于OpenCL的流式应用程序在MPSoC上的动态并行度伸缩调度;黄姗;石晶林;萧放;;高技术通讯(第12期);全文 *

Also Published As

Publication number Publication date
CN113835896A (en) 2021-12-24

Similar Documents

Publication Publication Date Title
CN107734035B (en) Virtual cluster automatic scaling method in cloud computing environment
US7720841B2 (en) Model-based self-optimizing distributed information management
CN108920153B (en) Docker container dynamic scheduling method based on load prediction
CN110764912A (en) Self-adaptive task scheduler and method
US20080141261A1 (en) Resource Management System, Resource Information Providing Method and Program
CN108132837B (en) Distributed cluster scheduling system and method
CN113535409B (en) Server-free computing resource distribution system oriented to energy consumption optimization
CN105808334A (en) MapReduce short job optimization system and method based on resource reuse
JP2007207225A (en) Method, system and program for decentralized application placement for web application middleware
US20120297216A1 (en) Dynamically selecting active polling or timed waits
CN108737566B (en) Distributed real-time message filtering system
CN112052092A (en) Risk-aware edge computing task allocation method
Wang et al. A cluster autoscaler based on multiple node types in kubernetes
CN114679451B (en) Service dispatching system and dispatching method for edge computing
CN113835896B (en) Iterative computation-oriented parallelism dynamic adjustment method in Gaia system
CN116643844B (en) Intelligent management system and method for automatic expansion of power super-computing cloud resources
CN110928659B (en) Numerical value pool system remote multi-platform access method with self-adaptive function
CN111984393A (en) Distributed large-scale real-time data scheduling engine system and data scheduling method thereof
CN116776980A (en) Prediction reasoning system and method
CN116578408A (en) Operation resource scheduling method for supporting intelligent manufacturing software
CN115858499A (en) Database partition processing method and device, computer equipment and storage medium
CN115185683A (en) Cloud platform stream processing resource allocation method based on dynamic optimization model
CN112052087B (en) Deep learning training system and method for dynamic resource adjustment and migration
CN114237858A (en) Task scheduling method and system based on multi-cluster network
CN114661431A (en) Task scheduling method, storage medium and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant