CN117193989B - Data centralized scheduling method of partitioned data center and related equipment thereof - Google Patents

Publication number: CN117193989B
Application number: CN202311464685.0A
Authority: CN (China)
Prior art keywords: task, data, node, tasks, processing
Legal status: Active (The legal status is an assumption and is not a legal conclusion.)
Other languages: Chinese (zh)
Other versions: CN117193989A (en)
Inventors: 谭长华, 车科谋, 彭韧辉, 赵振东
Current Assignee: Guangdong Cloud Base Technology Co ltd (The listed assignees may be inaccurate.)
Original Assignee: Guangdong Cloud Base Technology Co ltd
Priority: CN202311464685.0A
Related publications: CN117193989A (application), CN117193989B (granted)

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a data centralized scheduling method for a partitioned data center and related equipment thereof. Tasks in task prediction data are sampled at equal intervals to obtain a plurality of center samples. Each task is divided into the task transition node corresponding to its minimum node judgment value, and this step is repeated for all tasks. The central task of each task transition node is then determined; if the data matching degree between the central task and the center sample of a task transition node exceeds a preset matching threshold, the task transition node is taken as a task node, and the data of the partitioned data center is scheduled in a centralized manner according to the task nodes. The task processing characteristic values of each task node at different time points are determined. When a task processing characteristic value is lower than a preset task processing threshold interval, the corresponding task node is tidally dismissed and its tasks and the data corresponding to those tasks are scheduled to other task nodes, so that the data processing capacity of the task nodes can be improved.

Description

Data centralized scheduling method of partitioned data center and related equipment thereof
Technical Field
The present application relates to the technical field of partitioned data centers, and in particular, to a data centralized scheduling method and related devices for a partitioned data center.
Background
A partitioned data center is a distributed computing and storage architecture used for large-scale data processing and storage. Its core idea is to divide data and computing tasks into multiple partitions that can be stored and processed at different physical locations or on different servers. The key elements of partitioned data center technology include data partitioning, distributed storage, task scheduling, and parallel processing. The technology is widely applied in fields such as big data analysis, cloud computing, and scientific computing.
Data centralized scheduling refers to the process of managing and scheduling the access, processing, and distribution of data once the data has been centralized. The process includes batch and real-time processing of data, differentiation of tasks, and data scheduling, and it ensures that centralized data can be operated on efficiently and in an orderly manner. In a partitioned data center, data centralized scheduling manages and schedules the data in a distributed data storage system, where it coordinates, manages, and optimizes data access, processing, and distribution. When the data volume in a partitioned data center reaches a certain level, data management becomes complex, which reduces the data processing capacity of the partitioned data center.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method for centralized scheduling of data in a partitioned data center and related apparatus thereof, which are used to improve the data processing capability of the partitioned data center.
In order to solve the technical problems, the application adopts the following technical scheme:
In a first aspect, the present application provides a method for centralized scheduling of data in a partitioned data center, including the following steps:
acquiring historical task data of a partition data center, converting the historical task data into task prediction data, and sampling tasks in the task prediction data at equal intervals to obtain a plurality of center samples, wherein each center sample corresponds to one task transition node;
respectively carrying out node judgment value calculation on the tasks in the task prediction data and each center sample to obtain a plurality of node judgment values of the tasks, dividing the tasks into task transition nodes corresponding to the minimum node judgment values of the tasks, repeating the steps, and dividing all the tasks in the task prediction data into corresponding task transition nodes;
determining a central task of each task transition node, if the data matching degree of the central task of each task transition node and a central sample of each task transition node exceeds a preset matching threshold, taking the task transition node as a task node, otherwise, re-dividing the tasks in the task prediction data according to each central task until the data matching degree exceeds the preset matching threshold, and carrying out centralized scheduling on the data of the partitioned data center according to each task node;
selecting a task node, determining the task processing fatigue factor of the task node, determining the task processing characteristic values of the task node at different time points according to the task processing fatigue factor, and repeating these steps for the remaining task nodes to obtain the task processing characteristic values of the remaining task nodes at different time points;
and when a task processing characteristic value is lower than a preset task processing threshold interval, tidally dismissing the task node corresponding to the task processing characteristic value, and scheduling the current task of the task node and the data corresponding to the task to other task nodes for processing.
In some embodiments, converting the historical task data into task prediction data specifically includes:
dividing the historical task data according to the same time period to obtain a plurality of time data sets;
and carrying out average calculation on each time data group to obtain task prediction data.
In some embodiments, the node judgment value calculation is performed on the task in the task prediction data and each center sample, and the specific steps of obtaining a plurality of node judgment values of the task are as follows:
determining the data of each center sample;
determining the data of each task in the task prediction data;
determining the processing time period of each center sample;
determining the processing time period of each task in the task prediction data;
determining, according to the data of each center sample, the data of each task, the processing time period of each center sample, and the processing time period of each task, the node judgment value of each task in the task prediction data with respect to each center sample, wherein the node judgment value is determined by a formula (the formula and its symbols are not reproduced in this text) that combines these four quantities with two adjustment coefficients.
In some embodiments, determining the central task for each task transition node specifically includes:
determining data with the most use times in each task transition node and a time period with the most task processing;
and determining a central task judgment value of each task transition node according to the data with the maximum use times and the time period with the maximum task processing, and further determining the central task of each task transition node.
In some embodiments, determining the task processing fatigue factor for the task node specifically includes:
Determining the task quantity of the task node;
and determining the task processing fatigue level of the task node according to the number of the task quantities, and further determining the task processing fatigue factor of the task node.
In some embodiments, further comprising:
when the task processing characteristic value exceeds a preset task processing threshold value interval, permanently dismissing task nodes corresponding to the task processing characteristic value, and simultaneously scheduling all tasks in the task nodes and data corresponding to the tasks to other task nodes for processing;
and when the task processing characteristic value is within the preset task processing threshold interval, sending a signal to maintain normal operation.
In some embodiments, a task node that has been tidally dismissed is restarted on the next working day.
In a second aspect, the present application provides a data centralized scheduling system for a partitioned data center, including:
the central sample acquisition module is used for acquiring historical task data of the partition data center, converting the historical task data into task prediction data, and sampling tasks in the task prediction data at equal intervals to obtain a plurality of central samples, wherein each central sample corresponds to one task transition node;
The task dividing module is used for respectively calculating the node judgment values of the tasks in the task prediction data and the central samples to obtain a plurality of node judgment values of the tasks, dividing the tasks into task transition nodes corresponding to the minimum node judgment values of the tasks, repeating the steps, and dividing all the tasks in the task prediction data into the corresponding task transition nodes;
the task node determining module is used for determining a central task of each task transition node, if the data matching degree of the central task of the task transition node and a central sample of the task transition node exceeds a preset matching threshold, taking the task transition node as a task node, otherwise, dividing the tasks in the task prediction data again according to each central task until the data matching degree exceeds the preset matching threshold, and carrying out centralized scheduling on the data of the partitioned data center according to each task node;
the task processing characteristic value determining module is used for selecting one task node, determining task processing fatigue factors of the task node, determining task processing characteristic values of the task node at different time points according to the task processing fatigue factors, and repeating the steps for the rest task nodes to continuously obtain the task processing characteristic values of the rest task nodes at different time points;
and the task node control module is used for tidally dismissing the task node corresponding to a task processing characteristic value when the task processing characteristic value is lower than a preset task processing threshold interval, and scheduling the current task of the task node and the data corresponding to the task to other task nodes for processing.
In a third aspect, the present application provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the data centralized scheduling method of the partitioned data center described above when executing the computer program.
In a fourth aspect, the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the data centralized scheduling method of the partitioned data center described above.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
In the data centralized scheduling method of the partitioned data center and the related equipment thereof, historical task data of the partitioned data center is first acquired and converted into task prediction data, and the tasks in the task prediction data are sampled at equal intervals to obtain a plurality of center samples, each of which corresponds to one task transition node. A node judgment value is then calculated between each task in the task prediction data and each center sample, and each task is divided into the task transition node corresponding to its minimum node judgment value, until all tasks in the task prediction data have been divided into corresponding task transition nodes. Next, the central task of each task transition node is determined. If the data matching degree between the central task and the center sample of a task transition node exceeds a preset matching threshold, the task transition node is taken as a task node; otherwise, the tasks in the task prediction data are re-divided according to the central tasks until the data matching degree exceeds the preset matching threshold. The data of the partitioned data center is then scheduled in a centralized manner according to the task nodes. For each task node, a task processing fatigue factor is determined, and the task processing characteristic values of the task node at different time points are determined according to the task processing fatigue factor. When a task processing characteristic value is lower than a preset task processing threshold interval, the corresponding task node is tidally dismissed, and the current task of the task node and the data corresponding to the task are scheduled to other task nodes for processing. In this way, similar tasks in the partitioned data center and their data can be flexibly combined into task nodes, and the task processing characteristic values of the task nodes can be monitored in real time, thereby controlling the tidal reorganization of the task nodes.
Drawings
FIG. 1 is a flow chart of a method for centralized scheduling of data in a partitioned data center in some embodiments of the present application;
FIG. 2 is a block diagram of a data centralized scheduling system of a partitioned data center in some embodiments of the present application;
FIG. 3 is an internal block diagram of a computer device in some embodiments of the present application.
Detailed Description
The method proceeds as follows. Historical task data of the partitioned data center is acquired and converted into task prediction data, and the tasks in the task prediction data are sampled at equal intervals to obtain a plurality of center samples, each corresponding to one task transition node. A node judgment value is calculated between each task and each center sample, and every task is divided into the task transition node corresponding to its minimum node judgment value. The central task of each task transition node is determined; a task transition node whose central task matches its center sample with a data matching degree above a preset matching threshold is taken as a task node, and otherwise the tasks are re-divided according to the central tasks until the data matching degree exceeds the preset matching threshold. The data of the partitioned data center is scheduled in a centralized manner according to the task nodes. For each task node, a task processing fatigue factor is determined and used to determine the task processing characteristic values of the task node at different time points. When a task processing characteristic value is lower than a preset task processing threshold interval, the corresponding task node is tidally dismissed, and its current task and the data corresponding to the task are scheduled to other task nodes for processing. Similar tasks in the partitioned data center and their data can thus be flexibly combined into task nodes, and the task processing characteristic values of the task nodes can be monitored in real time, thereby controlling the tidal reorganization of the task nodes.
In order to better understand the above technical solutions, the following detailed description will refer to the accompanying drawings and specific embodiments. Referring to FIG. 1, which is an exemplary flowchart of a method of centralized scheduling of a partitioned data center, a method 100 of centralized scheduling of a partitioned data center, according to some embodiments of the present application, generally includes the steps of:
in step 101, historical task data of a partition data center is obtained, the historical task data are converted into task prediction data, tasks in the task prediction data are sampled at equal intervals, a plurality of center samples are obtained, and each center sample corresponds to a task transition node.
In a specific implementation, historical task data for the same month of the previous year is obtained from the operation database of the partitioned data center. In some embodiments, the conversion of the historical task data into the task prediction data can be realized by the following steps:
dividing the historical task data according to the same time period to obtain a plurality of time data sets;
and carrying out average calculation on each time data group to obtain task prediction data.
In this application, task prediction data is data derived from historical task data and used to predict the tasks, the processing data of the tasks, and the processing time of the tasks at a specified future time point. In a specific implementation, the prediction may be based on the historical tasks, their processing data, and their processing time. For example, the task data in the historical task data is divided by time period across different days: historical task data falling in the same time period on different days forms one time data group, and traversing the different time periods of a day yields a plurality of time data groups. For instance, if two groups of task data in the historical task data cover 12:38 to 12:40 on March 16, 2018 and 12:38 to 12:40 on March 17, 2018 respectively, both groups are assigned to the time data group for 12:38 to 12:40. The task data in each time data group is then averaged, and the averaged task data are combined into the task prediction data. For instance, the time data group for 12:38 contains all task data at 12:38 on the different days of March 2018; this group is averaged, and the averaged result is used as the task prediction data for 12:38.
It should be noted that, the historical task data in the present application includes all tasks processed by the partition data center, processing data of each task, and processing time of each task; the processing data of each task refers to data required by processing the corresponding task, and the processing time of each task refers to time required by processing the corresponding task; the averaging of the time data sets may be performed using prior art averaging analysis methods, and in other embodiments, may be performed using other methods, without limitation.
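As an illustration only, the grouping and averaging described above can be sketched as follows. The record layout (timestamp, data amount, processing time), the function name build_task_prediction, and the use of a plain arithmetic mean are assumptions made for this sketch; the application leaves the averaging method open.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def build_task_prediction(records):
    """Group historical records by time of day and average each group.

    records: iterable of (timestamp, data_amount, processing_time) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    groups = defaultdict(list)
    for ts, data_amount, processing_time in records:
        # Use only hour and minute as the key, so that the same time
        # period on different days lands in the same time data group.
        groups[(ts.hour, ts.minute)].append((data_amount, processing_time))
    # Average each time data group (assumption: plain arithmetic mean).
    return {
        slot: (mean(d for d, _ in rows), mean(p for _, p in rows))
        for slot, rows in groups.items()
    }


history = [
    (datetime(2018, 3, 16, 12, 38), 100.0, 4.0),
    (datetime(2018, 3, 17, 12, 38), 140.0, 6.0),
    (datetime(2018, 3, 16, 12, 39), 80.0, 2.0),
]
prediction = build_task_prediction(history)
print(prediction[(12, 38)])
```

Here the two 12:38 records from March 16 and March 17 fall into one time data group, and their average becomes the prediction entry for 12:38.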
In a specific implementation, the tasks in the task prediction data are sampled at equal intervals. For example, 00:30 is taken as the initial sampling point and the tasks in the task prediction data are sampled once every hour; the sampled tasks are taken as center samples, and traversing the 24 hours of a day yields 24 center samples. Each center sample corresponds to one task transition node.
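The equal-interval sampling in this example can be sketched as below. The function name sample_center_slots and its parameters are hypothetical; the sketch only computes the time slots at which center samples would be taken, not the task contents.

```python
def sample_center_slots(start_hour=0, start_minute=30, step_minutes=60, count=24):
    """Return the (hour, minute) slots at which center samples are taken.

    Defaults follow the example in the text: start at 00:30 and sample
    once every hour, giving 24 slots for one day.
    """
    start = start_hour * 60 + start_minute
    slots = []
    for k in range(count):
        # Wrap around midnight so every slot stays within one day.
        total = (start + k * step_minutes) % (24 * 60)
        slots.append(divmod(total, 60))
    return slots


centers = sample_center_slots()
print(centers[0], centers[-1], len(centers))
```

Each returned slot would then be used to look up the task in the task prediction data that serves as the center sample of one task transition node.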
In step 102, node judgment values are calculated between each task in the task prediction data and each center sample to obtain a plurality of node judgment values for the task; the task is divided into the task transition node corresponding to its minimum node judgment value; and these steps are repeated until all tasks in the task prediction data are divided into corresponding task transition nodes.
In some embodiments, calculating the node judgment values between a task in the task prediction data and each center sample may be implemented by the following steps:
determining the data of each center sample;
determining the data of each task in the task prediction data;
determining the processing time period of each center sample;
determining the processing time period of each task in the task prediction data;
determining, according to the data of each center sample, the data of each task, the processing time period of each center sample, and the processing time period of each task, the node judgment value of each task in the task prediction data with respect to each center sample, wherein the node judgment value is determined by a formula (the formula and its symbols are not reproduced in this text) that combines these four quantities with two adjustment coefficients.
It should be noted that, the adjustment coefficients in the present application may be set according to the weight, and the sum of the adjustment coefficients is equal to 1.
In a specific implementation, the node judgment values of a task are compared, and the task is divided into the task transition node corresponding to its minimum node judgment value. These steps are repeated for each task in the task prediction data, so that every task is divided into the task transition node corresponding to its own minimum node judgment value.
The task transition node in the present application includes a center sample, tasks divided into the center sample, data corresponding to each task, and processing time corresponding to each task.
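The division step can be illustrated with a short sketch. Because the formula for the node judgment value is not reproduced in this text, the function node_judgment_value below is a stand-in assumed for illustration: a weighted sum of the absolute differences in data and in processing time period, with two adjustment coefficients that sum to 1. Representing a task as a (data, period) pair of numbers is likewise an assumption.

```python
def node_judgment_value(task, center, alpha=0.5, beta=0.5):
    """Hypothetical node judgment value between a task and a center sample.

    ASSUMPTION: weighted absolute differences; the formula used in the
    application is not available here. alpha + beta is kept equal to 1.
    """
    task_data, task_period = task
    center_data, center_period = center
    return alpha * abs(task_data - center_data) + beta * abs(task_period - center_period)


def assign_tasks(tasks, centers, alpha=0.5, beta=0.5):
    """Put each task into the transition node with the minimum judgment value."""
    nodes = {i: [] for i in range(len(centers))}
    for task in tasks:
        values = [node_judgment_value(task, c, alpha, beta) for c in centers]
        nodes[values.index(min(values))].append(task)
    return nodes


centers = [(10.0, 1.0), (50.0, 5.0)]
tasks = [(12.0, 1.5), (48.0, 4.0), (9.0, 0.5)]
nodes = assign_tasks(tasks, centers)
print(nodes)
```

In this toy input the first and third tasks are closest to the first center sample and the second task is closest to the second, so they are divided into the corresponding task transition nodes.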
In step 103, the central task of each task transition node is determined. If the data matching degree between the central task of a task transition node and the center sample of that task transition node exceeds a preset matching threshold, the task transition node is taken as a task node; otherwise, the tasks in the task prediction data are re-divided according to the central tasks until the data matching degree exceeds the preset matching threshold. The data of the partitioned data center is then scheduled in a centralized manner according to the task nodes.
In some embodiments, the determination of the central task for each task transition node may be accomplished using the steps of:
determining data with the most use times in each task transition node and a time period with the most task processing;
and determining a central task judgment value of each task transition node according to the data with the maximum use times and the time period with the maximum task processing, and further determining the central task of each task transition node.
In the concrete implementation, for each task transition node, a task corresponding to the minimum central task judgment value in the task transition node is used as the central task of the task transition node, so that the central task of each task transition node is obtained.
In some embodiments, the central task determination value of each task transition node determined according to the data with the most usage times and the time period with the most task processing can be determined by the following formula:
wherein the central task judgment value of a task transition node is determined by a formula (the formula and its symbols are not reproduced in this text) that combines the most frequently used data in the task transition node, the data of each task in the task transition node, the time period in which the most tasks are processed in the task transition node, the time period of each task in the task transition node, and two adjustment coefficients.
It should be noted that the adjustment coefficients in the present application may be set according to the weight, and the sum of the adjustment coefficients is equal to 1.
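A minimal sketch of selecting the central task is given below. It assumes that each task is a (data, period) pair of numbers, that the most frequently used data and the busiest time period are taken as the modes of those two fields, and that the central task judgment value is a weighted absolute difference from those modes. All of these are assumptions for illustration, since the formula is not reproduced in this text.

```python
from collections import Counter


def central_task(node_tasks, alpha=0.5, beta=0.5):
    """Pick the task with the minimum (hypothetical) central task judgment value."""
    # Most frequently used data and the time period with the most tasks.
    most_used_data = Counter(d for d, _ in node_tasks).most_common(1)[0][0]
    busiest_period = Counter(p for _, p in node_tasks).most_common(1)[0][0]

    def judgment(task):
        # ASSUMPTION: weighted absolute differences, with alpha + beta = 1.
        data, period = task
        return alpha * abs(data - most_used_data) + beta * abs(period - busiest_period)

    return min(node_tasks, key=judgment)


node_tasks = [(10, 12), (10, 13), (11, 12), (30, 20)]
center = central_task(node_tasks)
print(center)
```

With this input the most used data is 10 and the busiest period is 12, so the task (10, 12) has a judgment value of 0 and is chosen as the central task.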
In a specific implementation, the data of the central task of a task transition node and the data of the center sample of that task transition node are obtained, and the data matching degree between them is calculated. If the data matching degree exceeds the preset matching threshold, the task transition node is taken as a task node. Otherwise, step 102 is repeated with the central task as the new center sample: the node judgment values are recalculated, the task transition nodes are re-divided, and the central tasks are re-determined, until the data matching degree between the re-determined central task and the new center sample exceeds the preset matching threshold. The re-determined task transition node is then taken as a task node, and the data of the partitioned data center is scheduled in a centralized manner according to the task nodes.
It should be noted that, in the present application, the data matching degree is a measure of the degree of overlap between two sets of data. The larger the data matching degree, the smaller the difference between the data of the central task and the data of the center sample; conversely, the smaller the data matching degree, the larger the difference. In a specific implementation, the data matching degree may be determined using, for example, an existing similarity measurement technique or a machine classification technique, which is not described in detail here. In addition, the matching threshold may be preset according to historical experimental data, or obtained by other methods in other embodiments, which is not limited here.
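One existing similarity measure that fits the description of an overlap between two sets of data is the Jaccard index. The sketch below uses it as an example only; the application does not name a specific measure, and the threshold value 0.6 and the data item names are hypothetical.

```python
def data_matching_degree(central_task_data, center_sample_data):
    """Jaccard overlap between two collections of data items (one possible measure)."""
    a, b = set(central_task_data), set(center_sample_data)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


MATCH_THRESHOLD = 0.6  # hypothetical preset matching threshold

degree = data_matching_degree(
    {"orders", "users", "logs"},
    {"orders", "users", "metrics"},
)
# The transition node becomes a task node only if the degree exceeds the threshold.
is_task_node = degree > MATCH_THRESHOLD
print(degree, is_task_node)
```

In this example two of four distinct items are shared, so the matching degree stays below the threshold and the tasks would be re-divided with the central task as the new center sample.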
In step 104, a task node is selected, task processing fatigue factors of the task node are determined, task processing characteristic values of the task node at different time points are determined according to the task processing fatigue factors, and for the remaining task nodes, the steps are repeated to continuously obtain the task processing characteristic values of the remaining task nodes at different time points.
It should be noted that, in the present application, the task processing fatigue factor indicates the degree of load on a task node when it processes its tasks: the larger the task processing fatigue factor, the heavier the load on the task node, and conversely, the lighter the load. When a task node remains under high load, the data of the task node will be damaged, which is not described in detail here.
In some embodiments, determining the task processing fatigue factor of the task node may be implemented by the following steps:
determining the task quantity of the task node;
and determining the task processing fatigue level of the task node according to the number of the task quantities, and further determining the task processing fatigue factor of the task node.
In a specific implementation, the task processing fatigue level of the task node is set according to the obtained task quantity of the task node, and the task processing fatigue factor of the task node is determined according to the set level. For example, if the task quantity of the task node is 4, the task processing fatigue level of the task node is set to level one and the task processing fatigue factor of the task node is set to 0.2.
It should be noted that, in some embodiments, the task processing fatigue levels are, for example: a task quantity of 0-5 is level one, 6-10 is level two, 11-15 is level three, 16-20 is level four, and 21-25 is level five; the fatigue factors corresponding to levels one to five are 0.2, 0.4, 0.6, 0.8, and 1 respectively.
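The example levels above can be written as a small lookup. The function names are hypothetical, and raising an error for task quantities outside 0-25 is an assumption, since the text does not say how larger quantities are handled.

```python
# Fatigue factors for levels one to five, as listed in the example above.
FATIGUE_FACTORS = {1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8, 5: 1.0}


def fatigue_level(task_count):
    """Map a task quantity to the example fatigue levels (0-5 -> 1, ..., 21-25 -> 5)."""
    if not 0 <= task_count <= 25:
        # ASSUMPTION: quantities outside the listed ranges are rejected.
        raise ValueError("task quantity outside the 0-25 range of the example levels")
    if task_count <= 5:
        return 1
    return (task_count - 1) // 5 + 1


def fatigue_factor(task_count):
    """Return the task processing fatigue factor for a task quantity."""
    return FATIGUE_FACTORS[fatigue_level(task_count)]


print(fatigue_level(4), fatigue_factor(4))
```

A task quantity of 4 gives level one and a factor of 0.2, matching the example in the text.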
In some embodiments, the task processing characteristic value of each task node at different time points can be implemented by the following steps:
acquiring the task processing fatigue factor of each task node at a given time;
determining the data amount of each task in the task node at that time;
acquiring the total data amount of the task node;
acquiring the task amount of the task node at that time;
acquiring the data of the central task of the task node;
acquiring the data of each task in the task node at that time;
acquiring the processing time period of the central task of the task node;
acquiring the processing time period of each task in the task node at that time;
determining, according to the task processing fatigue factor, the data amount of each task, the total data amount of the task node, the task amount, the data of the central task, the data of each task, the processing time period of the central task, and the processing time period of each task, the real-time task processing characteristic value of the task node, wherein the task processing characteristic value is determined by a formula (the formula and its symbols are not reproduced in this text) that combines these quantities with adjustment coefficients.
In a specific implementation, the size in megabytes of the data of each task in each task node is obtained at different time points and used as the data amount of the corresponding task at those time points; the sum of the sizes in megabytes of the data of all tasks in a task node is used as the total data amount of that task node.
It should be noted that the adjustment coefficients in the present application may be set according to weight, and the adjustment coefficients sum to 1.
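The two bookkeeping rules above can be sketched as follows. This is only an illustration under stated assumptions: data sizes are taken to be megabyte figures keyed by a task identifier, and the function names `data_amounts` and `check_adjustment_coefficients` are hypothetical.

```python
import math


def data_amounts(task_sizes_mb: dict[str, float]) -> tuple[dict[str, float], float]:
    """Return (per-task data amounts, total data amount of the node).

    The per-task amount is the task's data size in megabytes; the node total
    is the sum over all tasks in the node.
    """
    total = sum(task_sizes_mb.values())
    return dict(task_sizes_mb), total


def check_adjustment_coefficients(*coefficients: float) -> None:
    """Raise ValueError unless the adjustment coefficients sum to 1."""
    if not math.isclose(sum(coefficients), 1.0, abs_tol=1e-9):
        raise ValueError("adjustment coefficients must sum to 1")
```

Validating the coefficients once at configuration time keeps a mis-weighted setting from silently skewing every characteristic value computed afterwards.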
In step 105, when the task processing characteristic value is below the preset task processing threshold interval, the task node corresponding to that task processing characteristic value is tidally dismissed, and the current tasks of the task node and the data corresponding to those tasks are scheduled to other task nodes for processing.
In some embodiments, when the task processing characteristic value exceeds the preset task processing threshold interval, the task node corresponding to that task processing characteristic value is permanently dismissed, and all tasks in the task node and the data corresponding to those tasks are scheduled to other task nodes for processing; when the task processing characteristic value is within the preset task processing threshold interval, a signal to maintain normal operation is sent.
When the task processing characteristic value is below the preset task processing threshold interval and has remained below it for longer than a set threshold time, the task node corresponding to that task processing characteristic value is tidally dismissed, and the current tasks of the task node and the data corresponding to those tasks are scheduled, for processing, to the task node corresponding to the second-smallest node judgment value of each task. When the task processing characteristic value exceeds the preset task processing threshold interval and has remained above it for longer than the set threshold time, the task node corresponding to that task processing characteristic value is permanently dismissed, and all tasks in the task node and the data corresponding to those tasks are scheduled, for processing, to the task node corresponding to the second-smallest node judgment value of each task. When the task processing characteristic value is within the preset task processing threshold interval, a signal to maintain normal operation is sent.
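The three-way decision and the choice of the target node can be sketched as below. This is a hedged illustration, not the patented control logic: the names `Action`, `ThresholdInterval`, `decide` and `second_smallest_node` are hypothetical, time is measured in seconds by assumption, and the `WAIT` outcome is an assumption for the case the text leaves open (the value is outside the interval but has not yet stayed there longer than the set threshold time).

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    TIDAL_DISMISS = "tidal_dismiss"
    PERMANENT_DISMISS = "permanent_dismiss"
    MAINTAIN = "maintain"
    WAIT = "wait"  # assumption: outside the interval, but not yet for long enough


@dataclass(frozen=True)
class ThresholdInterval:
    low: float
    high: float


def decide(value: float, seconds_outside: float,
           interval: ThresholdInterval, threshold_seconds: float) -> Action:
    """Map a task processing characteristic value to a control action."""
    if interval.low <= value <= interval.high:
        return Action.MAINTAIN
    if seconds_outside <= threshold_seconds:
        return Action.WAIT
    # Below the interval: tidal dismissal; above it: permanent dismissal.
    return Action.TIDAL_DISMISS if value < interval.low else Action.PERMANENT_DISMISS


def second_smallest_node(judgment_values: dict[str, float]) -> str:
    """Return the node id with the second-smallest node judgment value for a task."""
    if len(judgment_values) < 2:
        raise ValueError("need at least two candidate nodes")
    ranked = sorted(judgment_values, key=judgment_values.get)
    return ranked[1]
```

Because each task was originally placed on the node with its smallest node judgment value, the second-smallest value identifies the next-best node once the original one is dismissed.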
It should be noted that the preset task processing threshold interval and the set threshold time in the present application may be obtained from historical experimental data; in other embodiments, they may be obtained by other methods, which are not limited here.
It should be noted that tidal dismissal in the present application means that the task node still exists and only the tasks in the data center are, for the time being, divided among other data centers, which reduces the energy consumption of the task node while improving the working efficiency of the other task nodes; permanent dismissal means that the task node no longer exists, and the tasks and data belonging to the task node are permanently divided among other task nodes.
In some embodiments, a task node that has been tidally dismissed may be restarted on the next workday.
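One way to compute the restart date is sketched below. The text does not define "workday"; this sketch assumes Monday to Friday and ignores public holidays, and the function name `next_workday` is hypothetical.

```python
from datetime import date, timedelta


def next_workday(day: date) -> date:
    """Return the first Monday-to-Friday date strictly after `day`."""
    candidate = day + timedelta(days=1)
    while candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate
```

A deployment with its own holiday calendar would replace the weekday test with a lookup against that calendar.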
In addition, in another aspect, some embodiments of the present application provide a data centralized scheduling system of a partitioned data center. Referring to fig. 2, which is a schematic diagram of exemplary hardware and/or software of the data centralized scheduling system of a partitioned data center according to some embodiments of the present application, the data centralized scheduling system 200 of the partitioned data center includes a central sample acquisition module 201, a task dividing module 202, a task node determining module 203, a task processing characteristic value determining module 204 and a task node control module 205, which are described as follows:
The central sample acquisition module 201 is mainly used for acquiring historical task data of a partition data center, converting the historical task data into task prediction data, and performing equidistant sampling on tasks in the task prediction data to obtain a plurality of central samples, wherein each central sample corresponds to a task transition node;
The task dividing module 202 is mainly configured to calculate node judgment values between the tasks in the task prediction data and each center sample to obtain a plurality of node judgment values of the tasks, divide each task into the task transition node corresponding to the minimum node judgment value of that task, and repeat these steps until all tasks in the task prediction data are divided into corresponding task transition nodes;
The task node determining module 203 is configured to determine the central task of each task transition node; if the data matching degree between the central task of a task transition node and the center sample of that task transition node exceeds a preset matching threshold, the task transition node is taken as a task node; otherwise, the tasks in the task prediction data are divided again according to each central task until the data matching degree exceeds the preset matching threshold; the data of the partitioned data center is then scheduled in a centralized manner according to each task node;
The task processing characteristic value determining module 204 is mainly used for selecting a task node, determining the task processing fatigue factor of the task node, determining the task processing characteristic values of the task node at different time points according to the task processing fatigue factor, and repeating these steps for the remaining task nodes to obtain the task processing characteristic values of the remaining task nodes at different time points;
The task node control module 205 is mainly configured to, when the task processing characteristic value is below the preset task processing threshold interval, tidally dismiss the task node corresponding to that task processing characteristic value and schedule the current tasks of the task node and the data corresponding to those tasks to other task nodes for processing.
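The work of modules 201 and 202 can be sketched as follows. This is a minimal illustration under loud assumptions: a task is reduced to two numbers standing in for its data and its processing time period, and `judgment_value` is a hypothetical weighted sum of absolute differences used only as a placeholder, since the actual node judgment formula is not reproduced in this text. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    data: float    # numeric stand-in for the task's data (assumption)
    period: float  # numeric stand-in for the task's processing time period (assumption)


def equidistant_samples(tasks: list[Task], count: int) -> list[Task]:
    """Pick `count` tasks at a fixed stride through the list as center samples."""
    if not 1 <= count <= len(tasks):
        raise ValueError("count must be between 1 and the number of tasks")
    stride = len(tasks) // count
    return [tasks[i * stride] for i in range(count)]


def judgment_value(task: Task, center: Task, a: float = 0.5, b: float = 0.5) -> float:
    """Placeholder node judgment value; NOT the formula of the patent."""
    return a * abs(task.data - center.data) + b * abs(task.period - center.period)


def divide(tasks: list[Task], centers: list[Task]) -> list[list[Task]]:
    """Put every task into the transition node with its minimum judgment value."""
    nodes: list[list[Task]] = [[] for _ in centers]
    for task in tasks:
        best = min(range(len(centers)), key=lambda k: judgment_value(task, centers[k]))
        nodes[best].append(task)
    return nodes
```

The two coefficients `a` and `b` play the role of the adjustment coefficients and default to values that sum to 1.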
The modules in the data centralized scheduling system of the partitioned data center may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
In addition, in one embodiment, the present application provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and the computer programs stored in the non-volatile storage medium. The database of the computer device is used to store the data of the data centralized scheduling of the partitioned data center. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a data centralized scheduling method of a partitioned data center.
It will be appreciated by those skilled in the art that the structure shown in fig. 3 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, there is also provided a computer device including a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the above embodiments of the data centralized scheduling method of a partitioned data center.
In one embodiment, a computer readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the above embodiments of the data centralized scheduling method of a partitioned data center.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium and executes them, causing the computer device to perform the steps of the above embodiments of the data centralized scheduling method of a partitioned data center.
Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the program, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
In summary, in the data centralized scheduling method of the partitioned data center and the related equipment thereof disclosed in the embodiments of the present application, historical task data of the partitioned data center is first obtained and converted into task prediction data, and the tasks in the task prediction data are sampled at equal intervals to obtain a plurality of center samples, each center sample corresponding to one task transition node. Node judgment values are calculated between the tasks in the task prediction data and each center sample to obtain a plurality of node judgment values of the tasks, each task is divided into the task transition node corresponding to its minimum node judgment value, and these steps are repeated until all tasks in the task prediction data are divided into corresponding task transition nodes. The central task of each task transition node is then determined; if the data matching degree between the central task of a task transition node and the center sample of that task transition node exceeds a preset matching threshold, the task transition node is taken as a task node; otherwise, the tasks in the task prediction data are divided again according to each central task until the data matching degree exceeds the preset matching threshold, and the data of the partitioned data center is scheduled in a centralized manner according to each task node. A task node is selected, its task processing fatigue factor is determined, and its task processing characteristic values at different time points are determined according to the task processing fatigue factor; these steps are repeated for the remaining task nodes. When a task processing characteristic value is below the preset task processing threshold interval, the corresponding task node is tidally dismissed, and the current tasks of the task node and the data corresponding to those tasks are scheduled to other task nodes for processing. In this way, similar tasks in the partitioned data center and the data of those tasks can be flexibly selected and combined into task nodes, and the task processing characteristic values of the task nodes are monitored in real time so as to control the tidal recombination of the task nodes.
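The re-division loop in this summary (divide, determine central tasks, compare with the center samples, repeat until the match is good enough) can be sketched generically. The sketch deliberately takes the division rule, the central task rule and the matching degree as caller-supplied functions, because their formulas are not reproduced in this text; the name `refine`, its parameters and the `max_rounds` safeguard are all hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def refine(
    tasks: Sequence[T],
    centers: list[T],
    divide: Callable[[Sequence[T], list[T]], list[list[T]]],
    central_task: Callable[[list[T]], T],
    matching_degree: Callable[[T, T], float],
    threshold: float,
    max_rounds: int = 100,
) -> list[list[T]]:
    """Re-divide tasks until every central task matches its current center."""
    for _ in range(max_rounds):
        nodes = divide(tasks, centers)
        # An empty transition node keeps its previous center (assumption).
        new_centers = [central_task(n) if n else c for n, c in zip(nodes, centers)]
        if all(matching_degree(new, old) > threshold
               for new, old in zip(new_centers, centers)):
            return nodes  # transition nodes are accepted as task nodes
        centers = new_centers  # otherwise divide again around the central tasks
    raise RuntimeError("matching threshold not reached within max_rounds")
```

Passing the three rules in as functions keeps the loop itself independent of any particular judgment or matching formula.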
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above examples represent only a few embodiments of the present application. They are described in relative detail, but are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (9)

1. The data centralized scheduling method of the partitioned data center is characterized by comprising the following steps of:
acquiring historical task data of a partition data center, converting the historical task data into task prediction data, and sampling tasks in the task prediction data at equal intervals to obtain a plurality of center samples, wherein each center sample corresponds to one task transition node;
respectively calculating node judgment values between the tasks in the task prediction data and each center sample to obtain a plurality of node judgment values of the tasks, wherein the node judgment value is determined by the following specific steps:
determining the data of each center sample;
determining the data of each task in the task prediction data;
determining the processing time period of each center sample;
determining the processing time period of each task in the task prediction data;
determining, according to the data of each center sample, the data of each task in the task prediction data, the processing time period of each center sample and the processing time period of each task in the task prediction data, the node judgment value of each task in the task prediction data with respect to each center sample, wherein the node judgment value is determined by the following formula:
[formula and symbols not reproduced in the extracted text]
wherein the result denotes the node judgment value of a task in the task prediction data with respect to a center sample, and the two remaining symbols denote adjustment coefficients;
Dividing the task into task transition nodes corresponding to the minimum node judgment value of the task, repeating the steps, and dividing all tasks in the task prediction data into corresponding task transition nodes;
Determining a central task of each task transition node, if the data matching degree of the central task of each task transition node and a central sample of each task transition node exceeds a preset matching threshold, taking the task transition node as a task node, otherwise, re-dividing the tasks in the task prediction data according to each central task until the data matching degree exceeds the preset matching threshold, and carrying out centralized scheduling on the data of the partitioned data center according to each task node;
selecting a task node, and determining task processing fatigue factors of the task node, wherein the task processing fatigue factors represent the load degree of the task node when processing tasks, task processing characteristic values of the task node at different time points are determined according to the task processing fatigue factors, and for the rest task nodes, repeating the steps to continuously obtain the task processing characteristic values of the rest task nodes at different time points;
and when the task processing characteristic value is below a preset task processing threshold interval, tidally dismissing the task node corresponding to the task processing characteristic value, wherein tidal dismissal means that the task node still exists and only the tasks in the data center are, for the time being, divided among other data centers, and scheduling the current tasks of the task node and the data corresponding to those tasks to other task nodes for processing.
2. The method of claim 1, wherein converting the historical task data into task prediction data comprises:
dividing the historical task data according to the same time period to obtain a plurality of time data sets;
and carrying out average calculation on each time data group to obtain task prediction data.
3. The method of claim 1, wherein determining a central task for each task transition node comprises:
determining data with the most use times in each task transition node and a time period with the most task processing;
and determining a central task judgment value of each task transition node according to the data with the maximum use times and the time period with the maximum task processing, and further determining the central task of each task transition node.
4. The method of claim 1, wherein determining the task processing fatigue factor for the task node comprises:
determining the task quantity of the task node;
and determining the task processing fatigue level of the task node according to the number of the task quantities, and further determining the task processing fatigue factor of the task node.
5. The method as recited in claim 1, further comprising:
When the task processing characteristic value exceeds a preset task processing threshold value interval, permanently dismissing task nodes corresponding to the task processing characteristic value, and simultaneously scheduling all tasks in the task nodes and data corresponding to the tasks to other task nodes for processing;
and when the task processing characteristic value is within the preset task processing threshold interval, sending a signal to maintain normal operation.
6. The method as recited in claim 1, further comprising:
for a task node that has been tidally dismissed, restarting the task node on the next workday.
7. A data centralized scheduling system of a partitioned data center that performs scheduling using the method of claim 1, the data centralized scheduling system comprising:
the central sample acquisition module is used for acquiring historical task data of the partition data center, converting the historical task data into task prediction data, and sampling tasks in the task prediction data at equal intervals to obtain a plurality of central samples, wherein each central sample corresponds to one task transition node;
the task dividing module is used for respectively calculating the node judgment values of the tasks in the task prediction data and the central samples to obtain a plurality of node judgment values of the tasks, dividing the tasks into task transition nodes corresponding to the minimum node judgment values of the tasks, repeating the steps, and dividing all the tasks in the task prediction data into the corresponding task transition nodes;
The task node determining module is used for determining a central task of each task transition node, if the data matching degree of the central task of the task transition node and a central sample of the task transition node exceeds a preset matching threshold, taking the task transition node as a task node, otherwise, dividing the tasks in the task prediction data again according to each central task until the data matching degree exceeds the preset matching threshold, and carrying out centralized scheduling on the data of the partitioned data center according to each task node;
the task processing characteristic value determining module is used for selecting one task node, determining task processing fatigue factors of the task node, determining task processing characteristic values of the task node at different time points according to the task processing fatigue factors, and repeating the steps for the rest task nodes to continuously obtain the task processing characteristic values of the rest task nodes at different time points;
and the task node control module is used for carrying out tidal dismissal on the task node corresponding to the task processing characteristic value when the task processing characteristic value is lower than a preset task processing threshold value interval, and scheduling the current task of the task node and the data corresponding to the task to other task nodes for processing.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the data-centralized scheduling method of a partitioned data center of any one of claims 1 to 6.
9. A computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the data centralized scheduling method of a partitioned data center according to any one of claims 1 to 6.
CN202311464685.0A 2023-11-07 2023-11-07 Data centralized scheduling method of partitioned data center and related equipment thereof Active CN117193989B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311464685.0A CN117193989B (en) 2023-11-07 2023-11-07 Data centralized scheduling method of partitioned data center and related equipment thereof

Publications (2)

Publication Number Publication Date
CN117193989A CN117193989A (en) 2023-12-08
CN117193989B true CN117193989B (en) 2024-03-15

Family

ID=88992805

Country Status (1)

Country Link
CN (1) CN117193989B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101986272A (en) * 2010-11-05 2011-03-16 北京大学 Task scheduling method under cloud computing environment
WO2021179462A1 (en) * 2020-03-12 2021-09-16 重庆邮电大学 Improved quantum ant colony algorithm-based spark platform task scheduling method
CN115981848A (en) * 2022-12-17 2023-04-18 郑州斋杆网络科技有限公司 Memory database fragmentation adjustment method and device
WO2023104192A1 (en) * 2021-12-10 2023-06-15 华为技术有限公司 Cluster system management method and apparatus
CN116954905A (en) * 2023-07-26 2023-10-27 南京邮电大学 Task scheduling and migration method for large Flink data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
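Speculation-based adaptive fault-tolerant scheduling algorithm for unresponsive tasks; Cui Yunfei et al.; Computer Science (《计算机科学》); 2016-11-30; Vol. 43, No. 11A; pp. 11-15 *
Dynamic data partitioning based on node load; Meng Lingwu et al.; Computer Systems & Applications (《计算机系统应用》); Vol. 30, No. 12; pp. 299-307 *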

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant