CN114398339A - Data scheduling method and device and electronic equipment - Google Patents

Data scheduling method and device and electronic equipment

Info

Publication number
CN114398339A
Authority
CN
China
Prior art keywords
scheduling
data
parameter
predicted
load
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111585351.XA
Other languages
Chinese (zh)
Inventor
童剑
刘瀚阳
陈书宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pingkai Star Beijing Technology Co ltd
Original Assignee
Pingkai Star Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pingkai Star Beijing Technology Co ltd filed Critical Pingkai Star Beijing Technology Co ltd
Priority to CN202111585351.XA priority Critical patent/CN114398339A/en
Publication of CN114398339A publication Critical patent/CN114398339A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/214Database migration support
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system

Abstract

The embodiment of the application relates to the technical field of databases, and discloses a data scheduling method, a data scheduling device and an electronic device, wherein the method comprises the following steps: acquiring a scheduling parameter of a storage node in a first scheduling period, wherein the scheduling parameter is determined according to predicted load data of the first scheduling period, the predicted load data is obtained by prediction according to historical data, and the historical data comprises data of historical scheduling periods prior to the first scheduling period; and scheduling the data of the storage node according to the scheduling parameter. The embodiment of the application alleviates the problem that, in the prior art, the scheduling of data in a database depends heavily on the accuracy of the prediction model and on the amount of training data.

Description

Data scheduling method and device and electronic equipment
Technical Field
The application relates to the technical field of databases, in particular to a data scheduling method, a data scheduling device and electronic equipment.
Background
In the field of databases, it is often necessary to schedule data between the storage nodes (stores) of a database, such as a distributed database. To improve data scheduling efficiency, the related art usually predicts the load of the storage nodes over the subsequent time with a prediction model and then performs data scheduling according to the predicted load. However, the prediction model is usually trained on big data, for example by machine learning, and the accuracy of the prediction depends on the accuracy of the model itself and on the amount of training data, so the prediction result carries a certain uncertainty; data scheduling therefore also depends heavily on the accuracy of the model and on the amount of training data.
Disclosure of Invention
The embodiment of the application provides a data scheduling method, a data scheduling device and an electronic device, which aim to reduce the heavy dependence, in the prior art, of data scheduling in a database on the accuracy of the prediction model and on the amount of training data.
Correspondingly, the embodiment of the application also provides a data scheduling device, an electronic device and a storage medium, which are used for ensuring the implementation and application of the method.
In order to solve the above problem, an embodiment of the present application discloses a data scheduling method, where the method includes:
acquiring a scheduling parameter of a storage node in a first scheduling period; wherein the scheduling parameter is determined according to predicted load data of the first scheduling period; the predicted load data is obtained by prediction according to historical data; the historical data comprises data of a historical scheduling period prior to the first scheduling period;
and scheduling the data of the storage nodes according to the scheduling parameters.
Optionally, before acquiring the scheduling parameter of the storage node in the first scheduling period, the method further includes:
acquiring the historical data of a first preset number of the historical scheduling periods;
predicting to obtain the predicted load data of a second preset number of predicted scheduling cycles according to the historical data; the predicted scheduling period comprises the first scheduling period;
and determining the scheduling parameters of the predicted scheduling period according to the predicted load data.
Optionally, the predicting load data of a second preset number of predicted scheduling cycles according to the historical data includes:
acquiring a hot spot distribution matrix of a hot spot region in the historical data;
determining a load distribution matrix according to the hotspot distribution matrix;
and determining the predicted load data of each storage node according to the load distribution matrix.
Optionally, determining a load distribution matrix according to the hotspot distribution matrix includes:
determining a preferred replica matrix in the hotspot distribution matrix;
acquiring a load matrix of the preferred replica matrix;
and determining a load distribution matrix of each storage node according to the load matrix.
Optionally, the determining the scheduling parameter of the predicted scheduling cycle according to the predicted load data includes:
determining system evaluation parameters of alternative scheduling parameters according to the predicted load data; the system evaluation parameters comprise load balance index parameters and migration cost parameters;
and determining the scheduling parameters in the alternative scheduling parameters according to the system evaluation parameters.
Optionally, the determining a scheduling parameter of the candidate scheduling parameters according to the system evaluation parameter includes:
determining a minimum value of a sum of the load balancing index parameter and the migration cost parameter according to a first data relationship, where the candidate scheduling parameter corresponding to the minimum value is the scheduling parameter, and the first data relationship includes:
J(U) = Σ_{i=1}^{k} (Yi^T·Q·Yi + Ui^T·W·Ui)
wherein J(U) represents the system evaluation parameter, Ui represents the input sequence of the ith predicted scheduling period, and k represents the second preset number; Yi represents the load balancing index parameter of the ith predicted scheduling period, Q represents a first weight of the load balancing index parameter, and W represents a second weight of the migration cost parameter.
Optionally, the scheduling parameter includes an original storage node and a target storage node corresponding to the data scheduling operation;
the scheduling data of the storage node according to the scheduling parameter includes:
and migrating the data of the original storage node to the target storage node.
The embodiment of the application also discloses a data scheduling device, which comprises:
the parameter acquisition module is used for acquiring the scheduling parameters of the storage node in a first scheduling period;
wherein the scheduling parameter is determined according to predicted load data of the first scheduling period; the predicted load data is obtained by prediction according to historical data; the historical data comprises data of a historical scheduling period prior to the first scheduling period;
and the data scheduling module is used for scheduling the data of the storage nodes according to the scheduling parameters.
An embodiment of the present application further discloses an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the data scheduling method shown in the first aspect of the present application is implemented.
The embodiment of the application also discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and the computer program is used for realizing the method according to one or more of the embodiment of the application when being executed by a processor.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
in the embodiment of the application, the scheduling parameters of the storage node in a first scheduling period are obtained; scheduling data of the storage nodes according to the scheduling parameters; the scheduling parameters are determined according to the predicted load data of the first scheduling period, the control process and the prediction process are combined, the influence of the scheduling operation on the database system is fully considered, and the robustness of the system is improved.
Additional aspects and advantages of embodiments of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a data scheduling method according to an embodiment of the present application;
fig. 2 is a second flowchart of a data scheduling method according to an embodiment of the present application;
fig. 3 is a third flowchart of a data scheduling method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a data scheduling apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below in conjunction with the drawings in the present application. It should be understood that the embodiments set forth below in connection with the accompanying drawings are illustrative descriptions for explaining technical solutions of the embodiments of the present application, and do not limit the technical solutions of the embodiments of the present application.
As used herein, the singular forms "a", "an", "the" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates at least one of the items defined by the term, e.g., "a and/or B" may be implemented as "a", or as "B", or as "a and B".
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present application provides a data scheduling method, which is optionally applied to a distributed database, and the method may include the following steps:
step 101, acquiring a scheduling parameter of a storage node in a first scheduling period;
wherein the scheduling parameter is determined according to predicted load data of the first scheduling period; the predicted load data is obtained by prediction according to historical data; the historical data includes data of historical scheduling periods prior to the first scheduling period.
The first scheduling period is the period in which data scheduling is to be performed at the present time; the data scheduling, for example, migrates data of a first storage node (store1) to a second storage node (store2). A scheduling parameter is, for example, the amount of data scheduled in each scheduling period.
The scheduling parameters are determined according to the predicted load data of the first scheduling period: for example, the predicted load data of the first scheduling period is obtained by prediction at a time before the first scheduling period, the scheduling parameters of the first scheduling period are then obtained according to the predicted load data, and the scheduling parameter causing the smallest system control error is selected, so that the influence of scheduling on the database system is minimal; for example, the scheduling parameter causing the smallest load change or the smallest migration cost is selected, to reduce the shock of data scheduling to the database system and its impact on database performance.
The historical data includes data of historical scheduling periods before the first scheduling period, for example data of C historical scheduling periods before the first scheduling period, where C may be a small positive integer, for example a value not greater than 10; no large amount of training data is required, so the prediction process and the scheduling process are more tightly combined and more interpretable.
And step 102, scheduling the data of the storage nodes according to the scheduling parameters.
The data of the storage nodes is scheduled according to the scheduling parameters, so that the system error is controlled in combination with the scheduling parameters; the scheduling parameters are determined according to the predicted load data, the control process and the prediction process are combined, the influence of the scheduling operation on the database system is fully considered, and the robustness of the system is improved.
In this way, in the process of data scheduling, the degree of dependence on the prediction model is reduced, and further the degree of dependence on the accuracy of the model and the trained data volume is reduced.
Optionally, in this embodiment of the present application, before the obtaining of the scheduling parameter of the storage node in the first scheduling period, the method further includes a first step to a third step:
the method comprises the following steps that firstly, historical data of a first preset number of historical scheduling cycles are obtained;
secondly, predicting the predicted load data of a second preset number of predicted scheduling cycles according to the historical data; the predicted scheduling period comprises the first scheduling period;
and thirdly, determining the scheduling parameters of the predicted scheduling period according to the predicted load data.
Before data is scheduled according to the scheduling parameters, the load data of future scheduling periods is predicted from the historical data, and the scheduling parameters are generated from the predicted load data. Considering that the hot spots of a distributed database are random, the data of a second preset number of future periods is predicted based on the historical data, and the error is calculated so that the total error is minimized; the input of each predicted scheduling period is determined according to the scheduling parameters, and the system is adjusted by taking the input of the next scheduling period as the control variable, thereby improving the prediction precision and the control response.
In addition, one data scheduling operation usually has a certain long-tail influence and affects the system performance over multiple scheduling periods, which causes a shock effect on the control of the system and on its response performance. In the embodiment of the application, by predicting the data of a plurality of periods, the influence of a data scheduling operation on a plurality of scheduling periods is taken into account, so that unnecessary scheduling is reduced and the quick response performance of the system is improved.
Optionally, the second step further comprises:
acquiring a hot spot distribution matrix of the hot spot regions in the historical data; for example, each distributed storage node reports its top-100 hot regions and the current store hot spot index to the metadata management node through the GRPC protocol, and the hot spot distribution matrix is obtained from the metadata management node; optionally, the metadata management node may also filter the metadata to reduce detection errors;
determining a load distribution matrix according to the hot spot distribution matrix, wherein the load distribution matrix records the region load distribution on each store;
and determining, according to the load distribution matrix, the predicted load data of each storage node, namely the sum of the loads of the regions on each store.
Optionally, in this embodiment of the present application, the determining a load distribution matrix according to the hotspot distribution matrix includes:
determining a preferred replica matrix in the hotspot distribution matrix; in general, each hotspot distribution matrix comprises at least two replicas per region, so a preferred replica (region leader) needs to be selected among the replicas; for example, the leader is selected through the raft protocol, and the leader distribution of each region is recorded in a leader matrix (a minimal selection sketch follows this list);
acquiring a load matrix of the preferred replica matrix, wherein the load matrix records the read flow parameter of each region;
and determining a load distribution matrix of each storage node according to the load matrix, wherein the load distribution matrix records the region load distribution on each store.
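The leader-selection step referenced above can be illustrated with a minimal sketch: it marks one replica per region as the leader, which is the shape of the result the raft election ultimately produces; the tie-breaking rule used here (lowest store index) is an assumption for illustration, not the raft protocol itself.

```python
import numpy as np

def pick_leader_matrix(replica_matrix: np.ndarray) -> np.ndarray:
    """Derive an M x N leader matrix L_MN from an M x N replica matrix A_MN.

    A_MN[i, j] == 1 means region i has a replica on store j. Exactly one
    replica per region is marked as the leader; the real system obtains the
    leader from the raft protocol, and picking the lowest store index here
    is only a stand-in tie-breaking rule.
    """
    m, n = replica_matrix.shape
    leader = np.zeros((m, n), dtype=int)
    for i in range(m):
        stores_with_replica = np.flatnonzero(replica_matrix[i])
        if stores_with_replica.size:              # region i has at least one replica
            leader[i, stores_with_replica[0]] = 1
    return leader

# Example: 3 regions, 4 stores, P = 3 replicas per region.
A_MN = np.array([[1, 1, 1, 0],
                 [0, 1, 1, 1],
                 [1, 0, 1, 1]])
L_MN = pick_leader_matrix(A_MN)
print(L_MN)   # exactly one 1 per row, placed on a store that holds a replica
```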
Optionally, the following describes a data scheduling method provided in the embodiment of the present application with a specific example, which mainly includes two stages: a prediction phase and a control phase.
Referring to fig. 2, the prediction phase includes steps 201 to 205.
Step 201, obtaining the distribution matrix A_MN of the hot spot regions in the historical data, where A_MN is shown as matrix 1.
In matrix 1, M denotes the rows and N the columns, and each element of the matrix represents a storage node, where 1 indicates that the storage node has a hot spot and 0 indicates that it has no hot spot.
Step 202, determining the preferred replica matrix (region leader) in the distribution matrix A_MN.
The distribution matrix L_MN of the region leaders is selected; L_MN records the leader distribution of each region, as shown in matrix 2.
The replica matrix satisfies the following constraint condition 1: each region has exactly P replicas distributed over the stores, i.e. each row of A_MN sums to P, where P denotes the default number of copies; for example, P may be 3.
The leader distribution satisfies constraint condition 2 for each region j ∈ M, namely that exactly one leader is selected among the replicas of that region, where xnor represents an exclusive-NOR operation.
Step 203, obtaining the read flow parameter of each region in the region leader matrix to obtain the load matrix region_query.
region_query records the read flow parameter of each region; since read traffic is handled only on the region leader, region_query is a diagonal matrix, as shown in matrix 3, where diag denotes a diagonal matrix.
Step 204, determining, according to the load matrix region_query, the load distribution matrix of each storage node (store) for the region leaders.
The distribution of region loads on each store is recorded to obtain the load distribution matrix, computed as the dot product of region_query and the leader distribution L_MN, as shown in matrix 4, where dot denotes the dot product.
Step 205, counting, from the load distribution matrix, the predicted load data store_query of each store, i.e. the sum of the region loads on each store, as shown in matrix 5.
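The concrete matrices of steps 201 to 205 are published as figures; the sketch below reproduces the same computation on hypothetical values, taking region_query as the diagonal matrix of per-region read traffic and the load distribution matrix as the dot product of region_query and L_MN (the example numbers are illustrative and not taken from the patent figures).

```python
import numpy as np

# Hypothetical inputs: 3 hot regions (rows) spread over 4 stores (columns).
A_MN = np.array([[1, 1, 1, 0],        # step 201: hot spot distribution matrix
                 [0, 1, 1, 1],
                 [1, 0, 1, 1]])
L_MN = np.array([[1, 0, 0, 0],        # step 202: leader matrix, one leader per region
                 [0, 1, 0, 0],
                 [0, 0, 1, 0]])

# Step 203: read traffic of each region, kept as a diagonal matrix.
region_query = np.diag([10.0, 7.0, 4.0])

# Step 204: distribution of region loads over the stores (dot product).
load_distribution = region_query.dot(L_MN)    # row i carries region i's load on its leader store

# Step 205: predicted load of each store = column sums of the load distribution matrix.
store_query = load_distribution.sum(axis=0)
print(store_query)                            # [10.  7.  4.  0.]
```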
optionally, in this embodiment of the present application, the determining the scheduling parameter of the predicted scheduling period according to the predicted load data includes:
determining system evaluation parameters of alternative scheduling parameters according to the predicted load data; the system evaluation parameters comprise load balance index parameters and migration cost parameters;
and determining the scheduling parameters in the alternative scheduling parameters according to the system evaluation parameters.
The load balancing index parameter is, for example, the system error; the migration cost parameter is, for example, the CPU utilization and network bandwidth occupied by the data scheduling operation. By evaluating both the load balancing index parameter and the migration cost parameter, the influence of the scheduling operation on the database system is fully considered. For the distributed storage nodes, various alternative scheduling parameters (i.e. parameters corresponding to various scheduling operations) may be preset, and the system evaluation parameter corresponding to each (or each group of) alternative scheduling parameters is then calculated. The system evaluation parameter comprises the load balancing index parameter and the migration cost parameter; in general, the smaller the value of the system evaluation parameter, the more stable the system, so the alternative scheduling parameter with the smallest system evaluation parameter can be selected as the final scheduling parameter.
Further, the determining a scheduling parameter of the candidate scheduling parameters according to the system evaluation parameter includes:
determining a minimum value of a sum of the load balancing index parameter and the migration cost parameter according to a first data relationship, where the candidate scheduling parameter corresponding to the minimum value is the scheduling parameter, and the first data relationship includes:
J(U) = Σ_{i=1}^{k} (Yi^T·Q·Yi + Ui^T·W·Ui)
wherein J(U) represents the system evaluation parameter, Ui represents the input sequence of the ith predicted scheduling period, and k represents the second preset number; Yi represents the load balancing indicator parameter of the ith predicted scheduling period and Ui^T·W·Ui represents the migration cost parameter; Q represents a first weight of the load balancing indicator parameter, and W represents a second weight of the migration cost parameter.
The derivation of the first data relationship is first described below:
The first step is based on the load mean square error minimum formula (formula 1), a state-space model of the form:
X(t+1) = A·X(t) + B·U(t-m)
Y(t) = C·X(t) + D·U(t)
wherein X(t) represents the distribution matrix L_MN of the t-th scheduling period and X(t+1) represents the distribution matrix L_MN of the (t+1)-th scheduling period; Y(t) represents the load balancing index parameter; A and B each represent an identity matrix E; U represents the scheduling parameter, which may be the migration matrix resulting from a data scheduling operation; C denotes the preset operator dot(ones(1, N), region_query); D is preset to 0; and m denotes the scheduling delay.
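Reading formula 1, per the definitions just listed, as a standard state-space update is an assumption; under that reading, a single-period update can be sketched as follows, where identifying C with ones(1, M)·region_query (so that the dimensions line up) and encoding the migration matrix with ±1 entries are further illustrative assumptions.

```python
import numpy as np

M, N = 3, 4                                   # regions and stores (illustrative sizes)
region_query = np.diag([10.0, 7.0, 4.0])      # per-region read traffic

A = np.eye(M)                                 # state transition matrix: identity, as stated
B = np.eye(M)                                 # control matrix: identity, as stated
C = np.ones((1, M)).dot(region_query)         # output operator; dimension-consistent reading of dot(ones, region_query)
# D is preset to 0, so the D·U term of the output equation vanishes.

X = np.array([[1.0, 0, 0, 0],                 # current leader distribution L_MN
              [0, 1.0, 0, 0],
              [0, 0, 1.0, 0]])
U = np.zeros((M, N))                          # migration matrix: -1 at the source store, +1 at the target store
U[0, 0], U[0, 1] = -1.0, 1.0                  # move region 0's leader from store 0 to store 1

X_next = A.dot(X) + B.dot(U)                  # X(t+1) = A·X(t) + B·U(t-m); the delay only shifts when U is applied
Y_next = C.dot(X_next)                        # Y(t+1) = C·X(t+1): predicted load of each store
print(Y_next)                                 # [[ 0. 17.  4.  0.]] since regions 0 and 1 now both lead on store 1
```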
Secondly, formula 2 is obtained on the basis of formula 1.
Thirdly, a new formula for predicting the region leader distribution is obtained, as shown in formula 3:
X(t) = [x(t+1|t)]^T, [x(t+2|t)]^T, ..., [x(t+k|t)]^T  (formula 3)
wherein k represents the second preset number, i.e. the number of predicted scheduling periods, for example 3; the superscript T denotes the transpose.
Fourthly, the scheduling parameter U(t) is obtained according to the preset scheduling parameter formula (formula 4):
U(t) = [u(t|t)]^T, [u(t+1|t)]^T, ..., [u(t+k-1|t)]^T  (formula 4)
Substituting U(t) into the formula for X(t) yields the predicted states x(t+1|t), x(t+2|t), ..., x(t+k|t).
The scheduling parameter U(t) satisfies the following constraint 3 and constraint 4 (with j ∈ M):
constraint 3: the number of concurrent scheduling operations at the same time does not exceed scheduler_limit, which prevents the scheduling concurrency from becoming too high and affecting the foreground service of the distributed database;
constraint 4: the number of scheduling operations on a single store does not exceed store_limit, which prevents too many schedules on a particular store from putting excessive pressure on a single database node.
And fifthly, the first data relationship is obtained based on the above steps:
J(U) = Σ_{i=1}^{k} (Yi^T·Q·Yi + Ui^T·W·Ui)
wherein J(U) represents the system evaluation parameter, Ui represents the input sequence of the ith predicted scheduling period, and k represents the second preset number; Yi represents the load balancing index parameter of the ith predicted scheduling period, Q represents a first weight of the load balancing index parameter, and W represents a second weight of the migration cost parameter.
The first term, Σ Yi^T·Q·Yi, minimizes the load mean square error, i.e. balances the load of each node of the cluster as much as possible; the second term, Σ Ui^T·W·Ui, represents the migration cost: for example, migrating a region from store1 to store2 requires the use of CPU and network bandwidth.
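As a sanity check of the first data relationship, the sketch below evaluates J(U) for two hand-written candidate input sequences over k = 3 periods and keeps the cheaper one; the candidate set, the ±1 encoding of Ui, and the definition of Yi as the deviation of the predicted store loads from their mean are simplifying assumptions rather than the patent's solver.

```python
import numpy as np

def evaluate_J(Y_seq, U_seq, Q, W):
    """System evaluation parameter J(U) = sum_i (Yi^T·Q·Yi + Ui^T·W·Ui)."""
    return sum(float(y @ Q @ y) + float(u @ W @ u) for y, u in zip(Y_seq, U_seq))

N, k = 4, 3
Q, W = np.eye(N), np.eye(N)                      # first and second weights, set to identity matrices

loads = np.array([17.0, 0.0, 4.0, 0.0])          # predicted store loads if nothing is moved
Y_idle = loads - loads.mean()                    # imbalance of each store relative to the mean load
loads_after = np.array([7.0, 10.0, 4.0, 0.0])    # loads once the hot region has been moved to store 1
Y_moved = loads_after - loads_after.mean()

candidates = {
    "no scheduling": ([Y_idle] * k, [np.zeros(N)] * k),
    # Schedule in the first period (cost incurred there), effect visible from the second period on.
    "move hot region to store 1": ([Y_idle, Y_moved, Y_moved],
                                   [np.array([-1.0, 1.0, 0.0, 0.0]), np.zeros(N), np.zeros(N)]),
}

best_name, (best_Y, best_U) = min(candidates.items(), key=lambda kv: evaluate_J(*kv[1], Q, W))
print(best_name, round(evaluate_J(best_Y, best_U, Q, W), 2))   # move hot region to store 1 306.25
```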
With reference to the foregoing example, referring to fig. 3, the foregoing control phase includes steps 301 to 302.
Step 301, determining a load balancing index parameter according to the predicted load data.
Step 302, determining the scheduling parameter among the alternative scheduling parameters according to the system evaluation parameter; optionally, a heuristic algorithm may be used to solve the first data relationship and obtain the control input sequence U = [u(t), u(t+1), u(t+2)] indicated by the scheduling parameters.
Then, the first input u(t) is used as the control variable for scheduling, and the prediction and control of the hot spot distribution of the distributed database are repeated period by period.
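The receding-horizon behaviour described here, solving for a k-period input sequence but applying only the first input before re-predicting, can be sketched as below; solve_scheduling is a trivial stand-in rule (offload half of the hottest store once per horizon) and is an assumption, since the patent leaves the heuristic solver unspecified.

```python
import numpy as np

def solve_scheduling(predicted_loads: np.ndarray, k: int = 3):
    """Stand-in solver returning an input sequence [u(t), u(t+1), ..., u(t+k-1)].

    Each u is either None (no scheduling) or a (source_store, target_store) pair.
    The rule used here, offloading the most loaded store onto the least loaded one,
    is an illustrative assumption, not the patent's heuristic algorithm.
    """
    src = int(np.argmax(predicted_loads))
    dst = int(np.argmin(predicted_loads))
    if predicted_loads[src] > predicted_loads[dst]:
        return [(src, dst)] + [None] * (k - 1)
    return [None] * k

loads = np.array([17.0, 0.0, 4.0, 0.0])
for period in range(3):                      # receding horizon: re-solve at every period
    U = solve_scheduling(loads)
    u0 = U[0]                                # only the first input is used as the control variable
    if u0 is not None:
        src, dst = u0
        moved = loads[src] / 2               # migrate part of the hot store's load (illustrative)
        loads[src] -= moved
        loads[dst] += moved
    print(period, u0, loads.round(2))
```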
Specifically, the data scheduling method provided in the embodiment of the present application is described below with a specific example. Starting from a hot spot distribution matrix A_MN, the leader distribution matrix L_MN and the load matrix region_query are obtained, the load distribution matrix of the regions over the stores is then derived, and the total load store_query of each store is counted.
For each of the multiple groups of alternative scheduling parameters, the system evaluation parameter is calculated as follows:
taking the example that the scheduling parameter is migrated from the store1 to the store2, assuming that there is no scheduling in the former M-5 scheduling periods, the reported data are the same, and the scheduling delay M is 5, the predicted time domain k is 3, and meanwhile, the Q, W weight is set as the identity matrix, there are:
1. Migration cost parameter: C(U) = dot(abs(U), ones(N, 1)), wherein abs(U) represents the element-wise absolute value operation (a short transcription of this expression is sketched after item 2 below).
2. The first data relationship: J(U) = Σ_{i=1}^{k} (Yi^T·Q·Yi + Ui^T·W·Ui).
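A direct transcription of the migration cost expression of item 1, applied to a hypothetical migration matrix (the ±1 encoding of U is an assumption carried over from the earlier sketches):

```python
import numpy as np

U = np.zeros((3, 4))
U[0, 0], U[0, 1] = -1.0, 1.0               # region 0 migrated from store 0 to store 1

# C(U) = dot(abs(U), ones(N, 1)): per-region count of stores touched by the migration.
C_U = np.abs(U).dot(np.ones((4, 1)))
print(C_U.ravel())                         # [2. 0. 0.]: only region 0 incurs a migration cost
```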
at this time, the scheduling parameters of the following 3 predicted scheduling time periods (t +1, t +2, t +3) need to be calculated.
Assume the next three plans are as follows: scheduling is performed in the t+1 scheduling period, no scheduling is generated in the t+2 and t+3 scheduling periods, and the scheduling delay is considered to complete within 1 period, so the leader distribution of the hot spots in the following three periods can be calculated: the leader distribution of the t+1 period is the same as that of the t period, and, since no scheduling is generated in the t+3 period, its leader distribution is the same as that of the t+2 period.
Since store_query depends only on L_MN, only the load distribution matrix at the time of change (one period after the t+1 scheduling, i.e. the t+2 predicted scheduling period) needs to be calculated, and the corresponding total load store_query of each store is counted in the same way. The store_query of the t+1 period is the same as that of the t period and is not calculated again, and the store_query of the t+3 period is the same as that of the t+2 period and is not calculated again; the first term of the first data relationship over the t to t+3 periods can therefore be calculated.
Since the scheduling completes within 1 period, only the t+1 period has a migration cost to calculate, and neither t+2 nor t+3 does; the migration cost over the entire horizon is then obtained from the second term of the first data relationship.
The minimum value of the first data relationship (the overall objective) is then J(U) = 21 + 4 = 25, i.e. the selected scheduling parameter migrates the region from store1 to store2, with the change taking effect in the t+2 scheduling period.
Optionally, in this embodiment of the present application, the scheduling parameter includes an original storage node and a target storage node corresponding to the data scheduling operation;
the scheduling data of the storage node according to the scheduling parameter includes:
and migrating the data of the original storage node to the target storage node. Optionally, the scheduling parameter may be carried in a scheduling instruction, which mainly includes the region to be scheduled (the scheduling target) and the scheduling parameter (indicating from which node to which node to migrate, i.e. the scheduling direction). Optionally, the scheduling instruction (which may contain multiple specific scheduling steps) may be issued to the scheduling target through the GRPC heartbeat; the scheduling target is migrated to the target storage node according to the scheduling instruction and is simultaneously reported to the meta-information node of the database.
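The scheduling instruction is characterised only as the scheduling target (a region) plus the scheduling direction (source and target store), delivered over the GRPC heartbeat; the plain data structure below is an illustrative shape for such an instruction, and send_via_heartbeat together with the example region id are hypothetical stand-ins rather than the patent's wire format.

```python
from dataclasses import dataclass

@dataclass
class SchedulingInstruction:
    region_id: int          # scheduling target: the region to be migrated (example id, hypothetical)
    source_store: int       # original storage node
    target_store: int       # target storage node
    scheduling_period: int  # the predicted scheduling period in which the operation should run

def send_via_heartbeat(instruction: SchedulingInstruction) -> None:
    """Hypothetical stand-in for issuing the instruction over the GRPC heartbeat."""
    print(f"heartbeat -> store {instruction.source_store}: "
          f"migrate region {instruction.region_id} to store {instruction.target_store} "
          f"in period t+{instruction.scheduling_period}")

# The worked example's decision: migrate from store1 to store2, effective in period t+2.
send_via_heartbeat(SchedulingInstruction(region_id=42, source_store=1, target_store=2, scheduling_period=2))
```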
In the embodiment of the application, the scheduling parameters of the storage node in a first scheduling period are obtained; scheduling data of the storage nodes according to the scheduling parameters; the scheduling parameters are determined according to the predicted load data of the first scheduling period, the control process and the prediction process are combined, the influence of the scheduling operation on the database system is fully considered, and the robustness of the system is improved.
Based on the same principle as the method provided in the embodiment of the present application, an embodiment of the present application further provides a data scheduling apparatus, as shown in fig. 4, the apparatus includes:
a parameter obtaining module 401, configured to obtain a scheduling parameter of a storage node in a first scheduling period;
wherein the scheduling parameter is determined according to predicted load data of the first scheduling period; the predicted load data is obtained by prediction according to historical data; the historical data includes data of historical scheduling periods prior to the first scheduling period.
The first scheduling period is the period in which data scheduling is to be performed at the present time; the data scheduling, for example, migrates data of a first storage node (store1) to a second storage node (store2). A scheduling parameter is, for example, the amount of data scheduled in each scheduling period.
The scheduling parameters are determined according to the predicted load data of the first scheduling period: for example, the predicted load data of the first scheduling period is obtained by prediction at a time before the first scheduling period, the scheduling parameters of the first scheduling period are then obtained according to the predicted load data, and the scheduling parameter causing the smallest system control error is selected, so that the influence of scheduling on the database system is minimal; for example, the scheduling parameter causing the smallest load change or the smallest migration cost is selected, to reduce the shock of data scheduling to the database system and its impact on database performance.
The historical data includes data of historical scheduling periods before the first scheduling period, for example data of C historical scheduling periods before the first scheduling period, where C may be a small positive integer, for example a value not greater than 10; no large amount of training data is required, so the prediction process and the scheduling process are more tightly combined and more interpretable.
A data scheduling module 402, configured to schedule the data of the storage node according to the scheduling parameter.
The data of the storage nodes is scheduled according to the scheduling parameters, so that the system error is controlled in combination with the scheduling parameters; the scheduling parameters are determined according to the predicted load data, the control process and the prediction process are combined, the influence of the scheduling operation on the database system is fully considered, and the robustness of the system is improved.
In this way, in the process of data scheduling, the degree of dependence on the prediction model is reduced, and further the degree of dependence on the accuracy of the model and the trained data volume is reduced.
In an alternative embodiment, the apparatus comprises:
a data obtaining module, configured to obtain the history data of a first preset number of historical scheduling periods before the parameter obtaining module 401 obtains the scheduling parameter of the storage node in the first scheduling period;
the data prediction module is used for predicting the predicted load data of a second preset number of predicted scheduling cycles according to the historical data; the predicted scheduling period comprises the first scheduling period;
a parameter generation module, configured to determine the scheduling parameter of the predicted scheduling period according to the predicted load data.
In an optional embodiment, the data prediction module comprises:
the first acquisition submodule is used for acquiring a hot spot distribution matrix of a hot spot area in the historical data;
the first determining submodule is used for determining a load distribution matrix according to the hotspot distribution matrix;
and the statistic submodule is used for determining the predicted load data of each storage node according to the load distribution matrix.
In an alternative embodiment, the first determining sub-module is configured to:
determining a preferred replica matrix in the hotspot distribution matrix;
acquiring a load matrix of the preferred replica matrix;
and determining a load distribution matrix of each storage node according to the load matrix.
In an optional embodiment, the parameter generation module comprises:
the second determining submodule is used for determining a system evaluation parameter of the alternative scheduling parameter according to the predicted load data; the system evaluation parameters comprise load balance index parameters and migration cost parameters;
and the generation submodule is used for determining the scheduling parameters in the alternative scheduling parameters according to the system evaluation parameters.
In an alternative embodiment, the generation submodule is configured to:
determining a minimum value of a sum of the load balancing index parameter and the migration cost parameter according to a first data relationship, where the candidate scheduling parameter corresponding to the minimum value is the scheduling parameter, and the first data relationship includes:
J(U) = Σ_{i=1}^{k} (Yi^T·Q·Yi + Ui^T·W·Ui)
wherein J(U) represents the system evaluation parameter, Ui represents the input sequence of the ith predicted scheduling period, and k represents the second preset number; Yi represents the load balancing index parameter of the ith predicted scheduling period, Q represents a first weight of the load balancing index parameter, and W represents a second weight of the migration cost parameter.
In an optional embodiment, the scheduling parameter includes an original storage node and a target storage node corresponding to a data scheduling operation;
the data scheduling module 402 is configured to:
and migrating the data of the original storage node to the target storage node.
The data scheduling apparatus provided in the embodiment of the present application can implement each process implemented in the method embodiments of fig. 1 to fig. 3, and is not described here again to avoid repetition.
In the data scheduling apparatus provided by the application, the parameter obtaining module 401 obtains a scheduling parameter of a storage node in a first scheduling period; the data scheduling module 402 schedules the data of the storage node according to the scheduling parameter; the scheduling parameters are determined according to the predicted load data of the first scheduling period, the control process and the prediction process are combined, the influence of the scheduling operation on the database system is fully considered, and the robustness of the system is improved.
The data scheduling apparatus of the embodiment of the present application may execute the data scheduling method provided in the embodiment of the present application, and the implementation principle is similar, actions executed by each module and unit in the data scheduling apparatus in each embodiment of the present application correspond to steps in the data scheduling method in each embodiment of the present application, and detailed functional descriptions of each module of the data scheduling apparatus may specifically refer to descriptions in the corresponding data scheduling method shown in the foregoing, and are not described again here.
Based on the same principle as the method shown in the embodiments of the present application, the embodiments of the present application also provide an electronic device, which may include but is not limited to: a processor and a memory; a memory for storing a computer program; a processor for executing the data scheduling method of any of the alternative embodiments of the present application by calling a computer program.
In an alternative embodiment, there is also provided an electronic device, as shown in fig. 5, the electronic device 5000 shown in fig. 5 includes: a processor 5001 and a memory 5003. The processor 5001 and the memory 5003 are coupled, such as via a bus 5002. Optionally, the electronic device 5000 may further include a transceiver 5004, and the transceiver 5004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data. It should be noted that the transceiver 5004 is not limited to one in practical application, and the structure of the electronic device 5000 is not limited to the embodiment of the present application.
The Processor 5001 may be a CPU (Central Processing Unit), a general purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein. The processor 5001 may also be a combination of processors implementing computing functionality, e.g., a combination comprising one or more microprocessors, a combination of DSPs and microprocessors, or the like.
Bus 5002 can include a path that conveys information between the aforementioned components. The bus 5002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 5002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
The Memory 5003 may be a ROM (Read Only Memory) or other types of static storage devices that can store static information and instructions, a RAM (Random Access Memory) or other types of dynamic storage devices that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic Disc storage medium, other magnetic storage devices, or any other medium that can be used to carry or store a computer program and that can be Read by a computer, and is not limited herein.
The memory 5003 is used for storing computer programs for executing the embodiments of the present application, and is controlled by the processor 5001 for execution. The processor 5001 is configured to execute computer programs stored in the memory 5003 to implement the steps shown in the foregoing method embodiments.
Among them, electronic devices include but are not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
Embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, and the computer program, when being executed by a processor, can implement the steps of the foregoing method embodiments and corresponding content.
Embodiments of the present application further provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the steps and corresponding contents of the foregoing method embodiments can be implemented.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than illustrated or otherwise described herein.
It should be understood that, although each operation step is indicated by an arrow in the flowchart of the embodiment of the present application, the implementation order of the steps is not limited to the order indicated by the arrow. In some implementation scenarios of the embodiments of the present application, the implementation steps in the flowcharts may be performed in other sequences as desired, unless explicitly stated otherwise herein. In addition, some or all of the steps in each flowchart may include multiple sub-steps or multiple stages based on an actual implementation scenario. Some or all of these sub-steps or stages may be performed at the same time, or each of these sub-steps or stages may be performed at different times. In a scenario where execution times are different, an execution sequence of the sub-steps or the phases may be flexibly configured according to requirements, which is not limited in the embodiment of the present application.
The foregoing is only an optional implementation manner of a part of implementation scenarios in this application, and it should be noted that, for those skilled in the art, other similar implementation means based on the technical idea of this application are also within the protection scope of the embodiments of this application without departing from the technical idea of this application.

Claims (10)

1. A method for scheduling data, comprising:
acquiring a scheduling parameter of a storage node in a first scheduling period;
wherein the scheduling parameter is determined according to predicted load data of the first scheduling period; the predicted load data is obtained by prediction according to historical data; the historical data comprises data of a historical scheduling period prior to the first scheduling period;
and scheduling the data of the storage nodes according to the scheduling parameters.
2. The data scheduling method of claim 1, wherein, before the obtaining of the scheduling parameter of the storage node in the first scheduling period, the method further comprises:
acquiring the historical data of a first preset number of the historical scheduling periods;
predicting the predicted load data of a second preset number of predicted scheduling cycles according to the historical data; the predicted scheduling period comprises the first scheduling period;
and determining the scheduling parameters of the predicted scheduling period according to the predicted load data.
3. The data scheduling method of claim 2, wherein the predicting the predicted load data of a second preset number of predicted scheduling cycles according to the historical data comprises:
acquiring a hot spot distribution matrix of a hot spot region in the historical data;
determining a load distribution matrix according to the hotspot distribution matrix;
and determining the predicted load data of each storage node according to the load distribution matrix.
4. The data scheduling method according to claim 3, wherein the determining a load distribution matrix according to the hotspot distribution matrix comprises:
determining a preferred replica matrix in the hotspot distribution matrix;
acquiring a load matrix of the preferred replica matrix;
and determining a load distribution matrix of each storage node according to the load matrix.
5. The data scheduling method of claim 2, wherein the determining the scheduling parameter of the predicted scheduling period according to the predicted load data comprises:
determining system evaluation parameters of alternative scheduling parameters according to the predicted load data; the system evaluation parameters comprise load balance index parameters and migration cost parameters;
and determining the scheduling parameters in the alternative scheduling parameters according to the system evaluation parameters.
6. The data scheduling method of claim 5, wherein the determining the scheduling parameter of the alternative scheduling parameters according to the system evaluation parameter comprises:
determining the minimum value of the sum of the load balancing index parameter and the migration cost parameter according to a first data relationship, wherein the alternative scheduling parameter corresponding to the minimum value is the scheduling parameter;
wherein the first data relationship comprises:
J(U) = Σ_{i=1}^{k} (Yi^T·Q·Yi + Ui^T·W·Ui)
wherein J(U) represents the system evaluation parameter, Ui represents an input sequence of the ith said predicted scheduling period, and k represents said second preset number; Yi represents the load balancing index parameter of the ith predicted scheduling period, Q represents a first weight of the load balancing index parameter, and W represents a second weight of the migration cost parameter.
7. The data scheduling method according to any one of claims 1 to 6, wherein the scheduling parameters include an original storage node and a target storage node corresponding to a data scheduling operation;
the scheduling data of the storage node according to the scheduling parameter includes:
and migrating the data of the original storage node to the target storage node.
8. A data scheduling apparatus, comprising:
the parameter acquisition module is used for acquiring the scheduling parameters of the storage node in a first scheduling period;
wherein the scheduling parameter is determined according to predicted load data of the first scheduling period; the predicted load data is obtained by prediction according to historical data; the historical data comprises data of a historical scheduling period prior to the first scheduling period;
and the data scheduling module is used for scheduling the data of the storage nodes according to the scheduling parameters.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method of any one of claims 1 to 7.
CN202111585351.XA 2021-12-17 2021-12-17 Data scheduling method and device and electronic equipment Pending CN114398339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111585351.XA CN114398339A (en) 2021-12-17 2021-12-17 Data scheduling method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111585351.XA CN114398339A (en) 2021-12-17 2021-12-17 Data scheduling method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114398339A true CN114398339A (en) 2022-04-26

Family

ID=81227288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111585351.XA Pending CN114398339A (en) 2021-12-17 2021-12-17 Data scheduling method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114398339A (en)

Similar Documents

Publication Publication Date Title
US10936364B2 (en) Task allocation method and system
US8504556B1 (en) System and method for diminishing workload imbalance across multiple database systems
CN106469018B (en) Load monitoring method and device for distributed storage system
CN103970587B (en) A kind of method, apparatus and system of scheduling of resource
CN108959510B (en) Partition level connection method and device for distributed database
US10324644B2 (en) Memory side accelerator thread assignments
CN106202092A (en) The method and system that data process
JP2022500768A (en) Thermal load prediction methods, equipment, readable media and electronic devices
CN113516247A (en) Parameter calibration method, quantum chip control method, device and system
CN110704336A (en) Data caching method and device
CN114429195A (en) Performance optimization method and device for hybrid expert model training
CN114518948A (en) Large-scale microservice application-oriented dynamic perception rescheduling method and application
JP2019503014A (en) Method and apparatus for processing user behavior data
CN114077492A (en) Prediction model training and prediction method and system for cloud computing infrastructure resources
CN114398339A (en) Data scheduling method and device and electronic equipment
CN115309502A (en) Container scheduling method and device
CN111967938B (en) Cloud resource recommendation method and device, computer equipment and readable storage medium
CN107018163B (en) Resource allocation method and device
CN115190010A (en) Distributed recommendation method and device based on software service dependency relationship
CN113342781A (en) Data migration method, device, equipment and storage medium
CN112000485A (en) Task allocation method and device, electronic equipment and computer readable storage medium
CN116701410B (en) Method and system for storing memory state data for data language of digital networking
US9652373B2 (en) Adaptive statistics for a linear address space
CN116893865B (en) Micro-service example adjusting method and device, electronic equipment and readable storage medium
CN117608862B (en) Data distribution control method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination