CN114398339A - Data scheduling method and device and electronic equipment - Google Patents

Data scheduling method and device and electronic equipment

Info

Publication number
CN114398339A
Authority
CN
China
Prior art keywords
scheduling
data
parameter
predicted
load
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111585351.XA
Other languages
Chinese (zh)
Inventor
童剑
刘瀚阳
陈书宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pingkai Star Beijing Technology Co ltd
Original Assignee
Pingkai Star Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pingkai Star Beijing Technology Co ltd filed Critical Pingkai Star Beijing Technology Co ltd
Priority to CN202111585351.XA priority Critical patent/CN114398339A/en
Publication of CN114398339A publication Critical patent/CN114398339A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/214Database migration support
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system

Abstract

The embodiment of the application relates to the technical field of databases, and discloses a data scheduling method, a data scheduling device and an electronic device, wherein the method comprises the following steps: acquiring a scheduling parameter of a storage node in a first scheduling period, wherein the scheduling parameter is determined according to predicted load data of the first scheduling period, the predicted load data is obtained by prediction according to historical data, and the historical data comprises data of historical scheduling periods prior to the first scheduling period; and scheduling the data of the storage node according to the scheduling parameter. The embodiment of the application alleviates the problem that, in the prior art, the scheduling of data in a database depends heavily on the accuracy of the prediction model and on the amount of training data.

Description

Data scheduling method and device and electronic equipment
Technical Field
The application relates to the technical field of databases, in particular to a data scheduling method, a data scheduling device and electronic equipment.
Background
In the field of databases, it is often necessary to schedule data between the storage nodes (stores) of a database, such as a distributed database. To improve data scheduling efficiency, the related art usually predicts the load of the storage nodes over the subsequent time with a prediction model and then performs data scheduling according to the predicted load. However, the prediction model is usually trained on big data, for example by machine learning, and the accuracy of the prediction depends on the accuracy of the model itself and on the amount of training data, so the prediction result carries a certain uncertainty; data scheduling therefore also depends heavily on the accuracy of the model and on the amount of training data.
Disclosure of Invention
The embodiment of the application provides a data scheduling method, a data scheduling device and an electronic device, which aim to reduce the heavy dependence, in the prior art, of data scheduling in a database on the accuracy of the prediction model and on the amount of training data.
Correspondingly, the embodiment of the application also provides a data scheduling device, an electronic device and a storage medium, which are used for ensuring the implementation and application of the method.
In order to solve the above problem, an embodiment of the present application discloses a data scheduling method, where the method includes:
acquiring a scheduling parameter of a storage node in a first scheduling period; wherein the scheduling parameter is determined according to predicted load data of the first scheduling period; the predicted load data is obtained by prediction according to historical data; the historical data comprises data of a historical scheduling period prior to the first scheduling period;
and scheduling the data of the storage nodes according to the scheduling parameters.
Optionally, before acquiring the scheduling parameter of the storage node in the first scheduling period, the method further includes:
acquiring the historical data of a first preset number of the historical scheduling periods;
predicting to obtain the predicted load data of a second preset number of predicted scheduling cycles according to the historical data; the predicted scheduling period comprises the first scheduling period;
and determining the scheduling parameters of the predicted scheduling period according to the predicted load data.
Optionally, the predicting load data of a second preset number of predicted scheduling cycles according to the historical data includes:
acquiring a hot spot distribution matrix of a hot spot region in the historical data;
determining a load distribution matrix according to the hotspot distribution matrix;
and determining the predicted load data of each storage node according to the load distribution matrix.
Optionally, determining a load distribution matrix according to the hotspot distribution matrix includes:
determining a preferred replica matrix in the hotspot distribution matrix;
acquiring a load matrix of the preferred replica matrix;
and determining a load distribution matrix of each storage node according to the load matrix.
Optionally, the determining the scheduling parameter of the predicted scheduling cycle according to the predicted load data includes:
determining system evaluation parameters of alternative scheduling parameters according to the predicted load data; the system evaluation parameters comprise load balance index parameters and migration cost parameters;
and determining the scheduling parameters in the alternative scheduling parameters according to the system evaluation parameters.
Optionally, the determining a scheduling parameter of the candidate scheduling parameters according to the system evaluation parameter includes:
determining a minimum value of a sum of the load balancing index parameter and the migration cost parameter according to a first data relationship, where the candidate scheduling parameter corresponding to the minimum value is the scheduling parameter, and the first data relationship includes:
J(U) = Σ_{i=1}^{k} (Yi^T·Q·Yi + Ui^T·W·Ui)
wherein J(U) represents the system evaluation parameter, Ui represents the input sequence of the ith predicted scheduling period, and k represents the second preset number; Yi represents the load balancing index parameter of the ith predicted scheduling period, Q represents a first weight of the load balancing index parameter, and W represents a second weight of the migration cost parameter.
Optionally, the scheduling parameter includes an original storage node and a target storage node corresponding to the data scheduling operation;
the scheduling data of the storage node according to the scheduling parameter includes:
and migrating the data of the original storage node to the target storage node.
The embodiment of the application also discloses a data scheduling device, which comprises:
the parameter acquisition module is used for acquiring the scheduling parameters of the storage node in a first scheduling period;
wherein the scheduling parameter is determined according to predicted load data of the first scheduling period; the predicted load data is obtained by prediction according to historical data; the historical data comprises data of a historical scheduling period prior to the first scheduling period;
and the data scheduling module is used for scheduling the data of the storage nodes according to the scheduling parameters.
An embodiment of the present application further discloses an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the data scheduling method shown in the first aspect of the present application is implemented.
The embodiment of the application also discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and the computer program is used for realizing the method according to one or more of the embodiment of the application when being executed by a processor.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
in the embodiment of the application, the scheduling parameters of the storage node in a first scheduling period are obtained; scheduling data of the storage nodes according to the scheduling parameters; the scheduling parameters are determined according to the predicted load data of the first scheduling period, the control process and the prediction process are combined, the influence of the scheduling operation on the database system is fully considered, and the robustness of the system is improved.
Additional aspects and advantages of embodiments of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a data scheduling method according to an embodiment of the present application;
fig. 2 is a second flowchart of a data scheduling method according to an embodiment of the present application;
fig. 3 is a third flowchart of a data scheduling method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a data scheduling apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below in conjunction with the drawings in the present application. It should be understood that the embodiments set forth below in connection with the accompanying drawings are illustrative descriptions for explaining technical solutions of the embodiments of the present application, and do not limit the technical solutions of the embodiments of the present application.
As used herein, the singular forms "a", "an", "the" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates at least one of the items defined by the term, e.g., "a and/or B" may be implemented as "a", or as "B", or as "a and B".
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present application provides a data scheduling method, which is optionally applied to a distributed database, and the method may include the following steps:
step 101, acquiring a scheduling parameter of a storage node in a first scheduling period;
wherein the scheduling parameter is determined according to predicted load data of the first scheduling period; the predicted load data is obtained by prediction according to historical data; the historical data includes data of historical scheduling periods prior to the first scheduling period.
The first scheduling period is the period in which data scheduling is to be performed at the present time; the data scheduling, for example, migrates data of a first storage node (store1) to a second storage node (store2). A scheduling parameter is, for example, the amount of data scheduled in each scheduling period.
The scheduling parameters are determined according to the predicted load data of the first scheduling period: for example, the predicted load data of the first scheduling period is obtained by prediction at a time before the first scheduling period, the scheduling parameters of the first scheduling period are then obtained according to the predicted load data, and the scheduling parameter causing the smallest system control error is selected, so that the influence of scheduling on the database system is minimal; for example, the scheduling parameter causing the smallest load change or the smallest migration cost is selected, to reduce the shock of data scheduling to the database system and its impact on database performance.
The historical data includes data of historical scheduling periods before the first scheduling period, for example data of C historical scheduling periods before the first scheduling period, where C may be a small positive integer, for example a value not greater than 10; no large amount of training data is required, so the prediction process and the scheduling process are more tightly combined and more interpretable.
And step 102, scheduling the data of the storage nodes according to the scheduling parameters.
The data of the storage nodes is scheduled according to the scheduling parameters, so that the system error is controlled in combination with the scheduling parameters; the scheduling parameters are determined according to the predicted load data, the control process and the prediction process are combined, the influence of the scheduling operation on the database system is fully considered, and the robustness of the system is improved.
In this way, in the process of data scheduling, the degree of dependence on the prediction model is reduced, and further the degree of dependence on the accuracy of the model and the trained data volume is reduced.
Optionally, in this embodiment of the present application, before the obtaining of the scheduling parameter of the storage node in the first scheduling period, the method further includes a first step to a third step:
the method comprises the following steps that firstly, historical data of a first preset number of historical scheduling cycles are obtained;
secondly, predicting the predicted load data of a second preset number of predicted scheduling cycles according to the historical data; the predicted scheduling period comprises the first scheduling period;
and thirdly, determining the scheduling parameters of the predicted scheduling period according to the predicted load data.
Before data is scheduled according to the scheduling parameters, the load data of future scheduling periods is predicted from the historical data, and the scheduling parameters are generated from the predicted load data. Considering that the hot spots of a distributed database are random, the data of a second preset number of future periods is predicted based on the historical data, and the error is calculated so that the total error is minimized; the input of each predicted scheduling period is determined according to the scheduling parameters, and the system is adjusted by taking the input of the next scheduling period as the control variable, thereby improving the prediction precision and the control response.
In addition, one data scheduling operation usually has a certain long-tail influence and affects the system performance over multiple scheduling periods, which causes a shock effect on the control of the system and on its response performance. In the embodiment of the application, by predicting the data of a plurality of periods, the influence of a data scheduling operation on a plurality of scheduling periods is taken into account, so that unnecessary scheduling is reduced and the quick response performance of the system is improved.
Optionally, the second step further comprises:
acquiring a hot spot distribution matrix of the hot spot regions in the historical data; for example, each distributed storage node reports its top-100 hot regions and the current store hot spot index to the metadata management node through the GRPC protocol, and the hot spot distribution matrix is obtained from the metadata management node; optionally, the metadata management node may also filter the metadata to reduce detection errors;
determining a load distribution matrix according to the hot spot distribution matrix, wherein the load distribution matrix records the region load distribution on each store;
and determining, according to the load distribution matrix, the predicted load data of each storage node, namely the sum of the loads of the regions on each store.
Optionally, in this embodiment of the present application, the determining a load distribution matrix according to the hotspot distribution matrix includes:
determining a preferred replica matrix in the hotspot distribution matrix; in general, each hotspot distribution matrix comprises at least two replicas per region, so a preferred replica (region leader) needs to be selected among the replicas; for example, the leader is selected through the raft protocol, and the leader distribution of each region is recorded in a leader matrix (a minimal selection sketch follows this list);
acquiring a load matrix of the preferred replica matrix, wherein the load matrix records the read flow parameter of each region;
and determining a load distribution matrix of each storage node according to the load matrix, wherein the load distribution matrix records the region load distribution on each store.
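The leader-selection step referenced above can be illustrated with a minimal sketch: it marks one replica per region as the leader, which is the shape of the result the raft election ultimately produces; the tie-breaking rule used here (lowest store index) is an assumption for illustration, not the raft protocol itself.

```python
import numpy as np

def pick_leader_matrix(replica_matrix: np.ndarray) -> np.ndarray:
    """Derive an M x N leader matrix L_MN from an M x N replica matrix A_MN.

    A_MN[i, j] == 1 means region i has a replica on store j. Exactly one
    replica per region is marked as the leader; the real system obtains the
    leader from the raft protocol, and picking the lowest store index here
    is only a stand-in tie-breaking rule.
    """
    m, n = replica_matrix.shape
    leader = np.zeros((m, n), dtype=int)
    for i in range(m):
        stores_with_replica = np.flatnonzero(replica_matrix[i])
        if stores_with_replica.size:              # region i has at least one replica
            leader[i, stores_with_replica[0]] = 1
    return leader

# Example: 3 regions, 4 stores, P = 3 replicas per region.
A_MN = np.array([[1, 1, 1, 0],
                 [0, 1, 1, 1],
                 [1, 0, 1, 1]])
L_MN = pick_leader_matrix(A_MN)
print(L_MN)   # exactly one 1 per row, placed on a store that holds a replica
```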
Optionally, the following describes a data scheduling method provided in the embodiment of the present application with a specific example, which mainly includes two stages: a prediction phase and a control phase.
Referring to fig. 2, the prediction phase includes steps 201 to 205.
Step 201, obtaining the distribution matrix A_MN of the hot spot regions in the historical data, where A_MN is shown as matrix 1.
In matrix 1, M denotes the rows and N the columns, and each element of the matrix represents a storage node, where 1 indicates that the storage node has a hot spot and 0 indicates that it has no hot spot.
Step 202, determining the preferred replica matrix (region leader) in the distribution matrix A_MN.
The distribution matrix L_MN of the region leaders is selected; L_MN records the leader distribution of each region, as shown in matrix 2.
The replica matrix satisfies the following constraint condition 1: each region has exactly P replicas distributed over the stores, i.e. each row of A_MN sums to P, where P denotes the default number of copies; for example, P may be 3.
The leader distribution satisfies constraint condition 2 for each region j ∈ M, namely that exactly one leader is selected among the replicas of that region, where xnor represents an exclusive-NOR operation.
Step 203, obtaining the read flow parameter of each region in the region leader matrix to obtain the load matrix region_query.
region_query records the read flow parameter of each region; since read traffic is handled only on the region leader, region_query is a diagonal matrix, as shown in matrix 3, where diag denotes a diagonal matrix.
Step 204, determining, according to the load matrix region_query, the load distribution matrix of each storage node (store) for the region leaders.
The distribution of region loads on each store is recorded to obtain the load distribution matrix, computed as the dot product of region_query and the leader distribution L_MN, as shown in matrix 4, where dot denotes the dot product.
Step 205, counting, from the load distribution matrix, the predicted load data store_query of each store, i.e. the sum of the region loads on each store, as shown in matrix 5.
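The concrete matrices of steps 201 to 205 are published as figures; the sketch below reproduces the same computation on hypothetical values, taking region_query as the diagonal matrix of per-region read traffic and the load distribution matrix as the dot product of region_query and L_MN (the example numbers are illustrative and not taken from the patent figures).

```python
import numpy as np

# Hypothetical inputs: 3 hot regions (rows) spread over 4 stores (columns).
A_MN = np.array([[1, 1, 1, 0],        # step 201: hot spot distribution matrix
                 [0, 1, 1, 1],
                 [1, 0, 1, 1]])
L_MN = np.array([[1, 0, 0, 0],        # step 202: leader matrix, one leader per region
                 [0, 1, 0, 0],
                 [0, 0, 1, 0]])

# Step 203: read traffic of each region, kept as a diagonal matrix.
region_query = np.diag([10.0, 7.0, 4.0])

# Step 204: distribution of region loads over the stores (dot product).
load_distribution = region_query.dot(L_MN)    # row i carries region i's load on its leader store

# Step 205: predicted load of each store = column sums of the load distribution matrix.
store_query = load_distribution.sum(axis=0)
print(store_query)                            # [10.  7.  4.  0.]
```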
optionally, in this embodiment of the present application, the determining the scheduling parameter of the predicted scheduling period according to the predicted load data includes:
determining system evaluation parameters of alternative scheduling parameters according to the predicted load data; the system evaluation parameters comprise load balance index parameters and migration cost parameters;
and determining the scheduling parameters in the alternative scheduling parameters according to the system evaluation parameters.
The load balancing index parameter is, for example, the system error; the migration cost parameter is, for example, the CPU utilization and network bandwidth occupied by the data scheduling operation. By evaluating both the load balancing index parameter and the migration cost parameter, the influence of the scheduling operation on the database system is fully considered. For the distributed storage nodes, various alternative scheduling parameters (i.e. parameters corresponding to various scheduling operations) may be preset, and the system evaluation parameter corresponding to each (or each group of) alternative scheduling parameters is then calculated. The system evaluation parameter comprises the load balancing index parameter and the migration cost parameter; in general, the smaller the value of the system evaluation parameter, the more stable the system, so the alternative scheduling parameter with the smallest system evaluation parameter can be selected as the final scheduling parameter.
Further, the determining a scheduling parameter of the candidate scheduling parameters according to the system evaluation parameter includes:
determining a minimum value of a sum of the load balancing index parameter and the migration cost parameter according to a first data relationship, where the candidate scheduling parameter corresponding to the minimum value is the scheduling parameter, and the first data relationship includes:
J(U) = Σ_{i=1}^{k} (Yi^T·Q·Yi + Ui^T·W·Ui)
wherein J(U) represents the system evaluation parameter, Ui represents the input sequence of the ith predicted scheduling period, and k represents the second preset number; Yi represents the load balancing indicator parameter of the ith predicted scheduling period and Ui^T·W·Ui represents the migration cost parameter; Q represents a first weight of the load balancing indicator parameter, and W represents a second weight of the migration cost parameter.
The derivation of the first data relationship is first described below:
The first step is based on the load mean square error minimum formula (formula 1), a state-space model of the form:
X(t+1) = A·X(t) + B·U(t-m)
Y(t) = C·X(t) + D·U(t)
wherein X(t) represents the distribution matrix L_MN of the t-th scheduling period and X(t+1) represents the distribution matrix L_MN of the (t+1)-th scheduling period; Y(t) represents the load balancing index parameter; A and B each represent an identity matrix E; U represents the scheduling parameter, which may be the migration matrix resulting from a data scheduling operation; C denotes the preset operator dot(ones(1, N), region_query); D is preset to 0; and m denotes the scheduling delay.
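Reading formula 1, per the definitions just listed, as a standard state-space update is an assumption; under that reading, a single-period update can be sketched as follows, where identifying C with ones(1, M)·region_query (so that the dimensions line up) and encoding the migration matrix with ±1 entries are further illustrative assumptions.

```python
import numpy as np

M, N = 3, 4                                   # regions and stores (illustrative sizes)
region_query = np.diag([10.0, 7.0, 4.0])      # per-region read traffic

A = np.eye(M)                                 # state transition matrix: identity, as stated
B = np.eye(M)                                 # control matrix: identity, as stated
C = np.ones((1, M)).dot(region_query)         # output operator; dimension-consistent reading of dot(ones, region_query)
# D is preset to 0, so the D·U term of the output equation vanishes.

X = np.array([[1.0, 0, 0, 0],                 # current leader distribution L_MN
              [0, 1.0, 0, 0],
              [0, 0, 1.0, 0]])
U = np.zeros((M, N))                          # migration matrix: -1 at the source store, +1 at the target store
U[0, 0], U[0, 1] = -1.0, 1.0                  # move region 0's leader from store 0 to store 1

X_next = A.dot(X) + B.dot(U)                  # X(t+1) = A·X(t) + B·U(t-m); the delay only shifts when U is applied
Y_next = C.dot(X_next)                        # Y(t+1) = C·X(t+1): predicted load of each store
print(Y_next)                                 # [[ 0. 17.  4.  0.]] since regions 0 and 1 now both lead on store 1
```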
Secondly, formula 2 is obtained on the basis of formula 1.
Thirdly, a new formula for predicting the region leader distribution is obtained, as shown in formula 3:
X(t) = [x(t+1|t)]^T, [x(t+2|t)]^T, ..., [x(t+k|t)]^T  (formula 3)
wherein k represents the second preset number, i.e. the number of predicted scheduling periods, for example 3; the superscript T denotes the transpose.
Fourthly, the scheduling parameter U(t) is obtained according to the preset scheduling parameter formula (formula 4):
U(t) = [u(t|t)]^T, [u(t+1|t)]^T, ..., [u(t+k-1|t)]^T  (formula 4)
Substituting U(t) into the formula for X(t) yields the predicted states x(t+1|t), x(t+2|t), ..., x(t+k|t).
The scheduling parameter U(t) satisfies the following constraint 3 and constraint 4 (with j ∈ M):
constraint 3: the number of concurrent scheduling operations at the same time does not exceed scheduler_limit, which prevents the scheduling concurrency from becoming too high and affecting the foreground service of the distributed database;
constraint 4: the number of scheduling operations on a single store does not exceed store_limit, which prevents too many schedules on a particular store from putting excessive pressure on a single database node.
And fifthly, the first data relationship is obtained based on the above steps:
J(U) = Σ_{i=1}^{k} (Yi^T·Q·Yi + Ui^T·W·Ui)
wherein J(U) represents the system evaluation parameter, Ui represents the input sequence of the ith predicted scheduling period, and k represents the second preset number; Yi represents the load balancing index parameter of the ith predicted scheduling period, Q represents a first weight of the load balancing index parameter, and W represents a second weight of the migration cost parameter.
The first term, Σ Yi^T·Q·Yi, minimizes the load mean square error, i.e. balances the load of each node of the cluster as much as possible; the second term, Σ Ui^T·W·Ui, represents the migration cost: for example, migrating a region from store1 to store2 requires the use of CPU and network bandwidth.
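As a sanity check of the first data relationship, the sketch below evaluates J(U) for two hand-written candidate input sequences over k = 3 periods and keeps the cheaper one; the candidate set, the ±1 encoding of Ui, and the definition of Yi as the deviation of the predicted store loads from their mean are simplifying assumptions rather than the patent's solver.

```python
import numpy as np

def evaluate_J(Y_seq, U_seq, Q, W):
    """System evaluation parameter J(U) = sum_i (Yi^T·Q·Yi + Ui^T·W·Ui)."""
    return sum(float(y @ Q @ y) + float(u @ W @ u) for y, u in zip(Y_seq, U_seq))

N, k = 4, 3
Q, W = np.eye(N), np.eye(N)                      # first and second weights, set to identity matrices

loads = np.array([17.0, 0.0, 4.0, 0.0])          # predicted store loads if nothing is moved
Y_idle = loads - loads.mean()                    # imbalance of each store relative to the mean load
loads_after = np.array([7.0, 10.0, 4.0, 0.0])    # loads once the hot region has been moved to store 1
Y_moved = loads_after - loads_after.mean()

candidates = {
    "no scheduling": ([Y_idle] * k, [np.zeros(N)] * k),
    # Schedule in the first period (cost incurred there), effect visible from the second period on.
    "move hot region to store 1": ([Y_idle, Y_moved, Y_moved],
                                   [np.array([-1.0, 1.0, 0.0, 0.0]), np.zeros(N), np.zeros(N)]),
}

best_name, (best_Y, best_U) = min(candidates.items(), key=lambda kv: evaluate_J(*kv[1], Q, W))
print(best_name, round(evaluate_J(best_Y, best_U, Q, W), 2))   # move hot region to store 1 306.25
```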
With reference to the foregoing example, referring to fig. 3, the foregoing control phase includes steps 301 to 302.
Step 301, determining a load balancing index parameter according to the predicted load data.
Step 302, determining the scheduling parameter among the alternative scheduling parameters according to the system evaluation parameter; optionally, a heuristic algorithm may be used to solve the first data relationship and obtain the control input sequence U = [u(t), u(t+1), u(t+2)] indicated by the scheduling parameters.
Then, the first input u(t) is used as the control variable for scheduling, and the prediction and control of the hot spot distribution of the distributed database are repeated period by period.
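The receding-horizon behaviour described here, solving for a k-period input sequence but applying only the first input before re-predicting, can be sketched as below; solve_scheduling is a trivial stand-in rule (offload half of the hottest store once per horizon) and is an assumption, since the patent leaves the heuristic solver unspecified.

```python
import numpy as np

def solve_scheduling(predicted_loads: np.ndarray, k: int = 3):
    """Stand-in solver returning an input sequence [u(t), u(t+1), ..., u(t+k-1)].

    Each u is either None (no scheduling) or a (source_store, target_store) pair.
    The rule used here, offloading the most loaded store onto the least loaded one,
    is an illustrative assumption, not the patent's heuristic algorithm.
    """
    src = int(np.argmax(predicted_loads))
    dst = int(np.argmin(predicted_loads))
    if predicted_loads[src] > predicted_loads[dst]:
        return [(src, dst)] + [None] * (k - 1)
    return [None] * k

loads = np.array([17.0, 0.0, 4.0, 0.0])
for period in range(3):                      # receding horizon: re-solve at every period
    U = solve_scheduling(loads)
    u0 = U[0]                                # only the first input is used as the control variable
    if u0 is not None:
        src, dst = u0
        moved = loads[src] / 2               # migrate part of the hot store's load (illustrative)
        loads[src] -= moved
        loads[dst] += moved
    print(period, u0, loads.round(2))
```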
Specifically, the data scheduling method provided in the embodiment of the present application is described below with a specific example. Starting from a hot spot distribution matrix A_MN, the leader distribution matrix L_MN and the load matrix region_query are obtained, the load distribution matrix of the regions over the stores is then derived, and the total load store_query of each store is counted.
For each of the multiple groups of alternative scheduling parameters, the system evaluation parameter is calculated as follows:
taking the example that the scheduling parameter is migrated from the store1 to the store2, assuming that there is no scheduling in the former M-5 scheduling periods, the reported data are the same, and the scheduling delay M is 5, the predicted time domain k is 3, and meanwhile, the Q, W weight is set as the identity matrix, there are:
1. Migration cost parameter: C(U) = dot(abs(U), ones(N, 1)), wherein abs(U) represents the element-wise absolute value operation (a short transcription of this expression is sketched after item 2 below).
2. The first data relationship: J(U) = Σ_{i=1}^{k} (Yi^T·Q·Yi + Ui^T·W·Ui).
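A direct transcription of the migration cost expression of item 1, applied to a hypothetical migration matrix (the ±1 encoding of U is an assumption carried over from the earlier sketches):

```python
import numpy as np

U = np.zeros((3, 4))
U[0, 0], U[0, 1] = -1.0, 1.0               # region 0 migrated from store 0 to store 1

# C(U) = dot(abs(U), ones(N, 1)): per-region count of stores touched by the migration.
C_U = np.abs(U).dot(np.ones((4, 1)))
print(C_U.ravel())                         # [2. 0. 0.]: only region 0 incurs a migration cost
```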
at this time, the scheduling parameters of the following 3 predicted scheduling time periods (t +1, t +2, t +3) need to be calculated.
Assume the next three plans are as follows: scheduling is performed in the t+1 scheduling period, no scheduling is generated in the t+2 and t+3 scheduling periods, and the scheduling delay is considered to complete within 1 period, so the leader distribution of the hot spots in the following three periods can be calculated: the leader distribution of the t+1 period is the same as that of the t period, and, since no scheduling is generated in the t+3 period, its leader distribution is the same as that of the t+2 period.
Since store_query depends only on L_MN, only the load distribution matrix at the time of change (one period after the t+1 scheduling, i.e. the t+2 predicted scheduling period) needs to be calculated, and the corresponding total load store_query of each store is counted in the same way. The store_query of the t+1 period is the same as that of the t period and is not calculated again, and the store_query of the t+3 period is the same as that of the t+2 period and is not calculated again; the first term of the first data relationship over the t to t+3 periods can therefore be calculated.
Since the scheduling completes within 1 period, only the t+1 period has a migration cost to calculate, and neither t+2 nor t+3 does; the migration cost over the entire horizon is then obtained from the second term of the first data relationship.
The minimum value of the first data relationship (the overall objective) is then J(U) = 21 + 4 = 25, i.e. the selected scheduling parameter migrates the region from store1 to store2, with the change taking effect in the t+2 scheduling period.
Optionally, in this embodiment of the present application, the scheduling parameter includes an original storage node and a target storage node corresponding to the data scheduling operation;
the scheduling data of the storage node according to the scheduling parameter includes:
and migrating the data of the original storage node to the target storage node. Optionally, the scheduling parameter may be carried in a scheduling instruction, which mainly includes the region to be scheduled (the scheduling target) and the scheduling parameter (indicating from which node to which node to migrate, i.e. the scheduling direction). Optionally, the scheduling instruction (which may contain multiple specific scheduling steps) may be issued to the scheduling target through the GRPC heartbeat; the scheduling target is migrated to the target storage node according to the scheduling instruction and is simultaneously reported to the meta-information node of the database.
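The scheduling instruction is characterised only as the scheduling target (a region) plus the scheduling direction (source and target store), delivered over the GRPC heartbeat; the plain data structure below is an illustrative shape for such an instruction, and send_via_heartbeat together with the example region id are hypothetical stand-ins rather than the patent's wire format.

```python
from dataclasses import dataclass

@dataclass
class SchedulingInstruction:
    region_id: int          # scheduling target: the region to be migrated (example id, hypothetical)
    source_store: int       # original storage node
    target_store: int       # target storage node
    scheduling_period: int  # the predicted scheduling period in which the operation should run

def send_via_heartbeat(instruction: SchedulingInstruction) -> None:
    """Hypothetical stand-in for issuing the instruction over the GRPC heartbeat."""
    print(f"heartbeat -> store {instruction.source_store}: "
          f"migrate region {instruction.region_id} to store {instruction.target_store} "
          f"in period t+{instruction.scheduling_period}")

# The worked example's decision: migrate from store1 to store2, effective in period t+2.
send_via_heartbeat(SchedulingInstruction(region_id=42, source_store=1, target_store=2, scheduling_period=2))
```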
In the embodiment of the application, the scheduling parameters of the storage node in a first scheduling period are obtained; scheduling data of the storage nodes according to the scheduling parameters; the scheduling parameters are determined according to the predicted load data of the first scheduling period, the control process and the prediction process are combined, the influence of the scheduling operation on the database system is fully considered, and the robustness of the system is improved.
Based on the same principle as the method provided in the embodiment of the present application, an embodiment of the present application further provides a data scheduling apparatus, as shown in fig. 4, the apparatus includes:
a parameter obtaining module 401, configured to obtain a scheduling parameter of a storage node in a first scheduling period;
wherein the scheduling parameter is determined according to predicted load data of the first scheduling period; the predicted load data is obtained by prediction according to historical data; the historical data includes data of historical scheduling periods prior to the first scheduling period.
The first scheduling period is the period in which data scheduling is to be performed at the present time; the data scheduling, for example, migrates data of a first storage node (store1) to a second storage node (store2). A scheduling parameter is, for example, the amount of data scheduled in each scheduling period.
The scheduling parameters are determined according to the predicted load data of the first scheduling period: for example, the predicted load data of the first scheduling period is obtained by prediction at a time before the first scheduling period, the scheduling parameters of the first scheduling period are then obtained according to the predicted load data, and the scheduling parameter causing the smallest system control error is selected, so that the influence of scheduling on the database system is minimal; for example, the scheduling parameter causing the smallest load change or the smallest migration cost is selected, to reduce the shock of data scheduling to the database system and its impact on database performance.
The historical data includes data of historical scheduling periods before the first scheduling period, for example data of C historical scheduling periods before the first scheduling period, where C may be a small positive integer, for example a value not greater than 10; no large amount of training data is required, so the prediction process and the scheduling process are more tightly combined and more interpretable.
A data scheduling module 402, configured to schedule the data of the storage node according to the scheduling parameter.
The data of the storage nodes is scheduled according to the scheduling parameters, so that the system error is controlled in combination with the scheduling parameters; the scheduling parameters are determined according to the predicted load data, the control process and the prediction process are combined, the influence of the scheduling operation on the database system is fully considered, and the robustness of the system is improved.
In this way, in the process of data scheduling, the degree of dependence on the prediction model is reduced, and further the degree of dependence on the accuracy of the model and the trained data volume is reduced.
In an alternative embodiment, the apparatus comprises:
a data obtaining module, configured to obtain the history data of a first preset number of historical scheduling periods before the parameter obtaining module 401 obtains the scheduling parameter of the storage node in the first scheduling period;
the data prediction module is used for predicting the predicted load data of a second preset number of predicted scheduling cycles according to the historical data; the predicted scheduling period comprises the first scheduling period;
a parameter generation module, configured to determine the scheduling parameter of the predicted scheduling period according to the predicted load data.
In an optional embodiment, the data prediction module comprises:
the first acquisition submodule is used for acquiring a hot spot distribution matrix of a hot spot area in the historical data;
the first determining submodule is used for determining a load distribution matrix according to the hotspot distribution matrix;
and the statistic submodule is used for determining the predicted load data of each storage node according to the load distribution matrix.
In an alternative embodiment, the first determining sub-module is configured to:
determining a preferred replica matrix in the hotspot distribution matrix;
acquiring a load matrix of the preferred replica matrix;
and determining a load distribution matrix of each storage node according to the load matrix.
In an optional embodiment, the parameter generation module comprises:
the second determining submodule is used for determining a system evaluation parameter of the alternative scheduling parameter according to the predicted load data; the system evaluation parameters comprise load balance index parameters and migration cost parameters;
and the generation submodule is used for determining the scheduling parameters in the alternative scheduling parameters according to the system evaluation parameters.
In an alternative embodiment, the generation submodule is configured to:
determining a minimum value of a sum of the load balancing index parameter and the migration cost parameter according to a first data relationship, where the candidate scheduling parameter corresponding to the minimum value is the scheduling parameter, and the first data relationship includes:
J(U) = Σ_{i=1}^{k} (Yi^T·Q·Yi + Ui^T·W·Ui)
wherein J(U) represents the system evaluation parameter, Ui represents the input sequence of the ith predicted scheduling period, and k represents the second preset number; Yi represents the load balancing index parameter of the ith predicted scheduling period, Q represents a first weight of the load balancing index parameter, and W represents a second weight of the migration cost parameter.
In an optional embodiment, the scheduling parameter includes an original storage node and a target storage node corresponding to a data scheduling operation;
the data scheduling module 402 is configured to:
and migrating the data of the original storage node to the target storage node.
The data scheduling apparatus provided in the embodiment of the present application can implement each process implemented in the method embodiments of fig. 1 to fig. 3, and is not described here again to avoid repetition.
In the data scheduling apparatus provided by the application, the parameter obtaining module 401 obtains a scheduling parameter of a storage node in a first scheduling period; the data scheduling module 402 schedules the data of the storage node according to the scheduling parameter; the scheduling parameters are determined according to the predicted load data of the first scheduling period, the control process and the prediction process are combined, the influence of the scheduling operation on the database system is fully considered, and the robustness of the system is improved.
The data scheduling apparatus of the embodiment of the present application may execute the data scheduling method provided in the embodiment of the present application, and the implementation principle is similar, actions executed by each module and unit in the data scheduling apparatus in each embodiment of the present application correspond to steps in the data scheduling method in each embodiment of the present application, and detailed functional descriptions of each module of the data scheduling apparatus may specifically refer to descriptions in the corresponding data scheduling method shown in the foregoing, and are not described again here.
Based on the same principle as the method shown in the embodiments of the present application, the embodiments of the present application also provide an electronic device, which may include but is not limited to: a processor and a memory; a memory for storing a computer program; a processor for executing the data scheduling method of any of the alternative embodiments of the present application by calling a computer program.
In an alternative embodiment, there is also provided an electronic device, as shown in fig. 5, the electronic device 5000 shown in fig. 5 includes: a processor 5001 and a memory 5003. The processor 5001 and the memory 5003 are coupled, such as via a bus 5002. Optionally, the electronic device 5000 may further include a transceiver 5004, and the transceiver 5004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data. It should be noted that the transceiver 5004 is not limited to one in practical application, and the structure of the electronic device 5000 is not limited to the embodiment of the present application.
The Processor 5001 may be a CPU (Central Processing Unit), a general purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein. The processor 5001 may also be a combination of processors implementing computing functionality, e.g., a combination comprising one or more microprocessors, a combination of DSPs and microprocessors, or the like.
Bus 5002 can include a path that conveys information between the aforementioned components. The bus 5002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 5002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
The Memory 5003 may be a ROM (Read Only Memory) or other types of static storage devices that can store static information and instructions, a RAM (Random Access Memory) or other types of dynamic storage devices that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic Disc storage medium, other magnetic storage devices, or any other medium that can be used to carry or store a computer program and that can be Read by a computer, and is not limited herein.
The memory 5003 is used for storing computer programs for executing the embodiments of the present application, and is controlled by the processor 5001 for execution. The processor 5001 is configured to execute computer programs stored in the memory 5003 to implement the steps shown in the foregoing method embodiments.
Among them, electronic devices include but are not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
Embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, and the computer program, when being executed by a processor, can implement the steps of the foregoing method embodiments and corresponding content.
Embodiments of the present application further provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the steps and corresponding contents of the foregoing method embodiments can be implemented.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than illustrated or otherwise described herein.
It should be understood that, although each operation step is indicated by an arrow in the flowchart of the embodiment of the present application, the implementation order of the steps is not limited to the order indicated by the arrow. In some implementation scenarios of the embodiments of the present application, the implementation steps in the flowcharts may be performed in other sequences as desired, unless explicitly stated otherwise herein. In addition, some or all of the steps in each flowchart may include multiple sub-steps or multiple stages based on an actual implementation scenario. Some or all of these sub-steps or stages may be performed at the same time, or each of these sub-steps or stages may be performed at different times. In a scenario where execution times are different, an execution sequence of the sub-steps or the phases may be flexibly configured according to requirements, which is not limited in the embodiment of the present application.
The foregoing is only an optional implementation manner of a part of implementation scenarios in this application, and it should be noted that, for those skilled in the art, other similar implementation means based on the technical idea of this application are also within the protection scope of the embodiments of this application without departing from the technical idea of this application.

Claims (10)

1. A method for scheduling data, comprising:
acquiring a scheduling parameter of a storage node in a first scheduling period;
wherein the scheduling parameter is determined according to predicted load data of the first scheduling period; the predicted load data is obtained by prediction according to historical data; the historical data comprises data of a historical scheduling period prior to the first scheduling period;
and scheduling the data of the storage nodes according to the scheduling parameters.
2. The data scheduling method of claim 1, wherein, before the obtaining of the scheduling parameter of the storage node in the first scheduling period, the method further comprises:
acquiring the historical data of a first preset number of the historical scheduling periods;
predicting the predicted load data of a second preset number of predicted scheduling cycles according to the historical data; the predicted scheduling period comprises the first scheduling period;
and determining the scheduling parameters of the predicted scheduling period according to the predicted load data.
3. The data scheduling method of claim 2, wherein the predicting the predicted load data of a second preset number of predicted scheduling cycles according to the historical data comprises:
acquiring a hot spot distribution matrix of a hot spot region in the historical data;
determining a load distribution matrix according to the hotspot distribution matrix;
and determining the predicted load data of each storage node according to the load distribution matrix.
4. The data scheduling method according to claim 3, wherein the determining a load distribution matrix according to the hotspot distribution matrix comprises:
determining a preferred replica matrix in the hotspot distribution matrix;
acquiring a load matrix of the preferred replica matrix;
and determining a load distribution matrix of each storage node according to the load matrix.
5. The data scheduling method of claim 2, wherein the determining the scheduling parameter of the predicted scheduling period according to the predicted load data comprises:
determining system evaluation parameters of alternative scheduling parameters according to the predicted load data; the system evaluation parameters comprise load balance index parameters and migration cost parameters;
and determining the scheduling parameters in the alternative scheduling parameters according to the system evaluation parameters.
6. The data scheduling method of claim 5, wherein the determining the scheduling parameter of the alternative scheduling parameters according to the system evaluation parameter comprises:
determining the minimum value of the sum of the load balancing index parameter and the migration cost parameter according to a first data relationship, wherein the alternative scheduling parameter corresponding to the minimum value is the scheduling parameter;
wherein the first data relationship comprises:
J(U) = Σ_{i=1}^{k} (Yi^T·Q·Yi + Ui^T·W·Ui)
wherein J(U) represents the system evaluation parameter, Ui represents an input sequence of the ith said predicted scheduling period, and k represents said second preset number; Yi represents the load balancing index parameter of the ith predicted scheduling period, Q represents a first weight of the load balancing index parameter, and W represents a second weight of the migration cost parameter.
7. The data scheduling method according to any one of claims 1 to 6, wherein the scheduling parameters include an original storage node and a target storage node corresponding to a data scheduling operation;
the scheduling data of the storage node according to the scheduling parameter includes:
and migrating the data of the original storage node to the target storage node.
8. A data scheduling apparatus, comprising:
the parameter acquisition module is used for acquiring the scheduling parameters of the storage node in a first scheduling period;
wherein the scheduling parameter is determined according to predicted load data of the first scheduling period; the predicted load data is obtained by prediction according to historical data; the historical data comprises data of a historical scheduling period prior to the first scheduling period;
and the data scheduling module is used for scheduling the data of the storage nodes according to the scheduling parameters.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method of any one of claims 1 to 7.
CN202111585351.XA 2021-12-17 2021-12-17 Data scheduling method and device and electronic equipment Pending CN114398339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111585351.XA CN114398339A (en) 2021-12-17 2021-12-17 Data scheduling method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111585351.XA CN114398339A (en) 2021-12-17 2021-12-17 Data scheduling method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114398339A true CN114398339A (en) 2022-04-26

Family

ID=81227288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111585351.XA Pending CN114398339A (en) 2021-12-17 2021-12-17 Data scheduling method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114398339A (en)

Similar Documents

Publication Publication Date Title
US10936364B2 (en) Task allocation method and system
US8504556B1 (en) System and method for diminishing workload imbalance across multiple database systems
CN106469018B (en) Load monitoring method and device for distributed storage system
CN103970587B (en) A kind of method, apparatus and system of scheduling of resource
CN108959510B (en) Partition level connection method and device for distributed database
US10324644B2 (en) Memory side accelerator thread assignments
CN106202092A (en) The method and system that data process
JP2022500768A (en) Thermal load prediction methods, equipment, readable media and electronic devices
CN113516247A (en) Parameter calibration method, quantum chip control method, device and system
CN110704336A (en) Data caching method and device
CN114429195A (en) Performance optimization method and device for hybrid expert model training
CN114518948A (en) Large-scale microservice application-oriented dynamic perception rescheduling method and application
JP2019503014A (en) Method and apparatus for processing user behavior data
CN114077492A (en) Prediction model training and prediction method and system for cloud computing infrastructure resources
CN114398339A (en) Data scheduling method and device and electronic equipment
CN115309502A (en) Container scheduling method and device
CN111967938B (en) Cloud resource recommendation method and device, computer equipment and readable storage medium
CN107018163B (en) Resource allocation method and device
CN115190010A (en) Distributed recommendation method and device based on software service dependency relationship
CN113342781A (en) Data migration method, device, equipment and storage medium
CN112000485A (en) Task allocation method and device, electronic equipment and computer readable storage medium
CN116701410B (en) Method and system for storing memory state data for data language of digital networking
US9652373B2 (en) Adaptive statistics for a linear address space
CN116893865B (en) Micro-service example adjusting method and device, electronic equipment and readable storage medium
CN117608862B (en) Data distribution control method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination