CN105512264A — Performance prediction method for concurrent workloads in a distributed database

Publication number: CN105512264A (granted as CN105512264B)
Application number: CN201510881758.5A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: query, regression model, linear regression, latency, database
Inventors: 李晖, 陈梅
Assignees (original and current): Guizhou Youlian Borui Technology Co Ltd; Guizhou University
Filing date: 2015-12-04
Legal status: Granted; Active


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 — Information retrieval of structured data, e.g. relational data
    • G06F16/21 — Design, administration or maintenance of databases
    • G06F16/25 — Integrating or interfacing systems involving database management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a performance prediction method for concurrent workloads in a distributed database. A linear regression model is established to judge the interaction between queries in the distributed database and to predict the query latency L under different degrees of concurrency, and the database uses the predicted latency L to assign tasks selectively. The main steps are: A, selecting the metrics for the query latency L; B, modelling the interaction between queries under concurrent execution and establishing the linear regression model; C, verifying the correctness and validity of the linear regression model by experiment. Repeated experiments show that the overall average relative error is 14% for query latency, 30% for network delay and 37% for the number of I/O block reads. The experimental results show that the linear regression model can predict the performance of concurrent workloads on a distributed database well, which facilitates subsequent task assignment by the database and shortens the average waiting time of queries.

Description

Performance prediction method for concurrent workloads in a distributed database
Technical field
The present invention relates to a performance prediction method for workloads in databases, and in particular to a performance prediction method for concurrent workloads in a distributed database.
Background art
Performance prediction for database workloads has already been studied. However, existing research is confined to single-node databases, that is, databases running on a single server, whose performance depends mainly on the disk and CPU utilization of that server. With the growth of the data volumes produced in research and industry, distributed database systems are used to store and manage petabyte-scale data and to provide high concurrency and scalability. Data in a distributed database are processed in a scatter/gather pattern. For example, a query may be split by one node into multiple subqueries, these subqueries are executed concurrently by many other nodes, and the partial results of each node are then returned to the originating node and combined to obtain the final query result. The data in a distributed database are therefore stored in a distributed manner across the multiple nodes of a cluster, and the cluster can be extended easily by adding new nodes. This is one reason why distributed databases are used to store and process big data.
Usually, distributed databases are used to support concurrent analytical workloads, so as to reduce the required query execution time. However, while concurrent execution brings many advantages, it also raises challenges of resource contention, such as judging the interaction between multiple queries. The interactions between queries may differ: when two queries share a table scan, they may benefit each other; on the contrary, when two queries both need high network transmission bandwidth, they will increase each other's execution time because of network delay.
For a single-node database, task assignment is confined to one server, but for a multi-node distributed database there are multiple choices, and how to assign tasks so as to shorten the average query waiting time is something the database must consider. For example, suppose a distributed database has 3 servers, all executing query tasks; the disk and CPU utilization of server 1 is low, while those of servers 2 and 3 are high. If a new query arrives, the database must consider which server to assign it to. If server 1 has other queries waiting, or its disk and CPU utilization will rise at the next moment, while the disk and CPU utilization of servers 2 and 3 will fall, then the task should be assigned to server 2 or server 3 rather than to server 1 (a sketch of such latency-driven assignment is given below). The workload in a distributed database therefore needs to be predicted, so that subsequent task assignment is facilitated. Because of the particularities of distributed databases, previous database performance prediction methods are not applicable, and existing performance prediction methods do not address concurrent workloads.
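To illustrate the task-assignment decision described above, the following is a minimal Python sketch, not part of the patent text: a hypothetical predict_latency function stands in for the regression models of the invention, and all node statistics are invented example values.

```python
# Minimal sketch of latency-driven task assignment (illustrative only).
# `predict_latency` is a hypothetical stand-in for the regression models
# described later in this patent; all node data below are invented.

def predict_latency(query, running_queries):
    """Placeholder: a real implementation would apply formulas (1)-(3)."""
    # Assume each concurrently running query adds a fixed penalty.
    return query["base_cost"] + 0.5 * len(running_queries)

def assign(query, nodes):
    """Route the query to the node with the lowest predicted latency."""
    return min(nodes, key=lambda node: predict_latency(query, node["running"]))

nodes = [
    {"name": "server-1", "running": ["Q18", "Q7"]},  # heavily loaded
    {"name": "server-2", "running": ["Q3"]},
    {"name": "server-3", "running": []},             # idle
]
incoming = {"name": "Q5", "base_cost": 8.9}
print(assign(incoming, nodes)["name"])  # -> server-3
```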
Summary of the invention
The object of the invention is to provide a performance prediction method for concurrent workloads in a distributed database. The invention can predict the performance of concurrent workloads in a distributed database well, which facilitates subsequent task assignment and thus shortens the average waiting time of queries.
Technical scheme of the invention: a performance prediction method for concurrent workloads in a distributed database, in which multiple linear regression models are established to judge the interaction between queries in the distributed database and to predict the query latency L of the distributed database system under different degrees of concurrency, and the database assigns tasks selectively according to the predicted latency L. The main steps include:
A. selecting the metrics for the query latency L;
B. modelling the interaction between query combinations under concurrent execution and establishing the multiple linear regression models;
C. verifying the correctness and validity of the multiple linear regression models by experiment.
In the aforesaid performance prediction method for concurrent workloads in a distributed database, the query latency L in step A comprises network delay and local processing.
In the aforesaid performance prediction method for concurrent workloads in a distributed database, the network delay uses the transmission volume N as its metric, and the local processing uses the number of I/O block reads B as its metric.
In the aforesaid performance prediction method for concurrent workloads in a distributed database, step B consists of the following parts:
B1: predicting query interaction;
B2: predicting query latency;
B3: training the linear regression models on sampled data.
In the aforesaid performance prediction method for concurrent workloads in a distributed database, step B1 comprises: predicting the number of I/O block reads B and the transmission volume N of a primary query q when it executes concurrently with secondary queries p1...pn; the number of I/O block reads B is predicted by the following linear regression model:

B = \beta_1 B_q + \beta_2 \sum_{i=1}^{n} B_{p_i} + \beta_3 \sum_{i=1}^{n} \Delta B_{q/p_i} + \beta_4 \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} \Delta B_{p_i/p_j}    (1)

and the transmission volume N is predicted by the following linear regression model:

N = \beta_1 N_q + \beta_2 \sum_{i=1}^{n} N_{p_i} + \beta_3 \sum_{i=1}^{n} \Delta N_{q/p_i} + \beta_4 \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} \Delta N_{p_i/p_j}    (2)

Step B2 predicts the query latency L by the following linear regression model:

L = C_q + \alpha_1 B_q + \alpha_2 N_q    (3)

Step B3 is: given two or more queries, use LHS (Latin hypercube sampling) to generate different query combinations, run the different query combinations in pairs, record the number of I/O block reads and the transmission volume of each query combination to form samples, and estimate the coefficients \beta_1, \beta_2, \beta_3 and \beta_4 of the linear regression models from the samples by the least squares method.

In the formulas, B_q is the number of I/O block reads of the primary query q;
\sum_{i=1}^{n} B_{p_i} is the sum of the numbers of I/O block reads of all secondary queries;
\sum_{i=1}^{n} \Delta B_{q/p_i} is the sum of the direct influence values of the secondary queries on the primary query, measured in I/O block reads;
\sum_{i=1}^{n} \sum_{j \neq i} \Delta B_{p_i/p_j} is the sum of the indirect influence values between the secondary queries, measured in I/O block reads;
N_q is the transmission volume of the primary query q;
\sum_{i=1}^{n} N_{p_i} is the sum of the transmission volumes of all secondary queries;
\sum_{i=1}^{n} \Delta N_{q/p_i} is the sum of the direct influence values of the secondary queries on the primary query, measured in transmission volume;
\sum_{i=1}^{n} \sum_{j \neq i} \Delta N_{p_i/p_j} is the sum of the indirect influence values between the secondary queries, measured in transmission volume;
C_q is the CPU overhead time of query q.
In the aforesaid performance prediction method for concurrent workloads in a distributed database, step C is: run queries Q1, Q2, Q3, ..., Qn to obtain measured values; put the measured values into the multiple linear regression models to obtain predicted values; divide the samples into a test data set and a training data set; and observe the goodness of fit between the predicted values and the measured values.
In the aforesaid performance prediction method for concurrent workloads in a distributed database, the transmission volume uses the number of network packets transmitted between nodes as the raw data measured during query execution.
In the aforesaid performance prediction method for concurrent workloads in a distributed database, the number of network packets and the number of I/O block reads are obtained using SystemTap.
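As an illustration of how formulas (1)-(3) fit together at prediction time, here is a minimal Python sketch; it is not part of the patent, and the function names, coefficient values and query statistics are invented placeholders.

```python
# Minimal sketch of the prediction formulas (1)-(3) for a primary query q
# executed concurrently with secondary queries p1..pn.  Every number below
# is an illustrative placeholder, not a value from the patent.

def predict_metric(beta, x_q, x_p, dx_q_p, dx_p_p):
    """Formulas (1)/(2): x_q is the isolated value of the primary query,
    x_p the isolated values of the secondary queries, dx_q_p the direct
    influence of each secondary query on q, and dx_p_p the indirect
    influences between secondary queries."""
    b1, b2, b3, b4 = beta
    return b1 * x_q + b2 * sum(x_p) + b3 * sum(dx_q_p) + b4 * sum(dx_p_p)

def predict_latency(c_q, alpha, b_q, n_q):
    """Formula (3): latency = CPU overhead + a1 * I/O reads + a2 * transmission."""
    a1, a2 = alpha
    return c_q + a1 * b_q + a2 * n_q

beta_B = (0.9, 0.1, 0.8, 0.05)     # assumed coefficients for I/O block reads
beta_N = (0.95, 0.05, 0.7, 0.02)   # assumed coefficients for transmission volume
B = predict_metric(beta_B, 1200, [900, 400], [150, 60], [20, 10])
N = predict_metric(beta_N, 5000, [3000, 1200], [400, 150], [50, 30])
L = predict_latency(c_q=0.8, alpha=(0.002, 0.0005), b_q=B, n_q=N)
print(B, N, round(L, 2))
```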
Beneficial effects of the invention: compared with the prior art, data are transmitted between nodes in a distributed database, so executing a query incurs network overhead. When predicting concurrent query execution performance, the invention therefore takes network delay into account, and proposes linear regression models to predict the interaction of concurrent analytical workloads in a distributed database system. Because network delay and local processing are the two most important factors of query execution time, the invention analyses query execution behaviour with linear regression models from three aspects: network delay, local processing and different degrees of concurrency. In addition, a sampling technique is used to obtain query combinations under different degrees of concurrency. The model of the invention was evaluated on a cluster built with PostgreSQL, using the typical analytical workload data set of TPC-H to complete the performance prediction. Repeated experiments show that the overall average relative errors of query latency, network delay and number of I/O block reads are 14%, 30% and 37% respectively. The experimental results show that the proposed linear regression models can predict the performance of concurrent workloads on a distributed database well, which facilitates subsequent task assignment by the database and shortens the average waiting time of queries.
Brief description of the drawings
Figure 1 is a schematic diagram of the fit between the predicted and measured query latency of the invention;
Figure 2 is a schematic diagram of the fit between the predicted and measured numbers of I/O block reads of the invention;
Figure 3 is a schematic diagram of the fit between the predicted and measured network delay of the invention;
Figure 4 is a schematic diagram of the average relative error of the invention when the degree of concurrency is 3;
Figure 5 is a schematic diagram of the average relative error of the invention when the degree of concurrency is 4.
Detailed description
The invention is further illustrated below in conjunction with the drawings and the embodiments, which are not a basis for limiting the invention.
Embodiment of the invention:
1. Performance prediction
The goal of the invention is to study the prediction of concurrent query latency in a distributed database system. Performance in a distributed database system is mainly affected by contention for shared resources on top of the basic resource situation; the shared resources include RAM, CPU, disk I/O, network bandwidth and so on. The invention therefore first selects metrics that can be used for predicting query latency under concurrent workloads, particularly for a distributed database system.
The invention focuses on predicting the concurrent query latency of distributed analytical workloads. Analytical queries in a distributed database system mainly involve two aspects: network delay and local processing.
Local processing retrieves and processes the data needed by a query from a node. The local processing time is the average time between submitting a request to retrieve data on a node and obtaining the needed blocks. For a logical I/O request, local processing may require many disk seeks, sequences of sequential reads with a small amount of writing, or accesses to the buffer cache and cache pool. Usually most of the local processing time is spent on I/O operations, and reads far outnumber writes. The invention therefore uses the average number of I/O block reads as the metric for predicting query latency along the local-processing dimension.
Because the data in a distributed database are processed in a scatter/gather pattern, network transmission is unavoidable when executing a query. The data are partitioned and stored on multiple distributed nodes in the cluster; the transmitted data may be partial query results obtained from local nodes, or the final result returned to the node that submitted the request. The transmission volume is therefore a factor affecting query latency in a distributed database system, and the invention uses it as the metric along the network-delay dimension.
The invention mainly studies queries of medium analytical complexity. For this purpose, 10 moderately complex query statements were chosen from TPC-H to form the query mix of the invention, focusing on the concurrent execution performance of a distributed database system. First, the 10 query statements were run under different degrees of concurrency: a 10 GB data set was generated with TPC-H, and the queries were executed on a PostgreSQL cluster of 4 nodes to obtain measured query latencies, where MPL (multiprogramming level) denotes the number of concurrently executing queries, i.e. the degree of concurrency. As Table 1 shows, not every query's latency increases linearly with the degree of concurrency.
Table 1. Average query latency of the 10 queries under different degrees of concurrency

Query   MPL=1   MPL=2   MPL=3   MPL=4
Q3       0.07    0.13    0.12    0.10
Q4       5.23    5.48    5.32    5.61
Q5       8.92    9.62    9.70   10.46
Q6       2.63    3.14    2.76    2.80
Q7      27.80   29.48   31.03   32.06
Q8      26.95   28.24   31.85   28.12
Q10      3.13    3.68    3.61    3.71
Q14      3.50    4.10    3.84    4.11
Q18     83.14   93.47   87.93   86.03
Q19      4.83    5.90    5.92    6.19
2. Interaction modelling
As discussed in the previous section, the invention uses the number of I/O block reads and the transmission volume as the metrics on the two dimensions of local processing and network delay, and predicts the performance of different query combinations under different degrees of concurrency. In this section, the invention therefore proposes two multiple linear regression models to study the interaction of query combinations under concurrent execution, then proposes a further linear regression model that uses the number of I/O block reads and the network delay to predict query latency, and finally trains these models on the sampled data set to obtain the prediction models of the invention.
To predict how queries influence each other under concurrent execution, the invention first judges the influence on the number of I/O block reads and the transmission volume when two queries execute concurrently, and then increases the concurrency gradually. In particular, the invention builds multiple linear regression models to analyse the mutual influence at a concurrency degree of two. To make the models easier to understand, queries are divided into a primary query and secondary queries: the primary query is the query whose behaviour under concurrent execution is being studied, and the secondary queries are the queries executed concurrently with it. Before introducing the proposed models, the relevant variables are introduced; their values can be obtained from the training data set.
Isolated value: this variable is a baseline, namely the value observed when the primary query executes without concurrency. The invention uses it as the baseline for judging the corresponding value under concurrent execution. For example, for query i, B_i denotes its number of I/O block reads and N_i denotes its transmission volume.
Concurrent value: similarly, the value of this variable is the sum of the isolated values of the concurrent queries, such as B_i or N_i in the example above.
Direct influence value: the invention uses this value to describe the influence of a secondary query on the primary query; it is the sum of the changes of the metric. For example, when i is the primary query and j is a secondary query, for the transmission volume N_{i/j} denotes the value of i under the direct influence of j, and the change is \Delta N_{i/j} = N_{i/j} - N_i.
Indirect influence value: the invention uses this variable to describe how the secondary queries influence each other; its value is the sum of the direct influence values among the secondary queries.
Therefore, the invention uses the following formulas to predict the average number of I/O block reads B and the transmission volume N of a query q when it executes concurrently with queries p1...pn:

B = \beta_1 B_q + \beta_2 \sum_{i=1}^{n} B_{p_i} + \beta_3 \sum_{i=1}^{n} \Delta B_{q/p_i} + \beta_4 \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} \Delta B_{p_i/p_j}    (1)

N = \beta_1 N_q + \beta_2 \sum_{i=1}^{n} N_{p_i} + \beta_3 \sum_{i=1}^{n} \Delta N_{q/p_i} + \beta_4 \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} \Delta N_{p_i/p_j}    (2)

The coefficients \beta_1, \beta_2, \beta_3 and \beta_4 of each query are estimated from the training data set by the least squares method.

The invention also considers the number of I/O block reads and the transmission volume together, and establishes a linear regression model to predict the latency of each query. In general, for a distributed database system, query latency consists mainly of network delay and local processing, and the local processing time mainly comprises a query-specific CPU overhead time and the average logical I/O waiting time; the latency of query q can therefore be predicted by the following formula:

L = C_q + \alpha_1 B_q + \alpha_2 N_q    (3)

In formulas (1), (2) and (3), C_q denotes the query-specific CPU overhead time of query q, B_q denotes the average number of I/O block reads, and N_q denotes the average network transmission volume between the nodes of the distributed database.
The invention carries out repeated experiments and applies the least squares method to the samples to obtain the coefficients β1, β2 and the other model coefficients.
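To make the estimation step concrete, the following is a minimal sketch, not taken from the patent, of estimating the coefficients of formula (1) with ordinary least squares via numpy; the feature layout follows formula (1), but every sample value is an invented placeholder.

```python
# Minimal sketch: estimating beta_1..beta_4 of formula (1) by ordinary least
# squares.  Each row of X holds (B_q, sum B_p, sum dB_q/p, sum dB_p/p) for one
# sampled query combination; y holds the measured concurrent I/O block reads.
import numpy as np

X = np.array([
    [1200.0,  900.0, 150.0, 20.0],
    [1200.0, 1300.0, 210.0, 35.0],
    [ 800.0,  400.0,  90.0, 10.0],
    [ 800.0, 1600.0, 260.0, 55.0],
    [ 450.0,  900.0, 120.0, 25.0],
    [ 450.0, 2000.0, 300.0, 70.0],
])
y = np.array([1350.0, 1490.0, 910.0, 1120.0, 610.0, 820.0])

beta, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print("estimated coefficients:", beta)

# Prediction for a new combination, following formula (1):
x_new = np.array([1000.0, 1100.0, 180.0, 30.0])
print("predicted B:", x_new @ beta)
```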
To make the proposed models easier to understand, a simple example is introduced next. Suppose the invention wants to predict the latency of query a in a distributed database system when it executes concurrently with queries b and c. The following values are needed first:
the isolated numbers of I/O block reads of queries a, b and c: B_a, B_b and B_c, and their isolated transmission volumes: N_a, N_b and N_c;
the direct influence values on query a when it executes concurrently with queries b and c: \Delta B_{a/b}, \Delta N_{a/b}, \Delta B_{a/c}, \Delta N_{a/c};
the indirect influence values: \Delta B_{c/b}, \Delta N_{c/b}, \Delta B_{b/c}, \Delta N_{b/c}.
The corresponding concurrent metrics are then obtained from the following two formulas (the left-hand sides are the predicted concurrent values; the right-hand sides use the isolated values defined above):

B_a = \beta_1 B_a + \beta_2 (B_b + B_c) + \beta_3 (\Delta B_{a/b} + \Delta B_{a/c}) + \beta_4 (\Delta B_{c/b} + \Delta B_{b/c})

N_a = \beta_1 N_a + \beta_2 (N_b + N_c) + \beta_3 (\Delta N_{a/b} + \Delta N_{a/c}) + \beta_4 (\Delta N_{c/b} + \Delta N_{b/c})

Finally, formula (3) is used to predict the latency of query a:

L_a = C_a + \alpha_1 B_a + \alpha_2 N_a
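A numeric continuation of this a/b/c example, with invented coefficients and statistics rather than values measured in the patent experiments, could look as follows.

```python
# Numeric continuation of the a/b/c example; all values are illustrative.
beta = (0.9, 0.08, 0.7, 0.05)   # beta_1..beta_4, shared here by B and N
alpha = (0.002, 0.0005)         # alpha_1, alpha_2 of formula (3)

B_a, B_b, B_c = 1000.0, 600.0, 300.0       # isolated I/O block reads
N_a, N_b, N_c = 4000.0, 2500.0, 900.0      # isolated transmission volumes
dB_ab, dB_ac, dB_cb, dB_bc = 120.0, 40.0, 15.0, 25.0
dN_ab, dN_ac, dN_cb, dN_bc = 350.0, 110.0, 40.0, 60.0

b1, b2, b3, b4 = beta
B_conc = b1 * B_a + b2 * (B_b + B_c) + b3 * (dB_ab + dB_ac) + b4 * (dB_cb + dB_bc)
N_conc = b1 * N_a + b2 * (N_b + N_c) + b3 * (dN_ab + dN_ac) + b4 * (dN_cb + dN_bc)

C_a = 0.8                                   # CPU overhead time of query a
L_a = C_a + alpha[0] * B_conc + alpha[1] * N_conc
print(round(B_conc, 1), round(N_conc, 1), round(L_a, 2))
```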
To obtain the query latency from formula (3), the prediction models of the invention must be trained. First, the characteristics of each of the 10 queries when run in isolation are obtained, namely the query execution latency, the number of I/O block reads and the network delay; these 10 queries are the baseline query statements for the various query combinations under different MPLs. Running the queries in pairs, for example the 55 pairwise combinations, yields the concrete characteristics of how they affect each other.
To run these query statements on multiple machines and obtain interaction behaviour at higher degrees of concurrency, the invention uses LHS to generate different query combinations representing the desired workload (see the sketch after Table 2). LHS is a stratified sampling function that produces sample data conveniently. Table 2 gives an LHS example at MPL 2; in this example LHS generates 5 paired queries. In the experiments, the number of I/O block reads and the transmission volume of each query combination are recorded to form samples, and these samples are used to estimate the coefficients of the models. For each query, many query combination instances are generated to form samples. For example, "query 3" denotes the combination of Q3, Q4 and Q5, where Q3 is the primary query and Q4 and Q5 are the secondary queries.
Table 2. A two-dimensional LHS example at MPL 2: a 5 x 5 grid in which each of the 5 rows contains a single X marking the query that the row's query is paired with, giving 5 paired queries.
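As an illustration of how LHS could generate such pairings, the following is a minimal sketch using scipy's Latin hypercube sampler; the discretisation of the continuous samples into query indices is an assumption made here for illustration, not a detail given in the patent.

```python
# Minimal sketch: drawing MPL-2 query pairs from the 10 TPC-H queries with
# Latin hypercube sampling.  Mapping the continuous [0, 1) samples to query
# indices (and allowing a query to pair with itself) is an assumption.
from scipy.stats import qmc

queries = ["Q3", "Q4", "Q5", "Q6", "Q7", "Q8", "Q10", "Q14", "Q18", "Q19"]

sampler = qmc.LatinHypercube(d=2, seed=0)  # 2 dimensions -> pairs (MPL 2)
points = sampler.random(n=5)               # 5 sampled pairs in [0, 1)^2

pairs = [(queries[int(p * len(queries))], queries[int(s * len(queries))])
         for p, s in points]
for primary, secondary in pairs:
    print(f"primary {primary} runs concurrently with secondary {secondary}")
```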
For the established linear regression models, the correctness and validity of the three proposed models must be assessed. Experiments are used to examine the measured and predicted values of query latency, transmission volume and number of I/O block reads, and to examine the average relative error of each query when the degree of concurrency is 3 and when it is 4.
3. Experimental verification
To assess the feasibility of the method and the accuracy of the models, queries are executed on the 10 GB data set generated by the QGEN tool provided with TPC-H. Because this research focuses on analytical workloads, the invention chooses Q3, Q4, Q5, Q6, Q7, Q8, Q10, Q14, Q18 and Q19 from the 22 TPC-H queries to form its query mix. These queries are chosen because their execution times are relatively long, which leaves more time for collecting the numbers of I/O block reads and the transmission volumes. The distributed database system of this experiment is a database cluster composed of four PostgreSQL nodes, realised with Postgres-XL, an open-source PostgreSQL database cluster with high scalability and flexibility for handling different database workloads. The cluster is deployed on physical machines with a 4-core 2 GHz Intel(R) Xeon(R) CPU E5-2620 processor and 8 GB of memory; each node runs CentOS 6.4 with Linux kernel 2.6.32.
First, a training data set is obtained by sampling, and Matlab is used to fit the multiple linear regression models; a test data set is then used to predict the number of I/O block reads and the network delay of queries under concurrent execution.
The training data set and the test data set are obtained as follows (a sketch of the split appears below): queries Q1, Q2, Q3, ..., Qn are run to obtain measured values; the measured values are put into the multiple linear regression models to obtain predicted values; part of the samples is taken as the test data set and the other part as the training data set; and the fit between the predicted and measured values is observed.
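A minimal sketch of such a split is shown below; the 70/30 ratio is an assumption made for illustration, since the patent states only how many samples were collected, not the split ratio.

```python
# Minimal sketch: splitting the collected samples into training and test sets
# before fitting and evaluating the regression models.  The 70/30 ratio is an
# illustrative assumption; 140 is the sample count mentioned in the text.
import numpy as np

rng = np.random.default_rng(0)
sample_ids = np.arange(140)
rng.shuffle(sample_ids)

split = int(0.7 * len(sample_ids))
train_ids, test_ids = sample_ids[:split], sample_ids[split:]
print(len(train_ids), "training samples,", len(test_ids), "test samples")
```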
The fit between the predicted and measured values is shown in Fig. 1. In the experiments the coefficient of determination R^2 is used to measure how well the regression models fit. R^2 ranges from 0 to 1; the closer its value is to 1, the closer the predicted values are to the measured values and the better the regression model. Figs. 1, 2 and 3 show, under multiple degrees of concurrency, the fit between the predicted and measured values of query latency, number of I/O block reads and network delay obtained with the proposed prediction models; the R^2 values for query latency, network delay and number of I/O block reads are 0.94, 0.58 and 0.84 respectively, which illustrates the ability of this work to predict query latency from the network delay and the number of I/O block reads. For each query, the models in formulas (1) and (2) are first used to predict the network delay and the number of I/O block reads, and formula (3) is then used to predict the query latency. In the experiments, the number of network packets transmitted between nodes is taken as the raw data for the transmission volume during query execution.
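For reference, the coefficient of determination used above can be computed as in the sketch below; the measured column reuses the MPL=1 latencies from Table 1, while the predicted values are invented for illustration.

```python
# Minimal sketch of the coefficient of determination R^2 that measures the
# fit between measured and predicted values.  Measured values are the MPL=1
# latencies of Table 1; the predicted values are invented placeholders.
import numpy as np

def r_squared(measured, predicted):
    measured, predicted = np.asarray(measured), np.asarray(predicted)
    ss_res = np.sum((measured - predicted) ** 2)        # residual sum of squares
    ss_tot = np.sum((measured - measured.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

measured = [0.07, 5.23, 8.92, 2.63, 27.80, 26.95, 3.13, 3.50, 83.14, 4.83]
predicted = [0.09, 5.60, 9.40, 2.70, 29.10, 28.00, 3.40, 3.80, 88.00, 5.20]
print(round(r_squared(measured, predicted), 3))
```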
It should be noted how the raw data are obtained: the raw data become samples after processing, and the two key factors affecting the quality of a linear regression model are the quality and the quantity of the samples, so the method of obtaining the raw data matters greatly.
To collect the number of I/O block reads and the transmission volume of each query execution, SystemTap is used to run scripts written for this research and collect the data dynamically. SystemTap is a dynamic method of monitoring and tracing the operation of a running Linux kernel; it provides a simple command-line interface and scripting language. Compared with capturing PostgreSQL's own statistics or obtaining the network delay with other tools, SystemTap obtains the numbers of network packets and I/O block reads more accurately. In addition, to allow more time for collecting data and to make the collected data more accurate, the shared_buffers setting of PostgreSQL is adjusted appropriately.
As mentioned above, the invention applies ordinary least squares (OLS) to obtain the coefficients of the models. According to OLS, at least 6 samples are needed, empirically, to predict query latency and meet the basic requirement; 120 sample values were used in the experiments. Predicting the network delay and the number of I/O block reads requires at least 13 samples; 140 samples were used in this research. The experiments found that increasing the number of samples does not change the overall trend, it only makes the points denser. In Fig. 1, in order to show more points, particularly large values are not displayed; for example, the execution time of query 18 (Q18) does not appear in Fig. 1. In addition, some predictions of the network delay in Fig. 3 are noticeably high or low; this is because network fluctuations, or packet loss while collecting data, make the error between predicted and observed values larger.
Furthermore, to stay closer to practical application scenarios, the cache was not cleared before executing queries; this is also one reason why the prediction accuracy drops slightly each time the degree of concurrency is increased. The phenomenon can be seen by comparing the average relative errors of the number of I/O block reads in Figs. 4 and 5.
Comparing Figs. 4 and 5 also shows that the average relative error of query 3 (Q3) at a concurrency degree of 3 is higher than at a concurrency degree of 4. Analysis shows that this is because the execution time of Q3 is too short to obtain accurate source data, and the low sample quality makes the prediction error higher.
Figs. 4 and 5 show the average relative errors of query latency, network delay and number of I/O block reads when the degree of concurrency is 3 and 4 respectively, where the average relative error is calculated as |(measured value - predicted value) / measured value|. The overall average relative errors of query latency, network delay and number of I/O block reads are 14%, 30% and 37% respectively. These experimental results show that the proposed models can predict the performance of concurrent workloads on a distributed database system well.
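The average relative error defined above can be computed directly, as in this small sketch with invented example values.

```python
# Minimal sketch of the average relative error |(measured - predicted) / measured|
# used in Figs. 4 and 5; both lists are invented example values.
measured = [5.23, 8.92, 27.80, 3.13]
predicted = [5.60, 9.40, 29.10, 3.40]

errors = [abs((m - p) / m) for m, p in zip(measured, predicted)]
average_relative_error = sum(errors) / len(errors)
print(f"{average_relative_error:.1%}")
```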

Claims (8)

1. A performance prediction method for concurrent workloads in a distributed database, characterised in that: multiple linear regression models are established to judge the interaction between queries in the distributed database and to predict the query latency L of the distributed database under different degrees of concurrency, and the database assigns tasks selectively according to the predicted latency L; the main steps include:
A. selecting the metrics for the query latency L;
B. modelling the interaction between query combinations under concurrent execution and establishing the multiple linear regression models;
C. verifying the correctness and validity of the multiple linear regression models by experiment.
2. The performance prediction method for concurrent workloads in a distributed database according to claim 1, characterised in that the query latency L in step A comprises network delay and local processing.
3. The performance prediction method for concurrent workloads in a distributed database according to claim 2, characterised in that the network delay uses the transmission volume N as its metric, and the local processing uses the number of I/O block reads B as its metric.
4. The performance prediction method for concurrent workloads in a distributed database according to claim 3, characterised in that step B consists of the following parts:
B1: predicting query interaction;
B2: predicting query latency;
B3: training the linear regression models on sampled data.
5. The performance prediction method for concurrent workloads in a distributed database according to claim 4, characterised in that:
step B1 comprises: predicting the number of I/O block reads B and the transmission volume N of a primary query q when it executes concurrently with secondary queries p1...pn; the number of I/O block reads B is predicted by the following linear regression model:

B = \beta_1 B_q + \beta_2 \sum_{i=1}^{n} B_{p_i} + \beta_3 \sum_{i=1}^{n} \Delta B_{q/p_i} + \beta_4 \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} \Delta B_{p_i/p_j}    (1)

and the transmission volume N is predicted by the following linear regression model:

N = \beta_1 N_q + \beta_2 \sum_{i=1}^{n} N_{p_i} + \beta_3 \sum_{i=1}^{n} \Delta N_{q/p_i} + \beta_4 \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} \Delta N_{p_i/p_j}    (2)

step B2 predicts the query latency L by the following linear regression model:

L = C_q + \alpha_1 B_q + \alpha_2 N_q    (3)

step B3 is: given two or more queries, use a stratified sampling function to generate different query combinations, run the different query combinations in pairs, record the number of I/O block reads B and the transmission volume N of each query combination to form samples, and estimate the coefficients \beta_1, \beta_2, \beta_3 and \beta_4 of the linear regression models from the samples by the least squares method;
in the formulas, B_q is the number of I/O block reads of the primary query q;
\sum_{i=1}^{n} B_{p_i} is the sum of the numbers of I/O block reads of all secondary queries;
\sum_{i=1}^{n} \Delta B_{q/p_i} is the sum of the direct influence values of the secondary queries on the primary query, measured in I/O block reads;
\sum_{i=1}^{n} \sum_{j \neq i} \Delta B_{p_i/p_j} is the sum of the indirect influence values between the secondary queries, measured in I/O block reads;
N_q is the transmission volume of the primary query q;
\sum_{i=1}^{n} N_{p_i} is the sum of the transmission volumes of all secondary queries;
\sum_{i=1}^{n} \Delta N_{q/p_i} is the sum of the direct influence values of the secondary queries on the primary query, measured in transmission volume;
\sum_{i=1}^{n} \sum_{j \neq i} \Delta N_{p_i/p_j} is the sum of the indirect influence values between the secondary queries, measured in transmission volume;
C_q is the CPU overhead time of query q.
6. The performance prediction method for concurrent workloads in a distributed database according to claim 1, characterised in that step C is: run queries Q1, Q2, Q3, ..., Qn to obtain measured values; put the measured values into the multiple linear regression models to obtain predicted values; divide the samples into a test data set and a training data set; and observe the goodness of fit between the predicted values and the measured values.
7. The performance prediction method for concurrent workloads in a distributed database according to claim 3, characterised in that the transmission volume uses the number of network packets transmitted between nodes as the raw data measured during query execution.
8. The performance prediction method for concurrent workloads in a distributed database according to claim 7, characterised in that the number of network packets and the number of I/O block reads are obtained using SystemTap.
Priority application (1): CN201510881758.5A, priority date 2015-12-04, filing date 2015-12-04 — "Performance prediction method for concurrent workloads in a distributed database" (granted as CN105512264B, active).

Publications (2)

CN105512264A — published 2016-04-20
CN105512264B — granted 2019-04-19

Family ID: 55720246
Family application: CN201510881758.5A (active; granted as CN105512264B)
Country status: CN — CN105512264B



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101609416A (en) * 2009-07-13 2009-12-23 清华大学 Improve the method for performance tuning speed of distributed system
CN101841565A (en) * 2010-04-20 2010-09-22 中国科学院软件研究所 Database cluster system load balancing method and database cluster system
CN104794186A (en) * 2015-04-13 2015-07-22 太原理工大学 Collecting method for training samples of database load response time predicting model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邱能俊等: "阵列数据库系统FASTDB的研究与实现" (Research and implementation of the array database system FASTDB), 《计算机工程与设计》 (Computer Engineering and Design) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451041A (en) * 2017-07-24 2017-12-08 华中科技大学 A kind of object cloud storage system response delay Forecasting Methodology
CN107451041B (en) * 2017-07-24 2019-11-22 华中科技大学 A kind of object cloud storage system response delay prediction technique
CN107679243A (en) * 2017-10-31 2018-02-09 麦格创科技(深圳)有限公司 Task distributes the application process and system in distributed system
CN109308193A (en) * 2018-09-06 2019-02-05 广州市品高软件股份有限公司 A kind of multi-tenant function calculates the concurrency control method of service
CN110210000A (en) * 2019-04-18 2019-09-06 贵州大学 The identification of industrial process efficiency and diagnostic method based on Multiple Non Linear Regression
CN111782396A (en) * 2020-07-01 2020-10-16 浪潮云信息技术股份公司 Concurrency flexible control method based on distributed database
CN116745783A (en) * 2021-01-21 2023-09-12 斯诺弗雷克公司 Handling of system characteristic drift in machine learning applications
CN113157814A (en) * 2021-01-29 2021-07-23 东北大学 Query-driven intelligent workload analysis method under relational database
CN113157814B (en) * 2021-01-29 2023-07-18 东北大学 Query-driven intelligent workload analysis method under relational database
CN113485638A (en) * 2021-06-07 2021-10-08 贵州大学 Access optimization system for massive astronomical data
CN113296964A (en) * 2021-07-28 2021-08-24 阿里云计算有限公司 Data processing method and device

Also Published As

Publication number Publication date
CN105512264B (en) 2019-04-19


Legal Events

C06 / PB01 — Publication
C10 / SE01 — Entry into substantive examination / Entry into force of request for substantive examination
GR01 — Patent grant