CN106453546B

CN106453546B - Method for distributed storage scheduling

Info

Publication number: CN106453546B
Application number: CN201610875745.1A
Authority: CN
Inventors: 张栗粽; 殷光强; 罗光春; 田玲
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2016-10-08
Filing date: 2016-10-08
Publication date: 2019-05-07
Anticipated expiration: 2036-10-08
Also published as: CN106453546A

Abstract

The present invention relates to a method for distributed storage scheduling, comprising: a. establishing evaluation indices: obtaining an evaluation matrix for m storage nodes; b. standardizing the data: eliminating the dimensional effect of each datum in the evaluation matrix to obtain an evaluation index value matrix; c. pairwise comparison using triangular fuzzy numbers: obtaining a judgment matrix composed of triangular fuzzy numbers and a weight vector of the evaluation factors, and from these a weighted evaluation matrix and the weight of each evaluation factor; d. obtaining the positive and negative ideal values: obtaining the positive and negative ideal values and the distance from each storage node to them, ranking the storage nodes, and selecting the optimal storage node. The method comprehensively analyzes the various factors that influence scheduling so that the most suitable storage node is selected to respond, which greatly improves the data transmission performance and storage efficiency of remote networked distributed storage and markedly improves its storage quality.

Description

Method for distributed storage scheduling

Technical field

The present invention relates to distributed storage methods in cloud storage, and specifically to a method for scheduling distributed storage in cloud storage.

Background art

In the field of distributed storage, Cinder is a very widely used distributed storage architecture. Its scheduling is divided into two stages, filtering and weighing. When a storage request arrives, the filtering stage first screens for storage nodes that meet the requirements. Each storage node has only two possible outcomes, qualified or unqualified: qualified nodes enter the queue for weighing, and unqualified nodes are eliminated. The weighing stage then ranks the qualified storage nodes and selects the most suitable one, which provides the storage service for the request. Currently, the filtering stage checks only whether a storage node has enough storage space to carry out the requested storage; if so, the node is placed in the queue for the subsequent weighing stage, and otherwise it is not considered. The weighing stage then ranks the qualified storage nodes by remaining storage space and selects the node with the most remaining space to provide the service. With these two steps, the scheduling of a request sent to Cinder is complete.
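The two-stage selection described above can be sketched as follows. This is a minimal illustration only, not Cinder's actual code or API: the `StorageNode` class, the `schedule` function, and the capacity figures are all hypothetical names and values introduced here.

```python
from dataclasses import dataclass


@dataclass
class StorageNode:
    name: str
    free_gb: float  # remaining storage space (hypothetical unit)


def schedule(nodes, requested_gb):
    """Two-stage selection: filter by capacity, then weigh by free space."""
    # Filtering stage: keep only nodes with enough space for the request.
    candidates = [n for n in nodes if n.free_gb >= requested_gb]
    if not candidates:
        return None
    # Weighing stage: pick the node with the most remaining space.
    return max(candidates, key=lambda n: n.free_gb)


nodes = [
    StorageNode("node-a", 50.0),
    StorageNode("node-b", 400.0),
    StorageNode("node-c", 120.0),
]
chosen = schedule(nodes, requested_gb=100.0)
print(chosen.name)
```

Because free space is the only criterion, a congested node with the most free space would still be chosen, which is the limitation discussed next.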

Cinder's scheduling method plays a key role in the service quality of distributed storage, but it currently has the following problem: a scheduling method whose only objective is remaining storage space cannot guarantee the service quality of cloud storage. For example, when a storage node suffers serious network congestion but has the most remaining storage space, Cinder is unaware of the congestion and still selects that node to handle the storage request, even though the congestion clearly makes it a poor service node. Moreover, a scheduling method that considers only the remaining storage space of storage nodes cannot achieve optimal overall scheduling performance. Many factors besides remaining space affect the service quality of a storage node, and only multidimensional scheduling that takes these factors into account together can achieve the best scheduling result.

Summary of the invention

The present invention provides a method for distributed storage scheduling that overcomes the inability of current single-objective scheduling to select the best-performing service node, makes the selection of storage nodes more comprehensive, and improves storage efficiency and quality.

The method for distributed storage scheduling of the present invention comprises the following steps:

a. Establishing evaluation indices: according to the scheduling request, collect the evaluation factors that influence scheduling, then analyze the correlation between each evaluation factor and scheduling to obtain an evaluation matrix for m storage nodes, where m is a natural number;

b. Standardizing the data: standardization is a common technique in data processing. In this method, a standardization formula is applied to each datum in the evaluation matrix to eliminate dimensional effects, yielding the standardized evaluation index value matrix;

c. Pairwise comparison using triangular fuzzy numbers: the storage nodes in the evaluation index value matrix are compared pairwise using triangular fuzzy numbers. For example, the comparison between two storage nodes m and n can be expressed as $r_{mn} = (a, b, c)$, where the middle value b indicates the degree of importance and the two bounds a and c indicate the degree of fuzziness: the larger the difference b − a, the fuzzier the comparison between the two nodes, and a difference of 0 means the comparison is not fuzzy. Similarly, $r_{nm}$ indicates the importance of node n relative to node m. The judgment matrix composed of triangular fuzzy numbers and the weight vector of the evaluation factors are then obtained, the weighted evaluation matrix is derived, and the weight vector is normalized to give the weight of each evaluation factor;

d. Obtaining the positive and negative ideal values: the positive and negative ideal values are found on the weighted evaluation matrix, and the weighted Manhattan distance formula is used to calculate the distance from each storage node in the weighted evaluation matrix to the positive and negative ideal values. Closeness is then used to define overall performance: the comprehensive evaluation value of each storage node is calculated, the storage nodes are ranked by this value, and the storage node with the smallest comprehensive evaluation value is selected as the responding storage node for the scheduling request.

Establishing the evaluation indices is the basis of the analysis. The factors that influence scheduling can be analyzed over the whole process, from the moment a scheduling request is issued until it is handled by a storage node. This process involves first the client that issues the request, then the network that transmits it, and finally the service node that handles it. The evaluation factors can therefore be taken to include client factors, network factors, and server factors.

Specifically, the client factors include the routing hop count from the client to the server, that is, the distance from the client to the storage node; the network factors include the packet loss rate during network transmission, whether the network is unobstructed, and so on; the server factors include processor load, memory usage, and storage space occupancy. Hop count, packet loss rate, processor load, memory usage, and storage space occupancy are thus the 5 principal factors that jointly determine storage quality.

Optionally, in step a the correlation between each evaluation factor and scheduling is analyzed by the Pearson product-moment correlation coefficient, giving the correlation $r = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$, where $\bar{x}$ is the mean of the test sample x, $\bar{y}$ is likewise the mean of y, and n is the sample size. The value of r lies between −1 and +1; a positive value indicates positive correlation and a negative value indicates negative correlation.
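As a minimal sketch of this correlation analysis, the function below computes the Pearson product-moment coefficient from its textbook definition. The sample values for packet loss and transfer time are hypothetical and serve only to show a strongly positive r; `pearson_r` is a name introduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("samples must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sample has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measurements: packet loss rate vs. observed transfer time.
packet_loss = [0.01, 0.02, 0.05, 0.10]
transfer_time = [1.0, 1.2, 1.9, 3.1]
r = pearson_r(packet_loss, transfer_time)
print(round(r, 3))
```

A value of r near +1 here would suggest that packet loss is worth keeping as an evaluation factor; a value near 0 would suggest it has little bearing on the outcome.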

Further, in step b, different standardization formulas are used to eliminate dimensional effects, depending on how the merit of each datum in the evaluation matrix relates to the size of its value. For data p(i, j) where a larger value is better, the formula $p(i,j) = n(i,j) / [n_{\max}(i) + n_{\min}(i)]$ can be used; for data p(i, j) where a smaller value is better, the formula $p(i,j) = [n_{\max}(i) + n_{\min}(i) - n(i,j)] / [n_{\max}(i) + n_{\min}(i)]$ can be used. Here $n(i,j)$ denotes an element of the evaluation matrix N, $n_{\max}(i)$ denotes the maximum value of the i-th evaluation factor, and $n_{\min}(i)$ denotes the minimum value of the i-th evaluation factor.
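The two standardization formulas can be sketched for a single evaluation factor as below. The function name `normalize_factor` and the hop counts are hypothetical; the arithmetic follows the formulas stated above, dividing by the sum of the factor's maximum and minimum.

```python
def normalize_factor(values, larger_is_better):
    """Standardize one evaluation factor across all storage nodes."""
    denom = max(values) + min(values)  # n_max + n_min for this factor
    if denom == 0:
        raise ValueError("n_max + n_min must be non-zero")
    if larger_is_better:
        return [v / denom for v in values]
    # Smaller raw values map to larger standardized values.
    return [(denom - v) / denom for v in values]


# Hypothetical hop counts for three nodes; fewer hops is better.
hops = [2, 4, 6]
p = normalize_factor(hops, larger_is_better=False)
print(p)
```

After this step every factor is dimensionless and oriented the same way, so factors measured in different units can be combined in one matrix.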

Preferably, in step d the weighted Manhattan distance formula is used to calculate the distance from each storage node in the weighted evaluation matrix to the positive and negative ideal values. $D_i^+$ denotes the distance from storage node i to the positive ideal value, and $D_i^-$ denotes the distance from storage node i to the negative ideal value. The size of $D_i^+$ reflects how far storage node i is from the positive ideal value: the smaller it is, the closer the node is to the positive ideal value. Likewise, $D_i^-$ reflects the distance between storage node i and the negative ideal value. For example, when the calculation parameters of the candidate storage nodes are the routing hop count from the client to the server, the network packet loss rate, the CPU load, the memory usage, and the remaining storage capacity, smaller values of all these parameters are better, so the optimal node is the storage node with the smallest distance $D_i^-$ to the negative ideal value.

The method for distributed storage scheduling of the present invention comprehensively analyzes the various factors that influence scheduling so that the most suitable storage node is selected to respond. This greatly improves the data transmission performance and storage efficiency of remote networked distributed storage and markedly improves its storage quality.

The above content of the present invention is described in further detail below with reference to specific embodiments. This should not be understood as limiting the scope of the above subject matter of the invention to the following examples. Any substitution or modification made on the basis of ordinary technical knowledge and customary means in the art, without departing from the above technical idea of the invention, shall fall within the scope of the invention.

Brief description of the drawings

Fig. 1 is a flowchart of the method for distributed storage scheduling of the present invention.

Detailed description of the embodiments

As shown in Fig. 1, the method for distributed storage scheduling of the present invention comprises the following steps:

a. Establishing evaluation indices: according to the scheduling request, collect the evaluation factors that influence scheduling. Following the whole scheduling process, the evaluation factors are divided into factors of the client that issues the request, factors of the network that transmits the request, and factors of the service node that handles the request. The client factors include the routing hop count from the client to the server; the network factors include the packet loss rate during network transmission and whether the network is unobstructed; the server factors include processor load, memory usage, and storage space occupancy.

Based on these 5 evaluation factors, the correlation between each influencing factor and scheduling is analyzed by the Pearson product-moment correlation coefficient:

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

where $\bar{x}$ is the mean of the test sample x, $\bar{y}$ is likewise the mean of y, and n is the sample size. The value of r lies between −1 and +1; a positive value indicates positive correlation and a negative value indicates negative correlation.

The evaluation objects are the m physical storage nodes that actually hold the storage volumes, denoted $k_i \in K$, where $i \in \{1, 2, 3, \ldots, m\}$. Since 5 factors that influence scheduling are considered for each storage node, an m × 5 evaluation matrix N can be established.

b. Standardizing the data: a standardization formula is applied to each datum in the evaluation matrix to eliminate dimensional effects. Different standardization formulas are used depending on how the merit of each datum in the evaluation matrix relates to the size of its value; for "transmission delay", for example, smaller is better. For data p(i, j) where a larger value is better, the formula $p(i,j) = n(i,j) / [n_{\max}(i) + n_{\min}(i)]$ can be used; for data p(i, j) where a smaller value is better, the formula $p(i,j) = [n_{\max}(i) + n_{\min}(i) - n(i,j)] / [n_{\max}(i) + n_{\min}(i)]$ can be used. Here $n(i,j)$ denotes an element of the evaluation matrix N, $n_{\max}(i)$ denotes the maximum value of the i-th evaluation factor, and $n_{\min}(i)$ denotes the minimum value of the i-th evaluation factor.

The standardized evaluation index value matrix is then obtained.

c. Pairwise comparison using triangular fuzzy numbers: the storage nodes in the evaluation index value matrix are compared pairwise using triangular fuzzy numbers, for a total of m(m − 1)/2 comparisons. This yields the judgment matrix R composed of triangular fuzzy numbers and the weight vector of the evaluation factors, from which the weighting matrix T is obtained; normalizing the weight vector gives the weight of each evaluation factor.
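The comparison count m(m − 1)/2 is simply the number of unordered pairs of storage nodes. A small sketch with five hypothetical node names enumerates the pairs that would each receive one triangular fuzzy number:

```python
from itertools import combinations

# Hypothetical identifiers for m = 5 candidate storage nodes.
nodes = ["k1", "k2", "k3", "k4", "k5"]

# Each unordered pair is compared once.
pairs = list(combinations(nodes, 2))
m = len(nodes)
print(len(pairs), m * (m - 1) // 2)
```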

For example, with 5 candidate storage nodes, a weighting matrix T is obtained.

The weighting matrix T is then converted to decimal form and summed to obtain the average value of each evaluation index.

The fuzzy synthetic extent $S_i$ ($i = 1, 2, \ldots, n$) is then calculated for the judgment matrix, using the summed result of the corresponding entries of the judgment matrix and the weight of the item being solved. In this embodiment 5 evaluation indices are calculated, so n = 5, which gives the degree of importance of each evaluation index relative to the other evaluation indices.
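The source's formula for the fuzzy synthetic extent did not survive extraction. The sketch below therefore assumes the conventional extent-analysis form from fuzzy AHP, in which each row sum of the judgment matrix is multiplied by the inverse of the grand total, and the inverse of a triangular fuzzy number (l, m, u) is taken as (1/u, 1/m, 1/l). The function name and the 2 × 2 matrix are hypothetical, and the patent's own formula may differ.

```python
def synthetic_extents(matrix):
    """Fuzzy synthetic extent of each row of a triangular-fuzzy judgment matrix.

    Each cell is a triangular fuzzy number (l, m, u). Assumed form:
    S_i = row_sum_i (x) inverse(total), inverse((L, M, U)) = (1/U, 1/M, 1/L).
    """
    row_sums = [
        tuple(sum(cell[k] for cell in row) for k in range(3)) for row in matrix
    ]
    total = tuple(sum(rs[k] for rs in row_sums) for k in range(3))
    return [(l / total[2], m / total[1], u / total[0]) for (l, m, u) in row_sums]


one = (1.0, 1.0, 1.0)
# Hypothetical 2 x 2 judgment matrix with reciprocal off-diagonal entries.
matrix = [
    [one, (1.0, 2.0, 3.0)],
    [(1 / 3, 1 / 2, 1.0), one],
]
extents = synthetic_extents(matrix)
for s in extents:
    print(tuple(round(v, 3) for v in s))
```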

The comparison between two storage nodes m and n can be expressed as $r_{mn} = (a, b, c)$, where the middle value b lies between a and c and indicates the degree of importance, and the two bounds a and c indicate the degree of fuzziness: the larger the difference b − a, the fuzzier the comparison between the two nodes, and a difference of 0 means the comparison is not fuzzy. Similarly, $r_{nm}$ indicates the importance of node n relative to node m.

The comparison measure of each evaluation index against every other evaluation index can then be calculated: $V(S_1 \ge S_2) = 0.65$. Likewise, $V(S_1 \ge S_5) = 0.417$, $V(S_2 \ge S_3) = 0.235$, $V(S_2 \ge S_4) = 0.228$, $V(S_2 \ge S_5) = 0.762$, $V(S_5 \ge S_3) = 0.396$, $V(S_5 \ge S_4) = 0.391$, and all remaining comparison values are 1.
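The formula behind these V values is missing from the extracted text. One common definition in fuzzy extent analysis is the degree of possibility that one triangular fuzzy number is at least as large as another; the sketch below assumes that definition. The function name and the sample numbers are hypothetical and are not the embodiment's data.

```python
def possibility(s2, s1):
    """Degree of possibility V(S2 >= S1) for triangular fuzzy numbers (l, m, u).

    Assumed conventional form: 1 if m2 >= m1, 0 if l1 >= u2,
    otherwise the height of the intersection of the two membership functions.
    """
    l1, m1, u1 = s1
    l2, m2, u2 = s2
    if m2 >= m1:
        return 1.0
    if l1 >= u2:
        return 0.0
    return (l1 - u2) / ((m2 - u2) - (m1 - l1))


# Hypothetical synthetic extents.
s_low = (0.2, 0.3, 0.6)
s_high = (0.3, 0.5, 0.8)
v = possibility(s_low, s_high)
print(round(v, 3), possibility(s_high, s_low))
```

Under this definition most comparisons evaluate to exactly 1, which matches the pattern in the embodiment where only a few V values are below 1.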

Then, using the formula $d(P) = \min V(P \ge P_x)$, $x = 1, 2, \ldots, n$, $P \ne P_i$, the weight $d(C_i)$ of each evaluation factor is obtained:

$d(C_1) = V(S_1 \ge S_2, S_3, S_4, S_5) = \min(0.65, 1, 1, 0.417) = 0.417$

$d(C_2) = V(S_2 \ge S_1, S_3, S_4, S_5) = \min(1, 0.235, 0.228, 0.762) = 0.228$

$d(C_3) = V(S_3 \ge S_1, S_2, S_4, S_5) = \min(1, 1, 1, 1) = 1$

$d(C_4) = V(S_4 \ge S_1, S_2, S_3, S_5) = \min(1, 1, 1, 1) = 1$

$d(C_5) = V(S_5 \ge S_1, S_2, S_3, S_4) = \min(1, 1, 0.396, 0.391) = 0.391$
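The five minima above can be reproduced directly from the pairwise V values quoted in the embodiment, with every unlisted comparison set to 1. The table layout and variable names below are illustrative choices, not part of the source.

```python
n = 5
# V[i][j] holds V(S_{i+1} >= S_{j+1}); unlisted comparisons default to 1.
V = [[1.0] * n for _ in range(n)]
given = {
    (1, 2): 0.65, (1, 5): 0.417,
    (2, 3): 0.235, (2, 4): 0.228, (2, 5): 0.762,
    (5, 3): 0.396, (5, 4): 0.391,
}
for (i, j), value in given.items():
    V[i - 1][j - 1] = value

# d(C_i) is the minimum of V(S_i >= S_x) over all x other than i.
d = [min(V[i][j] for j in range(n) if j != i) for i in range(n)]
print(d)
```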

Here P denotes a quantity that measures how likely a random event is to occur in the probability formula, and $P_n$ is the probability corresponding to $S_n$.

Normalizing the weights and checking their sum gives:

$d'(C_1) + d'(C_2) + d'(C_3) + d'(C_4) + d'(C_5) = 0.137 + 0.075 + 0.329 + 0.329 + 0.13 = 1$

where $d'(C_i)$ is the value of $d(C_i)$ divided by the sum of $d(C_1)$ through $d(C_5)$.
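This normalization can be checked numerically from the five d(C_i) values of the embodiment. The variable names are illustrative; note that the last weight rounds to 0.129 at three decimals, which the embodiment reports as 0.13.

```python
# Unnormalized weights d(C_1)..d(C_5) from the embodiment.
d = [0.417, 0.228, 1.0, 1.0, 0.391]

total = sum(d)
weights = [x / total for x in d]  # d'(C_i) = d(C_i) / sum of all d(C_k)
rounded = [round(w, 3) for w in weights]
print(rounded, round(sum(weights), 6))
```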

From the above, solving for the 5 parameters (routing hop count from the client to each storage node, network packet loss rate, CPU load, memory usage, and disk space usage) gives the weight vector A:

$A = (a_1, a_2, a_3, a_4, a_5) = (0.137, 0.075, 0.329, 0.329, 0.13)$

Replacing $(r_1, r_2, r_3, r_4, r_5)$ in the triangular fuzzy number judgment matrix R with the weight vector A gives the weighted evaluation matrix Z.

d. Obtaining the positive and negative ideal values: the positive and negative ideal values are found on the weighted evaluation matrix Z, on which the TOPSIS algorithm can be used to rank the storage nodes. The ideal values are denoted $Z^+$ and $Z^-$, where $Z^+$ consists of the maximum value of each evaluation index in the weighted evaluation matrix Z and $Z^-$ consists of the minimum value of each evaluation index in Z.

The weighted Manhattan distance formula is then used to calculate the distance from each storage node to the positive and negative ideal values:

$$D_i^+ = \sum_{j=1}^{5} a_j \left| x_{ij} - z_j^+ \right|, \qquad D_i^- = \sum_{j=1}^{5} a_j \left| x_{ij} - z_j^- \right|$$

where $i = 1, 2, \ldots, m$, $a_j$ is the weight of the j-th evaluation index, $x_{ij}$ is the value of the j-th evaluation index of the i-th storage node, $z_j^+$ and $z_j^-$ are the positive and negative ideal values of the j-th evaluation index, $D_i^+$ is the distance from storage node i to the positive ideal value, and $D_i^-$ is the distance from storage node i to the negative ideal value. The size of $D_i^+$ reflects how far storage node i is from the positive ideal value: the smaller it is, the closer the storage node is to the positive ideal value. Likewise, $D_i^-$ reflects the distance between storage node i and the negative ideal value.
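A minimal sketch of this distance step follows, assuming the weighted Manhattan distance takes its usual form, a weighted sum of absolute per-index differences. The 3 × 2 matrix, the weights, and the function names are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def ideal_values(matrix):
    """Column-wise maxima (positive ideal) and minima (negative ideal)."""
    columns = list(zip(*matrix))
    return [max(c) for c in columns], [min(c) for c in columns]


def weighted_manhattan(row, ideal, weights):
    """Weighted sum of absolute differences between a node and an ideal point."""
    return sum(a * abs(x - z) for a, x, z in zip(weights, row, ideal))


# Hypothetical weighted evaluation matrix: 3 nodes, 2 evaluation indices.
Z = [
    [0.2, 0.9],
    [0.6, 0.5],
    [1.0, 0.1],
]
weights = [0.25, 0.75]  # hypothetical index weights a_j

z_pos, z_neg = ideal_values(Z)
d_pos = [weighted_manhattan(row, z_pos, weights) for row in Z]
d_neg = [weighted_manhattan(row, z_neg, weights) for row in Z]
print([round(v, 3) for v in d_pos], [round(v, 3) for v in d_neg])
```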

Closeness is then used to define overall performance, and the comprehensive evaluation value $C_i$ of each storage node is calculated: $C_i = \dfrac{D_i^-}{D_i^+ + D_i^-}$. The overall performance of a storage node is negatively correlated with its $C_i$ value. In this embodiment the calculation parameters of each storage node are the routing hop count from the client to the server, the network packet loss rate, the CPU load, the memory usage, and the remaining storage capacity, and smaller values of all of these are better, so the optimal storage node is the one with the smallest distance $D_i^-$ to the negative ideal value. When $C_i$ equals 0, the storage node's distance to the negative ideal value is 0 and it is the optimal node. The storage nodes are therefore ranked by their $C_i$ values, and the storage node with the smallest $C_i$ is selected as the responding storage node for the scheduling request.
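The final ranking step can be sketched as below. The closeness formula $C_i = D_i^- / (D_i^+ + D_i^-)$ is an assumption here: it is the usual TOPSIS closeness and agrees with the statement that $C_i$ is 0 when the distance to the negative ideal value is 0, but the source's own formula did not survive extraction. Node names and distances are hypothetical.

```python
def closeness(d_pos, d_neg):
    """Assumed closeness: distance to negative ideal over total distance."""
    total = d_pos + d_neg
    if total == 0:
        raise ValueError("node coincides with both ideal values")
    return d_neg / total


# Hypothetical (D_i^+, D_i^-) pairs for three storage nodes.
distances = {
    "node-a": (0.2, 0.6),
    "node-b": (0.4, 0.4),
    "node-c": (0.8, 0.0),
}
scores = {name: closeness(dp, dn) for name, (dp, dn) in distances.items()}

# Per the embodiment, the node with the smallest C_i is selected.
ranking = sorted(scores, key=scores.get)
selected = ranking[0]
print(ranking, selected)
```

Selecting the smallest value reflects the embodiment's convention that all five parameters are cost-type quantities; with benefit-type parameters the ordering would be reversed.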

Claims

1. the method for distributed storage scheduling, feature include:

A. it establishes evaluation index: according to scheduling request, collecting on the influential factor of evaluation of scheduling, then analyze each institute's commentary The correlation of valence factor and scheduling obtains evaluations matrix relevant to m memory node, and wherein m is natural number；

B. data normalization is handled: after eliminating dimensional effect to each data in the evaluations matrix by standardization formula, being obtained Evaluation index value matrix after to standardization；

C. compared two-by-two using Triangular Fuzzy Number: by Triangular Fuzzy Number to each storage in the evaluation index value matrix After node compares two-by-two, the weight vectors of the judgment matrix being made of Triangular Fuzzy Number and each factor of evaluation are obtained, and obtain Weighting evaluation matrix does normalized to the weight vectors and obtains the weight of each factor of evaluation；

D. it acquires positive and negative ideal value: acquiring positive and negative ideal value on the weighting evaluation matrix, pass through the manhatton distance of weighting Formula calculates each memory node in weighting evaluation matrix respectively and reuses the degree of approach to the distance of positive and negative ideal value to define synthesis Performance calculates the comprehensive evaluation value of each memory node, is carried out according to the comprehensive evaluation value of each memory node to each memory node Sequence, selects response memory node of the smallest memory node of comprehensive evaluation value as scheduling request.

2. the method for distributed storage scheduling as described in claim 1, it is characterized in that: the factor of evaluation includes client Factor, network factors and server-side factor.

3. the method for distributed storage scheduling as claimed in claim 2, it is characterized in that: client factor includes client away from clothes The hop count at business end；Network factors include whether packet loss in network transmission process and network are unimpeded；Server-side factor packet Include processor load, memory usage and memory space occupancy.

4. the method that the distributed storage as described in one of claims 1 to 3 is dispatched, it is characterized in that: pass through Pearson came in step a Product moment correlation coefficient analyzes the correlation of each factor of evaluation and scheduling.

5. the method that the distributed storage as described in one of claims 1 to 3 is dispatched, it is characterized in that: in step b, according to evaluation The superior degree of each data in matrix and the corresponding relationship of data value size carry out dimension effect using different standardization formula Answer Processing for removing.