CN110535898A - Copy storage, completion, node selecting method and management system in big data storage - Google Patents
- Publication number
- CN110535898A (application CN201810545954.9A)
- Authority
- CN
- China
- Prior art keywords
- node
- copy
- disk
- evaluation index
- storage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
- H04L67/1008—Server selection for load balancing based on parameters of servers, e.g. available memory or workload
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Landscapes
- Engineering & Computer Science
- Computer Networks & Wireless Communication
- Signal Processing
- Computer Hardware Design
- General Engineering & Computer Science
- Information Retrieval, DB Structures and FS Structures Therefor
Abstract
The present invention relates to a copy storage method, a copy completion method, a node selecting method and a management system in big data storage. The node selecting method is as follows: evaluation indexes for copy storage nodes are chosen according to the real-time status information and historical failure information of each data node server, so that a predicted value of the probability that a data node fails is included among the evaluation indexes; the weight of each evaluation index is determined, and a data node is obtained by weighted calculation and used for copy storage. Based on this node selecting method, suitable nodes are selected for copy storage according to a three-copy scheme. When a copy fails and needs completion, completion is first carried out on an active node in the rack where the failed node is located; when that rack cannot work normally, an active node with a similar failure rate is selected for completion. Without affecting copy safety, the invention effectively improves write efficiency and load balance during storage, and fundamentally addresses the need for load balancing after a cluster has run for a long time.
Description
Technical field
The invention belongs to the technical field of big data storage and cloud computing, and in particular relates to a copy storage method, a copy completion method, a node selecting method and a management system in big data storage.
Background art
Existing big data storage systems are generally distributed storage systems (such as HDFS), in which node failures and hardware faults must be considered; the use of replication technology guarantees the reliability and availability of the system. Appropriate placement and selection of data copies also makes data access more efficient. Big data storage generally adopts a three-copy strategy, which guarantees data safety and effectively supports distributed computing. However, when the locality of data access is considered, an unreasonable copy distribution affects computations that demand high data locality: tasks may be assigned to machines that hold a copy but have lower performance, which in turn degrades the performance of the entire cluster.
The load balancing mentioned in current common copy storage strategies mainly works from the angle of data volume: existing data is rebalanced after the load has become unbalanced, and such load balancing is in essence the transfer of copies. Load balancing can therefore only be regarded as a remedy for an unreasonable copy storage strategy. Ideally, the position of a copy should be selected or adjusted autonomously, at placement time, according to the current state of the cluster.
Existing copy storage strategies have also proposed placement methods based on rack, hard disk usage and load, but the influence factors each method refers to are relatively limited, so load balancing and storage efficiency still cannot both be taken into account. The problem is more obvious in heterogeneous clusters: for example, some lower-performance machines may carry excessive load while some higher-performance machines sit idle. Copy replication and transfer in a big data storage cluster are usually caused by server hardware faults. The design service life of a server is generally 5 to 7 years, and the actual service life of a single server depends on its batch, use intensity and use environment. Current copy storage methods do not take server failure or aging into account. For example, the reference "Improved HDFS copy placement strategy based on support vector machines" (Luo Jun et al., Computer Engineering, Vol. 41, No. 11, November 2015) considers only five factors: relative load rate, network distance, disk performance, CPU performance and memory. When a server fails, the resulting data replication and migration are random, which leads to disordered copy placement.
From the perspective of operations research, the copy storage strategy of big data can be regarded as a decision-making problem, and one that is difficult to analyze quantitatively. For such problems there is the analytic hierarchy process (AHP): the evaluation indexes involved are compared with one another, the relative importance of the groups of relationships is analyzed quantitatively, and the result is finally obtained by weighting the influence factors. In theory, an ideal result can be obtained as long as suitable weights are provided.
Summary of the invention
The present invention provides a copy storage method, a copy completion method, a node selecting method and a management system in big data storage, to solve the problems that existing cluster copy storage strategies do not effectively perceive the state of each data node server, that the influence factors they refer to are relatively limited, and that load balancing and storage efficiency cannot both be taken into account.
To solve the above technical problems, the method of the present invention for selecting copy storage nodes in big data storage includes the following four schemes:
Scheme 1: evaluation indexes of copy storage nodes are chosen according to the real-time status information and historical failure information of each data node server, the indexes comprising disk usage rate, disk I/O load rate, CPU load rate, memory load rate, read/write task connection rate and node failure rate, where the read/write task connection rate is the ratio of the number of read/write task connections on the current server to the maximum number of read/write task connections allowed by the file system; the weight of each evaluation index is determined, and a reference value is then calculated according to the following formula to select a data node as the copy storage position:
$$\omega=\lambda_0\omega_{disk\_used}+\lambda_1\omega_{disk\_io}+\lambda_2\omega_{cpu}+\lambda_3\omega_{mem}+\lambda_4\omega_{process}+\lambda_5\omega_{fr}$$
where $\omega$ is the data node selection reference value; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$ and $\omega_{fr}$ are respectively the disk usage rate, disk I/O load rate, CPU load rate, memory load rate, read/write task connection rate and node failure rate; $\lambda_0+\lambda_1+\lambda_2+\lambda_3+\lambda_4+\lambda_5=1$; and $\lambda_0,\lambda_1,\lambda_2,\lambda_3,\lambda_4,\lambda_5,\omega_{disk\_used},\omega_{disk\_io},\omega_{cpu},\omega_{mem},\omega_{process},\omega_{fr}\in[0,1]$.
Scheme 2: on the basis of Scheme 1, the weight of each evaluation index is determined using the analytic hierarchy process: the weights of the evaluation indexes are described as a judgment matrix, the evaluation indexes are layered, the relationships between layers are analyzed quantitatively, and finally a normalized eigenvector is computed for the judgment matrix; when the cluster is perceived to be handling a different task, the corresponding matrix is matched adaptively to correct to a suitable copy placement position.
Scheme 3: on the basis of Scheme 1, the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in use to its design service life.
Scheme 4: on the basis of Scheme 2, the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in use to its design service life.
The copy storage method in big data storage of the present invention includes the following four schemes:
Scheme 1: the default number of copies in this method is 3, where two copies are stored on different nodes in the same rack and the other is stored on a node in a different rack. When copy storage starts, if the client is a data node, the first copy is placed on that node; if the client is not a data node, a node is selected from the nodes on all racks, according to the above selection method for copy storage nodes in big data storage, to place the first copy. A node in a rack different from that of the first copy is then selected, according to the same selection method, to place the second copy, and a different node in the same rack as the second copy is selected, according to the same selection method, to place the third copy.
Scheme 2: on the basis of Scheme 1, the weight of each evaluation index is determined using the analytic hierarchy process: the weights of the evaluation indexes are described as a judgment matrix, the evaluation indexes are layered, the relationships between layers are analyzed quantitatively, and finally a normalized eigenvector is computed for the judgment matrix; when the cluster is perceived to be handling a different task, the corresponding matrix is matched adaptively to correct to a suitable copy placement position.
Scheme 3: on the basis of Scheme 1, the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in use to its design service life.
Scheme 4: on the basis of Scheme 2, the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in use to its design service life.
The copy completion method in big data storage of the present invention includes the following five schemes:
Scheme 1: when the number of copies needing completion is less than 3, the rack where each lost copy was located is obtained, and it is judged whether an active node exists in the rack of each failed node; if an active node exists, a data node is selected among these active nodes according to a set node selecting method for copy completion; if no active node exists, a node is selected, according to the set node selecting method, from nodes having the same failure rate as the failed node for copy completion.
Scheme 2: on the basis of Scheme 1, the set node selecting method is as follows: evaluation indexes of copy storage nodes are chosen according to the real-time status information and historical failure information of each data node server, the indexes comprising disk usage rate, disk I/O load rate, CPU load rate, memory load rate, read/write task connection rate and node failure rate, where the read/write task connection rate is the ratio of the number of read/write task connections on the current server to the maximum number of read/write task connections allowed by the file system; the weight of each evaluation index is determined, and a reference value is then calculated according to the following formula to select a data node as the copy storage position:
$$\omega=\lambda_0\omega_{disk\_used}+\lambda_1\omega_{disk\_io}+\lambda_2\omega_{cpu}+\lambda_3\omega_{mem}+\lambda_4\omega_{process}+\lambda_5\omega_{fr}$$
where $\omega$ is the data node selection reference value; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$ and $\omega_{fr}$ are respectively the disk usage rate, disk I/O load rate, CPU load rate, memory load rate, read/write task connection rate and node failure rate; $\lambda_0+\lambda_1+\lambda_2+\lambda_3+\lambda_4+\lambda_5=1$; and $\lambda_0,\lambda_1,\lambda_2,\lambda_3,\lambda_4,\lambda_5,\omega_{disk\_used},\omega_{disk\_io},\omega_{cpu},\omega_{mem},\omega_{process},\omega_{fr}\in[0,1]$.
Scheme 3: on the basis of Scheme 2, the weight of each evaluation index is determined using the analytic hierarchy process: the weights of the evaluation indexes are described as a judgment matrix, the evaluation indexes are layered, the relationships between layers are analyzed quantitatively, and finally a normalized eigenvector is computed for the judgment matrix; when the cluster is perceived to be handling a different task, the corresponding matrix is matched adaptively to correct to a suitable copy placement position.
Scheme 4: on the basis of Scheme 2, the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in use to its design service life.
Scheme 5: on the basis of Scheme 3, the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in use to its design service life.
The copy management system in big data storage of the present invention includes the following four schemes:
Scheme 1: the system can realize the following function: evaluation indexes of copy storage nodes are chosen according to the real-time status information and historical failure information of each data node server, the indexes comprising disk usage rate, disk I/O load rate, CPU load rate, memory load rate, read/write task connection rate and node failure rate, where the read/write task connection rate is the ratio of the number of read/write task connections on the current server to the maximum number of read/write task connections allowed by the file system; the weight of each evaluation index is determined, and a reference value is then calculated according to the following formula to select a data node as the copy storage position:
$$\omega=\lambda_0\omega_{disk\_used}+\lambda_1\omega_{disk\_io}+\lambda_2\omega_{cpu}+\lambda_3\omega_{mem}+\lambda_4\omega_{process}+\lambda_5\omega_{fr}$$
where $\omega$ is the data node selection reference value; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$ and $\omega_{fr}$ are respectively the disk usage rate, disk I/O load rate, CPU load rate, memory load rate, read/write task connection rate and node failure rate; $\lambda_0+\lambda_1+\lambda_2+\lambda_3+\lambda_4+\lambda_5=1$; and $\lambda_0,\lambda_1,\lambda_2,\lambda_3,\lambda_4,\lambda_5,\omega_{disk\_used},\omega_{disk\_io},\omega_{cpu},\omega_{mem},\omega_{process},\omega_{fr}\in[0,1]$.
Scheme 2: on the basis of Scheme 1, the weight of each evaluation index is determined using the analytic hierarchy process: the weights of the evaluation indexes are described as a judgment matrix, the evaluation indexes are layered, the relationships between layers are analyzed quantitatively, and finally a normalized eigenvector is computed for the judgment matrix; when the cluster is perceived to be handling a different task, the corresponding matrix is matched adaptively to correct to a suitable copy placement position.
Scheme 3: on the basis of Scheme 1, the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in use to its design service life.
Scheme 4: on the basis of Scheme 2, the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in use to its design service life.
The beneficial effects of the invention are as follows. The invention perceives the real-time status information and historical failure information of each data node server and provides the management node with more reliable data nodes for copy storage. Without affecting copy safety, it effectively improves write efficiency and load balance during storage, and fundamentally addresses the need for load balancing after the cluster has run for a long time.
The invention includes server failure information among the evaluation indexes and uses the predicted value of the probability that a data node fails as a reference factor for data copy storage. When a data node actually fails, the data copies that need completion are completed according to the set method, which avoids disordered data storage and minimizes the overhead of copy completion. To a certain extent this substitutes for the function of rack awareness.
When a copy fails and needs completion, completion is preferentially carried out on an active node in the rack where the failed node is located. When that rack cannot work normally, an active node with a similar failure rate is preferentially selected for copy completion. Considering the batch factor in server deployment, this ensures as far as possible that the node on which the copy is completed and the failed node are from the same batch, and thus that copies are stored in nearby positions.
Brief description of the drawings
Fig. 1 is a design reference model diagram of the copy storage node selecting method in big data storage of the present invention;
Fig. 2 is a flow chart of copy storage in big data storage of the present invention;
Fig. 3 is a flow chart of copy completion in big data storage of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is explained in further detail below with reference to the accompanying drawings.
Embodiment of the copy storage node selecting method in big data storage of the present invention
The method of this embodiment is as follows: the real-time status information and historical failure information of each data node server are perceived, and evaluation indexes of copy storage nodes are chosen according to this information, the indexes comprising disk usage rate, disk I/O load rate, CPU load rate, memory load rate, read/write task connection rate and node failure rate; the weight of each evaluation index is determined, and a reference value is then calculated according to the following formula to select a data node as the copy storage position:
$$\omega=\lambda_0\omega_{disk\_used}+\lambda_1\omega_{disk\_io}+\lambda_2\omega_{cpu}+\lambda_3\omega_{mem}+\lambda_4\omega_{process}+\lambda_5\omega_{fr}$$
where $\omega$ is the data node selection reference value; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$ and $\omega_{fr}$ are respectively the disk usage rate, disk I/O load rate, CPU load rate, memory load rate, read/write task connection rate and node failure rate; $\lambda_0+\lambda_1+\lambda_2+\lambda_3+\lambda_4+\lambda_5=1$; and $\lambda_0,\lambda_1,\lambda_2,\lambda_3,\lambda_4,\lambda_5,\omega_{disk\_used},\omega_{disk\_io},\omega_{cpu},\omega_{mem},\omega_{process},\omega_{fr}\in[0,1]$.
Here, $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$ and $\omega_{mem}$ can be obtained from the operating system of the corresponding cluster server. The read/write task connection rate $\omega_{process}$ is the ratio of the number of read/write task connections on the current server to the maximum number of read/write task connections allowed by the file system. The parameter $\omega_{fr}$ can be calculated as the ratio of the server's fault time to its online running time. In clusters that keep no failure records, this parameter can instead be designed as the ratio of years in use to design service life, so that, within the same data center, the model for this parameter depends on the production time and deployment time of the servers.
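The calculation described above can be illustrated with a minimal Python sketch. It is not part of the patent text: the names `NodeMetrics`, `connection_rate`, `failure_rate`, `reference_value` and `select_node` are hypothetical, and the rule of preferring the node with the lowest reference value is an assumption (all six indexes measure load or failure, but the text does not state the selection direction).

```python
from dataclasses import dataclass, astuple


@dataclass
class NodeMetrics:
    """The six evaluation indexes of one data node, each in [0, 1]."""
    disk_used: float
    disk_io: float
    cpu: float
    mem: float
    process: float  # read/write task connection rate
    fr: float       # node failure rate


def connection_rate(current: int, allowed_max: int) -> float:
    """Current read/write connections divided by the allowed maximum."""
    return min(current / allowed_max, 1.0)


def failure_rate(fault_time=None, online_time=None,
                 years_used=None, design_life=None) -> float:
    """Fault time / online time when failure records exist,
    otherwise years in use / design service life."""
    if fault_time is not None and online_time:
        return min(fault_time / online_time, 1.0)
    return min(years_used / design_life, 1.0)


def reference_value(metrics: NodeMetrics, weights) -> float:
    """Weighted sum of the six indexes (weights sum to 1)."""
    values = astuple(metrics)
    if len(weights) != len(values) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("six weights summing to 1 are required")
    if any(not 0.0 <= x <= 1.0 for x in (*weights, *values)):
        raise ValueError("weights and indexes must lie in [0, 1]")
    return sum(w * v for w, v in zip(weights, values))


def select_node(candidates, weights):
    """candidates: dict of node name -> NodeMetrics.
    ASSUMPTION: the node with the lowest reference value is preferred."""
    return min(candidates,
               key=lambda name: reference_value(candidates[name], weights))
```

The clamping to 1.0 in the two ratio helpers is also an illustrative choice, made so that the inputs stay inside the stated [0, 1] range.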
Once the weight of each evaluation index is determined, the copy placement strategy is essentially determined. The weights can be modified as needed; after modification, the scheme can degenerate into a method that places copies according to a single evaluation index, to suit special working scenarios. The weights of the evaluation indexes can be described as a judgment matrix; when the cluster is perceived to be handling a different task, the corresponding matrix is matched adaptively to correct to a suitable copy position.
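A small Python sketch can make the two ideas in this paragraph concrete: matching a weight vector to the kind of task, and degenerating to a single evaluation index. The task labels and all numeric weights below are invented for illustration only; the patent does not give concrete values.

```python
INDEXES = ("disk_used", "disk_io", "cpu", "mem", "process", "fr")

# Hypothetical weight profiles, one per task type (illustrative values only).
# Each tuple follows the order of INDEXES and sums to 1.
WEIGHT_PROFILES = {
    "io_intensive": (0.15, 0.40, 0.10, 0.10, 0.15, 0.10),
    "cpu_intensive": (0.10, 0.10, 0.45, 0.15, 0.10, 0.10),
}


def single_index_weights(index: str):
    """Degenerate case: all weight on one evaluation index."""
    if index not in INDEXES:
        raise ValueError(f"unknown evaluation index: {index}")
    return tuple(1.0 if name == index else 0.0 for name in INDEXES)


def weights_for(task_type: str):
    """Match the weight profile for the perceived task type.
    ASSUMPTION: unknown task types fall back to equal weights."""
    equal = tuple(1.0 / len(INDEXES) for _ in INDEXES)
    return WEIGHT_PROFILES.get(task_type, equal)
```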
The weights of the evaluation indexes are represented by the matrix [A B C D E F], where A, B, C, D, E and F respectively denote the above six evaluation indexes. The evaluation indexes are layered and the relationships between layers are analyzed quantitatively; for example, the lowercase ab denotes the relative relationship between layers A and B. The weight of each evaluation index can be determined from the relative relationships between the evaluation indexes in Table 1: a normalized eigenvector is finally computed for the judgment matrix, which can be calculated from Table 1. For example, if evaluation index A is given priority when the cluster is perceived to be handling a different task, evaluation index A takes the largest share, and the weights of the other five evaluation indexes are then set according to the relationships between A and those five indexes. After the weights are set, the copy placement position is obtained from the reference value calculated from each node's information. The design idea is shown in Fig. 1.
Table 1. Relative-relationship matrix of the evaluation indexes
Index | A | B | C | D | E | F |
A | 1 | ab | ac | ad | ae | af |
B | 1/ab | 1 | bc | bd | be | bf |
C | 1/ac | 1/bc | 1 | cd | ce | cf |
D | 1/ad | 1/bd | 1/cd | 1 | de | df |
E | 1/ae | 1/be | 1/ce | 1/de | 1 | ef |
F | 1/af | 1/bf | 1/cf | 1/df | 1/ef | 1 |
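As an illustration of how weights might be derived from a reciprocal matrix of the form shown in Table 1, the following Python sketch builds the matrix from its upper-triangle entries and approximates its normalized principal eigenvector by power iteration. Power iteration is a common way to do this in the analytic hierarchy process, but the patent does not specify the numerical method, so treat it as an assumption; the function names are hypothetical.

```python
def judgment_matrix(upper, n=6):
    """Build an n x n reciprocal matrix from upper-triangle comparisons.

    upper: dict mapping (i, j) with i < j to the relative importance of
    index i over index j (the entries ab, ac, ... of the table).
    """
    m = [[1.0] * n for _ in range(n)]
    for (i, j), value in upper.items():
        m[i][j] = value
        m[j][i] = 1.0 / value  # reciprocal entry below the diagonal
    return m


def ahp_weights(matrix, iterations=100):
    """Approximate the normalized principal eigenvector by power iteration.

    The returned weights are non-negative and sum to 1, so they can be
    used directly as the six lambda coefficients.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [x / total for x in nxt]
    return w
```

For a perfectly consistent matrix, where every entry equals the ratio of two underlying weights, the procedure recovers those weights. In practice a consistency check on the judgment matrix is usually added; it is omitted here for brevity.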
Embodiment of the copy storage method in big data storage of the present invention
The copy storage method of this embodiment stores copies according to a three-copy scheme and guarantees copy reliability on the principle of placing the copies in two racks, i.e. two copies are stored on different nodes in the same rack and the other is stored on a node in a different rack. When copy storage starts, if the client is a data node, the first copy is placed on that node; if the client is not a data node, a node is selected from the nodes on all racks, according to the selection method for copy storage nodes in the above embodiment, to place the first copy. A node is then selected, by the same selection method, from the different nodes in the same rack as the first copy to place the second copy, and a node is selected, by the same selection method, from the nodes in racks other than the rack holding the first and second copies to place the third copy.
The specific storage process is shown in Fig. 2. When the number of copies to be stored is greater than 0:
Step 1) judge whether the copy to be stored is the first copy; if so, go to step 2), otherwise go to step 3);
Step 2) judge whether the client is a data node; if so, go to step 4), otherwise go to step 5);
Step 3) judge whether the copy to be stored is the second copy; if so, go to step 6); otherwise it is the third copy, go to step 7);
Step 4) select this data node (the client's own node) for placing the first copy;
Step 5) select a node from the nodes on all racks, according to the selection method for copy storage nodes in the above embodiment, for placing the first copy;
Step 6) select a node from the nodes in a rack different from that of the first copy, according to the selection method for copy storage nodes in the above embodiment, for placing the second copy;
Step 7) judge whether the first copy and the second copy are stored in the same rack; if so, go to step 8), otherwise go to step 9);
Step 8) select a node from the nodes in racks other than the rack holding the first and second copies, according to the selection method for copy storage nodes in the above embodiment, for placing the third copy; go to step 10);
Step 9) select a node from the different nodes in the same rack as the second copy, according to the selection method for copy storage nodes in the above embodiment, for placing the third copy; go to step 10);
Step 10) placement of the three copies is complete, and the process ends.
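Steps 1) to 10) can be sketched in Python as follows. This is an illustrative reading of the flow chart, not the patent's implementation: the function names are hypothetical, `score` stands in for the reference value of a node, and preferring the lowest score is an assumption.

```python
def pick(candidates, score):
    """Choose one node. ASSUMPTION: the lowest reference value is preferred."""
    if not candidates:
        raise ValueError("no candidate node available")
    return min(candidates, key=score)


def place_three_copies(nodes, score, client=None):
    """Return the nodes chosen for the first, second and third copy.

    nodes:  dict of node name -> rack name
    score:  callable giving the reference value of a node name
    client: name of the writing client, if it is itself a data node
    """
    # Steps 2, 4, 5: the first copy goes to the client if it is a data
    # node, otherwise to the best node across all racks.
    if client in nodes:
        first = client
    else:
        first = pick(list(nodes), score)

    # Step 6: the second copy goes to a rack different from the first copy's.
    second = pick([n for n in nodes if nodes[n] != nodes[first]], score)

    # Step 7: with step 6 as sketched here the two racks always differ;
    # the check is kept only to mirror the flow chart.
    if nodes[first] == nodes[second]:
        # Step 8: a rack other than the one holding copies one and two.
        third = pick([n for n in nodes if nodes[n] != nodes[first]], score)
    else:
        # Step 9: a different node in the same rack as the second copy.
        third = pick([n for n in nodes
                      if nodes[n] == nodes[second] and n != second], score)

    # Step 10: placement complete.
    return [first, second, third]
```

With racks `r1 = {a1, a2}` and `r2 = {b1, b2}` and a client running on `a1`, the sketch keeps the first copy on `a1` and places the other two copies on the two nodes of `r2`.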
Embodiment of the copy completion method in big data storage of the present invention
The completion method of this embodiment is as follows: when the number of copies needing completion is less than 3, the rack where each lost copy was located is obtained, and it is judged whether an active node exists in the rack of each failed node; if an active node exists, a data node is selected among these active nodes according to a set node selecting method for copy completion; if no active node exists, a node is selected, according to the set node selecting method, from nodes having the same failure rate as the failed node for copy completion.
The specific completion process is shown in Fig. 3. When the number of copies needing completion is greater than 0, judge whether that number equals 3; if it equals 3, go to step 4); if it is less than 3, go to step 1);
Step 1) obtain the rack where each lost copy was located, and judge whether an active node exists in the rack of each failed node; if an active node exists, go to step 2), otherwise go to step 3);
Step 2) enumerate the active nodes in the rack where the failed node is located, and calculate a suitable node according to the set node selecting method to carry out copy completion;
Step 3) enumerate the nodes having the same failure rate as the failed node, and calculate a suitable node according to the set node selecting method to carry out copy completion;
Step 4) a bad block is generated, and an alarm is raised.
The set node selecting method in steps 2) and 3) above may be the above selection method for copy storage nodes in big data storage, or another node selecting method in the prior art, which is not described in detail here.
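The completion flow of steps 1) to 4) can be sketched in Python as below. Several details are assumptions made for illustration and are not stated in the patent: the function and exception names are hypothetical, "same failure rate" is approximated by a tolerance `tol`, nodes that already hold a copy are skipped via `holders`, and the lowest `score` is preferred.

```python
class BadBlockError(Exception):
    """Step 4: every copy of the block is lost; a bad block is reported."""


def complete_copies(failed, racks, active, failure_rate, score,
                    holders=(), tol=0.05):
    """Return one target node per lost copy.

    failed:       names of the failed nodes that held the lost copies
    racks:        dict of node name -> rack name (failed nodes included)
    active:       names of currently active nodes
    failure_rate: dict of node name -> failure rate (failed nodes included)
    score:        callable giving the reference value of a node name
    holders:      nodes that still hold a copy (ASSUMPTION: skipped)
    tol:          hypothetical tolerance for "same failure rate"
    """
    if len(failed) >= 3:
        raise BadBlockError("all three copies lost, raising alarm")

    taken = set(holders)
    targets = []
    for bad in failed:
        usable = [n for n in active if n not in taken]
        # Steps 1 and 2: prefer active nodes in the failed node's rack.
        candidates = [n for n in usable if racks[n] == racks[bad]]
        if not candidates:
            # Step 3: fall back to nodes with the same failure rate.
            candidates = [n for n in usable
                          if abs(failure_rate[n] - failure_rate[bad]) <= tol]
        if not candidates:
            raise ValueError(f"no suitable node to complete copy from {bad}")
        target = min(candidates, key=score)  # ASSUMPTION: lowest is best
        targets.append(target)
        taken.add(target)
    return targets
```

Matching on failure rate in the fallback branch reflects the stated intent of choosing a node from the same deployment batch as the failed node.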
Embodiment of the copy management system in big data storage of the present invention
The copy management system in big data storage of this embodiment is a management platform capable of implementing the copy storage node selecting method in big data storage. For that method, see the above embodiment; it is not described again here.
Specific embodiments of the present invention are given above, but the invention is not limited to the described embodiments. Under the idea provided by the invention, technical solutions formed by transforming, replacing or modifying the technical means of the above embodiments in ways readily conceivable to those skilled in the art, where the resulting means play essentially the same role and achieve essentially the same purpose as the corresponding technical means of the invention, are merely fine adjustments of the above embodiments and still fall within the protection scope of the invention.
Claims (10)
1. A selection method for copy storage nodes in big data storage, characterized in that the method is as follows: evaluation indexes of copy storage nodes are chosen according to the real-time status information and historical failure information of each data node server, the indexes comprising disk usage rate, disk I/O load rate, CPU load rate, memory load rate, read/write task connection rate and node failure rate, where the read/write task connection rate is the ratio of the number of read/write task connections on the current server to the maximum number of read/write task connections allowed by the file system; the weight of each evaluation index is determined, and a reference value is then calculated according to the following formula to select a data node as the copy storage position:
$$\omega=\lambda_0\omega_{disk\_used}+\lambda_1\omega_{disk\_io}+\lambda_2\omega_{cpu}+\lambda_3\omega_{mem}+\lambda_4\omega_{process}+\lambda_5\omega_{fr}$$
where $\omega$ is the data node selection reference value; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$ and $\omega_{fr}$ are respectively the disk usage rate, disk I/O load rate, CPU load rate, memory load rate, read/write task connection rate and node failure rate; $\lambda_0+\lambda_1+\lambda_2+\lambda_3+\lambda_4+\lambda_5=1$; and $\lambda_0,\lambda_1,\lambda_2,\lambda_3,\lambda_4,\lambda_5,\omega_{disk\_used},\omega_{disk\_io},\omega_{cpu},\omega_{mem},\omega_{process},\omega_{fr}\in[0,1]$.
2. The selection method for copy storage nodes in big data storage according to claim 1, characterized in that the weight of each evaluation index is determined using the analytic hierarchy process: the weights of the evaluation indexes are described as a judgment matrix, the evaluation indexes are layered, the relationships between layers are analyzed quantitatively, and finally a normalized eigenvector is computed for the judgment matrix; when the cluster is perceived to be handling a different task, the corresponding matrix is matched adaptively to correct to a suitable copy placement position.
3. The selection method for copy storage nodes in big data storage according to claim 1 or 2, characterized in that the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in use to its design service life.
4. A copy storage method in big data storage using the selection method for copy storage nodes in big data storage according to claim 1, in which the default number of copies is 3, two copies are stored on different nodes in the same rack, and the other is stored on a node in a different rack, characterized in that, when copy storage starts, if the client is a data node, the first copy is placed on that node; if the client is not a data node, a node is selected from the nodes on all racks, according to the selection method for copy storage nodes in big data storage, to place the first copy; a node in a rack different from that of the first copy is then selected, according to the same selection method, to place the second copy, and a different node in the same rack as the second copy is selected, according to the same selection method, to place the third copy.
5. The copy storage method in big data storage according to claim 4, characterized in that the weight of each evaluation index is determined using the analytic hierarchy process: the weights of the evaluation indexes are described as a judgment matrix, the evaluation indexes are layered, the relationships between layers are analyzed quantitatively, and finally a normalized eigenvector is computed for the judgment matrix; when the cluster is perceived to be handling a different task, the corresponding matrix is matched adaptively to correct to a suitable copy placement position.
6. A copy completion method in big data storage, characterized in that, when the number of copies to be completed is less than 3, the rack on which each lost copy was located is obtained, and it is determined whether an active node exists on the rack of each failed node; if active nodes exist, a data node is selected from among these active nodes according to a set node selection method and used for copy completion; if no active node exists, a node is selected, according to the set node selection method, from the nodes having the same failure rate as the failed node and used for copy completion.
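The two branches of the completion rule can be sketched as follows. This is a hedged sketch: the `Node` fields and the function `choose_completion_node` are hypothetical, the set node selection method is stood in for by "lowest precomputed reference value", the fallback compares failure rates within a small tolerance, and restricting the fallback to active nodes is an assumption the claim does not spell out.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str
    rack: str
    active: bool
    failure_rate: float
    score: float  # precomputed reference value; lower assumed better


def choose_completion_node(failed: Node, nodes: List[Node],
                           tol: float = 1e-9) -> Node:
    # Branch 1: active nodes on the rack that held the lost copy.
    on_rack = [n for n in nodes
               if n.active and n.rack == failed.rack and n.name != failed.name]
    if on_rack:
        return min(on_rack, key=lambda n: n.score)
    # Branch 2: no active node on that rack, so fall back to nodes whose
    # failure rate matches that of the failed node (active only: assumption).
    peers = [n for n in nodes
             if n.active and n.name != failed.name
             and math.isclose(n.failure_rate, failed.failure_rate, abs_tol=tol)]
    if not peers:
        raise LookupError("no suitable node for copy completion")
    return min(peers, key=lambda n: n.score)
```

A production system would also need to skip nodes that already hold a copy of the same block; that check is omitted here because the claim does not describe it.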
7. The copy completion method in big data storage according to claim 6, characterized in that the set node selection method is as follows: evaluation indices for the copy storage node are chosen according to the real-time status information and historical failure information of each data node server, the indices comprising the disk usage rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate and node failure rate, where the read-write task connection rate is the ratio of the number of read-write task connections on the current server to the maximum number of read-write task connections allowed by the file system; the weight of each evaluation index is determined, and a data node is then selected as the copy storage location according to the reference value calculated by the following formula:
$$\omega = \lambda_0\,\omega_{disk\_used} + \lambda_1\,\omega_{disk\_io} + \lambda_2\,\omega_{cpu} + \lambda_3\,\omega_{mem} + \lambda_4\,\omega_{process} + \lambda_5\,\omega_{fr}$$
where $\omega$ is the data node selection reference value; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$ and $\omega_{fr}$ are, respectively, the disk usage rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate and node failure rate; $\lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 1$; and $\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \omega_{disk\_used}, \omega_{disk\_io}, \omega_{cpu}, \omega_{mem}, \omega_{process}, \omega_{fr} \in [0, 1]$.
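The weighted sum in this claim can be computed as in the sketch below. The dictionary keys mirror the subscripts in the formula; the function names, the validation behaviour and the sample numbers are assumptions chosen for illustration and are not taken from the patent.

```python
import math
from typing import Dict

INDEX_NAMES = ("disk_used", "disk_io", "cpu", "mem", "process", "fr")


def connection_rate(current_connections: int, max_connections: int) -> float:
    """Read-write task connection rate: current connections / allowed maximum."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return current_connections / max_connections


def reference_value(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the six evaluation indices for one data node."""
    for name in INDEX_NAMES:
        if not 0.0 <= metrics[name] <= 1.0 or not 0.0 <= weights[name] <= 1.0:
            raise ValueError(f"{name}: metric and weight must lie in [0, 1]")
    if not math.isclose(sum(weights[n] for n in INDEX_NAMES), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    return sum(weights[n] * metrics[n] for n in INDEX_NAMES)


# Sample values (assumed): 40 of 200 allowed connections are in use.
weights = {"disk_used": 0.3, "disk_io": 0.2, "cpu": 0.15,
           "mem": 0.15, "process": 0.1, "fr": 0.1}
metrics = {"disk_used": 0.6, "disk_io": 0.4, "cpu": 0.2,
           "mem": 0.5, "process": connection_rate(40, 200), "fr": 0.1}
value = reference_value(metrics, weights)
```

With these sample numbers the reference value is about 0.395. Since every index measures load or failure, a scheduler would plausibly prefer the node with the lowest value, although the claim as worded here does not state the direction of the comparison.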
8. The copy completion method in big data storage according to claim 7, characterized in that the weight of each evaluation index is determined using the analytic hierarchy process (AHP): the weights of the evaluation indices are described as a judgment matrix, the evaluation indices are arranged into layers, quantitative analysis is carried out on the connections between the layers, and finally a normalized eigenvector is obtained for the judgment matrix; when it is perceived that the cluster is handling different tasks, the corresponding matrix is adaptively matched so as to correct the copy placement to a suitable position.
9. A copy management system in big data storage, characterized in that the system is capable of implementing the following functions: evaluation indices for the copy storage node are chosen according to the real-time status information and historical failure information of each data node server, the indices comprising the disk usage rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate and node failure rate, where the read-write task connection rate is the ratio of the number of read-write task connections on the current server to the maximum number of read-write task connections allowed by the file system; the weight of each evaluation index is determined, and a data node is then selected as the copy storage location according to the reference value calculated by the following formula:
$$\omega = \lambda_0\,\omega_{disk\_used} + \lambda_1\,\omega_{disk\_io} + \lambda_2\,\omega_{cpu} + \lambda_3\,\omega_{mem} + \lambda_4\,\omega_{process} + \lambda_5\,\omega_{fr}$$
where $\omega$ is the data node selection reference value; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$ and $\omega_{fr}$ are, respectively, the disk usage rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate and node failure rate; $\lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 1$; and $\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \omega_{disk\_used}, \omega_{disk\_io}, \omega_{cpu}, \omega_{mem}, \omega_{process}, \omega_{fr} \in [0, 1]$.
10. The copy management system in big data storage according to claim 9, characterized in that the weight of each evaluation index is determined using the analytic hierarchy process (AHP): the weights of the evaluation indices are described as a judgment matrix, the evaluation indices are arranged into layers, quantitative analysis is carried out on the connections between the layers, and finally a normalized eigenvector is obtained for the judgment matrix; when it is perceived that the cluster is handling different tasks, the corresponding matrix is adaptively matched so as to correct the copy placement to a suitable position.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810545954.9A CN110535898B (en) | 2018-05-25 | 2018-05-25 | Method for storing and complementing copies and selecting nodes in big data storage and management system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810545954.9A CN110535898B (en) | 2018-05-25 | 2018-05-25 | Method for storing and complementing copies and selecting nodes in big data storage and management system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110535898A (en) | 2019-12-03 |
CN110535898B (en) | 2022-10-04 |
Family
ID=68657167
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810545954.9A Active CN110535898B (en) | 2018-05-25 | 2018-05-25 | Method for storing and complementing copies and selecting nodes in big data storage and management system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110535898B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112487093A (en) * | 2020-12-07 | 2021-03-12 | 浪潮云信息技术股份公司 | Decentralized copy control method for distributed database |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100106808A1 (en) * | 2008-10-24 | 2010-04-29 | Microsoft Corporation | Replica placement in a distributed storage system |
CN102667761A (en) * | 2009-06-19 | 2012-09-12 | 布雷克公司 | Scalable cluster database |
US20140052706A1 (en) * | 2011-04-29 | 2014-02-20 | Prateep Misra | Archival storage and retrieval system |
CN103678051A (en) * | 2013-11-18 | 2014-03-26 | 航天恒星科技有限公司 | On-line fault tolerance method in cluster data processing system |
CN104156381A (en) * | 2014-03-27 | 2014-11-19 | 深圳信息职业技术学院 | Copy access method and device for Hadoop distributed file system and Hadoop distributed file system |
CN104468651A (en) * | 2013-09-17 | 2015-03-25 | 南京中兴新软件有限责任公司 | Distributed multi-copy storage method and device |
CN104615606A (en) * | 2013-11-05 | 2015-05-13 | 阿里巴巴集团控股有限公司 | Hadoop distributed file system and management method thereof |
US20150186411A1 (en) * | 2014-01-02 | 2015-07-02 | International Business Machines Corporation | Enhancing Reliability of a Storage System by Strategic Replica Placement and Migration |
US20160004631A1 (en) * | 2014-07-03 | 2016-01-07 | Pure Storage, Inc. | Profile-Dependent Write Placement of Data into a Non-Volatile Solid-State Storage |
CN105915626A (en) * | 2016-05-27 | 2016-08-31 | 南京邮电大学 | Data copy initial placement method for cloud storage |
CN106302702A (en) * | 2016-08-10 | 2017-01-04 | 华为技术有限公司 | Burst storage method, the Apparatus and system of data |
CN106293492A (en) * | 2015-05-14 | 2017-01-04 | 中兴通讯股份有限公司 | A kind of memory management method and distributed file system |
CN106612322A (en) * | 2016-07-11 | 2017-05-03 | 四川用联信息技术有限公司 | Data recovery method for distribution optimization of data storing nodes in cloud storage |
US20180004414A1 (en) * | 2016-06-29 | 2018-01-04 | EMC IP Holding Company LLC | Incremental erasure coding for storage systems |
CN107729514A (en) * | 2017-10-25 | 2018-02-23 | 郑州云海信息技术有限公司 | A kind of Replica placement node based on hadoop determines method and device |
CN108009260A (en) * | 2017-12-11 | 2018-05-08 | 西安交通大学 | A kind of big data storage is lower with reference to node load and the Replica placement method of distance |
Non-Patent Citations (3)
Title |
---|
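Liu Yan et al.: "Optimization of the data copy placement strategy in heterogeneous Hadoop clusters", Journal of Huazhong University of Science and Technology (Natural Science Edition) * |
Wang Yan, Wang Jinkuan: "Research on a dynamic copy placement mechanism in cloud storage", Computer Engineering & Science, High Performance Computing * |
Cai Yandong, Liu Yan et al.: "An optimized Hadoop copy placement strategy", Microcomputer & Its Applications * |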
Also Published As
Publication number | Publication date |
---|---|
CN110535898B (en) | 2022-10-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102855294B (en) | Intelligent hash data layout method, cluster storage system and method thereof | |
CN104657459B (en) | A kind of mass data storage means based on file granularity | |
CN104704773B (en) | Cloud storage method and system | |
CN102137133B (en) | Method and system for distributing contents and scheduling server | |
CN107733676A (en) | A kind of method and system of flexible scheduling resource | |
CN104407926B (en) | A kind of dispatching method of cloud computing resources | |
CN103095599A (en) | Dynamic feedback weighted integration load scheduling method of cloud computing operating system | |
CN104462432B (en) | Adaptive distributed computing method | |
CN102882983A (en) | Rapid data memory method for improving concurrent visiting performance in cloud memory system | |
CN106843745A (en) | Capacity expansion method and device | |
CN104094254A (en) | System and method for unbalanced raid management | |
CN102411542A (en) | Dynamic hierarchical storage system and method | |
CN110096350B (en) | Cold and hot area division energy-saving storage method based on cluster node load state prediction | |
CN103455526A (en) | ETL (extract-transform-load) data processing method, device and system | |
US20090313312A1 (en) | Method of Enhancing De-Duplication Impact by Preferential Selection of Master Copy to be Retained | |
CN102577241A (en) | Method, device and system for scheduling distributed buffer resources | |
CN102170460A (en) | Cluster storage system and data storage method thereof | |
CN107729514A (en) | A kind of Replica placement node based on hadoop determines method and device | |
CN111966291B (en) | Data storage method, system and related device in storage cluster | |
CN107450855A (en) | A kind of model for distributed storage variable data distribution method and system | |
CN110058960A (en) | For managing the method, equipment and computer program product of storage system | |
CN109144783A (en) | A kind of distribution magnanimity unstructured data backup method and system | |
CN105915626B (en) | A kind of data copy initial placement method towards cloud storage | |
CN107480254B (en) | Online load balancing method suitable for distributed memory database | |
CN110535898A (en) | Copy storage, completion, node selecting method and management system in big data storage |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||