CN110535898A - Copy storage, completion, node selecting method and management system in big data storage - Google Patents


Publication number
CN110535898A
Authority
CN
China
Prior art keywords
node
copy
disk
evaluation index
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810545954.9A
Other languages
Chinese (zh)
Other versions
CN110535898B (en
Inventor
丁博
徐大青
张展国
贺彪
杨迎春
王少鹏
刘一擎
丁亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Xuji Group Co Ltd
Xuchang XJ Software Technology Co Ltd
Original Assignee
State Grid Corp of China SGCC
Xuji Group Co Ltd
Xuchang XJ Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Xuji Group Co Ltd, Xuchang XJ Software Technology Co Ltd
Priority to CN201810545954.9A
Publication of CN110535898A
Application granted
Publication of CN110535898B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

The present invention relates to a copy storage method, a copy completion method, a node selection method and a management system in big data storage. The node selection method is as follows: evaluation indices for copy storage nodes are chosen according to the real-time status information and historical failure information of each data node server, the evaluation indices including a predicted value of the probability that a data node fails; the weight of each evaluation index is determined, and a data node is obtained by weighted calculation for copy storage. Based on this node selection method, suitable nodes are selected according to a three-copy scheme for copy storage. When a copy fails and needs completion, completion is first performed on an active node in the rack where the failed node is located; when that rack cannot work normally, an active node with a similar failure rate is selected for copy completion. Without affecting copy safety, the invention effectively improves write efficiency and load balance during storage, and fundamentally solves the problem that load balancing is needed after a cluster has run for a long time.

Description

Copy storage, completion, node selecting method and management system in big data storage
Technical field
The invention belongs to the technical field of big data storage and cloud computing, and in particular relates to a copy storage method, a copy completion method, a node selection method and a management system in big data storage.
Background art
Existing big data storage systems are generally distributed storage systems (such as HDFS), in which node failure and hardware faults are problems that must be considered; the use of replication technology ensures the reliability and availability of the system. Appropriate placement and selection of data copies also makes data access more efficient. Big data storage generally adopts a three-copy strategy, which guarantees data safety and effectively supports distributed computing. However, when the locality of data access is considered, an unreasonable copy distribution affects computations with high data-locality requirements: tasks may be assigned to machines that hold a copy but have lower performance, which in turn degrades the performance of the whole cluster.
The load balancing mentioned in currently common copy storage strategies mainly approaches the problem from the angle of data volume: after a load imbalance arises, the existing data are rebalanced, and the essence of this load balancing is the transfer of copies. Load balancing can therefore only be regarded as a remedy for an unreasonable copy storage strategy. Ideally, the position of a copy should be selected or adjusted autonomously at placement time according to the current behavior of the cluster.
Existing copy storage strategies have also proposed methods that place copies according to rack, disk usage and load, but the influencing factors each method refers to when selecting are relatively few, so load balancing and storage efficiency still cannot both be taken into account. The problem is more obvious in heterogeneous clusters: for example, some lower-performance machines may carry an excessive load while some higher-performance machines sit idle. Copy replication and transfer in a big data storage cluster are usually caused by server hardware failures. Servers are generally designed for a service life of 5 to 7 years, and the actual service life of a single server depends on the server batch, the intensity of use and the operating environment. Current copy storage methods do not consider server failure or aging. For example, the reference "HDFS copy placement improvement strategy based on support vector machine" (Luo Jun et al., Computer Engineering, Vol. 41, No. 11, November 2015) considers only five factors: relative load rate, network distance, disk performance, CPU performance and memory. When a server fails, the replication and migration of data are random, which leads to disordered copy placement.
From the perspective of operations research, the copy storage strategy of big data can be regarded as a decision problem, and one that is difficult to analyze quantitatively. For such problems there is the analytic hierarchy process (AHP), which compares the evaluation indices involved pairwise, quantifies the relative importance of several groups of relationships, and finally obtains the desired effect by weighting the various influencing factors. In theory, as long as suitable weights can be provided, an ideal result can be obtained.
Summary of the invention
The present invention provides a copy storage method, a copy completion method, a node selection method and a management system in big data storage, to solve the problem that existing cluster copy storage strategies do not effectively perceive the state of each data node server, refer to relatively few influencing factors when selecting nodes, and cannot take both load balancing and storage efficiency into account.
To solve the above technical problem, the copy storage node selection method in big data storage of the present invention includes the following four unit schemes:
Unit scheme one, evaluation indices for copy storage nodes are chosen according to the real-time status information and historical failure information of each data node server, including the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate and node failure rate, where the read-write task connection rate is the ratio of the current server's number of read-write task connections to the maximum number of read-write task connections allowed by the file system; the weight of each evaluation index is determined, and a data node is then selected as the copy storage position according to the reference value calculated by the following formula:
$$\omega = \lambda_0\,\omega_{disk\_used} + \lambda_1\,\omega_{disk\_io} + \lambda_2\,\omega_{cpu} + \lambda_3\,\omega_{mem} + \lambda_4\,\omega_{process} + \lambda_5\,\omega_{fr}$$
where $\omega$ is the data node selection reference value; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$ and $\omega_{fr}$ are the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate and node failure rate, respectively; $\lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 1$; and $\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \omega_{disk\_used}, \omega_{disk\_io}, \omega_{cpu}, \omega_{mem}, \omega_{process}, \omega_{fr} \in [0, 1]$.
Unit scheme two, on the basis of unit scheme one, the weight of each evaluation index is determined using the analytic hierarchy process: the weights of the evaluation indices are described as a judgment matrix, the evaluation indices are layered, the relations between the layers are analyzed quantitatively, and finally a normalized eigenvector is obtained to serve as the judgment matrix; when the cluster is perceived to be handling different tasks, the corresponding matrix is matched adaptively so as to correct to a suitable copy placement position.
Unit scheme three, on the basis of unit scheme one, the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in service to its design service life.
Unit scheme four, on the basis of unit scheme two, the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in service to its design service life.
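The reference value defined in unit scheme one can be illustrated with a minimal Python sketch. The names `NodeIndices`, `reference_value` and `select_node` are hypothetical, and the rule that the node with the smallest reference value is preferred is an assumption made here (all six indices measure load or failure); the text only states that the node is selected according to the reference value.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class NodeIndices:
    """The six evaluation indices of one data node, each in [0, 1]."""
    disk_used: float  # disk utilization rate
    disk_io: float    # disk I/O load rate
    cpu: float        # CPU load rate
    mem: float        # memory load rate
    process: float    # read-write task connection rate
    fr: float         # node failure rate


def reference_value(indices, weights):
    """Weighted sum of the six indices; weights are lambda_0..lambda_5."""
    values = astuple(indices)
    if len(weights) != len(values):
        raise ValueError("exactly six weights are required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(not 0.0 <= x <= 1.0 for x in (*weights, *values)):
        raise ValueError("weights and indices must lie in [0, 1]")
    return sum(lam * w for lam, w in zip(weights, values))


def select_node(candidates, weights):
    """Return the candidate name with the smallest reference value.

    Assumption: a lower value marks a less loaded, more reliable node.
    """
    return min(candidates, key=lambda name: reference_value(candidates[name], weights))
```

Under this assumption, `select_node` would be called with a mapping from node name to `NodeIndices` and one weight tuple per workload type.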
The copy storage method in big data storage of the present invention includes the following four unit schemes:
Unit scheme one, the default number of copies in this method is 3, of which two copies are stored on different nodes in the same rack and the other is stored on a node in a different rack. When copy storage starts, if the client is a data node, the first copy is placed on that node; if the client is not a data node, a node is selected from the nodes on all racks according to the selection method of copy storage nodes in big data storage to place the first copy. Then a node is selected, according to the same selection method, from the nodes in a rack different from that of the first copy to place the second copy, and a node is selected, according to the same selection method, from the different nodes in the same rack as the second copy to place the third copy.
Unit scheme two, on the basis of unit scheme one, the weight of each evaluation index is determined using the analytic hierarchy process: the weights of the evaluation indices are described as a judgment matrix, the evaluation indices are layered, the relations between the layers are analyzed quantitatively, and finally a normalized eigenvector is obtained to serve as the judgment matrix; when the cluster is perceived to be handling different tasks, the corresponding matrix is matched adaptively so as to correct to a suitable copy placement position.
Unit scheme three, on the basis of unit scheme one, the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in service to its design service life.
Unit scheme four, on the basis of unit scheme two, the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in service to its design service life.
The copy completion method in big data storage of the present invention includes the following five unit schemes:
Unit scheme one, when the number of copies needing completion is less than 3, the rack where each lost copy was located is obtained, and it is judged whether an active node exists on the rack of each failed node. If active nodes exist, a data node is selected from among them according to a set node selection method for copy completion; if no active node exists, a node is selected, according to the set node selection method, from the nodes whose failure rate is the same as that of the failed node for copy completion.
Unit scheme two, on the basis of unit scheme one, the set node selection method is as follows: evaluation indices for copy storage nodes are chosen according to the real-time status information and historical failure information of each data node server, including the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate and node failure rate, where the read-write task connection rate is the ratio of the current server's number of read-write task connections to the maximum number of read-write task connections allowed by the file system; the weight of each evaluation index is determined, and a data node is then selected as the copy storage position according to the reference value calculated by the following formula:
$$\omega = \lambda_0\,\omega_{disk\_used} + \lambda_1\,\omega_{disk\_io} + \lambda_2\,\omega_{cpu} + \lambda_3\,\omega_{mem} + \lambda_4\,\omega_{process} + \lambda_5\,\omega_{fr}$$
where $\omega$ is the data node selection reference value; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$ and $\omega_{fr}$ are the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate and node failure rate, respectively; $\lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 1$; and $\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \omega_{disk\_used}, \omega_{disk\_io}, \omega_{cpu}, \omega_{mem}, \omega_{process}, \omega_{fr} \in [0, 1]$.
Unit scheme three, on the basis of unit scheme two, the weight of each evaluation index is determined using the analytic hierarchy process: the weights of the evaluation indices are described as a judgment matrix, the evaluation indices are layered, the relations between the layers are analyzed quantitatively, and finally a normalized eigenvector is obtained to serve as the judgment matrix; when the cluster is perceived to be handling different tasks, the corresponding matrix is matched adaptively so as to correct to a suitable copy placement position.
Unit scheme four, on the basis of unit scheme two, the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in service to its design service life.
Unit scheme five, on the basis of unit scheme three, the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in service to its design service life.
The copy management system in big data storage of the present invention includes the following four unit schemes:
Unit scheme one, the system can implement the following functions: evaluation indices for copy storage nodes are chosen according to the real-time status information and historical failure information of each data node server, including the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate and node failure rate, where the read-write task connection rate is the ratio of the current server's number of read-write task connections to the maximum number of read-write task connections allowed by the file system; the weight of each evaluation index is determined, and a data node is then selected as the copy storage position according to the reference value calculated by the following formula:
$$\omega = \lambda_0\,\omega_{disk\_used} + \lambda_1\,\omega_{disk\_io} + \lambda_2\,\omega_{cpu} + \lambda_3\,\omega_{mem} + \lambda_4\,\omega_{process} + \lambda_5\,\omega_{fr}$$
where $\omega$ is the data node selection reference value; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$ and $\omega_{fr}$ are the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate and node failure rate, respectively; $\lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 1$; and $\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \omega_{disk\_used}, \omega_{disk\_io}, \omega_{cpu}, \omega_{mem}, \omega_{process}, \omega_{fr} \in [0, 1]$.
Unit scheme two, on the basis of unit scheme one, the weight of each evaluation index is determined using the analytic hierarchy process: the weights of the evaluation indices are described as a judgment matrix, the evaluation indices are layered, the relations between the layers are analyzed quantitatively, and finally a normalized eigenvector is obtained to serve as the judgment matrix; when the cluster is perceived to be handling different tasks, the corresponding matrix is matched adaptively so as to correct to a suitable copy placement position.
Unit scheme three, on the basis of unit scheme one, the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in service to its design service life.
Unit scheme four, on the basis of unit scheme two, the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in service to its design service life.
The beneficial effects of the present invention are as follows: the invention perceives the real-time status information and historical failure information of each data node server and provides the management node with more reliable data nodes for copy storage. Without affecting copy safety, it effectively improves write efficiency and load balance during storage, and fundamentally solves the problem that load balancing is needed after a cluster has run for a long time.
The invention includes server failure information in the evaluation indices and uses the predicted value of the probability that a data node fails as a reference factor for storing data copies. When a data node actually fails, the data copies needing completion are completed according to the set method, which avoids disordered data storage and keeps the overhead of copy completion as low as possible. To a certain extent this substitutes for the rack-awareness function.
When a copy fails and needs completion, completion is performed preferentially on an active node in the rack where the failed node is located. When that rack cannot work normally, an active node with a similar failure rate is preferentially selected for copy completion. Because servers are deployed in batches, this ensures as far as possible that the node receiving the completed copy and the failed node belong to the same batch, and hence that the copy is stored in a nearby position.
Brief description of the drawings
Fig. 1 is a design reference model diagram of the copy storage node selection method in big data storage of the present invention;
Fig. 2 is a flowchart of copy storage in big data storage of the present invention;
Fig. 3 is a flowchart of copy completion in big data storage of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is explained in further detail below with reference to the accompanying drawings.
Embodiment of the copy storage node selection method in big data storage of the present invention
The method of this embodiment is as follows: the real-time status information and historical failure information of each data node server are perceived, and evaluation indices for copy storage nodes are chosen according to this information, including the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate and node failure rate; the weight of each evaluation index is determined, and a data node is then selected as the copy storage position according to the reference value calculated by the following formula:
$$\omega = \lambda_0\,\omega_{disk\_used} + \lambda_1\,\omega_{disk\_io} + \lambda_2\,\omega_{cpu} + \lambda_3\,\omega_{mem} + \lambda_4\,\omega_{process} + \lambda_5\,\omega_{fr}$$
where $\omega$ is the data node selection reference value; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$ and $\omega_{fr}$ are the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate and node failure rate, respectively; $\lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 1$; and $\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \omega_{disk\_used}, \omega_{disk\_io}, \omega_{cpu}, \omega_{mem}, \omega_{process}, \omega_{fr} \in [0, 1]$.
Here, $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$ and $\omega_{mem}$ can be obtained from the operating system of each cluster server. The read-write task connection rate $\omega_{process}$ is the ratio of the current server's number of read-write task connections to the maximum number of read-write task connections allowed by the file system. The parameter $\omega_{fr}$ can be calculated as the ratio of the server's fault time to its online running time. In clusters that keep no failure records, this parameter can instead be designed as the ratio of years in service to design service life, so for a given data center the model design of this parameter is related to the production time and deployment time of the servers.
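The two ways of obtaining the node failure rate described above can be sketched as follows. The function names are hypothetical, and clamping the result to [0, 1] is an assumption added here so that the value satisfies the stated range of $\omega_{fr}$ even when a server outlives its design service life.

```python
def failure_rate_from_history(fault_hours, online_hours):
    """Ratio of accumulated fault time to online running time."""
    if online_hours <= 0:
        raise ValueError("online running time must be positive")
    # Clamp to [0, 1]; the clamp is an assumption, not stated in the text.
    return min(max(fault_hours / online_hours, 0.0), 1.0)


def failure_rate_from_age(years_in_service, design_life_years):
    """Fallback for clusters without failure records: age over design life."""
    if design_life_years <= 0:
        raise ValueError("design service life must be positive")
    # A server older than its design life is treated as rate 1 (assumption).
    return min(max(years_in_service / design_life_years, 0.0), 1.0)
```

For instance, a server that has run for 3 years against a 6-year design life would be assigned a failure rate of 0.5 by the second function.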
Once the weight of each evaluation index is determined, the copy placement strategy is essentially determined. The weights of the evaluation indices can be modified as needed; after modification, the scheme can be reduced to a method that places copies according to a single evaluation index, to suit special working scenarios. The weights of the evaluation indices can be described as a judgment matrix; when the cluster is perceived to be handling different tasks, the corresponding matrix is matched adaptively so as to correct to a suitable copy position.
The weights of the evaluation indices are represented by the matrix [A B C D E F], where A, B, C, D, E and F respectively denote the six evaluation indices above. The evaluation indices are layered and the relations between the layers are analyzed quantitatively; for example, the lowercase ab denotes the relative relationship between layers A and B. The weight of each evaluation index can be determined from the relative relationships between the evaluation indices in Table 1, and finally a normalized eigenvector is obtained to serve as the judgment matrix, which can be calculated from Table 1. For example, if the cluster is perceived to be handling a task for which evaluation index A should be given priority, then A is given the largest share, and the weights of the other five evaluation indices are set according to the relationships between A and each of them. After the weights are set, the copy placement position is obtained from the reference value calculated from each node's information. The design idea is shown in Fig. 1.
Table 1. Relative relationship matrix of the evaluation indices
      A      B      C      D      E      F
A     1      ab     ac     ad     ae     af
B     1/ab   1      bc     bd     be     bf
C     1/ac   1/bc   1      cd     ce     cf
D     1/ad   1/bd   1/cd   1      de     df
E     1/ae   1/be   1/ce   1/de   1      ef
F     1/af   1/bf   1/cf   1/df   1/ef   1
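One common way to turn a reciprocal matrix such as Table 1 into weights is to take its normalized principal eigenvector, as in the standard analytic hierarchy process. The sketch below is an illustration under that reading, not necessarily the exact procedure intended here: the helper names are hypothetical, the numeric comparison values in the usage note are invented, and power iteration is simply one convenient way to approximate the eigenvector.

```python
def build_judgment_matrix(n, upper):
    """Fill an n x n reciprocal matrix from its upper triangle.

    `upper` maps (i, j) with i < j to the relative importance of index i
    over index j (the entries ab, ac, ... of Table 1).
    """
    matrix = [[1.0] * n for _ in range(n)]
    for (i, j), value in upper.items():
        matrix[i][j] = value
        matrix[j][i] = 1.0 / value  # reciprocal entry below the diagonal
    return matrix


def ahp_weights(matrix, iterations=100):
    """Approximate the normalized principal eigenvector by power iteration."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]  # renormalize to sum 1
    return weights
```

The resulting vector sums to 1 and can be used directly as $\lambda_0 \ldots \lambda_5$. For a perfectly consistent matrix, for example one built from the made-up priorities 3, 2, 2, 1, 1, 1 with entry (i, j) equal to the ratio of priority i to priority j, the weights come out proportional to those priorities.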
Embodiment of the copy storage method in big data storage of the present invention
The copy storage method of this embodiment stores copies according to a three-copy scheme and guarantees copy reliability by placing the copies in two racks: two copies are stored on different nodes in the same rack, and the other is stored on a node in a different rack. When copy storage starts, if the client is a data node, the first copy is placed on that node; if the client is not a data node, a node is selected from the nodes on all racks according to the selection method of copy storage nodes in big data storage in the above embodiment to place the first copy. Then a node is selected, according to the same selection method, from the different nodes in the same rack as the first copy to place the second copy, and a node is selected, according to the same selection method, from the nodes in a rack different from the rack where the first and second copies are located to place the third copy.
The specific storage process is shown in Fig. 2. When the number of copies to be stored is greater than 0:
Step 1) judge whether the copy to be stored is the first copy; if so, go to step 2), otherwise go to step 3);
Step 2) judge whether the client is a data node; if so, go to step 4), otherwise go to step 5);
Step 3) judge whether the copy to be stored is the second copy; if so, go to step 6); otherwise it is the third copy, go to step 7);
Step 4) select the local data node to place the first copy;
Step 5) select a node from the nodes on all racks, according to the selection method of copy storage nodes in big data storage in the above embodiment, to place the first copy;
Step 6) select a node from the nodes in a rack different from that of the first copy, according to the selection method of copy storage nodes in big data storage in the above embodiment, to place the second copy;
Step 7) judge whether the first copy and the second copy are stored in the same rack; if so, go to step 8), otherwise go to step 9);
Step 8) select a node from the nodes in a rack different from the rack where the first and second copies are located, according to the selection method of copy storage nodes in big data storage in the above embodiment, to place the third copy, then go to step 10);
Step 9) select a node from the different nodes in the same rack as the second copy, according to the selection method of copy storage nodes in big data storage in the above embodiment, to place the third copy, then go to step 10);
Step 10) placement of the three copies is complete, and the process ends.
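Steps 1) to 10) can be sketched as a single Python function. This is an illustrative reading of the flow, with hypothetical names throughout: `score` stands in for the node selection method of the earlier embodiment, and preferring the lowest score is an assumption.

```python
def place_three_copies(nodes, rack_of, score, client=None):
    """Return the nodes chosen for the first, second and third copy.

    nodes:   list of data node names
    rack_of: mapping from node name to rack name
    score:   callable giving a node's reference value (lower preferred, assumption)
    client:  name of the writing client, if any
    """
    def best(candidates):
        candidates = list(candidates)
        if not candidates:
            raise RuntimeError("no candidate node satisfies the placement rule")
        return min(candidates, key=score)

    # Steps 2), 4), 5): use the client itself if it is a data node.
    first = client if client in nodes else best(nodes)

    # Step 6): second copy goes to a rack different from the first copy's rack.
    second = best(n for n in nodes if rack_of[n] != rack_of[first])

    # Step 7): compare the racks of the first two copies.
    if rack_of[first] == rack_of[second]:
        # Step 8): unreachable while step 6) always picks another rack;
        # kept only to mirror the flow as listed.
        third = best(n for n in nodes if rack_of[n] != rack_of[first])
    else:
        # Step 9): another node in the same rack as the second copy.
        third = best(
            n for n in nodes
            if rack_of[n] == rack_of[second] and n not in (first, second)
        )

    # Step 10): placement complete.
    return [first, second, third]
```

The sketch raises an error instead of falling back when a rule cannot be met, for example in a single-rack cluster; how such cases are handled is not specified in the text.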
Embodiment of the copy completion method in big data storage of the present invention
The completion method of this embodiment is as follows: when the number of copies needing completion is less than 3, the rack where each lost copy was located is obtained, and it is judged whether an active node exists on the rack of each failed node. If active nodes exist, a data node is selected from among them according to the set node selection method for copy completion; if no active node exists, a node is selected, according to the set node selection method, from the nodes whose failure rate is the same as that of the failed node for copy completion.
The specific completion process is shown in Fig. 3. When the number of copies needing completion is greater than 0, judge whether that number equals 3; if it equals 3, go to step 4); if it is less than 3, go to step 1);
Step 1) obtain the rack where each lost copy was located, and judge whether an active node exists on the rack of each failed node; if so, go to step 2), otherwise go to step 3);
Step 2) enumerate the active nodes of the rack where the failed node is located, and compute a suitable node according to the set node selection method for copy completion;
Step 3) enumerate the nodes whose failure rate is the same as that of the failed node, and compute a suitable node according to the set node selection method for copy completion;
Step 4) a bad block is generated, and an alarm is raised.
The set node selection method in steps 2) and 3) above may be the selection method of copy storage nodes in big data storage described above, or another node selection method from the prior art, which is not described in detail here.
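The completion flow of Fig. 3 can be sketched as follows. All names are hypothetical. Two details are assumptions added for the sketch: failure rates are compared within a `tolerance` (the text speaks of both identical and similar failure rates), and a node already chosen for one lost copy is not offered again for another.

```python
class BadBlockError(Exception):
    """Raised when all three copies are lost (step 4: bad block, alarm)."""


def pick_completion_node(failed, active, rack_of, failure_rate, score, tolerance=0.05):
    """Choose a target for one lost copy according to steps 1) to 3)."""
    # Steps 1) and 2): prefer active nodes in the failed node's rack.
    same_rack = [n for n in active if rack_of[n] == rack_of[failed]]
    if same_rack:
        return min(same_rack, key=score)
    # Step 3): otherwise use active nodes with a matching failure rate.
    similar = [
        n for n in active
        if abs(failure_rate[n] - failure_rate[failed]) <= tolerance
    ]
    if not similar:
        raise LookupError("no active node with a matching failure rate")
    return min(similar, key=score)


def complete_copies(failed_nodes, active, rack_of, failure_rate, score, tolerance=0.05):
    """Return one completion target per failed node holding a lost copy."""
    if len(failed_nodes) >= 3:
        raise BadBlockError("all copies lost")
    available = set(active) - set(failed_nodes)
    chosen = []
    for failed in failed_nodes:
        node = pick_completion_node(
            failed, available, rack_of, failure_rate, score, tolerance
        )
        available.discard(node)  # do not reuse a target (assumption)
        chosen.append(node)
    return chosen
```

Here `score` again stands in for whichever set node selection method is used, with lower values preferred by assumption. The sketch does not exclude nodes that already hold a surviving copy, since the text does not state that rule.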
Embodiment of the copy management system in big data storage of the present invention
The copy management system in big data storage of this embodiment is a management platform capable of implementing the copy storage node selection method in big data storage. For that method, refer to the above embodiment; it is not described again here.
Specific embodiments of the present invention have been presented above, but the invention is not limited to the described embodiments. Where, under the idea provided by the invention, the technical means of the above embodiments are transformed, replaced or modified in ways that readily occur to those skilled in the art, and the resulting means perform essentially the same function and achieve essentially the same purpose as the corresponding technical means of the invention, the resulting technical solution is merely a fine adjustment of the above embodiments and still falls within the protection scope of the invention.

Claims (10)

1. A selection method of copy storage nodes in big data storage, characterized in that the method is as follows: evaluation indices for copy storage nodes are chosen according to the real-time status information and historical failure information of each data node server, including the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate and node failure rate, where the read-write task connection rate is the ratio of the current server's number of read-write task connections to the maximum number of read-write task connections allowed by the file system; the weight of each evaluation index is determined, and a data node is then selected as the copy storage position according to the reference value calculated by the following formula:
$$\omega = \lambda_0\,\omega_{disk\_used} + \lambda_1\,\omega_{disk\_io} + \lambda_2\,\omega_{cpu} + \lambda_3\,\omega_{mem} + \lambda_4\,\omega_{process} + \lambda_5\,\omega_{fr}$$
where $\omega$ is the data node selection reference value; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$ and $\omega_{fr}$ are the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate and node failure rate, respectively; $\lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 1$; and $\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \omega_{disk\_used}, \omega_{disk\_io}, \omega_{cpu}, \omega_{mem}, \omega_{process}, \omega_{fr} \in [0, 1]$.
2. The selection method of copy storage nodes in big data storage according to claim 1, characterized in that the weight of each evaluation index is determined using the analytic hierarchy process: the weights of the evaluation indices are described as a judgment matrix, the evaluation indices are layered, the relations between the layers are analyzed quantitatively, and finally a normalized eigenvector is obtained to serve as the judgment matrix; when the cluster is perceived to be handling different tasks, the corresponding matrix is matched adaptively so as to correct to a suitable copy placement position.
3. The selection method of copy storage nodes in big data storage according to claim 1 or 2, characterized in that the node failure rate is the ratio of the data node's fault time to its online running time, or the ratio of the data node's years in service to its design service life.
4. using copy is deposited in the big data storage of the selection method of copy storage node in the storage of big data described in claim 1 Put method, default number of copies is 3 in this method, and two of them copy is stored on the different nodes in same rack, in addition one On a node for being stored in different racks, which is characterized in that when copy starts storage, if client is back end, by first On a Replica placement node, if client is not back end, deposited in the node on institute's organic frame according to the big data The selection method selection node of copy storage node is for placing the first authentic copy in storage;Then with first copy difference rack Node according to the big data storage in copy storage node selection method selection node for place second copy, It is selected from triplicate same machine frame and different nodes according to the selection method of copy storage node in big data storage Node is selected for placing triplicate.
5. The replica storage method in big data storage according to claim 4, characterized in that the weight of each evaluation index is determined using the analytic hierarchy process: the weights of the evaluation indices are described as a judgment matrix, the evaluation indices are arranged in layers, the relationships between the layers are analyzed quantitatively, and finally a normalized eigenvector of the judgment matrix is obtained; when the cluster is sensed to be handling a different type of task, the corresponding matrix is matched adaptively so as to correct the replica placement to a suitable position.
6. A replica completion method in big data storage, characterized in that: when the number of replicas to be completed is less than 3, the rack on which each lost replica was located is obtained, and it is determined whether any active node exists on the rack of each failed node; if active nodes exist, a data node is selected from among these active nodes according to a set node selection method and used for replica completion; if no active node exists, a node is selected, according to the set node selection method, from the nodes having the same failure rate as the failed node and used for replica completion.
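The branch logic of this completion method can be sketched as below. Several details are assumptions that the claim does not state: the node table layout, the tolerance used to compare failure rates, the requirement that fallback candidates are themselves active, and the "lowest score wins" rule. Excluding nodes that already hold a replica of the same block is left out for brevity.

```python
def pick_completion_node(failed, nodes, score, tol=1e-9):
    """Choose a node on which to re-create a replica lost on `failed`.

    nodes: dict mapping node name -> {"rack": str, "active": bool, "fr": float}
           (assumed layout).
    score: callable mapping node name -> reference value; lower is
           assumed to be better.
    Returns a node name, or None when no candidate exists.
    """
    rack = nodes[failed]["rack"]
    same_rack = [n for n, info in nodes.items()
                 if n != failed and info["active"] and info["rack"] == rack]
    if same_rack:
        # Active nodes exist on the failed node's rack: choose among them.
        return min(same_rack, key=score)
    # Otherwise fall back to nodes whose failure rate matches the failed node's.
    same_fr = [n for n, info in nodes.items()
               if n != failed and info["active"]
               and abs(info["fr"] - nodes[failed]["fr"]) <= tol]
    return min(same_fr, key=score) if same_fr else None
```

Preferring the failed node's own rack keeps the rack distribution of the replicas unchanged after recovery.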
7. The replica completion method in big data storage according to claim 6, characterized in that the set node selection method is: choosing the evaluation indices of a replica storage node according to the real-time status information and historical failure information of each data node server, the indices including the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read/write task connection ratio and node failure rate, the read/write task connection ratio being the ratio of the number of read/write task connections on the current server to the maximum number of read/write task connections allowed by the file system; determining the weight of each evaluation index; and then selecting a data node as the replica storage location according to the reference value calculated by the following formula:
$\omega = \lambda_0\omega_{disk\_used} + \lambda_1\omega_{disk\_io} + \lambda_2\omega_{cpu} + \lambda_3\omega_{mem} + \lambda_4\omega_{process} + \lambda_5\omega_{fr}$
where $\omega$ is the data node selection reference value; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$ and $\omega_{fr}$ are, respectively, the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read/write task connection ratio and node failure rate; $\lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 1$; and $\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \omega_{disk\_used}, \omega_{disk\_io}, \omega_{cpu}, \omega_{mem}, \omega_{process}, \omega_{fr} \in [0,1]$.
8. The replica completion method in big data storage according to claim 7, characterized in that the weight of each evaluation index is determined using the analytic hierarchy process: the weights of the evaluation indices are described as a judgment matrix, the evaluation indices are arranged in layers, the relationships between the layers are analyzed quantitatively, and finally a normalized eigenvector of the judgment matrix is obtained; when the cluster is sensed to be handling a different type of task, the corresponding matrix is matched adaptively so as to correct the replica placement to a suitable position.
9. A replica management system in big data storage, characterized in that the system implements the following functions: choosing the evaluation indices of a replica storage node according to the real-time status information and historical failure information of each data node server, the indices including the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read/write task connection ratio and node failure rate, the read/write task connection ratio being the ratio of the number of read/write task connections on the current server to the maximum number of read/write task connections allowed by the file system; determining the weight of each evaluation index; and then selecting a data node as the replica storage location according to the reference value calculated by the following formula:
$\omega = \lambda_0\omega_{disk\_used} + \lambda_1\omega_{disk\_io} + \lambda_2\omega_{cpu} + \lambda_3\omega_{mem} + \lambda_4\omega_{process} + \lambda_5\omega_{fr}$
where $\omega$ is the data node selection reference value; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$ and $\omega_{fr}$ are, respectively, the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read/write task connection ratio and node failure rate; $\lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 1$; and $\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \omega_{disk\_used}, \omega_{disk\_io}, \omega_{cpu}, \omega_{mem}, \omega_{process}, \omega_{fr} \in [0,1]$.
10. The replica management system in big data storage according to claim 9, characterized in that the weight of each evaluation index is determined using the analytic hierarchy process: the weights of the evaluation indices are described as a judgment matrix, the evaluation indices are arranged in layers, the relationships between the layers are analyzed quantitatively, and finally a normalized eigenvector of the judgment matrix is obtained; when the cluster is sensed to be handling a different type of task, the corresponding matrix is matched adaptively so as to correct the replica placement to a suitable position.
CN201810545954.9A 2018-05-25 2018-05-25 Method for storing and complementing copies and selecting nodes in big data storage and management system Active CN110535898B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810545954.9A CN110535898B (en) 2018-05-25 2018-05-25 Method for storing and complementing copies and selecting nodes in big data storage and management system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810545954.9A CN110535898B (en) 2018-05-25 2018-05-25 Method for storing and complementing copies and selecting nodes in big data storage and management system

Publications (2)

Publication Number Publication Date
CN110535898A true CN110535898A (en) 2019-12-03
CN110535898B CN110535898B (en) 2022-10-04

Family

ID=68657167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810545954.9A Active CN110535898B (en) 2018-05-25 2018-05-25 Method for storing and complementing copies and selecting nodes in big data storage and management system

Country Status (1)

Country Link
CN (1) CN110535898B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487093A (en) * 2020-12-07 2021-03-12 浪潮云信息技术股份公司 Decentralized copy control method for distributed database

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100106808A1 (en) * 2008-10-24 2010-04-29 Microsoft Corporation Replica placement in a distributed storage system
CN102667761A (en) * 2009-06-19 2012-09-12 布雷克公司 Scalable cluster database
US20140052706A1 (en) * 2011-04-29 2014-02-20 Prateep Misra Archival storage and retrieval system
CN103678051A (en) * 2013-11-18 2014-03-26 航天恒星科技有限公司 On-line fault tolerance method in cluster data processing system
CN104156381A (en) * 2014-03-27 2014-11-19 深圳信息职业技术学院 Copy access method and device for Hadoop distributed file system and Hadoop distributed file system
CN104468651A (en) * 2013-09-17 2015-03-25 南京中兴新软件有限责任公司 Distributed multi-copy storage method and device
CN104615606A (en) * 2013-11-05 2015-05-13 阿里巴巴集团控股有限公司 Hadoop distributed file system and management method thereof
US20150186411A1 (en) * 2014-01-02 2015-07-02 International Business Machines Corporation Enhancing Reliability of a Storage System by Strategic Replica Placement and Migration
US20160004631A1 (en) * 2014-07-03 2016-01-07 Pure Storage, Inc. Profile-Dependent Write Placement of Data into a Non-Volatile Solid-State Storage
CN105915626A (en) * 2016-05-27 2016-08-31 南京邮电大学 Data copy initial placement method for cloud storage
CN106293492A (en) * 2015-05-14 2017-01-04 中兴通讯股份有限公司 A kind of memory management method and distributed file system
CN106302702A (en) * 2016-08-10 2017-01-04 华为技术有限公司 Burst storage method, the Apparatus and system of data
CN106612322A (en) * 2016-07-11 2017-05-03 四川用联信息技术有限公司 Data recovery method for distribution optimization of data storing nodes in cloud storage
US20180004414A1 (en) * 2016-06-29 2018-01-04 EMC IP Holding Company LLC Incremental erasure coding for storage systems
CN107729514A (en) * 2017-10-25 2018-02-23 郑州云海信息技术有限公司 A kind of Replica placement node based on hadoop determines method and device
CN108009260A (en) * 2017-12-11 2018-05-08 西安交通大学 A kind of big data storage is lower with reference to node load and the Replica placement method of distance

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
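刘艳 et al.: "Optimization of the data replica placement strategy in heterogeneous Hadoop clusters", Journal of Huazhong University of Science and Technology (Natural Science Edition) *
王岩, 汪晋宽: "Research on a dynamic replica placement mechanism in cloud storage", Computer Engineering & Science (High Performance Computing) *
蔡燕冬, 刘艳 et al.: "An optimized Hadoop replica placement strategy", Microcomputer & Its Applications *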

Also Published As

Publication number Publication date
CN110535898B (en) 2022-10-04

Similar Documents

Publication Publication Date Title
CN107943867B (en) High-performance hierarchical storage system supporting heterogeneous storage
CN102855294B (en) Intelligent hash data layout method, cluster storage system and method thereof
CN102546782B (en) Distribution system and data operation method thereof
CN104657459B (en) A kind of mass data storage means based on file granularity
CN104704773B (en) Cloud storage method and system
CN102137133B (en) Method and system for distributing contents and scheduling server
CN104407926B (en) A kind of dispatching method of cloud computing resources
CN104462432B (en) Adaptive distributed computing method
CN106843745A (en) Capacity expansion method and device
US8682850B2 (en) Method of enhancing de-duplication impact by preferential selection of master copy to be retained
CN102882983A (en) Rapid data memory method for improving concurrent visiting performance in cloud memory system
CN104094254A (en) System and method for unbalanced raid management
CN102411542A (en) Dynamic hierarchical storage system and method
CN108388604A (en) User right data administrator, method and computer readable storage medium
CN103455526A (en) ETL (extract-transform-load) data processing method, device and system
CN107729514A (en) A kind of Replica placement node based on hadoop determines method and device
CN102577241A (en) Method, device and system for scheduling distributed buffer resources
CN105915626B (en) A kind of data copy initial placement method towards cloud storage
CN107450855A (en) A kind of model for distributed storage variable data distribution method and system
CN110058960A (en) For managing the method, equipment and computer program product of storage system
CN107480254B (en) Online load balancing method suitable for distributed memory database
CN111966291B (en) Data storage method, system and related device in storage cluster
CN109144783A (en) A kind of distribution magnanimity unstructured data backup method and system
CN117033004B (en) Load balancing method and device, electronic equipment and storage medium
CN110535898A (en) Copy storage, completion, node selecting method and management system in big data storage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant