CN110535898B - Method for storing and complementing copies and selecting nodes in big data storage and management system - Google Patents

Info

Publication number
CN110535898B
Authority
CN
China
Prior art keywords
node
copy
rate
disk
copies
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810545954.9A
Other languages
Chinese (zh)
Other versions
CN110535898A (en)
Inventor
丁博
徐大青
张展国
贺彪
杨迎春
王少鹏
刘一擎
丁亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Xuji Group Co Ltd
Xuchang XJ Software Technology Co Ltd
Original Assignee
State Grid Corp of China SGCC
Xuji Group Co Ltd
Xuchang XJ Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Xuji Group Co Ltd, Xuchang XJ Software Technology Co Ltd
Priority to CN201810545954.9A
Publication of CN110535898A
Application granted
Publication of CN110535898B
Legal status: active (current)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a copy storage method, a copy completion method, a node selection method, and a management system for big data storage. The node selection method comprises: selecting evaluation indexes for copy storage nodes according to the real-time state information and historical fault information of each data node server, including a predicted value of the probability of data failure among the evaluation indexes, determining the weight of each evaluation index, and calculating from the weights the data node to be used for copy storage. Based on this node selection method, suitable nodes are selected to store the copies under the three-copy scheme. When a lost copy must be complemented, completion is first performed on an active node of the rack where the failed node is located; when that rack cannot work normally, an active node with a similar failure rate is selected for completion. Without affecting copy safety, the invention effectively improves write efficiency and load balance at storage time, and fundamentally addresses the problem that a cluster needs load balancing after running for a long time.

Description

Method for storing and complementing copies and selecting nodes in big data storage and management system
Technical Field
The invention belongs to the technical field of big data storage and cloud computing, and particularly relates to a copy storage, completion and node selection method and a management system in big data storage.
Background
Existing big data storage systems are generally distributed (for example, HDFS). Node failure and hardware failure must be taken into account, and replica technology is applied to ensure the reliability and availability of the system. Proper placement and selection of data copies also makes data access more efficient. Big data storage commonly adopts a three-copy strategy, which guarantees data safety and effectively supports distributed computation. However, when local data access is considered, unreasonable copy distribution harms computations with high data-locality requirements: tasks may be allocated to low-performance machines that happen to store the copies, which reduces the performance of the whole cluster.
The load balancing referred to in commonly used copy storage strategies mainly addresses load imbalance from the perspective of data volume and rebalances data that has already been stored; in essence it is implemented by transferring copies. Load balancing can therefore only remedy an unreasonable copy storage policy. The ideal approach is to select or adjust the storage position of a copy autonomously, according to the current performance of the cluster, at the time the copy is placed.
Existing copy storage strategies also include methods that place copies according to the rack, hard disk usage, and load, but each method refers to only a single influencing factor when selecting a node, so load balance and storage efficiency still cannot both be taken into account. The problem is particularly pronounced in heterogeneous clusters, where, for example, some lower-performance machines may be overloaded while some higher-performance machines sit idle. Copying and transfer of copies in a big data storage cluster are generally caused by server hardware faults. The design service life of a typical server is 5-7 years, and the actual service life of a single server depends on its batch, intensity of use, and operating environment. Existing copy storage methods do not consider server failure or aging. For example, the reference "Improved HDFS copy placement strategy based on support vector machine" (author: army, Computer Engineering, vol. 41, no. 11, November 2015) considers only five factors: relative load rate, network distance, disk performance, CPU performance, and memory. When a server fails, the copying and migration of data are random, which leaves copy placement disordered.
From the perspective of operations research, the copy storage strategy for big data can be regarded as a decision problem that is difficult to analyze quantitatively. The analytic hierarchy process was proposed for such problems: the relevant evaluation indexes are compared pairwise, the relative importance of multiple groups of relations can be analyzed quantitatively, and the result is obtained by weighting the various influencing factors. In theory, an ideal result can be obtained as long as suitable weights are given.
Disclosure of Invention
The invention provides a copy storage method, a copy completion method, a node selection method, and a management system for big data storage, to solve the problems that existing cluster copy storage strategies do not effectively sense the state of each data node server, refer to only a single influencing factor when selecting a node, and cannot take both load balance and storage efficiency into account.
To solve this technical problem, the method for selecting copy storage nodes in big data storage comprises the following four unit schemes:
Unit scheme 1: evaluation indexes for copy storage nodes are selected according to the real-time state information and historical fault information of each data node server. The evaluation indexes comprise the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate, and node failure rate, where the read-write task connection rate is the ratio of the number of read-write task connections of the current server to the maximum number of read-write task connections allowed by the file system. The weight of each evaluation index is determined, and a data node is then selected as the copy storage position according to the reference value calculated by the following formula:
$$\omega = \lambda_0\,\omega_{disk\_used} + \lambda_1\,\omega_{disk\_io} + \lambda_2\,\omega_{cpu} + \lambda_3\,\omega_{mem} + \lambda_4\,\omega_{process} + \lambda_5\,\omega_{fr}$$
where $\omega$ is the reference value used to select the data node; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$, and $\omega_{fr}$ are respectively the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate, and node failure rate; $\lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 1$; and $\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \omega_{disk\_used}, \omega_{disk\_io}, \omega_{cpu}, \omega_{mem}, \omega_{process}, \omega_{fr} \in [0, 1]$.
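As a rough illustration, the reference value is a weighted sum of the six indexes. The sketch below is not taken from the patent: the names `NodeMetrics`, `reference_value`, and `pick_node`, the sample weights and node states, and the rule of preferring the node with the smallest ω (on the assumption that all six indexes measure load or risk) are illustrative assumptions.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class NodeMetrics:
    """Six evaluation indexes of one data node, each in [0, 1]."""
    disk_used: float  # disk utilization rate
    disk_io: float    # disk I/O load rate
    cpu: float        # CPU load rate
    mem: float        # memory load rate
    process: float    # read-write task connection rate
    fr: float         # node failure rate


def reference_value(metrics, weights):
    """Return omega = sum(lambda_i * omega_i) for one node."""
    values = astuple(metrics)
    if len(weights) != len(values):
        raise ValueError("expected one weight per evaluation index")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(not 0.0 <= x <= 1.0 for x in (*weights, *values)):
        raise ValueError("weights and indexes must lie in [0, 1]")
    return sum(w * v for w, v in zip(weights, values))


def pick_node(nodes, weights):
    """Return the node name with the smallest omega (assumed selection rule)."""
    return min(nodes, key=lambda name: reference_value(nodes[name], weights))


# Hypothetical weights and node states, for illustration only.
WEIGHTS = (0.25, 0.20, 0.15, 0.10, 0.10, 0.20)
NODES = {
    "dn1": NodeMetrics(0.80, 0.50, 0.40, 0.60, 0.30, 0.10),
    "dn2": NodeMetrics(0.30, 0.20, 0.50, 0.40, 0.20, 0.05),
}
print(pick_node(NODES, WEIGHTS))
```

With these sample values the more lightly loaded node `dn2` gets the lower reference value and is chosen. If a deployment treats a larger ω as better, `min` would be replaced by `max`.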
Unit scheme 2: on the basis of unit scheme 1, the analytic hierarchy process is used to determine the weight of each evaluation index. The weights of the evaluation indexes are described as a judgment matrix, the evaluation indexes are arranged in layers, the relations between layers are analyzed quantitatively, and a normalized eigenvector is finally obtained to serve as the judgment matrix. When the cluster is sensed to be processing a different kind of task, the corresponding matrix is adaptively matched to correct the copy placement position.
Unit scheme 3: on the basis of unit scheme 1, the node failure rate is the ratio of the data node's failure time to its online running time, or the ratio of the data node's used service life to its design service life.
Unit scheme 4: on the basis of unit scheme 2, the node failure rate is the ratio of the data node's failure time to its online running time, or the ratio of the data node's used service life to its design service life.
The copy storage method in big data storage comprises the following four unit schemes:
Unit scheme 1: the default number of copies is 3, of which two are stored on different nodes of the same rack and the third is stored on a node of a different rack. When the copies are stored, if the client is a data node, the first copy is placed on that node; if the client is not a data node, a node is selected from the nodes on all racks, according to the above method for selecting copy storage nodes in big data storage, to hold the first copy. A node on a rack different from that of the first copy is then selected by the same method to hold the second copy, and a node on the same rack as the second copy and different from that of the first copy is selected by the same method to hold the third copy.
Unit scheme 2: on the basis of unit scheme 1, the analytic hierarchy process is used to determine the weight of each evaluation index. The weights of the evaluation indexes are described as a judgment matrix, the evaluation indexes are arranged in layers, the relations between layers are analyzed quantitatively, and a normalized eigenvector is finally obtained to serve as the judgment matrix. When the cluster is sensed to be processing a different kind of task, the corresponding matrix is adaptively matched to correct the copy placement position.
Unit scheme 3: on the basis of unit scheme 1, the node failure rate is the ratio of the data node's failure time to its online running time, or the ratio of the data node's used service life to its design service life.
Unit scheme 4: on the basis of unit scheme 2, the node failure rate is the ratio of the data node's failure time to its online running time, or the ratio of the data node's used service life to its design service life.
The copy completion method in big data storage comprises the following five unit schemes:
Unit scheme 1: when the number of copies to be complemented is less than 3, the rack on which each lost copy was located is obtained, and it is judged whether an active node exists on the rack of each failed node. If an active node exists, a data node is selected from the active nodes according to a set node selection method to complement the copy; if no active node exists, a node is selected, according to the set node selection method, from the nodes having the same failure rate as the failed node to complement the copy.
Unit scheme 2: on the basis of unit scheme 1, the set node selection method is as follows. Evaluation indexes for copy storage nodes are selected according to the real-time state information and historical fault information of each data node server. The evaluation indexes comprise the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate, and node failure rate, where the read-write task connection rate is the ratio of the number of read-write task connections of the current server to the maximum number of read-write task connections allowed by the file system. The weight of each evaluation index is determined, and a data node is then selected as the copy storage position according to the reference value calculated by the following formula:
$$\omega = \lambda_0\,\omega_{disk\_used} + \lambda_1\,\omega_{disk\_io} + \lambda_2\,\omega_{cpu} + \lambda_3\,\omega_{mem} + \lambda_4\,\omega_{process} + \lambda_5\,\omega_{fr}$$
where $\omega$ is the reference value used to select the data node; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$, and $\omega_{fr}$ are respectively the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate, and node failure rate; $\lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 1$; and $\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \omega_{disk\_used}, \omega_{disk\_io}, \omega_{cpu}, \omega_{mem}, \omega_{process}, \omega_{fr} \in [0, 1]$.
Unit scheme 3: on the basis of unit scheme 2, the analytic hierarchy process is used to determine the weight of each evaluation index. The weights of the evaluation indexes are described as a judgment matrix, the evaluation indexes are arranged in layers, the relations between layers are analyzed quantitatively, and a normalized eigenvector is finally obtained to serve as the judgment matrix. When the cluster is sensed to be processing a different kind of task, the corresponding matrix is adaptively matched to correct the copy placement position.
Unit scheme 4: on the basis of unit scheme 2, the node failure rate is the ratio of the data node's failure time to its online running time, or the ratio of the data node's used service life to its design service life.
Unit scheme 5: on the basis of unit scheme 3, the node failure rate is the ratio of the data node's failure time to its online running time, or the ratio of the data node's used service life to its design service life.
The copy management system in big data storage comprises the following four unit schemes:
Unit scheme 1: the system implements the following functions. Evaluation indexes for copy storage nodes are selected according to the real-time state information and historical fault information of each data node server. The evaluation indexes comprise the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate, and node failure rate, where the read-write task connection rate is the ratio of the number of read-write task connections of the current server to the maximum number of read-write task connections allowed by the file system. The weight of each evaluation index is determined, and a data node is then selected as the copy storage position according to the reference value calculated by the following formula:
$$\omega = \lambda_0\,\omega_{disk\_used} + \lambda_1\,\omega_{disk\_io} + \lambda_2\,\omega_{cpu} + \lambda_3\,\omega_{mem} + \lambda_4\,\omega_{process} + \lambda_5\,\omega_{fr}$$
where $\omega$ is the reference value used to select the data node; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$, and $\omega_{fr}$ are respectively the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate, and node failure rate; $\lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 1$; and $\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \omega_{disk\_used}, \omega_{disk\_io}, \omega_{cpu}, \omega_{mem}, \omega_{process}, \omega_{fr} \in [0, 1]$.
Unit scheme 2: on the basis of unit scheme 1, the analytic hierarchy process is used to determine the weight of each evaluation index. The weights of the evaluation indexes are described as a judgment matrix, the evaluation indexes are arranged in layers, the relations between layers are analyzed quantitatively, and a normalized eigenvector is finally obtained to serve as the judgment matrix. When the cluster is sensed to be processing a different kind of task, the corresponding matrix is adaptively matched to correct the copy placement position.
Unit scheme 3: on the basis of unit scheme 1, the node failure rate is the ratio of the data node's failure time to its online running time, or the ratio of the data node's used service life to its design service life.
Unit scheme 4: on the basis of unit scheme 2, the node failure rate is the ratio of the data node's failure time to its online running time, or the ratio of the data node's used service life to its design service life.
The beneficial effects of the invention are as follows. The invention senses the real-time state information and historical fault information of each data node server and offers the management node more reliable data nodes for storing copies. Without affecting copy safety, it effectively improves write efficiency and load balance at storage time, and it fundamentally addresses the problem that a cluster needs load balancing after running for a long time.
The invention lists server fault information as an evaluation index and uses the predicted probability of data failure as a reference factor for storing data copies. When a data node actually fails, the copies to be complemented are complemented according to a set method, which avoids disordered data storage and keeps the overhead of copy completion as low as possible. To some extent this replaces the rack-awareness function.
When a lost copy must be complemented, completion is preferentially performed on an active node of the rack where the failed node is located. When that rack cannot work normally, active nodes with similar failure rates are preferred. Because this takes the deployment batch of the servers into account, the node receiving the complemented copy is, as far as possible, from the same batch as the failed node, which further helps keep the copies stored at similar positions.
Drawings
FIG. 1 is a design reference model diagram of a copy storage node selection method in big data storage according to the present invention;
FIG. 2 is a flow chart of copy storage in big data storage according to the present invention;
FIG. 3 is a flow chart of copy completion in big data storage according to the present invention.
Detailed Description
The technical scheme of the invention is further explained in detail by combining the attached drawings.
Embodiment of method for selecting copy storage node in big data storage
The method of this embodiment is as follows. The real-time state information and historical fault information of each data node server are sensed, and evaluation indexes for copy storage nodes are selected accordingly. The evaluation indexes comprise the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate, and node failure rate. The weight of each evaluation index is determined, and a data node is then selected as the copy storage position according to the reference value calculated by the following formula:
$$\omega = \lambda_0\,\omega_{disk\_used} + \lambda_1\,\omega_{disk\_io} + \lambda_2\,\omega_{cpu} + \lambda_3\,\omega_{mem} + \lambda_4\,\omega_{process} + \lambda_5\,\omega_{fr}$$
where $\omega$ is the reference value used to select the data node; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$, and $\omega_{fr}$ are respectively the disk utilization rate, disk I/O load rate, CPU load rate, memory load rate, read-write task connection rate, and node failure rate; $\lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 1$; and $\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \omega_{disk\_used}, \omega_{disk\_io}, \omega_{cpu}, \omega_{mem}, \omega_{process}, \omega_{fr} \in [0, 1]$.
Here $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, and $\omega_{mem}$ can be obtained from the operating system of each cluster server. The read-write task connection rate $\omega_{process}$ is the ratio of the number of read-write task connections of the current server to the maximum number of read-write task connections allowed by the file system. The parameter $\omega_{fr}$ can be calculated as the ratio of the server's failure time to its online running time. In clusters where no faults have been recorded, this parameter can instead be defined as the ratio of used service life to design service life, so that, within one data center, the model of this parameter depends on both the production time and the deployment time of the server.
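The two ratio-based indexes described above can be sketched as follows. The function names, the clamping of results to [0, 1], and the sample figures are assumptions made for illustration; the text only defines the ratios themselves.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, clamped to at most 1 (clamping is assumed)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if numerator < 0:
        raise ValueError("numerator must be non-negative")
    return min(1.0, numerator / denominator)


def connection_rate(current_connections, max_connections):
    """Read-write task connection rate: current connections / allowed maximum."""
    return _ratio(current_connections, max_connections)


def failure_rate(down_time=None, online_time=None,
                 used_life=None, design_life=None):
    """Node failure rate.

    Uses failure time / online running time when fault records exist,
    otherwise falls back to used service life / design service life.
    """
    if down_time is not None and online_time is not None:
        return _ratio(down_time, online_time)
    if used_life is not None and design_life is not None:
        return _ratio(used_life, design_life)
    raise ValueError("need either fault records or service-life figures")


# Hypothetical figures: 128 of 4096 connections, 36 h down in 7200 h online,
# and a server 3 years into a 6-year design life with no fault records.
print(connection_rate(128, 4096))
print(failure_rate(down_time=36, online_time=7200))
print(failure_rate(used_life=3, design_life=6))
```

Both ratios use whatever time unit the inputs share, so hours and years can be mixed across calls but not within one.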
Once the weight of each evaluation index is determined, the copy placement strategy is essentially determined. The weights can be modified as required; after modification the scheme can degenerate into a method that places copies according to a single evaluation index, to suit special working conditions. The weights of the evaluation indexes can be described as a judgment matrix, and when the cluster is sensed to be processing a different kind of task, the corresponding matrix is adaptively matched to correct the copy placement position.
The weights of the evaluation indexes are represented by the matrix [A B C D E F], where A, B, C, D, E, and F denote the six evaluation indexes. The evaluation indexes are arranged in layers and the relations between layers are analyzed quantitatively; for example, the lowercase ab denotes the relative relation between the two layers A and B. The weight of each evaluation index can be determined from the relative relations between the evaluation indexes in Table 1, and a normalized eigenvector is finally obtained to serve as the judgment matrix, which can be calculated from Table 1. If evaluation index A is to be considered first when the cluster is sensed to be processing a different kind of task, A is given the largest proportion, and weights are then set for the other five evaluation indexes according to the relations between A and each of them. After the weights are set, the copy placement position is obtained from the reference value calculated from each node's information. The design idea is shown in FIG. 1.
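One way to picture the adaptive matching and the single-index special case is a lookup of weight vectors keyed by task type. Everything in this sketch is hypothetical: the task-type names, the weight values, and the fallback to a balanced profile are assumptions, not details given in the text.

```python
INDEXES = ("disk_used", "disk_io", "cpu", "mem", "process", "fr")

# Hypothetical weight vectors (lambda_0 .. lambda_5), one per task type.
PROFILES = {
    "balanced": (0.25, 0.20, 0.15, 0.10, 0.10, 0.20),
    "compute_heavy": (0.10, 0.10, 0.40, 0.20, 0.10, 0.10),
    "io_heavy": (0.15, 0.40, 0.10, 0.10, 0.15, 0.10),
}


def weights_for(task_type):
    """Return the weight vector matched to the sensed task type.

    Unknown task types fall back to the balanced profile (an assumption).
    """
    weights = PROFILES.get(task_type, PROFILES["balanced"])
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return weights


def single_index_profile(index):
    """Weight vector that places copies by one evaluation index only."""
    if index not in INDEXES:
        raise ValueError(f"unknown evaluation index: {index}")
    return tuple(1.0 if name == index else 0.0 for name in INDEXES)


print(weights_for("io_heavy"))
print(single_index_profile("disk_used"))
```

The single-index vector still satisfies the constraint that the weights sum to 1, which is why the general formula can collapse to placement by one index without any other change.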
TABLE 1 Relative relationship matrix of evaluation indexes
      A      B      C      D      E      F
A     1      ab     ac     ad     ae     af
B     1/ab   1      bc     bd     be     bf
C     1/ac   1/bc   1      cd     ce     cf
D     1/ad   1/bd   1/cd   1      de     df
E     1/ae   1/be   1/ce   1/de   1      ef
F     1/af   1/bf   1/cf   1/df   1/ef   1
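In the standard analytic hierarchy process, weights are taken from the normalized principal eigenvector of a reciprocal matrix such as Table 1. The sketch below assumes that reading and uses power iteration, which the text does not prescribe; the function names and the sample comparison values are hypothetical.

```python
def reciprocal_matrix(upper):
    """Build the full matrix of Table 1 from its upper-triangle entries.

    `upper[(i, j)]` with i < j is the relative importance of index i over
    index j (ab, ac, ... in Table 1); the lower triangle is filled with 1/value.
    """
    n = 1 + max(j for _, j in upper)
    matrix = [[1.0] * n for _ in range(n)]
    for (i, j), value in upper.items():
        matrix[i][j] = value
        matrix[j][i] = 1.0 / value
    return matrix


def ahp_weights(matrix, iterations=100):
    """Approximate the normalized principal eigenvector by power iteration."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("judgment matrix must be square")
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [x / total for x in nxt]  # renormalize so the weights sum to 1
    return w


# Hypothetical, perfectly consistent comparisons built from target weights,
# so the recovered weights should reproduce the targets.
target = [0.30, 0.20, 0.15, 0.10, 0.05, 0.20]
upper = {(i, j): target[i] / target[j]
         for i in range(6) for j in range(i + 1, 6)}
weights = ahp_weights(reciprocal_matrix(upper))
print([round(x, 3) for x in weights])
```

Real pairwise judgments are rarely perfectly consistent, so a practical implementation would usually also check a consistency ratio before accepting the weights.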
Embodiment of copy storage method in big data storage
The copy storage method of this embodiment stores copies under the three-copy scheme and ensures their reliability by placing the copies on two racks: two copies are stored on different nodes of the same rack, and the remaining copy is stored on a node of a different rack. When storage starts, if the client is a data node, the first copy is placed on that node; if the client is not a data node, a node is selected from the nodes on all racks, according to the method for selecting copy storage nodes in big data storage of the embodiment above, to hold the first copy. A node is then selected by the same method from the other nodes of the same rack as the first copy to hold the second copy, and a node is selected by the same method from the nodes on a rack different from that of the first and second copies to hold the third copy.
The specific storage process is shown in FIG. 2. When the number of copies to be stored is greater than 0:
step 1) judge whether the copy to be stored is the first copy; if so, go to step 2), otherwise go to step 3);
step 2) judge whether the client is a data node; if so, go to step 4), otherwise go to step 5);
step 3) judge whether the copy to be stored is the second copy; if so, go to step 6); otherwise it is the third copy, so go to step 7);
step 4) select that data node to hold the first copy;
step 5) select a node from the nodes on all racks, according to the method for selecting copy storage nodes in big data storage of the embodiment above, to hold the first copy;
step 6) select a node from the nodes of racks different from that of the first copy, according to the same selection method, to hold the second copy;
step 7) judge whether the first copy and the second copy are stored on the same rack; if so, go to step 8), otherwise go to step 9);
step 8) select a node on a rack different from that of the first and second copies, according to the same selection method, to hold the third copy, and go to step 10);
step 9) select a different node on the same rack as the second copy, according to the same selection method, to hold the third copy, and go to step 10);
step 10) the placement of the three copies is complete and the process ends.
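The flow of steps 1) to 10) can be sketched as below. This is an illustrative reading of FIG. 2, not the patented implementation: the function name, the rack and load data, and the `select` callback (standing in for the node selection method) are assumptions, and the sketch presumes at least two racks with two nodes on the second one.

```python
def place_three_copies(racks, select, client=None):
    """Return the nodes chosen for the first, second, and third copy.

    racks:  dict mapping rack name -> list of node names
    select: callable taking a list of candidate nodes and returning one
    client: name of the writing client, if it is itself a data node
    """
    rack_of = {node: rack for rack, nodes in racks.items() for node in nodes}

    # Steps 1, 2, 4, 5: first copy goes to the client if it is a data node,
    # otherwise to a node selected from all racks.
    first = client if client in rack_of else select(list(rack_of))

    # Steps 3, 6: second copy goes to a rack different from the first copy's.
    second = select([n for n in rack_of if rack_of[n] != rack_of[first]])

    # Steps 7-9: third copy depends on whether copies 1 and 2 share a rack.
    # Step 6 always picks another rack here; the branch mirrors FIG. 2.
    if rack_of[first] == rack_of[second]:
        candidates = [n for n in rack_of if rack_of[n] != rack_of[first]]
    else:
        candidates = [n for n in rack_of
                      if rack_of[n] == rack_of[second] and n != second]
    third = select(candidates)

    # Step 10: placement finished.
    return [first, second, third]


# Hypothetical cluster; a lower value in LOAD stands for a better reference value.
RACKS = {"r1": ["a", "b"], "r2": ["c", "d"]}
LOAD = {"a": 0.5, "b": 0.2, "c": 0.4, "d": 0.3}


def select(candidates):
    return min(candidates, key=LOAD.get)


print(place_three_copies(RACKS, select))
print(place_three_copies(RACKS, select, client="a"))
```

In both runs two copies end up on one rack and the remaining copy on the other, which is the two-rack property the embodiment relies on.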
Embodiment of the copy completion method in big data storage
The completion method of this embodiment is as follows. When the number of copies to be complemented is less than 3, the rack on which each lost copy was located is obtained, and it is judged whether an active node exists on the rack of each failed node. If so, a data node is selected from the active nodes according to a set node selection method to complement the copy; if not, a node is selected, according to the set node selection method, from the nodes having the same failure rate as the failed node to complement the copy.
As shown in FIG. 3, when the number of copies to be complemented is greater than 0, judge whether that number is equal to 3; if so, go to step 4), otherwise go to step 1);
step 1) obtain the rack on which each lost copy was located and judge whether an active node exists on the rack of each failed node; if so, go to step 2), otherwise go to step 3);
step 2) enumerate the active nodes of the rack where the failed node is located, and calculate a suitable node according to the set node selection method to complement the copy;
step 3) enumerate the nodes having the same failure rate as the failed node, and calculate a suitable node according to the set node selection method to complement the copy;
step 4) generate a bad block and raise an alarm.
The set node selection method in steps 2) and 3) may be the method for selecting copy storage nodes in big data storage described above, or another node selection method from the prior art, which is not described in detail here.
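The completion flow of FIG. 3 can be sketched as follows. The data layout, the `BadBlockError` exception, the `holders` argument that keeps a copy off nodes already holding one, and the `tolerance` used to compare failure rates are assumptions added for illustration; `select` stands in for whichever node selection method is configured.

```python
class BadBlockError(Exception):
    """Raised when all copies are lost and the block cannot be rebuilt."""


def complement_copies(failed, nodes, select, holders=(),
                      total_copies=3, tolerance=0.0):
    """Return one target node per lost copy.

    failed:  names of the failed nodes that held the lost copies
    nodes:   dict name -> {"rack": str, "active": bool, "fr": float}
    holders: nodes that still hold a copy and must be skipped (assumed)
    """
    if not failed:
        return []
    if len(failed) >= total_copies:
        # Step 4: every copy is gone, so report a bad block and alarm.
        raise BadBlockError("all copies lost")

    used = set(holders)
    chosen = []
    for name in failed:
        bad = nodes[name]
        active = [n for n, info in nodes.items()
                  if info["active"] and n not in used]
        # Steps 1, 2: prefer active nodes on the failed node's rack.
        candidates = [n for n in active if nodes[n]["rack"] == bad["rack"]]
        if not candidates:
            # Step 3: otherwise use nodes with the same failure rate.
            candidates = [n for n in active
                          if abs(nodes[n]["fr"] - bad["fr"]) <= tolerance]
        if not candidates:
            raise LookupError(f"no node available to replace {name}")
        target = select(candidates)
        used.add(target)
        chosen.append(target)
    return chosen


# Hypothetical cluster state: node "a" has failed.
NODES = {
    "a": {"rack": "r1", "active": False, "fr": 0.10},
    "b": {"rack": "r1", "active": True, "fr": 0.12},
    "c": {"rack": "r2", "active": True, "fr": 0.10},
    "d": {"rack": "r2", "active": True, "fr": 0.30},
}
LOAD = {"b": 0.2, "c": 0.4, "d": 0.1}


def select(candidates):
    return min(candidates, key=LOAD.get)


print(complement_copies(["a"], NODES, select, holders={"c"}))
```

Setting `tolerance` above zero would turn the strict "same failure rate" test into the "similar failure rate" preference mentioned among the beneficial effects.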
Embodiment of the copy management system in big data storage
The copy management system of this embodiment is a management platform that implements the method for selecting copy storage nodes in big data storage. For that method, refer to the embodiment above; it is not described again here.
The invention has been described with reference to particular embodiments, but it is not limited to them. Within the idea given by the invention, technical solutions formed by changing, replacing, or modifying the technical means of the above embodiments in ways that readily occur to those skilled in the art, with functions and purposes substantially the same as those of the corresponding technical means of the invention, that is, solutions obtained by fine-tuning the above embodiments, still fall within the protection scope of the invention.

Claims (3)

1. A copy completion method in big data storage, characterized in that, when the number of copies to be complemented is less than 3, the rack on which each lost copy was located is obtained, and it is judged whether an active node exists on the rack of each failed node; if an active node exists, a data node is selected from the active nodes according to a set node selection method to complement the copy, and if no active node exists, a node is selected, according to the set node selection method, from the nodes having the same failure rate as the failed node to complement the copy;
the set node selection method comprises: selecting evaluation indexes for copy storage nodes according to the real-time state information and historical fault information of each data node server, wherein the evaluation indexes comprise a disk utilization rate, a disk I/O load rate, a CPU load rate, a memory load rate, a read-write task connection rate, and a node failure rate, and the read-write task connection rate is the ratio of the number of read-write task connections of the current server to the maximum number of read-write task connections allowed by the file system; determining the weight of each evaluation index, and then selecting a data node as the copy storage position according to the reference value calculated by the following formula:
$$\omega = \lambda_0\,\omega_{disk\_used} + \lambda_1\,\omega_{disk\_io} + \lambda_2\,\omega_{cpu} + \lambda_3\,\omega_{mem} + \lambda_4\,\omega_{process} + \lambda_5\,\omega_{fr}$$
wherein $\omega$ is the reference value used to select the data node; $\omega_{disk\_used}$, $\omega_{disk\_io}$, $\omega_{cpu}$, $\omega_{mem}$, $\omega_{process}$, and $\omega_{fr}$ are respectively the disk utilization rate, the disk I/O load rate, the CPU load rate, the memory load rate, the read-write task connection rate, and the node failure rate; $\lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 1$; and $\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \omega_{disk\_used}, \omega_{disk\_io}, \omega_{cpu}, \omega_{mem}, \omega_{process}, \omega_{fr} \in [0, 1]$.
2. The method for completing copies in big data storage according to claim 1, wherein an analytic hierarchy process is used to determine the weight of each evaluation index: the weights of the evaluation indexes are described as a judgment matrix, the evaluation indexes are arranged in layers, the relationships between the layers are analyzed quantitatively, and finally a normalized eigenvector is obtained as the judgment matrix; when the cluster is perceived to be processing different tasks, the corresponding matrix is adaptively matched so as to correct the copy placement position.
3. The method for completing copies in big data storage according to claim 1 or 2, wherein the node failure rate is the ratio of the data node's failure time to its online running time, or the ratio of the data node's time in use to its designed service time.
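The node selection described in claims 1 to 3 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names Node, failure_rate, ahp_weights, reference_value, and select_node are hypothetical. The sketch assumes the usual analytic hierarchy process reading, in which the weights are the normalized principal eigenvector of the judgment matrix (approximated here by power iteration), and it assumes that a lower reference value marks the better target, since all six indexes measure load or failure; the claims do not state the direction.

```python
from dataclasses import dataclass

# Order of the six evaluation indexes named in claim 1.
INDEXES = ("disk_used", "disk_io", "cpu", "mem", "process", "fr")


@dataclass
class Node:
    name: str
    rack: str
    active: bool
    metrics: dict  # index name -> value in [0, 1]


def failure_rate(failure_time, online_time):
    """Claim 3, first variant: failure time over online running time.

    The second variant (time in use over designed service time) has the
    same shape, so the same function can be reused for it.
    """
    if online_time <= 0:
        raise ValueError("online_time must be positive")
    return min(1.0, failure_time / online_time)


def ahp_weights(judgment, iterations=100):
    """Approximate the normalized principal eigenvector of a pairwise
    judgment matrix by power iteration (assumed AHP weighting step)."""
    n = len(judgment)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(judgment[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [x / total for x in nxt]  # renormalize so the weights sum to 1
    return w


def reference_value(node, weights):
    """omega = lambda_0 * omega_disk_used + ... + lambda_5 * omega_fr."""
    return sum(lam * node.metrics[k] for lam, k in zip(weights, INDEXES))


def select_node(failed, nodes, weights):
    """Prefer active nodes on the failed node's rack; otherwise fall back
    to active nodes whose failure rate equals that of the failed node."""
    candidates = [n for n in nodes if n.active and n.rack == failed.rack]
    if not candidates:
        candidates = [n for n in nodes
                      if n.active and n.metrics["fr"] == failed.metrics["fr"]]
    if not candidates:
        return None
    # Assumption: the smallest reference value is the preferred target.
    return min(candidates, key=lambda n: reference_value(n, weights))
```

With a 6x6 judgment matrix filled with ones, ahp_weights returns equal weights of 1/6 each, so the reference value reduces to the plain average of the six indexes.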
CN201810545954.9A 2018-05-25 2018-05-25 Method for storing and complementing copies and selecting nodes in big data storage and management system Active CN110535898B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810545954.9A CN110535898B (en) 2018-05-25 2018-05-25 Method for storing and complementing copies and selecting nodes in big data storage and management system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810545954.9A CN110535898B (en) 2018-05-25 2018-05-25 Method for storing and complementing copies and selecting nodes in big data storage and management system

Publications (2)

Publication Number Publication Date
CN110535898A CN110535898A (en) 2019-12-03
CN110535898B CN110535898B (en) 2022-10-04

Family

ID=68657167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810545954.9A Active CN110535898B (en) 2018-05-25 2018-05-25 Method for storing and complementing copies and selecting nodes in big data storage and management system

Country Status (1)

Country Link
CN (1) CN110535898B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487093A (en) * 2020-12-07 2021-03-12 浪潮云信息技术股份公司 Decentralized copy control method for distributed database

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102667761A (en) * 2009-06-19 2012-09-12 布雷克公司 Scalable cluster database
CN104156381A (en) * 2014-03-27 2014-11-19 深圳信息职业技术学院 Copy access method and device for Hadoop distributed file system and Hadoop distributed file system
CN104615606A (en) * 2013-11-05 2015-05-13 阿里巴巴集团控股有限公司 Hadoop distributed file system and management method thereof
CN106293492A (en) * 2015-05-14 2017-01-04 中兴通讯股份有限公司 A kind of memory management method and distributed file system
CN107729514A (en) * 2017-10-25 2018-02-23 郑州云海信息技术有限公司 A kind of Replica placement node based on hadoop determines method and device
CN108009260A (en) * 2017-12-11 2018-05-08 西安交通大学 A kind of big data storage is lower with reference to node load and the Replica placement method of distance

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8010648B2 (en) * 2008-10-24 2011-08-30 Microsoft Corporation Replica placement in a distributed storage system
WO2012147087A1 (en) * 2011-04-29 2012-11-01 Tata Consultancy Services Limited Archival storage and retrieval system
CN104468651B (en) * 2013-09-17 2019-09-10 南京中兴新软件有限责任公司 Distributed more copy data storage methods and device
CN103678051B (en) * 2013-11-18 2016-08-24 航天恒星科技有限公司 A kind of online failure tolerant method in company-data processing system
US9635109B2 (en) * 2014-01-02 2017-04-25 International Business Machines Corporation Enhancing reliability of a storage system by strategic replica placement and migration
US9817750B2 (en) * 2014-07-03 2017-11-14 Pure Storage, Inc. Profile-dependent write placement of data into a non-volatile solid-state storage
CN105915626B (en) * 2016-05-27 2019-02-26 南京邮电大学 A kind of data copy initial placement method towards cloud storage
US10248326B2 (en) * 2016-06-29 2019-04-02 EMC IP Holding Company LLC Incremental erasure coding for storage systems
CN106612322B (en) * 2016-07-11 2019-10-11 南京买简信息科技有限公司 A kind of data reconstruction method of deposit data Node distribution optimization in cloud storage
CN106302702B (en) * 2016-08-10 2020-03-20 华为技术有限公司 Data fragment storage method, device and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
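An optimized Hadoop replica placement strategy (一种优化的Hadoop副本放置策略); 蔡燕冬, 刘艳, et al.; Microcomputer & Its Applications (《微型机与应用》); 20151014 (No. 16); pp. 21-23 *
Research on a dynamic replica placement mechanism in cloud storage (云存储中动态副本放置机制研究); 王岩, 汪晋宽; Computer Engineering & Science, High Performance Computing (《计算机工程与科学 高性能计算》); 20171030; Vol. 39 (No. 09); full text *
蔡燕冬, 刘艳, et al. An optimized Hadoop replica placement strategy (一种优化的Hadoop副本放置策略). Microcomputer & Its Applications (《微型机与应用》). 2015, (No. 16), pp. 21-23. *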

Also Published As

Publication number Publication date
CN110535898A (en) 2019-12-03

Similar Documents

Publication Publication Date Title
US20220027189A1 (en) System and Method for Optimizing Placements of Virtual Machines on Hypervisor Hosts
CN105657066B (en) Load for storage system equalization methods and device again
CN102855294B (en) Intelligent hash data layout method, cluster storage system and method thereof
US8244671B2 (en) Replica placement and repair strategies in multinode storage systems
CN109656911A (en) Distributed variable-frequencypump Database Systems and its data processing method
US10356150B1 (en) Automated repartitioning of streaming data
CN102938001B (en) Data loading device and data load method
CN107196865A (en) A kind of adaptive threshold overload moving method of Load-aware
CN108810115B (en) Load balancing method and device suitable for distributed database and server
CN107450855B (en) Model-variable data distribution method and system for distributed storage
US9602590B1 (en) Shadowed throughput provisioning
CN104679594B (en) A kind of middleware distributed computing method
CN108519856B (en) Data block copy placement method based on heterogeneous Hadoop cluster environment
US20170357537A1 (en) Virtual machine dispatching method, apparatus, and system
CN107133228A (en) A kind of method and device of fast resampling
CN109656896A (en) Fault repairing method, device and distributed memory system and storage medium
CN117033004B (en) Load balancing method and device, electronic equipment and storage medium
CN107729514A (en) A kind of Replica placement node based on hadoop determines method and device
WO2014184606A1 (en) Identifying workload and sizing of buffers for the purpose of volume replication
CN107480254B (en) Online load balancing method suitable for distributed memory database
Guo et al. A data placement strategy based on genetic algorithm in cloud computing platform
Fang et al. Integrating workload balancing and fault tolerance in distributed stream processing system
CN110535898B (en) Method for storing and complementing copies and selecting nodes in big data storage and management system
CN107943615B (en) Data processing method and system based on distributed cluster
US9037762B2 (en) Balancing data distribution in a fault-tolerant storage system based on the movements of the replicated copies of data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant