CN102223404A - Replica selection method based on access cost and transmission time - Google Patents
- Publication number
- CN102223404A (application CN2011101512234A)
- Authority
- CN
- China
- Prior art keywords
- replica
- data
- storage node
- matrix
- task
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a replica selection method based on access cost and transmission time. For any data-intensive task that needs to access several data replicas, the method first models replica selection as a weighted set covering problem (WSCP) and converts the problem into a matrix. A weighted greedy algorithm then repeatedly selects the storage node with the minimum average replica access cost until all data replicas needed by the task are obtained, so that low-cost replicas are selected while the transmission time of the replicas is reduced. The method is simple, executes efficiently, and is suitable for replica selection in data-intensive computing environments.
Description
Technical field
The present invention relates to replica selection in data-intensive computing, and in particular to a replica selection method based on access cost and transmission time.
Background art
From search engines and video sharing to e-commerce, Internet services are increasingly centered on massive data processing, and their quality of service depends largely on their data processing capability. Managing this data, that is, locating, operating on, storing, moving, sharing, and describing it, has become the performance bottleneck in the evolution of computing capability. Against this background, data-intensive computing (DIC) has emerged as a supporting technology for new services and has attracted wide attention from both industry and academia.
Replication (data replication technology) is an effective and widely adopted technique for improving the quality of data-intensive services. In a data-intensive environment, distributed storage and data redundancy are used to store multiple replicas of the same data on different physical storage nodes. This not only improves the reliability and availability of the data, but also effectively reduces network access latency and improves network load balancing. Moreover, the scheduling and execution of a task depend heavily on the storage nodes holding the replicas the task requires, and optimizing the choice of these storage nodes can improve application efficiency while meeting the user's quality-of-service requirements. Selecting the best storage nodes for the data replicas a task requires is therefore critical when executing user tasks.
At present, most research on replica selection, in China and abroad, targets data grid environments:
Srikumar Venugopal and Rajkumar Buyya of the University of Melbourne, Australia, have long worked on data-intensive applications in grid environments. In "An SCP-based heuristic approach for scheduling distributed data-intensive applications on global grids" (Journal of Parallel and Distributed Computing, vol. 69, no. 4, pp. 471-487), they proposed a replica selection method for data-intensive applications based on a tree-search algorithm for the set covering problem (SCP). Its idea is to select the minimum number of physical storage nodes that together cover all replicas a task needs, thereby reducing replica transfer time.
Sun Min et al. of Tianjin University, in "Ant algorithm for file replica selection in data grid" (First International Conference on Semantics, Knowledge and Grid, SKG 2005, paper no. 4125852, pp. 64-66), proposed an ant colony algorithm for data replica selection in data-intensive computing, aiming to reduce data access latency, bandwidth consumption, and distributed storage load.
Jin Hai et al. of Huazhong University of Science and Technology, in "Using classification techniques to improve replica selection in data grid" (On the Move to Meaningful Internet Systems, vol. 4276, pp. 1376-1387), proposed a replica selection strategy based on classification: historical data-transfer information is used to predict the physical location of the best replica, and the k-nearest-neighbor (KNN) algorithm is used to select the optimal data replica.
Cloud computing, an emerging data-intensive computing model that can process large-scale data and has great commercial value, is an Internet-based service model that has attracted broad attention across society. Major software vendors worldwide, including information giants such as Google, Amazon, IBM, and Microsoft, are actively pursuing cloud computing research and applications and have each proposed cloud application schemes and implementations. Replica selection in cloud environments has likewise become a focus of researchers in China and abroad:
Li Jing et al. of Chongqing University, in "A replica selection decision in cloud computing environment" (Advanced Materials Research, vol. 121-122, pp. 801-806), proposed a replica selection algorithm that applies grey system theory, based on the GM(1,1) grey dynamic model, to predict data response time, and uses a Markov chain to predict the reliability of data replicas, improving load balance among storage nodes in cloud computing environments.
In summary, most replica selection methods proposed by earlier researchers are improvements and optimizations under specific conditions. Their shortcoming is that none of them considers the cost of the data itself, namely the replica access cost, and none gives sufficient weight to the time required to transfer replicas.
Summary of the invention
To address the problems that remain in replica selection and the shortcomings of earlier methods, the purpose of the present invention is to provide a replica selection method based on access cost and transmission time. It mainly addresses replica selection in data-intensive computing, so that low-cost replicas can be selected while replica transmission time is reduced, improving the execution efficiency of data-intensive applications.
The specific steps of the present invention are as follows:
Step 1: Take the set of data replicas that the task needs to access, and the set of all storage nodes in the data-intensive environment that hold data replicas, as the initial input of the replica selection process;
Step 2: Create a matrix for the task, in which each storage node holding any replica required by the task is a row, each file replica required by the task is a column, and each nonzero entry is the cost of accessing that replica from that storage node; initialize the related variables;
Step 3: Sort the matrix rows in ascending order of each storage node's average replica access cost;
Step 4: Select the first row of the sorted matrix, i.e., the storage node that currently has the minimum average replica access cost, add it to the set of storage nodes selected as holding the task's best replicas, delete the row for this storage node and the columns of the files it covers, and update the matrix;
Step 5: Check whether all replicas have been covered. If any replica is still uncovered, return to Step 3 and continue the selection process; otherwise the replica selection process ends, yielding the set of storage nodes holding the optimal replicas required by the task.
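The five steps can be sketched in Python as below. This is a minimal, non-authoritative sketch under stated assumptions: the costs are supplied as a hypothetical dictionary `costs` mapping each node name to `{replica: cost}`, the function name `select_replica_nodes` is illustrative, "sort ascending and take the first row" (Steps 3 and 4) is expressed as a `min` over the remaining rows, and raising an error when some replica is stored on no node is an added assumption the text does not address.

```python
from typing import Dict, List, Set


def select_replica_nodes(required: Set[str],
                         costs: Dict[str, Dict[str, float]]) -> List[str]:
    """Greedy replica-node selection following Steps 1-5 (illustrative sketch).

    required: replicas the task needs (Step 1 input).
    costs: hypothetical map node -> {replica: access cost > 0} (Step 1 input).
    Returns the selected storage nodes in the order they were chosen.
    """
    # Step 2: the "matrix" keeps only the nonzero cells for required replicas.
    matrix = {node: {f: w for f, w in row.items() if f in required and w > 0}
              for node, row in costs.items()}
    matrix = {node: row for node, row in matrix.items() if row}
    uncovered = set(required)
    chosen: List[str] = []

    while uncovered:  # Step 5: repeat until every replica is covered
        if not matrix:
            # Assumption: the text does not say what happens in this case.
            raise ValueError(f"replicas stored on no node: {sorted(uncovered)}")
        # Steps 3-4: pick the row with the minimum average access cost.
        # min() returns the earliest row on ties, like a stable ascending sort.
        best = min(matrix, key=lambda n: sum(matrix[n].values()) / len(matrix[n]))
        covered = set(matrix.pop(best))  # delete the chosen row
        chosen.append(best)
        uncovered -= covered
        # Delete the columns the chosen node covered; drop rows left empty.
        matrix = {n: {f: w for f, w in row.items() if f not in covered}
                  for n, row in matrix.items()}
        matrix = {n: row for n, row in matrix.items() if row}
    return chosen


costs = {
    "d1": {"f1": 2, "f2": 4},
    "d2": {"f2": 1, "f3": 5},
    "d3": {"f3": 2},
}
print(select_replica_nodes({"f1", "f2", "f3"}, costs))  # ['d3', 'd2', 'd1']
```

Because the averages are recomputed on the shrinking matrix, a node's rank can change between rounds: here `d2` ties with `d1` at first, but once `f3` is covered its remaining average drops to 1 and it is chosen next.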
Features of the present invention
By using a weighted greedy algorithm, the present invention selects, at each iteration, the storage node with the minimum average replica access cost, running in polynomial time with an approximation ratio of (ln|X|+1). It can therefore select low-cost data replicas while reducing replica movement. The method is simple, executes efficiently, and is suitable for replica selection in data-intensive computing environments.
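The (ln|X|+1) figure matches the classical guarantee of the greedy heuristic for the standard weighted set covering problem, in which every row carries a single weight and each round picks the row with the lowest weight per newly covered element; the text does not show that the bound carries over unchanged to the per-replica costs used here. Taking $X$ to be the set of elements to be covered and $S^{*}$ an optimal cover, the classical bound reads:

$$
\mathrm{cost}(S_{\text{greedy}}) \le H(|X|)\,\mathrm{cost}(S^{*}),
\qquad
H(n) = \sum_{k=1}^{n} \frac{1}{k} \le 1 + \int_{1}^{n} \frac{dx}{x} = \ln n + 1 .
$$

The inequality on $H(n)$ follows because $1/k \le \int_{k-1}^{k} dx/x$ for every $k \ge 2$.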
Description of drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is a diagram of the replica selection model of the present invention based on the weighted set covering problem (WSCP).
Embodiment
The present invention is described in further detail below with reference to the drawings and an embodiment.
In the following embodiment, a data-intensive task J needs to access k (k>0) data replicas distributed over m (m>0) storage nodes, and the replicas of the same file have different access costs on different storage nodes. As shown in Fig. 1, the specific steps are as follows:
Step 1: Take the set of the k (k>0) data replicas that task J needs to access, and the set D of all storage nodes in the data-intensive environment that hold data replicas, as the initial input of the replica selection process;
Step 2: Create a matrix for the task, in which each storage node holding any replica required by the task is a row, each file replica required by the task is a column, and each nonzero entry is the cost of accessing that replica from that storage node; initialize the related variables;
As shown in Fig. 2, the set of replicas required by J is denoted $F=\{f_1, f_2, \ldots, f_k\}$, the set of storage nodes holding these replicas is denoted $D=\{d_1, d_2, \ldots, d_m\}$, and $w_{mk}$ denotes the cost of accessing replica $f_k$ from storage node $d_m$. The model in Fig. 3 can be further converted into a matrix $A=[a_{ij}]$, where $i$ indexes the $i$-th storage node ($1 \le i \le m$) and $j$ indexes the $j$-th replica ($1 \le j \le k$). If the VM can access data replica $f_j$ from storage node $d_i$ at cost $w_{ij}$, then $a_{ij}=w_{ij}$ ($w_{ij}>0$); otherwise, if $d_i$ does not hold $f_j$, then $a_{ij}=0$. If a row contains a cost $w_{ij}$ in some column, the row is said to "cover" that column at cost $w_{ij}$. The replica selection problem then becomes finding an optimal set of rows that covers all columns with minimum average weight, which reduces to the weighted set covering problem (WSCP). In the following description, set $C$ holds the storage nodes selected so far as holding the best replicas, and set $E$ holds the replicas already covered by the selected storage nodes. Both are initialized as empty: $C \leftarrow \emptyset$, $E \leftarrow \emptyset$;
Step 3: Sort the matrix rows in ascending order of each storage node's average replica access cost;
The average replica access cost of each storage node in the matrix is computed as

$$\bar{w}(d_i) = \frac{W(d_i)}{|d_i \cap F|},$$

that is, the ratio of the total cost of the replicas required by task J that the storage node covers to the number of covered replicas. Here $d_i$ denotes the $i$-th storage node, and $W(d_i)$ denotes the total access cost of the replicas covered by $d_i$:

$$W(d_i) = \sum_{f_j \in d_i \cap F} w_{ij},$$

and $|d_i \cap F|$ denotes the number of replicas required by task J that storage node $d_i$ covers. The formula for $\bar{w}(d_i)$ shows that the more of task J's required replicas storage node $d_i$ covers, and the smaller the total access cost of those replicas, the smaller its average replica access cost; that is, its replicas are cheaper to access and the replicas it covers are more concentrated.
To select replicas that are low-cost and relatively concentrated, the matrix rows are sorted in ascending order of each storage node's average replica access cost, so that each selection starts from the storage node with the minimum average access cost.
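The ascending sort can be sketched as follows. The helper names are illustrative, and breaking ties by original row order (which Python's stable `sorted` does) is an assumption, since the text does not specify a tie-breaking rule.

```python
import math
from typing import List, Tuple


def average_access_cost(row: List[float]) -> float:
    covered = [w for w in row if w > 0]
    return sum(covered) / len(covered) if covered else math.inf


def sort_rows_ascending(nodes: List[str], a: List[List[float]]
                        ) -> Tuple[List[str], List[List[float]]]:
    """Reorder nodes and matrix rows by ascending average access cost.

    sorted() is stable, so nodes with equal averages keep their original
    order (assumed tie-breaking rule).
    """
    order = sorted(range(len(nodes)), key=lambda i: average_access_cost(a[i]))
    return [nodes[i] for i in order], [a[i] for i in order]


nodes, A = sort_rows_ascending(["d1", "d2", "d3"],
                               [[2, 4, 0], [0, 1, 5], [0, 0, 2]])
print(nodes)  # ['d3', 'd1', 'd2']
print(A)      # [[0, 0, 2], [2, 4, 0], [0, 1, 5]]
```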
Step 4: Select the first row of the sorted matrix, i.e., the storage node that currently has the minimum average replica access cost, add it to the set of storage nodes selected as holding the task's best replicas, delete the row for this storage node and the columns of the files it covers, and then update the matrix;
Because Step 3 sorted the matrix rows in ascending order of the average cost of the task-J replicas each storage node covers, the first row of the matrix corresponds to the storage node d whose current average replica access cost is minimum. Node d is added to the set C of storage nodes selected as holding task J's best replicas, i.e., $C \leftarrow C \cup \{d\}$; at the same time, the replicas corresponding to the matrix columns covered by d are added to E, i.e., $E \leftarrow E \cup (d \cap F)$. Matrix A is then updated by deleting the first row (the row for storage node d) and the columns of the replicas covered by d.
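Step 4 on an already sorted matrix can be sketched as below. The function name `take_first_row` is hypothetical; it updates C and E in place and returns the reduced node list, replica list, and matrix. Rows that become all zeros are kept, since the text only says to delete the selected row and the covered columns.

```python
from typing import List, Set, Tuple


def take_first_row(nodes: List[str], replicas: List[str], a: List[List[float]],
                   C: Set[str], E: Set[str]
                   ) -> Tuple[List[str], List[str], List[List[float]]]:
    """Select row 0 of a sorted matrix, update C and E in place, then delete
    that row and every column it covers."""
    d = nodes[0]
    cols = {j for j, w in enumerate(a[0]) if w > 0}  # columns covered by d
    C.add(d)                                          # C <- C ∪ {d}
    E.update(replicas[j] for j in cols)               # E <- E ∪ (d ∩ F)
    keep = [j for j in range(len(replicas)) if j not in cols]
    return (nodes[1:],
            [replicas[j] for j in keep],
            [[row[j] for j in keep] for row in a[1:]])


C: Set[str] = set()
E: Set[str] = set()
nodes, replicas, A = take_first_row(
    ["d3", "d1", "d2"], ["f1", "f2", "f3"],
    [[0, 0, 2], [2, 4, 0], [0, 1, 5]], C, E)
print(C, E)             # {'d3'} {'f3'}
print(nodes, replicas)  # ['d1', 'd2'] ['f1', 'f2']
print(A)                # [[2, 4], [0, 1]]
```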
Step 5: Check whether all replicas have been covered. If any replica is still uncovered, return to Step 3 and continue the selection process; otherwise the replica selection process ends, yielding the set of storage nodes holding the optimal replicas required by the task.
Check whether the set E of covered replicas equals the set F of all replicas required by task J. If $E \neq F$, some replicas remain uncovered, so return to Step 3, re-sort the updated matrix, and select the next storage node; otherwise all replicas are covered, the replica selection process ends, and the set C of storage nodes holding the optimal replicas required by the task is obtained.
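Putting Steps 3 to 5 together in the matrix form, with the sets C, E, and F from the embodiment, gives a loop like the sketch below. The function name `select_until_covered` is hypothetical, and stopping with an error when no remaining row covers any uncovered replica is an added assumption; without it the loop could not terminate in that case.

```python
import math
from typing import List, Set, Tuple


def select_until_covered(nodes: List[str], replicas: List[str],
                         a: List[List[float]]) -> Tuple[Set[str], Set[str]]:
    """Repeat Steps 3-5 until E == F; returns (C, E). Illustrative sketch."""
    def avg(row: List[float]) -> float:
        hits = [w for w in row if w > 0]
        return sum(hits) / len(hits) if hits else math.inf

    F: Set[str] = set(replicas)
    C: Set[str] = set()
    E: Set[str] = set()
    nodes, replicas, a = list(nodes), list(replicas), [list(r) for r in a]
    while E != F:                                                   # Step 5 test
        order = sorted(range(len(nodes)), key=lambda i: avg(a[i]))  # Step 3
        if not order or avg(a[order[0]]) == math.inf:
            # Assumption: stop if no remaining node covers any uncovered replica.
            raise ValueError(f"uncovered replicas: {sorted(F - E)}")
        best = order[0]                                             # Step 4
        cols = {j for j, w in enumerate(a[best]) if w > 0}
        C.add(nodes[best])
        E.update(replicas[j] for j in cols)
        keep = [j for j in range(len(replicas)) if j not in cols]
        rows = [i for i in range(len(nodes)) if i != best]
        nodes = [nodes[i] for i in rows]
        replicas = [replicas[j] for j in keep]
        a = [[a[i][j] for j in keep] for i in rows]
    return C, E


C, E = select_until_covered(["d1", "d2", "d3"], ["f1", "f2", "f3"],
                            [[2, 4, 0], [0, 1, 5], [0, 0, 2]])
print(sorted(C), sorted(E))  # ['d1', 'd2', 'd3'] ['f1', 'f2', 'f3']
```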
Claims (1)
1. A replica selection method based on access cost and transmission time, characterized by:
Step 1: Take the set of data replicas that the task needs to access, and the set of all storage nodes in the data-intensive environment that hold data replicas, as the initial input of the replica selection process;
Step 2: Create a matrix for the task, in which each storage node holding any replica required by the task is a row, each file replica required by the task is a column, and each nonzero entry is the cost of accessing that replica from that storage node; initialize the related variables;
Step 3: Sort the matrix rows in ascending order of each storage node's average replica access cost;
Step 4: Select the first row of the sorted matrix, i.e., the storage node that currently has the minimum average replica access cost, add it to the set of storage nodes selected as holding the task's best replicas, delete the row for this storage node and the columns of the files it covers, and update the matrix;
Step 5: Check whether all replicas have been covered. If any replica is still uncovered, return to Step 3 and continue the selection process; otherwise the replica selection process ends, yielding the set of storage nodes holding the optimal replicas required by the task.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011101512234A CN102223404A (en) | 2011-06-07 | 2011-06-07 | Replica selection method based on access cost and transmission time |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011101512234A CN102223404A (en) | 2011-06-07 | 2011-06-07 | Replica selection method based on access cost and transmission time |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102223404A true CN102223404A (en) | 2011-10-19 |
Family ID: 44779830
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011101512234A Pending CN102223404A (en) | 2011-06-07 | 2011-06-07 | Replica selection method based on access cost and transmission time |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102223404A (en) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101751234A (en) * | 2010-01-21 | 2010-06-23 | Inspur (Beijing) Electronic Information Industry Co., Ltd. | Method and system for distributing disk array data |
Non-Patent Citations (1)
Title |
---|
Wei Liu et al., "A Cost-Aware Resource Selection for Data-intensive Applications in Cloud-oriented Data Centers", I.J. Information Technology and Computer Science * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103793425A (en) * | 2012-10-31 | 2014-05-14 | International Business Machines Corporation | Data processing method and data processing device for distributed system |
CN104915205A (en) * | 2015-06-08 | 2015-09-16 | Beihang University | Request multi-replica task execution method applicable to online data intensive applications |
CN108255593A (en) * | 2017-12-20 | 2018-07-06 | Neusoft Corporation | Task coordination method, device, medium and electronic equipment based on shared resource |
CN108255593B (en) * | 2017-12-20 | 2020-11-03 | Neusoft Corporation | Task coordination method, device, medium and electronic equipment based on shared resources |
CN109902797A (en) * | 2019-04-22 | 2019-06-18 | Guilin University of Electronic Technology | Cloud replica placement scheme based on ant colony algorithm |
CN114356236A (en) * | 2021-12-31 | 2022-04-15 | Hangzhou Qulian Technology Co., Ltd. | Blockchain data storage and reading method and blockchain data access system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | Load balancing task scheduling based on genetic algorithm in cloud computing | |
CN102467570B (en) | Connection query system and method for distributed data warehouse | |
CN102223404A (en) | Replica selection method based on access cost and transmission time | |
CN100576179C (en) | Grid scheduling method based on energy optimization |
CN105656999B (en) | Energy-optimized cooperative task migration method in mobile cloud computing environment |
CN106250240A (en) | Task scheduling optimization method |
Ahmad et al. | Optimization of data-intensive workflows in stream-based data processing models | |
CN106471501A (en) | Data query method, data object storage method, and data system |
CN103294912B (en) | Prediction-based cache optimization method for mobile devices |
CN114327811A (en) | Task scheduling method, device and equipment and readable storage medium | |
CN108304253A (en) | Map method for scheduling task based on cache perception and data locality | |
CN102156659A (en) | Scheduling method and system for job task of file | |
Mansouri et al. | Job scheduling and dynamic data replication in data grid environment | |
Zeng et al. | Cost minimization for big data processing in geo-distributed data centers | |
CN107070965B (en) | Multi-workflow resource supply method under virtualized container resource | |
CN109582457A (en) | Network-on-chip heterogeneous multi-core system task schedule and mapping | |
CN103984737A (en) | Optimization method for data layout of multi-data centres based on calculating relevancy | |
Shu-Jun et al. | Optimization and research of hadoop platform based on fifo scheduler | |
CN103176850A (en) | Electric system network cluster task allocation method based on load balancing | |
CN107341193B (en) | Method for inquiring mobile object in road network | |
Abdi et al. | The Impact of Data Replication on Job Scheduling Performance in Hierarchical Data Grid | |
Kaur et al. | Improvement of Task Offloading for Latency Sensitive Tasks in Fog Environment | |
CN112114951A (en) | Bottom-up distributed scheduling system and method | |
Bisht et al. | Survey on Load Balancing and Scheduling Algorithms in Cloud Integrated Fog Environment | |
CN108228323A (en) | Hadoop method for scheduling task and device based on data locality |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20111019 |