CN102223404A - Replica selection method based on access cost and transmission time - Google Patents

Info

Publication number
CN102223404A
Authority
CN
China
Prior art keywords
copy
data
memory node
matrix
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011101512234A
Other languages
Chinese (zh)
Inventor
刘伟
杜薇
石飞燕
位凯志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN2011101512234A priority Critical patent/CN102223404A/en
Publication of CN102223404A publication Critical patent/CN102223404A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a replica selection method based on access cost and transmission time. For any data-intensive task that needs to access a plurality of data replicas, the replica selection problem is first modeled as a weighted set covering problem (WSCP) and then converted into a matrix. A weighted greedy algorithm repeatedly selects the storage node with the minimum average replica access cost until all data replicas needed by the task are obtained, so that low-cost replicas are selected while the transmission time of the data replicas is reduced. The method is simple, executes efficiently, and is suitable for replica selection in a data-intensive computing environment.

Description

A replica selection method based on access cost and transmission time
Technical field
The present invention relates to replica selection methods in data-intensive computing, and in particular to a replica selection method based on access cost and transmission time.
Background
From search engines and video sharing to e-commerce, Internet services are increasingly centered on processing massive amounts of data, and their quality of service depends to a large extent on their data-processing capability. The management of this data, namely its location, operation, storage, movement, sharing, and description, has become a performance bottleneck as computing capability evolves. Against this background, data-intensive computing (DIC) has emerged as a supporting technology for new kinds of services and has attracted wide attention from both industry and academia.
Replica technology is a data replication technique that is widely adopted in data-intensive computing as an effective way to improve quality of service. In a data-intensive environment, distributed storage and data redundancy are used to store multiple replicas of the same data on different physical storage nodes. This not only improves the reliability and availability of the data but also reduces network access latency and improves network load balancing. The scheduling and execution of a task depend to a large extent on the storage nodes that hold the replicas the task needs, and optimizing the choice of these storage nodes can improve application execution efficiency while meeting the user's quality-of-service requirements. Therefore, when a user task is executed, selecting the best storage nodes holding the required data replicas is crucial.
At present, most research on replica selection methods, in China and abroad, has been carried out in data grid environments:
Srikumar Venugopal and Rajkumar Buyya of the University of Melbourne, Australia, have long worked on data-intensive applications in grid environments. In "An SCP-based heuristic approach for scheduling distributed data-intensive applications on global grids" (Journal of Parallel and Distributed Computing, vol. 69, no. 4, pp. 471-487) they propose a replica selection method that uses a tree-search algorithm based on the set covering problem (SCP). Its starting point is to select the minimum number of physical storage nodes that cover all replicas needed by the task, in order to reduce the time spent moving replicas.
Sun Min et al. of Tianjin University, in "Ant algorithm for file replica selection in data grid" (First International Conference on Semantics, Knowledge and Grid Materials Research, SKG 2005, paper number 4125852, pp. 64-66), propose an ant colony algorithm for the data replica selection problem in data-intensive computing, in order to reduce data access latency, bandwidth consumption, and the load on distributed storage.
Jin Hai et al. of Huazhong University of Science and Technology, in "Using classification techniques to improve replica selection in data grid" (On The Move to Meaningful Internet Systems, vol. 4276, pp. 1376-1387), propose a replica selection strategy based on classification. It uses historical data-transfer information to predict the physical location of the best replica and applies the K-nearest neighbor (KNN) algorithm to select the optimal data replica.
Cloud computing is an emerging data-intensive computing model that can process large-scale data and has great commercial value, and this Internet-based service model has attracted wide attention. Major software vendors worldwide, including information giants such as Google, Amazon, IBM, and Microsoft, are actively pursuing cloud computing research and applications and have each proposed schemes and implementations for cloud applications. Replica selection in cloud environments has likewise become a topic of interest to researchers in China and abroad:
Li Jing et al. of Chongqing University, in "A replica selection decision in cloud computing environment" (Advanced Materials Research, vol. 121-122, pp. 801-806), propose a replica selection algorithm based on the GM(1,1) grey dynamic model. It uses grey system theory to predict data response time and a Markov chain to predict the reliability of data replicas, and it can improve load balance among storage nodes in a cloud computing environment.
In summary, most of the previously proposed replica selection methods are improvements and optimizations under particular conditions. Their shortcoming is that none of them considers the value of the data itself, that is, the access cost of a replica, and none pays sufficient attention to the time required to transmit replicas.
Summary of the invention
To solve the problems that remain in current replica selection and to address the shortcomings of the previously proposed methods, the object of the invention is to provide a replica selection method based on access cost and transmission time. The method mainly addresses replica selection in data-intensive computing: while selecting low-cost replicas, it reduces replica transmission time and thereby improves the execution efficiency of data-intensive applications.
The concrete steps of the invention are as follows:
Step 1: The set of data replicas that the task needs to access and the set of all storage nodes in the data-intensive environment that hold data replicas are taken as the initial input of the replica selection process.
Step 2: A matrix is created for the task. The storage nodes that hold any replica required by the task form the rows, the file replicas required by the task form the columns, and each nonzero entry represents the cost of accessing that replica from that storage node. The related variables are initialized.
Step 3: The rows of the matrix are sorted in ascending order of the average replica access cost of the storage nodes.
Step 4: The first row of the sorted matrix, that is, the storage node that currently has the minimum average replica access cost, is selected and added to the set of storage nodes holding the best replicas chosen for the task. The row corresponding to this storage node and the columns corresponding to the files it covers are deleted, and the matrix is updated.
Step 5: It is determined whether all replicas are covered. If some replica is not yet covered, the process returns to Step 3 and continues; otherwise the replica selection process ends, yielding the set of storage nodes that hold the optimal replicas needed by the task.
Features of the invention
The invention adopts a weighted greedy algorithm that runs in polynomial time with an approximation factor of (ln|X|+1). At each step it selects the storage node with the minimum average replica access cost, so that low-cost data replicas are selected while replica movement is reduced. The method is simple, executes efficiently, and is suitable for replica selection in a data-intensive computing environment.
Description of the drawings
Fig. 1 is a flow chart of the invention.
Fig. 2 is a diagram of the replica selection model of the invention, based on the weighted set covering problem (WSCP).
Embodiment
The invention is described in further detail below with reference to the drawings and an embodiment.
In the following description of the embodiment, a data-intensive task J needs to access k (k > 0) data replicas distributed over m (m > 0) storage nodes, and the access cost of replicas of the same file differs from one storage node to another. As shown in Fig. 1, the concrete steps are as follows:
Step 1: The set of k (k > 0) data replicas that task J needs to access and the set D of all storage nodes in the data-intensive environment that hold data replicas are taken as the initial input of the replica selection process.
Step 2: A matrix is created for the task. The storage nodes that hold any replica required by the task form the rows, the file replicas required by the task form the columns, and each nonzero entry represents the cost of accessing that replica from that storage node. The related variables are initialized.
As shown in Fig. 2, the set of replicas needed by J is denoted F, with $F = \{f_1, f_2, \dots, f_k\}$; the set of storage nodes holding these replicas is denoted D, with $D = \{d_1, d_2, \dots, d_m\}$; and $w_{mk}$ denotes the cost of accessing replica $f_k$ from storage node $d_m$. The model can further be converted into a matrix $A = [a_{ij}]$, where i denotes the i-th storage node ($1 \le i \le m$) and j denotes the j-th replica ($1 \le j \le k$). If a VM can access data replica $f_j$ from storage node $d_i$ at cost $w_{ij}$, then $a_{ij} = w_{ij}$ ($w_{ij} > 0$); otherwise, if $d_i$ does not hold $f_j$, then $a_{ij} = 0$. If a row of the matrix contains a cost $w_{ij}$ in some column, the row is said to "cover" that column with cost $w_{ij}$. The replica selection problem can then be converted into finding an optimal set of matrix rows that covers all columns with minimum average weight, which reduces to the weighted set covering problem (WSCP). In the following description, the set C stores the storage nodes selected so far that hold the best replicas, and the set E stores the replicas covered so far by the selected storage nodes. Both are initialized to the empty set: $C \leftarrow \emptyset$, $E \leftarrow \emptyset$.
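The matrix construction described in Step 2 can be sketched as follows. This is only an illustration under stated assumptions: the function name build_cost_matrix, the dictionary layout of holdings, and the sample costs are hypothetical and do not come from the patent text.

```python
def build_cost_matrix(required, holdings):
    """Build the matrix A = [a_ij] of Step 2.

    required -- list of replica names the task needs (the set F)
    holdings -- hypothetical input format: {node: {replica: access cost > 0}}
    Returns (nodes, A), where A[i][j] is the cost of reading required[j]
    from nodes[i], or 0 when that node does not hold the replica.
    """
    # Only nodes holding at least one required replica become matrix rows.
    nodes = [d for d, costs in holdings.items()
             if any(f in costs for f in required)]
    A = [[holdings[d].get(f, 0) for f in required] for d in nodes]
    return nodes, A


# Illustrative data (assumed, not from the patent).
required = ["f1", "f2", "f3"]
holdings = {
    "d1": {"f1": 4, "f2": 2},
    "d2": {"f2": 3, "f3": 6},
    "d3": {"f1": 1, "f3": 3},
    "d4": {"f9": 7},  # holds nothing the task needs, so it gets no row
}
nodes, A = build_cost_matrix(required, holdings)
for name, row in zip(nodes, A):
    print(name, row)
```

In this sketch a zero entry plays the role of "node does not hold the replica", which matches the convention $a_{ij} = 0$ used in the description.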
Step 3: The rows of the matrix are sorted in ascending order of the average replica access cost of the storage nodes.
The average replica access cost of each storage node in the matrix is calculated as

$$\frac{W_{d_i}}{|d_i \cap F|}$$

that is, the ratio of the total cost of the replicas required by task J that the storage node covers to the number of replicas covered. Here $d_i$ denotes the i-th storage node and $W_{d_i}$ denotes the total access cost of the replicas covered by $d_i$, computed as

$$W_{d_i} = \sum_{1 \le j \le k} w_{ij}$$

where the sum effectively runs over the replicas that $d_i$ holds, since $a_{ij} = 0$ otherwise, and $|d_i \cap F|$ denotes the number of replicas required by task J that storage node $d_i$ covers. The average-cost formula shows that the more required replicas $d_i$ covers and the smaller their total access cost, the smaller the average access cost; that is, the replica access cost is lower and the covered replicas are more concentrated.
To select replicas that are low in cost and relatively concentrated, the matrix rows are sorted in ascending order of the average replica access cost of each storage node, so that each selection starts from the storage node with the minimum average access cost.
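As a small worked illustration of the average-cost ratio and the ascending sort, the following sketch computes $W_{d_i} / |d_i \cap F|$ for each row of a list-of-lists matrix. The function names and the sample matrix are assumptions made for illustration; treating a row that covers nothing as having infinite cost is also an assumption, since the description does not say how such rows are handled.

```python
def average_access_cost(row):
    """Total cost of the covered replicas divided by how many are covered."""
    covered = [w for w in row if w > 0]   # zero means "replica not held"
    if not covered:
        return float("inf")               # assumption: never pick an empty row
    return sum(covered) / len(covered)


def sort_rows_by_average_cost(nodes, A):
    """Return nodes and rows reordered by ascending average access cost."""
    order = sorted(range(len(nodes)), key=lambda i: average_access_cost(A[i]))
    return [nodes[i] for i in order], [A[i] for i in order]


# Illustrative matrix (assumed): rows d1..d3, columns f1..f3.
nodes = ["d1", "d2", "d3"]
A = [[4, 2, 0],
     [0, 3, 6],
     [1, 0, 3]]
sorted_nodes, sorted_A = sort_rows_by_average_cost(nodes, A)
print(sorted_nodes)  # averages are d1: 3.0, d2: 4.5, d3: 2.0
```

Python's sorted is stable, so nodes with equal averages keep their input order; the description does not specify a tie-breaking rule.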
Step 4: The first row of the sorted matrix, that is, the storage node that currently has the minimum average replica access cost, is selected and added to the set of storage nodes holding the best replicas chosen for the task. The row corresponding to this storage node and the columns corresponding to the files it covers are deleted, and the matrix is then updated.
In Step 3 the matrix was sorted in ascending order of the average access cost with which each storage node covers the replicas needed by task J. Therefore, the storage node d corresponding to the first row of the matrix is selected, since it currently has the minimum average replica access cost, and it is added to the set C of storage nodes holding the best replicas selected for task J, i.e. $C \leftarrow C \cup \{d\}$. At the same time, the replicas corresponding to the matrix columns covered by d are added to E, i.e. $E \leftarrow E \cup d$, where d is read as the set of replicas it covers. The matrix A is then updated by deleting the first row, which corresponds to storage node d, and the columns corresponding to the replicas that d covers.
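The select-and-update operation of Step 4 can be sketched on a list-of-lists matrix as follows. The function name, the return tuple, and the sample data are hypothetical; the sketch assumes the rows have already been sorted in ascending order of average access cost.

```python
def select_first_row_and_update(nodes, columns, A):
    """Pick the first (cheapest-on-average) row, then shrink the matrix.

    nodes   -- row labels, already sorted by average access cost
    columns -- column labels (replicas still uncovered)
    A       -- cost matrix, 0 meaning "replica not held"
    Returns (chosen node, covered replicas, new nodes, new columns, new A).
    """
    chosen = nodes[0]
    covered_idx = {j for j, w in enumerate(A[0]) if w > 0}
    covered = [columns[j] for j in sorted(covered_idx)]
    keep = [j for j in range(len(columns)) if j not in covered_idx]
    new_nodes = nodes[1:]                               # delete the chosen row
    new_columns = [columns[j] for j in keep]            # delete covered columns
    new_A = [[row[j] for j in keep] for row in A[1:]]
    return chosen, covered, new_nodes, new_columns, new_A


# Illustrative sorted matrix (assumed data): d3 has the lowest average cost.
nodes = ["d3", "d1", "d2"]
columns = ["f1", "f2", "f3"]
A = [[1, 0, 3],
     [4, 2, 0],
     [0, 3, 6]]
chosen, covered, new_nodes, new_columns, new_A = select_first_row_and_update(
    nodes, columns, A)
print(chosen, covered, new_nodes, new_columns, new_A)
```

After the update, only the column for f2 remains, so the next round compares d1 and d2 on that single replica.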
Step 5: It is determined whether all replicas are covered. If some replica is not yet covered, the process returns to Step 3 and continues; otherwise the replica selection process ends, yielding the set of storage nodes that hold the optimal replicas needed by the task.
The set of covered replicas is compared with the set F of all replicas required by task J. If E ≠ F, some replica is still uncovered, so the process returns to Step 3, re-sorts the updated matrix, and continues selecting storage nodes. Otherwise all replicas are covered and the replica selection process ends, yielding the set C of storage nodes that hold the optimal replicas needed by the task.
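Putting the five steps together, the whole selection loop might look like the sketch below. It is an illustration under assumptions, not the patented implementation: the name select_replicas, the holdings dictionary format, the returned plan mapping, and the sample costs are all hypothetical. Instead of fully sorting the matrix in each round it takes the minimum directly, which picks the same first row; ties go to the node listed first, a rule the description does not specify.

```python
def select_replicas(required, holdings):
    """Greedy replica selection by minimum average access cost.

    required -- replicas the task needs (the set F)
    holdings -- hypothetical format: {node: {replica: access cost > 0}}
    Returns (C, plan): C is the ordered list of selected storage nodes and
    plan maps each replica to the (node, cost) it is read from.
    """
    remaining = list(required)            # uncovered columns
    candidates = dict(holdings)           # unselected rows
    C, E, plan = [], set(), {}
    while remaining:                      # Step 5: loop until E == F
        best, best_avg = None, None
        for d, costs in candidates.items():
            covered = [f for f in remaining if f in costs]
            if not covered:
                continue
            avg = sum(costs[f] for f in covered) / len(covered)  # Step 3
            if best is None or avg < best_avg:
                best, best_avg = d, avg
        if best is None:
            raise ValueError(f"no storage node holds: {remaining}")
        costs = candidates.pop(best)      # Step 4: delete the chosen row
        C.append(best)
        for f in remaining:
            if f in costs:
                plan[f] = (best, costs[f])
                E.add(f)
        remaining = [f for f in remaining if f not in E]  # delete columns
    return C, plan


# Illustrative data (assumed, not from the patent).
required = ["f1", "f2", "f3"]
holdings = {
    "d1": {"f1": 4, "f2": 2},
    "d2": {"f2": 3, "f3": 6},
    "d3": {"f1": 1, "f3": 3},
}
C, plan = select_replicas(required, holdings)
total_cost = sum(cost for _, cost in plan.values())
print(C, plan, total_cost)
```

With this sample data the first round selects d3 (average 2.0, covering f1 and f3), and the second round selects d1 for f2 (cost 2 versus 3 on d2), so two storage nodes serve all three replicas.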

Claims (1)

1. A replica selection method based on access cost and transmission time, characterized by comprising:
Step 1: The set of data replicas that the task needs to access and the set of all storage nodes in the data-intensive environment that hold data replicas are taken as the initial input of the replica selection process.
Step 2: A matrix is created for the task. The storage nodes that hold any replica required by the task form the rows, the file replicas required by the task form the columns, and each nonzero entry represents the cost of accessing that replica from that storage node. The related variables are initialized.
Step 3: The rows of the matrix are sorted in ascending order of the average replica access cost of the storage nodes.
Step 4: The first row of the sorted matrix, that is, the storage node that currently has the minimum average replica access cost, is selected and added to the set of storage nodes holding the best replicas chosen for the task. The row corresponding to this storage node and the columns corresponding to the files it covers are deleted, and the matrix is updated.
Step 5: It is determined whether all replicas are covered. If some replica is not yet covered, the process returns to Step 3 and continues; otherwise the replica selection process ends, yielding the set of storage nodes that hold the optimal replicas needed by the task.
CN2011101512234A 2011-06-07 2011-06-07 Replica selection method based on access cost and transmission time Pending CN102223404A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011101512234A CN102223404A (en) 2011-06-07 2011-06-07 Replica selection method based on access cost and transmission time

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011101512234A CN102223404A (en) 2011-06-07 2011-06-07 Replica selection method based on access cost and transmission time

Publications (1)

Publication Number Publication Date
CN102223404A true CN102223404A (en) 2011-10-19

Family

ID=44779830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011101512234A Pending CN102223404A (en) 2011-06-07 2011-06-07 Replica selection method based on access cost and transmission time

Country Status (1)

Country Link
CN (1) CN102223404A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751234A (en) * 2010-01-21 2010-06-23 浪潮(北京)电子信息产业有限公司 Method and system for distributing disk array data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEI LIU ET. AL: "A Cost-Aware Resource Selection for Data-intensive Applications in Cloud-oriented Data Centers", 《I.J.INFORMATION TECHNOLOGY AND COMPUTER SCIENCE》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793425A (en) * 2012-10-31 2014-05-14 国际商业机器公司 Data processing method and data processing device for distributed system
CN104915205A (en) * 2015-06-08 2015-09-16 北京航空航天大学 Request multi-replica task execution method applicable to online data intensive applications
CN108255593A (en) * 2017-12-20 2018-07-06 东软集团股份有限公司 Task coordination method, device, medium and electronic equipment based on shared resource
CN108255593B (en) * 2017-12-20 2020-11-03 东软集团股份有限公司 Task coordination method, device, medium and electronic equipment based on shared resources
CN109902797A (en) * 2019-04-22 2019-06-18 桂林电子科技大学 A kind of cloud Replica placement scheme based on ant group algorithm
CN114356236A (en) * 2021-12-31 2022-04-15 杭州趣链科技有限公司 Block chain data storage and reading method and block chain data access system

Similar Documents

Publication Publication Date Title
Wang et al. Load balancing task scheduling based on genetic algorithm in cloud computing
CN102467570B (en) Connection query system and method for distributed data warehouse
CN102223404A (en) Replica selection method based on access cost and transmission time
CN100576179C (en) A kind of based on energy-optimised gridding scheduling method
CN105656999B (en) A kind of cooperation task immigration method of energy optimization in mobile cloud computing environment
CN106250240A (en) A kind of optimizing and scheduling task method
Ahmad et al. Optimization of data-intensive workflows in stream-based data processing models
CN106471501A (en) The method of data query, the storage method data system of data object
CN103294912B (en) A kind of facing mobile apparatus is based on the cache optimization method of prediction
CN114327811A (en) Task scheduling method, device and equipment and readable storage medium
CN108304253A (en) Map method for scheduling task based on cache perception and data locality
CN102156659A (en) Scheduling method and system for job task of file
Mansouri et al. Job scheduling and dynamic data replication in data grid environment
Zeng et al. Cost minimization for big data processing in geo-distributed data centers
CN107070965B (en) Multi-workflow resource supply method under virtualized container resource
CN109582457A (en) Network-on-chip heterogeneous multi-core system task schedule and mapping
CN103984737A (en) Optimization method for data layout of multi-data centres based on calculating relevancy
Shu-Jun et al. Optimization and research of hadoop platform based on fifo scheduler
CN103176850A (en) Electric system network cluster task allocation method based on load balancing
CN107341193B (en) Method for inquiring mobile object in road network
Abdi et al. The Impact of Data Replicatino on Job Scheduling Performance in Hierarchical data Grid
Kaur et al. Improvement of Task Offloading for Latency Sensitive Tasks in Fog Environment
CN112114951A (en) Bottom-up distributed scheduling system and method
Bisht et al. Survey on Load Balancing and Scheduling Algorithms in Cloud Integrated Fog Environment
CN108228323A (en) Hadoop method for scheduling task and device based on data locality

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20111019
