CN1658560A - Quickly data copy method based on prediction - Google Patents

Publication number
CN1658560A
Authority
CN
China
Prior art keywords
data
copy
prediction
data access
visit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200510031286
Other languages
Chinese (zh)
Inventor
王意洁
李思昆
秦永进
周婧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Application filed by National University of Defense Technology
Priority to CN 200510031286
Publication of CN1658560A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A prediction-based fast data replication method that addresses the difficulty existing replication methods have in improving the efficiency of replicating massive data objects. The method combines data access prediction with parallel replication: data access prediction decides when replicas are added and deleted, and a parallel replication strategy is used during replica copying, so that data access efficiency is improved from two aspects, replica placement and replica copying. The scheme has two steps. First, the historical record of data accesses is used to predict the data accesses within a future time interval. Second, according to the current network state, the N replicas with the lowest access cost are selected and the amount of data to fetch from each is determined; different parts of the data object are then copied from each replica and merged into a complete replica of the data object. The invention increases the local hit rate of data access and improves both the data replication efficiency and the data access efficiency of applications.

Description

A fast data replication method based on prediction
Technical field: The present invention relates to data replication methods in distributed systems based on wide area networks, and in particular to data replication methods in wide-area distributed virtual environment systems that place relatively high demands on data access efficiency.
Background: Data replication is one of the key problems a distributed system must solve, and the quality of the data replication method directly affects the performance of the distributed system. At present, the replication methods used in distributed systems are all based on data objects; that is, when data is replicated, the single replica with the lowest access cost is selected from the available replicas according to the current network state, and the data is copied from it. Distributed virtual environment systems based on wide area networks involve many large, massive data objects. Data-object-based replication methods mainly trade off network bandwidth against storage space; they can hardly make full use of the network bandwidth to improve replication efficiency, which in turn limits improvements in data access efficiency. How to replicate massive data objects quickly is therefore a pressing problem for distributed virtual environment researchers.
Summary of the invention: The technical problem to be solved is that existing data-object-based replication methods can hardly improve the replication efficiency of massive data objects. The present invention proposes a prediction-based fast data replication method that combines data access prediction with parallel replication. Data access prediction decides when replicas are added and deleted, and a parallel replication strategy is used during replica copying. Data access efficiency is thus improved from two aspects, replica placement and replica copying, to meet the data access efficiency requirements of distributed virtual environment systems.
Technical scheme: The present invention is divided into two steps, data access prediction and parallel data replication. First, the historical record of data accesses is used to predict the data accesses within a future time interval. Then, according to the current network state, the N replicas with the lowest access cost are selected and the amount of data to fetch from each replica is determined; different parts of the data object are copied from each replica and finally merged into a complete replica of the data object.
The data access prediction method is as follows:
Data access prediction uses the historical record of data accesses to predict the data accesses within a future time interval. In a distributed system, the access patterns of different types of data exhibit different kinds of locality:
● Spatial locality: data within a certain range around the accessed data is likely to be accessed within a future time interval;
● Temporal locality: recently accessed data is likely to be accessed again;
● No locality: data access is highly random.
In a distributed virtual environment system, access requests for spatial data such as terrain, sea charts, and geographic three-dimensional models all show pronounced spatial locality. These data objects are large, their data formats are complex and varied, and there is as yet no unified, effective way to manage them. Accessing them in a distributed environment therefore incurs large delays that directly affect the performance of the distributed virtual environment system. The present invention optimizes access to spatial data with a prediction method based on a spatial motion physical model.
The spatial motion physical model applies the principles of kinematics to build a prediction model suited to data access in a distributed virtual environment. The basic elements of spatial motion are time and position. Kinematics gives position as a function of time, from which the position of the motion within a future time interval can be predicted; the data object containing that position can then be determined from the position.
The spatial motion physical model is built on the equation of motion of an object in physics. Its basic parameters are time t, displacement s, and velocity v. The basic equation is:
$$\vec{s} = \vec{s}_0 + \int_{t_0}^{t_n} \vec{v}\,dt$$
From the sequence of access requests for spatial data, extract the sequence of spatial coordinates of the requested data, $A_0, A_1, \ldots, A_n$, and record the sequence of times at which the requests occurred, $t_0, t_1, \ldots, t_n$. In the equation, $\vec{s}_0$ is the initial coordinate $A_0$ of the request sequence. Applying Lagrange interpolation to the sequences A and T gives:
● the spatial coordinate as a function of time: $$A = \sum_{k=0}^{n} \Bigl( \prod_{i=0,\, i \neq k}^{n} \frac{t - t_i}{t_k - t_i} \Bigr) A_k$$
● the time as a function of the coordinate: $$t = \sum_{k=0}^{n} \Bigl( \prod_{i=0,\, i \neq k}^{n} \frac{x - x_i}{x_k - x_i} \Bigr) t_k$$
These two functions express the spatial motion trend of the access requests and are another, approximate form of the basic equation. Because spatial coordinates may be two- or three-dimensional, the interpolation function above can be applied to each coordinate dimension separately. This yields the following two forms.
If the geographic coordinate is A, then:
1. A = f(t), the geographic coordinate as a function of time;
2. t = f'(A), the time as a function of the geographic coordinate, where f' denotes this second interpolating function rather than a derivative.
Suppose the current time is $t_n$. The spatial coordinate at time $t_{n+1}$ can be obtained from this function, which gives a prediction of the access requests in the coming period.
Repeated simulation tests show that fourth-order interpolation already gives good prediction results. A higher-order interpolation formula improves the result somewhat but costs more computation. To keep the computation simple and fast, the present invention recommends fourth-order interpolating functions for the prediction.
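The interpolation-based prediction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names `lagrange` and `predict_position`, the sample request history, and the reading of fourth-order interpolation as a degree-4 polynomial through the five most recent samples are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def lagrange(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the Lagrange polynomial through the points (xs[k], ys[k]) at x."""
    total = 0.0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        weight = 1.0
        for i, xi in enumerate(xs):
            if i != k:
                weight *= (x - xi) / (xk - xi)
        total += weight * yk
    return total


def predict_position(times: List[float],
                     coords: List[Tuple[float, ...]],
                     t_next: float,
                     order: int = 4) -> Tuple[float, ...]:
    """Predict the coordinate at t_next, one dimension at a time.

    Assumption: an interpolation of the given order uses the last order + 1 samples.
    """
    recent_t = times[-(order + 1):]
    recent_a = coords[-(order + 1):]
    dims = len(recent_a[0])
    return tuple(
        lagrange(recent_t, [a[d] for a in recent_a], t_next) for d in range(dims)
    )


# Hypothetical request history: one request per second along the curve y = x^2.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
coords = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0), (3.0, 9.0), (4.0, 16.0)]
predicted = predict_position(times, coords, 5.0)
```

Because the sample path is itself a polynomial of degree below four, the interpolant reproduces it and the predicted coordinate at t = 5 lies on the same curve. Real request histories are noisier, so extrapolation error grows with the prediction horizon.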
In a real system, the parameters for building the spatial motion physical model are obtained, and the prediction time is selected, as follows:
Obtaining parameters. In a distributed virtual environment, every access request for spatial data contains the geographic location of the requested data object. By analyzing the data access requests, information such as the time and geographic location of each request can be recorded. Based on this history, the spatial motion physical model can be used to predict data accesses.
Selecting the prediction time means deciding when it is appropriate to predict. The prediction time is determined jointly by three factors: the magnitude and direction of the velocity represented by the spatial motion model, the size of the spatial data block, and the available network bandwidth. Only by taking all three into account can the data be transferred to the local node within the permitted time using idle network bandwidth, without affecting the system's normal data access, so that the data access delay is hidden. The velocity represented by the spatial motion model reflects the trend of the user's data access requests, that is, which block of spatial data the user is likely to access at some future time.
Suppose the velocity represented by the spatial motion model is V and the current position of the motion is A(x, y); a spatial data block has size M and covers the geographic range $P(x_0, y_0, x_1, y_1)$; the available network bandwidth is B; and the current time is $T_0$. Then the latest time at which the prediction can be made is
$$T = \min\bigl(f'(x_0),\, f'(x_1),\, f'(y_0),\, f'(y_1)\bigr) - \frac{M}{B}$$
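The deadline formula can be illustrated with a short sketch. It is an example under assumptions rather than a definitive implementation: the name `latest_prediction_time` is hypothetical, and the time-of-arrival functions passed in stand for f' and are modelled here as constant-velocity motion from the origin at time 0, which the source does not require.

```python
from typing import Callable, Tuple

Block = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def latest_prediction_time(block: Block,
                           time_at_x: Callable[[float], float],
                           time_at_y: Callable[[float], float],
                           size: float,
                           bandwidth: float) -> float:
    """Return T = min(f'(x0), f'(x1), f'(y0), f'(y1)) - M / B."""
    x0, y0, x1, y1 = block
    earliest_arrival = min(time_at_x(x0), time_at_x(x1),
                           time_at_y(y0), time_at_y(y1))
    transfer_time = size / bandwidth  # time needed to move the block over the network
    return earliest_arrival - transfer_time


# Assumed motion: constant velocity (10, 5) units/s starting at (0, 0) at time 0.
vx, vy = 10.0, 5.0
deadline = latest_prediction_time(
    block=(100.0, 50.0, 200.0, 100.0),
    time_at_x=lambda x: x / vx,
    time_at_y=lambda y: y / vy,
    size=200.0,       # block size M, here in MB
    bandwidth=50.0,   # available bandwidth B, here in MB/s
)
```

With these numbers the motion reaches the nearest block boundary after 10 s and the transfer takes 4 s, so the prediction and transfer must start no later than 6 s.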
After predictive access has run for some time, new data may not find enough room because local storage space is limited. A block of data in local storage must then be replaced with the new data. The present invention proposes a replacement policy matched to the spatial motion model, namely a replacement policy based on spatial distance, which likewise targets data carrying geographic location information.
Suppose:
1) the current time is $T_0$;
2) a spatial data block covers the geographic range $P(x_0, y_0, x_1, y_1)$ (usually the coordinates of the lower-left and upper-right corners of a geographic rectangle), so that the ranges of the local data set are $\{P_0, P_1, P_2, \ldots\}$;
3) the spatial motion model gives the current motion coordinate $A = f(T_0)$.
Let $S_i$ denote the distance between A and the center of $P_i$, which gives the set $\Omega = \{S_0, S_1, S_2, \ldots\}$.
The data block to replace is then the block $P_i$ for which $S_i = \max(\Omega)$.
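The distance-based replacement rule can be sketched as follows. The helper names `block_center` and `choose_victim` and the sample blocks are hypothetical, and Euclidean distance is assumed because the source does not name a distance metric.

```python
import math
from typing import List, Tuple

Block = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def block_center(block: Block) -> Tuple[float, float]:
    """Center point of the rectangle covered by a spatial data block."""
    x0, y0, x1, y1 = block
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def choose_victim(position: Tuple[float, float], blocks: List[Block]) -> int:
    """Index of the local block whose center is farthest from the current position."""
    distances = [math.dist(position, block_center(b)) for b in blocks]
    return max(range(len(blocks)), key=distances.__getitem__)


# Hypothetical local replica space holding three blocks.
local_blocks = [(0.0, 0.0, 10.0, 10.0),
                (10.0, 0.0, 20.0, 10.0),
                (40.0, 40.0, 50.0, 50.0)]
victim = choose_victim((0.0, 0.0), local_blocks)
```

Here the third block is the farthest from the current position and is the one selected for replacement.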
The parallel data replication method is as follows:
Parallel replication copies simultaneously from multiple replicas located on different nodes, using the redundant communication links of the underlying network to speed up data replication. To make this speed-up optimal, the present invention uses a replica selection strategy to choose the replicas and an access amount allocation strategy to distribute the amount of data fetched from each.
A data object often has many replicas in the network. Accessing all of them at once not only consumes a large amount of network resources but may also fail to give the best replication efficiency. The replica selection strategy addresses two questions: how to pick several replicas from the many available, and how many replicas to pick. In practice, more replicas are not always better. When a data object has many replicas and all of them are copied from at once, the amount of data assigned to each replica may be very small, even negligible compared with the time needed to set up the network connections; the parallel replication time is then simply the longest connection set-up delay between the replicating node and any of the replicas. In that case, accessing more replicas does not improve replication efficiency and may even lengthen the replication time. The replica selection strategy picks the replicas with the best access efficiency from all replicas. Simulation tests show that the best number of replicas for parallel replication is related to the average node degree of the underlying network: parallel replication is most efficient when the number of selected replicas equals the average node degree of the underlying network.
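The selection rule can be sketched in a few lines. This is an illustration only: the function name `select_replicas`, the node labels, and the cost values are hypothetical, and rounding the average node degree to the nearest integer is an assumption the source does not state.

```python
from typing import Dict, List


def select_replicas(access_cost: Dict[str, float], average_degree: float) -> List[str]:
    """Pick the lowest-cost replicas; their number follows the average node degree."""
    count = max(1, round(average_degree))  # assumed rounding rule
    ranked = sorted(access_cost, key=access_cost.get)
    return ranked[:count]


# Hypothetical access costs measured under the current network state.
costs = {"n1": 0.8, "n2": 0.2, "n4": 0.5, "n5": 1.4, "n6": 0.3}
chosen = select_replicas(costs, average_degree=3)
```

With an average node degree of 3, the three cheapest replicas are chosen and the remaining ones are left untouched, which avoids paying connection set-up delays for very small transfers.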
Access amount allocation addresses how to copy from multiple replicas so that replication finishes in the shortest time. The basic principle is to fetch more data from replicas with low access cost and less data from replicas with high access cost. Replication efficiency is best when all replicas finish their transfers at the same time.
In the traditional replication method, the replicating node selects the single replica with the lowest access cost according to the current network state and copies the data from it. Suppose the network bandwidth between the selected replica's node and the replicating node is B and the data size is M; the replication time is then M/B.
Parallel data replication selects the N replicas with the lowest access cost according to the current network state, copies a different part of the data object from each, and merges the parts into a complete data object. Suppose N replicas are available in the current system and the available network bandwidths between the nodes holding them and the replicating node are $V = \{v_1, v_2, \ldots, v_N\}$. The total bandwidth of all selected replicas is then $SUM = v_1 + v_2 + \cdots + v_N$, and the maximum network bandwidth is $MAX = \max(V)$.
Parallel data replication copies from the selected replicas simultaneously, taking a different part of the data object from each replica. The present invention calls the process of copying data from one replica a replication sub-process. Ideally, all replication sub-processes start and finish at the same time, which makes parallel replication most efficient. To achieve this, the present invention specifies that the amount of data fetched from each replica is proportional to the network bandwidth between that replica and the replicating node. The amounts of data fetched from the replicas are therefore
$$\Omega = \Bigl\{ \frac{M v_1}{SUM},\, \frac{M v_2}{SUM},\, \ldots,\, \frac{M v_N}{SUM} \Bigr\}$$
It follows that the data replication time is M/MAX under the traditional replication strategy and M/SUM under the parallel replication strategy.
In practice, because the network is dynamic, the replication sub-processes may not finish at the same time even if they start together, so the data replication time D should satisfy M/SUM ≤ D ≤ M/MAX.
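The bandwidth-proportional allocation and the resulting time bounds can be checked with a small sketch. The function names `allocate` and `time_bounds` and the sample figures are hypothetical; units are taken as MB and MB/s for the example.

```python
from typing import List, Tuple


def allocate(size: float, bandwidths: List[float]) -> List[float]:
    """Amount of data fetched from each replica: M * v_i / SUM."""
    total = sum(bandwidths)
    return [size * v / total for v in bandwidths]


def time_bounds(size: float, bandwidths: List[float]) -> Tuple[float, float]:
    """Ideal parallel time M / SUM and single-best-replica time M / MAX."""
    return size / sum(bandwidths), size / max(bandwidths)


size = 600.0                      # data object size M
bandwidths = [30.0, 10.0, 20.0]   # available bandwidth to each selected replica
shares = allocate(size, bandwidths)
lower, upper = time_bounds(size, bandwidths)
# Each sub-process needs share / bandwidth seconds; with this split they all match.
durations = [s / v for s, v in zip(shares, bandwidths)]
```

Every replication sub-process takes the same time under this split, which is why the ideal parallel replication time equals M/SUM. Any measured replication time D is expected to fall between `lower` and `upper`.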
The present invention is realized by designing a prediction-based fast data replication system. The system consists of a data access prediction module, a parallel replication module, a data replacement module, a data access recording module, a data access history module, and a local data space. The data access prediction module predicts the next data access from the historical record of data accesses, generates a data replication request, and submits it to the parallel replication module. On receiving a data replication request from the data access prediction module, the parallel replication module copies data from multiple replicas in the network simultaneously and, when replication completes, submits the data replica to the data replacement module. The data replacement module, following the replacement policy matched to the spatial motion model, uses the received data replica to replace the replica in the replica space that needs replacing. The data access recording module records the history of data accesses, including the start time and geographic location of each access; this history is the basis on which the data access prediction module makes its predictions.
The basic working process of the prediction-based fast data replication system is:
1. Replica creation, which covers two cases:
A) A data access request is issued locally and the requested data is found locally. The next data access is inferred according to the access prediction strategy. If the data predicted to be accessed next is not held locally, check whether the local network is idle. If it is idle, start the parallel data transfer; otherwise wait for the local network to become idle, for no longer than the time T given by the spatial motion prediction method. If the wait exceeds T, this replication is cancelled.
If the local replica space has enough room, a new replica is created directly; otherwise the data replacement policy decides which data replica to replace.
B) A data access request is issued locally and the requested data is not found locally; the parallel data transfer is then carried out.
If the local replica space has enough room, a new replica is created directly. Otherwise, the data about to be accessed is considered together with the data in the replica space, and the data replacement module decides which data replica to replace. If the data to be replaced is the data about to be accessed itself, no new replica is created; otherwise the fetched data replaces the data to be replaced in the replica space.
2. Replica transfer: when remote data is accessed and replicated, the parallel replication strategy is used to copy the remote data to the local node.
3. Replica replacement: the replacement policy decides which block of data in the local replica space is to be replaced.
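The two replica creation cases can be summarized as a small decision function. This is a simplified sketch under assumptions: the name `plan_actions`, the action labels, and the block identifiers are hypothetical, and storage handling (creating or replacing a replica once data arrives) is left out.

```python
from typing import List, Optional, Set, Tuple

Action = Tuple[str, str]  # (action label, block identifier)


def plan_actions(local_blocks: Set[str],
                 requested: str,
                 predicted: Optional[str],
                 network_idle: bool,
                 waited: float,
                 max_wait: float) -> List[Action]:
    """Decide what one local access request triggers."""
    if requested not in local_blocks:
        # Case B: local miss, fetch the requested data in parallel right away.
        return [("parallel_copy", requested)]
    # Case A: local hit, consider prefetching the predicted next block.
    if predicted is None or predicted in local_blocks:
        return []
    if network_idle:
        return [("parallel_copy", predicted)]
    if waited > max_wait:
        # The wait exceeded the deadline T, so this replication is cancelled.
        return [("cancel", predicted)]
    return [("wait", predicted)]


local = {"b1", "b2"}
on_miss = plan_actions(local, "b3", None, True, 0.0, 6.0)
on_idle = plan_actions(local, "b1", "b3", True, 0.0, 6.0)
on_busy = plan_actions(local, "b1", "b3", False, 2.0, 6.0)
on_late = plan_actions(local, "b1", "b3", False, 7.0, 6.0)
```

Keeping the decision separate from the transfer itself makes the timing rule easy to test: only a local hit with a non-local prediction can lead to waiting or cancellation.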
The present invention achieves the following technical effects:
Compared with traditional replication methods, the present invention, on the one hand, uses a prediction mechanism to predict future data accesses and uses idle network bandwidth to replicate data, which raises the local hit rate of data access and improves the data access efficiency of applications. On the other hand, through parallel replication it makes full use of the redundant paths of the network, which improves data replication efficiency and likewise improves the data access efficiency of applications.
Description of drawings:
Fig. 1 is the basic schematic diagram of the prediction-based fast data replication system that implements the present invention;
Fig. 2 is the flow chart of the present invention;
Fig. 3 is a schematic diagram of parallel data replication in the present invention;
Fig. 4 is a schematic diagram of the mutual interference when multiple nodes replicate in parallel in the present invention;
Fig. 5 is the network environment configuration used for the performance test of the present invention;
Fig. 6 shows the results of a comparison test between the present invention and a replication method based on an economic model;
Fig. 7 shows the relation between total task execution time and replica space size for the present invention;
Fig. 8 shows the relation between the total data transferred by network edge nodes and replica space size for the present invention.
Embodiment:
Fig. 1 is the basic structure diagram of the prediction-based fast data replication system that implements the present invention. The system consists of a data access prediction module, a parallel replication module, a data replacement module, a data access recording module, a data access history module, and a local data space.
● The data access prediction module performs data access prediction and is the key to whether the present invention is efficient: the more accurate the prediction, the higher the local hit rate of data access and the higher the data access efficiency. The module predicts the next data access from the historical record of data accesses, generates a data replication request, and submits it to the parallel replication module.
● On receiving a data replication request from the data access prediction module, the parallel replication module copies data from multiple replicas in the network simultaneously and, when replication completes, submits the data replica to the data replacement module.
● The data replacement module, following the replacement policy matched to the spatial motion model, uses the received data replica to replace the replica in the replica space that needs replacing.
● The data access recording module records the history of data accesses, including the start time and geographic location of each access. This history is the basis on which the data access prediction module makes its predictions.
The basic work of the prediction-based fast data replication system comprises three processes: replica creation, replica transfer, and replica replacement.
● The replica creation process covers two cases:
C) A data access request is issued locally and the requested data is found locally. The next data access is inferred according to the access prediction strategy. If the data predicted to be accessed next is not held locally, check whether the local network is idle. If it is idle, start the parallel data transfer; otherwise wait for the local network to become idle, for no longer than the time T given by the spatial motion prediction method. If the wait exceeds T, this replication is cancelled.
If the local replica space has enough room, a new replica is created directly; otherwise the data replacement policy decides which data replica to replace.
D) A data access request is issued locally and the requested data is not found locally; the parallel data transfer is then carried out.
If the local replica space has enough room, a new replica is created directly. Otherwise, the data about to be accessed is considered together with the data in the replica space, and the data replacement module decides which data replica to replace. If the data to be replaced is the data about to be accessed itself, no new replica is created; otherwise the fetched data replaces the data to be replaced in the replica space.
● The replica transfer process uses parallel replication to copy remote data to the local node when remote data is accessed and replicated.
● The replica replacement process decides, according to the replacement policy, which block of data in the local replica space is to be replaced.
Fig. 2 is the flow chart of parallel data replication in the present invention. When a node receives a data access request, it first looks up the requested data object locally. If the local access fails, it uses the parallel replication mechanism to create a replica of the data object locally. In addition, the node uses the prediction mechanism to predict future data accesses and, when the network is idle, replicates in parallel the data objects that will be accessed. If the node's free storage space is insufficient, data objects are replaced according to the data access frequency.
Through data access prediction and parallel data replication, the present invention raises the local data access hit rate, reduces remote data accesses, and improves data access efficiency.
Fig. 3 illustrates an embodiment of the present invention that copies data from three nodes simultaneously. Each node must support reading a specified part of a data object, and the amount of data copied from each node is proportional to the available bandwidth between that node and the replicating node under the current network state.
The figure shows server node 3 replicating data from server nodes 1, 2, and 4.
If the available network bandwidths between nodes 1 and 3, between nodes 2 and 3, and between nodes 4 and 3 are in the ratio 3:1:2 and the data object has size M, the data is divided as Ω = {M/2, M/6, M/3}.
Node 3 then copies the parts 0 to M/2, M/2 to 2M/3, and 2M/3 to M of the data object from nodes 1, 2, and 4 respectively.
Copying from three nodes simultaneously uses the redundant paths and bandwidth of the network and speeds up replication. In the ideal case, where the parallel replication time is M/SUM, it is 2 times as fast as copying from node 1 alone, 6 times as fast as copying from node 2 alone, and 3 times as fast as copying from node 4 alone.
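The split of the data object in the Fig. 3 example can be reproduced with exact fractions. This is a sketch for illustration: the function name `partition` is hypothetical, and the ranges are expressed as fractions of the object size M rather than byte offsets.

```python
from fractions import Fraction
from itertools import accumulate
from typing import List, Tuple


def partition(ratios: List[int]) -> List[Tuple[Fraction, Fraction]]:
    """Split [0, 1) into consecutive ranges proportional to the bandwidth ratios."""
    total = sum(ratios)
    ends = [Fraction(c, total) for c in accumulate(ratios)]
    starts = [Fraction(0)] + ends[:-1]
    return list(zip(starts, ends))


sources = [1, 2, 4]  # source nodes in Fig. 3
ranges = dict(zip(sources, partition([3, 1, 2])))  # bandwidth ratio 3:1:2
```

Multiplying each fraction by M gives the offsets that node 3 requests from each source node, matching the parts listed for nodes 1, 2, and 4.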
Fig. 4 illustrates that when two nodes each perform multi-replica parallel replication at the same time, they interfere with each other on the network; the more nodes on the network, the greater this interference.
In the figure, nodes 2 and 3 replicate in parallel at the same time: node 2 copies data from nodes 3, 4, and 5 simultaneously, and node 3 copies data from nodes 1, 2, and 4 simultaneously. Two opposing factors, one unfavorable and one favorable, affect the parallel replication processes on the network:
Unfavorable factor: conflicts appear at 6 places on the network. At these places the two parallel replication processes may interfere with each other and compete for network bandwidth, which increases the replication time.
Favorable factor: there are also 6 network segments on which data transfer is unaffected, and these transfers speed up the data transfer process.
The total replication time is therefore the combined result of these favorable and unfavorable factors. When the network is very large, with thousands of routing nodes and server nodes, the two effects are hard to separate clearly; statistical methods are needed for the analysis, and the effect is verified by simulation testing.
Fig. 5 is the network environment configuration used to test the performance of the present invention. It comprises 11 resource nodes and 7 routing nodes; the 11 resource nodes comprise 10 computing units and 11 storage units. Each data file is 200 MB, and the system holds 19.4 GB of data in total. Each processing unit takes 100 ms to process one data object, and the average node degree of the underlying network is 3. The task type configuration, the data volume accessed by each task type, and the generation probabilities are shown in the following table.
Task type    Data volume accessed (GB)    Generation probability
1            2.4                          17%
2            0.4                          17%
3            1                            17%
4            2.8                          16%
5            11.6                         17%
6            1.2                          16%
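The figures in the table can be checked against the stated test configuration with a short calculation. The expected data volume per task is a derived quantity that the source does not report, and 1 GB is taken as 1000 MB here, which is an assumption.

```latex
\begin{align*}
\text{Total volume over task types} &= 2.4 + 0.4 + 1 + 2.8 + 11.6 + 1.2 = 19.4\ \text{GB} \\
97\ \text{data objects} \times 200\ \text{MB} &= 19\,400\ \text{MB} = 19.4\ \text{GB} \\
\text{Sum of probabilities} &= 4 \times 17\% + 2 \times 16\% = 100\% \\
\text{Expected volume per task} &= 0.17\,(2.4 + 0.4 + 1 + 11.6) + 0.16\,(2.8 + 1.2) \\
&= 2.618 + 0.640 = 3.258\ \text{GB}
\end{align*}
```

The per-type volumes add up to the 19.4 GB held in the system, and the generation probabilities form a complete distribution.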
Fig. 6 shows the results of the comparison test between the present invention and the economic-model-based replication method, with 97 data objects, 3 replicas selected by the parallel replication strategy, and 100, 300, 500, 700, 1000, and 5000 tasks respectively. The basic idea of the economic-model-based replication method is to simulate the bidding and tendering process of economics in a P2P network so that data replicas are distributed sensibly across the network, thereby improving the system's data access efficiency. Fig. 6 shows that the present invention performs markedly better than the economic-model-based replication algorithm. The main reason is that the parallel replication strategy makes full use of the idle bandwidth and redundant paths of the network and greatly reduces the program's remote data access delay, so the total task completion time is much shorter and data access efficiency is improved.
Fig. 7 shows how the total task time changes when the total number of tasks is 500, the total data file size is 19.4 GB, and the replica space of each processing node is 1.94 GB, 3.88 GB, 5.82 GB, 7.76 GB, 9.7 GB, 11.64 GB, 13.58 GB, 15.52 GB, and 17.46 GB respectively. The figure shows no obvious change in total task time, mainly because the present invention raises the local hit rate of data access through predictive access; the present invention therefore places only modest demands on the replica space of each processing node.
Fig. 8 shows how the total data transferred by network edge nodes changes when the total number of tasks is 500, the total data file size is 19.4 GB, and the replica space of each processing node is 1.94 GB, 3.88 GB, 5.82 GB, 7.76 GB, 9.7 GB, 11.64 GB, 13.58 GB, 15.52 GB, and 17.46 GB respectively. The figure shows that as the local replica space grows, the total data transferred by network edge nodes falls markedly. Enlarging the local replica space reduces the probability that data access prediction uses the network to transfer data to the local node, which reduces the network load. Setting the replica space to a suitable size can therefore improve the network load considerably.

Claims (4)

  1. A fast data replication method based on prediction, characterized in that it adopts a strategy combining data access prediction with parallel replication: data access prediction decides when replicas are added and deleted, a parallel replication strategy is used during replica copying, and data access efficiency is improved from two aspects, replica placement and replica copying. The scheme is divided into two steps, data access prediction and parallel data replication: first, the historical record of data accesses is used to predict the data accesses within a future time interval; then, according to the current network state, the N replicas with the lowest access cost are selected and the amount of data to fetch from each replica is determined, different parts of the data object are copied from each replica, and the parts are finally merged into a complete replica of the data object.
  2. The fast data replication method based on prediction as claimed in claim 1, characterized in that the data access prediction method is:
    2.1 Access to spatial data is optimized with a prediction method based on a spatial motion physical model. The spatial motion physical model is built on the equation of motion of an object in physics; its basic parameters are time t, displacement s, and velocity v, and the basic equation is: $$\vec{s} = \vec{s}_0 + \int_{t_0}^{t_n} \vec{v}\,dt$$ From the sequence of access requests for spatial data, extract the sequence of spatial coordinates of the requested data, $A_0, A_1, \ldots, A_n$, and record the sequence of times at which the requests occurred, $t_0, t_1, \ldots, t_n$. In the equation, $\vec{s}_0$ is the initial coordinate $A_0$ of the request sequence. Applying Lagrange interpolation to the sequences A and T gives:
    ● the spatial coordinate as a function of time: $$A = \sum_{k=0}^{n} \Bigl( \prod_{i=0,\, i \neq k}^{n} \frac{t - t_i}{t_k - t_i} \Bigr) A_k$$
    ● the time as a function of the coordinate: $$t = \sum_{k=0}^{n} \Bigl( \prod_{i=0,\, i \neq k}^{n} \frac{x - x_i}{x_k - x_i} \Bigr) t_k$$
    Because space coordinates are two- or three-dimensional, applying the above interpolating functions to each dimension of the coordinate gives the following two forms.
    If the geographical coordinate is A, then:
    1) A = f(t), the geographical coordinate as a function of time;
    2) t = f'(A), the time as a function of the geographical coordinate.
    Supposing the current time is $t_n$, the space coordinate at time $t_{n+1}$ is obtained from this function, which predicts the data that will be requested in the coming period. Repeated simulation tests show that 4 interpolation already gives a good prediction; to keep the calculation simple and fast, the invention recommends using 4 interpolating functions for the prediction calculation;
    2.2 When the spatial motion physical model is created, the parameters are obtained and the prediction moment is selected as follows:
    2.2.1 Obtaining the parameters: in a distributed virtual environment, every access request for spatial data includes the geographical position information of the data object. By analyzing the data access requests, information such as the time and geographical position of each request can be recorded; based on this historical information, the spatial motion physical model can be used to predict data access;
    2.2.2 Selecting the prediction moment means deciding when it is appropriate to make a prediction. The prediction moment is determined jointly by three factors: the magnitude and direction of the movement velocity represented by the spatial motion model, the size of the spatial data block, and the available network bandwidth. Only by considering all three can the data be transferred to the local node within the permitted time using idle network bandwidth, without affecting normal data access by the system, so that the data access delay is hidden. The movement velocity represented by the spatial motion model reflects the trend of the user's data access requests, that is, which block of spatial data the user is likely to access at some future moment. Suppose the velocity represented by the spatial motion model is V and the current position of the motion is A(x, y); the size of a spatial data block is M and the geographical range it represents is $P(x_0, y_0, x_1, y_1)$; the available network bandwidth is B; and the current time is $T_0$. Then the latest prediction moment is $T = \min(f'(x_0), f'(y_0), f'(y_1)) - M/B$;
    2.3 After predictive access has run for some time, new data may find no free space because local storage is limited; a block of data in local storage must then be replaced with the new data. The invention uses a replacement policy based on spatial distance:
    Suppose 1) the current time is $T_0$;
    2) the geographical range represented by a spatial data block is $P(x_0, y_0, x_1, y_1)$, so the ranges represented by the local data set are $\{P_0, P_1, P_2, \ldots\}$;
    3) from the spatial motion model, the current motion coordinate is $A = f(T_0)$.
    Let $S_i$ denote the distance from A to the center point of $P_i$; this gives the set $\Omega = \{S_0, S_1, S_2, \ldots\}$.
    The data block $P_i$ to replace is then the one for which $S_i = \max(\Omega)$.
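The prediction in steps 2.1 and 2.2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, "4" is read here as using the four most recent requests (the claim does not say whether it means points or polynomial degree), and M and B are assumed to be in compatible units (for example MB and MB per time unit).

```python
from typing import Sequence, Tuple


def lagrange(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate at x the Lagrange polynomial through the points (xs[k], ys[k])."""
    total = 0.0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        basis = 1.0
        for i, xi in enumerate(xs):
            if i != k:
                basis *= (x - xi) / (xk - xi)  # xs must be pairwise distinct
        total += basis * yk
    return total


def predict_position(times: Sequence[float],
                     coords: Sequence[Tuple[float, ...]],
                     t_next: float,
                     points: int = 4) -> Tuple[float, ...]:
    """A = f(t): interpolate each coordinate dimension over the most recent requests.

    `points` = 4 is an assumed reading of the recommendation in step 2.1.
    """
    ts = list(times[-points:])
    recent = list(coords[-points:])
    dims = len(recent[0])
    return tuple(lagrange(ts, [c[d] for c in recent], t_next) for d in range(dims))


def latest_prediction_moment(arrival_times: Sequence[float],
                             block_size: float,
                             bandwidth: float) -> float:
    """T = min(arrival times at the block boundaries) - M / B."""
    return min(arrival_times) - block_size / bandwidth


# Hypothetical request history: x grows linearly, y quadratically.
times = [0.0, 1.0, 2.0, 3.0]
coords = [(0.0, 0.0), (2.0, 1.0), (4.0, 4.0), (6.0, 9.0)]
next_xy = predict_position(times, coords, t_next=4.0)

# t = f'(x): swap the roles of coordinate and time to estimate when x reaches 10.
t_at_x0 = lagrange([c[0] for c in coords], times, 10.0)
deadline = latest_prediction_moment([t_at_x0], block_size=100.0, bandwidth=50.0)
```

The inverse form t = f'(x) is only usable when the recorded coordinate values in that dimension are pairwise distinct; otherwise the denominators vanish, which a real system would have to guard against.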
  3. The fast data copy method based on prediction as claimed in claim 1, characterized in that the parallel data replication method is: data is copied simultaneously from multiple copies located on different nodes, using the redundant communication links of the underlying network to speed up replication. To make this speed-up optimal, the invention uses a copy selection strategy to select copies and an access amount allocation strategy to distribute the amount of data to access:
    3.1 The copy selection strategy addresses two problems. The first is how to select several copies from the many available; the method is to select the copies with the best access efficiency from all copies. The second is how many copies should be selected. The best number of copies for parallel replication is related to the average degree of the underlying network nodes: parallel replication is most efficient when the number of copies selected equals the average node degree of the underlying network;
    3.2 The allocation of the data access amount addresses how to copy from multiple copies so that replication finishes in the shortest time. The basic principle is to access more data from copies with a low access cost and less data from copies with a high access cost; if all copies finish their data transfer at the same time, the best replication efficiency is reached.
  4. The fast data copy method based on prediction as claimed in claim 1, characterized in that the invention is implemented by designing a prediction-based fast data copy system composed of a data access prediction module, a parallel replication module, a data replacement module, a data access recording module, a data access history module and a local data space. The data access prediction module predicts the next data access from the historical record of data accesses, generates a data copy request and submits it to the parallel replication module. After receiving a data copy request from the data access prediction module, the parallel replication module copies data simultaneously from multiple copies in the network and, when replication is complete, submits the data copy to the data replacement module. The data replacement module, following the replacement policy adapted to the spatial motion model, uses the received data copy to replace the data copy in the copy space that needs replacing. The data access recording module records the historical information of data accesses, including the start time and geographical position information of each access; this historical information is the basis on which the data access prediction module predicts data access.
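The distance-based replacement policy applied by the data replacement module (step 2.3 of claim 2) could look roughly like the sketch below. It assumes two-dimensional blocks given as (x0, y0, x1, y1) and Euclidean distance to the block center; the claims only say "distance", so the metric, like the function name `block_to_replace`, is an assumption made for illustration.

```python
import math
from typing import Sequence, Tuple

Block = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def block_to_replace(position: Tuple[float, float], blocks: Sequence[Block]) -> int:
    """Return the index of the cached block whose center is farthest from `position`."""

    def distance(block: Block) -> float:
        x0, y0, x1, y1 = block
        center_x, center_y = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        # S_i: Euclidean distance from the current motion coordinate A (assumed metric).
        return math.hypot(center_x - position[0], center_y - position[1])

    # max(Omega): the block with the largest S_i is the one to evict.
    return max(range(len(blocks)), key=lambda i: distance(blocks[i]))


# Hypothetical cache of three blocks with the user currently at the origin.
cached = [(0.0, 0.0, 2.0, 2.0), (10.0, 10.0, 12.0, 12.0), (4.0, 4.0, 6.0, 6.0)]
victim = block_to_replace((0.0, 0.0), cached)  # index 1, the farthest block
```

Evicting the farthest block keeps the data nearest to the predicted motion in local storage, which matches the intent of tying replacement to the spatial motion model.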
CN 200510031286 2005-02-28 2005-02-28 Quickly data copy method based on prediction Pending CN1658560A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200510031286 CN1658560A (en) 2005-02-28 2005-02-28 Quickly data copy method based on prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200510031286 CN1658560A (en) 2005-02-28 2005-02-28 Quickly data copy method based on prediction

Publications (1)

Publication Number Publication Date
CN1658560A true CN1658560A (en) 2005-08-24

Family

ID=35007831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200510031286 Pending CN1658560A (en) 2005-02-28 2005-02-28 Quickly data copy method based on prediction

Country Status (1)

Country Link
CN (1) CN1658560A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101626504B (en) * 2008-07-09 2012-06-06 上海飞来飞去多媒体创意有限公司 Method for high speed JPEG decoding
CN101841556A (en) * 2010-02-23 2010-09-22 中国科学院计算技术研究所 Method and system for placing resources replication in CDN-P2P (Content Distribution Network-Peer-to-Peer) network
CN101841556B (en) * 2010-02-23 2013-01-30 中国科学院计算技术研究所 Method and system for placing resources replication in CDN-P2P (Content Distribution Network-Peer-to-Peer) network
CN102411607A (en) * 2010-09-20 2012-04-11 汤姆森许可贸易公司 Method of data replication in a distributed data storage system and corresponding device
CN102411607B (en) * 2010-09-20 2016-08-03 汤姆森许可贸易公司 In distributed data-storage system data replicate method and relevant device
US9118526B2 (en) 2010-10-11 2015-08-25 Huawei Technologies Co., Ltd. Method and apparatus for controlling data storage
CN109697018A (en) * 2017-10-20 2019-04-30 北京京东尚科信息技术有限公司 The method and apparatus for adjusting memory node copy amount
CN108519861A (en) * 2018-04-02 2018-09-11 广东能龙教育股份有限公司 Dynamic storage method based on large-scale parallel access

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication