Background technology
A grid is a virtual environment for resource sharing and coordination. It can take in a wide variety of resources and turn them into a resource that is available anywhere, reliable, standardized, and at the same time economical. An important function of grid computing is to use these resources effectively so that large volumes of data can be processed quickly and accurately. To devote as much of the available time as possible to actual computation, data transmission time must be reduced as far as possible.
In grid environments, a co-allocation architecture is currently used to address this problem: the multiple replicas that exist in the grid are transferred concurrently. The traditional co-allocation architecture has three strategies for distributing data blocks among the replicas and improving network transmission efficiency. They are:
1) Brute-force co-allocation
The file size is divided equally among the connections, without regard to the bandwidth differences between the individual client-server connections. For example, if a client is connected to 3 servers, each server is allocated 1/3 of the file.
2) History-based co-allocation
Each connection is allocated a share of the file in proportion to its predicted transfer rate. For example, if the transfer rates of 3 connections are in the ratio 1:2:3, the first server is allocated 1/6 of the file, the second 2/6, and the third 3/6.
3) Conservative load-balancing co-allocation
Conservative load balancing divides the file to be transferred into a number of fragments of equal size, and each connected server is assigned one fragment so that the file is transferred in a striped fashion. When a server completes the transfer of its fragment, it is assigned another fragment, and this continues until the whole file has been downloaded. The load on each flow is thus adjusted dynamically, so faster servers transfer a larger share of the file.
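The three strategies above can be compared in a minimal Python sketch. The function names, the abstract units of size and rate, and the simplified event simulation (constant rates, no request overhead) are assumptions made for illustration; they are not part of the described architecture.

```python
import heapq


def brute_force_split(file_size, n_servers):
    """Strategy 1: equal share for every server, ignoring bandwidth."""
    return [file_size / n_servers] * n_servers


def history_based_split(file_size, predicted_rates):
    """Strategy 2: share proportional to each predicted transfer rate."""
    total = sum(predicted_rates)
    return [file_size * r / total for r in predicted_rates]


def conservative_load_balancing(file_size, fragment_size, rates):
    """Strategy 3: equal-size fragments; a server that finishes a fragment
    is handed the next one. Returns the amount each server transferred."""
    remaining = file_size
    transferred = [0] * len(rates)
    busy = []  # min-heap of (finish_time, server_index)

    # Every server starts with one fragment.
    for i, rate in enumerate(rates):
        if remaining <= 0:
            break
        chunk = min(fragment_size, remaining)
        remaining -= chunk
        transferred[i] += chunk
        heapq.heappush(busy, (chunk / rate, i))

    # The server that finishes earliest receives the next fragment.
    while remaining > 0:
        now, i = heapq.heappop(busy)
        chunk = min(fragment_size, remaining)
        remaining -= chunk
        transferred[i] += chunk
        heapq.heappush(busy, (now + chunk / rates[i], i))
    return transferred


print(brute_force_split(300, 3))            # [100.0, 100.0, 100.0]
print(history_based_split(600, [1, 2, 3]))  # [100.0, 200.0, 300.0]
print(conservative_load_balancing(600, 10, [1, 2, 3]))
```

The first two functions fix every share before the transfer starts, whereas the third decides each assignment only when a server becomes free, which is why faster servers end up with more of the file.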
The key to history-based co-allocation is predicting the connection speed. This is, of course, an idealization: when the amount of data assigned to each replica is based on an average transfer rate obtained from measurements, there is always a risk that the transfer speed will suddenly rise or fall. The accuracy of the prediction has a large effect on transmission performance, and the load distribution in this strategy is completed before the bulk data transfer begins and does not change once it has been made. As with brute-force co-allocation, its greatest shortcoming is that it cannot adapt to dynamic changes in network performance.
The fragment size is the critical factor affecting the performance of the conservative load-balancing algorithm. If fragments are too large, the precision of the co-allocation falls and its effectiveness declines; if fragments are too small, the connections request new fragments too frequently, which also degrades performance. A suitable fragment size must therefore be chosen by weighing the two.
The fragment size should satisfy the following conditions:
1. The number of unit packets into which the data is divided, as determined by the fragment size, should be far greater than the number of data transfer servers.
2. The unit packet should be as small as possible. On the one hand, this lets all data transfer servers finish at nearly the same time, improving overall server utilization; on the other hand, it adapts better to changes in the transmission performance of each data transfer server.
3. The unit packet should be relatively large, so that the time taken to transfer one unit packet is far greater than the idle time between unit packets.
A suitable fragment size is difficult to choose. Moreover, when one server speeds up, it transfers more data than the other servers, yet overall transmission performance is still determined by the server with the lowest transmission efficiency. In practice the impact in this case is fairly small, and it can be handled effectively by dynamically assigning more data to the faster server to relieve the load on the others. When one server slows down, however, it finishes its work later than all the other servers. For the reason given above, this server then determines the overall transmission performance, so this case is far more damaging than the previous one.
The traditional co-allocation strategies do not overcome the drawback that faster servers must wait for the slowest server to finish transferring the last fragment. In most cases this wastes a great deal of time and ultimately degrades overall transmission performance.
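The waiting problem can be made concrete with a small Python sketch. It assumes a static allocation whose shares were fixed in advance and servers that transfer at constant actual rates; the function names and the numbers are illustrative only.

```python
def completion_time(shares, actual_rates):
    """A static allocation finishes only when its slowest server finishes."""
    return max(share / rate for share, rate in zip(shares, actual_rates))


def idle_times(shares, actual_rates):
    """Time each server spends waiting for the slowest one."""
    total = completion_time(shares, actual_rates)
    return [total - share / rate for share, rate in zip(shares, actual_rates)]


shares = [100, 200, 300]            # allocated for predicted rates 1:2:3

# Prediction holds: all three servers finish together.
print(completion_time(shares, [1, 2, 3]))    # 100.0

# The third server runs at half its predicted rate: the transfer takes
# twice as long, and the first two servers sit idle for the second half.
print(completion_time(shares, [1, 2, 1.5]))  # 200.0
print(idle_times(shares, [1, 2, 1.5]))       # [100.0, 100.0, 0.0]
```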
Embodiment
A method for improving data transmission efficiency in a grid environment comprises the following steps:
Step (1): In the initial stage, each GridFTP server is assigned a file fragment transfer task of the same size; the file fragment size is determined by bandwidth and weights.
First, we set an upper limit on the initial file fragment size, which depends on the client's maximum bandwidth and the number of replica sources. Although replicas can be downloaded from multiple servers in parallel, the file fragments downloaded over the multiple links must all pass through the client's single link. The client bandwidth is therefore clearly the bottleneck that limits how much the co-allocation architecture can speed up a download. The upper limit on the initial file fragment size, InitialPT, is calculated as follows:
InitialPT = ClientMaxBandwidth / NumberOfReplicaSource (1.1)
In formula (1.1), ClientMaxBandwidth is the client's maximum bandwidth and NumberOfReplicaSource is the total number of replica sources.
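Formula (1.1) can be written as a short Python function. The function name, the argument check, and the choice of unit in the example are assumptions for illustration; the text does not state in which unit the result is expressed.

```python
def initial_fragment_limit(client_max_bandwidth, number_of_replica_sources):
    """Formula (1.1): InitialPT = ClientMaxBandwidth / NumberOfReplicaSource."""
    if number_of_replica_sources <= 0:
        raise ValueError("at least one replica source is required")
    return client_max_bandwidth / number_of_replica_sources


# Example (assumed values): a client link of 100 units shared by 4 sources.
print(initial_fragment_limit(100, 4))  # 25.0
```

Dividing by the number of replica sources keeps the combined initial fragments within what the client's single link can absorb.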
We first define the formula for the auxiliary parameter Score according to the state of each server:
[Formula (1.2)]
where R_CPU + R_MEM + R_BW = 1. The parameters in formula (1.2) are as follows:
Score_i: the score of server i, where 1 ≤ i ≤ n and n is the number of servers.
The percentage of CPU idle time of server i.
R_CPU: the user-defined proportion assigned to the CPU load factor.
The percentage of free memory of server i.
R_MEM: the user-defined proportion assigned to the free memory factor.
The bandwidth from server i to the client as a percentage of total server bandwidth; it can be obtained by dividing the current bandwidth by the theoretical maximum bandwidth.
R_BW: the user-defined proportion assigned to the bandwidth factor.
Once Score_i has been obtained for each server, the weight weighing_i of each server can be calculated:
[Formula (1.3)]
Once the weight weighing_i of each server has been obtained, the size NewPT_i of the next file fragment to be transferred by each server is given by formula (1.4):
NewPT_i = ClientBandwidth × weighing_i (1.4)
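The calculation of the score, the weight, and NewPT_i can be sketched in Python as follows. The sketch assumes that Score_i is the weighted sum of the three percentages and that weighing_i is each score divided by the sum of all scores; both forms are assumptions, since only formula (1.4) is stated explicitly here. The class and function names are hypothetical, and the default proportions follow the 1 : 1 : 8 ratio used in the experiments.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    cpu_idle: float  # CPU idle fraction of the server, 0..1
    mem_free: float  # free memory fraction of the server, 0..1
    bw_share: float  # current bandwidth / theoretical maximum, 0..1


def score(state, r_cpu, r_mem, r_bw):
    """ASSUMED form of formula (1.2): weighted sum of the three factors."""
    if abs(r_cpu + r_mem + r_bw - 1.0) > 1e-9:
        raise ValueError("R_CPU + R_MEM + R_BW must equal 1")
    return (state.cpu_idle * r_cpu
            + state.mem_free * r_mem
            + state.bw_share * r_bw)


def weights(states, r_cpu=0.1, r_mem=0.1, r_bw=0.8):
    """ASSUMED form of formula (1.3): each score normalized by the total."""
    scores = [score(s, r_cpu, r_mem, r_bw) for s in states]
    total = sum(scores)
    if total == 0:
        raise ValueError("all scores are zero")
    return [sc / total for sc in scores]


def new_fragment_sizes(client_bandwidth, states, **ratios):
    """Formula (1.4): NewPT_i = ClientBandwidth x weighing_i."""
    return [client_bandwidth * w for w in weights(states, **ratios)]


states = [ServerState(0.5, 0.5, 0.2), ServerState(0.5, 0.5, 0.8)]
print(new_fragment_sizes(100, states))
```

Under these assumptions the weights sum to 1, so the fragments handed out in one round add up to ClientBandwidth, and the server with the larger bandwidth share receives the larger fragment.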
Step (2): When a server completes its current file fragment transfer task, it is assigned a new file fragment transfer task; if a transfer fails, the fragment is retransmitted.
Each time server i finishes transferring its current file fragment, it obtains a new transfer task, which is determined from the real-time state of server i. The dynamic adjustment strategy thus adjusts each server's next transfer task according to server load and bandwidth. The lighter a server's load, the more data it is assigned.
Step (3): To avoid generating file fragments that are too small, a termination condition is needed. If the remaining data to be transferred is smaller than the file fragment size of the initial stage, the remaining data is transferred immediately as the last file fragment.
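Steps (1) to (3) can be combined into a small scheduling simulation in Python. This is a sketch under stated assumptions: servers transfer at constant simulated rates, the hypothetical callback next_size(i) stands in for the real-time NewPT_i of server i, and retransmission after a failed transfer is not modeled.

```python
import heapq


def dynamic_schedule(file_size, initial_size, rates, next_size):
    """Return the ordered list of (server_index, fragment_size) assignments."""
    remaining = file_size
    log = []
    busy = []  # min-heap of (finish_time, server_index)

    def assign(i, size, start):
        nonlocal remaining
        if size <= 0:
            raise ValueError("fragment size must be positive")
        size = min(size, remaining)  # never assign more than is left
        remaining -= size
        log.append((i, size))
        heapq.heappush(busy, (start + size / rates[i], i))

    # Step (1): every server starts with a fragment of the same size.
    for i in range(len(rates)):
        if remaining > 0:
            assign(i, initial_size, 0.0)

    # Step (2): a server that finishes is given a new fragment.
    while remaining > 0:
        now, i = heapq.heappop(busy)
        if remaining < initial_size:
            # Step (3): termination condition, send the rest at once.
            assign(i, remaining, now)
        else:
            assign(i, next_size(i), now)
    return log


sizes = [100, 200, 300]  # assumed per-server NewPT values
for server, size in dynamic_schedule(1000, 50, [1, 2, 4], lambda i: sizes[i]):
    print(server, size)
```

In a real deployment next_size would be recomputed from the measured CPU, memory, and bandwidth state each time a server becomes free, rather than read from a fixed list.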
The advantages of the present invention are described below by means of comparative experiments.
Before the experiments, the traditional co-allocation algorithms and the method of the invention are compared qualitatively, as shown in Table 1.
Table 1. Comparison of co-allocation algorithms
Algorithm testing and analysis
(1) Setting the input parameters
To let the dynamic adjustment strategy achieve the best transmission efficiency, its input parameters must be set before the experiments. There are three influencing factors: CPU idle state, free memory, and bandwidth. The proportion assigned to each factor must be set here.
First, the influence of bandwidth must be determined. When the file is small, transfer speeds differ little. As the file to be transferred grows larger, the influence of bandwidth becomes more apparent. When the bandwidth proportion is below 0.6, transfer speed gradually declines. When the bandwidth proportion is 0.8, transfer speed reaches its maximum.
Next, the influence of CPU computing power on transmission efficiency must be determined. Three machines with the same memory and bandwidth but different CPU types were tested. The results show that the better the CPU, the better the transmission performance, but the effect is not large: transmission performance does not improve markedly as CPU performance improves.
Finally, the influence of memory size on transmission efficiency was tested. Three machines with different amounts of memory downloaded files from the server, and the results show that the influence of memory size on transmission efficiency is also not pronounced.
Taken together, the experimental results show that CPU computing power and memory size can improve transmission performance, but not by much. This strengthens our belief that bandwidth is the main factor affecting transmission performance, and that the proportion assigned to bandwidth should be greater than those of the other two factors. In the experiments that follow, the input parameters of the dynamic adjustment strategy are set so that R_CPU : R_MEM : R_BW = 1 : 1 : 8.
(2) Setting up the experimental environment
A regionalized grid environment is built with 4 different regions. The regions are connected to one another by 100 Mbps physical links in a fully connected topology. The configuration of each region is given in Table 2. Every server has Globus Toolkit 4.0.0 or later installed, and the experiments use the data transfer tool GridFTP that it provides.
Table 2. Machine configuration of each region
(3) Experimental results and analysis
Two experiments were carried out: one to verify the performance of the method of the invention, and one to examine the effect of the number of replica sources on its performance.
Experiment 1: verifying the performance of the method of the invention.
Region A acts as the client requesting the data transfer and obtains data from 3 different regions. After the data request is issued and the LDAP server has successfully located 3 data replicas, data files from 100 MB to 2 GB are tested with 5 methods: brute-force co-allocation, history-based co-allocation, the conservative load-balancing strategy, the dynamic allocation scheme, and single-replica transfer. Single-replica transfers obtain the data from regions B, C, and D separately, and the fragment length in history-based co-allocation is 5% of the file length.
From the results of Experiment 1, we can draw the following conclusions:
(1) When transferring a small file, such as 100 MB, the benefit of the method of the invention is not evident, and in some cases single-replica transfer outperforms its multi-replica transfer. This is because the computation time of the method itself accounts for too large a proportion of the total transfer time. As the file size increases, this proportion falls, and at 2 GB the method of the invention performs twice as well as single-site transfer.
(2) When the file is larger than 100 MB, the two dynamic co-allocation methods, the conservative load-balancing strategy and the method of the invention, outperform single-replica transfer and the other two co-allocation methods in overall transmission performance. The method of the invention completes the transfer task in the shortest time and has the best transmission performance.
(3) In the experimental results, single-replica transfer in some cases outperforms the traditional co-allocation methods. This is because the transmission performance of the traditional co-allocation methods depends on the completion time of the last file fragment; when some servers perform poorly, the overall performance of the co-allocation suffers.
During the transfers, the experiment also included the case where one server fails. Single-replica transfer then waits indefinitely, whereas the method of the invention promptly reassigns the task that the failed server did not complete to a server that has finished its transfer. This effectively improves the reliability of grid data transfer and benefits data sharing and collaborative work in complex, changing, wide-area grid systems.
Experiment 2: examining the effect of the number of replica sources on the performance of the method of the invention.
Files are downloaded with the method of the invention from several combinations of servers, and overall transmission performance is expressed as transfer speed. Table 3 lists all the server combinations used in the experiment. For example, the server combination BD means that the file is downloaded from the servers in regions B and D.
Table 3. Replica server combinations
Combination name | Server combination
B | Region B
D | Region D
BD | Regions B, D
CD | Regions C, D
BCD | Regions B, C, D
From the results of Experiment 2, we can draw the following conclusions:
(1) When the file is small, for example 10 MB, the method of the invention has no advantage.
(2) In most cases, the overall performance of the method of the invention improves as the number of replica sources increases.
(3) The experimental results show that the overall performance of the method of the invention is best when the two replica servers C and D are selected, and that performance actually declines when another server is added. This means that increasing the number of replica sources does not necessarily improve overall performance, so a suitable number of replica sources must be selected for transmission performance to be at its best.
The experiments above show that the method for network transmission in a grid environment proposed by the present invention achieves better transmission efficiency.