CN103986744A - Throughput-based file parallel transmission method - Google Patents


Publication number
CN103986744A
Authority
CN
China
Prior art keywords
throughput
connection
transmission
files
time period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310578190.0A
Other languages
Chinese (zh)
Other versions
CN103986744B (en)
Inventor
王俊峰
牟璇
黄一辛
王敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201310578190.0A priority Critical patent/CN103986744B/en
Publication of CN103986744A publication Critical patent/CN103986744A/en
Application granted granted Critical
Publication of CN103986744B publication Critical patent/CN103986744B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a throughput-based parallel file transmission method. The method comprises the following steps: extracting the file size and dividing the file into blocks; establishing parallel connections; comparing the number of file blocks with the number of parallel connections; carrying out parallel transmission of the file blocks; measuring and calculating throughput; and adjusting the degree of transmission parallelism according to the throughput. The advantages are as follows: 1. end-to-end file transmission performance is substantially improved; 2. the method is broadly applicable and is not restricted to specific network environments, systems or hardware environments, so the scheme provided by the invention can be applied in any of them to improve network throughput; and 3. the degree of parallelism is adjusted in real time according to the throughput, so the method adapts to changes in the network environment and makes full use of the network bandwidth.

Description

Throughput-based parallel file transmission method
Technical field
The present invention relates to the technical field of computer networks, and specifically to a throughput-based parallel file transmission method.
Background art
With the development of communication, computer and Internet technologies, the Internet is evolving toward high bandwidth, long delay, intelligent wireless access and space communication. Mobile terminals such as smartphones are updated continually, and Internet application data grows sharply day by day. The continual generation of massive scientific data in fields such as high-energy physics, astronomical observation and aviation, together with the development of new application models such as distributed networks and cloud computing, places ever higher demands on network transmission. At the present stage the network structure is relatively stable and the network protocols are mature, so the question of how to make maximal use of existing network resources to raise file transmission speed has significant research value and broad application prospects. Parallel data transmission is a bandwidth aggregation technique: multiple connections are used simultaneously to transfer data between a source host and a destination host, which addresses the inefficiency of a single connection and significantly improves end-to-end network throughput and transmission efficiency.
Research on parallel transmission is concentrated on three layers: the application layer, the transport layer and the data link layer. Many application-layer protocols make use of parallel TCP (Transmission Control Protocol) streams, for example the grid data transfer protocol GridFTP (Grid File Transfer Protocol). Because the single-connection transmission of traditional FTP (File Transfer Protocol) cannot meet the need for fast transfer and storage of large-scale data in grids, GridFTP extends FTP comprehensively. By extending the FTP commands and channels it supports parallel data transfer, in which data is transmitted over multiple TCP connections simultaneously, so that transfer performance is significantly improved. At the transport layer, end-to-end parallel transmission is mainly based either on the Transmission Control Protocol (TCP) or on the Stream Control Transmission Protocol (SCTP). Researchers have proposed MulTCP, a method that replaces real parallel flows with N virtual flows at the TCP layer and thereby realizes the idea of parallel TCP within the transmission of a single TCP flow. Stochastic TCP is likewise based on the MulTCP algorithm. MulTCP treats the congestion window as a set of N virtual TCP flows and regards these N flows as identical, whereas Stochastic TCP regards the N flows as different: the window size of each virtual flow is random, and each flow is handled independently. SCTP has several features, one of the most important being support for multi-streaming: SCTP data can be sent in different streams, which raises data throughput, and other paths can be used for data transfer when the primary path fails, which ensures reliable service transmission. At the data link layer, the bandwidth of multiple network interfaces is aggregated. The Linux bonding technique can bind multiple network interfaces into one virtual interface, so that user data is scheduled across the interfaces according to a given algorithm to achieve load balancing and bandwidth aggregation. IPMP (IP network multipathing) in Solaris implements multi-interface bandwidth aggregation and parallel data transmission on the Sun operating system.
Of the research at these three layers, application-layer work has to be deployed in specific network environments; transport-layer work requires corresponding changes to the kernel and currently remains at the stage of theoretical research without large-scale adoption; and data-link-layer work requires additional hardware support. None of these approaches is suitable for end-to-end parallel transmission by ordinary users.
Summary of the invention
The object of the present invention is to provide a throughput-based parallel file transmission method implemented at the application layer, which makes maximal use of existing network resources and raises the transmission speed of files.
The technical scheme of the present invention is as follows. A throughput-based parallel file transmission method comprises:
Step 1: extract the size FileSize of the file to be transmitted; set the file block size SegmentSize; divide the file to be transmitted into m file blocks.
Step 2: establish n connections.
Step 3: if m < n, use m connections to transmit the m file blocks in parallel until all file blocks have been transmitted; otherwise go to step 4.
Step 4: choose n blocks from the m blocks and transmit them in parallel over the n connections, so that the transmission parallelism is n; set the connection flag of each connection to true; start timing when parallel transmission begins, and stop and restart the timer after every duration t, obtaining time periods k, k = 1, 2, ..., N.
Step 5: measure and calculate the throughput parameters of the parallel transmission, comprising:
501: measure the amount of valid data transmitted by each connection:
the amount of valid data transmitted by connection i in time period k is D(i, k), i = 1, 2, ..., n;
502: calculate the throughput of each connection:
the throughput of connection i in time period k is
$$\text{throughput}(i,k) = \frac{D(i,k)}{t};$$
503: calculate the total throughput of all connections:
the total throughput of all connections in time period k is
$$\text{all\_throughput}(k) = \sum_{i=1}^{n} \text{throughput}(i,k);$$
504: calculate the smoothed throughput:
$$\text{smooth\_throughput}(k+1) = \text{smooth\_throughput}(k) + \alpha \cdot \text{all\_throughput}(k+1),$$
where α is the smoothing factor and smooth_throughput(1) = all_throughput(1);
505: calculate the average throughput of each connection after smoothing:
$$\text{average\_throughput}(k) = \frac{\text{smooth\_throughput}(k)}{n(k)},$$
where n(k) is the number of connections in time period k;
506: calculate the expected throughput:
$$\text{expect\_throughput}(k+1) = \text{smooth\_throughput}(k) + \text{Dev}(k),$$
where Dev(k) is the deviation variable of time period k,
$$\text{Dev}(k+1) = (1-\beta)\,\text{Dev}(k) + \beta\,\lvert \text{smooth\_throughput}(k+1) - \text{all\_throughput}(k+1) \rvert,$$
where β is the smoothing factor of the deviation variable and
$$\text{Dev}(1) = \frac{\text{all\_throughput}(1)}{2}.$$
Step 6: adjust the transmission parallelism according to the throughput parameters, comprising:
601: judge whether the total throughput of time period k+1 is greater than the total throughput of time period k. If so, continue with 602; otherwise set to false the connection flag of every connection whose throughput in time period k is less than the average throughput in time period k, and then go to step 7;
602: judge whether the smoothed throughput of time period k+1 is greater than the expected throughput of that time period. If so, create n new connections to transmit untransmitted file blocks in parallel, so that the transmission parallelism after adjustment is n' = 2n; otherwise create one new connection to transmit an untransmitted file block in parallel, so that the transmission parallelism after adjustment is n' = n + 1.
Step 7: whenever any connection of the parallel transmission finishes transmitting a file block, check whether the connection flag of that connection is true. If so, choose an untransmitted file block from the m blocks and transmit it over this connection; otherwise cancel this connection. After a connection is cancelled, the transmission parallelism is n' = n - 1.
Step 8: repeat steps 5 to 7 until all file blocks have been transmitted.
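Steps 1 to 3 can be illustrated with a minimal Python sketch. It assumes that m is obtained by rounding FileSize / SegmentSize up, so that the last block may be shorter than SegmentSize; the text does not state how m is computed. The names `Block`, `split_file` and `initial_parallelism` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    index: int  # block number (hypothetical field name)
    start: int  # offset of the first byte of the block
    end: int    # offset one past the last byte of the block


def split_file(file_size: int, segment_size: int) -> list[Block]:
    """Virtually divide a file of file_size bytes into blocks (step 1)."""
    if file_size < 0 or segment_size <= 0:
        raise ValueError("file_size must be >= 0 and segment_size must be > 0")
    # Assumption: m = ceil(FileSize / SegmentSize), computed with integers.
    m = (file_size + segment_size - 1) // segment_size
    return [
        Block(i, i * segment_size, min((i + 1) * segment_size, file_size))
        for i in range(m)
    ]


def initial_parallelism(m: int, n: int) -> int:
    """Step 3: with fewer blocks than connections use m connections, else n."""
    return m if m < n else n


blocks = split_file(file_size=10_000, segment_size=4_096)
print(len(blocks), blocks[-1])
print(initial_parallelism(len(blocks), n=8))
```

Only offsets are computed here; nothing is read from disk, which matches the idea of dividing the file virtually rather than writing separate block files.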
In the above technical scheme, the smoothing factor α equals 0.5, the smoothing factor β of the deviation variable equals 0.8, and the connections are based on FTP.
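The throughput parameters of step 5 can be sketched in Python with α = 0.5 and β = 0.8 as defaults. This is an illustration under stated assumptions, not the patented implementation: the class `ThroughputStats`, its `update` method and the `prev_weight` parameter are hypothetical names. The recurrence for the smoothed throughput is transcribed as printed, with weight 1 on smooth_throughput(k); setting `prev_weight` to 1 - α would give a conventional exponentially weighted average, should that be the intended form. No expected throughput is defined for the first time period, so the sketch returns `None` for it.

```python
from dataclasses import dataclass


@dataclass
class ThroughputStats:
    """Running throughput parameters, updated once per time period."""
    alpha: float = 0.5        # smoothing factor
    beta: float = 0.8         # smoothing factor of the deviation variable
    prev_weight: float = 1.0  # weight on smooth_throughput(k); assumption, see lead-in
    smooth: float = 0.0
    dev: float = 0.0
    periods: int = 0

    def update(self, data_per_conn, t):
        """data_per_conn holds D(i, k) for each connection; t is the period length."""
        if not data_per_conn or t <= 0:
            raise ValueError("need at least one connection and t > 0")
        per_conn = [d / t for d in data_per_conn]  # 502: per-connection throughput
        total = sum(per_conn)                      # 503: total throughput
        expect = None
        if self.periods == 0:
            self.smooth = total                    # smooth_throughput(1)
            self.dev = total / 2                   # Dev(1)
        else:
            # 506: expectation for this period from the previous period's values
            expect = self.smooth + self.dev
            # 504: smoothed throughput
            self.smooth = self.prev_weight * self.smooth + self.alpha * total
            # deviation variable for this period
            self.dev = (1 - self.beta) * self.dev + self.beta * abs(self.smooth - total)
        self.periods += 1
        return {
            "per_conn": per_conn,
            "total": total,
            "smooth": self.smooth,
            "expect": expect,
            "average": self.smooth / len(per_conn),  # 505
            "dev": self.dev,
        }


stats = ThroughputStats()
first = stats.update([100, 300], t=1.0)
second = stats.update([200, 400], t=1.0)
print(first)
print(second)
```

The values D(i, k) passed to `update` would come from byte counters on each connection, sampled every t seconds; how they are collected is outside this sketch.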
The beneficial effects of the invention are: 1. end-to-end file transmission performance is significantly improved; 2. the method is broadly applicable and is not limited to a specific network environment, system or hardware environment, so the scheme of the invention can be applied in any of them to improve network throughput; 3. the degree of parallelism is adjusted in real time according to the throughput, so that the method adapts to changes in the network environment and makes full use of the network bandwidth.
Brief description of the drawings
Fig. 1 compares the transmission performance of the method of the invention with that of SmartFTP when network conditions are good;
Fig. 2 compares the transmission performance of the method of the invention with that of SmartFTP when network conditions are poor.
Embodiment
The throughput-based parallel file transmission method is intended for end-to-end file transfer. Either party may act as client or as server, and the method can be used both to push files and to fetch them. In this method, each connection is based on the FTP protocol. Step 1: extract the size FileSize of the file to be transmitted; set the file block size SegmentSize; divide the file to be transmitted into m file blocks. When the file is being pushed, it is divided virtually on the local side according to its size, and each block is given attributes such as a block number, a block start pointer and a block end pointer. When the data is instead fetched from the other party, a connection is first established to obtain the information about the file, and the block size is then set.
Step 2: establish n connections.
Step 3: if m < n, use m connections to transmit the m file blocks in parallel until all file blocks have been transmitted; otherwise go to step 4.
Step 4: choose n blocks from the m blocks and transmit them in parallel over the n connections, so that the transmission parallelism is n; set the connection flag of each connection to true; start timing when parallel transmission begins, and stop and restart the timer after every duration t, obtaining time periods k, k = 1, 2, ..., N. The duration t is much smaller than the transmission time of a file block.
Step 5: measure and calculate the throughput parameters of the parallel transmission, comprising:
501: measure the amount of valid data transmitted by each connection:
the amount of valid data transmitted by connection i in time period k is D(i, k), i = 1, 2, ..., n;
502: calculate the throughput of each connection:
the throughput of connection i in time period k is
$$\text{throughput}(i,k) = \frac{D(i,k)}{t};$$
503: calculate the total throughput of all connections:
the total throughput of all connections in time period k is
$$\text{all\_throughput}(k) = \sum_{i=1}^{n} \text{throughput}(i,k);$$
504: calculate the smoothed throughput:
$$\text{smooth\_throughput}(k+1) = \text{smooth\_throughput}(k) + \alpha \cdot \text{all\_throughput}(k+1),$$
where α is the smoothing factor and smooth_throughput(1) = all_throughput(1). Here α is 0.5.
505: calculate the average throughput of each connection after smoothing:
$$\text{average\_throughput}(k) = \frac{\text{smooth\_throughput}(k)}{n(k)},$$
where n(k) is the number of connections in time period k;
506: calculate the expected throughput:
$$\text{expect\_throughput}(k+1) = \text{smooth\_throughput}(k) + \text{Dev}(k),$$
where Dev(k) is the deviation variable of time period k,
$$\text{Dev}(k+1) = (1-\beta)\,\text{Dev}(k) + \beta\,\lvert \text{smooth\_throughput}(k+1) - \text{all\_throughput}(k+1) \rvert,$$
where β is the smoothing factor of the deviation variable. Here β is 0.8.
Step 6: adjust the transmission parallelism according to the throughput parameters, comprising:
601: judge whether the total throughput of time period k+1 is greater than the total throughput of time period k. If so, continue with 602; otherwise set to false the connection flag of every connection whose throughput in time period k is less than the average throughput in time period k, and then go to step 7;
602: judge whether the smoothed throughput of time period k+1 is greater than the expected throughput of that time period. If so, create n new connections to transmit untransmitted file blocks in parallel, so that the transmission parallelism after adjustment is n' = 2n; otherwise create one new connection to transmit an untransmitted file block in parallel, so that the transmission parallelism after adjustment is n' = n + 1.
Step 7: whenever any connection of the parallel transmission finishes transmitting a file block, check whether the connection flag of that connection is true. If so, choose an untransmitted file block from the m blocks and transmit it over this connection; otherwise cancel this connection. After a connection is cancelled, the transmission parallelism is n' = n - 1.
Step 8: repeat steps 5 to 7 until all file blocks have been transmitted.
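The decisions of steps 6 and 7 can be sketched as two small Python functions. The names `Connection`, `adjust_parallelism` and `next_block` are hypothetical, and the throughput figures are assumed to be supplied by the measurement of step 5. The sketch also assumes that a connection with nothing left to send is closed, which the text does not spell out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Connection:
    conn_id: int
    throughput: float  # throughput of this connection in time period k
    keep: bool = True  # the connection flag set in step 4


def adjust_parallelism(conns, total_prev, total_now,
                       smooth_now, expect_now, average_prev):
    """Step 6: return how many new connections to open for the next period."""
    if total_now <= total_prev:
        # 601, "otherwise" branch: flag the connections that were slower than
        # the average of period k; they are closed in step 7.
        for conn in conns:
            if conn.throughput < average_prev:
                conn.keep = False
        return 0
    # 602: double the parallelism (open n more) when the smoothed throughput
    # exceeds the expectation, otherwise grow by a single connection.
    return len(conns) if smooth_now > expect_now else 1


def next_block(conn, pending):
    """Step 7: block to send next on conn, or None if conn should be closed."""
    if conn.keep and pending:
        return pending.popleft()
    return None  # flag is false, or (assumption) no untransmitted block remains


conns = [Connection(0, 50.0), Connection(1, 150.0)]
opened = adjust_parallelism(conns, total_prev=220.0, total_now=200.0,
                            smooth_now=210.0, expect_now=230.0,
                            average_prev=100.0)
pending = deque([7, 8])  # numbers of the blocks not yet transmitted
print(opened, [c.keep for c in conns])
print(next_block(conns[0], pending), next_block(conns[1], pending))
```

Because a flagged connection is only closed once its current block completes, no partially transmitted block has to be resent when the parallelism is reduced.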
Fig. 1 compares file transfer using the method of the invention (labelled throughputFTP in the figure) with file transfer using SmartFTP when network conditions are good. As the figure shows, under good network conditions the file transmission time is significantly shortened when the method of the invention is used to transfer large files (file size FileSize greater than 160 MB). Fig. 2 compares file transfer using the method of the invention with file transfer using SmartFTP when network conditions are poor. It can be seen that, even for smaller files, the method of the invention has a clear advantage in transmission time.

Claims (3)

1. A throughput-based parallel file transmission method, characterized in that it comprises:
Step 1: extract the size FileSize of the file to be transmitted; set the file block size SegmentSize; divide the file to be transmitted into m file blocks.
Step 2: establish n connections.
Step 3: if m < n, use m connections to transmit the m file blocks in parallel until all file blocks have been transmitted; otherwise go to step 4.
Step 4: choose n blocks from the m blocks and transmit them in parallel over the n connections, so that the transmission parallelism is n; set the connection flag of each connection to true; start timing when parallel transmission begins, and stop and restart the timer after every duration t, obtaining time periods k, k = 1, 2, ..., N.
Step 5: measure and calculate the throughput parameters of the parallel transmission, comprising:
501: measure the amount of valid data transmitted by each connection:
the amount of valid data transmitted by connection i in time period k is D(i, k), i = 1, 2, ..., n;
502: calculate the throughput of each connection:
the throughput of connection i in time period k is
$$\text{throughput}(i,k) = \frac{D(i,k)}{t};$$
503: calculate the total throughput of all connections:
the total throughput of all connections in time period k is
$$\text{all\_throughput}(k) = \sum_{i=1}^{n} \text{throughput}(i,k);$$
504: calculate the smoothed throughput:
$$\text{smooth\_throughput}(k+1) = \text{smooth\_throughput}(k) + \alpha \cdot \text{all\_throughput}(k+1),$$
where α is the smoothing factor and smooth_throughput(1) = all_throughput(1);
505: calculate the average throughput of each connection after smoothing:
$$\text{average\_throughput}(k) = \frac{\text{smooth\_throughput}(k)}{n(k)},$$
where n(k) is the number of connections in time period k;
506: calculate the expected throughput:
$$\text{expect\_throughput}(k+1) = \text{smooth\_throughput}(k) + \text{Dev}(k),$$
where Dev(k) is the deviation variable of time period k,
$$\text{Dev}(k+1) = (1-\beta)\,\text{Dev}(k) + \beta\,\lvert \text{smooth\_throughput}(k+1) - \text{all\_throughput}(k+1) \rvert,$$
where β is the smoothing factor of the deviation variable.
Step 6: adjust the transmission parallelism according to the throughput parameters, comprising:
601: judge whether the total throughput of time period k+1 is greater than the total throughput of time period k. If so, continue with 602; otherwise set to false the connection flag of every connection whose throughput in time period k is less than the average throughput in time period k, and then go to step 7;
602: judge whether the smoothed throughput of time period k+1 is greater than the expected throughput of that time period. If so, create n new connections to transmit untransmitted file blocks in parallel, so that the transmission parallelism after adjustment is n' = 2n; otherwise create one new connection to transmit an untransmitted file block in parallel, so that the transmission parallelism after adjustment is n' = n + 1.
Step 7: whenever any connection of the parallel transmission finishes transmitting a file block, check whether the connection flag of that connection is true. If so, choose an untransmitted file block from the m blocks and transmit it over this connection; otherwise cancel this connection. After a connection is cancelled, the transmission parallelism is n' = n - 1.
Step 8: repeat steps 5 to 7 until all file blocks have been transmitted.
2. The parallel transmission method according to claim 1, characterized in that the smoothing factor α equals 0.5 and the smoothing factor β of the deviation variable equals 0.8.
3. The parallel transmission method according to claim 1 or 2, characterized in that the connections are based on FTP.
CN201310578190.0A 2013-11-18 2013-11-18 Throughput-based file parallel transmission method Active CN103986744B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310578190.0A CN103986744B (en) 2013-11-18 2013-11-18 Throughput-based file parallel transmission method

Publications (2)

Publication Number Publication Date
CN103986744A true CN103986744A (en) 2014-08-13
CN103986744B CN103986744B (en) 2017-02-08

Family

ID=51278567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310578190.0A Active CN103986744B (en) 2013-11-18 2013-11-18 Throughput-based file parallel transmission method

Country Status (1)

Country Link
CN (1) CN103986744B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107453944A (en) * 2017-07-07 2017-12-08 上海斐讯数据通信技术有限公司 A kind of method and system for the optimal test connection number for determining network throughput test
CN112019447A (en) * 2020-08-19 2020-12-01 博锐尚格科技股份有限公司 Data flow control method, device, system, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030223430A1 (en) * 2002-06-04 2003-12-04 Sandeep Lodha Distributing unused allocated bandwidth using a borrow vector
CN101133599A (en) * 2004-12-24 2008-02-27 阿斯帕拉公司 Bulk data transfer
CN101136791A (en) * 2006-11-16 2008-03-05 中兴通讯股份有限公司 File transfer protocol based network throughput testing approach
CN101616077A (en) * 2009-07-29 2009-12-30 武汉大学 The rapid transmission method of the big file in the Internet

Also Published As

Publication number Publication date
CN103986744B (en) 2017-02-08

Similar Documents

Publication Publication Date Title
CN104734946A (en) Multi-tenant high-concurrency instant messaging cloud platform
CN101945103B (en) IP (Internet Protocol) network application accelerating system
CN102347876B (en) Multilink aggregation control device for cloud computing network
CN102263825A (en) Cloud-position-based hybrid cloud storage system data transmission method
CN103812949A (en) Task scheduling and resource allocation method and system for real-time cloud platform
CN103746938A (en) Method and device for transmitting data packet
CN104580503A (en) Efficient dynamic load balancing system and method for processing large-scale data
CN103986783A (en) Cloud computing system
CN104092758A (en) Distributed high-speed cloud storage server cluster system and reading method thereof
CN105610992A (en) Task allocation load balancing method for distributed stream computing system
CN103986744A (en) Throughput-based file parallel transmission method
CN103401778A (en) Receiving-end buffer overflow probability guarantee based multi-path transmission packet scheduling method
CN103577161A (en) Big data frequency parallel-processing method
CN205540723U (en) Information retrieval system based on cloud calculates
Zeinali et al. Comprehensive practical evaluation of wired and wireless internet base smart grid communication
CN102946443B (en) Multitask scheduling method for realizing large-scale data transmission
CN117196014B (en) Model training method and device based on federal learning, computer equipment and medium
CN104065719A (en) Variable sampling period scheduler and control method thereof
CN103338156A (en) Thread pool based named pipe server concurrent communication method
CN103532866A (en) Flow control method and system for virtual machine
Yamanaka et al. A TCP/IP-based constant-bit-rate file transfer protocol and its extension to multipoint data delivery
CN105407383A (en) Multi-version video-on-demand streaming media server cluster resource prediction method
CN102075584A (en) Distributed file system and access method thereof
CN103701865A (en) Data transmission method and system
CN102546659A (en) Durable TCP (transmission control protocol) connection method oriented to remote procedure call

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant