CN105959423A - Method for remotely transmitting large number of small files - Google Patents


Info

Publication number
CN105959423A
Authority
CN
China
Prior art keywords
file
network
network packet
read
filename
Legal status
Pending
Application number
CN201610585749.6A
Other languages
Chinese (zh)
Inventor
康炜
闫鹏飞
Current Assignee
Dragon Is Deposited Science And Technology Ltd Co In Beijing
Original Assignee
Dragon Is Deposited Science And Technology Ltd Co In Beijing
Priority date
2016-07-22
Filing date
2016-07-22
Publication date
2016-09-21
Application filed by Dragon Is Deposited Science And Technology Ltd Co In Beijing
Priority to CN201610585749.6A
Publication of CN105959423A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/025 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer And Data Communications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for remotely transmitting a large number of small files located on a first computing device to a second computing device. The method comprises the following steps: opening small files located on the first computing device and reading their contents; adding the files that were read to a network packet; transmitting the files in the network packet together over the network to the second computing device; and repeating the preceding steps until all files have been transmitted. Building on the original flow, the method adds a repeat-operation loop and uses a new synchronization protocol that defines the content of the network packet so that one packet carries multiple groups of file operations. Network communications that originally required many exchanges are merged into a single exchange, which substantially increases network bandwidth utilization. In addition, multi-threaded I/O multiplexing eliminates the effect of file I/O latency on system performance. The method significantly improves the efficiency of remotely transmitting very large numbers of small files.

Description

Method for remotely transmitting a large number of small files
Technical field
The present invention relates to the technical field of computer networks, and in particular to a method for remotely transmitting a large number of small files.
Background
With the rapid growth of unstructured data, computer systems generate very large numbers of small files. To improve data security, these files also need to be synchronized, backed up, or otherwise transferred between different computer systems.
At present, there are two main classes of methods for transferring massive numbers of small files:
(1) Transfer the files one by one from a list of the small files, using tools such as rsync or FTP. This method generally cannot use network bandwidth effectively, because achievable network throughput is positively correlated with the size of the messages sent over the network: the larger the messages, the higher the throughput. Transferring files one at a time is therefore limited by the network bandwidth it can actually achieve, and transmission efficiency is very low.
(2) Pack all the small files into one large file in advance and then transmit that file. Packing increases the size of the packets sent over the network and thus improves transfer efficiency, but extra storage space is needed to hold the packed file, so this approach cannot be used when storage space is limited. In addition, packing and unpacking consume computing resources and lengthen the overall transfer process.
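Item (1) rests on the claim that throughput grows with message size. A minimal sketch of why, under a simplified model that is an assumption and not taken from the patent: each message costs one round trip plus its serialization time at link bandwidth, while TCP windowing, protocol headers and disk time are ignored. The function name `effective_throughput`, the 0.5 ms round trip and the two message sizes are hypothetical; only the 10 Gbit/s link echoes the test environment described later.

```python
def effective_throughput(payload_bytes: int, rtt_s: float, bandwidth_bps: float) -> float:
    """Bytes/second achieved when each message waits one round trip plus
    its serialization time (simplified model, assumed for illustration)."""
    serialization_s = payload_bytes * 8 / bandwidth_bps
    return payload_bytes / (rtt_s + serialization_s)


if __name__ == "__main__":
    RTT = 0.0005   # hypothetical 0.5 ms round trip
    LINK = 10e9    # 10 Gbit/s link
    for size in (16 * 1024, 4 * 1024 * 1024):
        mb_per_s = effective_throughput(size, RTT, LINK) / 1e6
        print(f"{size:>8} bytes per message -> {mb_per_s:8.1f} MB/s")
```

Under these assumptions a 16 KiB message spends most of its time waiting on the round trip, while a 4 MiB message gets much closer to link speed. This is the gap that both packing (item 2) and batching several files per packet try to close.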
Summary of the invention
To overcome the deficiencies of the prior art, the present invention provides a method for remotely transmitting a large number of small files that effectively improves the transmission efficiency of massive numbers of small files.
The technical solution adopted by the present invention to solve the technical problem is as follows:
The present invention first provides a method for remotely transmitting a large number of small files located on a first computing device to a second computing device, comprising the following steps:
Step S11: open a small file located on the first computing device and read its content;
Step S12: add the file that was read to a network packet;
Step S13: transmit the files in the network packet together over the network to the second computing device;
Step S14: repeat steps S11 to S14 until all files have been transmitted.
Preferably, before step S11, the method further includes: opening the folder and determining whether all file names have been read; if so, ending the operation; if not, reading a file name and then executing step S11.
Preferably, before step S13, the method further includes: determining whether the network packet is full; if so, executing step S13; if not, repeating steps S11 and S12 until the network packet is full.
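Steps S11 to S14, together with the folder scan and the "is the packet full" check above, can be sketched as follows. This is a minimal illustration, not the patented implementation: a packet is modelled as an in-memory list of (file name, content) pairs, only file data counts toward the size limit, and the network transport is a caller-supplied `send` callback. `iter_packets`, `send_folder` and `max_packet_bytes` are hypothetical names.

```python
import os
from typing import Callable, Iterator, List, Tuple

# Hypothetical packet model: a list of (file name, file content) pairs.
Packet = List[Tuple[str, bytes]]


def iter_packets(folder: str, max_packet_bytes: int) -> Iterator[Packet]:
    """Yield packets of files from `folder`, each holding at most
    `max_packet_bytes` of file data (a single larger file gets its own packet)."""
    packet: Packet = []
    used = 0
    for name in sorted(os.listdir(folder)):        # read file names until none remain
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue                               # sketch: skip subdirectories
        with open(path, "rb") as f:                # S11: open the file, read its content
            data = f.read()
        if packet and used + len(data) > max_packet_bytes:
            yield packet                           # packet is full: hand it to S13
            packet, used = [], 0
        packet.append((name, data))                # S12: add the file to the packet
        used += len(data)
    if packet:
        yield packet                               # flush the last, partially filled packet


def send_folder(folder: str, max_packet_bytes: int,
                send: Callable[[Packet], None]) -> int:
    """S13/S14: send each packet in one network operation; return the packet count."""
    count = 0
    for packet in iter_packets(folder, max_packet_bytes):
        send(packet)
        count += 1
    return count
```

Because `iter_packets` is a generator, only one packet is held in memory at a time, so no packed copy of the whole file set is ever written to disk as in prior-art method (2).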
Preferably, step S13 specifically includes: the second computing device receives the request to save the files; creates the multiple files corresponding to those in the network packet; writes the content of each file; and returns a list of operation results to the first computing device.
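The receiving side of step S13 (accept a save request, create the files carried in the packet, write their content, and return a list of per-file results) might look like the following minimal sketch. The (name, content) packet shape, the (name, ok, message) result tuple and the name `save_packet` are assumptions; the check that rejects names escaping the destination directory is an added safeguard, not something the patent describes.

```python
import os
from typing import List, Tuple

Packet = List[Tuple[str, bytes]]   # hypothetical: (relative path, content) pairs
Result = Tuple[str, bool, str]     # hypothetical: (path, succeeded, error message)


def save_packet(packet: Packet, dest_dir: str) -> List[Result]:
    """Create and write every file carried in `packet` under `dest_dir`,
    returning one result per file (the 'operation result list')."""
    root = os.path.abspath(dest_dir)
    results: List[Result] = []
    for name, data in packet:
        target = os.path.abspath(os.path.join(root, name))
        # Added safeguard: refuse names that would land outside dest_dir.
        if target == root or os.path.commonpath([root, target]) != root:
            results.append((name, False, "rejected: path escapes destination"))
            continue
        try:
            os.makedirs(os.path.dirname(target), exist_ok=True)  # create parent dirs
            with open(target, "wb") as f:                        # create the file
                f.write(data)                                    # write its content
            results.append((name, True, ""))
        except OSError as exc:
            results.append((name, False, str(exc)))
    return results
```

Returning one result per file lets the sender retry only the files that failed instead of the whole packet.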
The present invention also provides a method for remotely transmitting a large number of small files located on a first computing device to a second computing device, comprising the following steps:
Step S21: open the folder and determine whether all file names have been read; if so, end the operation; if not, read a file name;
Step S22: allocate a number of worker threads;
Step S23: in each worker thread, perform the following operations with the single thread as the unit: open the file with the file name that was read and read its content; add the file that was read to a network packet; transmit the files in the network packet together over the network to the second computing device.
Step S24: return to step S21 until all files have been transmitted.
Preferably, in step S23, before the files in the network packet are transmitted together over the network to the second computing device, the method further includes: determining whether the network packet is full; if not, repeating the operations of reading file content and adding the file that was read to the network packet until the network packet is full.
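Steps S21 to S24, with the fullness check of the preceding paragraph, can be sketched with Python's `threading` and `queue` modules. Several details are assumptions not stated in the patent: the main thread hands file names to the workers through a shared queue, each worker fills and sends its own packet independently, `send` must be thread-safe, and read errors are not handled. `threaded_send` and its parameters are hypothetical.

```python
import os
import queue
import threading
from typing import Callable, List, Tuple

Packet = List[Tuple[str, bytes]]   # hypothetical: (file name, content) pairs


def threaded_send(folder: str, max_packet_bytes: int, num_workers: int,
                  send: Callable[[Packet], None]) -> None:
    """S21-S24 sketch: the main thread reads file names; each worker thread
    reads files, fills its own packet and sends it when full.
    `send` must be safe to call from several threads at once."""
    names: queue.Queue = queue.Queue()

    def worker() -> None:
        packet: Packet = []
        used = 0
        while True:
            name = names.get()
            if name is None:                          # sentinel: no more file names
                break
            with open(os.path.join(folder, name), "rb") as f:
                data = f.read()                       # open the file, read its content
            if packet and used + len(data) > max_packet_bytes:
                send(packet)                          # packet full: send it
                packet, used = [], 0
            packet.append((name, data))               # add the file to this worker's packet
            used += len(data)
        if packet:
            send(packet)                              # flush the last partial packet

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]  # S22
    for t in threads:
        t.start()
    for name in sorted(os.listdir(folder)):           # S21/S24: read names until done
        if os.path.isfile(os.path.join(folder, name)):
            names.put(name)
    for _ in threads:
        names.put(None)                               # one stop signal per worker
    for t in threads:
        t.join()
```

Giving each worker its own packet avoids locking around packet assembly; the trade-off is that up to `num_workers` partially filled packets are flushed at the end.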
Beneficial effects of the present invention: the invention adds a repeat-operation loop to the original flow and, through a new synchronization protocol, defines the content of the network packet so that it contains multiple groups of file operations. Network communications that originally required many exchanges are merged into one, which substantially increases network bandwidth utilization. At the same time, through multi-threaded I/O multiplexing, the CPU is released to other worker threads while a thread waits for I/O, which eliminates the impact of file I/O latency on system performance. The invention significantly improves the remote transmission efficiency of massive numbers of small files.
Brief description of the drawings
Fig. 1 is a flow diagram of Embodiment 1 of the present invention;
Fig. 2 is a flow diagram of Embodiment 2 of the present invention;
Fig. 3 is a flow diagram of Embodiment 3 of the present invention;
Fig. 4 is a diagram defining the network packet of the small-file transfer protocol in an embodiment of the present invention;
Fig. 5 shows the test results for transferring 1,000,000 files of 16 KB each using rsync and using the method of the present invention.
Detailed description of the invention
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment 1
Referring to Fig. 1, Embodiment 1 of the present invention provides a method for remotely transmitting a large number of small files. Assume there is only one folder to be synchronized and that it contains only ordinary small files. Here, "small files" mainly means files smaller than 256 KB, and no file may exceed 512 KB; individual small files may differ in size or be the same size. Transferring all the small files in a folder on a local source server to another remote server requires the following steps:
Step S01: open the folder and determine whether all file names have been read; if so, end the operation; if not, read a file name and then perform the subsequent steps;
Step S02: open the file with the file name that was read and read its content;
Step S03: transmit the file content over the network to the remote server;
Step S04: repeat steps S01 to S03 until all files have been transmitted.
Step S03 specifically includes: the remote server receives the request to save the file; creates the corresponding file; writes the content of the file; and returns a list of operation results to the source server.
In this embodiment, several of the operations involve slow I/O, namely file operations or network operations, and the subsequent steps cannot proceed while waiting for the result of each operation, so transmission efficiency is poor.
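To put the cost of Embodiment 1 in numbers, the sketch below counts request/response round trips for the Fig. 5 workload (1,000,000 files of 16 KB, taken here as 16 KiB) with one save request per file versus one request per full packet. The 4 MiB packet capacity is an assumption, per-file metadata is ignored, and both function names are hypothetical.

```python
import math


def round_trips_per_file(num_files: int) -> int:
    """Embodiment 1: one save request and one result list per file."""
    return num_files


def round_trips_batched(num_files: int, file_size: int, packet_capacity: int) -> int:
    """Batched transfer: one request per full packet (metadata ignored)."""
    files_per_packet = max(1, packet_capacity // file_size)
    return math.ceil(num_files / files_per_packet)


if __name__ == "__main__":
    n, size, cap = 1_000_000, 16 * 1024, 4 * 1024 * 1024
    print(round_trips_per_file(n), round_trips_batched(n, size, cap))
```

With these figures 256 files fit in each packet, so the batched transfer needs 3,907 round trips instead of 1,000,000.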
Embodiment 2
Referring to Fig. 2, to improve the overall effectiveness of small-file transfer, Embodiment 2 of the present invention provides a method for remotely transmitting a large number of small files under a folder on a source server to a remote server, comprising the following steps:
Step S11: open a small file located on the source server and read its content;
Step S12: add the file that was read to a network packet;
Step S13: transmit the files in the network packet together over the network to the remote server;
Step S14: repeat steps S11 to S14 until all files have been transmitted.
Before step S11, the method further includes: opening the folder and determining whether all file names have been read; if so, ending the operation; if not, reading a file name and then executing step S11.
Before step S13, the method further includes: determining whether the network packet is full; if so, executing step S13; if not, repeating steps S11 and S12 until the network packet is full.
Step S13 specifically includes: the remote server receives the request to save the files; creates the multiple files corresponding to those in the network packet; writes the content of each file; and returns a list of operation results to the source server.
This embodiment uses a dynamically adaptive, packet-assembling batch transfer mode. It adds a repeat-operation loop to the original flow described in Embodiment 1 and, through a new synchronization protocol, defines the content of the network packet (as shown in Fig. 4) so that it contains multiple groups of file operations. Network communications that originally required many exchanges are merged into one, which substantially increases network bandwidth utilization.
A typical small-file transfer protocol network packet is shown in Fig. 4. It begins with the message control information of the synchronization software protocol as a whole, including the message header, checksum, protocol specification and similar information. This is followed by the information for the group of files to be transmitted; each file has a metadata part and a data part. The metadata contains file attributes such as file path, file name, size, owner and time. The data part is the content of the file to be transmitted.
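A byte-level layout for the Fig. 4 packet can be sketched with Python's `struct` and `zlib` modules. The patent names the parts (control information with a header and checksum, then per-file metadata and data) but not their encoding, so the magic value, version number, field order and field widths below are all hypothetical; CRC32 is chosen as the checksum purely for illustration, and the file name is carried inside the path.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import List

MAGIC = b"SFP1"   # hypothetical protocol identifier
VERSION = 1       # hypothetical protocol version
# Control information: magic, version, file count, payload length, CRC32 of payload.
HEADER = struct.Struct("!4sHIII")
# Per-file metadata: path length, file size, owner uid, modification time.
META = struct.Struct("!HQId")


@dataclass
class FileEntry:
    path: str        # file path, including the file name
    owner_uid: int
    mtime: float
    data: bytes      # file content


def encode_packet(entries: List[FileEntry]) -> bytes:
    parts = []
    for e in entries:
        path = e.path.encode("utf-8")
        parts += [META.pack(len(path), len(e.data), e.owner_uid, e.mtime), path, e.data]
    payload = b"".join(parts)
    header = HEADER.pack(MAGIC, VERSION, len(entries), len(payload), zlib.crc32(payload))
    return header + payload


def decode_packet(buf: bytes) -> List[FileEntry]:
    magic, _version, count, length, crc = HEADER.unpack_from(buf, 0)
    payload = buf[HEADER.size:HEADER.size + length]
    if magic != MAGIC or len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError("malformed or corrupted packet")
    entries, off = [], 0
    for _ in range(count):
        path_len, size, uid, mtime = META.unpack_from(payload, off)
        off += META.size
        path = payload[off:off + path_len].decode("utf-8")
        off += path_len
        entries.append(FileEntry(path, uid, mtime, payload[off:off + size]))
        off += size
    return entries
```

Putting the payload length and checksum in the fixed-size header lets the receiver verify the whole packet before it creates any files.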
Criterion for determining whether the network packet is full:
A suitable network packet size can be chosen according to the network conditions where the application is deployed. Each time a file's metadata and data are appended to the network packet, the packet length is updated. Before the next file's data is appended, the remaining space in the packet is checked; if it is insufficient, the current packet is judged to be full and the transmission flow is entered.
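The fullness rule above (update the packet length after each append, and check the remaining space before the next one) can be sketched as follows. The 18-byte header and 22-byte per-file metadata overhead are hypothetical values chosen for the example; a real implementation would use the sizes of its own packet format. The patent does not say how a single file larger than an empty packet is handled; here it is sent alone in an oversized packet. `entry_size` and `fill_packets` are hypothetical names.

```python
from typing import Iterable, List, Tuple

HEADER_SIZE = 18     # hypothetical control-information size (bytes)
PER_FILE_META = 22   # hypothetical fixed metadata size per file (bytes)


def entry_size(path: str, data: bytes) -> int:
    """Bytes one file adds to a packet: fixed metadata + path + content."""
    return PER_FILE_META + len(path.encode("utf-8")) + len(data)


def fill_packets(files: Iterable[Tuple[str, bytes]], capacity: int) -> List[List[str]]:
    """Group files into packets of at most `capacity` bytes; return the paths per packet."""
    packets: List[List[str]] = []
    current: List[str] = []
    length = HEADER_SIZE                       # current packet length
    for path, data in files:
        size = entry_size(path, data)
        if current and size > capacity - length:
            packets.append(current)            # not enough room left: packet is full
            current, length = [], HEADER_SIZE
        current.append(path)
        length += size                         # update the length after appending
    if current:
        packets.append(current)
    return packets
```

Counting metadata as well as content keeps the estimate honest for very small files, where the fixed per-file overhead can rival the data itself.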
Embodiment 3
Referring to Fig. 3, to further improve the overall effectiveness of small-file transfer, Embodiment 3 of the present invention provides a method for remotely transmitting a large number of small files under a folder on a source server to a remote server, comprising the following steps:
Step S21: open the folder and determine whether all file names have been read; if so, end the operation; if not, read a file name;
Step S22: allocate a number of worker threads;
Step S23: in each worker thread, perform the following operations with the single thread as the unit: open the file with the file name that was read and read its content; add the file that was read to a network packet; transmit the files in the network packet together over the network to the remote server.
Step S24: return to step S21 until all files have been transmitted.
In step S23, before the files in the network packet are transmitted together over the network to the second computing device (the remote server), the method further includes: determining whether the network packet is full; if not, repeating the operations of reading file content and adding the file that was read to the network packet until the network packet is full.
This embodiment uses a multi-threaded, dynamically allocated I/O multiplexing transfer mode. Within a single thread, I/O operations must still wait out the I/O latency; with multi-threaded I/O multiplexing, the CPU is released to other worker threads while a thread waits for I/O, which eliminates the impact of file I/O latency on system performance.
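The effect described here, where a thread blocked on I/O lets other worker threads run, can be demonstrated with a small experiment. `time.sleep` stands in for a blocking file read or network send (in CPython both release the GIL while waiting); the task count, delay and worker count are arbitrary assumptions, and the timings say nothing about the Fig. 5 results.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_io(delay_s: float) -> None:
    """Stand-in for one blocking file read or network send."""
    time.sleep(delay_s)


def run_sequential(tasks: int, delay_s: float) -> float:
    start = time.perf_counter()
    for _ in range(tasks):
        blocking_io(delay_s)                         # each wait blocks the only thread
    return time.perf_counter() - start


def run_threaded(tasks: int, delay_s: float, workers: int) -> float:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(blocking_io, [delay_s] * tasks))  # waits overlap across threads
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequential(8, 0.05):.2f} s")
    print(f"threaded:   {run_threaded(8, 0.05, 8):.2f} s")
```

With eight workers the eight waits overlap, so the threaded run takes roughly one delay instead of eight. CPU-bound work would not benefit in the same way under CPython's GIL.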
With the transmission method of the present invention, transfer performance for massive numbers of small files is significantly improved. The comparative test results are shown in Fig. 5, with the following test environment:
CPU: Intel Xeon E5 series
Memory: 16 GB
Operating system: CentOS 6.4
File system: LeoFS in a 12-disk configuration
Network connection: a single 10 Gigabit Ethernet link
The above are merely preferred embodiments of the present invention. The description of the embodiments is intended only to help in understanding the method and core idea of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (6)

1. A method for remotely transmitting a large number of small files, which remotely transmits a large number of small files located on a first computing device to a second computing device, characterized by comprising the following steps:
Step S11: opening a small file located on the first computing device and reading its content;
Step S12: adding the file that was read to a network packet;
Step S13: transmitting the files in the network packet together over the network to the second computing device;
Step S14: repeating steps S11 to S14 until all files have been transmitted.
2. The method for remotely transmitting a large number of small files according to claim 1, characterized in that, before step S11, the method further comprises: opening the folder and determining whether all file names have been read; if so, ending the operation; if not, reading a file name and then executing step S11.
3. The method for remotely transmitting a large number of small files according to claim 1, characterized in that, before step S13, the method further comprises: determining whether the network packet is full; if so, executing step S13; if not, repeating steps S11 and S12 until the network packet is full.
4. The method for remotely transmitting a large number of small files according to claim 1, characterized in that step S13 specifically comprises: the second computing device receiving a request to save the files; creating the multiple files corresponding to those in the network packet; writing the content of each file; and returning a list of operation results to the first computing device.
5. A method for remotely transmitting a large number of small files, which remotely transmits a large number of small files located on a first computing device to a second computing device, characterized by comprising the following steps:
Step S21: opening the folder and determining whether all file names have been read; if so, ending the operation; if not, reading a file name;
Step S22: allocating a number of worker threads;
Step S23: in each worker thread, performing the following operations with the single thread as the unit: opening the file with the file name that was read and reading its content; adding the file that was read to a network packet; transmitting the files in the network packet together over the network to the second computing device.
Step S24: returning to step S21 until all files have been transmitted.
6. The method for remotely transmitting a large number of small files according to claim 5, characterized in that, in step S23, before the files in the network packet are transmitted together over the network to the second computing device, the method further comprises: determining whether the network packet is full; if not, repeating the operations of reading file content and adding the file that was read to the network packet until the network packet is full.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610585749.6A CN105959423A (en) 2016-07-22 2016-07-22 Method for remotely transmitting large number of small files

Publications (1)

Publication Number Publication Date
CN105959423A true CN105959423A (en) 2016-09-21

Family

ID=56901406

Country Status (1)

Country Link
CN (1) CN105959423A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678491A (en) * 2013-11-14 2014-03-26 东南大学 Method based on Hadoop small file optimization and reverse index establishment
CN103701860A (en) * 2013-12-06 2014-04-02 北京奇虎科技有限公司 Network transmission and receiving methods and devices for small files, and network transmission system
CN105069048A (en) * 2015-07-23 2015-11-18 东方网力科技股份有限公司 Small file storage method, query method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160921