CN103546546A - Large-scale cluster file distribution method - Google Patents

Publication number
CN103546546A
Authority
CN
China
Prior art keywords
value
jump
unit interval
node
transmission
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310462061.5A
Other languages
Chinese (zh)
Inventor
柯宗贵
柯宗庆
杨育斌
赵必厦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bluedon Information Security Technologies Co Ltd
Original Assignee
Bluedon Information Security Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bluedon Information Security Technologies Co Ltd
Priority to CN201310462061.5A
Publication of CN103546546A
Legal status: Pending

Landscapes

  • Computer And Data Communications (AREA)

Abstract

The invention discloses a method for distributing files across a large-scale cluster. The method uses the Linux SCP command as its network transmission protocol. Following a specific transmission node sequence, SSH is used to log in to a node and run the SCP command there to distribute the file. Because the master node and the working nodes that have already received the file send it simultaneously to the working nodes that have not yet received it, the method achieves reliable point-to-point file transfer across a large-scale cluster and truly parallel, lock-free fast file distribution.

Description

A method for distributing files across a large-scale cluster
Technical field
The present invention relates to the field of computer technology, and in particular to a method for distributing files across a large-scale cluster.
Background art
With the rise and development of cloud computing, the mobile Internet, and the Internet of Things, the era of big data has arrived. Files must be transferred frequently between cluster nodes, so transfer performance is especially important. The simplest way to transfer files at present is the SCP command. However, this command performs single-point-to-multipoint transmission and cannot break through the network bottleneck: transfers to multiple nodes still proceed at no more than the maximum network speed of the single sending node. Performance is therefore low, and an effective solution for quickly distributing files across a large-scale cluster is all the more important.
At present, three methods are commonly used to copy files between different Linux systems:
1. FTP: one Linux system runs an FTP server, and the other copies files with an FTP client program.
2. Samba: Samba copies files in a manner similar to Windows file sharing, which is fairly convenient.
3. SCP: files are copied with the scp command.
SCP is a secure file copy based on SSH login and is convenient to use. It can copy local files or directories to a remote machine, and can also copy files or directories from a remote machine to the local machine.
For example, to copy a file from the current system to another, remote host, the following command can be used.
scp /home/daisy/full.tar.gz root@172.16.1.100:/home/root
The command then prompts for the login password of the root user on host 172.16.1.100, after which the copy starts.
The reverse operation, copying a file from the remote host to the current system, is just as simple.
scp root@172.16.1.100:/home/root/full.tar.gz /home/daisy/full.tar.gz
The drawback of the above methods is that in a large cluster, SCP performs a one-to-many batch transfer and cannot solve the transfer-speed problem. For example, suppose one machine sends a 100 MB file to 100 machines over a local area network whose transfer rate is capped at 10 MB/s, and uses SCP to send to all 100 machines at once. Although the SCP transfers run concurrently, their combined rate cannot exceed the network bottleneck, so the total time is essentially the same as sending to the machines one at a time, or even longer because the threads compete for network bandwidth. Sending to all 100 machines therefore takes approximately 100 × (100 / 10) = 1000 seconds.
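The estimate above can be checked with a few lines of Python. This is only a minimal sketch: the function name sequential_transfer_seconds and its parameters are illustrative assumptions, and the model assumes that the sender's link is the only bottleneck.

```python
def sequential_transfer_seconds(file_mb: float, link_mb_per_s: float, receivers: int) -> float:
    """Estimate the time for one sender to deliver a file to every receiver.

    Assumption: the sender's link is the only bottleneck, so concurrent
    copies merely share its bandwidth and the total equals the serial total.
    """
    per_copy_seconds = file_mb / link_mb_per_s
    return receivers * per_copy_seconds


# Figures from the example: 100 MB file, 10 MB/s link, 100 receiving machines.
print(sequential_transfer_seconds(100, 10, 100))  # 1000.0
```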
In addition, file transfer in parallel processing is a transfer from multiple sources to multiple destinations, and current research focuses on increasing the parallelism of transfers to shorten the transfer time. Some work achieves source-side parallelism by using multiple file copies, or uses fragmented transfer to increase the number of file replicas and so improve source-side parallelism. Existing research, however, does not consider destination-side parallelism, even though files can be transferred quickly inside a cluster and multiple destination nodes can receive different fragments of a file at the same time, which makes destination-side parallelism possible. As for transmission paths, some work uses multi-hop path splitting to obtain better paths, and some work uses multipath and fragmented transfer to parallelize transmission between a single source and destination. Furthermore, although such work takes minimizing the overall completion time of a batch of transfer requests as its scheduling objective, the scheduling actually used is a smallest-job-first method that does not coordinate bandwidth conflicts among multiple requests. The algorithm therefore falls short in system bandwidth utilization, which lengthens the overall completion time.
To address these problems, existing work on on-demand file transmission supporting multi-cluster data parallelism proposes an on-demand file transmission algorithm (OFT). First, OFT exploits fast sharing inside a cluster to achieve parallel reception and assembly at the destination. Transfer requests whose destination nodes belong to the same cluster are merged into a single request, and that request is distributed over multiple nodes in the cluster to spread the traffic load. When multi-hop path splitting is used to optimize transmission paths, an optimal path is selected for each source-destination pair, and the hop count of that optimal path plus an adjustable range value (for example 2) serves as the upper limit on the hop count of all paths, so that the performance gain of path splitting is obtained while source-destination connectivity is preserved. When path conflicts among multiple requests are handled, bandwidth is allocated to each request in proportion to its traffic load so that the requests finish at as nearly the same time as possible, which shortens the overall transfer time of the batch. This method, however, may suffer from resource contention and deadlock.
To overcome the defects of the prior art, the present invention proposes a method for distributing files across a large-scale cluster. The method is based on the SCP command: a transmission node sequence is calculated from the number of nodes, and within each unit time SSH is used to log in to the source nodes, which transfer the file point to point in parallel, so that the transfer to all N nodes is completed within n unit times, where n is calculated from N.
Summary of the invention
In the prior art, a master node sends files to working nodes in a one-to-many fashion, which is slow. To overcome this shortcoming, the present invention has the master node and the working nodes that have already received the file send the file simultaneously to the working nodes that have not yet received it, thereby achieving fast file distribution across a large-scale cluster.
In this method for distributing files across a large-scale cluster, the Linux SCP command serves as the network transmission protocol. Following a specific transmission node sequence, SSH is used to log in to each source node and run the SCP command there to distribute the file. The steps are:
S1: let the number of cluster nodes be N;
S2: calculate the number of unit times n from N;
S3: set the unit time i to 1;
S4: compare i and n: if i is not greater than n, go to S5; otherwise, go to S8;
S5: calculate the transmission node sequence from i and N;
S6: following the transmission node sequence, SSH to each source node and SCP the file to its destination node;
S7: increase i by 1 and go to S4;
S8: file distribution ends.
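The effect of steps S1 to S8 can be illustrated with a small simulation that only counts how many nodes hold the file after each unit time. This is a sketch under stated assumptions rather than the patented procedure itself: the function name simulate_distribution is hypothetical, node 1 is assumed to hold the file at the start, and every holder is assumed to serve exactly one waiting node per unit time.

```python
def simulate_distribution(node_count: int) -> int:
    """Return how many unit times pass until every node holds the file.

    Model assumption: one node starts with the file, and in each unit time
    every node that already holds it sends to exactly one node that does not.
    """
    holders = 1        # nodes that already have the file
    unit_times = 0
    while holders < node_count:
        # Each holder serves one new node, capped by the nodes still waiting.
        holders += min(holders, node_count - holders)
        unit_times += 1
    return unit_times


for n_nodes in (5, 1000):
    print(n_nodes, simulate_distribution(n_nodes))
```

Because the number of holders can at most double in each unit time, the count of unit times grows logarithmically with the number of nodes.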
Beneficial effects of the technical solution of the present invention:
1. Fast file distribution across the cluster
With N nodes, only $\lceil \log_2 N \rceil$ unit times are needed to distribute the file to every node. For example, with 1000 nodes the whole distribution takes only $\lceil \log_2 1000 \rceil = 10$ unit times.
The traditional SCP approach, by contrast, needs 1000 - 1 = 999 unit times. If multiple threads are used to send to several nodes concurrently, the threads contend for resources and may deadlock, so the transfer still takes no fewer than 999 unit times.
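The two costs can be compared side by side. The sketch below assumes that the number of unit times of the method is the smallest n with 2^n ≥ N, which is what the algorithm of Fig. 3 computes; both function names are illustrative, not taken from the patent.

```python
def doubling_unit_times(node_count: int) -> int:
    """Smallest n with 2**n >= node_count, using exact integer arithmetic."""
    return (node_count - 1).bit_length()


def one_by_one_unit_times(node_count: int) -> int:
    """One sender copying the file to each of the other nodes in turn."""
    return node_count - 1


for n_nodes in (5, 100, 1000, 15000):
    print(n_nodes, doubling_unit_times(n_nodes), one_by_one_unit_times(n_nodes))
```

Using bit_length avoids floating-point rounding that math.log2 followed by math.ceil could introduce for very large node counts.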
2. Reliable point-to-point file transfer across the cluster
Because the method is based on the SCP protocol, every transfer is a reliable point-to-point transfer.
A system module developed on the basis of the present invention can solve the performance problem of fast file distribution across a cluster and bring economic benefit to enterprises.
3. Truly parallel, lock-free fast file transfer
Because each transfer is an SCP copy from a single node to a single node, there is no resource contention and no deadlock. Moreover, within a unit time each individual source-to-destination transfer is serial, while from the perspective of the whole cluster the transfers run in parallel.
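One way to read the lock-free property is that, within a unit time, no node takes part in more than one transfer. The following sketch checks that condition for the 5-node schedule given in the embodiment; the helper name is_conflict_free is an assumption made for illustration, and the check says nothing about contention on shared network links.

```python
from collections import Counter


def is_conflict_free(transfers: list[tuple[int, int]]) -> bool:
    """True if no node takes part in more than one transfer of the unit time."""
    usage = Counter(node for pair in transfers for node in pair)
    return all(count == 1 for count in usage.values())


# (source, destination) pairs of the 5-node example, one list per unit time.
rounds = [[(1, 2)], [(1, 3), (2, 4)], [(1, 5)]]
print([is_conflict_free(r) for r in rounds])  # [True, True, True]
```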
Description of the drawings
To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for the description are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is a schematic diagram of the relationship between the source nodes and destination nodes that transfer files among the cluster nodes in unit times 1 to n;
Fig. 3 is a flow chart of the algorithm that calculates the number of unit times n;
Fig. 4 is a flow chart of the algorithm that calculates the transmission node sequence of unit time i.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
The method uses the Linux SCP command as its network transmission protocol. Following a specific transmission node sequence, SSH is used to log in to each source node and run the SCP command there to distribute the file.
The present invention assumes that password-free SSH access has been set up between the cluster nodes.
Cluster node count: the number of nodes in the cluster system, hereinafter the node count. The cluster node count N is generally greater than or equal to 3.
Source node: a node that sends the file.
Destination node: a node that receives the file.
Transmission node sequence: the sequence of transfers from source nodes to destination nodes.
Transmission node sequence of unit time i: the sequence of transfers from source nodes to destination nodes performed during unit time i.
Fig. 1 shows the flow chart of the method. The steps are:
(1) let the number of cluster nodes be N;
(2) calculate the number of unit times n from N;
(3) set the unit time i to 1;
(4) compare i and n: if i is not greater than n, go to (5); otherwise, go to (8);
(5) calculate the transmission node sequence from i and N;
(6) following the transmission node sequence, SSH to each source node and SCP the file to its destination node;
(7) increase i by 1 and go to (4);
(8) file distribution ends.
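Step (6) can be sketched as building, for each transfer of a unit time, one command that logs in to the source node over SSH and runs scp there. Everything named here is an assumption for illustration: the helper build_round_commands, the host naming scheme node<k>, and the convention that the file lives at the same path on every node. The sketch only builds the argument lists and does not execute them.

```python
import shlex


def build_round_commands(transfers, path, host_of=lambda k: f"node{k}"):
    """Return one argv list per (source, destination) transfer of a unit time.

    Each command logs in to the source over SSH and runs scp there, so the
    file travels directly from the source node to the destination node.
    """
    commands = []
    for source, target in transfers:
        quoted = shlex.quote(path)
        remote = f"scp {quoted} {host_of(target)}:{quoted}"
        commands.append(["ssh", host_of(source), remote])
    return commands


# Second unit time of a 5-node cluster: 1 -> 3 and 2 -> 4.
for argv in build_round_commands([(1, 3), (2, 4)], "/data/full.tar.gz"):
    print(" ".join(argv))
```

A driver could start all commands of one unit time concurrently, for example with subprocess.Popen, and wait for them to finish before moving on to the next unit time. This relies on the password-free SSH access between nodes that the method assumes.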
Suppose node 1 ("node1") holds the file to be sent. Fig. 2 shows the relationship between the source nodes and destination nodes that transfer files among the cluster nodes in unit times 1 to n.
With N nodes, only $\lceil \log_2 N \rceil$ unit times are needed to distribute the file to every node.
Suppose there are 5 nodes. By the formula, $\lceil \log_2 5 \rceil = 3$, so 3 unit times are needed to complete the transfer. The transmission node sequence (source node → destination node) is:
Unit time 1: 1 → 2
Unit time 2: 1 → 3; 2 → 4
Unit time 3: 1 → 5
Table 1 lists the values of 2 to the power i for i up to 100. Google's cluster currently has roughly 15,000 nodes. When the number of unit times n is calculated, a loop upper bound of 20 is therefore sufficient: 2 to the power 20 already exceeds 1,000,000 nodes, which covers present-day cluster sizes.
Table 1: values of 2 to the power i for i up to 100
[Table 1 is an image in the original publication and is not reproduced here.]
Fig. 3 shows the flow chart of the algorithm that calculates the number of unit times n. The steps are:
(1) accept the node count N as input;
(2) set n to 1;
(3) set max to 21;
(4) compare n and max: if n is less than max, go to (5); otherwise, go to (8);
(5) set poweri to 2 raised to the power n;
(6) compare poweri and N: if poweri is less than N, go to (7); otherwise, go to (8);
(7) increase n by 1 and go to (4);
(8) the algorithm ends.
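The steps above translate almost line for line into Python. The sketch below assumes that the power computed in step (5) uses the current value of n; the function name compute_unit_time and the parameter name limit (the value called max in the steps) are illustrative choices.

```python
def compute_unit_time(node_count: int, limit: int = 21) -> int:
    """Follow steps (1) to (8): smallest n with 2**n >= node_count, capped at limit."""
    n = 1                          # step (2)
    while n < limit:               # step (4)
        power = 2 ** n             # step (5)
        if power < node_count:     # step (6)
            n += 1                 # step (7)
        else:
            break                  # enough unit times to reach every node
    return n                       # step (8)


print(compute_unit_time(5))     # 3
print(compute_unit_time(1000))  # 10
```

With the cap of 21, the loop handles node counts up to 2 to the power 20 exactly; for larger clusters it returns 21 regardless of N, so the cap would have to be raised.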
Fig. 4 shows the flow chart of the algorithm that calculates the transmission node sequence of unit time i. The steps are:
(1) accept the unit time i and the node count N as input;
(2) from i, calculate the maximum max and the midpoint mid of the unit time: max = 2^i, mid = max / 2;
(3) set the counter k to 1;
(4) compare k and mid: if k is less than mid, go to (5); otherwise, go to (9);
(5) set source to k plus 1, and set target to mid plus source;
(6) compare target and N: if target is not greater than N, go to (7); otherwise, go to (9);
(7) add the pair (target, source) to the transmission node sequence;
(8) increase k by 1 and go to (4);
(9) the algorithm ends.
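A sketch of the sequence calculation follows, with one deliberate deviation that should be read as an assumption. Taken literally, steps (3) to (5) start at source node 2 and produce no transfer at all in unit time 1, which does not match the 5-node example (1 → 2; then 1 → 3 and 2 → 4; then 1 → 5). The sketch therefore lets the source run from 1 to mid, which reproduces that example. It also stores pairs as (source, target) rather than (target, source), and the function name transfer_sequence is hypothetical.

```python
def transfer_sequence(i: int, node_count: int) -> list[tuple[int, int]]:
    """(source, target) pairs for unit time i, nodes numbered from 1.

    Assumption: sources are nodes 1..mid, the nodes that already hold the
    file before unit time i, and source k sends to node mid + k.
    """
    mid = 2 ** i // 2              # step (2): max = 2**i, mid = max / 2
    pairs = []
    for source in range(1, mid + 1):
        target = mid + source
        if target > node_count:    # step (6): no such node, stop
            break
        pairs.append((source, target))
    return pairs


for unit_time in (1, 2, 3):
    print(unit_time, transfer_sequence(unit_time, 5))
```

Each node appears at most once per unit time, because all sources are at most mid and all targets are greater than mid.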
The present invention achieves reliable point-to-point file transfer across a cluster and truly parallel, lock-free fast file distribution, bringing economic benefit to enterprises. Moreover, within a unit time each individual source-to-destination transfer is serial while the cluster as a whole transfers in parallel, so there is no resource contention and no deadlock.
The method for distributing files across a large-scale cluster provided by the embodiments of the present invention has been described in detail above. Specific examples have been used to explain the principle and implementation of the invention, and the description of the embodiments is intended only to help in understanding the method and its core idea. A person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and the scope of application. In summary, this description should not be construed as limiting the present invention.

Claims (3)

1. A method for distributing files across a large-scale cluster, characterized in that the method uses the Linux SCP command as its network transmission protocol and that, following a specific transmission node sequence, SSH is used to log in to each source node and run the SCP command there to distribute the file, the method comprising:
S1: let the number of cluster nodes be N;
S2: calculate the number of unit times n from N;
S3: set the unit time i to 1;
S4: compare i and n: if i is not greater than n, go to S5; otherwise, go to S8;
S5: calculate the transmission node sequence from i and N;
S6: following the transmission node sequence, SSH to each source node and SCP the file to its destination node;
S7: increase i by 1 and go to S4;
S8: file distribution ends.
2. The method according to claim 1, characterized in that the procedure for calculating the number of unit times n in step S2 is:
S21: accept the node count N as input;
S22: set n to 1;
S23: set max to 21;
S24: compare n and max: if n is less than max, go to S25; otherwise, go to S28;
S25: set poweri to 2 raised to the power n;
S26: compare poweri and N: if poweri is less than N, go to S27; otherwise, go to S28;
S27: increase n by 1 and go to S24;
S28: the algorithm ends.
3. The method according to claim 1, characterized in that the procedure for calculating the transmission node sequence of unit time i in step S5 is:
S51: accept the unit time i and the node count N as input;
S52: from i, calculate the maximum max and the midpoint mid of the unit time: max = 2^i, mid = max / 2;
S53: set the counter k to 1;
S54: compare k and mid: if k is less than mid, go to S55; otherwise, go to S59;
S55: set source to k plus 1, and set target to mid plus source;
S56: compare target and N: if target is not greater than N, go to S57; otherwise, go to S59;
S57: add the pair (target, source) to the transmission node sequence;
S58: increase k by 1 and go to S54;
S59: the algorithm ends.
CN201310462061.5A 2013-09-30 2013-09-30 Large-scale cluster file distribution method Pending CN103546546A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310462061.5A CN103546546A (en) 2013-09-30 2013-09-30 Large-scale cluster file distribution method

Publications (1)

Publication Number Publication Date
CN103546546A true CN103546546A (en) 2014-01-29

Family

ID=49969585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310462061.5A Pending CN103546546A (en) 2013-09-30 2013-09-30 Large-scale cluster file distribution method

Country Status (1)

Country Link
CN (1) CN103546546A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103986760A (en) * 2014-05-14 2014-08-13 上海上讯信息技术股份有限公司 Growth type data source efficient link transmission method based on multi-parameter arbitration
WO2017024805A1 (en) * 2015-08-12 2017-02-16 腾讯科技(深圳)有限公司 File delivery method, apparatus, and system
CN108769222A (en) * 2018-06-05 2018-11-06 朱士祥 A kind of high-performance treatments method that files in batch uploads
CN110912969A (en) * 2019-11-04 2020-03-24 西安雷风电子科技有限公司 High-speed file transmission source node, destination node device and system
CN112306962A (en) * 2019-07-26 2021-02-02 杭州海康威视数字技术股份有限公司 File copying method and device in computer cluster system and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101242337B (en) * 2007-02-08 2010-11-10 张永敏 A content distribution method and system in computer network
CN102638569A (en) * 2012-01-13 2012-08-15 深圳市同洲视讯传媒有限公司 File distribution synchronizing method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
朱晓明: ""R-net文件分发系统(RFDS)设计与实现"", 《中国优秀硕士学位论文全文数据库信息科技辑》, no. 3, 15 March 2011 (2011-03-15), pages 138 - 1567 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103986760A (en) * 2014-05-14 2014-08-13 上海上讯信息技术股份有限公司 Growth type data source efficient link transmission method based on multi-parameter arbitration
CN103986760B (en) * 2014-05-14 2018-05-22 上海上讯信息技术股份有限公司 The efficient link transmission method of growth form data source formula based on multi-parameter arbitration
WO2017024805A1 (en) * 2015-08-12 2017-02-16 腾讯科技(深圳)有限公司 File delivery method, apparatus, and system
CN106453460A (en) * 2015-08-12 2017-02-22 腾讯科技(深圳)有限公司 File distributing method, apparatus and system
CN108769222A (en) * 2018-06-05 2018-11-06 朱士祥 A kind of high-performance treatments method that files in batch uploads
CN112306962A (en) * 2019-07-26 2021-02-02 杭州海康威视数字技术股份有限公司 File copying method and device in computer cluster system and storage medium
CN112306962B (en) * 2019-07-26 2024-02-23 杭州海康威视数字技术股份有限公司 File copying method, device and storage medium in computer cluster system
CN110912969A (en) * 2019-11-04 2020-03-24 西安雷风电子科技有限公司 High-speed file transmission source node, destination node device and system
CN110912969B (en) * 2019-11-04 2023-04-07 西安雷风电子科技有限公司 High-speed file transmission source node, destination node device and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140129