CN102917068A - Self-adaptive large-scale cluster communication system and self-adaptive large-scale cluster communication method - Google Patents

Info

Publication number
CN102917068A
Authority
CN
China
Prior art keywords
time
irtt
real
service
timeout threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012104177069A
Other languages
Chinese (zh)
Inventor
范明彬
王静
王通
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN2012104177069A
Publication of CN102917068A
Legal status: Pending

Abstract

The invention discloses a self-adaptive large-scale cluster communication system and a communication method for it, and relates to distributed file systems in large-scale clusters. In the method, the self-adaptive large-scale cluster communication system acquires its network round-trip time (internet round trip time, IRTT) $T_{IRTT}$ and its server processing time $T_{service}$ in real time, updates the timeout threshold $T_{timeout}$ in real time according to the acquired $T_{IRTT}$ and $T_{service}$, and carries out timeout handling according to the updated threshold. The invention further discloses the self-adaptive large-scale cluster communication system itself. The technical scheme reduces system overhead, improves communication quality, optimizes the cluster, and improves the responsiveness, concurrency and reliability of the cluster system.

Description

A self-adaptive large-scale cluster communication system and communication method thereof
Technical field
The present invention relates to distributed file systems in large-scale clusters, and in particular to a self-adaptive large-scale cluster communication system and a communication method thereof.
Background
With the development of the times, the information explosion has gradually become a hot topic. Faced with such a huge volume of information, how to efficiently extract and process particular types of data has been put on the agenda and has become one of the main directions of computer development. According to statistics, among the 500 most powerful computers in the world, the number of hardware systems based on a cluster architecture has kept increasing, and clusters now account for more than 70% of the list. The cluster has become one of the mainstream architectures for building high-performance computer systems, with a trend toward ultra-large scale. It follows that, as the amount of information grows, cluster technology, and ultra-large-scale cluster technology in particular, has strong vitality and broad prospects in high-performance computing and information processing.
As ultra-large-scale cluster (distributed file system) technology develops, the demands placed on it keep rising. Especially when processing massive data, higher requirements are placed on network communication, system load and disk I/O in order to meet various performance needs. Most of these cluster systems adopt a client-server model, and communication between the nodes of the cluster is mostly carried out by remote procedure call (RPC) or RPC-like mechanisms. In a distributed cluster system based on an RPC or RPC-like structure, failures such as packet loss, network connection failure and node failure cause reliability problems. These can in turn trigger other system faults, degrading system performance and preventing jobs from running normally.
How to discover and precisely locate failures in the system in time is the key problem to be solved in ensuring high reliability of a cluster system. If a communication failure or system fault is not detected promptly and effectively, the responsiveness and availability of the system are seriously affected. Conversely, if the system raises false alarms frequently, it triggers wrong repair actions or handling, which reduces availability and may cause irrecoverable losses.
Therefore, in cluster systems, and ultra-large-scale cluster systems in particular, how to detect communication failures effectively is a problem that needs attention when implementing an RPC or RPC-like communication protocol. The timeout is a common and necessary failure detection means, and it is usually bound to a remote procedure call. In network communication, most protocols detect failures with timeouts, and this mechanism is especially important for RPC protocols whose underlying transport protocol is unreliable. The timeout detection mechanism thus directly affects many aspects of RPC-based or RPC-like distributed systems in a cluster, in particular responsiveness, reliability and stability, and a poorly chosen mechanism reduces cluster performance.
Summary of the invention
The technical problem to be solved by the present invention is to provide a self-adaptive large-scale cluster communication system and a communication method thereof, so as to improve the reliability of cluster systems.
To solve this technical problem, the invention discloses a communication method for a self-adaptive large-scale cluster communication system, comprising:
The self-adaptive large-scale cluster communication system acquires, in real time, the network round-trip time (IRTT) $T_{IRTT}$ of the system and the server processing time $T_{service}$, updates the timeout threshold $T_{timeout}$ in real time according to the acquired $T_{IRTT}$ and $T_{service}$, and performs timeout handling according to the updated timeout threshold.
Preferably, in the above method, updating the timeout threshold in real time according to the acquired $T_{IRTT}$ and $T_{service}$ means computing the timeout threshold $T_{timeout}$ in real time according to the following formula:
$$T_{timeout} = T_{IRTT} + \lambda T_{service}$$
where λ is a positive number greater than or equal to 1.
Preferably, in the above method, the value range of λ is 1.1 < λ < 1.3.
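The threshold formula can be expressed as a small function. The following is a minimal sketch rather than the patent's implementation: the function name `timeout_threshold`, the use of seconds as the unit, and the default λ = 1.2 (picked from inside the preferred range) are assumptions made for illustration.

```python
def timeout_threshold(t_irtt: float, t_service: float, lam: float = 1.2) -> float:
    """Return T_timeout = T_IRTT + lam * T_service.

    All times are assumed to be in seconds. The default lam = 1.2 is an
    illustrative choice inside the preferred range 1.1 < lam < 1.3.
    """
    if lam < 1.0:
        # The text requires lambda to be a positive number >= 1.
        raise ValueError("lambda must be greater than or equal to 1")
    if t_irtt < 0 or t_service < 0:
        raise ValueError("times must be non-negative")
    return t_irtt + lam * t_service
```

With a round-trip time of 0.5 s, a processing time of 2.0 s and λ = 1.2, the function returns approximately 2.9 s.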
Preferably, in the above method, the timeout threshold is updated in real time according to the acquired $T_{IRTT}$ and $T_{service}$ as follows:
$T_{IRTT}$ and $T_{service}$ are acquired in real time; if the newly acquired $T_{IRTT}$ differs from the previously acquired $T_{IRTT}$, and/or the newly acquired $T_{service}$ differs from the previously acquired $T_{service}$, the timeout threshold is updated according to the newly acquired $T_{IRTT}$ and $T_{service}$.
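The update-on-change rule can be sketched as a small stateful object. This is an illustrative sketch under stated assumptions: the class name `AdaptiveTimeout`, the `observe` method and the `updates` counter are hypothetical, and "changed" is interpreted as exact inequality with the previous sample (a practical system might prefer a tolerance).

```python
from typing import Optional, Tuple


class AdaptiveTimeout:
    """Recompute the threshold only when T_IRTT or T_service changes."""

    def __init__(self, lam: float = 1.2) -> None:
        if lam < 1.0:
            raise ValueError("lambda must be greater than or equal to 1")
        self.lam = lam
        self._last: Optional[Tuple[float, float]] = None  # previous sample
        self.threshold: Optional[float] = None            # current T_timeout
        self.updates = 0                                   # recomputation count

    def observe(self, t_irtt: float, t_service: float) -> bool:
        """Feed a new sample; return True if the threshold was updated."""
        sample = (t_irtt, t_service)
        if sample == self._last:
            return False  # neither value changed, keep the old threshold
        self._last = sample
        self.threshold = t_irtt + self.lam * t_service
        self.updates += 1
        return True
```

Feeding the same pair twice leaves the threshold untouched; changing either value triggers a recomputation.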
Preferably, in the above method, the value of $T_{IRTT}$ is the sum of $T_{seq}$, the time from the client issuing a request until the request reaches the server, and $T_{ack}$, the time for the packet returned by the server to reach the client.
The invention also discloses a self-adaptive large-scale cluster communication system, comprising:
a first module, which acquires in real time the network round-trip time (IRTT) $T_{IRTT}$ of the system and the server processing time $T_{service}$;
a second module, which updates the timeout threshold in real time according to the acquired $T_{IRTT}$ and $T_{service}$, and performs timeout handling according to the updated timeout threshold.
Preferably, in the above system, the second module computes the timeout threshold $T_{timeout}$ in real time according to the following formula:
$$T_{timeout} = T_{IRTT} + \lambda T_{service}$$
where λ is a positive number greater than or equal to 1.
Preferably, in the above system, the value range of λ is 1.1 < λ < 1.3.
Preferably, in the above system, the second module updates the timeout threshold according to the newly acquired $T_{IRTT}$ and $T_{service}$ when the $T_{IRTT}$ acquired by the first module differs from the previously acquired $T_{IRTT}$, and/or the $T_{service}$ acquired by the first module differs from the previously acquired $T_{service}$.
Preferably, in the above system, the value of $T_{IRTT}$ acquired by the first module is the sum of $T_{seq}$, the time from the client issuing a request until the request reaches the server, and $T_{ack}$, the time for the packet returned by the server to reach the client.
The technical scheme of the present application reduces system overhead, improves communication quality, optimizes the cluster, and improves the responsiveness, concurrency and reliability of the cluster system.
Description of drawings
Fig. 1 is a schematic diagram of the communication process of the self-adaptive large-scale cluster communication system provided by this embodiment;
Fig. 2 is a schematic diagram of the communication principle of the self-adaptive large-scale cluster communication system provided by this embodiment.
Embodiments
To make the purpose, technical solutions and advantages of the invention clearer, the technical solution of the invention is described in further detail below with reference to the accompanying drawings. It should be noted that, provided they do not conflict, the embodiments of the present application and the features in them can be combined with one another arbitrarily.
Embodiment 1
The prior art generally uses a fixed value as the communication timeout threshold, but the inventors found that this approach ignores the real-time state of the system and is therefore less reliable. The inventors thus propose to take into account factors such as server node load, concurrency and system failures, to dynamically update the optimal timeout threshold for the current running environment, and to shorten or lengthen the communication timeout according to this threshold. That is, during client-server interaction, two time parameters, the network round-trip time (internet round trip time, IRTT) $T_{IRTT}$ and the server processing time $T_{service}$, are continually conveyed; the client continually updates the parameter values and dynamically updates the timeout value, so that the timeout value adapts to network conditions and system load.
Based on this idea, this embodiment provides a communication method for a self-adaptive large-scale cluster communication system, which is implemented as follows:
The self-adaptive large-scale cluster communication system acquires, in real time, the network round-trip time (internet round trip time, IRTT) $T_{IRTT}$ of the system and the server processing time $T_{service}$, updates the timeout threshold in real time according to the acquired $T_{IRTT}$ and $T_{service}$, and performs timeout handling according to the updated timeout threshold.
The timeout threshold $T_{timeout}$ can be computed in real time according to the following formula:
$$T_{timeout} = T_{IRTT} + \lambda T_{service}$$
The applicant found through experiments that if λ is too large, the system timeout becomes too long, and the system responds slowly after a node failure; if λ is too small (for example, less than 1), connection timeouts and interruptions occur, causing unnecessary operations such as packet retransmission and thereby degrading system performance. Therefore, in this embodiment, λ is a positive number greater than or equal to 1. Preferably, the recommended range of λ is 1.1 < λ < 1.3.
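A short worked example makes the trade-off concrete. The numbers are hypothetical and not taken from the source: assume $T_{IRTT} = 20$ ms and $T_{service} = 500$ ms, so that a healthy request completes, as seen by the client, after about $T_{IRTT} + T_{service} = 520$ ms.

```latex
\begin{aligned}
\lambda = 0.8 &: \quad T_{timeout} = 20 + 0.8 \times 500 = 420\ \text{ms} \\
\lambda = 1.2 &: \quad T_{timeout} = 20 + 1.2 \times 500 = 620\ \text{ms} \\
\lambda = 3 &: \quad T_{timeout} = 20 + 3 \times 500 = 1520\ \text{ms}
\end{aligned}
```

Under these assumptions, λ = 0.8 fires 100 ms before a healthy reply would arrive and so produces a spurious timeout, λ = 1.2 leaves 100 ms of headroom, and λ = 3 lets a failed server go unnoticed until a full second after a healthy reply would have arrived.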
It should also be noted that updating the timeout threshold in real time according to the acquired $T_{IRTT}$ and $T_{service}$ means that the threshold is updated only when either or both of the acquired values change (that is, the newly acquired $T_{IRTT}$ differs from the previously acquired $T_{IRTT}$, and/or the newly acquired $T_{service}$ differs from the previously acquired $T_{service}$). $T_{IRTT}$ and $T_{service}$ can be acquired in many ways: periodically, in response to system state changes (that is, as soon as the system state changes), or at any other time, such as on user request.
The $T_{IRTT}$ and server processing time $T_{service}$ referred to here can be obtained in any manner. For example, the value of $T_{IRTT}$ may be the sum of $T_{seq}$, the time from the client issuing a request until the request reaches the server, and $T_{ack}$, the time for the packet returned by the server to reach the client. In this case, the working principle of the self-adaptive large-scale cluster communication system is as shown in Fig. 1. Of course, those skilled in the art may also calculate $T_{IRTT}$ and $T_{service}$ in other ways.
With reference to Fig. 2, the following describes in detail the communication process based on the real-time acquired network round-trip time (internet round trip time, IRTT) $T_{IRTT}$ and server processing time $T_{service}$. Specifically, the communication process of the self-adaptive large-scale cluster communication system comprises steps 100 to 400.
Step 100: synchronize the clocks of the client and server nodes in the self-adaptive large-scale cluster communication system through a time server, to ensure that they are consistent.
Step 200: calculate the network round-trip time IRTT.
In this step, the network round-trip time $T_{IRTT}$ is the sum of $T_{seq}$, the time from the client issuing a request until the request reaches the server, and $T_{ack}$, the time for the packet returned by the server to reach the client. That is:
$$T_{IRTT} = T_{seq} + T_{ack}$$
$$T_{seq} = t_{sa} - t_{cs}$$
$$T_{ack} = t_{ca} - t_{ss}$$
where $t_{sa}$, $t_{cs}$, $t_{ca}$ and $t_{ss}$ are, respectively, the time at which the server receives the request, the time at which the client sends the request, the time at which the client receives the reply, and the time at which the server sends the reply. The network round-trip time can therefore be understood simply as the time the message spends in transit.
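Step 200 can be sketched as a computation over the four timestamps. This is an illustrative sketch: the `RpcTimestamps` container and the function name are hypothetical, and the timestamps are assumed to come from clocks already synchronized as in step 100.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RpcTimestamps:
    """Four timestamps of one request/reply exchange (synchronized clocks assumed)."""
    t_cs: float  # client sends the request
    t_sa: float  # request arrives at the server
    t_ss: float  # server sends the reply
    t_ca: float  # reply arrives at the client


def network_round_trip(ts: RpcTimestamps) -> float:
    """Return T_IRTT = T_seq + T_ack, the time spent in transit."""
    t_seq = ts.t_sa - ts.t_cs  # client -> server transit time
    t_ack = ts.t_ca - ts.t_ss  # server -> client transit time
    return t_seq + t_ack
```

Because each term subtracts a timestamp taken on one node from a timestamp taken on the other, the result is only meaningful if the clock synchronization of step 100 holds.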
Step 300: calculate the server processing time.
In this step, the server processing time $T_{service}$ is the interval from the time $t_{sa}$ at which the request (data) sent by the client arrives at the server to the time $t_{ss}$ at which the server returns the ack or data. In other words, the server processing time can be understood simply as the time the packet stays at the server.
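Reading the definition above as $T_{service} = t_{ss} - t_{sa}$ and combining it with the transit times of step 200 gives a useful identity. The derivation below is a sketch that follows from the stated definitions; it is not written out in the source.

```latex
\begin{aligned}
T_{service} &= t_{ss} - t_{sa} \\
T_{IRTT} + T_{service} &= (t_{sa} - t_{cs}) + (t_{ca} - t_{ss}) + (t_{ss} - t_{sa}) = t_{ca} - t_{cs} \\
T_{timeout} - (t_{ca} - t_{cs}) &= (\lambda - 1)\, T_{service}
\end{aligned}
```

So the sum of the two measured quantities is exactly the elapsed time the client observes for the exchange, and a threshold $T_{timeout} = T_{IRTT} + \lambda T_{service}$ with $\lambda \ge 1$ adds a safety margin of $(\lambda - 1)\,T_{service}$ on top of that elapsed time.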
Step 400: calculate the dynamic timeout threshold.
In this step, the timeout mechanism of the self-adaptive large-scale cluster communication system records the ($T_{IRTT}$, $T_{service}$) time pairs obtained within a unit time period, takes their maximum over that period, and then calculates the timeout threshold with the following formula:
$$T_{timeout} = T_{IRTT} + \lambda T_{service}$$
In this formula, λ is a positive number greater than or equal to 1. Preferably, the recommended range of λ is 1.1 < λ < 1.3.
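Step 400 can be sketched with a time window over recent samples. Several choices here are assumptions, since the source does not fix them: the window slides rather than resets, the maximum is taken separately for each of the two quantities, and the names `WindowedTimeout`, `record` and `threshold` are hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class WindowedTimeout:
    """Threshold from the per-quantity maxima seen within a time window."""

    def __init__(self, window: float = 10.0, lam: float = 1.2) -> None:
        if lam < 1.0:
            raise ValueError("lambda must be greater than or equal to 1")
        self.window = window  # length of the unit time period, in seconds
        self.lam = lam
        # Each entry is (sample time, T_IRTT, T_service), oldest first.
        self._samples: Deque[Tuple[float, float, float]] = deque()

    def _evict(self, now: float) -> None:
        # Drop samples that fall outside the window ending at `now`.
        while self._samples and now - self._samples[0][0] > self.window:
            self._samples.popleft()

    def record(self, now: float, t_irtt: float, t_service: float) -> None:
        self._samples.append((now, t_irtt, t_service))
        self._evict(now)

    def threshold(self, now: float) -> Optional[float]:
        """Return T_timeout for the current window, or None if it is empty."""
        self._evict(now)
        if not self._samples:
            return None
        max_irtt = max(s[1] for s in self._samples)
        max_service = max(s[2] for s in self._samples)
        return max_irtt + self.lam * max_service
```

Using the maxima makes the threshold conservative: one slow exchange within the window keeps the timeout long until that sample ages out.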
It should be noted that the calculations in steps 200, 300 and 400 can all be performed by the connection initiator. At present, the connection initiator is generally the client.
Embodiment 2
This embodiment describes a self-adaptive large-scale cluster communication system that can implement the scheme of Embodiment 1 above. The system comprises at least a first module and a second module.
The first module acquires in real time the network round-trip time (IRTT) $T_{IRTT}$ of the system and the server processing time $T_{service}$.
The second module updates the timeout threshold in real time according to the acquired $T_{IRTT}$ and $T_{service}$, and performs timeout handling according to the updated timeout threshold.
The second module can compute the timeout threshold $T_{timeout}$ in real time according to the following formula:
$$T_{timeout} = T_{IRTT} + \lambda T_{service}$$
where λ is a positive number greater than or equal to 1. In particular, the preferred range of λ is 1.1 < λ < 1.3.
It should be pointed out that the second module updates the timeout threshold according to the acquired $T_{IRTT}$ and $T_{service}$ only when either or both of the values acquired by the first module change (that is, the newly acquired $T_{IRTT}$ differs from the previously acquired $T_{IRTT}$, and/or the newly acquired $T_{service}$ differs from the previously acquired $T_{service}$).
The first module can acquire $T_{IRTT}$ and $T_{service}$ in a very flexible manner: periodically, in response to system state changes (that is, as soon as the system state changes), or at any other time, such as on user request. The value of $T_{IRTT}$ acquired by the first module is the sum of $T_{seq}$, the time from the client issuing a request until the request reaches the server, and $T_{ack}$, the time for the packet returned by the server to reach the client. The server processing time $T_{service}$ acquired by the first module is the interval from the time $t_{sa}$ at which the request (data) sent by the client arrives at the server to the time $t_{ss}$ at which the server returns the ack or data. In other words, the server processing time can be understood simply as the time the packet stays at the server.
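The division of labour between the two modules can be sketched as two small classes. This is an illustrative sketch, not the patented system: the class names, method names and the `has_timed_out` check are hypothetical, and the source does not specify what the timeout handling operation does beyond being driven by the updated threshold.

```python
from typing import Optional, Tuple


class AcquisitionModule:
    """First module: derives (T_IRTT, T_service) from four timestamps."""

    def acquire(self, t_cs: float, t_sa: float,
                t_ss: float, t_ca: float) -> Tuple[float, float]:
        t_irtt = (t_sa - t_cs) + (t_ca - t_ss)  # T_seq + T_ack
        t_service = t_ss - t_sa                  # time spent at the server
        return t_irtt, t_service


class TimeoutModule:
    """Second module: keeps the threshold current and applies it."""

    def __init__(self, lam: float = 1.2) -> None:
        if lam < 1.0:
            raise ValueError("lambda must be greater than or equal to 1")
        self.lam = lam
        self._last: Optional[Tuple[float, float]] = None
        self.threshold: Optional[float] = None

    def update(self, t_irtt: float, t_service: float) -> None:
        # Recompute only when at least one of the two values changed.
        if (t_irtt, t_service) != self._last:
            self._last = (t_irtt, t_service)
            self.threshold = t_irtt + self.lam * t_service

    def has_timed_out(self, elapsed: float) -> bool:
        # Hypothetical check: a pending call is overdue past the threshold.
        return self.threshold is not None and elapsed > self.threshold
```

A caller would pass each completed exchange through `acquire`, feed the result to `update`, and consult `has_timed_out` for calls still in flight.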
Those of ordinary skill in the art will understand that all or some of the steps of the above method can be carried out by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc. Alternatively, all or some of the steps of the above embodiments can be implemented with one or more integrated circuits. Accordingly, each module/unit in the above embodiments can be implemented in hardware or as a software functional module. The present application is not limited to any particular combination of hardware and software.
The above are only preferred embodiments of the invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A communication method for a self-adaptive large-scale cluster communication system, characterized in that the method comprises:
the self-adaptive large-scale cluster communication system acquiring, in real time, the network round-trip time (IRTT) $T_{IRTT}$ of the system and the server processing time $T_{service}$, updating the timeout threshold $T_{timeout}$ in real time according to the acquired $T_{IRTT}$ and $T_{service}$, and performing timeout handling according to the updated timeout threshold.
2. The method according to claim 1, characterized in that updating the timeout threshold in real time according to the acquired $T_{IRTT}$ and $T_{service}$ means computing the timeout threshold $T_{timeout}$ in real time according to the following formula:
$$T_{timeout} = T_{IRTT} + \lambda T_{service}$$
where λ is a positive number greater than or equal to 1.
3. The method according to claim 2, characterized in that
the value range of λ is 1.1 < λ < 1.3.
4. The method according to any one of claims 1 to 3, characterized in that the timeout threshold is updated in real time according to the acquired $T_{IRTT}$ and $T_{service}$ as follows:
$T_{IRTT}$ and $T_{service}$ are acquired in real time; if the newly acquired $T_{IRTT}$ differs from the previously acquired $T_{IRTT}$, and/or the newly acquired $T_{service}$ differs from the previously acquired $T_{service}$, the timeout threshold is updated according to the newly acquired $T_{IRTT}$ and $T_{service}$.
5. The method according to claim 4, characterized in that
the value of $T_{IRTT}$ is the sum of $T_{seq}$, the time from the client issuing a request until the request reaches the server, and $T_{ack}$, the time for the packet returned by the server to reach the client.
6. A self-adaptive large-scale cluster communication system, characterized in that the system comprises:
a first module, which acquires in real time the network round-trip time (IRTT) $T_{IRTT}$ of the system and the server processing time $T_{service}$;
a second module, which updates the timeout threshold in real time according to the acquired $T_{IRTT}$ and $T_{service}$, and performs timeout handling according to the updated timeout threshold.
7. The system according to claim 6, characterized in that the second module computes the timeout threshold $T_{timeout}$ in real time according to the following formula:
$$T_{timeout} = T_{IRTT} + \lambda T_{service}$$
where λ is a positive number greater than or equal to 1.
8. The system according to claim 7, characterized in that
the value range of λ is 1.1 < λ < 1.3.
9. The system according to any one of claims 6 to 8, characterized in that
the second module updates the timeout threshold according to the newly acquired $T_{IRTT}$ and $T_{service}$ when the $T_{IRTT}$ acquired by the first module differs from the previously acquired $T_{IRTT}$, and/or the $T_{service}$ acquired by the first module differs from the previously acquired $T_{service}$.
10. The system according to claim 9, characterized in that
the value of $T_{IRTT}$ acquired by the first module is the sum of $T_{seq}$, the time from the client issuing a request until the request reaches the server, and $T_{ack}$, the time for the packet returned by the server to reach the client.
CN2012104177069A 2012-10-26 2012-10-26 Self-adaptive large-scale cluster communication system and self-adaptive large-scale cluster communication method Pending CN102917068A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012104177069A CN102917068A (en) 2012-10-26 2012-10-26 Self-adaptive large-scale cluster communication system and self-adaptive large-scale cluster communication method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012104177069A CN102917068A (en) 2012-10-26 2012-10-26 Self-adaptive large-scale cluster communication system and self-adaptive large-scale cluster communication method

Publications (1)

Publication Number Publication Date
CN102917068A true CN102917068A (en) 2013-02-06

Family

ID=47615298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012104177069A Pending CN102917068A (en) 2012-10-26 2012-10-26 Self-adaptive large-scale cluster communication system and self-adaptive large-scale cluster communication method

Country Status (1)

Country Link
CN (1) CN102917068A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104348639A (en) * 2013-07-29 2015-02-11 华中科技大学 Sectioned RPC timeout value self-adaptive regulation method
CN109560897A (en) * 2017-09-25 2019-04-02 网宿科技股份有限公司 A kind of TCP repeating method and device
CN114710406A (en) * 2022-04-24 2022-07-05 中国工商银行股份有限公司 Dynamic determination method and device of timeout threshold, electronic equipment and medium
CN115190157A (en) * 2022-07-08 2022-10-14 济南浪潮数据技术有限公司 RPC timeout mechanism setting method, device, equipment and medium
WO2024016622A1 (en) * 2022-07-20 2024-01-25 北京佰才邦技术股份有限公司 Timeout parameter determination method and apparatus and communication device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030200329A1 (en) * 2002-04-23 2003-10-23 Delaney William P. Polling-based mechanism for improved RPC timeout handling
CN102033889A (en) * 2009-09-29 2011-04-27 熊凡凡 Distributed database parallel processing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
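Qian Yingjin (钱迎进) et al.: "An Adaptive and Scalable RPC Timeout Mechanism in Large-Scale Clusters" (《大规模集群中一种自适应可扩展的RPC超时机制》), Journal of Software (《软件学报》) *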

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104348639A (en) * 2013-07-29 2015-02-11 华中科技大学 Sectioned RPC timeout value self-adaptive regulation method
CN104348639B (en) * 2013-07-29 2017-08-25 华中科技大学 A kind of RPC timeout value self-adapting regulation methods of by stages
CN109560897A (en) * 2017-09-25 2019-04-02 网宿科技股份有限公司 A kind of TCP repeating method and device
CN114710406A (en) * 2022-04-24 2022-07-05 中国工商银行股份有限公司 Dynamic determination method and device of timeout threshold, electronic equipment and medium
CN114710406B (en) * 2022-04-24 2023-09-26 中国工商银行股份有限公司 Method, device, electronic equipment and medium for dynamically determining timeout threshold
CN115190157A (en) * 2022-07-08 2022-10-14 济南浪潮数据技术有限公司 RPC timeout mechanism setting method, device, equipment and medium
WO2024016622A1 (en) * 2022-07-20 2024-01-25 北京佰才邦技术股份有限公司 Timeout parameter determination method and apparatus and communication device

Similar Documents

Publication Publication Date Title
JP5568048B2 (en) Parallel computer system and program
US9953053B2 (en) Reliability improvement of distributed transaction processing optimizations based on connection status
JP5624655B2 (en) Message to transfer backup manager in distributed server system
US10826812B2 (en) Multiple quorum witness
US10785350B2 (en) Heartbeat in failover cluster
CN107430606B (en) Message broker system with parallel persistence
US20170315886A1 (en) Locality based quorum eligibility
CN106874143B (en) Server backup method and backup system thereof
JP2012528382A (en) Cache data processing using cache clusters in configurable mode
CN107124469B (en) Cluster node communication method and system
CN102917068A (en) Self-adaptive large-scale cluster communication system and self-adaptive large-scale cluster communication method
WO2006125392A1 (en) A computer processing system for realizing data updating and a data updating method
CN103401712A (en) Content distribution based intelligent high-availability task processing method and system
KR20080101787A (en) Intelligent failover in a load-balanced networking environment
CN107682460B (en) Distributed storage cluster data communication method and system
US9830263B1 (en) Cache consistency
US7191356B2 (en) Method for asynchronous support of fault-tolerant and adaptive communication
US10897402B2 (en) Statistics increment for multiple publishers
CN101442437A (en) Method, system and equipment for implementing high availability
US9021109B1 (en) Controlling requests through message headers
WO2017071430A1 (en) Message processing method, network card, system, information update method, and server
CN110798366B (en) Task logic processing method, device and equipment
CN108234595B (en) Log transmission method and system
US8880670B1 (en) Group membership discovery service
JP2009217765A (en) Synchronous transmitting method to multiple destination, its implementation system and processing program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20130206