CN102917068A - Self-adaptive large-scale cluster communication system and self-adaptive large-scale cluster communication method - Google Patents
Self-adaptive large-scale cluster communication system and self-adaptive large-scale cluster communication method
- Publication number
- CN102917068A
- Authority
- CN
- China
- Prior art keywords
- time
- irtt
- real
- service
- timeout threshold
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a self-adaptive large-scale cluster communication system and a communication method thereof, and relates to distributed file systems in large-scale clusters. In the method, the self-adaptive large-scale cluster communication system acquires, in real time, its network communication round-trip time (IRTT) $T_{IRTT}$ and its server processing time $T_{service}$, updates the timeout threshold $T_{timeout}$ in real time according to the acquired $T_{IRTT}$ and $T_{service}$, and performs timeout handling according to the updated threshold. The invention further discloses the self-adaptive large-scale cluster communication system. The technical scheme reduces system overhead, improves communication quality, optimizes the cluster, and improves the responsiveness, concurrency and reliability of the cluster system.
Description
Technical field
The present invention relates to distributed file systems in large-scale clusters, and in particular to a self-adaptive large-scale cluster communication system and a communication method thereof.
Background technology
With the development of the times, the information explosion has become a widely discussed topic. Faced with such a huge volume of information, how to extract and process particular types of data efficiently has been put on the agenda and has become one of the main directions of computer development. According to statistics, among the 500 most powerful computers in the world, the number of hardware systems based on a cluster architecture has kept growing and the share of clusters has exceeded 70%; cluster systems have become one of the mainstream architectures for building high-performance computer systems and are trending toward ultra-large scale. As the volume of information grows, cluster technology, and ultra-large-scale cluster technology in particular, therefore has strong vitality and broad prospects in high-performance computing and information processing.
With the development of ultra-large-scale cluster (distributed file system) technology, the demands placed on it keep rising. In particular, when massive data is processed, higher requirements are placed on network communication, system load and disk I/O in order to meet various performance needs. Most of these cluster systems adopt a client-server model, and communication between the nodes of a cluster system is mostly carried out by remote procedure call (RPC) or an RPC-like mechanism. In a distributed cluster system built on RPC or an RPC-like structure, failures such as packet loss, network connection failure and node failure cause reliability problems. These can in turn trigger other system faults, degrade system performance and prevent jobs from running normally.
How to detect and precisely locate failures in the system in a timely manner is the key problem to be solved in guaranteeing the high reliability of a cluster system. If a communication failure or system failure is not detected promptly and effectively, the responsiveness and availability of the system are seriously affected. Conversely, if the system frequently raises false alarms, it triggers wrong repair actions or handling, which reduces availability and may cause irrecoverable losses to the system.
Therefore, in cluster systems, and especially in ultra-large-scale ones, how to detect communication failures effectively is a problem that needs attention when implementing an RPC or RPC-like communication protocol. A timeout is a common and necessary failure-detection mechanism, and it is usually bound to a single remote procedure call. Most network communication protocols use timeouts to detect failure, and this mechanism is particularly important for RPC protocols whose underlying transport protocol is unreliable. The timeout detection mechanism therefore directly affects many aspects of a distributed system built on RPC or RPC-like communication in a cluster, in particular its responsiveness, reliability and stability, and an unsuitable timeout degrades cluster performance.
Summary of the invention
The technical problem to be solved by the present invention is to provide a self-adaptive large-scale cluster communication system and a communication method thereof, so as to improve the reliability of the cluster system.
To solve the above technical problem, the invention discloses a communication method of a self-adaptive large-scale cluster communication system, comprising:
The self-adaptive large-scale cluster communication system acquires, in real time, the network communication round-trip time (IRTT) $T_{IRTT}$ and the server processing time $T_{service}$ of the system, updates the timeout threshold $T_{timeout}$ in real time according to the acquired $T_{IRTT}$ and $T_{service}$, and performs timeout handling according to the updated timeout threshold.
Preferably, in the above method, updating the timeout threshold in real time according to the acquired $T_{IRTT}$ and $T_{service}$ means calculating the timeout threshold $T_{timeout}$ in real time according to the following formula:
$$T_{timeout} = T_{IRTT} + \lambda T_{service}$$
where λ is a positive number greater than or equal to 1.
Preferably, in the above method, the range of λ is 1.1 < λ < 1.3.
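The formula can be illustrated with a minimal Python sketch. The function name `timeout_threshold`, the default λ of 1.2 (chosen from inside the preferred range) and the input checks are assumptions made for illustration; the patent text specifies only the formula and the constraint on λ.

```python
def timeout_threshold(t_irtt: float, t_service: float, lam: float = 1.2) -> float:
    """Return T_timeout = T_IRTT + lam * T_service.

    All times must use the same unit. The default lam=1.2 is an
    illustrative choice inside the preferred range 1.1 < lam < 1.3.
    """
    if lam < 1.0:
        # The text requires lambda to be greater than or equal to 1.
        raise ValueError("lambda must be >= 1")
    if t_irtt < 0 or t_service < 0:
        raise ValueError("times must be non-negative")
    return t_irtt + lam * t_service


if __name__ == "__main__":
    # Example: 4 ms on the network, 50 ms of server processing.
    print(timeout_threshold(4.0, 50.0))
```

With these example inputs the threshold comes out at roughly 64 ms: the full network time plus the server processing time scaled by λ.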
Preferably, in the above method, the process of updating the timeout threshold in real time according to the acquired $T_{IRTT}$ and $T_{service}$ is as follows:
$T_{IRTT}$ and $T_{service}$ are acquired in real time. If the acquired $T_{IRTT}$ differs from the previously acquired $T_{IRTT}$, and/or the acquired $T_{service}$ differs from the previously acquired $T_{service}$, the timeout threshold is updated according to the acquired $T_{IRTT}$ and $T_{service}$.
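The change-triggered update described here can be sketched as follows. The class name `AdaptiveTimeout`, the `recomputations` counter and the use of exact inequality to detect a change are assumptions for illustration; a practical implementation might prefer a tolerance when comparing measured times.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdaptiveTimeout:
    """Recompute the threshold only when a measured value changes."""
    lam: float = 1.2                      # illustrative value in (1.1, 1.3)
    t_irtt: Optional[float] = None        # previously acquired T_IRTT
    t_service: Optional[float] = None     # previously acquired T_service
    threshold: Optional[float] = None     # current T_timeout
    recomputations: int = 0               # hypothetical counter, for demonstration

    def observe(self, t_irtt: float, t_service: float) -> float:
        # "and/or": a change in either value triggers an update.
        changed = (t_irtt != self.t_irtt) or (t_service != self.t_service)
        if changed:
            self.t_irtt, self.t_service = t_irtt, t_service
            self.threshold = t_irtt + self.lam * t_service
            self.recomputations += 1
        return self.threshold
```

Feeding the same pair twice leaves the threshold untouched; only a new value of either parameter causes a recomputation.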
Preferably, in the above method, the value of $T_{IRTT}$ is the sum of the time $T_{seq}$ taken for a request issued by the client to reach the server and the time $T_{ack}$ taken for the packet returned by the server to reach the client.
The invention also discloses a self-adaptive large-scale cluster communication system, comprising:
A first module, which acquires, in real time, the network communication round-trip time (IRTT) $T_{IRTT}$ and the server processing time $T_{service}$ of the system;
A second module, which updates the timeout threshold in real time according to the acquired $T_{IRTT}$ and $T_{service}$, and performs timeout handling according to the updated timeout threshold.
Preferably, in the above system, the second module calculates the timeout threshold $T_{timeout}$ in real time according to the following formula:
$$T_{timeout} = T_{IRTT} + \lambda T_{service}$$
where λ is a positive number greater than or equal to 1.
Preferably, in the above system, the range of λ is 1.1 < λ < 1.3.
Preferably, in the above system, the second module updates the timeout threshold according to the acquired $T_{IRTT}$ and $T_{service}$ when the $T_{IRTT}$ acquired by the first module differs from the previously acquired $T_{IRTT}$, and/or the $T_{service}$ acquired by the first module differs from the previously acquired $T_{service}$.
Preferably, in the above system, the value of $T_{IRTT}$ acquired by the first module is the sum of the time $T_{seq}$ taken for a request issued by the client to reach the server and the time $T_{ack}$ taken for the packet returned by the server to reach the client.
This technical scheme reduces system overhead, improves communication quality, optimizes the cluster, and improves the responsiveness, concurrency and reliability of the cluster system.
Description of drawings
Fig. 1 is a schematic diagram of the communication process of the self-adaptive large-scale cluster communication system provided by this embodiment;
Fig. 2 is a schematic diagram of the communication principle of the self-adaptive large-scale cluster communication system provided by this embodiment.
Embodiment
To make the purpose, technical solutions and advantages of the present invention clearer, the technical solution is described in further detail below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the embodiments of this application and the features in them may be combined with one another arbitrarily.
Embodiment 1
The prior art generally uses a fixed value as the communication timeout threshold, but the inventors found that this scheme ignores the real-time state of the system and is therefore unreliable. The inventors therefore propose to take into account factors such as server node load, concurrency and system failures, to dynamically update the optimal timeout threshold for the current running environment, and to shorten or lengthen the time the communication mechanism waits according to that threshold. That is, during client-server interaction, two time parameters, the network communication round-trip time (internet round trip time, IRTT) $T_{IRTT}$ and the server processing time $T_{service}$, are passed continuously; the client keeps updating the parameter values and dynamically updates the timeout value, so that the timeout value adapts to network conditions and system load.
Based on the above idea, this embodiment provides a communication method of a self-adaptive large-scale cluster communication system, which is implemented as follows:
The self-adaptive large-scale cluster communication system acquires, in real time, the network communication round-trip time (internet round trip time, IRTT) $T_{IRTT}$ and the server processing time $T_{service}$ of the system, updates the timeout threshold in real time according to the acquired $T_{IRTT}$ and $T_{service}$, and performs timeout handling according to the updated timeout threshold.
The timeout threshold $T_{timeout}$ can be calculated in real time according to the following formula:
$$T_{timeout} = T_{IRTT} + \lambda T_{service}$$
The applicant found through experiments that if the value of λ is too large, the system timeout becomes too long, so that the system responds slowly after a node fails. If the value of λ is too small (for example, less than 1), connections time out and are interrupted, causing unnecessary operations such as packet retransmission and thereby reducing system performance. Therefore, in this embodiment, λ is a positive number greater than or equal to 1. Preferably, the recommended range of λ is 1.1 < λ < 1.3.
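A small worked example makes the effect of λ concrete. The figures below (2 ms of network time, 10 ms of server processing) are illustrative assumptions and do not come from the applicant's experiments.

```latex
\begin{aligned}
T_{IRTT} &= 2\ \text{ms}, \qquad T_{service} = 10\ \text{ms}, \qquad
T_{IRTT} + T_{service} = 12\ \text{ms} \\
\lambda = 1.2 &: \quad T_{timeout} = 2 + 1.2 \times 10 = 14\ \text{ms} \\
\lambda = 0.8 &: \quad T_{timeout} = 2 + 0.8 \times 10 = 10\ \text{ms} < 12\ \text{ms} \\
\lambda = 3 &: \quad T_{timeout} = 2 + 3 \times 10 = 32\ \text{ms}
\end{aligned}
```

With λ = 0.8 the threshold falls below the time a healthy request already needs, so the request would time out spuriously; with λ = 3 a failed node would go unnoticed for almost three times the normal response time. A value such as 1.2 leaves modest headroom above the normal case.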
It should also be noted that updating the timeout threshold in real time according to the acquired $T_{IRTT}$ and $T_{service}$ means that the threshold is updated only when either or both of the acquired values have changed (that is, the acquired $T_{IRTT}$ differs from the previously acquired $T_{IRTT}$, and/or the acquired $T_{service}$ differs from the previously acquired $T_{service}$). $T_{IRTT}$ and $T_{service}$ can be acquired in many ways: periodically, whenever the system state changes, or at any other time, for example on user request.
The $T_{IRTT}$ and server processing time $T_{service}$ referred to here can be obtained in any manner. For example, the value of $T_{IRTT}$ can be the sum of the time $T_{seq}$ taken for a request issued by the client to reach the server and the time $T_{ack}$ taken for the packet returned by the server to reach the client. In this case, the operating principle of the self-adaptive large-scale cluster communication system is as shown in Fig. 1. Of course, those skilled in the art may also obtain $T_{IRTT}$ and $T_{service}$ in other ways.
With reference to Fig. 2, the following describes in detail how communication proceeds according to the network communication round-trip time (internet round trip time, IRTT) $T_{IRTT}$ and the server processing time $T_{service}$ acquired in real time. Specifically, the communication process of the self-adaptive large-scale cluster communication system comprises steps 100 to 400.
Step 100: synchronize the clocks of the client and server nodes in the self-adaptive large-scale cluster communication system through a time server, to guarantee their consistency.
Step 200: calculate the network communication round-trip time IRTT.
In this step, the network communication round-trip time $T_{IRTT}$ is the sum of the time $T_{seq}$ taken for a request issued by the client to reach the server and the time $T_{ack}$ taken for the packet returned by the server to reach the client. That is:
$$T_{IRTT} = T_{seq} + T_{ack}$$
$$T_{seq} = t_{sa} - t_{cs}$$
$$T_{ack} = t_{ca} - t_{ss}$$
where $t_{sa}$, $t_{cs}$, $t_{ca}$ and $t_{ss}$ are, respectively, the moment the server receives the request, the moment the client sends it, the moment the client receives the reply and the moment the server sends the reply. The network communication round-trip time can therefore be understood simply as the time the information spends in transmission.
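The calculation in step 200 can be sketched in Python as follows. The `Timestamps` container and the function name `network_round_trip` are hypothetical names introduced for illustration; the sketch also assumes that client and server clocks have already been synchronized as in step 100, since each difference mixes a client timestamp with a server timestamp.

```python
from typing import NamedTuple


class Timestamps(NamedTuple):
    """The four moments of one request/reply exchange (same clock, same unit)."""
    t_cs: float  # client sends the request
    t_sa: float  # request arrives at the server
    t_ss: float  # server sends the reply (ack or data)
    t_ca: float  # reply arrives at the client


def network_round_trip(ts: Timestamps) -> float:
    """Return T_IRTT = T_seq + T_ack, the time spent purely in transmission."""
    t_seq = ts.t_sa - ts.t_cs   # client -> server transit time
    t_ack = ts.t_ca - ts.t_ss   # server -> client transit time
    return t_seq + t_ack
```

For an exchange with timestamps `Timestamps(t_cs=0.0, t_sa=1.5, t_ss=11.5, t_ca=12.5)`, the two transit times are 1.5 and 1.0, so the function returns 2.5; the ten units spent at the server are deliberately excluded.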
Step 300: calculate the server processing time.
In this step, the server processing time $T_{service}$ is the time from the moment $t_{sa}$ at which the request (data) sent by the client arrives at the server until the moment $t_{ss}$ at which the server returns the ack or data. That is, the server processing time can be understood simply as the time the packet stays at the server.
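A short derivation, using only the four timestamps defined in step 200, shows how the two measured quantities relate to the time the client actually waits. The symbol $T_{total}$ is introduced here for illustration and does not appear in the patent text.

```latex
\begin{aligned}
T_{service} &= t_{ss} - t_{sa} \\
T_{total} &= t_{ca} - t_{cs} \\
          &= (t_{sa} - t_{cs}) + (t_{ss} - t_{sa}) + (t_{ca} - t_{ss}) \\
          &= T_{seq} + T_{service} + T_{ack} \\
          &= T_{IRTT} + T_{service}
\end{aligned}
```

Under this reading, a threshold computed with λ = 1 equals the waiting time observed for the measured exchange, and any λ greater than 1 adds headroom proportional to the server processing time.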
Step 400: calculate the dynamic timeout threshold.
In this step, the timeout mechanism of the self-adaptive large-scale cluster communication system records the ($T_{IRTT}$, $T_{service}$) pairs acquired within a unit time period, takes their maximum values over that period, and then calculates the timeout threshold using the following formula:
$$T_{timeout} = T_{IRTT} + \lambda T_{service}$$
In the above formula, λ is a positive number greater than or equal to 1. Preferably, the recommended range of λ is 1.1 < λ < 1.3.
It should be noted that the calculations in steps 200, 300 and 400 can all be performed by the connection initiator. At present, the connection initiator is generally the client.
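Step 400 can be sketched as a sliding window over recent measurements. The class name `WindowedTimeout`, the explicit `now` argument and the choice to take the maximum of each parameter separately are assumptions; the text says only that the maximum over the unit time period is used and does not say whether the two maxima must come from the same sample.

```python
from collections import deque
from typing import Deque, Tuple


class WindowedTimeout:
    """Keep (time, T_IRTT, T_service) samples from the last `window` time units."""

    def __init__(self, window: float, lam: float = 1.2) -> None:
        if lam < 1.0:
            raise ValueError("lambda must be >= 1")
        self.window = window
        self.lam = lam
        self._samples: Deque[Tuple[float, float, float]] = deque()

    def record(self, now: float, t_irtt: float, t_service: float) -> None:
        self._samples.append((now, t_irtt, t_service))
        # Drop samples that have fallen out of the unit time period.
        while self._samples and self._samples[0][0] <= now - self.window:
            self._samples.popleft()

    def threshold(self) -> float:
        if not self._samples:
            raise ValueError("no samples recorded yet")
        max_irtt = max(s[1] for s in self._samples)
        max_service = max(s[2] for s in self._samples)
        return max_irtt + self.lam * max_service
```

Using the window maxima rather than the latest sample makes the threshold conservative: a single slow exchange keeps the timeout raised until that sample ages out of the window.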
Embodiment 2
This embodiment describes a self-adaptive large-scale cluster communication system that can implement the scheme of Embodiment 1 above. The system comprises at least a first module and a second module.
The first module acquires, in real time, the network communication round-trip time (IRTT) $T_{IRTT}$ and the server processing time $T_{service}$ of the system.
The second module updates the timeout threshold in real time according to the acquired $T_{IRTT}$ and $T_{service}$, and performs timeout handling according to the updated timeout threshold.
The second module can calculate the timeout threshold $T_{timeout}$ in real time according to the following formula:
$$T_{timeout} = T_{IRTT} + \lambda T_{service}$$
where λ is a positive number greater than or equal to 1. Specifically, the preferred range of λ is 1.1 < λ < 1.3.
It should be pointed out that the second module updates the timeout threshold according to the acquired $T_{IRTT}$ and $T_{service}$ only when either or both of the values acquired by the first module have changed (that is, the acquired $T_{IRTT}$ differs from the previously acquired $T_{IRTT}$, and/or the acquired $T_{service}$ differs from the previously acquired $T_{service}$).
The first module can acquire $T_{IRTT}$ and $T_{service}$ very flexibly: periodically, whenever the system state changes, or at any other time, for example on user request. The value of $T_{IRTT}$ acquired by the first module is the sum of the time $T_{seq}$ taken for a request issued by the client to reach the server and the time $T_{ack}$ taken for the packet returned by the server to reach the client. The server processing time $T_{service}$ acquired by the first module is the time from the moment $t_{sa}$ at which the request (data) sent by the client arrives at the server until the moment $t_{ss}$ at which the server returns the ack or data. That is, the server processing time can be understood simply as the time the packet stays at the server.
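The division of labour between the two modules can be sketched as two small Python classes. The names `FirstModule`, `SecondModule`, `acquire`, `refresh` and `has_timed_out`, and the idea of passing a probe callable into the first module, are assumptions for illustration; the patent leaves both the acquisition mechanism and the concrete timeout handling open.

```python
from typing import Callable, Optional, Tuple

Measurement = Tuple[float, float]  # (T_IRTT, T_service)


class FirstModule:
    """Acquires (T_IRTT, T_service); the probe callable is a stand-in."""

    def __init__(self, probe: Callable[[], Measurement]) -> None:
        self._probe = probe

    def acquire(self) -> Measurement:
        return self._probe()


class SecondModule:
    """Maintains the timeout threshold and applies it to pending requests."""

    def __init__(self, source: FirstModule, lam: float = 1.2) -> None:
        self._source = source
        self._lam = lam
        self._last: Optional[Measurement] = None
        self.threshold: Optional[float] = None

    def refresh(self) -> float:
        current = self._source.acquire()
        if current != self._last:  # update only when a value has changed
            t_irtt, t_service = current
            self.threshold = t_irtt + self._lam * t_service
            self._last = current
        return self.threshold

    def has_timed_out(self, elapsed: float) -> bool:
        # Timeout handling is reduced here to a yes/no decision.
        return elapsed > self.refresh()
```

Keeping acquisition behind its own interface lets the measurement strategy (periodic, on state change, on request) vary without touching the threshold logic.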
Those of ordinary skill in the art will appreciate that all or some of the steps of the above method can be completed by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc. Alternatively, all or some of the steps of the above embodiments can be implemented with one or more integrated circuits. Correspondingly, each module/unit in the above embodiments can be implemented in hardware or as a software function module. This application is not limited to any particular combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
1. A communication method of a self-adaptive large-scale cluster communication system, characterized in that the method comprises:
the self-adaptive large-scale cluster communication system acquiring, in real time, the network communication round-trip time (IRTT) $T_{IRTT}$ and the server processing time $T_{service}$ of the system, updating a timeout threshold $T_{timeout}$ in real time according to the acquired $T_{IRTT}$ and $T_{service}$, and performing timeout handling according to the updated timeout threshold.
2. The method of claim 1, characterized in that updating the timeout threshold in real time according to the acquired $T_{IRTT}$ and $T_{service}$ means calculating the timeout threshold $T_{timeout}$ in real time according to the following formula:
$$T_{timeout} = T_{IRTT} + \lambda T_{service}$$
where λ is a positive number greater than or equal to 1.
3. The method of claim 2, characterized in that
the range of λ is 1.1 < λ < 1.3.
4. The method of any one of claims 1 to 3, characterized in that the process of updating the timeout threshold in real time according to the acquired $T_{IRTT}$ and $T_{service}$ is as follows:
$T_{IRTT}$ and $T_{service}$ are acquired in real time; if the acquired $T_{IRTT}$ differs from the previously acquired $T_{IRTT}$, and/or the acquired $T_{service}$ differs from the previously acquired $T_{service}$, the timeout threshold is updated according to the acquired $T_{IRTT}$ and $T_{service}$.
5. The method of claim 4, characterized in that
the value of $T_{IRTT}$ is the sum of the time $T_{seq}$ taken for a request issued by the client to reach the server and the time $T_{ack}$ taken for the packet returned by the server to reach the client.
6. A self-adaptive large-scale cluster communication system, characterized in that the system comprises:
a first module, which acquires, in real time, the network communication round-trip time (IRTT) $T_{IRTT}$ and the server processing time $T_{service}$ of the system;
a second module, which updates a timeout threshold in real time according to the acquired $T_{IRTT}$ and $T_{service}$, and performs timeout handling according to the updated timeout threshold.
7. The system of claim 6, characterized in that the second module calculates the timeout threshold $T_{timeout}$ in real time according to the following formula:
$$T_{timeout} = T_{IRTT} + \lambda T_{service}$$
where λ is a positive number greater than or equal to 1.
8. The system of claim 7, characterized in that
the range of λ is 1.1 < λ < 1.3.
9. The system of any one of claims 6 to 8, characterized in that
the second module updates the timeout threshold according to the acquired $T_{IRTT}$ and $T_{service}$ when the $T_{IRTT}$ acquired by the first module differs from the previously acquired $T_{IRTT}$, and/or the $T_{service}$ acquired by the first module differs from the previously acquired $T_{service}$.
10. The system of claim 9, characterized in that
the value of $T_{IRTT}$ acquired by the first module is the sum of the time $T_{seq}$ taken for a request issued by the client to reach the server and the time $T_{ack}$ taken for the packet returned by the server to reach the client.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012104177069A CN102917068A (en) | 2012-10-26 | 2012-10-26 | Self-adaptive large-scale cluster communication system and self-adaptive large-scale cluster communication method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012104177069A CN102917068A (en) | 2012-10-26 | 2012-10-26 | Self-adaptive large-scale cluster communication system and self-adaptive large-scale cluster communication method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102917068A true CN102917068A (en) | 2013-02-06 |
Family
ID=47615298
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2012104177069A Pending CN102917068A (en) | 2012-10-26 | 2012-10-26 | Self-adaptive large-scale cluster communication system and self-adaptive large-scale cluster communication method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102917068A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104348639A (en) * | 2013-07-29 | 2015-02-11 | 华中科技大学 | Sectioned RPC timeout value self-adaptive regulation method |
CN109560897A (en) * | 2017-09-25 | 2019-04-02 | 网宿科技股份有限公司 | A kind of TCP repeating method and device |
CN114710406A (en) * | 2022-04-24 | 2022-07-05 | 中国工商银行股份有限公司 | Dynamic determination method and device of timeout threshold, electronic equipment and medium |
CN115190157A (en) * | 2022-07-08 | 2022-10-14 | 济南浪潮数据技术有限公司 | RPC timeout mechanism setting method, device, equipment and medium |
WO2024016622A1 (en) * | 2022-07-20 | 2024-01-25 | 北京佰才邦技术股份有限公司 | Timeout parameter determination method and apparatus and communication device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030200329A1 (en) * | 2002-04-23 | 2003-10-23 | Delaney William P. | Polling-based mechanism for improved RPC timeout handling |
CN102033889A (en) * | 2009-09-29 | 2011-04-27 | 熊凡凡 | Distributed database parallel processing system |
Non-Patent Citations (1)
Title |
---|
钱迎进等: "《大规模集群中一种自适应可扩展的RPC超时机制》", 《软件学报》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104348639A (en) * | 2013-07-29 | 2015-02-11 | 华中科技大学 | Sectioned RPC timeout value self-adaptive regulation method |
CN104348639B (en) * | 2013-07-29 | 2017-08-25 | 华中科技大学 | A kind of RPC timeout value self-adapting regulation methods of by stages |
CN109560897A (en) * | 2017-09-25 | 2019-04-02 | 网宿科技股份有限公司 | A kind of TCP repeating method and device |
CN114710406A (en) * | 2022-04-24 | 2022-07-05 | 中国工商银行股份有限公司 | Dynamic determination method and device of timeout threshold, electronic equipment and medium |
CN114710406B (en) * | 2022-04-24 | 2023-09-26 | 中国工商银行股份有限公司 | Method, device, electronic equipment and medium for dynamically determining timeout threshold |
CN115190157A (en) * | 2022-07-08 | 2022-10-14 | 济南浪潮数据技术有限公司 | RPC timeout mechanism setting method, device, equipment and medium |
WO2024016622A1 (en) * | 2022-07-20 | 2024-01-25 | 北京佰才邦技术股份有限公司 | Timeout parameter determination method and apparatus and communication device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20130206 |