CN100334554C - A method of standby and controlling load in distributed data processing system - Google Patents


Info

Publication number
CN100334554C
Authority
CN
China
Prior art keywords
processor
processors
load
processing system
service request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB028299396A
Other languages
Chinese (zh)
Other versions
CN1695120A (en)
Inventor
李海鹏
戴存军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Publication of CN1695120A
Application granted
Publication of CN100334554C
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)
  • Multi Processors (AREA)

Abstract

The present invention relates to a distributed processing system comprising a plurality of load groups, an interface processing module and a plurality of processors. The load groups generate service requests. The interface processing module receives the service requests from the load groups and sends them to the corresponding processors. Each processor handles the services of a plurality of load groups, and each load group uses one of the processors as its master processor, which handles its services, and another of the processors as its standby processor. When the master processor fails, the standby processor handles the services of the load group.

Description

Backup and load control method in a fully distributed processing system
Technical field:
The present invention relates to fully distributed processing systems, and in particular to a backup and load control method for such systems.
Background art:
In distributed systems, a commonly used backup and control scheme is the active/standby arrangement: one active processing system is paired with one standby processing system. Under normal conditions the standby system handles no services; only when the active system fails is the entire load transferred to the standby system.
Another commonly used scheme is load sharing: two systems each carry half of the load, with no active/standby distinction, and if one of them fails the other handles the entire load. Patent document EP 1 139 235 A2, "Distributed data processing system and method for processing data in distributed data processing system", discloses a distributed data processing system of this kind.
Both schemes, however, must reserve half of the processing capacity for backup, so the utilization of system resources cannot exceed 50%. Such a scheme may be acceptable in a small distributed processing system. In a large-capacity fully distributed processing system, however, now that advances in the art have reduced the probability of processor failure to a very low level, this level of utilization is wasteful.
To improve utilization, patent document US 5408649, "Distributed data access system including a plurality of database access processors with one-for-N redundancy", proposes an N+1 backup scheme: among a plurality of processing systems, one is kept in the standby state, and when one of the normally operating processing systems fails and goes out of service, the standby system is started to take its place. In such a system of N+1 processing systems, it cannot be known in advance which of the N working systems will need to be backed up on the standby system, so the N working systems do not back up their data to the standby system in real time. When one of them fails, the data that should have been backed up is lost, and the standby system has to start processing afresh once it is started.
This mode of operation is clearly unacceptable for distributed data processing systems, and especially for communication systems with stringent real-time requirements, in which both the data and the processing state must be backed up in real time.
Summary of the invention:
The object of the present invention is to provide a backup and load control method and system for use in a large-capacity fully distributed processing system that not only meets the requirement for real-time backup, thereby improving system reliability, but also effectively improves system utilization.
The distributed processing system provided by the present invention comprises a plurality of load groups that generate service requests, an interface processing module that receives the service requests from the load groups and distributes them to the corresponding processors, and a plurality of processors. Each processor is responsible for handling the services of a plurality of load groups. For each load group, one of the processors acts as master processor and handles its services, and another of the processors acts as standby processor; when the master processor fails, the standby processor is responsible for handling the services of that load group. The processors include a synchronization trigger module for synchronizing the information in the master processor to the standby processor when the master processor fails.
The processing method provided by the present invention for a distributed processing system comprises the steps of: receiving service requests from a plurality of load groups; distributing each service request to the corresponding processor among a plurality of processors, each of which can handle the services of a plurality of load groups; and, when the master processor handling a service request fails, having the request handled by another of the processors that acts as the standby processor for that request. The method is characterized in that, when the master processor fails, the information in the master processor is synchronized to the standby processor.
In the distributed processing system and processing method provided by the invention, the plurality of processors comprises at least three processors.
Detailed description:
The invention is described in detail below with reference to the drawings:
Fig. 1 shows the processing relationships between the processors and the load groups;
Fig. 2 shows how the load is handled when one of the processors fails;
Fig. 3 shows how the load is handled when two adjacent processors fail;
Fig. 4 is a block diagram of the flow that implements this load sharing scheme;
Fig. 5 shows a more complex load sharing scheme covered by the invention.
As shown in Fig. 1, the distributed processing system has four processors. Under normal operation, processor 1 handles the services of load groups 1 and 2, processor 2 handles load groups 3 and 4, processor 3 handles load groups 5 and 6, and processor 4 handles load groups 7 and 8.
Handling the services of a load group requires a particular supporting environment, for example data and memory resources. Because of these physical constraints, a single processor is not equipped to handle the services of every load group. The correspondence between processors and load groups therefore has to be determined in advance, and the environment prepared at startup.
In the example of Fig. 1, processor 1 is also able to handle the services of load group 8 and load group 3. That is, if processor 4, the master processor for load group 8, fails, load group 8 can be transferred to processor 1 and processing continues there. Moreover, if processor 4 synchronizes its intermediate information to processor 1 while it is handling the services of load group 8 normally, then when processor 4 fails and processor 1 takes over load group 8, the switchover takes place without interrupting service.
In the same way, every other load group has a backup environment on an adjacent processor, and each processor can act as backup for two adjacent load groups. This forms a load backup chain whose links interlock in a ring and together carry out the service processing.
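The ring backup chain of Fig. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_ring_table` is hypothetical, and the rule that the lower-numbered group of each processor is backed up by the preceding processor and the higher-numbered group by the following one is inferred from the examples given for load groups 3 and 8 rather than stated in the text.

```python
def build_ring_table(num_processors):
    """Return {load_group: (master, standby)} for a ring backup chain.

    Assumption (inferred from the description of Fig. 1): processor p is
    master for load groups 2p-1 and 2p; group 2p-1 is backed up by the
    preceding processor and group 2p by the following one, wrapping around.
    """
    if num_processors < 3:
        # The text requires at least three processors.
        raise ValueError("a ring backup chain needs at least 3 processors")
    table = {}
    for p in range(1, num_processors + 1):
        previous = num_processors if p == 1 else p - 1
        following = 1 if p == num_processors else p + 1
        table[2 * p - 1] = (p, previous)
        table[2 * p] = (p, following)
    return table


if __name__ == "__main__":
    for group, (master, standby) in sorted(build_ring_table(4).items()):
        print(f"load group {group}: master {master}, standby {standby}")
```

With four processors this yields processor 1 as standby for load groups 3 and 8, matching the example in the text.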
Fig. 2 shows the case in which one processor fails. Suppose processor 2 fails. Of the load groups that the failed processor was handling, load group 3 is taken over by processor 1 and load group 4 by processor 3. Spreading the load in this way avoids the shock of shifting the whole load onto a single other processor and so reduces the occurrence of cascading failures. In this case each takeover processor gains only one load group on top of the two it already had, so under normal conditions the load factor of a processor can reach about 66% (two thirds).
In a traditional system in which processors back each other up in pairs, suppose load groups 1 to 4 are carried by processors 1 and 2, which back each other up. In the extreme case where processors 1 and 2 both fail, service is interrupted for all of load groups 1 to 4.
Fig. 3 shows how the load is distributed under the backup scheme proposed by the present invention when processors 1 and 2 fail at the same time. Load group 1 is adjacent to processor 4, which holds the environment needed to handle it, so load group 1 is transferred to processor 4; likewise, load group 4 is transferred to processor 3. Service is interrupted for only two load groups.
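The failure cases of Fig. 2 and Fig. 3 can be replayed with a small sketch. The names `FIG1_TABLE`, `redistribute` and `groups_per_processor` are hypothetical. The standby entries for load groups 1, 3, 4 and 8 follow the text, while those for groups 2, 5, 6 and 7 are assumed by symmetry of the ring.

```python
# Service distribution for Fig. 1 as {load_group: (master, standby)}.
# Entries for groups 1, 3, 4 and 8 follow the text; the others are
# assumptions made by symmetry of the ring.
FIG1_TABLE = {
    1: (1, 4), 2: (1, 2), 3: (2, 1), 4: (2, 3),
    5: (3, 2), 6: (3, 4), 7: (4, 3), 8: (4, 1),
}


def redistribute(table, failed):
    """Return (owner of each surviving load group, sorted interrupted groups)."""
    owners, interrupted = {}, []
    for group, (master, standby) in table.items():
        if master not in failed:
            owners[group] = master
        elif standby not in failed:
            owners[group] = standby
        else:
            # Both the master and the standby processor are down.
            interrupted.append(group)
    return owners, sorted(interrupted)


def groups_per_processor(owners):
    """Count how many load groups each working processor carries."""
    counts = {}
    for processor in owners.values():
        counts[processor] = counts.get(processor, 0) + 1
    return counts


if __name__ == "__main__":
    for failed in ({2}, {1, 2}):
        owners, interrupted = redistribute(FIG1_TABLE, failed)
        print(sorted(failed), groups_per_processor(owners), interrupted)
```

Under these assumptions, a failure of processor 2 interrupts no load group, and a simultaneous failure of processors 1 and 2 interrupts only load groups 2 and 3, whereas the pairwise scheme would lose load groups 1 to 4.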
In general, a multiprocessor distributed processing system presents a single interface processing module 10 to the outside. It receives service requests from outside, and once a request enters this module it is distributed to a processor in some defined manner. The method of the invention adopts the same arrangement; its implementation and block diagram are shown in Fig. 4.
In the distributed processing system of the invention shown in Fig. 4, all the data in the system and the parameters required by the load groups are held in a centralized large-capacity database. Within the database, the data and units are partitioned into logical load groups, with units of similar service capacity placed in the same load group. The total number of load groups is twice the number of processors.
When a processor starts, it loads the data and other resources for the load groups assigned to it as master processor, together with those for the load groups it backs up, so that it is ready to handle the services of both.
When the interface processing module receives a service request, it first determines which load unit the request comes from and then, from the grouping of load units described above, the load group to which the request belongs. It then consults the service distribution table in a service inquiry module 30. From the entries of this table, each consisting of a load group, a master processor and a standby processor, it determines the master processor and the standby processor responsible for the request. The service inquiry module may reside in the interface processing module or in each processor. The service distribution table is statically configured in advance; the table shown in Fig. 4 is set up according to the ring backup chain of Fig. 1.
Once the master and standby processors for the request are known from the service distribution table, the processor states determine whether the request should actually be handled by the master processor or by the standby processor.
The interface processing module contains a system management module 20, which records the status information of each processor. Through its communication with the individual processors, the system management module keeps this status information up to date: when a processor in the system fails, the module reacts at once and updates that processor's status record.
If the system management module shows that the master processor for the request is in the normal state, the request is distributed to the master processor. If it shows that the master processor has failed, the request is distributed to the standby processor.
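The dispatch decision described above can be sketched as follows. All names here (`SystemManagement`, `dispatch`, `SERVICE_TABLE`, `UNIT_TO_GROUP` and the unit identifiers) are hypothetical. The table excerpt uses the master and standby processors of load groups 7 and 8 as in the ring of Fig. 1, with the standby for group 7 assumed by symmetry. The behaviour when both processors have failed is not covered by the text and is an assumption of this sketch.

```python
# Excerpt of a service distribution table: {load_group: (master, standby)}.
SERVICE_TABLE = {7: (4, 3), 8: (4, 1)}

# Hypothetical mapping from load units to load groups.
UNIT_TO_GROUP = {"unit-a": 7, "unit-b": 8}


class SystemManagement:
    """Tracks processor status (a stand-in for system management module 20)."""

    def __init__(self, processors):
        self._normal = {p: True for p in processors}

    def mark_failed(self, processor):
        self._normal[processor] = False

    def is_normal(self, processor):
        return self._normal[processor]


def dispatch(load_unit, unit_to_group, table, management):
    """Pick the processor for a service request coming from `load_unit`."""
    group = unit_to_group[load_unit]   # load unit -> load group
    master, standby = table[group]     # lookup in the distribution table
    if management.is_normal(master):
        return master
    if management.is_normal(standby):
        return standby
    # Assumption: the text does not say what happens when both have failed.
    return None


if __name__ == "__main__":
    management = SystemManagement([1, 2, 3, 4])
    print(dispatch("unit-b", UNIT_TO_GROUP, SERVICE_TABLE, management))
    management.mark_failed(4)
    print(dispatch("unit-b", UNIT_TO_GROUP, SERVICE_TABLE, management))
```

Keeping the table static and the status record separate mirrors the split between the service inquiry module and the system management module: a failure changes only the status record, never the table.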
Inside a processor, each service being processed has a corresponding memory area, which records every intermediate step of the processing and determines how the service proceeds. Each processor contains an externally triggered synchronization trigger module with a synchronization program, and adjacent processors are connected by a fast synchronization channel. Whenever a state arises during service processing that needs to be saved and synchronized to the standby processor, the synchronization program is triggered and copies the contents of the service's memory area in the master processor to the standby processor. As a result, even if the master processor fails during processing, when subsequent messages of the service are delivered to the standby processor it can still find the contents that were recorded in the master processor's memory area and continue processing from them. Service processing thus continues without interruption when processors are switched after a failure.
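A minimal sketch of this triggered synchronization is given below. The class `Processor`, its methods and the service identifier are hypothetical, and the fast synchronization channel is modelled simply as a deep copy between two in-memory objects.

```python
import copy


class Processor:
    """Holds one memory area per service, keyed by a hypothetical service id."""

    def __init__(self, name):
        self.name = name
        self.memory_areas = {}

    def record_step(self, service_id, step, standby, needs_sync):
        """Record an intermediate step and trigger synchronization if required."""
        self.memory_areas.setdefault(service_id, []).append(step)
        if needs_sync:
            self.synchronize(service_id, standby)

    def synchronize(self, service_id, standby):
        """Copy this service's memory area to the standby processor."""
        standby.memory_areas[service_id] = copy.deepcopy(
            self.memory_areas[service_id]
        )

    def resume(self, service_id):
        """Return the last synchronized step, or None if nothing was synced."""
        area = self.memory_areas.get(service_id)
        return area[-1] if area else None


if __name__ == "__main__":
    master, standby = Processor("4"), Processor("1")
    master.record_step("svc-1", "request received", standby, True)
    master.record_step("svc-1", "resource allocated", standby, True)
    # If the master now fails, the standby continues from the last synced step.
    print(standby.resume("svc-1"))
```

Steps recorded without triggering synchronization stay only in the master processor, so in this sketch the standby processor resumes from the last state that was explicitly synchronized.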
The core idea of the invention is the formation of a ring backup chain: backup and load control for a large-capacity distributed system are achieved by organizing an appropriate correspondence between load groups and processors.
Fig. 1 shows one typical form of ring backup chain built on the inventive concept, and the invention is not limited to it. Following the same idea, more complex relationships between load groups and processors can be defined to construct other, more complex backup chains that provide better backup and load control in large-capacity distributed systems. One such complex backup chain is shown in Fig. 5. Complex backup chains offer better performance and load control; particularly in systems with large or very large load capacity and in systems that distribute processing over more processors, they further improve the utilization and stability of the whole system.
Beneficial effects
In the distributed processing system and method of the invention described above, each processor handles the services of a plurality of load groups, and for each load group one of the processors acts as master processor and handles its services while another acts as standby processor and takes over the services of the load group when the master processor fails. Consequently, when one processor node fails, the load groups it was handling are shared among several of its adjacent processors. Provided that no processor is to be overloaded when one processor fails, the load a processor can carry in normal operation is N/(N+1) × 100%, where N is the number of adjacent backup processors, that is, the ratio of load groups to processors. For N = 2 the load that can be carried is 66.7%, and for N = 3 it is 75%. The larger N is, the higher the load that can be carried in normal operation, which improves the utilization of system resources.
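One way to read the N/(N+1) figure: a processor normally carries N load groups and, after an adjacent processor fails, has to carry N+1 of them within its full capacity, so each load group may use at most 1/(N+1) of a processor. The sketch below just evaluates that ratio; the function names are hypothetical.

```python
from fractions import Fraction


def normal_load_fraction(n):
    """Largest normal-operation load for N load groups per processor."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # N groups in normal operation, N + 1 groups after taking one over.
    return Fraction(n, n + 1)


def load_percent(n):
    """Same value as a percentage, rounded to one decimal place."""
    return round(float(normal_load_fraction(n)) * 100, 1)


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, normal_load_fraction(n), load_percent(n))
```

For N = 1 the ratio is 1/2, which matches the 50% ceiling that the background section attributes to the pairwise schemes.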
In addition, if two adjacent processors fail at the same time, the distributed processing system and method of the invention allow their load groups to be handled by their respective other adjacent processors, which reduces the number of load groups put out of service and improves system reliability.
Furthermore, because the processors contain a synchronization trigger module, the contents of the memory area corresponding to a service request in the master processor can be synchronized to the standby processor, which gives continuity of service processing and improves system reliability.

Claims (10)

1. A distributed processing system, comprising: a plurality of load groups for generating service requests; an interface processing module for receiving the service requests from the load groups and distributing the service requests to corresponding processors; and a plurality of said processors, wherein each processor is responsible for handling the services of a plurality of said load groups, and for each load group one of said plurality of processors acts as master processor and handles its services and another of said plurality of processors acts as standby processor, the standby processor being responsible for handling the services of the load group when the master processor fails; characterized in that said processors include a synchronization trigger module for synchronizing the information in the master processor to the standby processor when said master processor fails.
2. The distributed processing system as claimed in claim 1, wherein said plurality of processors comprises at least three processors.
3. The distributed processing system as claimed in claim 1 or 2, further comprising: a service inquiry module for providing mapping information between said service requests and said plurality of processors.
4. The distributed processing system as claimed in claim 1 or 2, wherein said interface further comprises: a system management module for providing status information of the processors.
5. The distributed processing system as claimed in claim 3, wherein said interface further comprises: a system management module for providing status information of the processors.
6. A processing method performed in a distributed processing system, comprising the steps of: receiving service requests from a plurality of load groups; distributing said service requests to corresponding processors among a plurality of processors for handling, wherein each processor can handle the services of a plurality of said load groups; and, when the master processor handling a service request among said plurality of processors fails, having the service request handled by another of said plurality of processors acting as standby processor for the service request; characterized in that, when said master processor fails, the information in the master processor is synchronized to the standby processor.
7. The processing method performed in a distributed processing system as claimed in claim 6, wherein said plurality of processors comprises at least three processors.
8. The processing method performed in a distributed processing system as claimed in claim 6 or 7, further comprising the step of: providing mapping information between said service requests and said plurality of processors.
9. The processing method performed in a distributed processing system as claimed in claim 6 or 7, further comprising the step of: providing status information of the processors.
10. The processing method performed in a distributed processing system as claimed in claim 8, further comprising the step of: providing status information of the processors.
CNB028299396A 2002-12-31 2002-12-31 A method of standby and controlling load in distributed data processing system Expired - Fee Related CN100334554C (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2002/000939 WO2004059484A1 (en) 2002-12-31 2002-12-31 A method of standby and controlling load in distributed data processing system

Publications (2)

Publication Number Publication Date
CN1695120A CN1695120A (en) 2005-11-09
CN100334554C true CN100334554C (en) 2007-08-29

Family

ID=32661066

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB028299396A Expired - Fee Related CN100334554C (en) 2002-12-31 2002-12-31 A method of standby and controlling load in distributed data processing system

Country Status (3)

Country Link
CN (1) CN100334554C (en)
AU (1) AU2002357568A1 (en)
WO (1) WO2004059484A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100466534C (en) * 2004-11-12 2009-03-04 华为技术有限公司 Method for processing fault of multimedia sub-system equipment
CN1889699B (en) * 2006-07-27 2010-05-12 华为技术有限公司 Distributing system business dispensing method and system
KR102082282B1 (en) * 2016-01-14 2020-02-27 후아웨이 테크놀러지 컴퍼니 리미티드 Method and system for managing resource objects

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5307481A (en) * 1990-02-28 1994-04-26 Hitachi, Ltd. Highly reliable online system
CN1092886A (en) * 1992-12-08 1994-09-28 艾利森电话股份有限公司 Be used for the system that database backs up
US5408649A (en) * 1993-04-30 1995-04-18 Quotron Systems, Inc. Distributed data access system including a plurality of database access processors with one-for-N redundancy
US5655120A (en) * 1993-09-24 1997-08-05 Siemens Aktiengesellschaft Method for load balancing in a multi-processor system where arising jobs are processed by a plurality of processors under real-time conditions
CN1169191A (en) * 1994-12-09 1997-12-31 艾利森电话股份有限公司 Configuration Mechanism


Also Published As

Publication number Publication date
CN1695120A (en) 2005-11-09
WO2004059484A1 (en) 2004-07-15
AU2002357568A1 (en) 2004-07-22


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20070829

Termination date: 20181231
