CN201122978Y - High usability system based on advanced telecommunication computation platform - Google Patents

Info

Publication number
CN201122978Y
Authority
CN
China
Prior art keywords
module
layer
service
communication
control interface
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CNU200720198567XU
Other languages
Chinese (zh)
Inventor
李�杰
孙刚
张奇智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai B-Star Broadband Technology Co., Ltd.
Original Assignee
Shanghai B Star Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai B Star Co Ltd
Priority to CNU200720198567XU
Application granted
Publication of CN201122978Y
Anticipated expiration
Expired - Lifetime

Landscapes

  • Hardware Redundancy (AREA)

Abstract

The utility model relates to a high-availability system based on the Advanced Telecom Computing Architecture (ATCA) platform. From the bottom layer to the top layer, the system comprises, in order, a communication driver layer, a group communication system layer, a distributed control interface layer and an application service layer, the layers being connected in sequence. Compared with the prior art, the flexible, dynamic, component-based high-availability structure greatly improves the system's ability to operate continuously without failure, fully satisfies the carrier-grade requirement of 99.999% reliability and availability for ATCA-based systems, and provides far better continuous service capability than traditional centralized, single-point-controlled high-availability systems.

Description

A high-availability system based on the Advanced Telecom Computing Architecture platform
Technical field
The utility model relates to computer communication systems, and in particular to a high-availability system based on the Advanced Telecom Computing Architecture (ATCA) platform.
Background technology
Systems based on the Advanced Telecom Computing Architecture (ATCA) already take high availability into account in their hardware design. Reaching carrier-grade 99.999% availability, however, requires not only a redundant hardware design but also software measures that improve system availability. Most systems cannot handle well the changes to the running system's configuration that are caused by failures. They rely on frequent, mandatory fault-point detection to safeguard availability, which inevitably reduces the efficiency of running tasks, and in some cases normal operation can be restored only by restarting the affected system services or the entire machine. Redundant hardware devices that share session information and state information make redundancy of the physical link possible.
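As a worked illustration of what the 99.999% figure implies, the following minimal sketch converts an availability target into a yearly downtime budget. The function name and the 365-day year are assumptions made for this example and do not come from the specification.

```python
# Downtime budget implied by an availability target.
# Assumption for this sketch: a 365-day (non-leap) year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability: float) -> float:
    """Return the maximum yearly downtime, in minutes, for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Carrier-grade "five nines" leaves roughly five and a quarter minutes per year.
five_nines = downtime_minutes_per_year(0.99999)
three_nines = downtime_minutes_per_year(0.999)
```

Under these assumptions the five-nines budget is about 5.26 minutes per year, against about 525.6 minutes (close to nine hours) for 99.9%, which is why restarting services or whole machines after a failure is hard to reconcile with the carrier-grade target.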
In carrier-grade equipment, the cause of a problem cannot be determined in advance, before a system service fails, so that corresponding preventive measures could be taken. In practice, because such systems are usually large and complex, availability is usually assessed by estimation and statistics based on the mean number of hours of interruption. The manageability of a system can be improved by adjusting and scheduling resources, by adapting to unexpected events (such as node failures), and by conveying and obtaining this important information in a timely and effective way. In ATCA-based systems, the solutions currently adopted for such failures are usually forms of failure tolerance, namely gap recovery and rollback recovery. Most systems, however, cannot effectively handle the configuration changes that failures cause in the running system, and have to restart the necessary system services or even the whole machine. High availability seeks to prevent unexpected failures through preventive measures. Current high-availability efforts concentrate mainly on keeping a single node's service running normally; these efforts need to be extended to the cooperation of all device nodes and services across the whole ATCA-based system environment.
There are many techniques for providing highly available services, chiefly active/standby hot standby, asymmetric active/active hot standby and symmetric active/active hot standby. Active/standby hot standby follows the failure model described above. The state of each service task is periodically saved to a stable shared storage medium or sent over the network to the relevant hot-standby component. When the service fails, the hot-standby device takes over the service using the most recent or current state information it has obtained. Because the system is restored, or rolled back, to a past state taken from old backup state information, this approach can cause a brief service interruption. Asymmetric active/active hot standby improves reliability, availability and serviceability more effectively than active/standby hot standby. In this model several device nodes provide the same service but do not cooperate: when one active device fails, the other active devices take over, which keeps the service continuously available and improves the system's uninterrupted service capability. Because the devices that back each other up lack any means of cooperation, however, state and control information cannot be synchronized intelligently between them, so the model suits only a limited range of applications. Symmetric active/active hot standby usually provides continuous service through the cooperation of two or more devices running the same service. It can use a distributed control mechanism or an extended virtual synchrony mechanism to maintain a common set of global system state information. The symmetric active/active model excels in throughput, service availability and service responsiveness, but it is also markedly more complex. Most current systems, whatever their architecture, show a similar lack of high availability when integrated; for example, most are designed around centralized control, with a single point of failure and a single point of control. Once that single point of failure or single control node breaks down, the whole system is inevitably affected, forcing a restart of all or part of it.
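The rollback effect of active/standby hot standby described above can be sketched as follows. This is a minimal illustration only: the class names, the counter used as service state, and the save interval of three updates are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class SharedStore:
    """Stands in for the stable shared storage that holds the last saved state."""
    saved_counter: int = 0


@dataclass
class CounterService:
    """A toy service whose whole state is a single counter (hypothetical)."""
    store: SharedStore
    save_every: int = 3   # hypothetical: save state after every third update
    counter: int = 0
    _since_save: int = 0

    def update(self) -> None:
        """Apply one update and periodically save the state to shared storage."""
        self.counter += 1
        self._since_save += 1
        if self._since_save == self.save_every:
            self.store.saved_counter = self.counter
            self._since_save = 0

    @classmethod
    def take_over(cls, store: SharedStore) -> "CounterService":
        """Start a standby instance from the most recently saved state."""
        return cls(store=store, counter=store.saved_counter)


store = SharedStore()
active = CounterService(store)
for _ in range(5):          # five updates; only the state after the third is saved
    active.update()

standby = CounterService.take_over(store)   # the active node is assumed to fail here
lost_updates = active.counter - standby.counter
```

In this sketch the standby resumes from the state saved after the third update, so the two most recent updates are lost. That gap is the stale-state rollback that the symmetric active/active model is intended to avoid.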
Summary of the invention
The problem to be solved by the utility model is to provide a high-availability system based on the Advanced Telecom Computing Architecture platform that overcomes the defects of the prior art described above.
The purpose of the utility model can be achieved by the following technical solution: a high-availability system based on the Advanced Telecom Computing Architecture platform, characterized in that it comprises, in order from the bottom layer to the top layer, a communication driver layer, a group communication system layer, a distributed control interface layer and an application service layer, wherein the communication driver layer, the group communication system layer, the distributed control interface layer and the application service layer are connected in sequence.
The communication driver layer encapsulates at least a driver module adapted to the underlying hardware, a network communication protocol module, a message service module and a link failure detection module, and also comprises a communication application programming interface (API); the driver module, network communication protocol module, message service module and link failure detection module are interconnected and are all connected to the communication API.
The group communication system layer encapsulates at least a group membership management module, an external fault detection module and a multicast module, and these modules are interconnected.
The distributed control interface layer encapsulates at least a distributed control module, a state machine control module, a checkpoint service module, a dynamic election module and a distributed virtual synchronous data storage module, and these modules are interconnected.
The application service layer encapsulates at least various application modules including a monitoring module, a file service module, a time service module and a log service module, and these modules are interconnected.
Compared with the prior art, the flexible, dynamic, component-based high-availability structure proposed by the utility model greatly improves the system's ability to work continuously without failure, fully satisfies the carrier-grade 99.999% reliability and availability requirement for ATCA-based systems, and provides far better sustained service capability than traditional centralized, single-point-controlled high-availability systems. For performance reasons the utility model is aimed mainly at symmetric active/active high-availability systems, but because the model framework is flexible, dynamic and component-based, the design can equally support other high-availability architectures. The communication driver layer modules let the system seamlessly support the network technologies and underlying network protocols offered by different hardware vendors, which greatly improves interchangeability and interoperability and reduces system integration cost. The plug-and-play, component-based group communication system layer provides a flexible, general-purpose platform on which an existing group communication scheme can be replaced as actual needs dictate, greatly improving the flexibility of the system. The distributed control interface layer provides application programming interfaces (APIs) suited to the characteristics of different applications, which makes the system easier to use and maintain. In particular, the dynamic election mechanism, the checkpoint module, the distributed control module and the state machine control module together carry out state synchronization and cooperation among the active nodes, which greatly improves the system's capacity for uninterrupted operation. With this approach, carrier-grade equipment based on the ATCA architecture can meet the high-availability requirement through software design alone, without upgrading hardware and without significantly affecting system efficiency or performance; this substantially reduces product cost, shortens the development cycle and yields considerable economic benefit.
Description of drawings
Fig. 1 is a schematic structural diagram of the utility model.
Embodiment
The utility model is described in further detail below with reference to the accompanying drawing.
As shown in Fig. 1, a high-availability system based on the Advanced Telecom Computing Architecture platform comprises, in order from the bottom layer to the top layer, a communication driver layer, a group communication system layer, a distributed control interface layer and an application service layer, wherein the communication driver layer, the group communication system layer, the distributed control interface layer and the application service layer are connected in sequence.
The communication driver layer encapsulates at least a driver module adapted to the underlying hardware, a network communication protocol module, a message service module and a link failure detection module, and also comprises a communication application programming interface (API); the driver module, network communication protocol module, message service module and link failure detection module are interconnected and are all connected to the communication API. The group communication system layer encapsulates at least a group membership management module, an external fault detection module and a multicast module, and these modules are interconnected. The distributed control interface layer encapsulates at least a distributed control module, a state machine control module, a checkpoint service module, a dynamic election module and a distributed virtual synchronous data storage module, and these modules are interconnected. The application service layer encapsulates at least various application modules including a monitoring module, a file service module, a time service module and a log service module, and these modules are interconnected.
To provide this active/active hot-standby high-availability system for the ATCA architecture, we first propose a flexible, modular high-availability component framework model whose components can be dynamically loaded and unloaded. To meet the system requirements of the ATCA platform, the high-availability framework model consists of four main layers: the communication driver layer, the group communication system layer, the distributed control interface layer and the application service layer.
The communication driver layer, the lowest layer, provides network protocol modules matched to the various underlying hardware. It offers unicast and multicast message services to the upper layers and also provides the associated failure detection mechanisms. The group communication system layer provides group membership management, external fault detection, a reliable multicast mechanism and the algorithm for multicasting messages within a group. The distributed control interface layer establishes the channel between the group communication system and the application service layer; it gives the application service layer a standard service interface through which the group communication system layer is easier to call, together with rich functions such as distributed control, state machine control, the checkpoint module, the message module and the dynamic election mechanism. The application service layer contains various customized service applications, for example the system monitoring module, file service module, log service module and time service module.
The high-availability framework is itself composed of independent, component-based modules. All communication between layers and between modules uses the synchronous and asynchronous message mechanisms provided by the message service module. Each layer can be replaced by another module layer that offers the same service functions with different characteristics. The framework allows software modules to be replaced by means of shared libraries, static libraries or plug-ins. Each layer is described in detail below.
(1) Communication driver layer
ATCA systems currently support many network technologies and switching protocols, for example Ethernet, InfiniBand, StarFabric, PCI Express and RapidIO. The high-availability framework can support the various network technologies and existing protocol standards supported by ATCA hardware vendors. The communication driver layer is used to establish an effective communication mechanism between highly available node devices, so that these devices can better provide network communication services to upper-layer application services through the distributed control interface layer.
Using a communication driver layer to adapt to different network technologies and to abstract a unified communication API improves the interchangeability and interoperability of the system; the idea is not new. OpenHPI, for example, uses this idea, together with a component-based framework, to encapsulate what the underlying hardware has in common in a driver layer and so achieve interchangeability and interoperability for ATCA systems.
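The idea of a unified communication API over interchangeable drivers can be sketched as follows. The specification defines no concrete interface, so every name here (`CommDriver`, `LoopbackDriver`, `CommAPI`, `unicast`, `multicast`) is hypothetical, and the in-memory loopback driver merely stands in for a real Ethernet, InfiniBand or RapidIO driver.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List, Tuple


class CommDriver(ABC):
    """Hypothetical driver interface that each network technology would implement."""

    @abstractmethod
    def send(self, dest: str, payload: bytes) -> None:
        """Transmit one raw message to a destination node."""

    @abstractmethod
    def link_up(self) -> bool:
        """Report whether the underlying link is currently usable."""


class LoopbackDriver(CommDriver):
    """In-memory stand-in for a real driver; it only records what was sent."""

    def __init__(self) -> None:
        self.delivered: List[Tuple[str, bytes]] = []

    def send(self, dest: str, payload: bytes) -> None:
        self.delivered.append((dest, payload))

    def link_up(self) -> bool:
        return True


class CommAPI:
    """Unified API seen by the upper layers, independent of the driver in use."""

    def __init__(self, driver: CommDriver) -> None:
        self._driver = driver

    def unicast(self, dest: str, payload: bytes) -> None:
        # Link failure detection is consulted before every send.
        if not self._driver.link_up():
            raise ConnectionError(f"link to {dest} is down")
        self._driver.send(dest, payload)

    def multicast(self, dests: Iterable[str], payload: bytes) -> None:
        # Naive multicast: one unicast per destination.
        for dest in dests:
            self.unicast(dest, payload)


driver = LoopbackDriver()
api = CommAPI(driver)
api.multicast(["node-a", "node-b"], b"hello")
```

Swapping `LoopbackDriver` for another `CommDriver` implementation would leave callers of `CommAPI` unchanged, which is the interchangeability the paragraph above refers to.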
The message service module is a buffered message transfer system that provides message queues between different tasks on the same node or between different nodes. A message queue allows many-to-one communication. If a message queue is closed while it still holds messages that have not been consumed, the message service module must retain those messages until they are consumed. That is, when the active node fails, the standby node is responsible for receiving and processing the corresponding messages, and the message service module deletes a message for good only after the switchover to the standby node has taken effect. This design is what gives the system higher availability. The module could be implemented at the application layer, but for efficiency and performance reasons the authors strongly recommend implementing it as a dynamically loadable kernel module and placing it in the communication driver layer of the model.
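The retain-until-consumed behaviour described above can be sketched with a queue that deletes a message only when its consumer acknowledges it. This is an illustrative user-space model, not the kernel-module implementation the authors recommend; the class `RetainingQueue` and its methods are hypothetical names.

```python
from collections import OrderedDict
from itertools import count
from typing import List, Tuple


class RetainingQueue:
    """Message queue that keeps every message until it is explicitly acknowledged."""

    def __init__(self) -> None:
        self._pending: "OrderedDict[int, str]" = OrderedDict()
        self._ids = count(1)

    def put(self, payload: str) -> int:
        """Enqueue a message from any producer and return its identifier."""
        msg_id = next(self._ids)
        self._pending[msg_id] = payload
        return msg_id

    def pending(self) -> List[Tuple[int, str]]:
        """Return unacknowledged messages in arrival order without removing them."""
        return list(self._pending.items())

    def ack(self, msg_id: int) -> None:
        """Delete a message once its consumer has finished processing it."""
        self._pending.pop(msg_id, None)

    def __len__(self) -> int:
        return len(self._pending)


queue = RetainingQueue()
first = queue.put("job-1")
second = queue.put("job-2")

# The active node processes job-1, acknowledges it, then fails before job-2.
queue.ack(first)

# After switchover the standby node drains whatever was retained.
handled_by_standby = []
for msg_id, payload in queue.pending():
    handled_by_standby.append(payload)
    queue.ack(msg_id)
```

Because `job-2` was never acknowledged by the failed node, it is still in the queue when the standby takes over, so no request is silently dropped during the switchover.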
In addition, the communication driver layer currently provides only an interface for handling raw data messages; upper-layer protocols are managed mainly by the group communication system layer. This layer draws mainly on the RMIX framework model to provide a dynamic, reconfigurable communication framework that supports system heterogeneity (such as byte order and higher-level protocols).
(2) Group communication system layer
The group communication system layer contains all the necessary protocols and services. They serve the active/active hot-standby high-availability framework and, through the distributed control interface layer, provide communication among group members to upper-layer application services. They are also suitable for other high-availability models, such as active/standby hot standby, where they provide consistent replication of state information to multiple hot-standby devices. The group communication system layer also provides group membership management, external fault monitoring, a reliable multicast mechanism and the algorithm for multicasting messages within a group. Many third-party and open-source middleware projects can serve as references for this layer; for example, the Availability Management Framework (AMF) in the Application Interface Specification (AIS) of the SA Forum can be used as a reference model. Because this layer is not the focus of this specification, it is not discussed in detail.
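Although the specification leaves this layer open, group membership management combined with fault detection is often built on heartbeats. The following minimal sketch assumes such a heartbeat scheme; the class name, the timeout value and the explicit `now` parameter (used instead of a real clock so the example is deterministic) are all assumptions and are not part of the specification or of the AMF.

```python
from typing import Dict, List


class Membership:
    """Heartbeat-based group view: a member is dropped once its heartbeat is stale."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                  # seconds of silence tolerated
        self._last_seen: Dict[str, float] = {}  # node name -> time of last heartbeat

    def heartbeat(self, node: str, now: float) -> None:
        """Record that a heartbeat from `node` arrived at time `now`."""
        self._last_seen[node] = now

    def view(self, now: float) -> List[str]:
        """Return the nodes currently considered alive, in sorted order."""
        return sorted(
            node for node, seen in self._last_seen.items()
            if now - seen <= self.timeout
        )


members = Membership(timeout=3.0)
members.heartbeat("node-a", now=0.0)
members.heartbeat("node-b", now=0.0)
members.heartbeat("node-a", now=2.0)   # node-b sends nothing further

view_early = members.view(now=2.5)     # both heartbeats are still fresh
view_late = members.view(now=4.0)      # node-b has been silent for 4 s
```

In this sketch `node-b` leaves the view once it has been silent for longer than the timeout, which is the event an upper layer would use to trigger takeover.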
(3) Distributed control interface layer
The application programming interfaces (APIs) supported by the distributed control interface layer are implemented according to the characteristics of the application. Deterministic symmetric applications can be implemented with memory, file, state machine and database interfaces; non-deterministic asymmetric applications can be implemented with the distributed control interface and a remote procedure call (RPC) interface. These application characteristics rest entirely on the requirements placed on the group communication system layer. For example, a batch job scheduler runs on every active node of a cluster system and maintains global state information. Every active node receives changes to this state in the same order and so maintains consistent state information. When a job scheduling request is delivered to any one of the active nodes and changes the state information, the other active nodes receive the request in the same order. The state control mechanism ensures that the job scheduler's flow is deterministic on every node in the group communication system. When the flow finishes, the system checks whether the resulting state information satisfies the arbitration rules; if it does, the update is written to the distributed virtual synchronous database, and otherwise the state update is treated as invalid.
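The job-scheduler example above amounts to replicating a deterministic state machine: every active node applies the same requests in the same order and validates each resulting state before committing it. The sketch below illustrates that idea under stated assumptions. The class `SchedulerReplica`, the command format and the arbitration rule (at most two queued jobs) are hypothetical, and the totally ordered request list stands in for what the group communication system layer would deliver.

```python
from typing import Dict, List, Tuple

MAX_QUEUED_JOBS = 2  # hypothetical arbitration rule: at most two queued jobs


def satisfies_arbitration(jobs: Dict[str, str]) -> bool:
    """Check a candidate state against the (hypothetical) arbitration rule."""
    return len(jobs) <= MAX_QUEUED_JOBS


class SchedulerReplica:
    """One active node's copy of the global job-scheduling state."""

    def __init__(self) -> None:
        self.jobs: Dict[str, str] = {}
        self.applied = 0  # sequence number of the last request seen

    def apply(self, seq: int, command: Tuple[str, str]) -> bool:
        """Apply request `seq`; commit the new state only if arbitration passes."""
        if seq != self.applied + 1:
            raise ValueError(f"expected request {self.applied + 1}, got {seq}")
        op, job_id = command
        candidate = dict(self.jobs)
        if op == "submit":
            candidate[job_id] = "queued"
        elif op == "cancel":
            candidate.pop(job_id, None)
        else:
            raise ValueError(f"unknown operation: {op}")
        self.applied = seq
        if satisfies_arbitration(candidate):
            self.jobs = candidate   # commit, as if writing to the synchronized store
            return True
        return False                # the update is treated as invalid


# The same totally ordered request stream is delivered to every replica.
ordered_requests: List[Tuple[str, str]] = [
    ("submit", "a"), ("submit", "b"), ("submit", "c"), ("cancel", "a"),
]
replica_a, replica_b = SchedulerReplica(), SchedulerReplica()
results = []
for seq, command in enumerate(ordered_requests, start=1):
    results.append(replica_a.apply(seq, command))
    replica_b.apply(seq, command)
```

Both replicas reject the third request, because it would exceed the assumed limit, and end in the same state. This holds only because the transition function is deterministic and the requests arrive in the same order on every node.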
The checkpoint service module lets the system record data checkpoints. When a node fails because of a fault, the system can restore the failed node from the data checkpoint. The checkpoint service is used mainly to retrieve the checkpoint data and the state recorded before the failure so that operation can continue from there, reducing the impact of the fault. Checkpoint data is globally valid: a copy of the checkpoint data of every active node is generated and sent through the message service module to the corresponding standby node, to guard against a failure of the active node interrupting the related services.
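The checkpoint replication described above can be sketched as follows. The names `CheckpointService`, `register_standby`, `write` and `read` are hypothetical, and the in-memory dictionary stands in for delivery to the standby node through the message service module.

```python
import copy
from typing import Any, Dict


class CheckpointService:
    """Keeps a copy of each checkpoint on every registered standby node."""

    def __init__(self) -> None:
        # standby node name -> {checkpoint section name -> checkpoint data}
        self._replicas: Dict[str, Dict[str, Any]] = {}

    def register_standby(self, node: str) -> None:
        self._replicas[node] = {}

    def write(self, section: str, data: Any) -> None:
        """Record a checkpoint and replicate an independent copy to each standby."""
        for replica in self._replicas.values():
            replica[section] = copy.deepcopy(data)

    def read(self, node: str, section: str) -> Any:
        """Return the last checkpoint that the given standby node holds."""
        return copy.deepcopy(self._replicas[node][section])


service = CheckpointService()
service.register_standby("standby-1")

state = {"processed": 10, "cursor": "rec-10"}
service.write("billing", state)

# The active node keeps working after the checkpoint and then fails.
state["processed"] = 11

# The standby resumes from the last replicated checkpoint.
recovered = service.read("standby-1", "billing")
```

The deep copy matters in this sketch: the standby's checkpoint must not change when the active node later mutates its own working state, otherwise the recovery point would no longer be well defined.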
(4) Application service layer
Because of the flexibility and power of its underlying design, the high-availability framework described in this specification can host many different types of application and meet the needs of a variety of computing scenarios. These applications include customized services such as a system monitoring service, a file system service, a log service, a name service and a time service.

Claims (5)

1. A high-availability system based on the Advanced Telecom Computing Architecture platform, characterized in that it comprises, in order from the bottom layer to the top layer, a communication driver layer, a group communication system layer, a distributed control interface layer and an application service layer, wherein the communication driver layer, the group communication system layer, the distributed control interface layer and the application service layer are connected in sequence.
2. The high-availability system based on the Advanced Telecom Computing Architecture platform according to claim 1, characterized in that the communication driver layer encapsulates at least a driver module adapted to the underlying hardware, a network communication protocol module, a message service module and a link failure detection module, and also comprises a communication application programming interface (API); the driver module, network communication protocol module, message service module and link failure detection module are interconnected and are all connected to the communication API.
3. The high-availability system based on the Advanced Telecom Computing Architecture platform according to claim 1, characterized in that the group communication system layer encapsulates at least a group membership management module, an external fault detection module and a multicast module, and these modules are interconnected.
4. The high-availability system based on the Advanced Telecom Computing Architecture platform according to claim 1, characterized in that the distributed control interface layer encapsulates at least a distributed control module, a state machine control module, a checkpoint service module, a dynamic election module and a distributed virtual synchronous data storage module, and these modules are interconnected.
5. The high-availability system based on the Advanced Telecom Computing Architecture platform according to claim 1, characterized in that the application service layer encapsulates at least various application modules including a monitoring module, a file service module, a time service module and a log service module, and these modules are interconnected.
CNU200720198567XU 2007-11-29 2007-11-29 High usability system based on advanced telecommunication computation platform Expired - Lifetime CN201122978Y (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNU200720198567XU CN201122978Y (en) 2007-11-29 2007-11-29 High usability system based on advanced telecommunication computation platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNU200720198567XU CN201122978Y (en) 2007-11-29 2007-11-29 High usability system based on advanced telecommunication computation platform

Publications (1)

Publication Number Publication Date
CN201122978Y true CN201122978Y (en) 2008-09-24

Family

ID=40010219

Family Applications (1)

Application Number Title Priority Date Filing Date
CNU200720198567XU Expired - Lifetime CN201122978Y (en) 2007-11-29 2007-11-29 High usability system based on advanced telecommunication computation platform

Country Status (1)

Country Link
CN (1) CN201122978Y (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8762774B2 (en) 2010-11-30 2014-06-24 Huawei Technologies Co., Ltd. Distributed blade server system, management server and switching method
US9189349B2 (en) 2010-11-30 2015-11-17 Huawei Technologies Co., Ltd. Distributed blade server system, management server and switching method

Legal Events

Date Code Title Description
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee

Owner name: SHANGHAI FUTURE BROADBAND TECHNOLOGY CO., LTD.

Free format text: FORMER NAME: SHANGHAI B-STAR BROADBAND TECHNOLOGY + APPLICATION ENGINEERING RESEARCH CENTER CO., LTD.

CP03 Change of name, title or address

Address after: 200336 Shanghai city Changning District Honggu Road No. 150

Patentee after: Shanghai B-Star Broadband Technology Co., Ltd.

Address before: 200336 Shanghai City Honggu Road No. 150

Patentee before: Shanghai B-STAR Co., Ltd.

CX01 Expiry of patent term

Granted publication date: 20080924