CN111447079B - High-availability extension system and method based on SCA framework - Google Patents

Publication number
CN111447079B
CN111447079B CN202010129227.1A CN202010129227A CN111447079B CN 111447079 B CN111447079 B CN 111447079B CN 202010129227 A CN202010129227 A CN 202010129227A CN 111447079 B CN111447079 B CN 111447079B
Authority
CN
China
Prior art keywords
management
service
reconstruction
fault
card
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010129227.1A
Other languages
Chinese (zh)
Other versions
CN111447079A (en)
Inventor
符凯
彭宏
刘荣宽
包晟临
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 32 Research Institute
Original Assignee
CETC 32 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 32 Research Institute
Priority to CN202010129227.1A
Publication of CN111447079A
Application granted
Publication of CN111447079B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0659 Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
    • H04L41/0661 Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities by reconfiguring faulty entities

Abstract

The invention provides a high-availability extension system and method based on the SCA framework. A hot-standby master control node is added to serve as standby domain management for the original master control node. Two sets of master-control dedicated services, comprising a resource configuration service, an application management service and a deployment reconstruction service, are allowed to back each other up. An operation monitoring service and a health management service are added to monitor and handle faults, and a fault reconstruction mechanism is added to the deployment reconstruction service on top of its on-demand reconstruction. The underlying communication mechanism of the deployment management middleware is modified to reduce the coupling between the two communicating ends. By extending the SCA framework technology for high availability, the invention allows it to be applied in fields with high-availability and safety requirements, so that the component technology and application reconstruction technology of SCA can be used in more fields.

Description

High-availability extension system and method based on SCA framework
Technical Field
The invention relates to the technical field of software radio, in particular to a high-availability extension system and method based on an SCA framework.
Background
With the development of Software Radio (SR) technology, and especially the "software-defined" design concept it introduced, software-defined technology has increasingly spread to other fields, giving rise to new concepts such as Software Defined Radar (SDR), Software Defined Satellite (SDS) and Software Defined Network (SDN). However, the SCA architecture, which was designed for software radio, adapts poorly when transplanted to other fields, especially those with strong High Availability (HA) requirements, where the shortcomings of the SCA framework technology in high availability (such as single points of failure) are fully exposed.
The prior art related to the present application is patent document CN108737190A, which discloses a device management method based on an SCA core framework. In that method, a device manager completes the deployment and startup of all devices in a processing node, then registers the device components, and finally calls device interface functions to manage the device components. In this way the running and stopping of device components can be flexibly controlled, the resources and states of all devices in the platform can be managed hierarchically, and underlying hardware devices can be managed and controlled through the device components.
Disclosure of Invention
In view of the deficiencies in the prior art, it is an object of the present invention to provide a high-availability extension system and method based on the SCA framework.
The high-availability extension system based on the SCA framework provided by the invention comprises:
a standby domain management module: adds a hot-standby master control node to serve as standby domain management for the original master control node;
an operation management redundancy module: allows two sets of master-control dedicated services to back each other up, wherein the master-control dedicated services comprise a resource configuration service, an application management service and a deployment reconstruction service;
an operation management monitoring module: adds an operation monitoring service and a health management service to monitor and handle faults, and adds a fault reconstruction mechanism to the deployment reconstruction service on top of its on-demand reconstruction.
Preferably, the system further comprises a deployment middleware module, which modifies the underlying communication mechanism of the deployment management middleware to reduce the coupling between the two communicating ends.
Preferably, in standby domain management, when the primary domain management performed by the original master control node fails, the hot-standby master control node takes over the work of the original master control node and runs the support services, which comprise a resource configuration service, a deployment reconstruction service and an application management service; all hardware resources and software components keep their registration information and management information in both the hot-standby master control node and the original master control node.
Preferably, the deployment middleware module comprises:
a protocol modification module: the underlying communication mechanism of the CORBA middleware adopts the connectionless UDP protocol, so that the two communicating parties are peers;
a link detection module: a heartbeat mechanism or a BIT hardware detection mechanism is adopted so that each communicating party can perceive the link condition and the state of the other party; when a link interruption or a failure of the other party is perceived, the CORBA middleware reclaims resources and sends a notification message.
Preferably, hardware abstraction layer communication based on a mapping table is adopted to encapsulate a standardized communication interface and shield the underlying communication mechanism, so that the communication mode between waveform components is decoupled from the specific hardware platform, the underlying communication access interface of the waveform components stays consistent, and the waveform components can be ported between heterogeneous hardware platforms.
The high-availability extension method based on the SCA framework provided by the invention comprises:
a standby domain management step: adding a hot-standby master control node to serve as standby domain management for the original master control node;
an operation management redundancy step: allowing two sets of master-control dedicated services to back each other up, wherein the master-control dedicated services comprise a resource configuration service, an application management service and a deployment reconstruction service;
an operation management monitoring step: adding an operation monitoring service and a health management service to monitor and handle faults, and adding a fault reconstruction mechanism to the deployment reconstruction service on top of its on-demand reconstruction.
Preferably, the method further comprises a deployment middleware step: modifying the underlying communication mechanism of the deployment management middleware to reduce the coupling between the two communicating ends.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention extends the SCA framework technology for high availability, so that it can be applied in fields with high-availability and safety requirements, and the component technology and application reconstruction technology of SCA can be used in more fields.
2. The invention solves the single point of failure caused by having only one master control node: the standard SCA framework has only one master control node, and when that node fails the whole system cannot work normally.
3. The invention solves the fault propagation caused by middleware-induced tight coupling between nodes, in which the failure of one node affects other related nodes.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a system framework diagram of the present invention;
FIG. 2 is a functional composition diagram of a componentized software integration development environment;
FIG. 3 is a diagram illustrating software distribution in a hardware device;
FIG. 4 is a schematic diagram of a system initialization process;
FIG. 5 is a schematic illustration of an application deployment flow;
FIG. 6 is a schematic diagram of a signal processing card failure (non-zero core);
FIG. 7 is a schematic diagram of a signal processing card fault (zero core);
FIG. 8 is a schematic diagram of a switch card failure.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will help those skilled in the art to further understand the invention, but do not limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, and all of these fall within the scope of the invention.
The technical problem to be solved by the invention is to add a high-availability extension mechanism on top of the SCA framework technology, so that it can be extended to the high-availability field. The SCA framework has two main problems in the high-availability field. The first is the single master control node: the standard SCA framework has only one master control node, and when that node fails the whole system cannot work normally (a single point of failure). The second is the tight coupling between nodes caused by the middleware, so that the failure of one node affects other related nodes (fault propagation). The first problem can be solved by adding a hot-standby master control node; the second requires modifying the middleware.
A typical high-availability software platform system comprises three parts: an open operating environment built with reference to the SCA specification and running on the system equipment, a system operation management tool running on the integrated management machine, and a componentized integrated development environment running on a development PC. As shown in fig. 1, the open operating environment runs on the ground integrated terminal device and includes the BSP drivers for the various modules, operating systems, distributed communication middleware, an operation management framework and so on. Each communication waveform is composed of several waveform components running on heterogeneous processing units (including CPU, DSP and FPGA). Waveform description files constrain the configuration parameters of each component and the communication interconnections among components, and the operation management framework loads, assembles and monitors each waveform according to these description files.
On the basis of a typical SCA system, the invention makes the following extensions and modifications. First, a hot-standby master control node is added alongside the original master control node, for example the standby domain management module added alongside the primary domain management module in fig. 1. Second, the operation management framework is made correspondingly redundant: two sets of dedicated services are allowed in one system, comprising the resource configuration service, application management service, deployment reconstruction service, application blueprint service, reconstruction blueprint service, resource blueprint service and blueprint parsing service in fig. 1, and the two sets back each other up. Third, an operation monitoring service and a health management service are added to the operation management framework to monitor and handle faults, and the deployment reconstruction service is modified to add a fault reconstruction mechanism on top of on-demand reconstruction. Finally, the underlying communication mechanism of the deployment management middleware (see the deployment management middleware in fig. 1) is modified to reduce the coupling between the two communicating ends.
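The role of the waveform description file can be illustrated with a minimal Python sketch. The XML element and attribute names below (waveform, component, param, connection) are hypothetical and chosen only for illustration; they are not the SCA descriptor schema and are not taken from the patent. The sketch only shows the idea of reading component parameters and interconnections from a description and rejecting a connection that refers to an undeclared component.

```python
import xml.etree.ElementTree as ET

# Hypothetical waveform description; element names are illustrative only.
WAVEFORM_XML = """
<waveform name="demo">
  <component id="modem" target="DSP"><param name="rate" value="9600"/></component>
  <component id="framer" target="CPU"/>
  <connection from="framer" to="modem"/>
</waveform>
"""


def load_waveform(xml_text):
    """Parse a waveform description into components and connections."""
    root = ET.fromstring(xml_text.strip())
    components = {}
    for comp in root.findall("component"):
        components[comp.get("id")] = {
            "target": comp.get("target"),
            "params": {p.get("name"): p.get("value") for p in comp.findall("param")},
        }
    connections = [(c.get("from"), c.get("to")) for c in root.findall("connection")]
    # Every connection endpoint must be a declared component.
    for src, dst in connections:
        if src not in components or dst not in components:
            raise ValueError(f"connection {src}->{dst} references an unknown component")
    return {"name": root.get("name"), "components": components, "connections": connections}
```

A framework built along these lines would pass the parsed result to its loading and assembly steps; those steps are outside the scope of this sketch.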
The system operation management tool software in fig. 1 operates on the integrated management machine, and includes operation management of waveform application lifecycle, configuration management of applications and systems, system software and hardware resource monitoring, system fault alarm, operation log management, and the like. The componentized application integration development tool in fig. 1 runs on a development PC, and is based on an Eclipse framework, and comprises application integration development tools of embedded software and DSP software.
The software running environment comprises an embedded operating system and a matched development environment, a DSP operating system and a matched development environment, a deployment management middleware, a distributed communication middleware, hardware abstraction layer software and running management framework software.
The details are as follows:
(1) Embedded operating system and matched development environment
The embedded operating system is system software supporting the running of embedded applications, and comprises a kernel, a device driver, a communication protocol, a file system and the like. In the system, an embedded operating system is an important component of a waveform application running environment, provides abstraction and encapsulation of bottom-layer physical hardware for upper-layer waveform application, organizes and manages various embedded computer software and hardware resources in a reasonable and effective mode, controls program execution flow, provides a system calling interface, a driving program and the like, and enables a user to conveniently use the embedded computer. The matched development environment ReDe provides an engineering management tool set, a compiling tool chain, a debugging tool set and other engineering management and application development tools.
(2) DSP operating system and matched development environment
The DSP operating system is the system software that supports applications running on the DSP. It provides AMP multi-core support and rich, efficient inter-task communication mechanisms such as semaphores, message queues, events and asynchronous signals; it supports dynamic memory management, I/O device management, a console shell, a file system and so on; and it provides a POSIX system call interface and drivers. The matched development environment provides project management and application development tool sets for DSP application systems, such as a project management tool, a compilation tool chain and debugging tools.
(3) Deployment management middleware
The system adopts lightweight CORBA middleware as a deployment management communication mechanism of an operation management framework, and is used for deploying application components and carrying out operation management control on the components. CORBA middleware plays a role of a soft bus of the whole system software operating environment, and can effectively shield the heterogeneity of an operating system and a network protocol in a distributed environment, thereby providing efficient synchronous/asynchronous cross-platform communication capability among objects.
However, because the default underlying communication mechanism of the CORBA middleware adopts a connection-oriented TCP/IP protocol, the two communication parties are not in equal status under the communication mechanism, and usually one communication party is required to be a server and adopt a passive response mode, and the other communication party is required to be a client and adopt an active request mode. This results in a one-way dependency of the client on the server, that is, when the server fails, the client cannot work normally, but the failure of the client does not affect the normal operation of the server.
CORBA middleware is characterized by being object-oriented, and the communication mode adopts inter-object interoperation, namely, the objects can be mutually called, and the positions of the objects are equal. To achieve this equality, each CORBA object must act as both a client and a server, thus forming a virtually tightly coupled relationship.
If the CORBA objects are distributed on different nodes, tight coupling among the nodes can be caused, which can cause fault propagation in a complex distributed system, namely when one node fails, chain reaction can be caused due to the association relationship, so that a plurality of nodes can not work normally, and the whole system is paralyzed.
The middleware usually adopted by SCA systems is CORBA, and this tight coupling between objects is a key factor restricting the extension of SCA to high availability. To overcome the problem, the underlying communication mechanism of the CORBA middleware is modified: the connectionless UDP protocol is adopted so that the two communicating parties are peers, and a link detection mechanism (a heartbeat mechanism, or a hardware detection mechanism such as BIT) is added so that either party can quickly sense the link condition and the state of the other. When the link is interrupted or the other party fails, the middleware can quickly reclaim resources and notify the upper-layer framework for handling.
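The heartbeat-based link detection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented middleware: the class name LinkMonitor, the period and max_missed parameters and the on_peer_down callback are all hypothetical. The transport is deliberately left out; in a UDP-based design, heartbeat() would be called whenever a heartbeat datagram arrives from a peer, and check() would be called periodically by a timer.

```python
class LinkMonitor:
    """Tracks peer liveness from heartbeat arrival times (hypothetical helper)."""

    def __init__(self, period=1.0, max_missed=3, on_peer_down=None):
        self.period = period              # assumed heartbeat period in seconds
        self.max_missed = max_missed      # assumed missed beats before a peer is declared down
        self.on_peer_down = on_peer_down or (lambda peer: None)
        self._last_seen = {}              # peer name -> time of last heartbeat
        self._down = set()                # peers already reported as down

    def heartbeat(self, peer, now):
        """Record a heartbeat from a peer; a peer that was down becomes live again."""
        self._last_seen[peer] = now
        self._down.discard(peer)

    def check(self, now):
        """Return peers newly detected as down and notify the upper layer once per failure."""
        newly_down = []
        timeout = self.period * self.max_missed
        for peer, seen in self._last_seen.items():
            if peer not in self._down and now - seen > timeout:
                self._down.add(peer)
                newly_down.append(peer)
                self.on_peer_down(peer)   # e.g. reclaim resources, notify the framework
        return newly_down
```

Because neither side holds a connection, both peers can run the same monitor symmetrically, which matches the peer-to-peer relationship the text aims for.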
(4) Distributed communication middleware
Inter-component communication need not be tied to a specific middleware: either CORBA (suitable for communication applications such as radio) or high-speed IPC middleware based on a global port mapping table can be used. Both CORBA and the IPC middleware provide a C interface (CORBA also provides a C++ interface on the CPU), are adapted to the Ethernet and RapidIO underlying communication protocols, and are mainly used for application component communication programming.
(5) Hardware abstraction layer software
The hardware abstraction layer software mainly targets communication between FPGA and FPGA, between FPGA and DSP, and between FPGA and CPU. Through mapping-table-based hardware abstraction layer communication, a standardized communication interface can be encapsulated and the underlying communication mechanisms shielded, so that the communication mode between waveform components is decoupled from the specific hardware platform, the underlying communication access interface of the waveform components stays consistent, and the waveform components are easy to port between heterogeneous hardware platforms.
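The mapping-table idea can be illustrated with a short Python sketch. All names here (MappingTableHal, add_transport, map_port, the transport labels) are hypothetical and not taken from the patent. A component sends to a logical port only; the table decides which physical link and address carry the data, so remapping a port does not change component code.

```python
from typing import Callable, Dict, Tuple

# A transport takes a physical address and a payload and returns the bytes sent.
Transport = Callable[[str, bytes], int]


class MappingTableHal:
    """Routes logical ports to physical links through a mapping table (illustrative)."""

    def __init__(self) -> None:
        self._transports: Dict[str, Transport] = {}
        self._table: Dict[int, Tuple[str, str]] = {}  # logical port -> (transport, address)

    def add_transport(self, name: str, send: Transport) -> None:
        """Register a platform-specific send function under a transport name."""
        self._transports[name] = send

    def map_port(self, logical_port: int, transport: str, address: str) -> None:
        """Bind (or rebind) a logical port to a physical transport and address."""
        if transport not in self._transports:
            raise KeyError(f"unknown transport: {transport}")
        self._table[logical_port] = (transport, address)

    def send(self, logical_port: int, payload: bytes) -> int:
        """The only call a waveform component needs: send to a logical port."""
        transport, address = self._table[logical_port]
        return self._transports[transport](address, payload)
```

Porting a component to another platform then amounts to registering different transports and loading a different table, while calls to send() stay the same.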
(6) Running management framework software
The operation management framework follows the core components of the operating environment defined by the SCA specification, and provides device resource management, deployment reconfiguration management, application management and the management interfaces of the various support services. It provides a componentized running environment for waveform application software, implements operation management of waveform applications such as installation, deployment, and dynamic loading and unloading, manages system resources in a unified way, parses the attributes and configuration descriptions of the various resources, and provides queries of software and hardware resource matching status. It supports operation management such as dynamic deployment, loading and unloading at the application component level, as well as fault migration according to user-defined plans.
For high availability, a main domain management module and a standby domain management module are set, equipment resource management, deployment reconfiguration management, application management and various support services are operated at the same time, and various hardware resources and software components in the system need to be registered with the main domain management module and the standby domain management module at the same time. When the main domain management module fails, the backup domain management module is required to take over the work of the main domain management module to undertake the functions of the main domain management module until a new backup module is online.
The system operation management tool software mainly comprises operation management, system configuration, resource monitoring, fault monitoring and alarming and log management.
The details are as follows:
(1) Operation management
Life-cycle operation management is performed on waveform applications according to the user policy, including installation, deletion, dynamic/static deployment, uninstallation, starting and stopping of waveform applications.
(2) System configuration
The configuration management of the system mainly means that a user can configure some of the framework's processing policies online, such as the heartbeat detection period of the primary and standby domain modules and the application reconfiguration policy settings.
(3) Resource monitoring
Multilevel monitoring of system software and hardware resources comprises:
Rack and module monitoring: provides BMC status data monitoring of hardware modules, such as power-on information, slot number, voltage, current and temperature, as well as hardware-supported power switch management;
Operating system resource monitoring: monitors the resources managed by the operating system, such as processor load, the working cores and the tasks running on each core, and memory usage and allocation;
Application component and resource deployment monitoring: monitors the installation, deployment and running status of application components, logical devices and services.
(4) Fault monitoring alarm
Based on the results of the monitoring service's monitoring of the equipment, system fault alarm information is reported, and the fault handling policy and handling result information can be provided.
(5) Log management
Management operations such as search and query, setting and deletion are performed on the running and monitoring logs.
As shown in fig. 2, the componentized software integrated development environment runs on the host computer. It is an integrated environment for developing waveform application software and platform software, and integrates the tool kits needed during waveform development so that they are readily available to waveform application developers and platform software developers. The software supports guided, integrated development functions including project management, visual description and packaging of components, visual assembly of waveforms, automatic generation of code frameworks and XML configuration files, source code editing, object code generation, simulation debugging and run testing. The componentized software development environment can be integrated into an embedded software development environment to form a complete componentized software integrated development environment. The integrated development environment is based on the Eclipse framework and consists of a basic software development tool set, a waveform development kit, a platform development kit, an operation management tool and a system resource query tool.
The Eclipse framework is an extensible platform for constructing an SCA integrated development environment and provides plug-in creating, integrating and managing functions.
The basic tool set for software development provides basic functions of engineering management, resource library management, tool chain management, framework code management, simulation and test support, a code editor, an XML editor, XML validity verification and operating system configuration, a code compiler and the like.
The SCA waveform development kit is used by SCA waveform application developers, provides functions of waveform component visual description, component code editing, component code generation, waveform component packaging, component visual assembly, waveform application packaging, waveform deployment planning and the like, and supports the whole process from visual description, code frame generation, editing, compiling, packaging, assembly to waveform deployment planning of waveform application.
The SCA platform development kit is used by SCA platform software developers, provides development support for equipment, services, an equipment manager and a domain manager in the SCA software platform, provides functions of visual assembly and node packaging of nodes and visual assembly and platform packaging of the platform, and facilitates development and maintenance of SCA platform software by the SCA platform software developers.
The deployment and operation management tool can carry out installation deployment and operation management on the SCA waveform, and provides functions of waveform installation and deletion, deployment and uninstallation, starting and stopping, target machine information query and the like. The user can conveniently carry out the integration test of the system by using the tool.
As shown in fig. 3, regarding the distribution of the software system across the hardware devices, the PPC modules on the switch cards serve as the primary and standby domain management modules. The embedded operating system running on them is the domestically developed Ruihua operating system (ReWorks), and they mainly execute the device resource management, deployment reconfiguration management, application management and various support services of the operation management framework; all hardware resources and software components in the system must keep their registration and management information in both the primary and standby domain management modules. On the signal processing board, the P2020 CPU and FPGA computing resources can serve as one set of channels, and the TI6670 and FPGA can serve as another, independent set of channel resources. Both the P2020 and the TI6678 run the domestically developed Ruihua operating system (ReWorks). The integrated management machine is an independent external device connected to the ground integrated terminal through gigabit Ethernet, SRIO or a CAN bus; it runs the system operation management tool with a human-machine interface, and its control, management and monitoring information is exchanged with the ground integrated terminal through the middleware.
The system fault reconstruction design comprises an applied fault reconstruction design and a design of a main domain management module and a standby domain management module.
The fault reconstruction design for applications follows the core framework design of the SCA specification: according to different configuration files, the corresponding application can be assembled and deployed and its application instance controlled, thereby realizing reconstruction of various functions. When a fault occurs, however, it must be handled according to the fault type and the fault policy. This requires adding a system fault policy scheme designed in advance by the user, and modifying the environment blueprint description and the application blueprint description of the operation management framework accordingly. For the digital signal processing module, a device redundancy configuration attribute can be added to the original core framework blueprint configuration file, or a dedicated device redundancy configuration file can be added. After the system detects a module fault, it first looks up the fault policy. If only simple fault isolation of the hardware module is configured, the system reports the hardware fault together with the software deployed on the failed module, removes the failed module from the registered device resources or tags it as faulty, and modifies the device configuration file so that the failed module is no longer considered in later application deployment. If a redundancy backup policy is configured for the module, the system reports the hardware fault, looks up in the registry the software services and components deployed on the failed module, finds the backup module, migrates those software services and components to the backup module, modifies the registry and the configuration file, reports the application migration to the system monitoring service, and records a system log.
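The two fault policies described above (isolation only, or migration to a configured backup module) can be sketched in Python. This is a simplified illustration under assumptions: the function name handle_module_fault and the data shapes (a policy dictionary mapping a module to its backup, a deployments dictionary standing in for the registry, a set of faulty modules, a list used as the log) are hypothetical and not the patent's data structures.

```python
def handle_module_fault(failed, policy, deployments, faulty, log):
    """Handle a module fault by isolation or by migration to a backup module.

    failed      -- name of the failed module
    policy      -- dict: module -> backup module (absent means isolate only)
    deployments -- dict: module -> list of deployed services/components (the registry)
    faulty      -- set of modules tagged as faulty (excluded from later deployment)
    log         -- list collecting report and log messages
    Returns the list of components that were migrated (empty for isolation).
    """
    affected = deployments.get(failed, [])
    # Both policies report the hardware fault and the software deployed on the module.
    log.append(f"hardware fault: {failed}; deployed software: {affected}")
    faulty.add(failed)

    backup = policy.get(failed)
    if backup is None or backup in faulty:
        return []  # isolation only: nothing is migrated

    # Redundancy policy: move everything to the backup and update the registry.
    moved = deployments.pop(failed, [])
    deployments.setdefault(backup, []).extend(moved)
    log.append(f"migrated {moved} from {failed} to {backup}")
    return moved
```

Treating a backup that is itself tagged faulty as "isolate only" is a design choice of this sketch; the text does not say how that case is handled.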
In the design of the active and standby domain management modules, the PPC modules on the switch cards serve as the active and standby domain management modules and execute the support services of the operation management framework, such as device resource management, deployment reconfiguration management, and application management. All hardware resources and software components in the system must keep their registration and management information in both the active and the standby domain management module. Operation management commands from the integrated management machine are sent to the active domain management module. The active and standby domain management modules monitor each other's liveness by heartbeat, and the active module pushes a snapshot of the registration and management information to the standby module after each operation management step. When the active module fails, the integrated management machine is notified through the monitoring service and the standby module is promoted to active; operation management commands from the integrated management machine are then sent to the new active domain management module. Once the new active module detects by heartbeat that a new standby module is online, it begins pushing the retained registration and management information snapshot to the new standby module.
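The active/standby behaviour described above can be illustrated with a small in-process model. This is a sketch under assumptions: the class `DomainManager`, its methods, and the boolean `alive` flag (standing in for a real heartbeat over the backplane) are hypothetical names introduced only for illustration.

```python
import copy


class DomainManager:
    """Hypothetical model of one domain management module on a switch card."""

    def __init__(self, name, role):
        self.name = name
        self.role = role      # "active" or "standby"
        self.state = {}       # registration and management information
        self.peer = None      # the other domain management module, if online
        self.alive = True     # stands in for "answers heartbeats"

    def execute(self, key, value):
        """Run one operation management step, then push a snapshot to the standby."""
        if self.role != "active":
            raise RuntimeError("commands must be sent to the active module")
        self.state[key] = value
        self.push_snapshot()

    def push_snapshot(self):
        """Copy the full registration and management state to a live peer."""
        if self.peer is not None and self.peer.alive:
            self.peer.state = copy.deepcopy(self.state)

    def heartbeat_check(self):
        """Standby side: promote to active when the active peer stops responding."""
        if self.role == "standby" and self.peer is not None and not self.peer.alive:
            self.role = "active"
            self.peer = None
            return True
        return False

    def attach_standby(self, standby):
        """Active side: a new standby came online, so push the retained snapshot."""
        standby.role = "standby"
        self.peer, standby.peer = standby, self
        self.push_snapshot()
```

A typical sequence is: attach B as standby of A, run commands on A, mark A as failed, call `heartbeat_check()` on B to promote it, then attach a new standby to B so that it receives the retained snapshot.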
As shown in fig. 4, the main flow of system initialization is:
1. The integrated terminal chassis is powered on
2. Each board card in the chassis starts
1) Switch card A startup process
① starting the bootstrap program;
② starting ReWorks;
③ starting the domain manager and waiting for device managers to register;
④ starting the health management service and waiting for health monitoring services to register;
2) switch card B starts, following the same process as switch card A;
3) Signal processing card A startup process
① starting the bootstrap program;
② starting ReWorks-dsp on core 0
i. starting the device manager and registering it with the domain managers of switch card A and switch card B respectively;
ii. starting the health monitoring service and registering it with the health management services of switch card A and switch card B respectively;
iii. waking up cores 1 to 3;
③ starting ReWorks-dsp on cores 1 to 3, with the same startup process on each core
i. starting the device manager and registering it with the domain managers of switch card A and switch card B respectively;
ii. starting the health monitoring service and establishing a connection with the health monitoring service of core 0;
4) signal processing card B starts, following the same process as signal processing card A;
5) Initializing the integrated management system
① initializing the application deployment and operation management module and establishing connections with the domain managers of switch card A and switch card B;
② initializing the system resource and state monitoring module and establishing connections with the health management services of switch card A and switch card B;
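The key point of the signal processing card startup is that every device manager registers with the domain managers of both switch cards, so that either switch card holds complete registration information. A minimal sketch of that dual registration follows; `DomainManagerStub`, `boot_signal_card`, and the identifier format are hypothetical names chosen for illustration.

```python
class DomainManagerStub:
    """Hypothetical domain manager that only records device registrations."""

    def __init__(self, name):
        self.name = name
        self.devices = []

    def register(self, device_id):
        self.devices.append(device_id)


def boot_signal_card(card, domain_managers, cores=4):
    """Start the device manager on each core and register it with every domain manager.

    Core 0 comes first because it wakes up the remaining cores.
    """
    started = []
    for core in range(cores):
        device_id = f"{card}/core{core}"
        for dm in domain_managers:   # switch card A and switch card B
            dm.register(device_id)
        started.append(device_id)
    return started
```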
As shown in fig. 5, the main flow of application deployment is:
1. Installation
1) the application deployment and operation management module sends an installation instruction to the domain manager of switch card A;
2) the domain manager of switch card A copies the waveform application software package to local storage;
3) the application deployment and operation management module sends an installation instruction to the domain manager of switch card B;
4) the domain manager of switch card B copies the waveform application software package to local storage;
2. Deployment
1) the application deployment and operation management module sends a deployment instruction to the domain manager of switch card A;
2) the domain manager of switch card A parses the configuration of the waveform application software package and dispatches the components to the corresponding nodes for loading;
3) the application deployment and operation management module sends a deployment instruction to the domain manager of switch card B;
4) the domain manager of switch card B parses the configuration of the waveform application software package and dispatches the components to the corresponding nodes for loading;
3. Startup
1) the application deployment and operation management module sends a start instruction to the domain manager of switch card A;
2) the domain manager of switch card A notifies the device managers of the corresponding nodes to start the related components;
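The install, deploy, and start sequence above can be sketched as follows. Note the asymmetry in the flow: both domain managers install and deploy the package, but only switch card A receives the start instruction. The class `CardDomainManager`, the function `deploy_waveform`, and the per-card component-to-node maps are assumptions made for this sketch.

```python
class CardDomainManager:
    """Hypothetical domain manager on one switch card."""

    def __init__(self, name):
        self.name = name
        self.packages = {}    # package name -> {component: node}
        self.loaded = {}      # node -> components loaded on it
        self.running = set()  # components that have been started

    def install(self, package, config):
        # Copy the waveform application package (here: its configuration) locally.
        self.packages[package] = dict(config)

    def deploy(self, package):
        # Parse the configuration and dispatch each component to its node.
        for component, node in self.packages[package].items():
            self.loaded.setdefault(node, []).append(component)

    def start(self, package):
        # Ask the device managers of the nodes to start the components.
        self.running.update(self.packages[package].keys())


def deploy_waveform(dm_a, dm_b, package, config_a, config_b):
    """Install and deploy on both switch cards, then start on switch card A only."""
    dm_a.install(package, config_a)
    dm_b.install(package, config_b)
    dm_a.deploy(package)
    dm_b.deploy(package)
    dm_a.start(package)
```

Leaving the copy managed by switch card B loaded but not started is what lets it act as the standby waveform application in the fault flows that follow.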
The fault reconstruction process design covers signal processing card faults and switch card faults. Signal processing card faults are further divided into non-zero-core fault reconstruction and zero-core fault reconstruction.
Fig. 6 shows a schematic diagram of non-zero-core fault reconstruction on a signal processing card. The main flow is:
1. Fault detection
1) the health monitoring service on core 0 of the affected signal processing card detects the failed core and resets it;
2) it notifies the health management services of switch card A and switch card B;
2. Fault reconstruction
1) the domain manager of switch card A activates the standby waveform application, which takes over the front-end digital signal processing service;
2) the domain manager of switch card B redeploys the waveform application to core 3 as a spare;
3. Status update
1) the health management service of switch card A reports the state information after fault reconstruction to the system resource and state monitoring module of the integrated management system;
2) the health management service of switch card B reports the state information after fault reconstruction to the system resource and state monitoring module of the integrated management system;
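The non-zero-core flow can be traced with a small role map for one signal processing card. This sketch assumes the failed core was hosting the active waveform application and that the spare core differs from the core holding the standby; the function name, the role labels, and the event strings are hypothetical.

```python
def reconstruct_nonzero_core(roles, failed_core, spare_core=3):
    """Return (new_roles, events) after a fault on a non-zero core.

    roles maps core number -> "active", "standby" or "idle".
    """
    if failed_core == 0:
        raise ValueError("a core 0 fault follows the zero-core flow instead")
    if roles.get(failed_core) != "active":
        raise ValueError("this sketch only covers a fault on the active core")
    standby = [core for core, role in roles.items() if role == "standby"]
    if not standby:
        raise RuntimeError("no standby waveform application is deployed")
    if standby[0] == spare_core:
        raise ValueError("the spare core must differ from the standby core")

    new_roles = dict(roles)
    events = []
    # 1. Fault detection: core 0 resets the failed core and notifies both switch cards.
    new_roles[failed_core] = "idle"
    events.append(f"core 0 health monitoring resets core {failed_core}")
    events.append("notify health management on switch cards A and B")
    # 2. Fault reconstruction: A activates the standby, B prepares a new spare.
    new_roles[standby[0]] = "active"
    events.append(f"switch card A activates standby waveform on core {standby[0]}")
    new_roles[spare_core] = "standby"
    events.append(f"switch card B redeploys waveform to core {spare_core} as spare")
    # 3. Status update towards the integrated management system.
    events.append("switch cards A and B report the new state")
    return new_roles, events
```

For example, starting from core 1 active, core 2 standby, and core 3 idle, a fault on core 1 leaves core 2 active and core 3 as the new standby.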
Fig. 7 shows a schematic diagram of zero-core fault reconstruction on a signal processing card. The main flow is:
1. Fault detection
1) the health management service of switch card A detects that signal processing card A has failed and sends a reset instruction for signal processing card A;
2) the health management service of switch card B likewise detects that signal processing card A has failed and sends a reset instruction for signal processing card A;
2. Fault reconstruction
1) the health management service of switch card A instructs its local domain manager to redeploy the waveform application to core 1 of signal processing card B and to start (activate) it, so that it takes over the digital signal processing service;
2) the health management service of switch card B instructs its local domain manager to redeploy the waveform application to core 2 of signal processing card B as a standby;
3. Status update
1) the health management service of switch card A reports the state information after fault reconstruction to the system resource and state monitoring module of the integrated management system;
2) the health management service of switch card B reports the state information after fault reconstruction to the system resource and state monitoring module of the integrated management system;
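The text does not say how the switch cards detect a core 0 failure. One plausible mechanism, assumed here purely for illustration, is a heartbeat timeout on the health monitoring service registered from core 0. The sketch below pairs such a detector with the recovery steps listed above; `HeartbeatMonitor`, `plan_zero_core_recovery`, and the timeout value are hypothetical.

```python
class HeartbeatMonitor:
    """Hypothetical timeout-based detector run by a health management service."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}   # card -> time of the last heartbeat from its core 0

    def beat(self, card, now):
        self.last_seen[card] = now

    def failed_cards(self, now):
        # A card is considered failed when its core 0 has been silent too long.
        return sorted(card for card, seen in self.last_seen.items()
                      if now - seen > self.timeout)


def plan_zero_core_recovery(failed_card, healthy_card):
    """List the (actor, action) steps of the zero-core flow in order."""
    return [
        ("switch card A", f"reset {failed_card}"),
        ("switch card B", f"reset {failed_card}"),
        ("switch card A", f"redeploy and activate waveform on {healthy_card}/core1"),
        ("switch card B", f"redeploy standby waveform on {healthy_card}/core2"),
        ("switch card A", "report state to integrated management system"),
        ("switch card B", "report state to integrated management system"),
    ]
```

Passing the current time explicitly keeps the detector deterministic, which makes the timeout logic easy to test without real clocks.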
Fig. 8 shows a schematic diagram of switch card fault reconstruction. The main flow is:
1. Fault detection
1) the health management service of switch card B detects that switch card A has failed and resets switch card A;
2. Fault reconstruction
1) the health management service of switch card B instructs its local domain manager to start the standby waveform application;
2) the health management service of switch card B resets the signal processing card cores associated with the application deployed through switch card A;
3. Status update
1) the health management service of switch card B reports the state information after fault reconstruction to the system resource and state monitoring module of the integrated management system.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (8)

1. A high availability expansion system based on SCA framework, comprising:
a standby domain management module: adding a main control node as a hot standby, which serves as the standby domain management for the original main control node;
an operation management redundancy module: allowing two sets of master-control dedicated services to back each other up, wherein the master-control dedicated services comprise a resource configuration service, an application management service and a deployment reconstruction service;
an operation management monitoring module: adding an operation monitoring service and a health management service to monitor and handle faults, and adding a fault reconstruction mechanism on the basis of the on-demand reconstruction of the deployment reconstruction service;
the signal processing card comprises a core 0, a core 1, a core 2 and a core 3;
the fault reconstruction mechanism includes:
A. a non-core-0 fault reconstruction process of the signal processing card, comprising the following steps:
a1, Fault detection
The health monitoring service on core 0 of the affected signal processing card detects the failed core and resets it;
the health management services of the switch card A and the switch card B are notified;
a2, Fault reconstruction
The domain manager of the switch card A activates the standby waveform application, which takes over the front-end digital signal processing service;
the domain manager of the switch card B redeploys the waveform application to the core 3 as a spare;
a3, status update
The health management service of the switch card A updates the state information after the fault reconstruction to a system resource and state monitoring module of the integrated management system;
the health management service of the switch card B updates the state information after the fault reconstruction to a system resource and state monitoring module of the integrated management system;
B. a core 0 fault reconstruction process of the signal processing card:
b1, Fault detection
The health management service of the switch card A detects the signal processing card A with a fault and sends a reset instruction of the signal processing card A;
the health management service of the switch card B also detects the signal processing card A with the fault and sends a reset instruction of the signal processing card A;
b2, Fault reconstruction
The health management service of the switch card A instructs the local domain manager to redeploy the waveform application to the core 1 of the signal processing card B and to start (activate) the waveform application, which takes over the digital signal processing service;
the health management service of the switch card B instructs the local domain manager to redeploy the waveform application to the core 2 of the signal processing card B as a standby;
b3, status update
The health management service of the switch card A updates the state information after the fault reconstruction to a system resource and state monitoring module of the comprehensive management system;
the health management service of the switch card B updates the state information after the fault reconstruction to a system resource and state monitoring module of the comprehensive management system;
C. the fault reconstruction process of the switch card comprises the following steps:
c1, Fault detection
The health management service of the switch card B detects that the switch card A has a fault and resets the switch card A;
c2, Fault reconstruction
The health management service of the switch card B indicates a local domain manager to start a standby waveform application;
the health management service of the switch card B resets the core of the signal processing card which is deployed on the switch card A and is related to the application;
c3, status update
The health management service of the switch card B updates the state information after the fault reconstruction to a system resource and state monitoring module of the comprehensive management system;
the standby domain management means that, when the original main control node can no longer perform active domain management, the added main control node is started to take over the work of the original main control node and run the support services, wherein the support services comprise a resource configuration service, a deployment reconstruction service and an application management service, and all hardware resources and software components keep their registration information and management information in both the added main control node and the original main control node;
when the active module fails, the integrated management machine is notified through the monitoring service and the backup module is promoted to active module; the integrated management machine sends operation management commands for the system to the new active domain management module, and after the new active domain management module detects by heartbeat that a new backup module is online, it pushes the retained registration and management information snapshot to the new backup module.
2. The SCA framework based high availability expansion system of claim 1, further comprising a deployment middleware module: modifying the underlying communication mechanism of the deployment management middleware to reduce the coupling between the two communicating ends.
3. The SCA framework based high availability expansion system according to claim 2, wherein the deployment middleware module comprises:
a protocol modification module: the underlying communication mechanism of the CORBA middleware adopts the connectionless UDP protocol, so that the two communicating parties are peers;
a link detection module: a heartbeat mechanism or a BIT hardware detection mechanism is adopted so that the two communicating parties can perceive the link condition and each other's state, and when a failure of the link or of the other party is perceived, the CORBA middleware reclaims resources and sends a message notification.
4. The SCA framework based high availability expansion system as claimed in claim 1, wherein hardware abstraction layer communication based on a mapping table is adopted to encapsulate a standardized communication interface and shield the underlying communication mechanism, thereby separating the communication between waveform components from the specific hardware platform, and the underlying communication access interface of the waveform components is kept consistent so that the waveform components can be ported between heterogeneous hardware platforms.
5. A high-availability expansion method based on an SCA framework is characterized by comprising the following steps:
a standby domain management step: adding a main control node as a hot standby, which serves as the standby domain management for the original main control node;
an operation management redundancy step: allowing two sets of master-control dedicated services to back each other up, wherein the master-control dedicated services comprise a resource configuration service, an application management service and a deployment reconstruction service;
an operation management monitoring step: adding an operation monitoring service and a health management service to monitor and handle faults, and adding a fault reconstruction mechanism on the basis of the on-demand reconstruction of the deployment reconstruction service;
the signal processing card comprises a core 0, a core 1, a core 2 and a core 3;
the fault reconstruction mechanism includes:
A. a non-core-0 fault reconstruction process of the signal processing card, comprising the following steps:
a1, Fault detection
The health monitoring service on core 0 of the affected signal processing card detects the failed core and resets it;
the health management services of the switch card A and the switch card B are notified;
a2, Fault reconstruction
The domain manager of the switch card A activates the standby waveform application, which takes over the front-end digital signal processing service;
the domain manager of the switch card B redeploys the waveform application to the core 3 as a spare;
a3, status update
The health management service of the switch card A updates the state information after the fault reconstruction to a system resource and state monitoring module of the integrated management system;
the health management service of the switch card B updates the state information after the fault reconstruction to a system resource and state monitoring module of the integrated management system;
B. a core 0 fault reconstruction process of the signal processing card:
b1, Fault detection
The health management service of the switch card A detects the signal processing card A with a fault and sends a reset instruction of the signal processing card A;
the health management service of the switch card B also detects the signal processing card A with the fault and sends a reset instruction of the signal processing card A;
b2, Fault reconstruction
The health management service of the switch card A instructs the local domain manager to redeploy the waveform application to the core 1 of the signal processing card B and to start (activate) the waveform application, which takes over the digital signal processing service;
the health management service of the switch card B instructs the local domain manager to redeploy the waveform application to the core 2 of the signal processing card B as a standby;
b3, status update
The health management service of the switch card A updates the state information after the fault reconstruction to a system resource and state monitoring module of the comprehensive management system;
the health management service of the switch card B updates the state information after the fault reconstruction to a system resource and state monitoring module of the comprehensive management system;
C. the fault reconfiguration process of the switch card comprises the following steps:
c1, Fault detection
The health management service of the switch card B detects that the switch card A has a fault and resets the switch card A;
c2, Fault reconstruction
The health management service of the switch card B indicates a local domain manager to start a standby waveform application;
the health management service of the switch card B resets the core of the signal processing card which is deployed on the switch card A and is related to the application;
c3, status update
The health management service of the switch card B updates the state information after the fault reconstruction to a system resource and state monitoring module of the comprehensive management system;
the standby domain management means that, when the original main control node can no longer perform active domain management, the added main control node is started to take over the work of the original main control node and run the support services, wherein the support services comprise a resource configuration service, a deployment reconstruction service and an application management service, and all hardware resources and software components keep their registration information and management information in both the added main control node and the original main control node;
when the active module fails, the integrated management machine is notified through the monitoring service and the backup module is promoted to active module; the integrated management machine sends operation management commands for the system to the new active domain management module, and after the new active domain management module detects by heartbeat that a new backup module is online, it pushes the retained registration and management information snapshot to the new backup module.
6. The SCA framework based high availability expansion method according to claim 5, further comprising a middleware deployment step: modifying the underlying communication mechanism of the deployment management middleware to reduce the coupling between the two communicating ends.
7. The SCA framework based high availability expansion method of claim 6, wherein the middleware deployment step comprises:
a protocol modification step: the underlying communication mechanism of the CORBA middleware adopts the connectionless UDP protocol, so that the two communicating parties are peers;
a link detection step: a heartbeat mechanism or a BIT hardware detection mechanism is adopted so that the two communicating parties can perceive the link condition and each other's state, and when a failure of the link or of the other party is perceived, the CORBA middleware reclaims resources and sends a message notification.
8. The SCA framework based high availability expansion method of claim 5, wherein hardware abstraction layer communication based on a mapping table is adopted to encapsulate a standardized communication interface and shield the underlying communication mechanism, thereby separating the communication between waveform components from the specific hardware platform, and the underlying communication access interface of the waveform components is kept consistent so that the waveform components can be ported between heterogeneous hardware platforms.
CN202010129227.1A 2020-02-28 2020-02-28 High-availability extension system and method based on SCA framework Active CN111447079B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010129227.1A CN111447079B (en) 2020-02-28 2020-02-28 High-availability extension system and method based on SCA framework

Publications (2)

Publication Number Publication Date
CN111447079A CN111447079A (en) 2020-07-24
CN111447079B true CN111447079B (en) 2022-08-16

Family

ID=71653925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010129227.1A Active CN111447079B (en) 2020-02-28 2020-02-28 High-availability extension system and method based on SCA framework

Country Status (1)

Country Link
CN (1) CN111447079B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115134767B (en) * 2021-03-11 2024-02-09 上海大唐移动通信设备有限公司 Method, device and storage medium for improving performance of signaling soft acquisition equipment
CN113360136B (en) * 2021-05-31 2023-11-03 成都谐盈科技有限公司 SCA core framework control interface based implementation method
CN116225812B (en) * 2023-05-08 2023-08-04 山东云海国创云计算装备产业创新中心有限公司 Baseboard management controller system operation method, device, equipment and storage medium
CN116614388B (en) * 2023-07-14 2023-09-22 成都谐盈科技有限公司 Method and terminal for realizing domain manager model based on software communication system structure

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103152414A (en) * 2013-03-01 2013-06-12 四川省电力公司信息通信公司 High available system based on cloud calculation and implementation method thereof

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106790626B (en) * 2016-12-31 2020-07-21 广州佳都城轨智慧运维服务有限公司 Reliable distributed alarm implementation method
CN109194497B (en) * 2018-07-17 2021-07-16 中国航空无线电电子研究所 Dual SRIO network backup system for software-oriented radio system
CN109254757B (en) * 2018-07-17 2021-09-24 中国航空无线电电子研究所 Software communication architecture for dual core framework


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant