CN112511394B - Management and maintenance method of RapidIO bus system - Google Patents

Info

Publication number
CN112511394B
Authority
CN
China
Prior art keywords
rapidio
host node
processing unit
main
management
Legal status
Active
Application number
CN202011227054.3A
Other languages
Chinese (zh)
Other versions
CN112511394A (en)
Inventor
邓豹
赵谦
陈颖图
樊超
颜丰琳
Current Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN202011227054.3A
Publication of CN112511394A
Application granted
Publication of CN112511394B
Legal status: Active (current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40 Bus networks
    • H04L12/40006 Architecture of a communication node
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting

Abstract

The invention discloses a management and maintenance method for a RapidIO bus system. The method combines three elements. The first is a two-level management strategy that is centralized at board level and distributed at system level. The second is two management and maintenance mechanisms, RapidIO out-of-band operations and in-band operations. The third is hot-backup monitoring of Host nodes. Together they provide reliable online management and maintenance of the RapidIO bus system and support real-time monitoring, fault isolation, fault recovery and related functions. The embodiment of the invention addresses several problems of RapidIO bus networks in embedded signal processing systems: difficult configuration and management, large traffic bursts, frequent data transmission conflicts, and strict real-time requirements.

Description

Management and maintenance method of RapidIO bus system
Technical Field
The present invention relates to, but is not limited to, the field of embedded signal processing technologies, and in particular, to a method for managing and maintaining a RapidIO bus system.
Background
The RapidIO bus, a standard interconnect bus, is widely used in embedded signal processing. Managing and maintaining a complex RapidIO bus network is an important part of embedded signal processing system design.
However, the RapidIO bus network in an embedded signal processing system is difficult to configure and manage. It also carries large traffic bursts, suffers frequent data transmission conflicts, and must meet strict real-time requirements.
The embodiment of the invention provides a reliable, online, real-time management and maintenance method for a RapidIO bus system. The method effectively addresses the stability of multi-task, hard-real-time, high-volume data transmission in an embedded signal processing system.
Disclosure of Invention
Purpose of the invention: the embodiment of the invention provides a management and maintenance method for a RapidIO bus system. It is intended to solve the problems of RapidIO bus networks in embedded signal processing systems: difficult configuration and management, large traffic bursts, frequent data transmission conflicts, and strict real-time requirements.
The technical scheme of the invention is as follows:
The embodiment of the invention provides a management and maintenance method for a RapidIO bus system. The RapidIO bus system is structured as follows.
A plurality of RapidIO processing units are connected through a RapidIO bus to form the RapidIO bus system. Each RapidIO processing unit comprises a main Host node, a backup Host node, a RapidIO switch, and other processing nodes with RapidIO interfaces. One RapidIO processing unit serves as the main processing unit, and the others serve as slave processing units.
In each processing unit, the main Host node, the backup Host node and the RapidIO switch are each provided with a management maintenance interface and a RapidIO interface. The other processing nodes are provided with RapidIO interfaces.
The main Host node and the backup Host node are each connected to the RapidIO switch through the management maintenance interface. The RapidIO interfaces of the main Host node, the backup Host node and the other processing nodes are each connected to a RapidIO interface of the RapidIO switch. The externally exposed RapidIO interfaces of each unit's RapidIO switch connect to the other processing units, which forms the RapidIO bus system.
The management and maintenance method performed on the RapidIO bus system comprises the following steps:
Step 1: In each processing unit, the main Host node performs the initial configuration of the unit's RapidIO network. The configuration items include the RapidIO device IDs, the communication rate, and the link width (number of lanes). The main Host node of the main processing unit also configures the communication routes of the whole RapidIO bus system.
Step 2: In each processing unit, the main Host node configures the unit's RapidIO switch through software. After configuration, when an important fault event occurs on a port of the RapidIO switch, the switch reports the fault state to the main Host node through an interrupt signal or a Port-Write maintenance packet.
Step 3: In each processing unit, the main Host node monitors events within the unit in two classes, conventional events and important events. Important events include fault events, which are divided into conventional fault events and important fault events.
Step 4: Handle conventional fault events. When the RapidIO interface of the switch, or of a node other than the main Host node, fails, the main Host node maintains and configures the RapidIO switch registers through RapidIO maintenance packets. When the RapidIO interface of the main Host node itself fails, the failed interface is recovered through the main Host node's management maintenance interface.
Step 5: Handle important fault events. The main Host node reports the fault to the main Host node of the main processing unit through a RapidIO maintenance packet. The main Host node of the main processing unit then performs system-level fault handling through RapidIO maintenance packets. This handling consists of broadcasting the fault transaction, blocking the failed link, discarding faulty packets, or reconfiguring the communication routes.
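Step 1 can be pictured as a configuration plan that the main Host node checks before writing it to the hardware. The minimal Python sketch below is an illustration only, not the patented implementation. The `LinkConfig` record, the `check_step1` and `unit_route_table` helpers, the node names and the allowed rate and width sets are all hypothetical. The sketch first checks that device IDs are unique across the whole system and that rate and width values are plausible. It then builds a simplified destination-ID-to-port table for one unit's switch, in which every non-local device ID leaves through a single uplink port.

```python
from dataclasses import dataclass

# Hypothetical parameter sets; the real values depend on the endpoint and switch silicon.
ALLOWED_RATES_GBAUD = {1.25, 2.5, 3.125, 5.0, 6.25}
ALLOWED_WIDTHS = {1, 2, 4}


@dataclass(frozen=True)
class LinkConfig:
    unit: str          # processing unit that owns the node
    node: str          # node name inside the unit (hypothetical naming)
    device_id: int     # RapidIO device ID assigned in step 1
    rate_gbaud: float  # per-lane communication rate
    width: int         # link width in lanes (1x, 2x, 4x)


def check_step1(configs: list) -> list:
    """Return human-readable problems; an empty list means the plan looks consistent."""
    problems = []
    owner = {}
    for c in configs:
        key = f"{c.unit}/{c.node}"
        if c.device_id in owner:
            problems.append(f"{key}: device ID {c.device_id:#x} already used by {owner[c.device_id]}")
        else:
            owner[c.device_id] = key
        if c.rate_gbaud not in ALLOWED_RATES_GBAUD:
            problems.append(f"{key}: unsupported rate {c.rate_gbaud} Gbaud")
        if c.width not in ALLOWED_WIDTHS:
            problems.append(f"{key}: unsupported width {c.width}x")
    return problems


def unit_route_table(unit: str, configs: list, local_port: dict, uplink_port: int) -> dict:
    """Destination ID -> output port for one unit's switch.

    Local nodes map to the switch port they are attached to; every other device ID
    is sent out through a single uplink port (a deliberate simplification).
    """
    return {
        c.device_id: local_port[c.node] if c.unit == unit else uplink_port
        for c in configs
    }
```

Only the main Host node of the main processing unit would build tables that cover the whole system. A board-level main Host node could restrict the same check to its own unit.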
Optionally, in the above management and maintenance method for a RapidIO bus system, in step 3,
conventional events are monitored as follows. The main Host node periodically polls the RapidIO switch registers over the management maintenance interface inside the processing unit. In this way it obtains the unit's RapidIO network state in real time, including link status, traffic, and error rate.
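The polling loop described above can be sketched minimally as follows. The sketch assumes an out-of-band register reader, `read_port`, which is hypothetical; on the embodiment's hardware it would sit on top of I2C accesses to the switch. The register fields and the counter layout are also assumptions, and counter wrap-around is ignored for brevity. Because a switch typically exposes cumulative counters, the sketch derives traffic and error rate from the difference between consecutive samples.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PortSample:
    """One out-of-band register read of a switch port (hypothetical field set)."""
    port: int
    link_up: bool
    packets: int  # cumulative packet counter
    errors: int   # cumulative error counter


def poll_once(read_port: Callable[[int], PortSample], ports: List[int],
              prev: Dict[int, PortSample]) -> Dict[int, dict]:
    """Read every port once and turn cumulative counters into per-interval figures."""
    report = {}
    for p in ports:
        cur = read_port(p)
        old = prev.get(p)
        d_pkts = cur.packets - old.packets if old is not None else 0
        d_errs = cur.errors - old.errors if old is not None else 0
        report[p] = {
            "link_up": cur.link_up,
            "traffic": d_pkts,  # packets since the previous poll
            "error_rate": d_errs / d_pkts if d_pkts else 0.0,  # errors per packet
        }
        prev[p] = cur
    return report


def poll_forever(read_port, ports: List[int], period_s: float, on_report, max_cycles=None) -> None:
    """Periodic polling loop; max_cycles=None runs until the process stops."""
    prev: Dict[int, PortSample] = {}
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        on_report(poll_once(read_port, ports, prev))
        cycle += 1
        time.sleep(period_s)
```

The polling period sets the trade-off the description mentions between management overhead and how quickly a slow degradation, such as a rising error rate, becomes visible.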
Optionally, in the above management and maintenance method for a RapidIO bus system, in step 3,
important events are monitored as follows. The main Host node receives the interrupt signal or Port-Write maintenance packet from the RapidIO switch. It then parses the Port-Write maintenance packet and determines the fault type.
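The Port-Write path can be sketched as a small decoder that turns the packet payload into a port number and a list of fault names. The three-word layout and the bit assignments below are hypothetical placeholders, chosen only to show the parse-then-classify flow. A real implementation must follow the Port-Write format defined by the RapidIO specification and the switch datasheet. The only property the sketch borrows from RapidIO is big-endian byte order.

```python
import struct

# Hypothetical bit assignments in the error-detect word; they are NOT the real
# RapidIO CSR layout, which must be taken from the specification and the switch datasheet.
FAULT_BITS = {
    0x1: "link_down",      # disconnection
    0x2: "port_blocked",   # blockage
    0x4: "retry_stopped",  # retransmission stopped
    0x8: "crc_error",
}


def parse_port_write(payload: bytes) -> dict:
    """Decode a simplified, hypothetical 3-word Port-Write payload (big-endian)."""
    if len(payload) < 12:
        raise ValueError("Port-Write payload too short")
    tag, detect, port_word = struct.unpack(">III", payload[:12])
    return {
        "component_tag": tag,      # identifies the reporting switch
        "port": port_word & 0xFF,  # port on which the event occurred
        "faults": [name for bit, name in FAULT_BITS.items() if detect & bit],
    }
```

On the interrupt path, the Host node would receive no payload. Under the same assumptions, it would instead read the equivalent status registers of the switch and feed them to the same classification step.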
Optionally, in the above management and maintenance method for a RapidIO bus system, after step 5, the method further includes:
Step 6: Record events. The main Host node records the events of its processing unit. It reports events that affect other processing units to the main Host node of the main processing unit through a RapidIO maintenance packet. The main Host node of the main processing unit decides, according to the system running state, whether to discard, record or broadcast each event.
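A minimal sketch of step 6 follows. It uses in-process callables in place of RapidIO maintenance packets. The `Scope` split follows the examples given in the detailed description: traffic, packet loss and retransmission states stay local, while disconnection, connection and link blockage are reported upward. The source only says that the main processing unit decides "according to the system running state". `system_decision` is therefore a hypothetical policy, not the patented one, and the event names are illustrative.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Scope(Enum):
    LOCAL = "local"    # e.g. traffic, packet-loss, retransmission state
    SYSTEM = "system"  # e.g. disconnection, connection, link blockage


@dataclass
class Event:
    unit: str
    kind: str
    scope: Scope


@dataclass
class BoardHost:
    unit: str
    log: List[Event] = field(default_factory=list)

    def handle(self, ev: Event, send_to_system_host: Callable[[Event], None]) -> None:
        self.log.append(ev)          # every event is recorded locally
        if ev.scope is Scope.SYSTEM:
            send_to_system_host(ev)  # stands in for a RapidIO maintenance packet


def system_decision(ev: Event, system_state: str) -> str:
    """Hypothetical policy for the main processing unit: discard, record, or broadcast."""
    if system_state == "initializing":
        return "discard"    # links are expected to flap while the network comes up
    if ev.kind in {"link_down", "link_blocked"}:
        return "broadcast"  # other units may need to stop using the affected path
    return "record"
```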
Optionally, in the management and maintenance method of a RapidIO bus system, the method further includes:
Step 7: In each processing unit, the main Host node periodically sends heartbeats to the backup Host node. If the heartbeats stop, the backup Host node becomes the main Host node and takes over management of the processing unit in the RapidIO bus system. When management of a processing unit changes hands, the change is reported to the main Host node of the main processing unit through a maintenance packet. When management of the main processing unit itself changes hands, the change is broadcast to the slave processing units in the RapidIO bus system through maintenance packets.
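Step 7 amounts to a heartbeat watchdog on the backup Host node. The sketch below makes several assumptions. The caller supplies timestamps, for example from `time.monotonic()`. A single missed window triggers takeover. Notifications are queued rather than sent. The class name, the timeout and the notice strings are hypothetical. A real design would probably require several missed periods before taking over, to avoid a spurious switchover.

```python
import time
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BackupHost:
    unit: str
    timeout_s: float    # allowed gap between heartbeats (hypothetical value)
    system_level: bool  # True on the main processing unit
    last_beat: float = field(default_factory=time.monotonic)
    is_main: bool = False
    notices: List[Tuple[str, str]] = field(default_factory=list)

    def on_heartbeat(self, now: float) -> None:
        self.last_beat = now

    def check(self, now: float) -> bool:
        """Call periodically; returns True only on the call that performs the takeover."""
        if self.is_main or now - self.last_beat <= self.timeout_s:
            return False
        self.is_main = True
        if self.system_level:
            # System backup Host: tell every slave unit who now manages the system.
            self.notices.append(("broadcast_to_slave_units", self.unit))
        else:
            # Board-level backup Host: tell the system main Host.
            self.notices.append(("report_to_system_main_host", self.unit))
        return True
```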
Optionally, in the above management and maintenance method for a RapidIO bus system, before step 1, the method further includes:
designating, according to a configuration file, one RapidIO processing unit in the RapidIO bus system as the main processing unit and the other RapidIO processing units as slave processing units; and
determining the main Host node and the backup Host node in each processing unit by contention (preemption).
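The description does not say how the contention for the main Host role is arbitrated. One plausible mechanism, offered here only as an assumption, is a first-writer-wins lock register modeled loosely on the RapidIO Host Base Device ID Lock CSR. Each candidate writes its own ID and reads the register back, and it becomes main only if it reads its own ID. The `HostLock` class simulates such a register in memory, and its mutex stands in for the atomicity of a single register write.

```python
import threading


class HostLock:
    """In-memory model of a first-writer-wins lock register (assumed semantics)."""
    UNLOCKED = 0xFFFF

    def __init__(self) -> None:
        self._value = self.UNLOCKED
        self._mutex = threading.Lock()  # models the atomicity of one register write

    def write(self, host_id: int) -> None:
        with self._mutex:
            if self._value == self.UNLOCKED:
                self._value = host_id        # first writer claims the lock
            elif self._value == host_id:
                self._value = self.UNLOCKED  # owner writing its own ID releases it
            # writes from any other ID are ignored while the lock is held

    def read(self) -> int:
        with self._mutex:
            return self._value


def contend(lock: HostLock, host_id: int) -> str:
    """Write own ID, read back; the node that sees its own ID becomes the main Host."""
    lock.write(host_id)
    return "main" if lock.read() == host_id else "backup"
```

Whatever order the two candidates run in, exactly one of them reads back its own ID.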
Optionally, in the above management and maintenance method for a RapidIO bus system, the RapidIO bus system is managed and maintained in two modes. The first is out-of-band operations over the management maintenance interface, and the second is in-band maintenance operations over the RapidIO interface. A two-level management approach is adopted, centralized at board level and distributed at system level. The Host nodes at both board level and system level are implemented in a "main-backup" hot-backup mode.
Optionally, in the above management and maintenance method for a RapidIO bus system, the management maintenance interface is one of PCIe, I2C and JTAG;
the RapidIO switch in a RapidIO processing unit is implemented by cascading multiple RapidIO switch chips.
The invention has the advantages that:
The management and maintenance method provided by the embodiment of the invention covers the following aspects:
(1) a method for real-time state monitoring, fast fault recovery, and isolation of unrecoverable faults in a RapidIO bus system;
(2) a hierarchical management and maintenance architecture for complex RapidIO bus systems;
(3) a management and maintenance strategy that combines RapidIO in-band and out-of-band operations;
(4) a reliable "main-backup" design for Host management nodes.
Together these strengthen the robustness of the RapidIO bus network in embedded signal processing systems. They also enable stable transmission of multi-task, hard-real-time, bursty high-volume data. Specifically, the method has the following advantages:
(1) The two-level management and maintenance strategy, centralized at board level and distributed at system level, handles RapidIO bus faults by layer, by category and by responsible owner. This lowers the management and maintenance cost of the RapidIO bus system and improves management and maintenance efficiency.
(2) Combining RapidIO in-band and out-of-band operations provides periodic real-time monitoring of routine transactions and fast handling of urgent transactions in the RapidIO network. This effectively balances management and maintenance overhead against real-time fault response.
(3) Hot-backup monitoring with main and backup Host nodes effectively improves the reliability of management and monitoring of the RapidIO bus system.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and constitute a part of this specification. Together with the embodiments, they serve to explain the principles of the invention and do not limit it.
FIG. 1 is a schematic structural diagram of a RapidIO bus system that performs the management and maintenance method in the embodiment of the present invention;
FIG. 2 is a block diagram of a specific embodiment of the RapidIO bus system according to the embodiment of the present invention;
FIG. 3 is a functional operation block diagram of a specific embodiment of the RapidIO bus system according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
The steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Moreover, although the flowcharts show a logical order, in some cases the steps shown or described may be performed in a different order.
The invention provides a management and maintenance method for a RapidIO bus system. The method combines a two-level management strategy that is centralized at board level and distributed at system level. It also uses two management and maintenance mechanisms, RapidIO out-of-band and in-band operations, and hot-backup monitoring of Host nodes. It provides reliable online management and maintenance of the RapidIO bus system and supports real-time monitoring, fault isolation, fault recovery and related functions.
The following specific embodiments of the present invention may be combined, and the same or similar concepts or processes may not be described in detail in some embodiments.
FIG. 1 is a schematic structural diagram of a RapidIO bus system that performs the management and maintenance method in the embodiment of the present invention. The RapidIO bus system comprises a plurality of RapidIO processing units interconnected through a RapidIO bus. Software can configure each unit as the main processing unit or as a slave processing unit. Each RapidIO processing unit comprises a main Host node and a backup Host node, both configurable by software. Each unit also comprises a RapidIO switch and other processing nodes with RapidIO interfaces.
The main and backup Host nodes of the main processing unit act as the system main Host node and the system backup Host node. They are responsible for managing and maintaining the RapidIO network of the whole system. The main and backup Host nodes of each slave processing unit act as the board-level main Host node and the board-level backup Host node. They are responsible for managing and maintaining the RapidIO sub-network within that processing unit.
The main Host node, the backup Host node and the RapidIO switch of a RapidIO processing unit are each provided with a RapidIO interface and a management maintenance interface. The management maintenance interface may be PCIe, I2C or JTAG, though it is not limited to these.
The main Host node and the backup Host node are connected to the RapidIO switch through the management maintenance interface. The RapidIO interfaces of all RapidIO processing nodes in the unit are connected to the RapidIO switch. The unit's RapidIO switch connects its externally exposed RapidIO interfaces to the other RapidIO processing units in the system, which forms the complete RapidIO network.
Based on this hardware structure and the functions of its parts, the management and maintenance method performed by the RapidIO bus system comprises the following steps:
Step 1: In each processing unit, the main Host node performs the initial configuration of the unit's RapidIO network. The configuration items include the RapidIO device IDs, the communication rate, and the link width (number of lanes). The main Host node of the main processing unit also configures the communication routes of the whole RapidIO bus system.
Step 2: In each processing unit, the main Host node configures the unit's RapidIO switch through software. After configuration, when an important fault event occurs on a port of the RapidIO switch, the switch reports the fault state to the main Host node through an interrupt signal or a Port-Write maintenance packet. Important fault events include disconnection, blockage, and stopped retransmission.
Step 3: In each processing unit, the main Host node monitors events within the unit in two classes, conventional events and important events. Important events include fault events, which are divided into conventional fault events and important fault events.
Step 4: Handle conventional fault events. When the RapidIO interface of the switch, or of a node other than the main Host node, fails, the main Host node maintains and configures the RapidIO switch registers through RapidIO maintenance packets. The maintenance items may include port reset and shutdown-and-restart operations that attempt to recover the failed interface. When the RapidIO interface of the main Host node itself fails, the main Host node cannot send in-band maintenance packets over that interface. The failed interface is therefore recovered through the main Host node's management maintenance interface.
Step 5: Handle important fault events. The main Host node reports the fault to the main Host node of the main processing unit through a RapidIO maintenance packet. The main Host node of the main processing unit then performs system-level fault handling through RapidIO maintenance packets. This handling consists of broadcasting the fault transaction, blocking the failed link, discarding faulty packets, or reconfiguring the communication routes.
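Steps 4 and 5 can be sketched as a small dispatcher, together with the escalation rule given further on in the description: a fault that cannot be recovered locally goes to the system main Host node. The recovery callables, the retry count and the node names are hypothetical. What the sketch shows is the choice of path. In-band maintenance packets handle other nodes' interfaces. The out-of-band management interface is used when the main Host node's own RapidIO interface is down. Important or unrecoverable faults are escalated.

```python
from enum import Enum, auto
from typing import Callable


class Path(Enum):
    IN_BAND = auto()      # RapidIO maintenance packets to the switch registers
    OUT_OF_BAND = auto()  # management maintenance interface (e.g. I2C, PCIe, JTAG)
    ESCALATE = auto()     # report to the main Host node of the main processing unit


def choose_path(fault_node: str, main_host: str, important: bool) -> Path:
    if important:
        return Path.ESCALATE
    if fault_node == main_host:
        # The main Host's own RapidIO link is down, so in-band packets cannot leave it.
        return Path.OUT_OF_BAND
    return Path.IN_BAND


def handle_fault(fault_node: str, main_host: str, important: bool,
                 recover_in_band: Callable[[str], bool],
                 recover_out_of_band: Callable[[str], bool],
                 escalate: Callable[[str], None],
                 max_attempts: int = 2) -> str:
    path = choose_path(fault_node, main_host, important)
    if path is Path.ESCALATE:
        escalate(fault_node)
        return "escalated"
    attempt = recover_in_band if path is Path.IN_BAND else recover_out_of_band
    for _ in range(max_attempts):  # e.g. port reset, then close-and-restart
        if attempt(fault_node):
            return "recovered"
    escalate(fault_node)  # conventional fault that local recovery could not clear
    return "escalated"
```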
In one implementation of the embodiment, conventional events in step 3 are monitored as follows. The main Host node periodically polls the RapidIO switch registers over the management maintenance interface inside the processing unit. In this way it obtains the unit's RapidIO network state in real time, including link status, traffic, and error rate.
In a specific implementation, the main Host node performs periodic management of the unit's RapidIO sub-network by periodic polling. It accesses the RapidIO switch registers over the unit's internal management maintenance interface and acquires the state of the unit's RapidIO network, such as link status, traffic and error rate. This gives real-time monitoring of the RapidIO network state.
In one implementation of the embodiment, important events in step 3 are monitored as follows. The main Host node receives the interrupt signal or Port-Write maintenance packet from the RapidIO switch. It then parses the Port-Write maintenance packet and determines the fault type.
In a specific implementation, urgent transactions on the RapidIO sub-network are managed through interrupts or Port-Write maintenance packets. The RapidIO switch is configured to notify the board-level main Host node of urgent transactions by interrupt or Port-Write maintenance packet. The board-level main Host node then configures the RapidIO switch registers through RapidIO maintenance operations according to the fault type. It performs isolation and recovery operations such as route reconfiguration, port reset or shutdown, and port restart.
When a fault cannot be recovered, the board-level main Host node notifies the system main Host node through a RapidIO maintenance operation. The system main Host node then takes charge of system-level fault handling through RapidIO maintenance operations. This handling includes system fault notification, discarding faulty packets and route reconfiguration, so that the fault is announced and isolated within the system.
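The description lists route reconfiguration as a system-level isolation measure but does not specify how the new routes are computed. The sketch below uses a swapped-in technique: breadth-first search, which finds paths with the fewest switch hops, over the inter-unit links, skipping links reported as failed. Destinations left unreachable get `None`, which a caller could treat as "discard packets for this destination". The sketch works per switch rather than per device ID, and the switch names and port numbers are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, int, str, int]  # (switch_a, port_on_a, switch_b, port_on_b)


def next_hop_ports(links: List[Link], failed: Set[Link]) -> Dict[str, Dict[str, Optional[int]]]:
    """For every switch, map each other switch to the output port of a shortest path."""
    adj: Dict[str, List[Tuple[str, int]]] = {}
    for link in links:
        a, pa, b, pb = link
        adj.setdefault(a, [])
        adj.setdefault(b, [])
        if link in failed:
            continue  # isolated link: keep the switches, drop the edge
        adj[a].append((b, pa))
        adj[b].append((a, pb))

    tables = {}
    for src in adj:
        first_port: Dict[str, Optional[int]] = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nbr, port in adj[node]:
                if nbr not in first_port:
                    # The output port at src is the port of the first hop on the path.
                    first_port[nbr] = port if node == src else first_port[node]
                    queue.append(nbr)
        tables[src] = {dst: first_port.get(dst) for dst in adj if dst != src}
    return tables
```

In a three-switch ring, for example, losing one link makes traffic between its two ends go the long way round. Losing both links of a switch isolates that switch, and its entries become `None`.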
On each RapidIO processing unit, the board-level main Host node designated by software has three responsibilities. Through the management maintenance interface, it handles periodic state management of the unit's RapidIO sub-network. Through RapidIO maintenance operations, it handles urgent transactions on the sub-network, such as fault isolation and fault recovery. Also through RapidIO maintenance operations, it reports the board-level RapidIO sub-network state and fault information to the system Host.
On the main processing unit, the system main Host and the system backup Host receive and parse the RapidIO sub-network state and fault messages reported by each board-level Host node. They handle RapidIO events through RapidIO maintenance operations, including system fault notification, discarding faulty packets and route reconfiguration.
After step 5, the embodiment of the present invention further includes:
Step 6: Record events. The main Host node records the events of its processing unit, such as traffic state, packet loss state and retransmission state. It reports events that affect other processing units, such as disconnection, connection and link blockage, to the main Host node of the main processing unit through a RapidIO maintenance packet. The main Host node of the main processing unit decides, according to the system running state, whether to discard, record or broadcast each event.
Further, the embodiment of the present invention further includes:
Step 7: In each processing unit, the main Host node periodically sends heartbeats to the backup Host node. If the heartbeats stop, the backup Host node becomes the main Host node and takes over management of the processing unit in the RapidIO bus system. When management of a processing unit changes hands, the change is reported to the main Host node of the main processing unit through a maintenance packet. When management of the main processing unit itself changes hands, the change is broadcast to the slave processing units in the RapidIO bus system through maintenance packets.
In a specific implementation, the main Host node and the backup Host node on each RapidIO processing unit operate in hot-backup mode, and main-backup monitoring is performed through heartbeats. When the main Host node fails, the backup Host node takes over its management authority.
In the slave RapidIO processing units, the board-level backup Host monitors the heartbeat of the board-level main Host node. When that heartbeat stops, the board-level backup Host takes over management of the board-level RapidIO sub-network and notifies the system main Host of the RapidIO bus system of the change.
In the main RapidIO processing unit, the system backup Host monitors the heartbeat of the system main Host node. When that heartbeat stops, the system backup Host takes over management of the RapidIO bus system and notifies the board-level main Hosts in the RapidIO bus system of the change.
In practical applications, software can configure any RapidIO processing unit in the system as the main processing unit. That unit's main Host node and backup Host node then serve as the system main Host node and the system backup Host node. When a RapidIO processing unit is the main processing unit, its main and backup Host nodes are responsible both for the board-level RapidIO bus and for the system RapidIO bus. In this case the system main Host and the board-level main Host may be the same processor node, and likewise the system backup Host and the board-level backup Host. In addition, the main Host node and the backup Host node within each processing unit can be determined by contention (preemption).
It should be noted that, in the embodiment of the invention, the RapidIO bus system is managed and maintained in two modes. The first is out-of-band operations over the management maintenance interface, and the second is in-band maintenance operations over the RapidIO interface. A two-level management approach is adopted, centralized at board level and distributed at system level. The Host nodes at both board level and system level are implemented in a "main-backup" hot-backup mode, which provides reliable management and maintenance of the RapidIO bus system.
Further, the management maintenance interface in the embodiment of the invention is one of PCIe, I2C and JTAG. The RapidIO switch in a RapidIO processing unit is implemented by cascading multiple RapidIO switch chips.
The management and maintenance method provided by the embodiment of the invention covers the following aspects:
(1) a method for real-time state monitoring, fast fault recovery, and isolation of unrecoverable faults in a RapidIO bus system;
(2) a hierarchical management and maintenance architecture for complex RapidIO bus systems;
(3) a management and maintenance strategy that combines RapidIO in-band and out-of-band operations;
(4) a reliable "main-backup" design for Host management nodes.
Together these strengthen the robustness of the RapidIO bus network in embedded signal processing systems. They also enable stable transmission of multi-task, hard-real-time, bursty high-volume data.
The method combines a two-level management strategy that is centralized at board level and distributed at system level. It also uses two management and maintenance mechanisms, RapidIO out-of-band and in-band operations, and hot-backup monitoring of Host nodes. It provides reliable online management and maintenance of the RapidIO bus system and supports real-time monitoring, fault isolation, fault recovery and related functions.
FIG. 2 is a block diagram of a specific embodiment of the RapidIO bus system according to the embodiment of the present invention. The invention is described further below with reference to this embodiment.
The RapidIO bus system is implemented with multiple RapidIO processing units. In each unit, the main Host node and the backup Host node are Texas Instruments (TI) TMS320C6678 processors, and the RapidIO switch is an IDT 80HCPS1848 switching chip.
The management maintenance interface of the TMS320C6678 processor is implemented over I2C. The heartbeat between the main Host node and the backup Host node is carried on a processor GPIO line, with the main Host node periodically sending heartbeats to the backup Host node. Each TMS320C6678 processor connects to the switching chip over one 4x RapidIO link at 5 Gbps per lane.
The 80HCPS1848 switching chip provides a RapidIO physical interface of 18 ports and 48 lanes and supports the RapidIO V2.1 specification. In this embodiment the ports are configured in 4x mode. They connect to the TMS320C6678 processors inside the RapidIO processing unit, and two 4x RapidIO interfaces are brought out for system interconnection. The chip's I2C interface serves as the management maintenance interface and is connected to the TMS320C6678 processor for RapidIO management and maintenance.
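The port plan can be sanity-checked with simple arithmetic. The sketch relies only on the figures stated above (18 ports, 48 lanes, 4x ports, and 5 Gbps, treated as the per-lane line rate). It also assumes the 8b/10b line coding used by RapidIO 2.x serial links. It ignores any lane-to-port grouping rules of the actual chip, which may restrict the plan further. The text does not give the number of internal processors per unit, so that number is a parameter.

```python
def port_plan_fits(internal_4x: int, external_4x: int,
                   total_ports: int = 18, total_lanes: int = 48) -> bool:
    """True if the requested 4x ports fit within the switch's port and lane budget."""
    ports = internal_4x + external_4x
    return ports <= total_ports and 4 * ports <= total_lanes


def max_internal_4x(external_4x: int = 2, total_ports: int = 18, total_lanes: int = 48) -> int:
    """Largest number of internal 4x ports left after the external links."""
    return min(total_ports, total_lanes // 4) - external_4x


def payload_rate_gbps(lanes: int, gbaud_per_lane: float) -> float:
    """Data rate after 8b/10b coding: 8 data bits per 10 bits on the line."""
    return lanes * gbaud_per_lane * 8 / 10
```

With two external 4x ports, lane arithmetic alone leaves room for at most 10 internal 4x ports. Each 4x link at 5 Gbaud per lane carries 16 Gbit/s of data after line coding, before packet overhead.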
As shown in fig. 3, for a functional operation block diagram of a specific embodiment of the RapidIO bus system provided in the embodiment of the present invention, in the management and maintenance design of the RapidIO bus system, a RapidIO switch chip is configured by software, and when it is preset that states such as disconnection, retransmission, and the like occur at each RapidIO Port, a fault state is reported to the TMS320C6678 processor by an interrupt or a Port-Write maintenance packet.
During the operation of a board-level main Host node TMS320C6678, the following functions are mainly realized: the states of the RapidIO switching chip are read periodically through an I2C interface, wherein the states include link states, flow, error rate and the like; responding and processing the interrupt or Port-Write transaction reported by the RapidIO chip, and performing fault recovery; periodically reporting the state of a local RapidIO sub-network to a system main Host node, and reporting an unrecoverable fault in real time; and in the process of system initialization or fault recovery, finishing the configuration management of the local route. Meanwhile, after the board-level backup Host node monitors that the main Host node has no heartbeat, the board-level backup Host node takes over the management right of the board-level RapidIO sub-network and reports the management right to the system main Host.
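The board-level main Host node's duties listed above can be combined into one cooperative loop. Urgent events are handled first, and out-of-band polling and periodic reporting then run on their own periods. This is a sketch under stated assumptions, not the embodiment's firmware. The callables, the periods and the `clock` object (anything with `now()` and `sleep()`) are hypothetical. A DSP implementation would more likely use interrupt service routines and hardware timers.

```python
import time
from typing import Callable, List


class MonotonicClock:
    """Real clock; tests can substitute any object with now() and sleep()."""

    def now(self) -> float:
        return time.monotonic()

    def sleep(self, dt: float) -> None:
        time.sleep(dt)


def run_board_host(clock, poll: Callable[[], dict], report: Callable[[dict], None],
                   pending_events: Callable[[], List[object]],
                   handle_event: Callable[[object], None],
                   poll_period: float, report_period: float,
                   tick: float, until: float) -> None:
    next_poll = next_report = clock.now()
    state: dict = {}
    while clock.now() < until:
        for ev in pending_events():  # urgent path: interrupts / Port-Write packets
            handle_event(ev)
        t = clock.now()
        if t >= next_poll:           # routine path: out-of-band register polling
            state = poll()
            next_poll += poll_period
        if t >= next_report:         # periodic sub-network report to the system main Host
            report(state)
            next_report += report_period
        clock.sleep(tick)
```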
While the system main Host node (a TMS320C6678) is running, it mainly performs the following functions. It periodically collects and analyzes the state information reported by each RapidIO sub-network, and reports the system state or records key states to the upper-level host system. It responds to unrecoverable faults reported by a RapidIO sub-network by reporting the fault and the affected nodes to the whole system and sending route change information to isolate the fault. It also completes the configuration management of system routes during system initialization or fault isolation. In addition, when the system backup Host node detects that the system main Host node has stopped sending heartbeats, the backup node takes over management of the system RapidIO network and reports the takeover to each board-level main Host node.
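The route change used for fault isolation can be illustrated by recomputing paths with the failed node removed and sending only the entries that differ. Real RapidIO switches route by per-switch destination-ID lookup tables. This sketch collapses that to "first hop out of the system main Host" and uses breadth-first shortest paths. Both simplifications are assumptions, since the text does not specify how routes are computed. The ring topology is hypothetical.

```python
from collections import deque


def next_hops(links: dict, src: str, failed: set) -> dict:
    """Breadth-first search from src over the switch graph, skipping failed nodes.
    Returns {destination: first hop out of src}; unreachable destinations are omitted."""
    hops = {}
    seen = {src} | set(failed)
    queue = deque()
    for nbr in links.get(src, ()):
        if nbr not in seen:
            seen.add(nbr)
            hops[nbr] = nbr
            queue.append(nbr)
    while queue:
        node = queue.popleft()
        for nbr in links.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                hops[nbr] = hops[node]  # inherit the parent's first hop
                queue.append(nbr)
    return hops


def route_changes(links: dict, src: str, failed: set) -> dict:
    """Destinations whose first hop changes after isolating `failed` (None = unreachable)."""
    before = next_hops(links, src, set())
    after = next_hops(links, src, failed)
    return {dst: after.get(dst) for dst in before if before[dst] != after.get(dst)}


# Hypothetical ring of four board-level switches, A-B-C-D-A, managed from A.
ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
print(route_changes(ring, "A", {"B"}))  # {'B': None, 'C': 'D'}
```

Sending only the changed entries keeps the maintenance traffic small. Destinations whose path never crossed the failed node keep their existing routes.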
Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A management and maintenance method of a RapidIO bus system, characterized in that the RapidIO bus system comprises a plurality of RapidIO processing units connected through a RapidIO bus. Each RapidIO processing unit includes a main Host node, a backup Host node, a RapidIO switch, and other processing nodes with RapidIO interfaces. One RapidIO processing unit serves as the main processing unit, and the other RapidIO processing units serve as slave processing units. In each processing unit, the main Host node, the backup Host node and the RapidIO switch are each configured with a management and maintenance interface and a RapidIO interface, and the other processing nodes are configured with RapidIO interfaces. The main Host node and the backup Host node are each interconnected with the management and maintenance interface of the RapidIO switch through their own management and maintenance interfaces. The RapidIO interfaces of the main Host node, the backup Host node and the other processing nodes are each connected to a RapidIO interface of the RapidIO switch. The externally provided RapidIO interfaces of the RapidIO switch of each processing unit are interconnected with the other processing units to form the RapidIO bus system. The method for managing and maintaining the RapidIO bus system comprises the following steps:
step 1, in each processing unit, the main Host node performs the initial configuration of the processing unit's RapidIO network, the configuration items including the RapidIO device ID, the communication rate and the link width; the main Host node of the main processing unit also configures the communication routes of the whole RapidIO bus system;
step 2, in each processing unit, the main Host node configures the processing unit's RapidIO switch through software so that, after configuration, when an important fault event occurs at a port of the RapidIO switch, the switch reports the fault state to the main Host node through an interrupt signal or a Port-Write maintenance packet; a Port-Write maintenance packet is a maintenance packet with which the RapidIO switch, upon a failure, actively reports the error state of the faulty port to the main Host node;
step 3, in each processing unit, the main Host node monitors events in the processing unit, including conventional events and important events, wherein the important events include fault events, and the fault events are divided into conventional fault events and important fault events;
step 4, processing conventional fault events, including: when a RapidIO interface of the switch or the RapidIO interface of a node other than the main Host node fails, the main Host node maintains and configures the registers of the RapidIO switch through RapidIO maintenance packets; when the RapidIO interface of the main Host node itself fails, the failed interface is recovered through the management and maintenance interface of the main Host node;
step 5, processing important fault events, including: the main Host node reports the fault to the main Host node of the main processing unit through a RapidIO maintenance packet, and the main Host node of the main processing unit performs system-level fault handling through RapidIO maintenance packets by broadcasting the fault transaction, blocking the link, discarding fault packets, or reconfiguring the communication routes.
2. The method for managing and maintaining a RapidIO bus system according to claim 1, wherein in step 3,
the conventional events are monitored as follows: the main Host node periodically polls the registers of the RapidIO switch over the management and maintenance interface inside the processing unit and obtains the RapidIO network state of the processing unit in real time, the RapidIO network state including link states, traffic and error rates.
3. The method for managing and maintaining a RapidIO bus system according to claim 1, wherein in step 3,
the important events are monitored as follows: the main Host node receives an interrupt signal or a Port-Write maintenance packet from the RapidIO switch, parses the Port-Write maintenance packet and determines the fault type.
4. The method for managing and maintaining the RapidIO bus system according to claim 1, further comprising, after step 5:
step 6, recording events, including: the main Host node records the events of its processing unit; the main Host node reports events that affect other processing units to the main Host node of the main processing unit through a RapidIO maintenance packet, and the main Host node of the main processing unit decides, according to the system running state, whether to discard, record or broadcast those events.
5. The method for managing and maintaining the RapidIO bus system according to claim 4, further comprising:
step 7, in each processing unit, the main Host node periodically sends heartbeats to the backup Host node; when the main Host node stops sending heartbeats, the backup Host node becomes the main Host node and takes over management of the processing unit in the RapidIO bus system; when the management right of a processing unit's main Host node changes, the change is reported to the main Host node of the main processing unit through a maintenance packet, and when the management right of the main Host node of the main processing unit changes, the change is broadcast to the slave processing units in the RapidIO bus system through a maintenance packet.
6. The method for managing and maintaining the RapidIO bus system according to claim 1, further comprising, before step 1:
setting one RapidIO processing unit in the RapidIO bus system as the main processing unit and the other RapidIO processing units as slave processing units according to a configuration file;
determining the main Host node and the backup Host node in each processing unit by preemption.
7. The method for managing and maintaining the RapidIO bus system according to any one of claims 1 to 6, characterized in that the RapidIO bus system is managed and maintained in two modes: out-of-band operation through the management and maintenance interface and in-band maintenance operation through the RapidIO interface; a two-level management method that is centralized at board level and distributed at system level is adopted; and both the board-level and the system-level Host nodes use a main-standby hot backup mode.
8. The method for managing and maintaining the RapidIO bus system according to any one of claims 1 to 6, wherein the management and maintenance interface is one of PCIe, I2C and JTAG;
the RapidIO switch in the RapidIO processing unit is implemented by cascading a plurality of RapidIO switch chips.
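The topology of claim 1 and the configuration items of its step 1 can be captured in a small data model. The sketch below gives every endpoint a device ID plus link settings, and rejects systems that do not have exactly one main processing unit. The block-per-unit ID scheme, the 3.125 Gbaud rate and the 4x width are assumptions, because the claims name the configuration items but not their values. `ProcessingUnit` and `initial_config` are hypothetical names.

```python
from dataclasses import dataclass, field

# Assumed link settings; the claims list rate and width as step-1 items but give no values.
DEFAULT_LINK = {"baud_gbaud": 3.125, "width": "4x"}


@dataclass
class ProcessingUnit:
    """One RapidIO processing unit as described in claim 1 (node names are illustrative)."""
    name: str
    main_host: str
    backup_host: str
    other_nodes: list = field(default_factory=list)
    is_main_unit: bool = False

    def endpoints(self) -> list:
        # Each endpoint's RapidIO interface is connected to the unit's switch.
        return [self.main_host, self.backup_host, *self.other_nodes]


def initial_config(units: list, ids_per_unit: int = 16) -> dict:
    """Step 1 sketch: one device ID plus link settings per endpoint.
    Hypothetical ID scheme: unit k owns IDs k*ids_per_unit .. (k+1)*ids_per_unit - 1."""
    if sum(u.is_main_unit for u in units) != 1:
        raise ValueError("exactly one processing unit must be the main processing unit")
    config = {}
    for k, unit in enumerate(units):
        endpoints = unit.endpoints()
        if len(endpoints) > ids_per_unit:
            raise ValueError(f"{unit.name} has more endpoints than its ID block")
        for offset, node in enumerate(endpoints):
            config[(unit.name, node)] = {"device_id": k * ids_per_unit + offset, **DEFAULT_LINK}
    return config
```

Giving each unit a contiguous ID block is one way to keep system-level routing simple: a switch could forward a whole block toward the owning unit.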
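Steps 3 to 5 of claim 1 amount to a classify-then-dispatch routine in each unit's main Host node. In the following sketch, the rule that makes a fault "important" is an assumption: it must affect other units or be unrecoverable locally. The claims only say that important faults are escalated for system-level handling. The event dictionary keys are also hypothetical. The step-4 branch follows the claim: if the main Host node's own RapidIO interface has failed, in-band maintenance packets are unavailable, so recovery uses the out-of-band management interface.

```python
from enum import Enum


class Kind(Enum):
    CONVENTIONAL = "conventional event"        # routine status, handled by polling
    CONVENTIONAL_FAULT = "conventional fault"  # recovered inside the unit (step 4)
    IMPORTANT_FAULT = "important fault"        # escalated to the main unit (step 5)


def classify(event: dict) -> Kind:
    """Hypothetical rule: a fault is important if it affects other units or cannot be recovered locally."""
    if not event.get("fault", False):
        return Kind.CONVENTIONAL
    if event.get("affects_other_units", False) or not event.get("locally_recoverable", True):
        return Kind.IMPORTANT_FAULT
    return Kind.CONVENTIONAL_FAULT


def handle(event: dict, main_host: str) -> list:
    """Return the actions a unit's main Host would take for one event."""
    kind = classify(event)
    if kind is Kind.CONVENTIONAL:
        return ["log status"]
    if kind is Kind.CONVENTIONAL_FAULT:
        # Step 4: the main Host's own RapidIO interface is down, so use the out-of-band path.
        if event.get("failed_node") == main_host:
            return ["recover via management interface"]
        return ["reconfigure switch registers via RapidIO maintenance packets"]
    # Step 5: escalate; the main unit's Host chooses the system-level response.
    actions = ["report to main processing unit", "broadcast fault transaction"]
    if event.get("alternate_path", False):
        actions.append("reconfigure communication routes")
    else:
        actions.append("block link and discard fault packets")
    return actions
```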
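For the periodic polling of claim 2, traffic and error rates are typically derived from two successive counter reads rather than from a single read. This sketch assumes 32-bit free-running counters, so a masked subtraction tolerates one wrap-around between polls. The snapshot keys (`packets`, `errors`, `link_up`) are hypothetical placeholders for whatever the switch registers actually expose.

```python
COUNTER_MASK = (1 << 32) - 1  # assumption: 32-bit free-running hardware counters


def counter_delta(prev: int, curr: int) -> int:
    """Difference between two counter reads, tolerant of one wrap-around."""
    return (curr - prev) & COUNTER_MASK


def summarize(prev: dict, curr: dict, interval_s: float) -> dict:
    """Turn two successive register snapshots of one port into a status record."""
    packets = counter_delta(prev["packets"], curr["packets"])
    errors = counter_delta(prev["errors"], curr["errors"])
    return {
        "link_up": curr["link_up"],
        "packets_per_s": packets / interval_s,
        "error_rate": errors / packets if packets else 0.0,
    }


# The packet counter wraps between reads: 0xFFFFFFF0 -> 0x10 is 32 packets.
prev = {"packets": 0xFFFFFFF0, "errors": 0, "link_up": True}
curr = {"packets": 0x10, "errors": 2, "link_up": True}
print(summarize(prev, curr, interval_s=1.0))
# {'link_up': True, 'packets_per_s': 32.0, 'error_rate': 0.0625}
```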
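Claim 3 has the main Host node parse the Port-Write packet to determine the fault type. The actual payload format is defined by the RapidIO error-management specification and the switch documentation. The two-word big-endian layout and the error codes below are hypothetical stand-ins that only show the decode-and-classify step.

```python
import struct

# Hypothetical error codes; the real encoding comes from the switch documentation.
FAULT_TYPES = {0x01: "link down", 0x02: "retransmission limit", 0x03: "CRC error"}


def parse_port_write(payload: bytes) -> dict:
    """Decode a hypothetical Port-Write payload of two big-endian 32-bit words:
    word 0 = component tag identifying the switch,
    word 1 = status, low byte = port number, next byte = error code."""
    if len(payload) < 8:
        raise ValueError("Port-Write payload too short")
    tag, status = struct.unpack(">II", payload[:8])
    code = (status >> 8) & 0xFF
    return {
        "component_tag": tag,
        "port": status & 0xFF,
        "fault_type": FAULT_TYPES.get(code, "unknown"),
    }
```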
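Step 6 of claim 4 splits event recording between each unit and the main processing unit. The sketch below records every event locally and forwards only those that affect other units. The main unit's discard/record/broadcast decision is a hypothetical policy keyed on severity and system state, because the claim leaves that policy open.

```python
from dataclasses import dataclass, field


@dataclass
class UnitRecorder:
    """Step 6, unit side: record every event, forward those affecting other units."""
    log: list = field(default_factory=list)
    forwarded: list = field(default_factory=list)  # sent to the main unit's Host

    def record(self, event: dict) -> None:
        self.log.append(event)
        if event.get("affects_other_units", False):
            self.forwarded.append(event)


def main_unit_disposition(event: dict, system_state: str) -> str:
    """Step 6, main-unit side: hypothetical policy for discard / record / broadcast."""
    if event.get("severity") == "critical":
        return "broadcast"
    if system_state == "initializing":
        return "discard"  # e.g. expected link flaps while the network comes up
    return "record"
```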
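The heartbeat takeover of step 7 (claim 5) can be modelled as a watchdog in the backup Host node. It counts consecutive periods without a heartbeat, takes over after a threshold, and then either reports to the main unit's Host node or, if it sits in the main processing unit, broadcasts to the slave units. The threshold of three missed periods and the notice strings are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class BackupHost:
    """Hypothetical model of the backup Host node's heartbeat watchdog (step 7)."""
    unit: str
    in_main_unit: bool = False  # True if this backup sits in the main processing unit
    max_missed: int = 3         # missed heartbeat periods before takeover (assumed)
    missed: int = 0
    is_main: bool = False
    notices: list = field(default_factory=list)

    def end_of_period(self, heartbeat_seen: bool) -> None:
        """Call once per heartbeat period."""
        if self.is_main:
            return
        if heartbeat_seen:
            self.missed = 0
            return
        self.missed += 1
        if self.missed >= self.max_missed:
            self.is_main = True  # take over management of the processing unit
            if self.in_main_unit:
                # A change in the main unit is broadcast to the slave units.
                self.notices.append(("broadcast to slave units", self.unit))
            else:
                self.notices.append(("report to main unit's Host", self.unit))
```

Requiring several consecutive missed periods, rather than one, avoids a takeover caused by a single delayed heartbeat.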
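Claim 6 determines the main and backup Host nodes "by preemption". One way to picture this is a pair of first-writer-wins lock registers. The first node to claim the main register becomes the main Host node, and the next claimant takes the backup register. `ClaimRegister` is hypothetical and only loosely analogous to the RapidIO Host Base Device ID Lock CSR. Arrival order stands in for whichever node reaches the switch first after power-up.

```python
from typing import Optional


class ClaimRegister:
    """Hypothetical lock register: the first node to write it becomes the owner."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None

    def try_claim(self, node: str) -> bool:
        if self.owner is None:
            self.owner = node
        return self.owner == node


def elect(nodes_in_arrival_order: list) -> tuple:
    """Return (main_host, backup_host) for one processing unit by preemption."""
    main_reg, backup_reg = ClaimRegister(), ClaimRegister()
    for node in nodes_in_arrival_order:
        if not main_reg.try_claim(node):
            backup_reg.try_claim(node)
    return main_reg.owner, backup_reg.owner
```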
CN202011227054.3A 2020-11-05 2020-11-05 Management and maintenance method of RapidIO bus system Active CN112511394B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011227054.3A CN112511394B (en) 2020-11-05 2020-11-05 Management and maintenance method of RapidIO bus system

Publications (2)

Publication Number Publication Date
CN112511394A CN112511394A (en) 2021-03-16
CN112511394B true CN112511394B (en) 2022-02-11

Family

ID=74955347

Country Status (1)

Country Link
CN (1) CN112511394B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant