CN112511394B - Management and maintenance method of RapidIO bus system

Info

Publication number: CN112511394B
Application number: CN202011227054.3A
Authority: CN
Inventors: 邓豹; 赵谦; 陈颖图; 樊超; 颜丰琳
Original assignee: Xian Aeronautics Computing Technique Research Institute of AVIC
Current assignee: Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date: 2020-11-05
Filing date: 2020-11-05
Publication date: 2022-02-11
Anticipated expiration: 2040-11-05
Also published as: CN112511394A

Abstract

The invention discloses a management and maintenance method for a RapidIO bus system. The method combines a two-level management strategy (centralized at board level, distributed at system level), two management and maintenance paths (RapidIO out-of-band operation and in-band operation), and hot-backup monitoring of the Host node. It provides reliable online management and maintenance of the RapidIO bus system, including real-time monitoring, fault isolation, and fault recovery. The embodiment of the invention addresses the difficult configuration management, heavy burst traffic, frequent data transmission conflicts, and strict real-time requirements of a RapidIO bus network in an embedded signal processing system.

Description

Management and maintenance method of RapidIO bus system

Technical Field

The present invention relates to, but is not limited to, the field of embedded signal processing technologies, and in particular, to a method for managing and maintaining a RapidIO bus system.

Background

The RapidIO bus, a standard interconnect bus, is widely used in embedded signal processing, and managing and maintaining a complex RapidIO bus network is an important part of designing an embedded signal processing system.

However, a RapidIO bus network in an embedded signal processing system is difficult to configure and manage, carries heavy burst traffic, suffers frequent data transmission conflicts, and has strict real-time requirements.

The embodiment of the invention provides a reliable, online, real-time management and maintenance method for a RapidIO bus system, which can effectively address the stability of multi-task, hard real-time, high-volume data transmission in an embedded signal processing system.

Disclosure of Invention

The purpose of the invention is to provide a management and maintenance method for a RapidIO bus system that addresses the difficult configuration management, heavy burst traffic, frequent data transmission conflicts, and strict real-time requirements of a RapidIO bus network in an embedded signal processing system.

The technical scheme of the invention is as follows:

An embodiment of the invention provides a management and maintenance method for a RapidIO bus system. The RapidIO bus system comprises a plurality of RapidIO processing units connected through a RapidIO bus. Each RapidIO processing unit includes a main Host node, a backup Host node, a RapidIO switch, and other processing nodes with RapidIO interfaces; one RapidIO processing unit serves as the main processing unit, and the other RapidIO processing units serve as slave processing units. In each processing unit, the main Host node, the backup Host node, and the RapidIO switch are each configured with a management maintenance interface and a RapidIO interface, and the other processing nodes are configured with RapidIO interfaces. The main Host node and the backup Host node are each interconnected with the RapidIO switch through the management maintenance interface; the RapidIO interfaces of the main Host node, the backup Host node, and the other processing nodes are each connected to RapidIO interfaces of the RapidIO switch; and the externally output RapidIO interfaces of each processing unit's RapidIO switch are interconnected with the other processing units to form the RapidIO bus system. The management and maintenance method executed by the RapidIO bus system comprises the following steps:

step 1, in each processing unit, a main Host node executes the initial configuration operation of the RapidIO network of the processing unit, and the configuration items comprise: the ID of the RapidIO equipment, the communication rate and the link line width; the main Host node in the main processing unit also executes the communication route configuration of the whole RapidIO bus system;

step 2, in each processing unit, the main Host node configures the RapidIO switch of the processing unit through software; the working state after the configuration is as follows: when an important fault event occurs at a Port of the RapidIO switch, the RapidIO switch reports a fault state to a main Host through an interrupt signal or a Port-Write maintenance packet;

step 3, in each processing unit, the main Host node monitors events in the processing unit, covering both conventional events and important events, wherein the important events comprise fault events, and the fault events comprise conventional fault events and important fault events;

step 4, processing the conventional fault event, including: when a RapidIO interface of a non-main Host node or a switch fails, the main Host node maintains and configures a register of the RapidIO switch through a RapidIO maintenance packet; when the RapidIO interface of the main Host node fails, recovering the failed interface through the management maintenance interface of the main Host node;

step 5, processing important fault events, including: the main Host node reports the fault to the main Host node of the main processing unit through a RapidIO maintenance packet, and the main Host node of the main processing unit performs system-level fault processing through RapidIO maintenance packets, namely fault transaction broadcasting, interruption of blocked links, discarding of faulty packets, or communication route reconfiguration.
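As an illustration only, the dispatch logic of steps 3 to 5 can be sketched in Python as follows. The names `Severity`, `Event` and `dispatch`, as well as the returned action labels, are hypothetical and do not come from the patent; the sketch assumes that each event has already been classified into one of the categories named in step 3.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Severity(Enum):
    ROUTINE = auto()             # conventional event, no fault
    CONVENTIONAL_FAULT = auto()  # handled inside the processing unit (step 4)
    IMPORTANT_FAULT = auto()     # escalated to the main processing unit (step 5)


@dataclass(frozen=True)
class Event:
    unit_id: int                 # processing unit where the event occurred
    port: int                    # switch port concerned
    severity: Severity
    on_main_host_interface: bool = False  # fault on the main Host's own RapidIO interface


def dispatch(event: Event) -> str:
    """Return a label for the action the board-level main Host takes."""
    if event.severity is Severity.ROUTINE:
        return "no_fault_handling"
    if event.severity is Severity.CONVENTIONAL_FAULT:
        # Step 4: repair in-band with maintenance packets, unless the main
        # Host's own RapidIO interface is down, in which case the out-of-band
        # management maintenance interface is used instead.
        if event.on_main_host_interface:
            return "recover_out_of_band"
        return "recover_in_band"
    # Step 5: report to the main Host node of the main processing unit.
    return "escalate_to_system_host"
```

The only branch that depends on the physical path is the conventional fault case, which reflects the in-band and out-of-band split described in step 4.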

Optionally, in the above management and maintenance method for a RapidIO bus system, in step 3,

the monitoring mode for conventional events is as follows: the main Host node periodically queries the registers of the RapidIO switch over the management maintenance interface inside the processing unit, and obtains the RapidIO network state of the processing unit in real time, including link state, traffic, and error rate;

the monitoring mode for important events is as follows: the main Host node receives an interrupt signal or a Port-Write maintenance packet from the RapidIO switch, parses the Port-Write maintenance packet, and determines the fault type.
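The periodic query of conventional events can be sketched as below. This is a minimal illustration, not the patented implementation: the register offsets are invented placeholders rather than real switch registers, `read_reg` stands in for whatever out-of-band access the management maintenance interface offers, and computing the error rate as errors divided by packets is an assumption.

```python
from typing import Callable, Dict, Iterable

# Hypothetical register offsets, chosen for illustration only.
REG_LINK_STATUS = 0x00
REG_PACKET_COUNT = 0x04
REG_ERROR_COUNT = 0x08

RegisterReader = Callable[[int, int], int]  # (port, register offset) -> value


def poll_port(read_reg: RegisterReader, port: int) -> Dict[str, object]:
    """Read link state, traffic and error rate of one switch port."""
    link_up = bool(read_reg(port, REG_LINK_STATUS) & 0x1)
    packets = read_reg(port, REG_PACKET_COUNT)
    errors = read_reg(port, REG_ERROR_COUNT)
    # Assumed definition: fraction of counted packets that had errors.
    error_rate = errors / packets if packets else 0.0
    return {"link_up": link_up, "packets": packets, "error_rate": error_rate}


def poll_switch(read_reg: RegisterReader, ports: Iterable[int]) -> Dict[int, Dict[str, object]]:
    """One polling cycle over all ports; the caller repeats it periodically."""
    return {port: poll_port(read_reg, port) for port in ports}
```

Passing the register reader in as a function keeps the polling logic independent of whether the out-of-band path is PCIe, I2C or JTAG.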

Optionally, in the above method for managing and maintaining a RapidIO bus system, after the step 5, the method further includes:

step 6, recording the event, including: the main Host node records the events of its processing unit; the main Host node reports events that affect other processing units to the main Host node of the main processing unit through a RapidIO maintenance packet, and the main Host node of the main processing unit decides whether to discard, record, or broadcast each event according to the system running state.

Optionally, in the management and maintenance method of a RapidIO bus system, the method further includes:

step 7, in each processing unit, the main Host node periodically reports a heartbeat to the backup Host node; when the main Host node stops reporting heartbeats, the backup Host node becomes the main Host node and takes over the management right of the processing unit in the RapidIO bus system; when the management right of a main Host node changes, the change is reported to the main Host node of the main processing unit through a maintenance packet, and when the management right of the main Host node of the main processing unit changes, the change is broadcast to the slave processing units in the RapidIO bus system through maintenance packets.
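A minimal sketch of the heartbeat takeover in step 7 follows. The timeout value, the class name `BackupHostMonitor` and the helper `takeover_notification` are assumptions made for illustration; the patent does not state how long the backup Host waits before taking over. Timestamps are passed in explicitly so that the logic does not depend on a real clock.

```python
class BackupHostMonitor:
    """Tracks the main Host's heartbeat from the backup Host's point of view."""

    def __init__(self, timeout_s: float, now: float = 0.0) -> None:
        self.timeout_s = timeout_s   # assumed silence period before takeover
        self.last_beat = now         # time of the most recent heartbeat
        self.is_main = False         # True once this node has taken over

    def on_heartbeat(self, now: float) -> None:
        self.last_beat = now

    def check(self, now: float) -> bool:
        """Return True only on the call that triggers the takeover."""
        if not self.is_main and now - self.last_beat > self.timeout_s:
            self.is_main = True
            return True
        return False


def takeover_notification(in_main_processing_unit: bool) -> str:
    """Who is told about the change of management right (per step 7)."""
    if in_main_processing_unit:
        return "broadcast_to_slave_processing_units"
    return "report_to_main_host_of_main_processing_unit"
```

A backup Host would call `check` from its own periodic task and send the notification returned by `takeover_notification` when `check` returns True.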

Optionally, in the above method for managing and maintaining a RapidIO bus system, before step 1, the method further includes:

setting one RapidIO processing unit in a RapidIO bus system as a main processing unit and other RapidIO processing units as slave processing units according to the configuration file;

and determining the main Host node and the backup Host node in each processing unit by preemption, that is, by contention for the management right.
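The patent does not describe the preemption mechanism in detail. One possible reading, sketched below purely as an assumption, is that the first Host node to claim a shared management right becomes the main Host and any later claimant becomes the backup. The class `ManagementRight` and its `claim` method are hypothetical names.

```python
import threading
from typing import Optional


class ManagementRight:
    """A shared token; the first node to claim it becomes the main Host."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.owner: Optional[str] = None

    def claim(self, node_id: str) -> str:
        """Return the role assigned to node_id: 'main' or 'backup'."""
        with self._lock:
            if self.owner is None:
                self.owner = node_id   # first claimant wins the right
            return "main" if self.owner == node_id else "backup"
```

On real hardware the shared token would have to live somewhere both Host nodes can reach atomically, for example a register in the switch; that choice is outside what the text specifies.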

Optionally, in the above management and maintenance method, the RapidIO bus system is managed and maintained through two paths, namely out-of-band operation over the management maintenance interface and in-band maintenance operation over the RapidIO interface; a two-level management method is adopted that is centralized at board level and distributed at system level; and both the board-level and the system-level Host nodes are implemented in a main-backup hot backup mode.

Optionally, in the management maintenance method of the RapidIO bus system as described above, the management maintenance interface includes one of PCIe, I2C, and JTAG;

the RapidIO switch in a RapidIO processing unit is implemented by cascading a plurality of RapidIO switch chips.

The invention has the advantages that:

The management and maintenance method of the RapidIO bus system provided by the embodiment of the invention covers the following aspects: (1) a method for real-time state monitoring, fast fault recovery, and isolation of unrecoverable faults in a RapidIO bus system; (2) a hierarchical management and maintenance architecture for a complex RapidIO bus system; (3) a management and maintenance strategy combining RapidIO in-band and out-of-band operations; (4) a reliable main-backup Host design for the system management nodes. Together these enhance the robustness of the RapidIO bus network in an embedded signal processing system and enable stable transmission of multi-task, hard real-time, bursty high-volume data. The method has the following advantages:

(1) the two-level management and maintenance strategy, centralized at board level and distributed at system level, provides hierarchical, classified management of RapidIO bus faults with each fault assigned to the responsible level, which reduces the management and maintenance cost of the RapidIO bus system and improves management and maintenance efficiency;

(2) combining RapidIO in-band and out-of-band operations provides periodic real-time monitoring of routine transactions and fast handling of urgent transactions in the RapidIO network, effectively balancing management and maintenance overhead against real-time fault response;

(3) the hot-backup monitoring mode of main and backup Host nodes effectively improves the reliability of management and monitoring of the RapidIO bus system.

Drawings

The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the example serve to explain the principles of the invention and not to limit the invention.

FIG. 1 is a schematic structural diagram of a RapidIO bus system for executing a management maintenance method in the embodiment of the present invention;

FIG. 2 is a block diagram of a specific embodiment of a RapidIO bus system according to an embodiment of the present invention;

FIG. 3 is a functional operation block diagram of a specific embodiment of the RapidIO bus system according to the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.

The steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.

The invention provides a management maintenance method of a RapidIO bus system, which adopts a two-stage management strategy combining board level concentration and system distribution, two management maintenance methods of RapidIO out-of-band operation and RapidIO in-band operation, and a hot backup monitoring mode of a Host node, provides reliable on-line management maintenance of the RapidIO bus system, and realizes real-time monitoring, fault isolation, fault recovery and the like of the RapidIO bus system.

The following specific embodiments of the present invention may be combined, and the same or similar concepts or processes may not be described in detail in some embodiments.

Fig. 1 is a schematic structural diagram of a RapidIO bus system for executing the management maintenance method in the embodiment of the present invention. The RapidIO bus system comprises a plurality of RapidIO processing units interconnected through RapidIO buses, and software can configure the RapidIO processing units as a main processing unit and slave processing units. Each RapidIO processing unit comprises a software-configurable main Host node, a backup Host node, a RapidIO switch, and other processing nodes with RapidIO interfaces. The main Host node and the backup Host node in the main processing unit act as the system main Host node and the system backup Host node, and are responsible for managing and maintaining the RapidIO network of the whole system. The main Host node and the backup Host node in a slave processing unit act as the board-level main Host node and the board-level backup Host node, and are responsible for managing and maintaining the RapidIO sub-network within that processing unit.

The main Host node, the backup Host node and the RapidIO switch on the RapidIO processing unit should be provided with a management maintenance interface (the management maintenance interface includes one of PCIe, I2C and JTAG, but is not limited to the above interface) and a RapidIO interface. And the main Host node and the backup Host node are interconnected with the RapidIO switch through the management and maintenance interface. The RapidIO interfaces of all RapidIO processing nodes on the RapidIO processing unit are connected to a RapidIO switch. And the RapidIO switch of the RapidIO processing unit interconnects the externally output RapidIO interface with other RapidIO processing units in the system to form the whole RapidIO network. Based on the hardware structure of the RapidIO bus system and the functions of each part, the RapidIO bus system execution management maintenance method comprises the following steps:

step 2, in each processing unit, the main Host node configures the RapidIO switch of the processing unit through software; the working state after the configuration is as follows: when an important fault event occurs at a Port of the RapidIO switch, the RapidIO switch reports a fault state to a main Host through an interrupt signal or a Port-Write maintenance packet; important fault events can be disconnection, blockage, retransmission stop and the like;

step 4, processing the conventional fault event, including: when the RapidIO interface of the non-master Host node or the switch fails, the master Host node maintains and configures the register of the RapidIO switch through a RapidIO maintenance packet, and the maintenance items may include: port reset, close-restart operation, attempting to recover the failed interface; when the RapidIO interface of the main Host node fails, recovering the failed interface through the management maintenance interface of the main Host node;

In one implementation of the embodiment of the present invention, conventional events are monitored in step 3 as follows: the main Host node periodically queries the registers of the RapidIO switch over the management maintenance interface inside the processing unit, and obtains the RapidIO network state of the processing unit in real time, including link state, traffic, and error rate. In a specific implementation, periodic management of the RapidIO sub-network on a RapidIO processing unit is carried out by the main Host node through periodic queries: it accesses the registers of the RapidIO switch over the management maintenance interface inside the processing unit and obtains the link state, traffic, error rate, and similar states of the RapidIO network in the processing unit, thereby monitoring the RapidIO network state in real time.

In one implementation of the embodiment of the present invention, important events are monitored in step 3 as follows: the main Host node receives an interrupt signal or a Port-Write maintenance packet from the RapidIO switch, parses the Port-Write maintenance packet, and determines the fault type. In a specific implementation, urgent transactions of the RapidIO sub-network are managed through interrupts or Port-Write maintenance messages: the RapidIO switch is configured to notify the board-level main Host node of urgent transactions by an interrupt or a Port-Write maintenance message, and the board-level main Host node then configures the RapidIO switch registers through RapidIO maintenance operations to handle the fault according to its type, using isolation and recovery operations such as route reconfiguration, port reset, port close, and port restart.

When the failure can not be recovered, the board-level main Host node informs the system main Host node through RapidIO maintenance operation, and the system main Host node uses RapidIO maintenance operation to take charge of system-level failure processing, including system failure notification, failure packet discarding, routing reconfiguration and other operations, so as to realize the notification and isolation of the failure in the system.
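The recover-then-escalate behaviour described above can be sketched as follows. This is a hedged illustration: the function `handle_port_fault`, the ordering of recovery actions, and the callback signatures are assumptions, and the callbacks stand in for real RapidIO maintenance operations that are not modelled here.

```python
from typing import Callable, Iterable, Tuple

RecoveryAction = Tuple[str, Callable[[int], bool]]  # (name, action returning success)


def handle_port_fault(
    port: int,
    actions: Iterable[RecoveryAction],
    escalate: Callable[[int], None],
) -> str:
    """Try board-level recovery actions in order; escalate if all of them fail.

    Returns the name of the action that recovered the port, or "escalated"
    when the fault had to be handed to the system main Host node.
    """
    for name, action in actions:
        if action(port):
            return name
    # Unrecoverable at board level: notify the system main Host node.
    escalate(port)
    return "escalated"
```

In a real system the actions would be operations such as port reset or port restart, tried from least to most disruptive.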

On a RapidIO processing unit, the board-level main Host node, designated by software, handles the periodic state management of the RapidIO sub-network on the processing unit through the management maintenance interface. It handles urgent transactions of that sub-network, such as fault isolation and fault recovery, through RapidIO maintenance operations, and it reports board-level RapidIO sub-network state and fault information to the system Host through RapidIO maintenance operations.

On the main processing unit, the system main Host and backup Host receive and parse the RapidIO sub-network state and fault messages reported by each board-level Host node, and process RapidIO events through RapidIO maintenance operations, including system fault notification, discarding of faulty packets, and route reconfiguration.

After step 5, the embodiment of the present invention further includes:

step 6, recording the event, including: the main Host node records events of its processing unit, such as traffic state, packet loss state, and retransmission state; the main Host node reports events that affect other processing units, such as link disconnection, link connection, and link blockage, to the main Host node of the main processing unit through a RapidIO maintenance packet, and the main Host node of the main processing unit decides whether to discard, record, or broadcast each event according to the system running state.

Further, the embodiment of the present invention further includes:

In a specific implementation manner, on each RapidIO processing unit, the main Host node and the backup Host adopt a hot backup mode, main-backup monitoring is performed in a heartbeat mode, and when the main Host node fails, the backup Host node takes over the management authority of the main Host node.

In the other RapidIO processing units, the board-level backup Host monitors the heartbeat of the board-level main Host node, takes over the management right of the board-level RapidIO sub-network when the board-level main Host has no heartbeat, and notifies the main Host of the RapidIO bus system of the change of management right. In the RapidIO main processing unit, the system backup Host monitors the heartbeat of the system main Host node, takes over the management right of the RapidIO bus system when the system main Host has no heartbeat, and notifies the board-level main Hosts in the RapidIO bus system of the change of management right.

In practical applications, software can configure any RapidIO processing unit in the system as the main processing unit, with its main Host node and backup Host node acting as the system main Host node and the system backup Host node. When a RapidIO processing unit is the main processing unit, its main Host and backup Host nodes are responsible for both the board-level RapidIO bus and the system RapidIO bus. In this case, the system main Host and the board-level main Host may be the same processor node, and the system backup Host and the board-level backup Host may be the same processor node. In addition, the main Host node and the backup Host node in each processing unit can be determined by preemption of the management right.

It should be noted that, in the embodiment of the present invention, the management and maintenance of the RapidIO bus system uses two modes, namely, an out-of-band operation of the management and maintenance interface and an in-band maintenance operation of the RapidIO interface, and a two-stage management method combining board level concentration and system distribution is adopted, and both board level and system Host node are implemented in a "main-standby" hot backup mode, so as to provide reliable management and maintenance of the RapidIO bus system.

Further, the management maintenance interface in the embodiment of the present invention includes one of PCIe, I2C, and JTAG; the RapidIO switch in the RapidIO processing unit is configured to be realized through cascade connection of a plurality of RapidIO switch chips.

The management and maintenance method of the RapidIO bus system provided by the embodiment of the invention covers the following aspects: (1) a method for real-time state monitoring, fast fault recovery, and isolation of unrecoverable faults in a RapidIO bus system; (2) a hierarchical management and maintenance architecture for a complex RapidIO bus system; (3) a management and maintenance strategy combining RapidIO in-band and out-of-band operations; (4) a reliable main-backup Host design for the system management nodes. Together these enhance the robustness of the RapidIO bus network in an embedded signal processing system and enable stable transmission of multi-task, hard real-time, bursty high-volume data.

The management and maintenance method of the RapidIO bus system provided by the embodiment of the invention adopts a two-stage management strategy combining board level concentration and system distribution, two management and maintenance methods of RapidIO out-of-band operation and RapidIO in-band operation, and a hot backup monitoring mode of a Host node, provides reliable on-line management and maintenance of the RapidIO bus system, and realizes real-time monitoring, fault isolation, fault recovery and the like of the RapidIO bus system.

Fig. 2 is a block diagram of a RapidIO bus system according to an embodiment of the present invention, and the present invention is further described with reference to a specific embodiment.

The RapidIO bus system is built from a plurality of RapidIO processing units. The main Host node and the backup Host node in each RapidIO processing unit are implemented with TMS320C6678 processors from TI, and the RapidIO switch is implemented with an 80HCPS1848 switching chip from IDT.

The management maintenance interface of the TMS320C6678 processor is implemented over I2C, and the heartbeat between the main Host node and the backup Host node is implemented over a GPIO interface of the processor, with the main Host node periodically reporting its heartbeat to the backup Host node. Each TMS320C6678 processor connects to the switching chip through one 4x, 5 Gbps RapidIO link.

The 80HCPS1848 switching chip provides a RapidIO physical interface with 18 ports and 48 lanes and supports the RapidIO V2.1 specification. In this specific embodiment, its ports are configured as 4x ports, which connect to the TMS320C6678 processors inside the RapidIO processing unit and provide two external 4x RapidIO interfaces for system interconnection. The 80HCPS1848 switching chip also provides I2C as its management maintenance interface, through which it is connected to the TMS320C6678 processor for RapidIO management and maintenance.
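The port figures in this embodiment can be checked with a short calculation. The sketch below assumes that the quoted 5 Gbps is the per-lane line rate and that the link uses 8b/10b encoding, as the RapidIO 2.x serial physical layer does; the function names are hypothetical, and packet and protocol overhead are ignored.

```python
def max_ports(total_lanes: int, lanes_per_port: int) -> int:
    """Largest number of equal-width ports that fit in the available lanes."""
    return total_lanes // lanes_per_port


def encoded_port_rate_gbps(lanes_per_port: int, line_rate_gbaud: float) -> float:
    """Data rate of one port after 8b/10b encoding (8 data bits per 10 line bits)."""
    return lanes_per_port * line_rate_gbaud * 8 / 10


ports_4x = max_ports(total_lanes=48, lanes_per_port=4)
rate_4x = encoded_port_rate_gbps(lanes_per_port=4, line_rate_gbaud=5.0)
```

Under these assumptions, 48 lanes allow at most twelve 4x ports, and each 4x port at 5 Gbaud per lane carries about 16 Gbps of encoded data.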

FIG. 3 is a functional operation block diagram of a specific embodiment of the RapidIO bus system provided in the embodiment of the present invention. In the management and maintenance design of the RapidIO bus system, the RapidIO switch chip is configured by software so that, when a preset state such as link disconnection or retransmission occurs at any RapidIO port, the fault state is reported to the TMS320C6678 processor by an interrupt or a Port-Write maintenance packet.

During operation, the board-level main Host node (TMS320C6678) mainly performs the following functions: it periodically reads the state of the RapidIO switching chip through the I2C interface, including link state, traffic, and error rate; it responds to and handles interrupts or Port-Write transactions reported by the RapidIO switching chip and performs fault recovery; it periodically reports the state of the local RapidIO sub-network to the system main Host node and reports unrecoverable faults in real time; and it completes the configuration management of local routes during system initialization or fault recovery. Meanwhile, when the board-level backup Host node detects that the main Host node has no heartbeat, it takes over the management right of the board-level RapidIO sub-network and reports the takeover to the system main Host.

During operation, the system main Host node (TMS320C6678) mainly performs the following functions: it periodically collects and parses the state information reported by the RapidIO sub-networks, and reports the system state to an upper-level host system or records key states; it responds to unrecoverable faults reported by a RapidIO sub-network, announces the unrecoverable fault and the affected nodes to the whole system, and sends route change information to isolate the fault; and it completes the configuration management of system routes during system initialization or fault isolation. Meanwhile, when the system backup Host node detects that the system main Host node has no heartbeat, it takes over the management right of the system RapidIO network and reports the takeover to each board-level main Host.
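The patent does not say how the system main Host computes new routes when it isolates a fault. As one hedged possibility, the sketch below recomputes the first hop toward every processing unit with a breadth-first search that skips failed links. The topology representation, the function name `next_hops` and the choice of shortest-hop routing are all assumptions made for illustration.

```python
from collections import deque
from typing import Dict, FrozenSet, Set

Topology = Dict[str, Set[str]]   # processing unit -> directly linked units
Link = FrozenSet[str]            # undirected link between two units


def next_hops(
    topology: Topology,
    source: str,
    failed_links: Set[Link] = frozenset(),
) -> Dict[str, str]:
    """Map each reachable unit to the neighbor of `source` used to reach it."""
    first_hop: Dict[str, str] = {}
    visited = {source}
    queue = deque()
    # Seed the search with the healthy links that leave the source unit.
    for neighbor in sorted(topology[source]):
        if frozenset((source, neighbor)) in failed_links:
            continue
        visited.add(neighbor)
        first_hop[neighbor] = neighbor
        queue.append(neighbor)
    while queue:
        node = queue.popleft()
        for neighbor in sorted(topology[node]):
            if neighbor in visited or frozenset((node, neighbor)) in failed_links:
                continue
            visited.add(neighbor)
            first_hop[neighbor] = first_hop[node]  # inherit the first hop
            queue.append(neighbor)
    return first_hop
```

Units that no longer appear in the result are unreachable from the source, which is the information the system main Host would need before announcing the fault and the affected nodes.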

Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A management and maintenance method of a RapidIO bus system, characterized in that the RapidIO bus system comprises: a plurality of RapidIO processing units connected through a RapidIO bus to form the RapidIO bus system; each RapidIO processing unit includes: a main Host node, a backup Host node, a RapidIO switch and other processing nodes with RapidIO interfaces, wherein one RapidIO processing unit serves as a main processing unit, and the other RapidIO processing units serve as slave processing units; the main Host node, the backup Host node and the RapidIO switch in each processing unit are all configured with a management maintenance interface and a RapidIO interface, the other processing nodes are configured with RapidIO interfaces, the main Host node and the backup Host node are respectively interconnected with the management maintenance interface of the RapidIO switch through their management maintenance interfaces, the RapidIO interfaces of the main Host node, the backup Host node and the other processing nodes are respectively connected with the RapidIO interfaces of the RapidIO switch, and the RapidIO interface externally output by the RapidIO switch of each processing unit is interconnected with other processing units in the RapidIO bus system to form the RapidIO bus system; the management and maintenance method executed by the RapidIO bus system comprises the following steps:

step 2, in each processing unit, the main Host node configures the RapidIO switch of the processing unit through software; the working state after the configuration is as follows: when an important fault event occurs at a Port of the RapidIO switch, the RapidIO switch reports a fault state to a main Host node through an interrupt signal or a Port-Write maintenance packet; the Port-Write maintenance packet is a maintenance packet which actively reports the error state of a fault Port to a main Host node when the RapidIO switch fails;

step 4, processing the conventional fault event, including: when a RapidIO interface of the switch or a RapidIO interface of a non-master Host node fails, the master Host node maintains and configures a register of the RapidIO switch through a RapidIO maintenance packet; when the RapidIO interface of the main Host node fails, recovering the failed interface through the management maintenance interface of the main Host node;

step 5, processing important fault events, including: the main Host node reports the fault to the main Host node of the main processing unit through a RapidIO maintenance packet, and the main Host node of the main processing unit performs system-level fault processing through RapidIO maintenance packets, namely fault transaction broadcasting, interruption of blocked links, discarding of faulty packets, or communication route reconfiguration.

2. The method for managing and maintaining a RapidIO bus system according to claim 1, wherein in the step 3,

3. The method for managing and maintaining a RapidIO bus system according to claim 1, wherein in the step 3,

4. The method for managing and maintaining the RapidIO bus system according to claim 1, further comprising, after the step 5:

step 6, recording the event, including: the main Host node records the events of its processing unit; the main Host node reports events that affect other processing units to the main Host node of the main processing unit through a RapidIO maintenance packet, and the main Host node of the main processing unit decides whether to discard, record, or broadcast each event according to the system running state.

5. The method for managing and maintaining the RapidIO bus system according to claim 4, further comprising:

step 7, in each processing unit, the main Host node periodically reports a heartbeat to the backup Host node, and when the main Host node stops reporting heartbeats, the backup Host node becomes the main Host node and takes over the management right of the processing unit in the RapidIO bus system; when the management right of a main Host node changes, the change is reported to the main Host node of the main processing unit through a maintenance packet, and when the management right of the main Host node of the main processing unit changes, the change is broadcast to the slave processing units in the RapidIO bus system through maintenance packets.

6. The method for managing and maintaining the RapidIO bus system according to claim 1, wherein the step 1 is preceded by the steps of:

and determining the main Host node and the backup Host node in each processing unit by preemption of the management right.

7. The method for managing and maintaining the RapidIO bus system according to any one of claims 1 to 6, characterized in that the RapidIO bus system is managed and maintained by using two modes of out-of-band operation of a management and maintenance interface and in-band maintenance operation of the RapidIO interface, a two-stage management method combining board level concentration and system distribution is adopted, and both board level and system Host nodes are realized by adopting a 'main-standby' hot backup mode.

8. The method for managing and maintaining the RapidIO bus system according to any of claims 1-6, wherein the management and maintenance interface comprises one of PCIe, I2C and JTAG;