CN109918230B - Method and system for recovering abnormity of service board card


Info

Publication number
CN109918230B
Authority
CN
China
Legal status
Active
Application number
CN201910124873.6A
Other languages
Chinese (zh)
Other versions
CN109918230A (en)
Inventor
Xiang Dongyang (项东阳)
Current Assignee
Hangzhou DPTech Technologies Co Ltd
Original Assignee
Hangzhou DPTech Technologies Co Ltd
Application filed by Hangzhou DPTech Technologies Co Ltd
Priority to CN201910124873.6A
Publication of CN109918230A
Application granted
Publication of CN109918230B
Legal status: Active (current)
Anticipated expiration

Abstract

The application provides a method and a system for recovering a service board card from an abnormality. The method comprises the following steps: a CPU on a main control board card sends an access request message to an FPGA; the FPGA receives the access request message, parses the PCIe bus address space address of the service board card to be accessed carried in the message, forwards the message to that service board card according to the address, and determines whether response data returned by the service board card has been received; if not, the FPGA determines that the access to the service board card has failed, reports an exception interrupt to the CPU on the main control board card, and stores the PCIe bus address space address of the service board card in a cache; after receiving the exception interrupt reported by the FPGA, the CPU on the main control board card reads the PCIe bus address space address from the cache, identifies the abnormal service board card, and sends a reset or restart instruction to the FPGA.

Description

Method and system for recovering abnormity of service board card
Technical Field
The present application relates to the field of communications technologies, and in particular to a method and a system for recovering a service board card from an abnormality.
Background
In a centralized control frame-type device, all service board cards are managed and controlled by the CPU on the main control board card. The service board cards are connected to the main control board card through a PCIe (Peripheral Component Interconnect Express) bus, which is widely used because of its high transmission efficiency and because each service board card gets its own dedicated bandwidth. However, as centralized control frame-type devices gain functions and scale, the CPU on the main control board card must manage more and more service board cards. Owing to the channel links and hardware of the device, service board cards occasionally become abnormal, and if such an abnormality is not handled promptly, the whole device may go down.
The existing technical solution handles service board card abnormalities with a combination of software and hardware: the CPU on the main control board card accesses the service board cards over the PCIe bus; when a message the CPU sends to a service board card receives no response for a long time, the CPU judges that service board card to be abnormal and sets a dedicated register inside the CPU; after a subsequent application program detects that this register is set, it identifies the abnormal service board card and resets it. This prevents a single abnormal service board card from crashing the entire centralized control frame-type device.
However, the existing solution relies mainly on instruction detection and registers inside the CPU, so the corresponding functions must be integrated into the CPU, which greatly increases hardware cost.
Disclosure of Invention
In view of this, the present application provides a method and a system for recovering a service board card from an abnormality.
Specifically, this is achieved through the following technical solutions:
A method for recovering a service board card from an abnormality, applied to a centralized control frame-type device, where the device comprises a main control board card, an FPGA, and at least one service board card, and the FPGA is connected to both the main control board card and the at least one service board card. The method comprises the following steps:
in the preparation stage: the central processing unit (CPU) on the main control board card allocates a PCIe bus address space to each service board card according to PCIe bus address space configuration information, and stores the address range allocated to each service board card according to a board card distribution topology;
in the processing stage: the CPU on the main control board card sends an access request message to the FPGA;
the FPGA receives the access request message and parses the PCIe bus address space address of the service board card to be accessed carried in the message;
the FPGA forwards the access request message to the corresponding service board card to be accessed according to that address;
the FPGA determines whether response data returned by the service board card to be accessed has been received;
if not, the FPGA determines that the access to the service board card to be accessed has failed, reports an exception interrupt to the CPU on the main control board card, and stores the PCIe bus address space address of the service board card to be accessed in a cache;
after receiving the exception interrupt reported by the FPGA, the CPU on the main control board card reads the PCIe bus address space address of the service board card to be accessed from the cache, matches it against the address ranges of the PCIe bus address spaces allocated in advance to the service board cards, and identifies the abnormal service board card;
the CPU on the main control board card sends a reset or restart instruction to the FPGA, so that the FPGA resets or restarts the abnormal service board card according to the instruction.
A system for recovering a service board card from an abnormality, applied to a centralized control frame-type device that comprises a main control board card, an FPGA, and at least one service board card, the FPGA being connected to both the main control board card and the at least one service board card. In the system, the CPU on the main control board card and the FPGA carry out the same preparation stage and processing stage as in the method above: in the preparation stage, the CPU allocates a PCIe bus address space to each service board card and stores the allocated address ranges according to the board card distribution topology; in the processing stage, the FPGA forwards the CPU's access request messages and, when a service board card returns no response, reports an exception interrupt and caches that board card's PCIe bus address space address, after which the CPU identifies the abnormal service board card by matching the cached address against the stored address ranges and instructs the FPGA to reset or restart it.
With the technical solution provided by the application, no instruction detection or registers inside the CPU are required, and the corresponding functions need not be integrated into the CPU, which greatly reduces hardware cost.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art can derive other drawings from them.
FIG. 1 is a hardware connection diagram according to an exemplary embodiment of the present application;
FIG. 2 is another hardware connection diagram according to an exemplary embodiment of the present application;
FIG. 3 is an interaction flowchart of a method for recovering a service board card from an abnormality according to an exemplary embodiment of the present application;
FIG. 4 is a board card distribution topology diagram according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with aspects of the present application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining".
First, the method for recovering a service board card from an abnormality provided by an embodiment of the present application is introduced. It is applied to a centralized control frame-type device comprising a main control board card, an FPGA, and at least one service board card, the FPGA being connected to both the main control board card and the at least one service board card. The method consists of the preparation stage and the processing stage set out in the Disclosure of Invention above: the CPU allocates and records a PCIe bus address range for each service board card; the FPGA forwards access request messages and, when a service board card does not respond, reports an exception interrupt and caches the board card's address; the CPU then identifies the abnormal service board card by address-range matching and has the FPGA reset or restart it.
In the background art, as in the exemplary hardware connection diagram of FIG. 1, the CPU on the main control board card accesses a service board card over the PCIe bus. When a message the CPU sends to a service board card receives no response for a long time, the CPU determines that the service board card is abnormal and sets a dedicated register inside itself; after a subsequent application program detects that this register is set, it identifies the abnormal service board card and resets it. Although this prevents a single abnormal service board card from bringing down the entire centralized control frame-type device, it relies mainly on instruction detection and registers inside the CPU, so the corresponding functions must be built into the CPU, which greatly increases hardware cost.
To address this, FIG. 2 shows an exemplary hardware connection diagram in which an FPGA is connected to both the CPU on the main control board card and the service board cards. When the FPGA determines that an access to a service board card has failed, it stores the PCIe bus address space address of that service board card in the cache (the memory shown in FIG. 2) and reports an exception interrupt to the CPU on the main control board card. On receiving the interrupt, the CPU reads the address from the cache, matches it against the address ranges of the PCIe bus address spaces allocated in advance to the service board cards, and identifies the abnormal service board card. The CPU then sends a reset or restart instruction to the FPGA, and the FPGA resets or restarts the abnormal service board card accordingly. The technical solution of the present application therefore prevents a single abnormal service board card from bringing down the entire centralized control frame-type device, without requiring instruction detection or registers inside the CPU or integrating the corresponding functions into the CPU, which greatly reduces hardware cost. The present application is further explained by the following examples:
As shown in FIG. 3, which is an interaction flowchart of a method for recovering a service board card from an abnormality according to an embodiment of the present application, the method includes the following steps:
In the preparation stage:
S301: The CPU on the main control board card allocates a PCIe bus address space to each service board card according to PCIe bus address space configuration information, and stores the address range allocated to each service board card according to a board card distribution topology;
In the present application, the CPU on the main control board card allocates a PCIe bus address space to each service board card in advance, stores the address range allocated to each service board card according to the board card distribution topology, and stores the correspondence between service board cards and PCIe bus address spaces on both the main control board card and the FPGA. The board card distribution topology resembles the device tree shown in FIG. 4; that is, the address ranges allocated to the service board cards are stored in a tree structure.
The CPU on the main control board card allocates a PCIe bus address space to each service board card in advance according to PCIe bus address space configuration information, which is derived from the PCIe bus address space each service board card requires. For example, according to the configuration information, the CPU allocates 6 MB of PCIe bus address space to service board card 1, 5 MB to service board card 2, 10 MB to service board card 3, and so on, and stores the address range allocated to each service board card according to the board card distribution topology.
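The per-board allocation in S301 can be sketched as follows. This is a minimal Python model, not the patent's implementation: it assumes the windows are packed back to back, in configuration order, from a hypothetical base address `0x8000_0000`, and it enforces 1 MiB granularity because PCI-to-PCI bridge memory windows (Memory Base/Limit registers) are programmed in 1 MB units. The names `Window` and `allocate_board_windows` are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass

MIB = 1 << 20  # the patent's "M" read as MiB


@dataclass(frozen=True)
class Window:
    name: str
    base: int
    size: int

    @property
    def limit(self) -> int:
        """Last address inside the window (inclusive)."""
        return self.base + self.size - 1

    def contains(self, addr: int) -> bool:
        return self.base <= addr <= self.limit


def allocate_board_windows(config: dict[str, int], base: int) -> list[Window]:
    """Give each service board card a consecutive window starting at `base`."""
    if base % MIB:
        raise ValueError("base must be 1 MiB aligned")
    windows = []
    cursor = base
    for board, size in config.items():  # dicts keep insertion order
        if size <= 0 or size % MIB:
            raise ValueError(f"{board}: size must be a positive multiple of 1 MiB")
        windows.append(Window(board, cursor, size))
        cursor += size
    return windows


# Hypothetical configuration mirroring the example: 6 MB, 5 MB and 10 MB.
config = {"board1": 6 * MIB, "board2": 5 * MIB, "board3": 10 * MIB}
windows = allocate_board_windows(config, base=0x8000_0000)
for w in windows:
    print(f"{w.name}: {w.base:#x}-{w.limit:#x}")
```

This prints `board1: 0x80000000-0x805fffff`, `board2: 0x80600000-0x80afffff` and `board3: 0x80b00000-0x814fffff`; the resulting (base, limit) pairs are the address ranges that later steps match against.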
Preferably, after allocating a PCIe bus address space to each service board card, the CPU on the main control board card may recursively scan the PCIe devices on each service board card of the centralized control frame-type device, allocate to each discovered PCIe device a PCIe bus address space taken from the space of its service board card on a depth-first basis, and store the address information of the allocated spaces. For example, if service board card 1 has been allocated 6 MB of PCIe bus address space, the CPU may recursively scan the PCIe devices on the service board cards, allocate 2 MB of that 6 MB to discovered PCIe device 1 on a depth-first basis, and store the address information of the allocated space.
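A sketch of the depth-first sub-allocation inside one board card's window, under stated assumptions: device discovery (reading configuration space) is stubbed out by a hand-built tree, each leaf device's window size is given directly, and no alignment beyond simple packing is applied. `Node` and `assign_depth_first` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MIB = 1 << 20


@dataclass
class Node:
    """A board card, bridge/switch, or endpoint in the board distribution tree."""
    name: str
    size: int = 0  # requested window size; used for leaf devices only
    children: list[Node] = field(default_factory=list)


def assign_depth_first(node: Node, base: int, limit: int,
                       table: dict[str, tuple[int, int]]) -> int:
    """Carve node's children out of [base, limit], depth first.

    Each child's whole subtree is placed before its next sibling, the
    (first, last) address of every child is recorded in `table`, and the
    next free address is returned.
    """
    cursor = base
    for child in node.children:
        start = cursor
        if child.children:
            cursor = assign_depth_first(child, cursor, limit, table)  # descend first
        else:
            cursor += child.size
        if cursor - 1 > limit:
            raise MemoryError(f"{child.name} does not fit inside {node.name}")
        table[child.name] = (start, cursor - 1)
    return cursor


# Hypothetical tree for service board card 1 (6 MB window at 0x8000_0000).
board1 = Node("board1", children=[
    Node("dev1", 2 * MIB),
    Node("switch", children=[Node("dev2", 1 * MIB), Node("dev3", 1 * MIB)]),
])
table: dict[str, tuple[int, int]] = {}
next_free = assign_depth_first(board1, 0x8000_0000, 0x805F_FFFF, table)
```

Here `dev1` receives the first 2 MB of the board card's window, and the switch's subtree (`dev2`, `dev3`) is filled in completely before anything after it would be placed; `table` is the stored address information.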
In addition, the CPU on the main control board card reserves a space of a certain size in the cache in advance, for example 4 bytes of memory, dedicated to the write-back of the PCIe bus address space address of the service board card to be accessed, and stores the start address of this cache space in an FPGA register, for example the FPGA-dedicated register SMB Cache Addr, so that the CPU on the main control board card can conveniently read it.
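The reserved write-back slot and the SMB Cache Addr register can be modelled as below. The byte-array memory, the little-endian 32-bit encoding, and the slot offset are assumptions for illustration only; the classes `FakeMemory` and `FakeFpga` are hypothetical stand-ins, not real hardware interfaces.

```python
import struct


class FakeMemory:
    """Stand-in for main-control-board RAM (hypothetical)."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def write32(self, offset: int, value: int) -> None:
        struct.pack_into("<I", self.data, offset, value)  # little-endian u32 (assumption)

    def read32(self, offset: int) -> int:
        return struct.unpack_from("<I", self.data, offset)[0]


class FakeFpga:
    """Only the parts of the FPGA that touch the write-back slot."""

    def __init__(self, mem: FakeMemory) -> None:
        self.mem = mem
        self.smb_cache_addr = None  # models the "SMB Cache Addr" register

    def record_failed_access(self, target_addr: int) -> None:
        if self.smb_cache_addr is None:
            raise RuntimeError("CPU has not programmed SMB Cache Addr yet")
        self.mem.write32(self.smb_cache_addr, target_addr)


# Preparation stage: the CPU reserves a 4-byte slot and tells the FPGA where it is.
mem = FakeMemory(4096)
fpga = FakeFpga(mem)
SLOT = 0x100  # hypothetical offset of the reserved slot
fpga.smb_cache_addr = SLOT

# Processing stage: the FPGA records a failed access; the CPU reads it back.
fpga.record_failed_access(0x8061_2340)
print(hex(mem.read32(SLOT)))  # 0x80612340
```

Note that a 4-byte slot can hold only a 32-bit address; a design that maps service board cards above 4 GiB would need at least an 8-byte slot.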
In the processing stage:
S302: The CPU on the main control board card sends an access request message to the FPGA;
When the CPU on the main control board card needs to access a service board card or a PCIe device on a service board card, it sends an access request message to the FPGA, and the FPGA forwards the message to the corresponding service board card or PCIe device.
S303: The FPGA receives the access request message and parses the PCIe bus address space address of the service board card to be accessed carried in the message;
S304: The FPGA forwards the access request message to the corresponding service board card to be accessed according to that address;
The FPGA receives the access request message sent by the CPU on the main control board card and parses the PCIe bus address space address of the service board card to be accessed carried in it. For example, if the CPU needs to access service board card 1, the FPGA parses out the PCIe bus address space address corresponding to service board card 1. Using this address, the FPGA queries its locally stored correspondence between service board cards and PCIe bus address spaces and forwards the access request message to the corresponding service board card, for example service board card 1.
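The FPGA's address decoding in S303 and S304 amounts to a range lookup over the stored correspondence. The sketch below assumes non-overlapping windows and uses binary search over sorted base addresses; FPGA logic would more plausibly compare the address against all windows in parallel, but the result is the same. `AccessRequest`, `AddressDecoder`, `forward`, and the per-board queues are hypothetical.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass
class AccessRequest:
    addr: int              # PCIe bus address carried in the message
    is_write: bool = False
    data: int = 0


class AddressDecoder:
    """Maps a PCIe bus address to the service board card whose window contains it."""

    def __init__(self, ranges: dict[str, tuple[int, int]]) -> None:
        # Sort windows by base address so a lookup is a single binary search.
        self._entries = sorted(ranges.items(), key=lambda kv: kv[1][0])
        self._bases = [r[0] for _, r in self._entries]

    def lookup(self, addr: int) -> str | None:
        i = bisect.bisect_right(self._bases, addr) - 1  # last window starting <= addr
        if i < 0:
            return None
        name, (_, limit) = self._entries[i]
        return name if addr <= limit else None


def forward(req: AccessRequest, decoder: AddressDecoder,
            links: dict[str, list]) -> str:
    """Queue the request on the link to the matching board card."""
    board = decoder.lookup(req.addr)
    if board is None:
        raise LookupError(f"{req.addr:#x} is not inside any service board window")
    links[board].append(req)
    return board


decoder = AddressDecoder({
    "board1": (0x8000_0000, 0x805F_FFFF),
    "board2": (0x8060_0000, 0x80AF_FFFF),
    "board3": (0x80B0_0000, 0x814F_FFFF),
})
links = {name: [] for name in ("board1", "board2", "board3")}
print(forward(AccessRequest(0x8060_0010), decoder, links))  # board2
```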
S305: The FPGA determines whether response data returned by the service board card to be accessed has been received;
A timer is set in the FPGA. After forwarding the access request message to the service board card to be accessed, the FPGA checks whether response data returned by that board card arrives within a preset time period. If no response data arrives within the period, the FPGA determines that the access has failed; otherwise, it receives the response data returned by the service board card as normal.
S306: If not, the FPGA determines that the access to the service board card to be accessed has failed, reports an exception interrupt to the CPU on the main control board card, and stores the PCIe bus address space address of the service board card to be accessed in a cache;
If no response data returned by the service board card to be accessed is received, the FPGA determines that the access has failed. Preferably, if no response data is received within the preset time period, the FPGA resends the access request message to the corresponding service board card up to a preset number of times, and determines that the access has failed only if it still receives no response data after these retransmissions.
For example, if no response data is received within the preset time period, the FPGA resends the access request message to the corresponding service board card the preset number of times (5 times), checking after each transmission whether response data has been returned. If no response data is received during any of these retransmissions, the FPGA determines that the access to the service board card has failed.
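The timeout-and-retransmit logic of S305 and S306 can be sketched as follows. We read "5 times" as five retransmissions after the first timed-out attempt; the callable `send_once`, which returns `None` when its timer expires, is an assumption standing in for the FPGA's hardware timer, and `FakeBoard` is a hypothetical test double.

```python
from typing import Callable, Optional

SendOnce = Callable[[int, float], Optional[int]]


def access_with_retries(send_once: SendOnce, addr: int,
                        timeout_s: float = 0.001,
                        retransmissions: int = 5) -> Optional[int]:
    """Send once, then retransmit up to `retransmissions` times on timeout.

    Returns the response data, or None when every attempt timed out, which is
    the point at which the FPGA declares the access failed.
    """
    for _ in range(1 + retransmissions):
        data = send_once(addr, timeout_s)  # None means the timer expired
        if data is not None:
            return data
    return None


class FakeBoard:
    """Answers on the n-th request (never, if answer_on is 0)."""

    def __init__(self, answer_on: int) -> None:
        self.calls = 0
        self.answer_on = answer_on

    def send_once(self, addr: int, timeout_s: float) -> Optional[int]:
        self.calls += 1
        return 0x1234 if self.calls == self.answer_on else None


dead, flaky = FakeBoard(answer_on=0), FakeBoard(answer_on=3)
print(access_with_retries(dead.send_once, 0x8000_0000), dead.calls)    # None 6
print(access_with_retries(flaky.send_once, 0x8000_0000), flaky.calls)  # 4660 3
```

If the patent's count is instead meant to include the first transmission, `retransmissions` would be 4; the structure of the loop is unchanged.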
Once the access to the service board card to be accessed is determined to have failed, the FPGA stops sending access request messages to that board card, reports an exception interrupt to the CPU on the main control board card, and stores the PCIe bus address space address of the service board card in the cache reserved in advance.
Preferably, to prevent the CPU on the main control board card from itself becoming abnormal because its access to the service board card times out, the FPGA returns invalid response data that it constructs itself to the CPU on the main control board card when it determines that the access has failed.
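The failure handling of S306 and the optional self-constructed invalid response can be combined as in this hypothetical model. The patent does not say what the invalid data looks like; the all-ones value used here is an assumption, chosen because it is what software commonly sees when a PCI/PCIe read fails. `FpgaFrontEnd` and its `board_of` mapping are illustrative names.

```python
from __future__ import annotations

from typing import Callable, Optional

INVALID_READ = 0xFFFF_FFFF  # assumed "invalid" pattern; the patent does not give one


class FpgaFrontEnd:
    """Hypothetical model of the FPGA failure path for CPU reads."""

    def __init__(self, send_once: Callable[[int], Optional[int]],
                 board_of: Callable[[int], str], attempts: int = 6) -> None:
        self.send_once = send_once
        self.board_of = board_of
        self.attempts = attempts              # first try plus retransmissions
        self.failed_boards: set[str] = set()  # boards the FPGA no longer sends to
        self.cache_slot: Optional[int] = None  # models the reserved write-back slot
        self.irq_pending = False               # models the exception interrupt

    def cpu_read(self, addr: int) -> int:
        board = self.board_of(addr)
        if board not in self.failed_boards:
            for _ in range(self.attempts):
                data = self.send_once(addr)
                if data is not None:
                    return data
            # Access failed: stop sending to this board, record the address,
            # and raise the exception interrupt towards the CPU.
            self.failed_boards.add(board)
            self.cache_slot = addr
            self.irq_pending = True
        # Always complete the read so the CPU is never left waiting on it.
        return INVALID_READ


sent = []


def silent_board(addr: int) -> Optional[int]:
    sent.append(addr)
    return None  # never answers


fpga = FpgaFrontEnd(silent_board, board_of=lambda a: "board1")
first = fpga.cpu_read(0x8000_0040)
second = fpga.cpu_read(0x8000_0080)  # board already failed: nothing is sent
```

The second read returns the invalid value immediately without touching the failed board card, matching the requirement that the FPGA stop sending requests to it.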
S307: After receiving the exception interrupt reported by the FPGA, the CPU on the main control board card reads the PCIe bus address space address of the service board card to be accessed from the cache, matches it against the address ranges of the PCIe bus address spaces allocated in advance to the service board cards, and identifies the abnormal service board card;
By matching the address read from the cache against the pre-allocated address ranges, the CPU on the main control board card can identify the abnormal service board card and, further, the abnormal PCIe device on that board card.
Once the abnormal service board card has been identified, the CPU on the main control board card stops accessing it.
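The CPU-side matching in S307 can be sketched as a two-level range lookup: first the board card window, then the PCIe device windows inside it. The tables are hypothetical values in the spirit of the 6 MB / 5 MB example above, and `read_cache_slot` and `stop_access` stand in for the real cache read and for whatever mechanism the CPU uses to stop accessing the board card.

```python
from __future__ import annotations

from typing import Callable, Optional

Range = tuple[int, int]

# Hypothetical tables saved during the preparation stage (S301).
BOARD_RANGES: dict[str, Range] = {
    "board1": (0x8000_0000, 0x805F_FFFF),
    "board2": (0x8060_0000, 0x80AF_FFFF),
}
DEVICE_RANGES: dict[str, dict[str, Range]] = {
    "board1": {"dev1": (0x8000_0000, 0x801F_FFFF),
               "dev2": (0x8020_0000, 0x802F_FFFF)},
    "board2": {"dev1": (0x8060_0000, 0x806F_FFFF)},
}


def locate_fault(addr: int) -> Optional[tuple[str, Optional[str]]]:
    """Return (board, device) containing addr; device is None if no device window matches."""
    for board, (lo, hi) in BOARD_RANGES.items():
        if lo <= addr <= hi:
            for dev, (dlo, dhi) in DEVICE_RANGES.get(board, {}).items():
                if dlo <= addr <= dhi:
                    return board, dev
            return board, None
    return None


def on_exception_interrupt(read_cache_slot: Callable[[], int],
                           stop_access: Callable[[str], None]):
    """Interrupt handler body: read the cached address, match it, stop accessing the board."""
    addr = read_cache_slot()
    hit = locate_fault(addr)
    if hit is not None:
        stop_access(hit[0])
    return hit


stopped: list[str] = []
print(on_exception_interrupt(lambda: 0x8020_0010, stopped.append), stopped)
```

This prints `('board1', 'dev2') ['board1']`: the cached address falls inside board card 1's window and, within it, inside device 2's window.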
S308: The CPU on the main control board card sends a reset or restart instruction to the FPGA, so that the FPGA resets or restarts the abnormal service board card according to the instruction.
After identifying the abnormal service board card, the CPU on the main control board card resets or restarts it as follows: the CPU sends a reset or restart instruction to the FPGA, and the FPGA resets or restarts the abnormal service board card accordingly; the CPU then sends an initialization instruction to the FPGA, and the FPGA initializes the service board card according to that instruction.
In summary, with the technical solution provided by the application, when the FPGA determines that an access to a service board card has failed, it stores that board card's PCIe bus address space address in the cache and reports an exception interrupt to the CPU on the main control board card. On receiving the interrupt, the CPU reads the address from the cache, matches it against the address ranges of the PCIe bus address spaces allocated in advance to the service board cards, identifies the abnormal service board card, and then sends a reset or restart instruction to the FPGA, which resets or restarts that board card accordingly. A single abnormal service board card therefore cannot bring down the entire centralized control frame-type device, no instruction detection or registers inside the CPU are required, and the corresponding functions need not be integrated into the CPU, which greatly reduces hardware cost.
In addition, after the above steps the CPU on the main control board card may access the service board card again. If the service board card still has a problem after the reset or restart, the CPU resets or restarts it again; if that still does not solve the problem, the CPU powers off the service board card (specifically, it sends a power-off instruction to the FPGA, and the FPGA powers off the service board card according to that instruction) and prompts maintenance personnel to handle it.
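The recovery sequence of S308 together with the escalation described above can be sketched as follows. The command names, the `send`/`probe`/`notify` callables, and the limit of two reset attempts (the one in S308 plus one retry) are assumptions; reset and restart are folded into a single `RESET` command because the patent does not distinguish their effects.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable


class Cmd(Enum):
    RESET = "reset"          # stands for "reset or restart"
    INIT = "init"
    POWER_OFF = "power_off"


def recover_board(board: str,
                  send: Callable[[Cmd, str], None],
                  probe: Callable[[str], bool],
                  notify: Callable[[str], None],
                  max_resets: int = 2) -> str:
    """Reset and re-initialise a board, re-probe it, and power it off as a last resort."""
    for _ in range(max_resets):
        send(Cmd.RESET, board)  # FPGA resets (or restarts) the board
        send(Cmd.INIT, board)   # FPGA then re-initialises it
        if probe(board):        # CPU accesses the board again
            return "recovered"
    send(Cmd.POWER_OFF, board)
    notify(f"{board} is still abnormal after {max_resets} resets and has been powered off")
    return "powered_off"


log: list[tuple[Cmd, str]] = []
alerts: list[str] = []
outcome = recover_board("board3", lambda c, b: log.append((c, b)),
                        probe=lambda b: False, notify=alerts.append)
print(outcome, [c.name for c, _ in log])
```

For a board card that never recovers, this prints `powered_off ['RESET', 'INIT', 'RESET', 'INIT', 'POWER_OFF']`, and exactly one alert is raised for maintenance personnel.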
Corresponding to the embodiment of the service board abnormality recovery method, the present application further provides an embodiment of a service board abnormality recovery system, which is applied to a centralized control frame device, where the centralized control frame device includes a main control board, an FPGA, and at least one service board, where the FPGA is connected to the main control board and the at least one service board, and the system includes:
in the preparation stage: the central processing unit (CPU) on the main control board card allocates a PCIe bus address space to each service board card according to PCIe bus address space configuration information, and stores the address range of the PCIe bus address space allocated to each service board card according to a board card distribution topology map;
in the processing stage: the CPU on the main control board card sends an access request message to the FPGA;
the FPGA receives the access request message and parses the PCIe bus address space address of the service board card to be accessed, which is carried in the access request message;
the FPGA forwards the access request message to the corresponding service board card to be accessed according to that PCIe bus address space address;
the FPGA determines whether response data returned by the service board card to be accessed has been received;
if not, the FPGA determines that access to the service board card to be accessed has failed, reports an exception interrupt to the CPU on the main control board card, and stores the PCIe bus address space address of the service board card to be accessed in a cache;
after receiving the exception interrupt reported by the FPGA, the CPU on the main control board card reads the PCIe bus address space address of the service board card to be accessed from the cache, matches it against the address ranges of the PCIe bus address space allocated in advance to each service board card, and thereby identifies the abnormal service board card to be accessed;
the CPU on the main control board card sends a reset or restart instruction to the FPGA, so that the FPGA resets or restarts the abnormal service board card to be accessed according to the instruction.
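The preparation-stage allocation and the CPU's address matching can be illustrated with a minimal Python model. The application does not say how address windows are laid out or how the board card distribution topology map is stored. The following are therefore assumptions. Each slot's window size comes from a plain `slot -> size` dictionary. Windows are assigned in ascending slot order and aligned to their own size. A sorted list of `AddrRange` records stands in for the topology map. `allocate_windows` and `match_board` are hypothetical names.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AddrRange:
    slot: int  # service board card slot (hypothetical topology key)
    base: int  # first PCIe bus address assigned to the board
    size: int  # window size in bytes

    @property
    def limit(self) -> int:
        return self.base + self.size  # exclusive end of the window


def allocate_windows(config: Dict[int, int], start: int) -> List[AddrRange]:
    """Preparation stage: assign consecutive windows aligned to their size.

    config (slot -> window size) stands in for the PCIe bus address space
    configuration information.
    """
    ranges: List[AddrRange] = []
    addr = start
    for slot in sorted(config):
        size = config[slot]
        addr = (addr + size - 1) // size * size  # round base up to alignment
        ranges.append(AddrRange(slot, addr, size))
        addr += size
    return ranges


def match_board(ranges: List[AddrRange], fault_addr: int) -> Optional[int]:
    """Processing stage: map the cached fault address back to a slot."""
    bases = [r.base for r in ranges]  # ascending, since allocation was in order
    i = bisect.bisect_right(bases, fault_addr) - 1
    if i >= 0 and fault_addr < ranges[i].limit:
        return ranges[i].slot
    return None  # address lies in a gap or outside every window
```

For example, with sizes `{1: 0x1000, 2: 0x2000, 3: 0x1000}` starting at `0x90000000`, slot 2 is aligned up to `0x90002000`, so a fault at `0x90002010` matches slot 2 while `0x90001800` (the alignment gap) matches nothing. Because the bases are already sorted, `bisect_right` finds the candidate window in logarithmic time.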
For the implementation of the system, see the implementation of the corresponding steps of the method; it is not repeated here.
As described above, in the technical solution provided by the embodiments of the present application, when the FPGA determines that access to a service board card has failed, it stores the PCIe bus address space address of that service board card in a cache and reports an exception interrupt to the CPU on the main control board card. After receiving the exception interrupt, the CPU on the main control board card reads the address from the cache, matches it against the PCIe bus address space ranges allocated in advance to each service board card, and identifies the abnormal service board card. The CPU then sends a reset or restart instruction to the FPGA, and the FPGA resets or restarts the abnormal service board card according to that instruction. The technical solution of the present application therefore prevents an abnormality in a single service board card from bringing down the entire centralized control frame type equipment. Because no instruction detection logic or dedicated registers need to be implemented inside the CPU, and the CPU does not need to integrate the corresponding function, hardware cost is greatly reduced.
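The handshake summarized above (the FPGA caches the failing address and raises an exception interrupt, and the CPU's handler reads the cache, matches the address, and requests a reset) can be modeled as follows. This is a toy, synchronous Python sketch, not the application's implementation. The interrupt is a direct method call, and a `deque` stands in for the FPGA cache. A board callable that returns `None` models a board that never answers. The window table, class names, and method names are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple

# Hypothetical PCIe windows from the preparation stage: slot -> (base, limit).
WINDOWS: Dict[int, Tuple[int, int]] = {1: (0x1000, 0x2000), 2: (0x2000, 0x3000)}


def slot_of(addr: int) -> Optional[int]:
    """Return the slot whose window contains addr, or None."""
    for slot, (base, limit) in WINDOWS.items():
        if base <= addr < limit:
            return slot
    return None


class Fpga:
    def __init__(self, boards: Dict[int, Callable[[int], Optional[int]]]):
        self.boards = boards                           # slot -> board read
        self.fault_cache: Deque[int] = deque()         # failed PCIe addresses
        self.irq: Optional[Callable[[], None]] = None  # CPU exception handler
        self.reset_log: List[int] = []                 # slots reset so far

    def read(self, addr: int) -> Optional[int]:
        slot = slot_of(addr)                  # parse the target address
        if slot is None:
            raise ValueError(f"address {addr:#x} is outside every window")
        data = self.boards[slot](addr)        # forward the request to the board
        if data is None:                      # no response data came back
            self.fault_cache.append(addr)     # store the failing address
            if self.irq is not None:
                self.irq()                    # report the exception interrupt
        return data

    def reset(self, slot: int) -> None:
        self.reset_log.append(slot)           # stands in for asserting reset


class MainCpu:
    def __init__(self, fpga: Fpga):
        self.fpga = fpga
        fpga.irq = self.on_exception_irq      # register the interrupt handler

    def on_exception_irq(self) -> None:
        while self.fpga.fault_cache:
            addr = self.fpga.fault_cache.popleft()  # read address from cache
            slot = slot_of(addr)                    # match against the windows
            if slot is not None:
                self.fpga.reset(slot)               # reset/restart instruction
```

Draining the cache in a loop lets one interrupt cover several failed accesses queued before the handler ran. In real hardware the handler would run asynchronously and the cache depth would be bounded.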
Because the system embodiment basically corresponds to the method embodiment, reference may be made to the relevant parts of the description of the method embodiment. The system embodiment described above is merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solution of the present application. A person of ordinary skill in the art can understand and implement this without inventive effort.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The foregoing is directed to embodiments of the present invention, and it is understood that various modifications and improvements can be made by those skilled in the art without departing from the spirit of the invention.

Claims (10)

1. A method for recovering an abnormal service board card, characterized in that the method is applied to centralized control frame type equipment, the centralized control frame type equipment comprises a main control board card, an FPGA and at least one service board card, the FPGA is connected to the main control board card and to the at least one service board card, and the method comprises the following steps:
in the preparation stage: the central processing unit (CPU) on the main control board card allocates a PCIe bus address space to each service board card according to PCIe bus address space configuration information, and stores the address range of the PCIe bus address space allocated to each service board card according to a board card distribution topology map;
in the processing stage: the CPU on the main control board card sends an access request message to the FPGA;
the FPGA receives the access request message and parses the PCIe bus address space address of the service board card to be accessed, which is carried in the access request message;
the FPGA forwards the access request message to the corresponding service board card to be accessed according to that PCIe bus address space address;
the FPGA determines whether response data returned by the service board card to be accessed has been received;
if not, the FPGA determines that access to the service board card to be accessed has failed, reports an exception interrupt to the CPU on the main control board card, and stores the PCIe bus address space address of the service board card to be accessed in a cache;
after receiving the exception interrupt reported by the FPGA, the CPU on the main control board card reads the PCIe bus address space address of the service board card to be accessed from the cache, matches it against the address ranges of the PCIe bus address space allocated in advance to each service board card, and thereby identifies the abnormal service board card to be accessed;
the CPU on the main control board card sends a reset or restart instruction to the FPGA, so that the FPGA resets or restarts the abnormal service board card to be accessed according to the instruction.
2. The method according to claim 1, wherein the determining, by the FPGA, whether response data returned by the service board card to be accessed has been received comprises:
determining, by the FPGA, whether response data returned by the service board card to be accessed is received within a preset time period.
3. The method according to claim 2, wherein the determining, by the FPGA, that access to the service board card to be accessed has failed if no response data is received comprises:
if no response data returned by the service board card to be accessed is received within the preset time period, repeatedly sending, by the FPGA, the access request message to the corresponding service board card according to a preset number of transmissions of the access request message;
if the FPGA receives no response data from the service board card to be accessed while repeatedly sending the access request message according to the preset number of transmissions, determining that access to the service board card to be accessed has failed.
4. The method according to claim 1, further comprising:
stopping, by the CPU on the main control board card, access to the service board card to be accessed once that service board card has been determined to be abnormal.
5. The method according to any one of claims 1 to 4, further comprising:
when the FPGA determines that access to the service board card to be accessed has failed, returning, by the FPGA, self-constructed invalid response data to the CPU on the main control board card, so that the CPU on the main control board card does not become abnormal due to an access timeout.
6. A service board card abnormality recovery system, characterized in that the system is applied to centralized control frame type equipment, the centralized control frame type equipment comprises a main control board card, an FPGA and at least one service board card, the FPGA is connected to the main control board card and to the at least one service board card, and the system is configured to implement the following method:
in the preparation stage: the central processing unit (CPU) on the main control board card allocates a PCIe bus address space to each service board card according to PCIe bus address space configuration information, and stores the address range of the PCIe bus address space allocated to each service board card according to a board card distribution topology map;
in the processing stage: the CPU on the main control board card sends an access request message to the FPGA;
the FPGA receives the access request message and parses the PCIe bus address space address of the service board card to be accessed, which is carried in the access request message;
the FPGA forwards the access request message to the corresponding service board card to be accessed according to that PCIe bus address space address;
the FPGA determines whether response data returned by the service board card to be accessed has been received;
if not, the FPGA determines that access to the service board card to be accessed has failed, reports an exception interrupt to the CPU on the main control board card, and stores the PCIe bus address space address of the service board card to be accessed in a cache;
after receiving the exception interrupt reported by the FPGA, the CPU on the main control board card reads the PCIe bus address space address of the service board card to be accessed from the cache, matches it against the address ranges of the PCIe bus address space allocated in advance to each service board card, and thereby identifies the abnormal service board card to be accessed;
the CPU on the main control board card sends a reset or restart instruction to the FPGA, so that the FPGA resets or restarts the abnormal service board card to be accessed according to the instruction.
7. The system according to claim 6, wherein the FPGA determines whether response data returned by the service board card to be accessed has been received specifically by:
determining whether response data returned by the service board card to be accessed is received within a preset time period.
8. The system according to claim 7, wherein the FPGA determines that access to the service board card to be accessed has failed specifically by:
if no response data returned by the service board card to be accessed is received within the preset time period, repeatedly sending the access request message to the corresponding service board card according to a preset number of transmissions of the access request message;
if no response data from the service board card to be accessed is received while the access request message is repeatedly sent according to the preset number of transmissions, determining that access to the service board card to be accessed has failed.
9. The system according to claim 6, wherein the system is further configured to implement the following method:
stopping, by the CPU on the main control board card, access to the service board card to be accessed once that service board card has been determined to be abnormal.
10. The system according to any one of claims 6 to 9, wherein the system is further configured to implement the following method:
when the FPGA determines that access to the service board card to be accessed has failed, returning, by the FPGA, self-constructed invalid response data to the CPU on the main control board card, so that the CPU on the main control board card does not become abnormal due to an access timeout.
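Claims 2 to 5 (and their system counterparts, claims 7 to 10) add a response timeout, a preset number of transmissions, self-constructed invalid response data, and a CPU-side stop on further access. Below is a minimal Python sketch of that behavior under stated assumptions. A board is a hypothetical callable returning `(latency_us, data)`, or `None` when it stays silent. `max_sends` counts the first transmission. The filler value `0xFFFFFFFF` is an assumption: it matches the all-ones value conventionally returned for failed PCI reads, but the application does not specify the invalid data. In this sketch `ok=False` stands in for the exception interrupt through which the CPU learns that the board is abnormal.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set, Tuple

INVALID_DATA = 0xFFFFFFFF  # assumed filler; the application does not specify it

Response = Optional[Tuple[float, int]]  # (latency_us, data), or None if silent


@dataclass
class AccessResult:
    data: int
    ok: bool
    attempts: int


def fpga_access(board: Callable[[int], Response], addr: int,
                timeout_us: float = 100.0, max_sends: int = 3) -> AccessResult:
    """Send a request, resending up to max_sends times in total."""
    for attempt in range(1, max_sends + 1):
        resp = board(addr)
        if resp is not None and resp[0] <= timeout_us:  # answered in time
            return AccessResult(resp[1], True, attempt)
    # Every transmission timed out: access failed. Hand the CPU
    # self-constructed invalid data so its read completes instead of stalling.
    return AccessResult(INVALID_DATA, False, max_sends)


@dataclass
class CpuAccessGuard:
    """CPU side: stop accessing boards already found to be abnormal."""
    blocked: Set[int] = field(default_factory=set)

    def read(self, slot: int, board: Callable[[int], Response],
             addr: int) -> Optional[int]:
        if slot in self.blocked:
            return None                 # do not touch an abnormal board
        result = fpga_access(board, addr)
        if not result.ok:
            self.blocked.add(slot)      # reset/power-off handling would follow
        return result.data
```

A board that answers only on its second transmission still succeeds with `attempts == 2`, while a silent board yields `INVALID_DATA` after three transmissions and is then blocked from further CPU access.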
CN201910124873.6A 2019-02-20 2019-02-20 Method and system for recovering abnormity of service board card Active CN109918230B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910124873.6A CN109918230B (en) 2019-02-20 2019-02-20 Method and system for recovering abnormity of service board card

Publications (2)

Publication Number Publication Date
CN109918230A 2019-06-21
CN109918230B 2021-01-26

Family

ID=66961802

Country Status (1)

Country Link
CN (1) CN109918230B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant