CN105700967A - PCIe (Peripheral Component Interconnect Express) equipment and detection method thereof - Google Patents

Publication number
CN105700967A
Authority
CN
China
Prior art keywords
pcie
message
detection unit
unit
tlp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610015204.1A
Other languages
Chinese (zh)
Inventor
张浩鹏
吴沛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201610015204.1A
Publication of CN105700967A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0721 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment within a central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs

Abstract

The invention relates to PCIe (Peripheral Component Interconnect Express) equipment and a detection method thereof. The PCIe equipment comprises a PCIe port, which may be located in a DP (Downstream Port) of a PCIe switch or in an RP (Root Port) of a CPU (Central Processing Unit). In one example, a PCIe core unit receives a transaction layer packet issued by the CPU or by a UP (Upstream Port) and sends it to an EP (Endpoint); an exception detection unit monitors the process by which the port sends the packet to the EP and identifies whether a packet retransmission exception, such as an ACK/NAK (Acknowledge/Negative Acknowledge) exception, or a flow control credit update exception, such as a no-credit condition, occurs, and outputs a hardware link-disconnect enable signal when an exception is present; and a hardware link-disconnect unit disconnects the link between the PCIe port and the EP according to the signal. The embodiment detects and handles ACK/NAK or flow control credit update exceptions that would otherwise go unnoticed but can cause faults including CPU hangs, so as to ensure that the CPU works normally and to improve the reliability of the PCIe system and its compatibility with driver programs.

Description

A PCIe (Peripheral Component Interconnect Express) device and detection method thereof
Technical field
The present invention relates to the field of mobile communications, and in particular to a PCIe (Peripheral Component Interconnect Express) device and a detection method thereof.
Background
With the widespread use of the Peripheral Component Interconnect Express (PCIe) protocol in fields such as storage and computing, the reliability requirements on systems are increasingly high. To improve system reliability, the PCIe protocol defines a series of error detection and handling mechanisms that limit the impact of I/O device anomalies on the host. However, the PCIe protocol still underestimates the severity of some errors, and CPU handling of some errors remains incomplete.
Currently, when an I/O device becomes abnormal, it may cause an anomaly at the root port (Root Port, RP) of the PCIe device or at a downstream port (Downstream Port, DP) of a PCIe switch, where a PCIe switch contains multiple DPs. When a port detects that it, or the I/O device connected to it, is abnormal, it sends an error message, such as an interrupt message, according to the PCIe Advanced Error Reporting (AER) mechanism. After receiving the interrupt, the error-handling software sets the link control register of the corresponding DP or RP to disconnect the link below it and isolate the abnormal I/O device. The endpoint (Endpoint, EP) is the I/O device in the PCIe system.
However, when certain anomalies occur in an EP attached to a DP of a PCIe switch, the complexity of the overall PCIe topology makes it difficult for software to quickly locate the DP corresponding to the abnormal EP, so the link of that DP cannot be disconnected in time. To keep the system from crashing, the driver software usually disconnects all links under the RP directly, which has a large impact on services. In addition, this scheme requires modifying the driver software to add anomaly detection, judgment, and link-disconnect control, so its compatibility is poor.
Summary of the invention
The present invention provides a PCIe device and a detection method thereof, to solve the prior-art problem of CPU hangs caused by EP anomalies.
In one aspect, the present invention provides a PCIe device, for example a data transmission device such as a network interface card or an SSB. The PCIe device includes a PCIe port, and the PCIe port includes a PCIe core unit, an exception detection unit, and a hardware link-disconnect unit. The PCIe core unit receives a transaction layer packet (TLP) issued by the CPU or by the upstream port (UP) of a PCIe switch, and sends the TLP downstream to the endpoint (EP). The exception detection unit monitors the process by which the PCIe port sends packets to the EP and checks whether a packet retransmission anomaly or a flow control credit update anomaly occurs in that process; if so, it outputs a hardware link-disconnect enable signal. In response to that signal, the hardware link-disconnect unit disconnects the link between the PCIe port and the EP, isolating the abnormal EP and improving system reliability.
In a possible design, the hardware link-disconnect unit may, according to the hardware link-disconnect enable signal output by the exception detection unit, set the Link Disable (LinkDisable) status bit of the link status register of the PCIe core unit. For example, if the LinkDisable bit is active high, it is set to 1. This drives the LTSSM state machine in the PCIe core into the Disable state and disconnects the link between the PCIe port and the EP, thereby isolating the abnormal EP.
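The bit-set step in this design can be sketched in software as follows. This is only an illustrative model, not the hardware implementation: the bit position `LINK_DISABLE_BIT`, the class name `PortModel`, and the use of "L0" for the normal working state are assumptions introduced here, since the text does not specify them.

```python
LINK_DISABLE_BIT = 4  # hypothetical bit position, for illustration only


class PortModel:
    """Toy model of a PCIe port's link register and LTSSM state."""

    def __init__(self):
        self.link_register = 0
        self.ltssm_state = "L0"  # assumed name for the normal working state

    def on_disconnect_enable(self, enable: bool) -> None:
        # Active-high enable signal: set the LinkDisable bit to 1.
        if enable:
            self.link_register |= 1 << LINK_DISABLE_BIT
        self._update_ltssm()

    def _update_ltssm(self) -> None:
        # The LTSSM follows the LinkDisable bit into the Disable state.
        if self.link_register & (1 << LINK_DISABLE_BIT):
            self.ltssm_state = "Disable"


port = PortModel()
port.on_disconnect_enable(True)
print(port.ltssm_state)
```

The point of the model is that no driver software takes part: the enable signal alone moves the link into the Disable state.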
In a possible design, when the PCIe device includes a CPU, the PCIe port of the PCIe device is located in the root port (RP) of the CPU. The PCIe core unit in the PCIe port receives the TLP issued by the CPU and sends the TLP to the EP.
In a possible design, when the PCIe device includes a CPU and a PCIe switch, the PCIe port of the PCIe device is located in a DP of the PCIe switch. The PCIe core unit in the PCIe port receives the TLP issued by the UP and sends the TLP to the EP.
In a possible design, when the process by which the PCIe port sends packets to the EP is the packet retransmission process between the PCIe port and the EP, the exception detection unit may be a retransmission buffer detection unit. The retransmission buffer detection unit monitors the packet retransmission process of the PCIe core in the PCIe port. If the retransmission process is abnormal, the unit outputs the hardware link-disconnect enable signal; otherwise it outputs nothing.
In a possible design, the retransmission buffer detection unit may include both a timeout detection unit and a count detection unit, or only one of the two. When it includes a timeout detection unit, the timeout detection unit performs timeout detection on the packet retransmission process of the PCIe core unit in the PCIe port. When the PCIe core unit holds a TLP but has not output a TLP release signal, and the timeout detection unit finds that the duration of retransmission without a TLP release signal (that is, the packet retransmission time) has reached a threshold, the retransmission buffer detection unit outputs the hardware link-disconnect enable signal to the hardware link-disconnect unit. The hardware link-disconnect unit then disconnects the PCIe port from the abnormal EP, reducing the impact of the abnormal EP on overall system services.
In a possible design, the retransmission buffer detection unit may include both a timeout detection unit and a count detection unit, or only one of the two. When it includes a count detection unit, the count detection unit performs count detection on the packet retransmission process of the PCIe core unit in the PCIe port. When the PCIe core unit holds a TLP but has not output a TLP release signal, and the count detection unit finds that the number of consecutive retransmissions without a TLP release signal (that is, the packet retransmission count) has reached a threshold, the retransmission buffer detection unit outputs the hardware link-disconnect enable signal to the hardware link-disconnect unit. The hardware link-disconnect unit then disconnects the PCIe port from the abnormal EP, reducing the impact of the abnormal EP on overall system services.
In a possible design, the retransmission buffer detection unit may include both a timeout detection unit and a count detection unit, or only one of the two. When it includes both, the timeout detection unit and the count detection unit monitor the packet retransmission process of the PCIe core unit in the PCIe port at the same time. When the PCIe core unit holds a TLP but has not output a TLP release signal, the timeout detection unit and the count detection unit respectively track the retransmission duration and the retransmission count. As soon as either value reaches its preset threshold (the time threshold or the count threshold), the retransmission buffer detection unit outputs the hardware link-disconnect enable signal to the hardware link-disconnect unit, which disconnects the PCIe port from the abnormal EP and reduces the impact of the abnormal EP on overall system services.
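The three variants above (timeout only, count only, or both) reduce to an "either threshold reached" decision. The following is a minimal sketch of that decision, assuming microsecond durations and using hypothetical names (`RetransmitThresholds`, `disconnect_enable`) that do not come from the source; a threshold of `None` models a detector that is not fitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetransmitThresholds:
    # Either threshold may be None when that detector is not fitted.
    max_duration_us: Optional[int] = None
    max_retries: Optional[int] = None


def disconnect_enable(duration_us: int, retries: int,
                      th: RetransmitThresholds) -> bool:
    """Return True if any fitted detector has reached its threshold."""
    timed_out = (th.max_duration_us is not None
                 and duration_us >= th.max_duration_us)
    too_many = (th.max_retries is not None
                and retries >= th.max_retries)
    return timed_out or too_many


both = RetransmitThresholds(max_duration_us=1000, max_retries=8)
print(disconnect_enable(1200, 2, both))
```

In this sketch the signal is asserted as soon as one limit is reached, which matches the "either value" wording of the combined design.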
In a possible design, the retransmission buffer detection unit may include not only the timeout detection unit and/or the count detection unit but also a recognition unit. The recognition unit identifies the retransmission duration and/or retransmission count at which the retransmission process would cause the CPU to hang, and the time threshold and/or count threshold of the timeout detection unit and/or count detection unit are set accordingly. The PCIe port is then disconnected from the abnormal EP in time, which improves system reliability.
In a possible design, when the process by which the PCIe port sends packets to the EP is the flow control credit update process between the PCIe port and the EP, the exception detection unit may be a no-credit detection unit. The no-credit detection unit monitors how long each of the multiple transaction classes of the PCIe core unit in the PCIe port has been without credits. If the no-credit duration of a transaction class exceeds a threshold, the unit outputs the hardware link-disconnect enable signal, and the hardware link-disconnect unit disconnects the PCIe port from the abnormal EP, reducing the impact of the abnormal EP on overall system services.
In a possible design, the no-credit detection unit includes multiple timer detection units, which separately monitor the no-credit duration of each transaction class in the PCIe core unit. When the PCIe core unit in the PCIe port holds a TLP and one or more of the timer detection units reach the time threshold, the no-credit detection unit outputs the hardware link-disconnect enable signal to the hardware link-disconnect unit.
In a possible design, the multiple timer detection units in the no-credit detection unit monitor the flow control credit update process of the transaction classes in the PCIe core unit. The transaction classes may include: posted request header (PH), posted request data (PD), non-posted request header (NPH), non-posted request data (NPD), completion header (CPLH), and completion data (CPLD). If the PCIe core unit holds a TLP and one or more of the timer detection units reach the time threshold, the no-credit detection unit outputs the hardware link-disconnect enable signal to the hardware link-disconnect unit.
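The per-class timers can be sketched as one counter for each of the six credit types PH, PD, NPH, NPD, CPLH, and CPLD. The class name `NoCreditDetector`, the tick-based time unit, and the rule that a timer resets when credits return or nothing is pending are assumptions made for this illustration; the source only states that a timer reaching its threshold while a TLP is held triggers the enable signal.

```python
CREDIT_CLASSES = ("PH", "PD", "NPH", "NPD", "CPLH", "CPLD")


class NoCreditDetector:
    """One timer per credit class; any timer at threshold asserts the signal."""

    def __init__(self, threshold_ticks: int):
        self.threshold_ticks = threshold_ticks
        self.timers = {c: 0 for c in CREDIT_CLASSES}

    def tick(self, credits: dict, tlp_pending: bool) -> bool:
        """Advance one tick and return the link-disconnect enable signal."""
        for c in CREDIT_CLASSES:
            if tlp_pending and credits.get(c, 0) == 0:
                self.timers[c] += 1   # still starved of credits
            else:
                self.timers[c] = 0    # credits available or nothing pending
        return any(t >= self.threshold_ticks for t in self.timers.values())


det = NoCreditDetector(threshold_ticks=3)
starved = {c: 1 for c in CREDIT_CLASSES}
starved["NPH"] = 0                    # the EP never returns NPH credits
signals = [det.tick(starved, tlp_pending=True) for _ in range(3)]
print(signals)
```

Keeping a separate timer per class means that starvation in a single class, such as NPH here, is enough to trigger isolation even while the other classes keep flowing.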
In another aspect, an embodiment of the present invention provides a detection method for a PCIe device. The PCIe device includes a PCIe port and may be a data transmission device such as a network interface card or an SSB. The detection method is performed by the PCIe port and includes: receiving a transaction layer packet (TLP) issued by the CPU or by the upstream port (UP) of a PCIe switch, and sending the TLP downstream to the endpoint (EP); and monitoring the process of sending packets to the EP to check whether a packet retransmission anomaly or a flow control credit update anomaly occurs in that process, and, if so, outputting a hardware link-disconnect enable signal. In response to that signal, the hardware link-disconnect unit disconnects the link between the PCIe port and the EP, isolating the abnormal EP and improving system reliability.
In a possible design, according to the hardware link-disconnect enable signal output by the exception detection unit, the LinkDisable status bit of the link status register of the PCIe core unit is set. For example, if the LinkDisable bit is active high, it is set to 1. This drives the LTSSM state machine in the PCIe core unit into the Disable state and disconnects the link between the PCIe port and the EP, thereby isolating the abnormal EP.
In a possible design, monitoring the process of sending packets to the EP may consist of timeout detection and/or count detection on the packet retransmission process. When only timeout detection is performed, and the PCIe port holds a TLP but has not output a TLP release signal, the hardware link-disconnect enable signal is output once the retransmission duration (that is, the packet retransmission time) reaches a threshold. The PCIe port is thereby disconnected from the abnormal EP, reducing the impact of the abnormal EP on overall system services.
In a possible design, monitoring the process of sending packets to the EP may consist of timeout detection and/or count detection on the packet retransmission process. When only count detection is performed, and the PCIe port holds a TLP but has not output a TLP release signal, the hardware link-disconnect enable signal is output once the packet retransmission count reaches a preset threshold. The PCIe port is thereby disconnected from the abnormal EP, reducing the impact of the abnormal EP on overall system services.
In a possible design, monitoring the process of sending packets to the EP may consist of timeout detection and/or count detection on the packet retransmission process. When timeout detection and count detection are performed at the same time, and the PCIe port holds a TLP but has not output a TLP release signal, the hardware link-disconnect enable signal is output as soon as either the retransmission duration or the retransmission count reaches its preset threshold (the time threshold or the count threshold). The PCIe port is thereby disconnected from the abnormal EP, reducing the impact of the abnormal EP on overall system services.
In a possible design, the retransmission duration and/or retransmission count at which the retransmission process would cause the CPU to hang is identified, and the corresponding time threshold and/or count threshold is set accordingly, so that the PCIe port is disconnected from the abnormal EP in time and system reliability is improved.
In a possible design, monitoring the process of sending packets to the EP may consist of monitoring how long packets have been without credits. When the PCIe port holds a TLP and the no-credit duration of a transaction class reaches the time threshold, the hardware link-disconnect enable signal is output.
The PCIe device provided by the embodiments of the present invention addresses CPU hangs caused by ACK/NAK anomalies or flow control credit anomalies. Through active detection by the exception detection unit and handling by the hardware link-disconnect unit, it ensures that the CPU works normally and improves both the reliability of the PCIe system and its compatibility with drivers. Because only the abnormal EP is isolated, the scope of device isolation is reduced, which minimizes the impact of I/O device anomalies on overall system services.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the structure of a PCIe port provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of an application scenario of a PCIe device provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of another application scenario of a PCIe device provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a PCIe port based on Fig. 1;
Fig. 5 is a flowchart of a PCIe device detection method based on Fig. 1;
Fig. 6 is a schematic structural diagram of another PCIe port based on Fig. 1;
Fig. 7 is a schematic diagram of another PCIe device detection process based on Fig. 4;
Fig. 8 is a flowchart of another PCIe device detection method based on Fig. 1.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
To aid understanding, specific embodiments are explained further below with reference to the drawings. The embodiments do not limit the present invention.
In a PCIe device, EP devices are attached below PCIe ports. Because an EP device can be attached below both the root port (RP) in the CPU and a downstream port (DP) in the PCIe switch, the embodiments of the present invention improve the PCIe port in the PCIe device. Because the RP and the DP have the same internal structure, this PCIe port may be located in an RP or in a DP. Only one EP is attached below each RP or DP, and the RP and DP are implemented by hardware chips, such as an FPGA. The PCIe device may be a data transmission device such as a network interface card or an SSB, and the EP device is the I/O device in the PCIe system.
The PCIe port receives a packet issued by the CPU or by an upstream port (Upstream Port, UP) and sends the packet to the EP. While sending, the PCIe port checks whether the EP exhibits an ACK/NAK anomaly or a flow control credit anomaly. If so, it outputs the hardware link-disconnect enable signal, and the abnormal EP is then isolated according to that signal.
Fig. 1 is a schematic diagram of the structure of a PCIe port provided by an embodiment of the present invention. As shown in Fig. 1, PCIe port 100 receives a transaction layer packet (Transaction Layer Packet, TLP) issued by a processor, device, or system with processing capability. Here the TLP may be issued by the CPU or by the upstream port (UP) in the PCIe switch. PCIe port 100 includes PCIe core unit 130, exception detection unit 120, and hardware link-disconnect unit 140.
PCIe core unit 130 receives the TLP issued by the CPU or the UP and, after processing, sends it to EP 150. PCIe core unit 130 is implemented by a hardware chip. According to the PCIe protocol, the PCIe data bus in PCIe core unit 130 is divided into three layers: transaction layer 131, data link layer 132, and physical layer 133.
Transaction layer 131 receives the TLP issued by the CPU, processes it, passes it down to data link layer 132, and determines whether EP 150 is able to receive the TLP. Transaction layer 131 includes various registers and logic devices, such as a credit limit register, a credits consumed register, comparators, and adders. Note that transaction layer 131 can access configuration space 20 of the PCIe device, and configuration space 20 contains the link status register. The locations and names of the registers, logic devices, and configuration space in transaction layer 131 may change as communication equipment evolves; for example, their locations may be merged.
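How a credit limit register, a credits consumed register, an adder, and a comparator can jointly decide whether the EP is able to receive a TLP may be sketched as follows. The modular comparison is a simplified form of the transmit gating test in the PCIe flow control rules; the function name `may_transmit` and the 8-bit default field width are assumptions for this illustration, not details taken from the source.

```python
def may_transmit(credit_limit: int, credits_consumed: int,
                 tlp_credits: int, field_bits: int = 8) -> bool:
    """Simplified transmit gating check using modular counters.

    credit_limit     -- latest limit advertised by the receiver (the EP)
    credits_consumed -- credits already used by earlier TLPs
    tlp_credits      -- credits the pending TLP would use
    field_bits       -- counter width (assumed value, for illustration)
    """
    modulus = 1 << field_bits
    # Adder: credits consumed if the pending TLP were sent now.
    required = (credits_consumed + tlp_credits) % modulus
    # Comparator: sending is allowed while the limit has not been passed.
    return (credit_limit - required) % modulus <= modulus // 2


print(may_transmit(credit_limit=10, credits_consumed=8, tlp_credits=2))
print(may_transmit(credit_limit=10, credits_consumed=8, tlp_credits=3))
```

If the EP stops sending flow control updates, `credit_limit` never advances and the check keeps failing for every pending TLP, which is the no-credit condition the detection scheme looks for.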
Data link layer 132 receives the TLP passed down by transaction layer 131 and performs error detection and recovery on it, ensuring that packet transmission is reliable and correct.
Data link layer 132 also stores TLPs and outputs the corresponding trigger signals to exception detection unit 120, such as a TLP receive signal, a TLP release signal, and a no-storage signal. Data link layer 132 performs error detection and recovery by retransmitting the TLPs that were received in error.
Physical layer 133 exchanges packets with EP 150 and provides a reliable environment for TLP transmission.
Exception detection unit 120 monitors the packet transmission process of data link layer 132 in PCIe core unit 130 and identifies whether that process is abnormal. If it is, the unit outputs the hardware link-disconnect enable signal; otherwise it outputs nothing. For example, exception detection unit 120 can detect ACK/NAK anomalies, in which PCIe core unit 130 cannot receive the ACK sent by the EP or keeps receiving NAKs from the EP. It can also detect flow control credit anomalies, in which transaction layer 131 in PCIe core unit 130 has no credits or the flow control update packets sent by the EP are logically abnormal.
According to the hardware link-disconnect enable signal output by exception detection unit 120, hardware link-disconnect unit 140 controls PCIe core unit 130 to clear the cached information related to abnormal EP 150, which includes clearing cached data, resetting state information, and clearing the configuration information of EP 150, thereby isolating abnormal EP 150. Note that this clearing process can be implemented by the hardware chip hosting PCIe core unit 130, such as an FPGA chip.
PCIe core unit 130 in PCIe port 100 receives the TLP issued by the CPU and, after processing, sends it to EP 150. While the packet is being transmitted to EP 150, exception detection unit 120 monitors the transmission. When it detects that EP 150 is abnormal, it triggers hardware link-disconnect unit 140 to disconnect the link and isolate abnormal EP 150, preventing the CPU's command queue from becoming congested and the CPU from hanging.
Note that, as shown in Fig. 2, when the PCIe port is located in the RP of the CPU, the RP of the CPU in the PCIe device can be connected directly to the EP, and packets issued by the CPU are sent to the I/O device EP through the RP. As shown in Fig. 3, when the PCIe port is located in a DP of the PCIe switch, the RP of the CPU is connected to the UP of the PCIe switch and the DP of the PCIe switch is connected to the EP; packets issued by the CPU are sent by the RP to the UP, then by the UP to the DP, and finally by the DP to the I/O device EP. Because the RP and the DP have the same internal structure, multiple PCIe switches can be used to expand the number of DP ports. The UP can be implemented by an FPGA chip.
The PCIe device provided by the embodiments of the present invention addresses CPU hangs caused by ACK/NAK anomalies or flow control credit anomalies. Through active detection by the exception detection unit and handling by the hardware link-disconnect unit, it ensures that the CPU works normally and improves both the reliability of the PCIe system and its compatibility with drivers. Because only the abnormal EP is isolated, the scope of device isolation is reduced, which minimizes the impact of I/O device anomalies on overall system services.
Fig. 4 is a schematic structural diagram of a PCIe port based on Fig. 1. As shown in Fig. 4, PCIe port 400 includes PCIe core unit 130, retransmission buffer detection unit 410, and hardware link-disconnect unit 140.
PCIe core unit 130 receives the TLP issued by the CPU and, after processing, sends it to EP 150. Transaction layer 131 can control the state of PCIe core unit 130 by accessing link status register 21 in configuration space 20 of the PCIe device. Data link layer 132 includes a storage device, such as buffer 1321, for storing received TLPs. Buffer 1321 outputs the corresponding trigger signals to retransmission buffer detection unit 410, such as a TLP receive signal, a TLP release signal, and a no-storage signal.
Physical layer 133 may include a state controller, such as LTSSM state machine 1311, which controls the state of PCIe core unit 130 according to the state of link status register 21.
Retransmission buffer detection unit 410 receives the trigger signals sent by data link layer 132 of PCIe core unit 130 and, based on them, monitors the packet retransmission process of buffer 1321 in data link layer 132 to identify whether that process is abnormal. If it is, the unit outputs the hardware link-disconnect enable signal; otherwise it outputs nothing. This allows data link layer 132 to exit an endless retransmission loop and prevents the CPU from hanging.
Further, retransmission buffer detection unit 410 may include both timeout detection unit 411 and count detection unit 412, or only one of the two. Timeout detection unit 411 may include a timer with a configurable threshold, and count detection unit 412 may include a counter with a configurable threshold.
According to the hardware link-disconnect enable signal output by retransmission buffer detection unit 410, hardware link-disconnect unit 140 controls PCIe core unit 130 to clear the cached information in PCIe core unit 130 that relates to abnormal EP 150, thereby isolating abnormal EP 150.
In one example, transaction layer 131 of PCIe core unit 130 in PCIe port 400 receives the TLP issued by the CPU, processes it, and passes it down to data link layer 132. Data link layer 132 stores the TLP in buffer 1321, performs error detection and recovery on it, and sends it to physical layer 133. PCIe port 400 then transmits the TLP to EP 150 through physical layer 133.
After PCIe port 400 has transmitted the TLP to EP 150, if EP 150 receives the TLP successfully, EP 150 sends an ACK to data link layer 132 in PCIe port 400, and data link layer 132 accepts the next TLP passed down by transaction layer 131 as normal. If EP 150 does not receive the TLP successfully, EP 150 sends a NAK to data link layer 132 in PCIe port 400, and buffer 1321 in data link layer 132 retransmits the TLP.
During downlink exception when between PCIe port 400 and EP150, as transmitting procedure instability causes that EP150 receives error code message, say, that now EP150 is not received by correct TLP message, EP150 is constantly in NAK and sends state;When up-link when between PCIe port 400 and EP150 occurs abnormal, owing to data link layer 132 cannot receive the ACK message returned by EP150, data link layer 132 will be considered to EP150 and is not received by correct TLP message;In both cases, according to the restriction of buffer 1321 cushion space in data link layer 132, when the memory space inadequate of buffer 1321, data link layer 132 will not receive the TLP message that transport layer 131 issues, simultaneously according to PCIe protocol, data link layer 132 constantly will retransmit this TLP message to EP150 by buffer 1321。
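The failure mode described above can be illustrated with a small model. The following is only a sketch under stated assumptions: the class `RetryBuffer`, its method names, and the capacity of two TLPs are hypothetical and do not come from the patent text. It shows how a link that never delivers an ACK leaves the buffer full, so that new TLPs from the transaction layer are refused while retransmissions continue.

```python
from collections import deque


class RetryBuffer:
    """Toy model of buffer 1321: holds TLPs until the EP acknowledges them."""

    def __init__(self, capacity):
        self.capacity = capacity      # hypothetical size, counted in TLPs
        self.pending = deque()        # TLPs sent but not yet acknowledged
        self.retransmissions = 0

    def accept(self, tlp):
        """Take a TLP from the transaction layer; refuse it when the buffer is full."""
        if len(self.pending) >= self.capacity:
            return False
        self.pending.append(tlp)
        return True

    def on_ack(self):
        """ACK received: release the oldest TLP and free its slot."""
        if self.pending:
            self.pending.popleft()

    def on_nak_or_missing_ack(self):
        """NAK received, or the ACK never arrived: retransmit the oldest TLP."""
        if self.pending:
            self.retransmissions += 1


buf = RetryBuffer(capacity=2)
accepted = [buf.accept(t) for t in ("tlp0", "tlp1", "tlp2")]
for _ in range(5):                    # the link stays faulty, so no TLP is released
    buf.on_nak_or_missing_ack()
print(accepted, buf.retransmissions)  # [True, True, False] 5
```

Nothing in this model ever ends the retransmission loop on its own, which is the gap the detection units described next are meant to close.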
The timeout detection unit 411 and/or the count detection unit 412 in the retransmission buffer detection unit 410 monitor, respectively, the duration and/or the number of abnormal retransmissions of buffer 1321 in the data link layer 132.
Note that before the PCIe device starts, a time threshold must be set for the timer in the timeout detection unit 411, and/or a count threshold must be set for the counter in the count detection unit 412. The TLP receive signal output by buffer 1321 in the data link layer 132 serves as the timing signal of the timer and/or the counting signal of the counter, and the TLP release signal output by buffer 1321 serves as the reset signal of the timer and/or the counter.
When no TLP is stored in buffer 1321, the retransmission buffer detection unit 410 receives the trigger signal from buffer 1321, and the time accumulated by the timer in the timeout detection unit 411 and/or the count accumulated by the counter in the count detection unit 412 is cleared and held;
When buffer 1321 holds a TLP that has not been released, that is, when buffer 1321 has not output the TLP release signal, the retransmission buffer detection unit 410 receives the trigger signal from buffer 1321, the timer in the timeout detection unit 411 starts timing, and/or the counter in the count detection unit 412 starts counting;
When buffer 1321 outputs the TLP release signal, the retransmission buffer detection unit 410 receives the trigger signal from buffer 1321, the time accumulated by the timer in the timeout detection unit 411 is cleared and timing restarts, and the count accumulated by the counter in the count detection unit 412 is cleared and counting restarts;
When the timer of the timeout detection unit 411 reaches the time threshold, or the counter of the count detection unit 412 reaches the count threshold, the retransmission buffer detection unit 410 outputs the hardware link-down enable signal, and the timer of the timeout detection unit 411 and/or the counter of the count detection unit 412 hold their values until reset. In other words, the timeout detection unit 411 and the count detection unit 412 may monitor the retransmission process of buffer 1321 together, in which case the retransmission buffer detection unit 410 outputs the hardware link-down enable signal as soon as either of them reaches its threshold. Alternatively, the timeout detection unit 411 or the count detection unit 412 may monitor the retransmission process of buffer 1321 alone, in which case the retransmission buffer detection unit 410 outputs the hardware link-down enable signal when that single unit reaches its threshold.
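The four cases above amount to a small state machine. The sketch below is one possible reading, not the patented circuit: the class and method names are hypothetical, time is counted in abstract cycles, and the counter is assumed to advance once per retransmission.

```python
class RetransmissionDetector:
    """Toy model of detection unit 410 with a timer (411) and a counter (412)."""

    def __init__(self, time_threshold, count_threshold):
        self.time_threshold = time_threshold    # set before the device starts
        self.count_threshold = count_threshold  # set before the device starts
        self.elapsed = 0                        # cycles a TLP has stayed unreleased
        self.retries = 0                        # retransmissions of that TLP
        self.link_down_enable = False

    def reset(self):
        """External reset: clear the latched signal and both accumulators."""
        self.elapsed = 0
        self.retries = 0
        self.link_down_enable = False

    def on_empty(self):
        """No TLP stored in the buffer: clear and hold."""
        if not self.link_down_enable:
            self.elapsed = 0
            self.retries = 0

    def on_release(self):
        """TLP release signal: clear, then timing and counting start over."""
        if not self.link_down_enable:
            self.elapsed = 0
            self.retries = 0

    def on_tick(self, cycles=1):
        """Time passes while a TLP is held without being released."""
        if not self.link_down_enable:
            self.elapsed += cycles
            self._check()

    def on_retransmit(self):
        """The buffer retransmitted the unreleased TLP once more."""
        if not self.link_down_enable:
            self.retries += 1
            self._check()

    def _check(self):
        # Either threshold alone is enough to request a hardware link-down.
        if (self.elapsed >= self.time_threshold
                or self.retries >= self.count_threshold):
            self.link_down_enable = True


det = RetransmissionDetector(time_threshold=100, count_threshold=3)
det.on_retransmit()
det.on_retransmit()
det.on_release()            # the TLP was finally acknowledged, so the count clears
for _ in range(3):
    det.on_retransmit()     # a later TLP is retransmitted three times
print(det.link_down_enable)  # True
```

Once `link_down_enable` is latched, every event handler leaves the accumulators untouched, mirroring the statement that the timer and counter hold their values until reset.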
After the hardware link-down unit 140 receives the hardware link-down enable signal output by the retransmission buffer detection unit 410, it forces the Link Disable (LinkDisable) status bit of the Link Status register 21 accessed by the transaction layer 131 to be set; for example, if the LinkDisable status bit is active high, it is set to 1. Because the Link Status register 21 is then in the Disable state, the LTSSM state machine in the physical layer 133 is driven to the Disable state, so that the cached information in the PCIe core unit 130 that relates to the abnormal EP 150 is cleared and the abnormal EP 150 is isolated.
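The chain of effects in this paragraph (enable signal, LinkDisable bit, LTSSM state, cache flush) can be sketched as follows. This is an illustration only: the bit position `LINK_DISABLE`, the class `PcieCore`, and the function `hardware_link_down` are hypothetical names chosen for the example, and the register is modelled as a plain integer.

```python
LINK_DISABLE = 1 << 4   # hypothetical bit position of the LinkDisable status bit


class PcieCore:
    """Toy model of PCIe core unit 130."""

    def __init__(self, cached_tlps):
        self.link_status = 0                   # stands in for Link Status register 21
        self.ltssm_state = "L0"                # normal operating state of the LTSSM
        self.cached_tlps = list(cached_tlps)   # cached information for the EP

    def evaluate_link(self):
        """The LTSSM follows the LinkDisable bit; disabling the link flushes the cache."""
        if self.link_status & LINK_DISABLE:
            self.ltssm_state = "Disabled"      # the Disable state in the text
            self.cached_tlps.clear()


def hardware_link_down(core, enable_signal):
    """Toy hardware link-down unit 140: force the bit when the signal is asserted."""
    if enable_signal:
        core.link_status |= LINK_DISABLE
        core.evaluate_link()


core = PcieCore(cached_tlps=["tlp0", "tlp1"])
hardware_link_down(core, enable_signal=False)   # no anomaly: nothing changes
state_before = core.ltssm_state
hardware_link_down(core, enable_signal=True)    # anomaly: the EP is isolated
print(state_before, core.ltssm_state, core.cached_tlps)  # L0 Disabled []
```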
Note that those skilled in the art may set the time threshold of the timeout detection unit and the count threshold of the count detection unit according to actual conditions. Alternatively, a recognition unit may be added to the retransmission buffer detection unit to identify the retransmission duration and retransmission count that would cause the CPU to hang and to set the corresponding time threshold and count threshold, so that the abnormal EP is isolated.
For ACK/NAK anomalies, which the CPU cannot perceive and which can cause it to hang, the PCIe device provided by this embodiment of the present invention detects and handles the anomaly actively at the PCIe port and isolates only the abnormal EP. This narrows the scope of device isolation and improves the reliability of the PCIe system.
Corresponding to the PCIe device described above, an embodiment of the present invention also provides a detection method.
Fig. 5 is a flowchart of a PCIe device detection method provided for Fig. 1. The PCIe device includes a PCIe port. As shown in Fig. 5, the method is performed by the PCIe port and includes:
S510: Receive a TLP and issue the TLP to the EP.
Specifically, the PCIe core unit in the PCIe port receives a TLP issued by a processor, or by a device or system with processing capability; here this may be a TLP issued by the CPU. The PCIe core unit receives the TLP, processes it, and issues it to the EP.
S520: Determine whether the EP has correctly received the TLP issued by the PCIe core unit.
Specifically, if the EP correctly receives the TLP from the PCIe port, the data link layer of the PCIe core unit in the PCIe port receives the ACK sent by the EP. The data link layer then accepts TLPs from the transaction layer as normal and outputs a state trigger signal to the retransmission buffer detection unit;
If the EP correctly receives the TLP issued by the PCIe port but the data link layer of the PCIe core unit in the PCIe port does not receive the ACK sent by the EP, the data link layer retransmits the TLP to the EP as the PCIe protocol requires and outputs a state trigger signal to the retransmission buffer detection unit;
If the EP does not successfully receive the TLP issued by the PCIe port, the data link layer of the PCIe core unit in the PCIe port receives the NAK sent by the EP, retransmits the TLP to the EP as the PCIe protocol requires, and outputs a state trigger signal to the retransmission buffer detection unit.
While the data link layer keeps retransmitting the TLP to the EP, the space in the retransmission buffer is limited; once the retransmission buffer runs out of space, the data link layer stops accepting TLPs from the transaction layer. Meanwhile, as the PCIe protocol requires, the data link layer keeps retransmitting the TLP to the EP.
S530: Monitor the duration and/or the count of the retransmission process.
Specifically, when no TLP is stored in the data link layer, the retransmission buffer detection unit receives the state trigger from the data link layer, and the time accumulated by the timer in the timeout detection unit and/or the count accumulated by the counter in the count detection unit is cleared and held until the retransmission buffer detection unit receives another state trigger from the data link layer;
When the data link layer holds a TLP that has not been released, that is, when the retransmission buffer has not output the TLP release signal, then according to the state trigger from the data link layer the timer in the timeout detection unit starts timing and/or the counter in the count detection unit starts counting;
When the data link layer outputs the TLP release signal, the retransmission buffer detection unit receives the state trigger from the data link layer, and the time accumulated by the timer in the timeout detection unit and/or the count accumulated by the counter in the count detection unit is cleared, after which timing and counting start over;
When the timer of the timeout detection unit reaches the time threshold, or the counter of the count detection unit reaches the count threshold, the retransmission buffer detection unit outputs the hardware link-down enable signal, and the timer of the timeout detection unit and the counter of the count detection unit hold their values until reset.
S540: Isolate the EP according to the hardware link-down enable signal.
Specifically, according to the hardware link-down enable signal output by the retransmission buffer detection unit, the LinkDisable status bit of the Link Status register accessed by the transaction layer is forced to be set; for example, if the LinkDisable status bit is active high, it is set to 1. Because the Link Status register is then in the Disable state, the LTSSM state machine in the physical layer is driven to the Disable state, so that the cached information in the PCIe core unit that relates to the abnormal EP is cleared and the isolation of the abnormal EP is complete.
For ACK/NAK anomalies, which the CPU cannot perceive and which can cause it to hang, the embodiment above detects and handles the anomaly actively at the PCIe port and isolates only the abnormal EP. This narrows the scope of device isolation and improves the reliability of the system.
Fig. 6 is a schematic structural diagram of another PCIe port provided for Fig. 1. As shown in Fig. 6, PCIe port 600 includes the PCIe core unit 130, a no-credit detection unit 610, and the hardware link-down unit 140.
The PCIe core unit 130 receives the TLP issued by the CPU, processes it, and sends it to the EP 150. The transaction layer 131 in the PCIe core unit 130 includes a credit limit register 22, a consumed credit register 23, a comparator 24, and an adder 25, and accesses the Link Status register 21 in the configuration space 20 of the PCIe device to control the state of the PCIe core unit 130. The transaction layer 131 determines the ability of the EP 150 to receive TLPs: the EP 150 sends flow control update packets that tell the transaction layer 131 in PCIe port 600, in the form of credits, how many packets it can receive. The transaction layer 131 may also include a storage device, such as packet buffer 1311, for storing received TLPs.
The credit limit register 22 records the available credit of the EP 150, and the consumed credit register 23 records the credit that the transaction layer 131 has already consumed.
The no-credit detection unit 610 receives the trigger signals sent by the transaction layer 131 of the PCIe core unit 130 and, according to those signals, monitors how long the transaction layer 131 remains without credit. The no-credit detection unit 610 includes six timer detection units with configurable thresholds, namely timer detection units 611, 612, 613, 614, 615, and 616, which respectively monitor the no-credit duration of the six transaction types in the transaction layer 131. If one or more of the six timer detection units reach the time threshold, the no-credit detection unit 610 outputs the hardware link-down enable signal. This prevents a CPU hang caused by the transaction layer 131 having no credit, or by its not receiving a flow control update packet from the EP 150 for a long time.
According to the PCIe protocol, the flow control credits of six transaction types must be checked: posted request header (PH), posted request data (PD), non-posted request header (NPH), non-posted request data (NPD), completion header (CPLH), and completion data (CPLD). The credit of each of the six types is managed by its own independent credit limit register and corresponding consumed credit register. In the PCIe protocol, each PH, NPH, CPLH, or NPD consumes one credit, whereas each PD or CPLD consumes one or more credits; the number consumed by the latter two depends on the maximum payload size in the transaction layer.
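As a worked illustration of these rules, the sketch below computes the credit cost of a TLP for each of the six types. The function name `credits_needed` is hypothetical. The figure of 16 bytes (4 DW) per data credit is the unit used by the PCIe flow control mechanism and is stated here as background rather than taken from the patent text; NPD is treated as a single credit because the text above says so.

```python
import math

SINGLE_CREDIT_TYPES = {"PH", "NPH", "CPLH", "NPD"}   # one credit each, per the text
PAYLOAD_SIZED_TYPES = {"PD", "CPLD"}                 # one or more credits each
BYTES_PER_DATA_CREDIT = 16                           # 4 DW per data credit (background)


def credits_needed(credit_type, payload_bytes=0):
    """Return how many credits of the given type one TLP consumes."""
    if credit_type in SINGLE_CREDIT_TYPES:
        return 1
    if credit_type in PAYLOAD_SIZED_TYPES:
        # Round up to whole credits; even a tiny payload costs one credit.
        return max(1, math.ceil(payload_bytes / BYTES_PER_DATA_CREDIT))
    raise ValueError(f"unknown credit type: {credit_type}")


print(credits_needed("PH"))          # 1
print(credits_needed("PD", 256))     # 16
print(credits_needed("CPLD", 20))    # 2
```

A posted write with a 256-byte payload therefore needs one PH credit and sixteen PD credits before it may be sent, which is why the data credit types can run out well before the header types do.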
For these six transaction types, when any timer detection unit reaches its threshold, the no-credit detection unit 610 outputs the hardware link-down enable signal, which triggers the hardware link-down unit 140. In addition, the no-credit detection unit 610 applies only to finite credit advertisements; infinite credit advertisements need not be monitored.
Note that a credit represents a unit of cache space in the receive buffer, and the available credit reflects the ability of the EP 150 to receive TLPs. More available credit means a greater ability to receive TLPs; less available credit means a lesser ability. Each TLP sent by the transaction layer 131 consumes one or more available credits.
The transaction layer 131 can be without credit in two ways. First, the cache space of the EP 150 receive buffer is full, so the EP 150 sends a flow control update packet carrying a credit of 0 to the credit limit register 22 of the transaction layer 131; that is, the EP 150 itself has no credit, which leaves the transaction layer 131 without credit. Second, the EP 150 has available credit, but the logic in the transaction layer 131 that delivers flow control update packets to the credit limit register 22 malfunctions, so the available credit of the EP 150 cannot be updated into the credit limit register 22 and the transaction layer 131 is again left without credit.
According to the hardware link-down enable signal output by the no-credit detection unit 610, the hardware link-down unit 140 clears the cached information in the PCIe core unit 130 that relates to the abnormal EP 150, thereby isolating the abnormal EP 150.
In one example, before the PCIe device starts, a time threshold must be set for each of the six timer detection units in the no-credit detection unit 610. When the packet buffer of the transaction layer 131 holds a TLP waiting to be issued, the no-credit detection process of the transaction layer 131 is as shown in Fig. 7. The credit limit register 22 in the transaction layer 131 records the available credit of the EP 150, and the consumed credit register 23 records the credit that the transaction layer 131 has already consumed. Because sending a TLP consumes credit, the transaction layer 131 uses the adder 25 to sum the credit that the pending TLP requires and the credit that the transaction layer 131 has already consumed, giving the total consumed credit of the transaction layer 131. The comparator 24 then compares this total with the available credit of the EP 150.
When the packet buffer 1311 holds a TLP waiting to be issued and the available credit of the EP 150 is greater than or equal to the total consumed credit of the transaction layer 131, the packet buffer 1311 issues the TLP to the data link layer 132 as normal, the no-credit detection unit 610 receives the trigger signal sent by the packet buffer 1311, and all timer detection units clear and hold their accumulated time;
When the packet buffer 1311 holds a TLP waiting to be issued and the available credit of the EP 150 is less than the total consumed credit of the transaction layer 131, the packet buffer 1311 stops issuing the TLP to the data link layer 132, the no-credit detection unit 610 receives the trigger signal sent by the packet buffer 1311, and the corresponding timer detection unit of the no-credit detection unit 610 starts timing;
When no TLP is stored in the packet buffer 1311, the no-credit detection unit 610 receives the trigger signal sent by the packet buffer 1311, and all timer detection units in the no-credit detection unit 610 clear and hold their accumulated time;
When any timer detection unit in the no-credit detection unit 610 reaches the time threshold, the no-credit detection unit 610 outputs the hardware link-down enable signal, and all timer detection units in the no-credit detection unit 610 hold their values until reset.
For example, suppose the transaction layer 131 reads from the credit limit register 22 that the available credit of the EP 150 is 110, and reads from the consumed credit register 23 that it has already consumed 100 credits. If the pending TLP requires 9 credits, the sum of the consumed credit and the required credit is 109, which is less than the available credit of 110, so the transaction layer 131 issues the TLP to the data link layer 132, which in turn sends it to the EP 150. If the pending TLP instead requires 20 credits, the sum of the consumed credit and the required credit is 120, which is greater than the available credit of 110, so the transaction layer 131 stops issuing the TLP to the data link layer 132.
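The arithmetic in this example can be checked with a short sketch. The function name `can_send` and its parameter names are hypothetical; the logic is the adder-then-comparator check described above, written with plain integers and ignoring any counter wraparound that real hardware would have to handle.

```python
def can_send(credit_limit, credits_consumed, credits_required):
    """Model adder 25 followed by comparator 24.

    credit_limit     -- available credit of the EP (credit limit register 22)
    credits_consumed -- credit already consumed (consumed credit register 23)
    credits_required -- credit the pending TLP would consume
    """
    total_consumed = credits_consumed + credits_required   # adder 25
    return credit_limit >= total_consumed                  # comparator 24


print(can_send(110, 100, 9))    # True:  100 + 9  = 109 <= 110, the TLP is issued
print(can_send(110, 100, 20))   # False: 100 + 20 = 120 >  110, the TLP is held back
```

The boundary case `can_send(110, 100, 10)` also returns `True`, matching the "greater than or equal to" condition under which the packet buffer issues the TLP as normal.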
After the hardware link-down unit 140 receives the hardware link-down enable signal output by the no-credit detection unit 610, it forces the LinkDisable status bit of the Link Status register 21 accessed by the transaction layer 131 to be set; for example, if the LinkDisable status bit is active high, it is set to 1. Because the Link Status register 21 is then in the Disable state, the LTSSM state machine in the physical layer 133 of the PCIe core unit 130 is driven to the Disable state, so that the buffered information relating to the abnormal EP 150, from the transaction layer 131 through the data link layer 132 in the PCIe core unit 130, is cleared and the isolation of the abnormal EP 150 is complete.
For flow control anomalies, which the CPU cannot perceive and which can cause it to hang, the PCIe device provided by this embodiment of the present invention detects and handles the anomaly actively at the PCIe port and isolates only the abnormal EP. This narrows the scope of device isolation and improves the reliability of the PCIe system.
Corresponding to the PCIe device described above, an embodiment of the present invention also provides a detection method. A credit represents a unit of cache space in the receive buffer, and the available credit reflects the ability to receive TLPs. Each TLP handled by the transaction layer consumes one or more available credits. More available credit means a greater ability to receive TLPs; less available credit means a lesser ability.
According to the PCIe protocol, the flow control credits of six transaction types must be checked: posted request header (PH), posted request data (PD), non-posted request header (NPH), non-posted request data (NPD), completion header (CPLH), and completion data (CPLD). The credit of each of the six types is managed by its own independent credit limit register and corresponding consumed credit register. In the PCIe protocol, each PH, NPH, CPLH, or NPD consumes one credit, whereas each PD or CPLD consumes one or more credits; the number of credits a PD or CPLD consumes depends on the maximum payload size in the transaction layer.
The transaction layer can be without credit in two ways. First, the cache space of the EP receive buffer is full, so the EP sends a flow control update packet carrying a credit of 0 to the credit limit register of the transaction layer; that is, the EP itself has no credit, which leaves the transaction layer without credit. Second, the EP has available credit, but the logic in the transaction layer that delivers flow control update packets to the credit limit register malfunctions, so the available credit of the EP cannot be updated into the credit limit register and the transaction layer is again left without credit.
Fig. 8 is a flowchart of another PCIe device detection method provided for Fig. 1. The PCIe device includes a PCIe port. As shown in Fig. 8, the method is performed by the PCIe port and includes:
S810: Obtain the available credit of the EP and the credit that the transaction layer has currently consumed.
Specifically, the available credit of the EP and the credit currently consumed by the transaction layer are obtained by accessing the credit limit register and the consumed credit register of the transaction layer.
S820: Sum the credit that the transaction layer has currently consumed and the credit that the pending TLP requires.
Specifically, because handling a TLP in the transaction layer of the PCIe port consumes credit, the transaction layer uses the adder to sum the credit that the pending TLP requires and the credit that the transaction layer has currently consumed, giving the total consumed credit of the transaction layer.
S830: Monitor the duration for which the transaction layer has no credit, according to the result of comparing the total consumed credit of the transaction layer with the available credit of the EP.
Specifically, when the transaction layer holds a TLP waiting to be issued and the available credit of the EP is greater than or equal to the total consumed credit of the transaction layer, the transaction layer issues the TLP to the data link layer as normal and, according to the trigger signal from the transaction layer, all timer detection units in the no-credit detection unit clear and hold their accumulated time;
When the transaction layer holds a TLP waiting to be issued and the available credit of the EP is less than the total consumed credit of the transaction layer, the transaction layer stops issuing the TLP to the data link layer and, according to the trigger signal from the transaction layer, the corresponding timer detection unit of the no-credit detection unit starts timing;
When the transaction layer holds no TLP waiting to be issued, then according to the trigger signal from the transaction layer, all timer detection units in the no-credit detection unit clear and hold their accumulated time;
When any timer detection unit in the no-credit detection unit reaches the preset threshold, the no-credit detection unit outputs the hardware link-down enable signal, and all timers in the no-credit detection unit hold their values until reset.
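Step S830 can be sketched as six independent timers, one per credit type. This is a minimal model under stated assumptions: the class `NoCreditDetector`, the `step` interface, and the cycle-based timing are hypothetical, and the set `infinite` reflects the earlier statement that infinite credit advertisements need not be monitored.

```python
CREDIT_TYPES = ("PH", "PD", "NPH", "NPD", "CPLH", "CPLD")


class NoCreditDetector:
    """Toy model of the no-credit detection unit with one timer per credit type."""

    def __init__(self, thresholds, infinite=()):
        self.thresholds = dict(thresholds)       # time threshold per credit type
        self.infinite = set(infinite)            # types advertised as infinite
        self.timers = {t: 0 for t in CREDIT_TYPES}
        self.link_down_enable = False

    def step(self, blocked_types, cycles=1):
        """Advance time by `cycles`.

        blocked_types -- credit types for which a pending TLP lacks credit.
        An empty set covers both "TLP issued as normal" and "no TLP pending".
        """
        if self.link_down_enable:
            return True                           # latched: timers hold until reset
        for t in CREDIT_TYPES:
            if t in blocked_types and t not in self.infinite:
                self.timers[t] += cycles          # corresponding timer keeps timing
            else:
                self.timers[t] = 0                # clear and hold
        if any(self.timers[t] >= self.thresholds[t] for t in CREDIT_TYPES):
            self.link_down_enable = True
        return self.link_down_enable


det = NoCreditDetector({t: 50 for t in CREDIT_TYPES}, infinite={"CPLH", "CPLD"})
det.step({"PD"}, cycles=49)      # stalled, but still below the threshold
det.step(set())                  # credit arrived: the PD timer clears
fired = det.step({"PD"}, cycles=50)
print(fired)                     # True
```

Keeping one timer per type means a stall on posted data, for instance, is detected even while completions continue to flow normally.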
S840: Isolate the EP according to the hardware link-down enable signal.
Specifically, according to the hardware link-down enable signal output by the no-credit detection unit, the LinkDisable status bit of the Link Status register accessed by the transaction layer is forced to be set; for example, if the LinkDisable status bit is active high, it is set to 1. Because the Link Status register is then in the Disable state, the LTSSM state machine in the physical layer is driven to the Disable state, so that the cached information in the PCIe core unit that relates to the abnormal EP is cleared and the isolation of the abnormal EP is complete.
Note that no-credit detection applies only to finite credit advertisements; infinite credit advertisements need not be monitored.
For flow control credit anomalies, which the CPU cannot perceive and which can cause it to hang, the embodiment above uses this detection method to isolate only the corresponding abnormal EP. This narrows the scope of device isolation and improves the reliability of the system.
For ACK/NAK or flow control credit anomalies, which the CPU cannot perceive and which can cause it to hang, the PCIe device provided by the embodiments of the present invention can detect and handle the anomaly actively. This keeps the CPU running normally and improves both the reliability of the PCIe system and its compatibility with drivers. Because only the abnormal EP is isolated, the scope of device isolation is narrowed, which minimizes the impact of an I/O device anomaly on the overall service of the system.
Those skilled in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to function. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing describes only specific embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (15)

1. A Peripheral Component Interconnect Express (PCIe) device, characterized in that the PCIe device includes a PCIe port, and the PCIe port includes:
a PCIe core unit, configured to receive a transaction layer packet (TLP) issued by a CPU or an upstream port (UP), and to issue the TLP to an endpoint (EP);
an abnormality detection unit, configured to monitor the process by which the PCIe port issues the TLP to the EP, and to output a hardware link-down enable signal upon identifying a retransmission anomaly or a flow control credit update anomaly in that process;
a hardware link-down unit, configured to disconnect the link between the PCIe port and the EP according to the hardware link-down enable signal.
2. The device according to claim 1, characterized in that the hardware link-down unit is specifically configured to:
according to the hardware link-down enable signal output by the abnormality detection unit, set the Link Disable status of the Link Status register accessed by the PCIe core unit, place the link of the PCIe core unit in the disabled state, and disconnect the link between the PCIe port and the EP.
3. The device according to claim 1, characterized in that the PCIe device includes a CPU, the PCIe port is located in a root port (RP) of the CPU, and the PCIe core unit in the PCIe port receives the TLP issued by the CPU and sends the TLP to the EP.
4. The device according to claim 1, characterized in that the PCIe device includes a CPU and a PCIe bridge, the PCIe port is located in a downstream port (DP) of the PCIe bridge, and the PCIe core unit in the PCIe port receives the TLP issued by the UP and sends the TLP to the EP.
5. The device according to claim 1, characterized in that the process of issuing the TLP is a retransmission process, and the abnormality detection unit includes:
a retransmission buffer detection unit, configured to monitor the retransmission process of the PCIe core unit and to output the hardware link-down enable signal upon identifying that the retransmission process is abnormal.
6. The device according to claim 5, characterized in that the retransmission buffer detection unit includes a timeout detection unit;
the timeout detection unit is configured to perform timeout detection on the retransmission process of the PCIe core unit;
when the PCIe core unit holds the TLP but has not output a TLP release signal, and the timeout detection unit detects that the retransmission time has reached a threshold, the retransmission buffer detection unit outputs the hardware link-down enable signal.
7. The device according to claim 5, characterized in that the retransmission buffer detection unit includes a count detection unit;
the count detection unit is configured to perform count detection on the retransmission process of the PCIe core unit;
when the PCIe core unit holds the TLP but has not output a TLP release signal, and the count detection unit detects that the number of retransmissions has reached a threshold, the retransmission buffer detection unit outputs the hardware link-down enable signal.
8. The device according to claim 1, characterized in that the process of issuing the TLP is a flow control credit update process, and the abnormality detection unit includes:
a no-credit detection unit, configured to monitor the duration for which the PCIe core unit has no credit and to output the hardware link-down enable signal.
9. The device according to claim 8, characterized in that the no-credit detection unit includes:
a plurality of timer detection units, configured to monitor the duration for which the PCIe core unit has no credit;
when the PCIe core unit holds the TLP and the no-credit detection unit detects that one or more of the timer detection units in the no-credit detection unit have reached a time threshold, the no-credit detection unit outputs the hardware link-down enable signal.
10. The device according to claim 9, characterized in that the plurality of timer detection units are specifically configured to:
respectively monitor the flow control credit update processes of multiple transaction types in the PCIe core unit; when the PCIe core unit holds the TLP and one or more of the plurality of timer detection units reach a time threshold, the no-credit detection unit outputs the hardware link-down enable signal.
11. A detection method for a Peripheral Component Interconnect Express (PCIe) device, characterized in that the PCIe device comprises a PCIe port and the method is performed by the PCIe port, the method comprising:
Receiving a transaction layer packet (TLP) delivered by a CPU or an upstream port (UP), and delivering the TLP packet to an endpoint (EP);
Detecting the process of delivering the TLP packet to the EP, and outputting a hardware link-disconnect enable signal when a packet retransmission abnormality or a flow-control credit value update abnormality is identified in the packet delivery process;
Disconnecting the link between the PCIe port and the EP according to the hardware link-disconnect enable signal.
12. The method according to claim 11, characterized in that the step of disconnecting the EP according to the hardware link-disconnect enable signal comprises:
According to the hardware link-disconnect enable signal, setting the disable-link-state bit of the accessed link status register, placing the link used for the packet delivery process into the disabled state, and disconnecting the link between the PCIe port and the EP.
13. The method according to claim 11, characterized in that the step of detecting the process of delivering packets to the EP comprises:
Performing timeout detection on the packet retransmission process;
When the TLP packet is present but no TLP packet release signal is output, and it is detected that the duration for which the TLP packet release signal has not been output reaches a threshold, outputting the hardware link-disconnect enable signal.
14. The method according to claim 11, characterized in that the step of detecting the process of delivering packets to the EP further comprises:
Performing count detection on the packet retransmission process;
When the TLP packet is present but no TLP packet release signal is output, and it is detected that the number of retransmissions reaches a threshold, outputting the hardware link-disconnect enable signal.
15. The method according to claim 11, characterized in that the step of detecting the process of delivering packets to the EP further comprises:
Detecting the duration of the no-credit-value condition;
When the TLP packet is present, and it is detected that the duration of the no-credit-value condition reaches a time threshold, outputting the hardware link-disconnect enable signal.
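The three detection conditions of claims 13 to 15 (retransmission timeout, retransmission count, and duration without flow-control credit) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware described in the application: the class name `PortMonitor`, the tick-based timing, the three transaction classes, and all threshold values are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field

# Assumed transaction classes, each with its own no-credit timer
# (one timer per class, as in claims 9 and 10).
CLASSES = ("posted", "non_posted", "completion")


@dataclass
class PortMonitor:
    """Hypothetical model of the abnormality detection in a PCIe port."""
    replay_timeout: int = 100   # ticks without a TLP release signal (assumed)
    replay_limit: int = 4       # tolerated retransmissions (assumed)
    credit_timeout: int = 200   # ticks without credit (assumed)
    stalled_ticks: int = 0
    replay_count: int = 0
    no_credit_ticks: dict = field(
        default_factory=lambda: {c: 0 for c in CLASSES})
    link_disabled: bool = False

    def tick(self, tlp_pending, tlp_released, replayed, credits):
        """Advance one tick; return True once link disconnect is enabled."""
        # Retransmission checks: a pending TLP whose release signal is
        # missing keeps the stall timer and the replay counter running.
        if tlp_released or not tlp_pending:
            self.stalled_ticks = 0
            self.replay_count = 0
        else:
            self.stalled_ticks += 1
            if replayed:
                self.replay_count += 1

        # Credit checks: one timer per transaction class, running only
        # while a TLP is pending and that class has no credit.
        for c in CLASSES:
            if tlp_pending and credits.get(c, 0) == 0:
                self.no_credit_ticks[c] += 1
            else:
                self.no_credit_ticks[c] = 0

        if (self.stalled_ticks >= self.replay_timeout
                or self.replay_count >= self.replay_limit
                or any(t >= self.credit_timeout
                       for t in self.no_credit_ticks.values())):
            # Models the enable signal; the state is latched once set.
            self.link_disabled = True
        return self.link_disabled
```

In this model any single condition is enough to latch `link_disabled`, which mirrors the claims in that each check independently outputs the enable signal.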
CN201610015204.1A 2016-01-08 2016-01-08 PCIe (Peripheral Component Interconnect Express) equipment and detection method thereof Pending CN105700967A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610015204.1A CN105700967A (en) 2016-01-08 2016-01-08 PCIe (Peripheral Component Interconnect Express) equipment and detection method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610015204.1A CN105700967A (en) 2016-01-08 2016-01-08 PCIe (Peripheral Component Interconnect Express) equipment and detection method thereof

Publications (1)

Publication Number Publication Date
CN105700967A true CN105700967A (en) 2016-06-22

Family

ID=56227130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610015204.1A Pending CN105700967A (en) 2016-01-08 2016-01-08 PCIe (Peripheral Component Interconnect Express) equipment and detection method thereof

Country Status (1)

Country Link
CN (1) CN105700967A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326151A (en) * 2016-08-19 2017-01-11 浪潮(北京)电子信息产业有限公司 Method and device for unplugging PCIe equipment
CN110968443A (en) * 2018-09-28 2020-04-07 阿里巴巴集团控股有限公司 Equipment abnormity detection method and device
US10908895B2 (en) * 2018-12-21 2021-02-02 Pensando Systems Inc. State-preserving upgrade of an intelligent server adapter
CN112346917A (en) * 2019-08-09 2021-02-09 烽火通信科技股份有限公司 PCI-E endpoint diagnosis system and method
CN113542052A (en) * 2021-06-07 2021-10-22 新华三信息技术有限公司 Node fault determination method and device and server
US11182150B2 (en) 2020-01-14 2021-11-23 Pensando Systems Inc. Zero packet loss upgrade of an IO device
US11281453B1 (en) 2021-01-06 2022-03-22 Pensando Systems, Inc. Methods and systems for a hitless rollback mechanism during software upgrade of a network appliance
CN115150254A (en) * 2022-06-29 2022-10-04 苏州浪潮智能科技有限公司 PCIe link fault detection method, detection device, equipment and medium
CN115514682A (en) * 2022-09-23 2022-12-23 浪潮商用机器有限公司 Data transmission method, device, equipment and storage medium
WO2023173718A1 (en) * 2022-03-17 2023-09-21 苏州浪潮智能科技有限公司 Communication link update method and apparatus, and related device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103618618A (en) * 2013-11-13 2014-03-05 福建星网锐捷网络有限公司 Line card fault recovery method and related device based on distributed PCIE system
CN104170322A (en) * 2014-04-02 2014-11-26 华为技术有限公司 Method, device and system for processing PCIe link failure
US20150006780A1 (en) * 2013-06-28 2015-01-01 Futurewei Technologies, Inc. System and Method for Extended Peripheral Component Interconnect Express Fabrics
CN105205021A (en) * 2015-09-11 2015-12-30 华为技术有限公司 Method and device for disconnecting link between PCIe (peripheral component interface express) equipment and host computer

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150006780A1 (en) * 2013-06-28 2015-01-01 Futurewei Technologies, Inc. System and Method for Extended Peripheral Component Interconnect Express Fabrics
CN103618618A (en) * 2013-11-13 2014-03-05 福建星网锐捷网络有限公司 Line card fault recovery method and related device based on distributed PCIE system
CN104170322A (en) * 2014-04-02 2014-11-26 华为技术有限公司 Method, device and system for processing PCIe link failure
CN105205021A (en) * 2015-09-11 2015-12-30 华为技术有限公司 Method and device for disconnecting link between PCIe (peripheral component interface express) equipment and host computer

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326151A (en) * 2016-08-19 2017-01-11 浪潮(北京)电子信息产业有限公司 Method and device for unplugging PCIe equipment
CN110968443B (en) * 2018-09-28 2023-04-11 阿里巴巴集团控股有限公司 Equipment abnormity detection method and device
CN110968443A (en) * 2018-09-28 2020-04-07 阿里巴巴集团控股有限公司 Equipment abnormity detection method and device
US10908895B2 (en) * 2018-12-21 2021-02-02 Pensando Systems Inc. State-preserving upgrade of an intelligent server adapter
CN112346917A (en) * 2019-08-09 2021-02-09 烽火通信科技股份有限公司 PCI-E endpoint diagnosis system and method
US11182150B2 (en) 2020-01-14 2021-11-23 Pensando Systems Inc. Zero packet loss upgrade of an IO device
US11281453B1 (en) 2021-01-06 2022-03-22 Pensando Systems, Inc. Methods and systems for a hitless rollback mechanism during software upgrade of a network appliance
CN113542052A (en) * 2021-06-07 2021-10-22 新华三信息技术有限公司 Node fault determination method and device and server
WO2023173718A1 (en) * 2022-03-17 2023-09-21 苏州浪潮智能科技有限公司 Communication link update method and apparatus, and related device
CN115150254A (en) * 2022-06-29 2022-10-04 苏州浪潮智能科技有限公司 PCIe link fault detection method, detection device, equipment and medium
CN115150254B (en) * 2022-06-29 2023-05-23 苏州浪潮智能科技有限公司 PCIe link fault detection method, detection device, equipment and medium
CN115514682A (en) * 2022-09-23 2022-12-23 浪潮商用机器有限公司 Data transmission method, device, equipment and storage medium
CN115514682B (en) * 2022-09-23 2024-03-22 浪潮商用机器有限公司 Data transmission method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN105700967A (en) PCIe (Peripheral Component Interconnect Express) equipment and detection method thereof
US7010639B2 (en) Inter integrated circuit bus router for preventing communication to an unauthorized port
JP4077812B2 (en) Integrated circuit routers that support individual transmission rates
US7082488B2 (en) System and method for presence detect and reset of a device coupled to an inter-integrated circuit router
CN105205021B (en) Disconnect the method and apparatus linked between PCIe device and main frame
CN109768907A (en) A kind of CAN bus baud rate self-adapting setting method
US20040252642A1 (en) Method of overflow recovery of I2C packets on an I2C router
WO2007040863A2 (en) A simplified universal serial bus (usb) hub architecture
DE102013004542A1 (en) METHOD AND SYSTEM FOR TIMEOUT MONITORING
WO2015169057A1 (en) Packet transmission method and device, and interconnection interface
US7398345B2 (en) Inter-integrated circuit bus router for providing increased security
JP3920280B2 (en) Data transmission method through I2C router
CN100501685C (en) Apparatus and method for maintaining data integrity following parity error detection
CN106502944A (en) The heartbeat detecting method of computer, PCIE device and PCIE device
US20040255193A1 (en) Inter integrated circuit router error management system and method
CN104834624B (en) A kind of anti-interference method of iic bus interface
US20040255195A1 (en) System and method for analysis of inter-integrated circuit router
CN113297022B (en) Method and device for testing expansion bus of high-speed serial computer
US11916802B2 (en) Data transmission flow control regime
WO2001018659A1 (en) Remote event handling in a packet network
JP4941212B2 (en) Electronic device, data processing apparatus, and bus control method
US7596724B2 (en) Quiescence for retry messages on bidirectional communications interface
CN113765824A (en) Response message sending method and device based on MBIM (multimedia broadcast multicast service) interface, MBB (multimedia broadcast multicast service) equipment and medium
Barranco et al. Developing TOBE-CAN: Total order broadcast enforcement in CAN

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160622