CN114510366A - PCIe link training method, device and medium - Google Patents

Info

Publication number
CN114510366A
Authority
CN
China
Prior art keywords
hardware
state
ltssm
state machine
pcie
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210169840.5A
Other languages
Chinese (zh)
Inventor
王廷平
肖佐楠
高事成
郑茳
匡启和
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCore Technology Suzhou Co Ltd
Original Assignee
CCore Technology Suzhou Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CCore Technology Suzhou Co Ltd filed Critical CCore Technology Suzhou Co Ltd
Priority to CN202210169840.5A priority Critical patent/CN114510366A/en
Publication of CN114510366A publication Critical patent/CN114510366A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0745 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in an input/output transactions management context
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4204 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F13/4221 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026 PCI express

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a PCIe link training method, device and medium. The method comprises: during PCIe link training of a PCIe hardware system, tracking and synchronizing with the hardware LTSSM state machine by means of a software LTSSM state machine; when an unexpected state transition occurs in the hardware LTSSM state machine, executing corresponding error correction processing using first preset software processing logic; and when an unrecoverable error occurs in the hardware LTSSM state machine, resetting the PCIe hardware system using second preset software processing logic so as to restart PCIe link training. The method improves the robustness of PCIe link training and reduces cost.

Description

PCIe link training method, device and medium
Technical Field
The present disclosure relates to the field of PCIe link training technologies, and in particular, to a PCIe link training method, apparatus, and medium.
Background
PCIe (Peripheral Component Interconnect Express) is a high-speed serial computer expansion bus standard characterized by point-to-point, high-bandwidth, multi-lane transmission, end-to-end reliable transmission, hot-plug support, power management, virtualization, and the like, and it is widely used in consumer electronics, servers, industrial equipment, and other fields. PCIe link training is part of the PCIe physical layer specification. It is the process of initializing the physical layer, port configuration information, transceiver modules and related link states of the devices at both ends of a PCIe link, and finally establishing data communication. State transitions during link training are governed by the LTSSM (Link Training and Status State Machine), which includes the Detect, Polling, Configuration, Recovery, L0, L0s, L1, L2, Disabled, Loopback, and Hot Reset states, most of which have several sub-states.
In the prior art, PCIe link training is performed by a hardware system consisting of a controller, a PCS (Physical Coding Sublayer) and a PMA (Physical Media Attachment); software only initializes some hardware configuration before link training starts and is barely involved in the training process itself. However, PCIe implementations differ across architecture platforms, and hardware emulation cannot fully reproduce the real environment in which a PCIe device must adapt to various host servers, which poses a great challenge to the link training compatibility of PCIe devices. In the prior art, once link training of a device fails, the fault cannot be located quickly. A great deal of time is spent capturing signals and data packets with a high-frequency oscilloscope and a protocol analyzer to analyze the problem, after which the initialization configuration is modified in software for debugging. This approach cannot adjust dynamically and easily introduces new compatibility problems. Moreover, because any state or even sub-state of the LTSSM can fail and each failure calls for a different remedy, the problems can often be solved only by modifying the hardware, which adds cost.
Disclosure of Invention
In view of the above, an object of the present application is to provide a PCIe link training method, apparatus, and medium. The robustness of PCIe link training can be improved, and the cost is reduced. The specific scheme is as follows:
in a first aspect, the present application discloses a PCIe link training method, including:
in the process of PCIe link training of a PCIE hardware system, tracking and synchronizing a hardware LTSSM state machine by using a software LTSSM state machine;
when the hardware LTSSM state machine generates unexpected state transition, corresponding error correction processing is executed by utilizing a first preset software processing logic;
and when the hardware LTSSM state machine has an unrecoverable error, resetting the PCIE hardware system by using a second preset software processing logic so as to restart the PCIe link training.
Optionally, the tracking and synchronizing the hardware LTSSM state machine by using the software LTSSM state machine includes:
querying a state register of the hardware LTSSM in real time by using a software LTSSM state machine to obtain a register value;
determining the current state of the hardware LTSSM state machine according to the register value;
and synchronizing the state of the software LTSSM state machine to the current state of the hardware LTSSM state machine.
Optionally, the method further includes:
setting a timeout mechanism for each state of the hardware LTSSM state machine;
when it is tracked that any state of the hardware LTSSM state machine undergoes no state transition within the timeout corresponding to that state, determining that an unrecoverable error has occurred in the hardware LTSSM state machine.
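The per-state timeout check can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the class name `StateTimeoutWatchdog`, the state strings and the timeout values are hypothetical, and time is passed in explicitly so that the logic can be exercised without hardware.

```python
class StateTimeoutWatchdog:
    """Flags an unrecoverable error when a state is held longer than its timeout."""

    def __init__(self, timeouts_ms):
        # timeouts_ms maps a state name to its software timeout (assumed values).
        self._timeouts_ms = dict(timeouts_ms)
        self._state = None
        self._entered_ms = 0.0

    def observe(self, state, now_ms):
        """Record the tracked hardware state; return True if it has timed out."""
        if state != self._state:
            # A state transition happened, so the timer restarts.
            self._state = state
            self._entered_ms = now_ms
            return False
        limit = self._timeouts_ms.get(state)
        # States without a configured timeout never trip the watchdog.
        return limit is not None and (now_ms - self._entered_ms) > limit
```

A caller would feed every polled hardware state into `observe` and, on a `True` result, hand control to the reset logic.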
Optionally, the method further includes:
when it is tracked that the hardware LTSSM state machine, after entering the Detect.Wait state of the detection stage, does not enter the Polling state within a preset time and instead returns to the Detect.Quiet state, determining that an unexpected state transition has occurred in the hardware LTSSM state machine;
correspondingly, the executing the corresponding error correction processing by using the first preset software processing logic comprises:
reconfiguring the number of lanes of the PCIe link using the first preset software processing logic.
Optionally, the method further includes:
when it is tracked that the hardware LTSSM state machine enters the Configuration.Lanenum.Accept state of the configuration stage and then returns to the Configuration.Lanenum.Wait state, determining that an unexpected state transition has occurred in the hardware LTSSM state machine;
correspondingly, the executing the corresponding error correction processing by using the first preset software processing logic comprises:
renumbering the lanes using the first preset software processing logic.
Optionally, the method further includes:
when it is tracked that the hardware LTSSM state machine enters the Polling.Compliance state, determining that an unexpected state transition has occurred in the hardware LTSSM state machine;
correspondingly, the executing the corresponding error correction processing by using the first preset software processing logic comprises:
checking, using the first preset software processing logic, a feature bit of a PCIe standard configuration register or a feature bit of any TS packet received by a lane receiver; if the feature bit is 0, controlling the hardware LTSSM state machine to return to the Detect state and checking whether all lanes are in electrical idle; and if so, determining that the error reason is that the lanes are not aligned, and reconfiguring the number of lanes.
Optionally, the method further includes:
monitoring the PERST signal and the hot reset signal using an interrupt controller;
and when the PERST signal or the hot reset signal is detected to be asserted, initializing the configuration space registers and restarting PCIe link training.
Optionally, after the unexpected state transition of the hardware LTSSM state machine, the method further includes: recording an error log by using a first preset software log recording logic;
after the unrecoverable error occurs in the hardware LTSSM state machine, the method further includes: and recording an error log by using second preset software log recording logic.
In a second aspect, the present application discloses a PCIe link training apparatus, including:
the hardware state machine tracking and synchronizing module is used for tracking and synchronizing the hardware LTSSM by using the software LTSSM in the process of performing PCIe link training on the PCIE hardware system;
the error correction processing module is used for executing corresponding error correction processing by utilizing a first preset software processing logic when the hardware LTSSM state machine generates unexpected state transition;
and the hardware system resetting module is used for resetting the PCIE hardware system by utilizing a second preset software processing logic when the hardware LTSSM state machine has an unrecoverable error so as to restart the PCIe link training.
In a third aspect, the present application discloses a computer readable storage medium for storing a computer program, wherein the computer program when executed by a processor implements the PCIe link training method described above.
It can be seen that, in the present application, during PCIe link training of a PCIe hardware system, the hardware LTSSM state machine is tracked and synchronized by a software LTSSM state machine; when an unexpected state transition occurs in the hardware LTSSM state machine, corresponding error correction processing is executed using the first preset software processing logic; and when an unrecoverable error occurs in the hardware LTSSM state machine, the PCIe hardware system is reset using the second preset software processing logic so as to restart PCIe link training. In other words, software processing logic corrects unexpected state transitions of the hardware LTSSM state machine promptly and resets the PCIe hardware system promptly when an unrecoverable error occurs, which improves the robustness of PCIe link training and reduces cost.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flow chart of a PCIe link training method provided in the present application;
FIG. 2 is a schematic diagram of the state transitions of the LTSSM state machine provided in the present application;
FIG. 3 is a schematic diagram of the state transitions of the Detect state machine provided in the present application;
FIG. 4 is a schematic diagram of the state transitions of the Polling state machine provided in the present application;
FIG. 5 is a schematic diagram of the state transitions of the Configuration state machine provided in the present application;
FIG. 6 is a schematic diagram of the state transitions of the Recovery state machine provided in the present application;
FIG. 7 is a schematic structural diagram of a PCIe link training apparatus provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, an embodiment of the present application discloses a PCIe link training method, including:
step S11: in the process of PCIe link training of a PCIE hardware system, a software LTSSM state machine is utilized to track and synchronize the hardware LTSSM state machine.
In a specific implementation, the software LTSSM state machine may query the status register of the hardware LTSSM state machine in real time to obtain a register value, determine the current state of the hardware LTSSM state machine from the register value, and synchronize its own state to the current state of the hardware LTSSM state machine.
That is, the value stored in the status register represents the current state of the hardware LTSSM state machine, so querying this register in real time allows the software LTSSM state machine to track and stay synchronized with the hardware state machine.
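The query, decode and synchronize steps above can be sketched as follows. This is only an illustration under stated assumptions: the register encoding in `LtssmState`, the field mask `STATE_FIELD_MASK` and the `read_status_register` callable are hypothetical, since the application does not specify a register layout.

```python
from enum import Enum


class LtssmState(Enum):
    # Hypothetical encoding: real controllers define their own field values.
    DETECT = 0x0
    POLLING = 0x1
    CONFIGURATION = 0x2
    RECOVERY = 0x3
    L0 = 0x4


STATE_FIELD_MASK = 0x3F  # assumed width of the state field in the register


class SoftwareLtssm:
    """Software mirror of the hardware LTSSM state."""

    def __init__(self, read_status_register):
        # read_status_register is a callable returning the raw register value.
        self._read = read_status_register
        self.state = LtssmState.DETECT
        self.history = [LtssmState.DETECT]

    def poll_once(self):
        """Read the register, decode it and synchronize. Returns True on a transition."""
        hw_state = LtssmState(self._read() & STATE_FIELD_MASK)
        if hw_state is self.state:
            return False
        self.state = hw_state
        self.history.append(hw_state)
        return True
```

In firmware, `poll_once` would typically run in a tight loop or a timer callback; the recorded `history` is what later checks for unexpected transitions would inspect.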
For example, referring to fig. 2, fig. 2 is a schematic diagram illustrating state transition of an LTSSM state machine according to an embodiment of the present disclosure. The state machine includes Detect, Polling, Configuration, Recovery, L0, L0s, L1, L2, Disabled, Loopback, Hot reset states, and most of them have a plurality of sub-states.
Step S12: and when the hardware LTSSM state machine generates unexpected state transition, executing corresponding error correction processing by utilizing first preset software processing logic.
After the unexpected state transition of the hardware LTSSM state machine occurs, the method further includes: recording an error log using first preset software log recording logic.
In a specific implementation, when it is tracked that the hardware LTSSM state machine, after entering the Detect.Wait state of the detection stage, does not enter the Polling state within a preset time and instead returns to the Detect.Quiet state, it is determined that an unexpected state transition has occurred in the hardware LTSSM state machine. Accordingly, the embodiment of the present application may reconfigure the number of lanes of the PCIe link using the first preset software processing logic. In addition, if none of the detected lanes is lane 0 (that is, the lane that is logically ordered first), the lanes are cross-configured so that numbering starts from lane 0, and an error log is recorded with the error reason that the lanes are not aligned.
Further, the first preset software processing logic in the embodiment of the present application determines the number of lanes of the PCIe link from a first detection result and a second detection result, and reconfigures the number of lanes accordingly. The first detection result is obtained in the Detect.Active state, when all transmitters probe for receivers and only some of the lanes are detected successfully; for example, of 8 lanes only 3 are detected. The second detection result is obtained after that partial success, when the state machine enters the Detect.Wait state and all transmitters probe for receivers again. Specifically, the lanes detected in both the first and the second detection results are taken as the lanes of the PCIe link, and their count as the number of lanes.
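Taking the lanes that appear in both detection results amounts to intersecting two lane sets. The sketch below represents each detection result as a bitmask with bit n standing for lane n; the bitmask representation and the function name `negotiated_lanes` are assumptions made for illustration.

```python
def negotiated_lanes(first_mask, second_mask, total_lanes):
    """Return the lanes detected in both the first and the second detection result."""
    common = first_mask & second_mask  # a lane counts only if both probes saw it
    return [lane for lane in range(total_lanes) if common & (1 << lane)]


# Example: 8 lanes, lanes 0-2 seen in Detect.Active, lanes 0-1 seen in Detect.Wait.
lanes = negotiated_lanes(0b00000111, 0b00000011, total_lanes=8)
lane_count = len(lanes)  # value the software would use when reconfiguring the lane count
```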
For example, referring to FIG. 3, FIG. 3 is a schematic diagram of the state transitions of the Detect state machine according to an embodiment of the present application. The Detect state machine of the detection stage negotiates the number of lanes available at both ends of the PCIe link. The detailed steps are as follows:
Detect.Quiet state: the RC circuits of all transmitters on the link detect whether a receiver load is present. If any transmitter detects a receiver, or after 12 ms (milliseconds), the state machine enters the Detect.Active state. Note that a lane (channel) contains one transmitter and one receiver, and an RC circuit is a resistor-capacitor circuit.
Detect.Active state: all transmitters continue to probe for receivers. If all lanes are detected successfully, the Polling state machine is entered. If no lane is detected, the state machine returns to the Detect.Quiet state and an error log is recorded with the error reason that the PCIe link of the peer is not enabled. If only some lanes are detected, the Detect.Wait state is entered. It should be understood that returning to the Detect.Quiet state is also an unexpected state transition, and the corresponding error correction processing in this embodiment may be resetting the PCIe hardware system.
Detect.Wait state: after waiting 12 ms, all transmitters probe for receivers again. If the detection result is consistent with that of the previous step, the Polling state machine is entered; otherwise the state machine returns to the Detect.Quiet state. If none of the detected lanes is lane 0 (the lane that is logically ordered first), the lanes are cross-configured so that numbering starts from lane 0, and an error log is recorded with the error reason that the lanes are not aligned. In addition to the above flow, each sub-state of Detect has a timeout mechanism: if no state transition occurs before the timeout expires, the PCIe hardware system is reset and an error log is recorded with the error reason that the controller is hung.
It can be understood that the Detect state machine handling provided in the embodiment of the present application can effectively resolve link training failures caused by faults of the PCIe hardware system in the Detect stage, such as misjudgment by the detection logic, or a PCIe hardware system that does not support lane-count adaptation or lane crossing.
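The Detect sub-state flow described above can be condensed into a small decision function. This is a sketch under assumptions: detection results are modeled as lane bitmasks, the names `Step` and `detect_step` are hypothetical, and the log text "inconsistent detection" is a placeholder for the case the description leaves unnamed (a mismatch in Detect.Wait while lane 0 is present).

```python
from typing import NamedTuple, Optional


class Step(NamedTuple):
    next_state: str
    unexpected: bool           # True when the transition counts as unexpected
    log_reason: Optional[str]  # error reason to record, if any


def detect_step(sub_state, detected_mask, all_lanes_mask, previous_mask=0):
    """Decide the next state from the current Detect sub-state and lane detection."""
    if sub_state == "Detect.Quiet":
        # Entered on any receiver detection or after 12 ms.
        return Step("Detect.Active", False, None)
    if sub_state == "Detect.Active":
        if detected_mask == all_lanes_mask:
            return Step("Polling", False, None)
        if detected_mask == 0:
            return Step("Detect.Quiet", True, "peer PCIe link not enabled")
        return Step("Detect.Wait", False, None)  # partial detection
    if sub_state == "Detect.Wait":
        if detected_mask == previous_mask:
            return Step("Polling", False, None)
        lane0_missing = not (detected_mask & 1)
        reason = "lanes not aligned" if lane0_missing else "inconsistent detection"
        return Step("Detect.Quiet", True, reason)
    raise ValueError(f"unknown Detect sub-state: {sub_state}")
```

Returning the `unexpected` flag alongside the next state lets the caller decide separately whether to reconfigure lanes or reset the hardware.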
Further, in the embodiment of the present application, when it is tracked that the hardware LTSSM state machine enters the Polling.Compliance state, it is determined that an unexpected state transition has occurred in the hardware LTSSM state machine. Correspondingly, executing the corresponding error correction processing using the first preset software processing logic comprises: checking, using the first preset software processing logic, a feature bit of a PCIe standard configuration register or a feature bit of any TS packet received by a lane receiver; if the feature bit is 0, controlling the hardware LTSSM state machine to return to the Detect state and checking whether all lanes are in electrical idle; and if so, determining that the error reason is that the lanes are not aligned, and reconfiguring the number of lanes.
For example, referring to FIG. 4, FIG. 4 is a schematic diagram of the state transitions of the Polling state machine according to an embodiment of the present application. The Polling state machine of the polling stage establishes lane polarity and recovers the clock frequency and data from the transmitting end so as to establish communication. The detailed steps are as follows:
Polling.Active state: the transmitter sends 1024 TS1 (Training Sequence 1) packets. If the receivers of all lanes receive 8 consecutive TS1 packets, the Polling.Configuration state is entered. If, after 24 ms, the receivers of all lanes have received TS1 packets but the condition of 8 consecutive TS1 packets on all lanes is not met, the Polling.Compliance state is entered. If the condition for entering the Polling.Compliance state is not met after 24 ms either, the Detect state machine is entered and an error log is recorded with the error reason of a transceiving error; an unexpected state transition has occurred, and the corresponding error correction processing may be resetting the PCIe hardware system.
Polling.Configuration state: the transmitter sends 16 TS2 packets. If the receivers of all lanes receive 8 consecutive TS2 packets, the Configuration state machine is entered; otherwise, after 48 ms, the state machine returns to the Detect state machine and an error log is recorded with the error reason of a transceiving error; an unexpected state transition has occurred, and the corresponding error correction processing may be resetting the PCIe hardware system.
Polling.Compliance state: this state tests the amplitude and timing of the physical-layer differential signals at both ends of the link. Normal link training should not enter this state; entry is generally caused by a transceiving error, or by a detection logic error that leaves the lanes misaligned. The register feature bit (bit 4 of the Link Control 2 register in the PCIe standard configuration registers) or the feature bits in the TS packet (bits 2 and 4 of Symbol 5) are checked. If the feature bit is 0, the state machine is returned directly to the Detect state machine, and it is checked whether all lanes are in Electrical Idle. If so, the recorded error reason is that the lanes are not aligned, and the number of lanes is reconfigured; otherwise the error reason is a transceiving error.
It should be noted that the TS is a data packet of the PCIe physical layer, and has two formats, i.e., TS1 and TS2, which are used for link training and defined in the PCIe protocol specification.
That is, the embodiment of the present application provides a solution for link training that mistakenly enters the Polling.Compliance state. At the start of the Polling stage, the receiver must recover and synchronize the clock and data from the bit stream sent by the peer transmitter. Because of signal attenuation, jitter and other factors, several attempts are sometimes needed before synchronization succeeds. It can happen that synchronization succeeds only as the 24 ms timeout is reached, before 8 consecutive TS1 packets have been received, so that the state machine enters the Polling.Compliance state by mistake. Polling.Compliance is a test state that normal link training does not involve; the protocol does not specify a timeout exit for it, and a specific operation is required to leave it. Therefore, when the LTSSM is in the Polling.Compliance state, the software checks whether the feature bit in the register and the TS1 packet is 1, and if not, returns directly to the Detect stage. It can be understood that this handling of the Polling state machine effectively resolves mistaken entry into the Polling.Compliance state caused by a PCIe hardware system that is not robust, even when the PMA signal meets the PCIe eye diagram standard. Because the PCIe standard specifies no timeout exit for this state, a link that enters it by mistake would most likely remain stuck there, causing link training to fail.
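The Polling.Compliance check can be sketched as below. The bit positions follow the description (bit 4 of Link Control 2; bits 2 and 4 of Symbol 5 of the TS packet). How the two TS bits combine is not spelled out in the text, so this sketch assumes that any set feature bit means compliance was genuinely requested; the names `ComplianceDecision` and `handle_polling_compliance` are hypothetical.

```python
from typing import NamedTuple, Optional

LC2_FEATURE_BIT = 1 << 4               # bit 4 of Link Control 2, per the description
TS_FEATURE_MASK = (1 << 2) | (1 << 4)  # bits 2 and 4 of TS Symbol 5, per the description


class ComplianceDecision(NamedTuple):
    next_state: str
    log_reason: Optional[str]
    reconfigure_lanes: bool


def handle_polling_compliance(link_control2, ts_symbol5, lanes_electrical_idle):
    """Decide what to do once the hardware is seen in Polling.Compliance."""
    requested = bool(link_control2 & LC2_FEATURE_BIT) or bool(ts_symbol5 & TS_FEATURE_MASK)
    if requested:
        # Assumption: a set feature bit means the entry was intended, so stay.
        return ComplianceDecision("Polling.Compliance", None, False)
    if all(lanes_electrical_idle):
        # Every lane idle: treated as lane misalignment, lane count is reconfigured.
        return ComplianceDecision("Detect", "lanes not aligned", True)
    return ComplianceDecision("Detect", "transceiving error", False)
```

`lanes_electrical_idle` is a per-lane list of booleans; in a real driver it would come from the PHY status of each lane.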
Further, in the embodiment of the present application, when it is tracked that the hardware LTSSM state machine enters the Configuration.Lanenum.Accept state of the configuration stage and then returns to the Configuration.Lanenum.Wait state, it is determined that an unexpected state transition has occurred in the hardware LTSSM state machine. Correspondingly, executing the corresponding error correction processing using the first preset software processing logic comprises: renumbering the lanes using the first preset software processing logic.
For example, referring to fig. 5, fig. 5 is a schematic diagram illustrating state transition of a Configuration state machine according to an embodiment of the present application. The Configuration state machine in the Configuration stage is used for negotiating the connection number and the channel number by sending and receiving TS1 and TS2 packets. Since the Polling phase has already established communication, this phase negotiates more details on this basis, which is not a great problem if the PMA signals comply with the PCIe eye diagram standard. The detailed steps are as follows:
Configuration.Linkwidth.Start state: the transmitter sends 2 TS1 packets carrying the Link number (that is, the connection number). If the receiver receives 2 TS1 packets with a consistent Link number, the Configuration.Linkwidth.Accept state is entered. If a TS packet with the Disable Link bit set is received, the Disabled state is entered; if a TS packet with the Loopback bit set is received, the Loopback state is entered. After a 24 ms timeout, the state machine returns to the Detect state and an error log is recorded with the error reason of a transceiving error; an unexpected state transition has occurred, and the corresponding error correction processing may be resetting the PCIe hardware system.
Configuration.Linkwidth.Accept state: the transmitter sends a TS1 packet carrying its own Link number and Lane number, and the Configuration.Lanenum.Wait state is then entered. After a 3 ms timeout, the state machine returns to the Detect state and an error log is recorded with the error reason of a transceiving error.
Configuration.Lanenum.Wait state: when the receiver receives 2 consecutive TS1 packets carrying its own Link number and Lane number, the Configuration.Lanenum.Accept state is entered. After a 2 ms timeout, the state machine returns to the Detect state and an error log is recorded with the error reason of a transceiving error; an unexpected state transition has occurred, and the corresponding error correction processing may be resetting the PCIe hardware system.
Configuration.Lanenum.Accept state: when the receiver of each connection receives 2 consecutive TS1 packets whose Link number and Lane number are consistent with those sent by the transmitter, the Configuration.Complete state is entered. If the Lane number is inconsistent, the state machine returns to the Configuration.Lanenum.Wait state. If a TS1 packet without a Link number and Lane number is received, or after a 3 ms timeout, the state machine returns to the Detect state and an error log is recorded with the error reason of a transceiving error; an unexpected state transition has occurred, and the corresponding error correction processing may be resetting the PCIe hardware system.
Configuration.Complete state: the transmitter sends TS2 packets carrying the Link number and Lane number; when the receiver receives 8 consecutive TS2 packets, the Configuration.Idle state is entered. After a 2 ms timeout, the Detect state is entered and an error log is recorded with the error reason of a transceiving error; an unexpected state transition has occurred, and the corresponding error correction processing may be resetting the PCIe hardware system.
Configuration.Idle state: the transmitter sends Idle packets; when the receiver receives 8 consecutive Idle packets, the L0 state is entered. After 2 ms, if a higher rate is supported, the Recovery state machine is entered; otherwise the state machine returns to the Detect state and an error log is recorded with the error reason of a transceiving error; an unexpected state transition has occurred, and the corresponding error correction processing may be resetting the PCIe hardware system.
It should be noted that the Link number, that is, the connection number, is a symbol in the TS1 packet, and a device may establish one or more connections during link training. The Lane number is likewise a symbol in the TS1 packet, and a connection contains one or more lanes. Lane Reversal is an optional feature in the PCIe specification in which the lane numbers are reversed.
It can be understood that the Configuration state machine handling provided by the embodiment of the present application can effectively resolve link training failures that occur because the hardware does not support the Lane Reversal feature.
That is, the embodiment of the present application provides a solution for adapting the number of lanes during link training. When the device is interconnected with a port that has fewer lanes than the device, in the Detect stage it is checked whether the transmitters of all lanes detect receiver circuits. If not, detection is repeated after waiting 12 ms; if the result is still the same, the relevant registers are configured to close the redundant lanes and the Detect stage is re-entered. When the device is interconnected with a port whose lanes are reversed or crossed, in the Configuration stage the Lane number in the TS1 packet received by each lane receiver is checked, and if it is inconsistent, the Lane number is reconfigured.
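The Lane number check in the Configuration stage can be sketched as a comparison between the local numbering and the Lane numbers carried in the TS1 packets received on each physical lane. This is an illustrative sketch only: the function name `check_lane_numbers` and the policy of adopting the received numbering are assumptions, not details taken from the application.

```python
def check_lane_numbers(local_numbers, received_numbers):
    """Compare received TS1 Lane numbers with the local numbering.

    Both arguments are indexed by physical lane. Returns a pair
    (relation, numbering_to_apply).
    """
    local = list(local_numbers)
    received = list(received_numbers)
    if sorted(received) != sorted(local):
        raise ValueError("received Lane numbers do not cover the same lane set")
    if received == local:
        return "aligned", local      # nothing to reconfigure
    if received == local[::-1]:
        return "reversed", received  # peer numbers the lanes in reverse order
    return "crossed", received       # any other permutation of the lanes
```

For a x4 link, `check_lane_numbers([0, 1, 2, 3], [3, 2, 1, 0])` reports a reversed port, and the returned numbering is what the software would write back when renumbering the lanes.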
Furthermore, in the embodiment of the present application, the corresponding software processing logic may actively set the speed change flag bit to increase the link rate, and may actively modify the number of channels in the capability register. For example, fig. 6 is a schematic diagram of the state transitions of the Recovery state machine provided by an embodiment of the present application. The Recovery phase aims to establish a new rate or a new number of connections or channels at both ends of the PCIe link, or to recover the connection from a low power consumption state or an error state. The software intervention procedure is as follows:
Speed increase: when the device supports a higher speed than the current link speed, the Speed Change flag bit is actively set and the state changes from Recovery.RecvLock to Recovery.Speed. At this point the coding mode of the PCS and the clock of the PMA are monitored. After the speed has been updated successfully, the state returns to Recovery.RecvLock, moves step by step through Recovery.RecvCfg and Recovery.Idle by sending and receiving TS packets and Idle packets, and finally enters the L0 state.
Updating the number of channels: when the device supports more channels than the current link and the redundant channels can detect a receiver circuit, the number of channels in the capability register is actively modified, the state moves from the Recovery state machine into the Configuration state machine, and the L0 state is entered after the number of channels has been updated successfully.
It can be understood that, once the connection has been established, the link can further adapt to a higher transmission rate and a larger number of channels, which improves the robustness of link training.
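The decision made by the software intervention in the Recovery phase can be sketched as below. This is a minimal Python illustration under stated assumptions: the `LinkStatus` fields, the returned action names, and the choice to consider the speed increase before the channel update are all hypothetical and are not specified by the application.

```python
from dataclasses import dataclass


@dataclass
class LinkStatus:
    current_speed: int             # rate generation currently negotiated
    supported_speed: int           # highest rate generation the device supports
    current_channels: int          # channels in the current link
    supported_channels: int        # channels the device supports
    extra_channels_see_receiver: bool  # redundant channels detect a receiver


def recovery_intervention(status: LinkStatus) -> str:
    """Pick the software action to take for the Recovery state machine."""
    if status.supported_speed > status.current_speed:
        # Leads to Recovery.Speed and back to L0 at the higher rate.
        return "set_speed_change_flag"
    if (status.supported_channels > status.current_channels
            and status.extra_channels_see_receiver):
        # Leads from Recovery into the Configuration state machine.
        return "modify_capability_channel_count"
    return "no_action"
```

A link that trained at a lower rate than the device supports would return `set_speed_change_flag`; a link already at full rate but with usable extra channels would return `modify_capability_channel_count`.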
Further, Disabled state: this state is entered after a TS packet with the Disable Link bit set to 1 is received. The PCIe hardware system is reset directly so that link training is performed again, and an error log is recorded; the error cause is a link disable error.
And, Hot Reset state: this state is entered after a TS packet with the Hot Reset bit set to 1 is received. On entering the state, the registers of the PCIe configuration space are reset; the registers are reconfigured in an interrupt, an error log is recorded with a Hot Reset error as the error cause, and link training is restarted.
In a specific embodiment, an interrupt controller may be used to monitor the PERST signal and the hot reset signal; when the PERST signal or the hot reset signal is detected to be valid, the configuration space registers are initialized and PCIe link training is restarted.
That is, the embodiments of the present application provide a solution for restoring the configuration after a passive reset. When the PERST# signal on the circuit is pulled low, or the Hot Reset state is entered, the PCIe configuration space registers are reset, and link training may fail if the configuration is not restored. The interrupt controller of the coprocessor subsystem monitors these signals; when one of them is valid, the coprocessor is interrupted and enters an interrupt handler, which disables the LTSSM, initializes the configuration space registers again, and then re-enables the LTSSM to start link training.
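The order of operations in the interrupt handler can be illustrated with a small simulation. The `FakePcieController` class, its attributes and the register names used in the example are hypothetical stand-ins for the real hardware; only the sequence (disable the LTSSM, restore the configuration space, enable the LTSSM) is taken from the description above.

```python
from typing import Dict, List


class FakePcieController:
    """Simulated controller; every attribute here is an assumption."""

    def __init__(self, defaults: Dict[str, int]) -> None:
        self.defaults = dict(defaults)       # values software wants restored
        self.config_space = dict(defaults)   # simulated configuration space
        self.ltssm_enabled = True
        self.events: List[str] = []

    def passive_reset(self) -> None:
        # PERST# pulled low or Hot Reset entered: the registers are reset.
        self.config_space = {name: 0 for name in self.config_space}


def reset_interrupt_handler(ctrl: FakePcieController, log: List[str]) -> None:
    """Restore the configuration after a passive reset, then retrain."""
    ctrl.ltssm_enabled = False               # stop link training first
    ctrl.events.append("ltssm_disabled")
    ctrl.config_space = dict(ctrl.defaults)  # initialize the registers again
    ctrl.events.append("config_restored")
    ctrl.ltssm_enabled = True                # restart link training
    ctrl.events.append("ltssm_enabled")
    log.append("passive reset: configuration restored, link training restarted")
```

Disabling the LTSSM before touching the registers keeps the hardware from training with a half-restored configuration, which is the failure the paragraph above describes.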
Loopback state: this state is entered after a TS packet with the Loopback bit set to 1 is received. The PCIe hardware system is reset directly so that link training is performed again, and an error log is recorded; the error cause is a Loopback error.
L0s, L1 and L2 states: these are low power consumption states that the hardware system may optionally support. When the hardware system does not support them, the relevant PCIe capability registers are configured to turn these functions off.
L0 state: this state indicates that the PCIe link has been established successfully and higher-level communication is possible. From this state the link may move to a low power consumption state or to the Recovery state; the coprocessor may continue to monitor the LTSSM state and handle such state transitions accordingly.
Step S13: when an unrecoverable error occurs in the hardware LTSSM state machine, reset the PCIe hardware system by using second preset software processing logic, so as to restart PCIe link training.
In a specific implementation, a timeout mechanism may be set for each state in the hardware LTSSM state machine; when no state transition occurs in any state of the hardware LTSSM state machine within the timeout corresponding to that state, it is determined that an unrecoverable error has occurred in the hardware LTSSM state machine.
That is, in addition to the timeouts that the PCIe standard specifies for LTSSM state transitions, the embodiment of the present application sets a timeout mechanism for every state, which prevents the state machine from becoming stuck in one state.
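A per-state timeout of this kind can be sketched as a small watchdog that the tracking loop feeds with the observed state and a timestamp. This is an illustrative Python sketch: the class name, the millisecond timestamps and the timeout values used in the example are assumptions, and leaving a steady state such as L0 out of the table (so that it never times out) is a simplification chosen for the example.

```python
from typing import Dict, Optional


class StateWatchdog:
    """Flag a state that has not changed within its own timeout."""

    def __init__(self, timeouts_ms: Dict[str, int]) -> None:
        self._timeouts_ms = dict(timeouts_ms)
        self._state: Optional[str] = None
        self._entered_ms = 0

    def observe(self, state: str, now_ms: int) -> bool:
        """Return True when `state` has outlived its timeout."""
        if state != self._state:
            # A state transition happened: restart the timer.
            self._state = state
            self._entered_ms = now_ms
            return False
        limit = self._timeouts_ms.get(state)
        if limit is None:
            return False  # no timeout configured for this state
        return now_ms - self._entered_ms > limit
```

When `observe` returns True, the caller would treat the condition as an unrecoverable error, record an error log and reset the PCIe hardware system.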
In addition, in the embodiment of the application, after the unrecoverable error occurs in the hardware LTSSM state machine, an error log may be recorded by using a second preset software log recording logic.
That is, the present application enables the software LTSSM state machine to track and synchronize with the hardware LTSSM state machine. When an unexpected state transition occurs in the hardware LTSSM state machine, the software records an error log and performs the corresponding error processing so that the hardware can carry out link training correctly; when an unrecoverable error occurs in the hardware LTSSM state machine, the software records an error log and performs the corresponding reset operation so that the hardware carries out link training again. When PCIe link training fails, the point of failure can be located quickly from the error logs. Because faults are tolerated while the connection is still attempted wherever possible, the various training timings of different hosts and servers can be accommodated, the robustness of link training is improved, product iteration is facilitated, and cost is reduced.
It should be noted that, in the embodiment of the present application, the first preset software processing logic, the second preset software processing logic, the first preset software logging logic and the second preset software logging logic may all be embedded in the software state machine. That is, when a problem occurs in any state of the hardware state machine, the software state machine can correct the problem and record an error log.
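One way to picture how the processing and logging logic sit inside the software state machine is a lookup from an observed transition to a corrective action. The sketch below is an assumption-laden Python illustration: the table, the action names and the log format are hypothetical, and the two transitions listed are the Detect and Configuration cases covered in this application.

```python
from typing import Dict, List, Optional, Tuple

# Unexpected transitions and the corrective action applied to each.
UNEXPECTED_TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Detect.Wait", "Detect.Quiet"): "reconfigure_channel_count",
    ("Configuration.Lanenum.Accept", "Configuration.Lanenum.Wait"):
        "renumber_channels",
}


def handle_observation(prev: str, curr: str, timed_out: bool,
                       log: List[str]) -> Optional[str]:
    """Return the action for one observation, logging any error found."""
    if timed_out:
        # Second preset logic: unrecoverable error, reset and retrain.
        log.append(f"unrecoverable error: stuck in {curr}")
        return "reset_pcie_hardware"
    action = UNEXPECTED_TRANSITIONS.get((prev, curr))
    if action is not None:
        # First preset logic: unexpected transition, correct it in place.
        log.append(f"unexpected transition: {prev} -> {curr}")
    return action
```

Transitions that are not in the table and have not timed out return `None` and leave the log untouched, so normal training produces no log entries.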
Further, the present application may provide a PCIe system which, in addition to a PCIe hardware system composed of an integrated controller, a PCS module and a PMA module, integrates a coprocessor subsystem. The subsystem includes a CPU (Central Processing Unit), an interrupt controller, a clock counter, an erasable programmable flash memory and a random access memory (RAM). The CPU runs the software, the interrupt controller generates an interrupt when a specific PCIe signal is detected, the clock counter is used for timing, the flash memory stores and updates the software, and the RAM stores data while the software is running. The system facilitates product iteration: software makes up for the defects of the current product's hardware system, and the next-generation product modifies its hardware system according to the software scheme, so that iteration can effectively reduce cost.
The PCS belongs to the physical layer; its main function is serial-parallel conversion, that is, converting parallel data from the upper layer into serial data and converting serial data from the lower layer into parallel data. The PMA also belongs to the physical layer; its main function is to provide the clock frequency and to transmit serial data.
It can be seen that, during PCIe link training of the PCIe hardware system, the embodiment of the present application tracks and synchronizes the hardware LTSSM state machine by using a software LTSSM state machine. When an unexpected state transition occurs in the hardware LTSSM state machine, first preset software processing logic performs the corresponding error correction processing in time; when an unrecoverable error occurs in the hardware LTSSM state machine, second preset software processing logic resets the PCIe hardware system in time so as to restart PCIe link training.
Referring to fig. 7, an embodiment of the present application discloses a PCIe link training apparatus, including:
a hardware state machine tracking and synchronizing module 11, configured to track and synchronize a hardware LTSSM state machine by using a software LTSSM state machine during PCIe link training of a PCIe hardware system;
an error correction processing module 12, configured to perform corresponding error correction processing by using first preset software processing logic when an unexpected state transition occurs in the hardware LTSSM state machine;
and a hardware system resetting module 13, configured to reset the PCIe hardware system by using second preset software processing logic when an unrecoverable error occurs in the hardware LTSSM state machine, so as to restart PCIe link training.
The hardware state machine tracking and synchronizing module 11 is specifically configured to: query a state register of the hardware LTSSM state machine in real time by using the software LTSSM state machine to obtain a register value; determine the current state of the hardware LTSSM state machine according to the register value; and synchronize the state of the software LTSSM state machine to the current state of the hardware LTSSM state machine.
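The query, decode and synchronize steps can be sketched as follows. This Python sketch is illustrative only: the state encoding, the 3-bit field mask and the class names are hypothetical, since the actual register layout depends on the PCIe controller and is not given in the application.

```python
from enum import Enum
from typing import List


class Ltssm(Enum):
    # Hypothetical encoding of the top-level LTSSM states.
    DETECT = 0
    POLLING = 1
    CONFIGURATION = 2
    L0 = 3
    RECOVERY = 4


def decode_state(register_value: int) -> Ltssm:
    """Map a raw register value to a state (assumes a 3-bit state field).

    Raises ValueError for encodings this sketch does not define.
    """
    return Ltssm(register_value & 0x7)


class SoftwareLtssm:
    """Software mirror of the hardware state machine."""

    def __init__(self) -> None:
        self.state = Ltssm.DETECT
        self.history: List[Ltssm] = [Ltssm.DETECT]

    def sync(self, register_value: int) -> bool:
        """Follow the hardware state; return True when it changed."""
        hw_state = decode_state(register_value)
        if hw_state is self.state:
            return False
        self.state = hw_state
        self.history.append(hw_state)
        return True
```

In a real system `sync` would be called from a polling loop that reads the state register; the recorded history is what lets the software recognize unexpected transitions and time out a stuck state.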
Further, the device further comprises a timeout mechanism setting module, configured to set a timeout mechanism for each state in the hardware LTSSM state machine; correspondingly, the apparatus further includes an error determination module, configured to determine that an unrecoverable error occurs in the hardware LTSSM state machine when it is tracked that state transition does not occur in any state of the hardware LTSSM state machine within a timeout time corresponding to the state.
Further, the error determination module is further configured to determine that an unexpected state transition has occurred in the hardware LTSSM state machine when the hardware state machine tracking and synchronizing module tracks that the hardware LTSSM state machine, after entering the Detect.Wait state of the detection stage, does not enter the Polling state within a preset time and returns to the Detect.Quiet state.
Correspondingly, the error correction processing module 12 is specifically configured to reconfigure the number of lanes of the PCIe link by using the first preset software processing logic.
Further, the error determination module is further configured to determine that an unexpected state transition has occurred in the hardware LTSSM state machine when the hardware state machine tracking and synchronizing module tracks that the hardware LTSSM state machine, after entering the Configuration.Lanenum.Accept state of the configuration stage, returns to the Configuration.Lanenum.Wait state.
Correspondingly, the error correction processing module 12 is specifically configured to renumber the channels by using the first preset software processing logic.
Further, the error determination module is further configured to determine that an unexpected state transition has occurred in the hardware LTSSM state machine when the hardware state machine tracking and synchronizing module tracks that the hardware LTSSM state machine enters the Polling.Compliance state.
Correspondingly, the error correction processing module 12 is specifically configured to use the first preset software processing logic to check a feature bit of a PCIe standard configuration register or a feature bit of any TS packet received by a channel receiver; if the feature bit is 0, control the hardware LTSSM state machine to return to the detection state and check whether any channel is in an electrical idle condition; and if so, determine that the error cause is that the channels are not aligned, and reconfigure the number of channels.
Further, the apparatus further comprises:
a signal monitoring module, configured to monitor the PERST signal and the hot reset signal by using the interrupt controller;
and an initialization module, configured to initialize the configuration space registers and restart PCIe link training when the signal monitoring module detects that the PERST signal or the hot reset signal is valid.
The device further comprises an error log recording module, configured to record an error log by using a first preset software log recording logic after unexpected state transition of the hardware LTSSM state machine occurs; and when the hardware LTSSM state machine has an unrecoverable error, recording an error log by using second preset software log recording logic.
Further, an embodiment of the present application also discloses a computer readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the PCIe link training method disclosed in the foregoing embodiment.
For the specific process of the PCIe link training method, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not described here.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The PCIe link training method, apparatus, and medium provided in the present application are described in detail above, and specific examples are applied in the present application to explain the principle and the implementation of the present application, and the description of the above embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A PCIe link training method, comprising:
in the process of PCIe link training of a PCIE hardware system, tracking and synchronizing a hardware LTSSM state machine by using a software LTSSM state machine;
when the hardware LTSSM state machine generates unexpected state transition, corresponding error correction processing is executed by utilizing a first preset software processing logic;
and when the hardware LTSSM state machine has an unrecoverable error, resetting the PCIE hardware system by using a second preset software processing logic so as to restart the PCIe link training.
2. The PCIe link training method of claim 1, wherein the tracking and synchronizing the hardware LTSSM state machine with the software LTSSM state machine comprises:
querying a state register of the hardware LTSSM in real time by using a software LTSSM state machine to obtain a register value;
determining the current state of the hardware LTSSM state machine according to the register value;
and synchronizing the state of the software LTSSM state machine to the current state of the hardware LTSSM state machine.
3. The PCIe link training method of claim 1, further comprising:
setting a timeout mechanism for each state in the hardware LTSSM state machine;
when any state in the hardware LTSSM state machine is tracked to have no state transition within the timeout time corresponding to the state, the hardware LTSSM state machine is judged to have an unrecoverable error.
4. The PCIe link training method of claim 1, further comprising:
when it is tracked that the hardware LTSSM state machine, after entering the Detect.Wait state of the detection stage, does not enter the Polling state within a preset time and returns to the Detect.Quiet state, determining that an unexpected state transition of the hardware LTSSM state machine has occurred;
correspondingly, the executing the corresponding error correction processing by using the first preset software processing logic comprises:
reconfiguring the number of lanes of the PCIe link by using the first preset software processing logic.
5. The PCIe link training method of claim 1, further comprising:
when it is tracked that the hardware LTSSM state machine, after entering the Configuration.Lanenum.Accept state of the configuration stage, returns to the Configuration.Lanenum.Wait state, determining that an unexpected state transition of the hardware LTSSM state machine has occurred;
correspondingly, the executing the corresponding error correction processing by using the first preset software processing logic comprises:
renumbering the channels by using the first preset software processing logic.
6. The PCIe link training method of claim 1, further comprising:
when it is tracked that the hardware LTSSM state machine enters the Polling.Compliance state, determining that an unexpected state transition of the hardware LTSSM state machine has occurred;
correspondingly, the executing the corresponding error correction processing by using the first preset software processing logic comprises:
checking, by using the first preset software processing logic, a feature bit of a PCIe standard configuration register or a feature bit of any TS packet received by a channel receiver; if the feature bit is 0, controlling the hardware LTSSM state machine to return to the detection state and checking whether any channel is in an electrical idle condition; and if so, determining that the error cause is that the channels are not aligned, and reconfiguring the number of channels.
7. The PCIe link training method of claim 1, further comprising:
monitoring the PERST signal and the hot reset signal by using an interrupt controller;
and when the PERST signal or the hot reset signal is detected to be valid, initializing the configuration space registers and restarting PCIe link training.
8. The PCIe link training method as recited in any one of claims 1 to 7, further comprising, after an unexpected state migration of the hardware LTSSM state machine occurs: recording an error log by using a first preset software log recording logic;
after the unrecoverable error occurs in the hardware LTSSM state machine, the method further includes: and recording an error log by using second preset software log recording logic.
9. A PCIe link training apparatus, comprising:
the hardware state machine tracking and synchronizing module is used for tracking and synchronizing the hardware LTSSM by using the software LTSSM in the process of performing PCIe link training on the PCIE hardware system;
the error correction processing module is used for executing corresponding error correction processing by utilizing a first preset software processing logic when the hardware LTSSM state machine generates unexpected state transition;
and the hardware system resetting module is used for resetting the PCIE hardware system by utilizing a second preset software processing logic when the hardware LTSSM state machine has an unrecoverable error so as to restart the PCIe link training.
10. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the PCIe link training method as recited in any one of claims 1 to 8.
CN202210169840.5A 2022-02-23 2022-02-23 PCIe link training method, device and medium Pending CN114510366A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210169840.5A CN114510366A (en) 2022-02-23 2022-02-23 PCIe link training method, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210169840.5A CN114510366A (en) 2022-02-23 2022-02-23 PCIe link training method, device and medium

Publications (1)

Publication Number Publication Date
CN114510366A true CN114510366A (en) 2022-05-17

Family

ID=81553597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210169840.5A Pending CN114510366A (en) 2022-02-23 2022-02-23 PCIe link training method, device and medium

Country Status (1)

Country Link
CN (1) CN114510366A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118193289A (en) * 2024-03-21 2024-06-14 无锡众星微系统技术有限公司 Method and device for recovering PCIe configuration space

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination