CN115328668A - Fault processing method, dual-core lockstep system, electronic device and medium - Google Patents

Publication number
CN115328668A
Authority
CN
China
Prior art keywords
fault
processor
dual
data processing
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210897131.9A
Other languages
Chinese (zh)
Inventor
邬宇剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Technology China Co Ltd
Original Assignee
ARM Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARM Technology China Co Ltd
Priority to CN202210897131.9A
Publication of CN115328668A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/524 Deadlock detection or avoidance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating

Abstract

The application relates to the technical field of processors and discloses a fault processing method, a dual-core lockstep system, an electronic device and a medium. The method comprises the following steps: the dual-core lockstep system detects that the master clock of a first processor has failed at a first data processing node and records master clock fault information; the fault type of the first processor is determined; when the fault type is a first type of fault, that is, a fault that does not affect the operation of the dual-core lockstep system, the master clock fault information is cleared; when the fault type is a second type of fault, the dual-core lockstep system retains the master clock fault information and sends it to the master control circuit; and after the master control circuit receives the master clock fault information, it controls the dual-core lockstep system to restart. By exploiting the fact that the second processor, also called the redundant core, processes the same data with a delay, the scheme can tolerate master clock errors to the greatest extent possible when the master clock fails, which effectively reduces the number of times an external system has to handle DCLS system errors and saves system resources.

Description

Fault processing method, dual-core lockstep system, electronic device and medium
Technical Field
The present application relates to the field of processor technologies, and in particular, to a fault handling method, a dual core lockstep system, an electronic device, and a medium.
Background
With the development of industry, microcontrollers play an increasingly important role in the development of industrial automation in China. The reliability and safety of a processor, which is the core of a microcontroller, are facing serious challenges due to the constantly updated process nodes and the constantly evolving attack techniques. In terms of reliability, one of the main fault tolerance methods for processors at present is to adopt a Dual-Core Lockstep (DCLS) system. As shown in fig. 1, a DCLS system generally includes two processors and a detection unit, for example, the first processor, the second processor and the detection unit shown in fig. 1, the two processors generally input the same data and execute the same instructions, and the detection unit monitors the states of the two processors in real time every clock cycle.
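The per-cycle comparison described above can be sketched as follows. This is a minimal illustration only: the names `LockstepChecker` and `core`, the arithmetic performed by `core`, and the injected bit flip are assumptions made for the example and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class LockstepChecker:
    """Hypothetical detection unit: compares both cores' outputs every cycle."""
    mismatches: int = 0

    def check(self, first_out: int, second_out: int) -> bool:
        """Return True when the outputs agree in this cycle."""
        ok = first_out == second_out
        if not ok:
            self.mismatches += 1
        return ok


def core(x: int) -> int:
    """Stand-in for the instruction both processors execute (assumed)."""
    return (x * 3 + 1) & 0xFF


checker = LockstepChecker()
results = []
for cycle, x in enumerate([1, 2, 3, 4]):
    a = core(x)          # first processor
    b = core(x)          # second processor, same input and instruction
    if cycle == 2:
        a ^= 0x01        # inject a single-bit fault into the first processor
    results.append(checker.check(a, b))
```

In this sketch only the cycle with the injected fault is flagged; every other cycle compares equal.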
When any fault of any processor of the two processors is detected, the external circuit controls the DCLS system to restart and recovers the normal operation of the DCLS system so as to guarantee the safety of the system. In this case, a large number of programs need to be re-executed to complete the final resume execution, thus taking a long time. And a slight fault which does not influence the operation of the system in some of the faults can also cause the system to restart, thereby causing a great deal of resource waste.
Disclosure of Invention
The embodiment of the application provides a fault processing method, a dual-core lockstep system, electronic equipment and a medium.
In a first aspect, an embodiment of the present application provides a fault handling method applied to an electronic device, where the electronic device includes a dual-core lockstep system and a master control circuit, and the dual-core lockstep system includes a first processor and a second processor that execute the same instructions. The method includes: the dual-core lockstep system detects that the master clock of the first processor has failed at a first data processing node, and records master clock fault information; the dual-core lockstep system determines the fault type of the first processor; when the fault type of the first processor is a first type of fault, the dual-core lockstep system clears the master clock fault information, where the first type of fault is a fault that does not affect the operation of the dual-core lockstep system; when the fault type of the first processor is a second type of fault, the dual-core lockstep system retains the master clock fault information and sends it to the master control circuit; and after receiving the master clock fault information, the master control circuit controls the dual-core lockstep system to restart.
It can be understood that, with the above scheme, when it is determined that the master clock of the first processor has failed, it is first determined whether the fault is a negligible fault that does not affect system operation. When the fault is determined to be negligible, the master clock fault information is not output, so the DCLS system is not restarted because of a fault that does not affect system operation. This effectively reduces the number of restarts of the DCLS system and saves system resources.
It can be understood that the master control circuit mentioned in the embodiments of the present application may be the external circuit mentioned in the embodiments of the present application.
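The record, classify, then clear-or-report flow of the first aspect can be sketched as below. This is a software illustration under stated assumptions, not the circuit of the application: the class `DclsFaultHandler`, the enum `FaultType`, the dictionary layout of the fault record, and the callback standing in for the master control circuit are all hypothetical.

```python
from enum import Enum


class FaultType(Enum):
    FIRST = "does not affect operation"   # negligible fault
    SECOND = "affects operation"          # non-negligible fault


class DclsFaultHandler:
    """Hypothetical model of the fault handling steps of the first aspect."""

    def __init__(self, notify_master_circuit):
        self.fault_info = None                # recorded master clock fault
        self._notify = notify_master_circuit  # stand-in for the master control circuit

    def record(self, node: int) -> None:
        """Record master clock fault information for a data processing node."""
        self.fault_info = {"signal": "prim_clk_fault", "node": node}

    def resolve(self, fault_type: FaultType) -> bool:
        """Clear or forward the record; return True if a restart is requested."""
        if self.fault_info is None:
            return False
        if fault_type is FaultType.FIRST:
            self.fault_info = None            # clear: no restart needed
            return False
        self._notify(self.fault_info)         # retain and send to the master circuit
        return True


restart_requests = []
handler = DclsFaultHandler(restart_requests.append)

handler.record(node=10)
first = handler.resolve(FaultType.FIRST)      # cleared silently

handler.record(node=11)
second = handler.resolve(FaultType.SECOND)    # retained and forwarded
```

Only the second fault reaches the hypothetical master control circuit, which would then restart the system.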
In one possible implementation of the present application, determining the type of the failure of the first processor includes:
determining the working mode of the second processor at a second data processing node corresponding to the first data processing node; when the working mode is a mode in which input data is not processed, determining that the fault is a first type of fault; and when the working mode is a mode in which input data is processed, determining that the fault is a second type of fault, where the second type of fault is a fault that affects the operation of the dual-core lockstep system.
In one possible implementation of the present application, determining the working mode of the second processor at the second data processing node corresponding to the first data processing node includes: acquiring the ready signal in the output signal of the second processor at the second data processing node corresponding to the first data processing node; when the ready signal in the output signal is at a high level, determining that the working mode of the second processor at the second data processing node is a mode in which the input data is processed; and when the ready signal in the output signal is at a low level, determining that the working mode of the second processor at the second data processing node is a mode in which the input data is not processed.
It can be understood that when the ready signal in the output signal of the second processor is high level, it may be determined that the second processor is performing data processing, and when it is determined that the second processor is performing data processing, the first processor cannot perform data processing due to a fault, and therefore, the master clocks of the first processor and the second processor at the current node may be inconsistent, so that the system is affected. Therefore, at this time, it can be determined that the fault is a fault affecting the operation of the system, and master clock fault information is output to the external circuit.
If it is determined that the second processor does not perform data processing, this indicates that the processing mode of the current node may be a mode in which no data is processed, such as a sleep mode; that is, the current processing node may correspond to a state in which the electronic device is in standby or a similar mode. In that case, although the master clock of the first processor fails in the current cycle, no data needs to be processed and system operation is not affected. The recorded master clock fault information may be cleared at this point.
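The mapping from the ready signal to the working mode, and from the working mode to the fault type, can be written as two small functions. This is a sketch only; the function names and the string labels are assumptions introduced for illustration.

```python
def working_mode(ready_high: bool) -> str:
    """Working mode of the second processor, derived from its ready signal."""
    return "processing" if ready_high else "not_processing"


def classify_fault(ready_high: bool) -> str:
    """Fault type of the first processor, derived from the second processor's mode.

    "second": the data was really processed, so the fault affects operation.
    "first":  nothing was processed (e.g. sleep or standby), so it is negligible.
    """
    return "second" if working_mode(ready_high) == "processing" else "first"


high_case = classify_fault(ready_high=True)
low_case = classify_fault(ready_high=False)
```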
In one possible implementation of the present application, the first data processing node and the second data processing node are separated by a set period.
In one possible implementation of the present application, the set period is two cycles.
In a second aspect, the present application provides a dual-core lockstep system, which includes a first processor and a second processor that are used for executing the same instructions, a fault detection module, a first comparison unit and a second comparison unit. The fault detection module is used for detecting that the master clock of the first processor has failed at the first data processing node and recording master clock fault information. The second comparison unit is used for determining the fault type of the first processor. The first comparison unit is used for clearing the master clock fault information when the fault type of the first processor is a first type of fault, where the first type of fault is a fault that does not affect the operation of the dual-core lockstep system, and for retaining the master clock fault information and sending it to the master control circuit when the fault type of the first processor is a second type of fault.
In one possible implementation of the present application, the dual core lockstep system further includes an input delay register unit for delaying the input data by a set period to transmit to the second processor.
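An input delay register unit behaves like a short shift register. The sketch below models it in software with a fixed-length queue; the class name `DelayLine`, its `tick` method and the sample data labels are assumptions for illustration, and the depth of two follows the two-cycle example given in this application.

```python
from collections import deque


class DelayLine:
    """Hypothetical model of a chain of delay registers."""

    def __init__(self, depth: int, reset_value=None):
        # One slot per register; all registers hold reset_value after reset.
        self._regs = deque([reset_value] * depth, maxlen=depth)

    def tick(self, value):
        """Clock in one value and return the value leaving the last register."""
        oldest = self._regs[0]
        self._regs.append(value)   # maxlen drops the oldest entry automatically
        return oldest


input_delay = DelayLine(depth=2)
seen_by_second_processor = [input_delay.tick(d) for d in ["d0", "d1", "d2", "d3"]]
```

With a depth of two, the data item presented in cycle 0 reaches the second processor in cycle 2, and the first two cycles deliver only the reset value.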
In a third aspect, the present application provides an electronic device comprising one or more processors, where one of the one or more processors of the electronic device is configured to perform the above fault handling method.
In a fourth aspect, the present application provides a readable storage medium having stored thereon instructions that, when executed on an electronic device, cause the electronic device to perform the above-mentioned fault handling method.
In a fifth aspect, the present application provides a computer program product comprising execution instructions that, when executed on an electronic device, cause the electronic device to perform the above-mentioned fault handling method.
Drawings
FIG. 1 illustrates a schematic diagram of a dual core lockstep system, according to some embodiments of the present application;
FIG. 2 illustrates a schematic structural diagram of a dual core lockstep system, according to some embodiments of the present application;
FIG. 3 illustrates a workflow diagram of a dual core lockstep system, according to some embodiments of the present application;
FIG. 4 illustrates a workflow diagram of a dual core lockstep system, according to some embodiments of the present application;
FIG. 5 illustrates a workflow diagram of a dual core lockstep system, according to some embodiments of the present application;
FIG. 6 illustrates a flow diagram of a fault handling method, according to some embodiments of the present application;
FIG. 7 illustrates a block diagram of an electronic device, according to some embodiments of the present application.
Detailed Description
Illustrative embodiments of the present application include, but are not limited to, a fault handling method, dual core lockstep system, electronic device, and medium.
It will be appreciated that as used herein, the term module may refer to or include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and/or memory that executes one or more software or firmware programs, a combinational logic circuit, and/or other suitable hardware components that provide the described functionality, or may be part of such hardware components.
It is to be appreciated that in various embodiments of the present application, the processor may be a microprocessor, a digital signal processor, a microcontroller, or the like, and/or any combination thereof. According to another aspect, the processor may be a single-core processor, a multi-core processor, the like, and/or any combination thereof.
In order to solve the above problem, an embodiment of the present application provides a fault handling method, which is used for a dual-core lockstep system. The dual-core lockstep system comprises a first processor and a second processor, wherein the input of the first processor and the input of the second processor are the same input data.
The fault processing method comprises the following steps. If a fault of the master clock of the first processor is detected at the current data processing node, master clock fault information is recorded. The fault type of the first processor is then determined. When the fault type corresponding to the first processor is a negligible fault that does not affect system operation, the recorded master clock fault information is cleared. When the fault type corresponding to the first processor is a non-negligible fault that affects system operation, the recorded master clock fault information is output to an external circuit so that the external circuit controls the dual-core lockstep system to restart.
In some embodiments, the fault type of the first processor may be determined by obtaining the working mode (also referred to as the data processing condition) of the second processor at the data processing node corresponding to the current data processing node, and determining the fault type corresponding to the first processor according to that data processing condition. If the second processor does not process the input data, the fault type corresponding to the first processor is determined to be a negligible fault that does not affect system operation, and the recorded master clock fault information is cleared. If the second processor processes the input data, the fault type corresponding to the first processor is determined to be a non-negligible fault that affects system operation, and the recorded master clock fault information is output to an external circuit so that the external circuit controls the dual-core lockstep system to restart.
It is to be understood that, in some embodiments, the manner of determining the fault type of the first processor may also be any other implementable manner, and when it is determined that the fault type corresponding to the first processor is a negligible fault that does not affect the operation of the system, the recorded master clock fault information may be cleared.
It will be appreciated that prior art DCLS systems can check whether the outputs of the two systems are consistent and provide a corresponding DCLS output signal indicating whether the outputs agree. The embodiment of the application additionally exploits the fact that the second processor, or redundant core, processes the same data with a delay, so that master clock errors can be tolerated to the greatest extent possible when the master clock fails.
Specifically, with the above scheme, when it is determined that the master clock of the first processor has failed, it is first determined whether the fault is a negligible fault that does not affect system operation. When the fault is determined to be negligible, no master clock fault information is output, so the DCLS system is not restarted because of a fault that does not affect system operation. This effectively reduces the number of times an external system has to handle DCLS system errors and saves system resources.
It will be appreciated that the data processing of the second processor may be determined from a ready signal in the output signal of the second processor. Specifically, when the ready signal in the output signal of the second processor is at a high level, it may be determined that the second processor performs data processing, and when it is determined that the second processor performs data processing, the first processor cannot process data due to a fault, so that the master clocks of the first processor and the second processor at the current node will be inconsistent, and the system is affected. Therefore, at this time, it can be determined that the fault is a fault affecting the operation of the system, and master clock fault information is output to the external circuit.
If it is determined that the second processor does not perform data processing, this indicates that the processing mode of the current node may be a mode in which no data is processed, such as a sleep mode; that is, the current processing node may correspond to a state in which the electronic device is in standby or a similar mode. In that case, although the master clock of the first processor fails in the current cycle, no data needs to be processed and system operation is not affected. The recorded master clock fault information may be cleared at this point.
It will be appreciated that the failure of the master clock of the first processor may be a failure in which the first processor does not receive input data, i.e. the master clock controlling the logic of the first processor fails.
It can be understood that, in some embodiments, two input delay registers are generally disposed in front of the second processor of the dual-core lockstep system, so that the input data arrives at the second processor two cycles after it arrives at the first processor, and one output delay register is generally disposed behind the second processor. As a result, the data processing condition of the second processor for the same data becomes available three cycles after the cycle in which the fault of the first processor occurred.
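The timing relationship above is a simple sum of register delays. A small sketch, assuming one cycle of delay per register and using the hypothetical function name `decision_cycle`:

```python
def decision_cycle(fault_cycle: int,
                   input_delay_regs: int = 2,
                   output_delay_regs: int = 1) -> int:
    """Cycle in which the second processor's result for the same data is visible.

    Assumes each delay register adds exactly one clock cycle.
    """
    return fault_cycle + input_delay_regs + output_delay_regs


check_at = decision_cycle(10)   # fault in the tenth cycle
```

For a fault in the tenth cycle this gives the thirteenth cycle, which matches the three-cycle offset stated in the text.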
It can be understood that the fault handling method in the embodiment of the present application may be used in a dual-core lockstep system, and the dual-core lockstep system may be used in various electronic devices. For example, in the embodiment of the present application, the dual-core lockstep system may be used in various devices with high requirements on safety performance, such as automobiles and industrial devices. The following describes a fault handling method provided in the embodiment of the present application by taking an example in which a dual core lockstep system is used in an industrial device.
For example, the dual-core lockstep system is included in the central control device of a production line for manufacturing mechanical parts, and the production line is controlled by the dual-core lockstep system. If the dual-core lockstep system detects that the master clock of the first processor has failed in the tenth data processing cycle, it records master clock fault information. After a set period, for example three cycles, that is, in the thirteenth cycle, the system detects that the ready signal in the output signal of the second processor for the same data is at a low level. This indicates that the second processor did not perform data processing: the control instruction received in the tenth cycle was to keep the production line in a standby state, so no data was processed by the processor. It is therefore determined that the fault corresponding to the first processor does not affect system operation, and the master clock fault information is cleared.
Before describing the fault handling method provided by the embodiment of the present application in detail, first, a dual-core lockstep system provided by the embodiment of the present application is described. Fig. 2 shows a schematic structural diagram of a dual core lockstep system in an embodiment of the present application, and as shown in fig. 2, the dual core lockstep system includes a first system 200, a second system 300 and a fault detection module 400, where the first system 200 uses a master clock to provide a clock, and the second system 300 uses a redundant clock to provide a clock.
It will be appreciated that the master clock and the redundant clock have the same frequency and phase. Both are clock signals, which are the basis of sequential logic: signals with a fixed period, independent of operation, that determine when the states of the various hardware elements in the system are updated.
The first system 200 comprises a first output delay unit 201, a first comparison unit 202 and a first processor 203, wherein the first output delay unit 201 comprises a first output delay register 2011, a second output delay register 2012 and a third output delay register 2013.
The first processor 203, which may be referred to as the master core, executes a set program to implement the corresponding tasks. The first processor 203 completes operations such as instruction execution and state transition under the driving of the master clock.
The first output delay unit 201 is configured to delay the output data of the first processor 203, and specifically, the first output delay unit 201 includes three output delay registers, that is, a first output delay register 2011, a second output delay register 2012, and a third output delay register 2013, so that the output data of the first processor 203 can be buffered for three cycles.
The first comparing unit 202 is configured to output the master clock failure information of the first processor 203 after receiving, from the second comparing unit 303, the instruction to output the master clock failure information to the external circuit, and to clear the master clock failure information (prim_clk_fault) after receiving, from the second comparing unit 303, the instruction to clear the master clock failure information.
The second system 300 comprises an input delay unit 301, a second output delay unit 302, a second comparison unit 303 and a second processor 304, wherein the input delay unit 301 comprises a first input delay register 3011 and a second input delay register 3012. The second output delay unit 302 includes a fourth output delay register 3021.
The second processor 304 may be referred to as a redundant core for executing the same set-up procedures as the first processor 203 to accomplish the corresponding tasks. The second processor 304 completes the execution of instructions, state transition, etc. under the driving of the redundant clock.
The input delay unit 301 is configured to delay the input data by a set period and send the delayed input data to the second processor 304. For example, when the input delay unit 301 includes two input delay registers, i.e., a first input delay register 3011 and a second input delay register 3012, it may delay the input data by two cycles before sending the delayed input data to the second processor 304.
The second output delay unit 302 is configured to delay the output data of the second processor 304.
The second comparing unit 303 is configured to output corresponding master clock failure information when it is determined that the master clock of the second processor 304 has failed.
The second comparing unit 303 is further configured to determine the fault type corresponding to the first processor 203. If it is determined that the second processor 304 did not process the input data, the fault corresponding to the first processor 203 is determined to be a negligible fault that does not affect system operation, and an instruction for deleting the master clock fault information is sent to the first comparing unit 202. If it is determined that the second processor 304 processed the input data, the fault corresponding to the first processor 203 is determined to be a non-negligible fault affecting system operation, and an instruction for outputting the master clock fault information to an external circuit is sent to the first comparing unit 202.
It is understood that the second comparing unit 303 may determine the data processing condition of the second processor 304 according to the ready signal state in the output signal of the second processor 304. Specifically, when the ready signal in the output signal of the second processor 304 is at a high level, it may be determined that the second processor 304 performs data processing, and when it is determined that the second processor 304 performs data processing, the first processor 203 cannot process data due to a fault, so the master clocks of the first processor 203 and the second processor 304 will be inconsistent, and the system is affected. Therefore, at this time, it can be determined that the fault is a non-negligible fault affecting the operation of the system, and it is determined that the master clock fault information needs to be output to the external circuit.
If the ready signal in the output signal is at a low level, this indicates that the second processor 304 does not perform data processing, which in turn indicates that the processing mode of the current node may be a mode in which no data is processed, such as a sleep mode; that is, the current processing node may correspond to a state in which the electronic device is in standby or a similar mode. In that case, although the master clock of the first processor 203 fails in the current cycle, no data needs to be processed and system operation is not affected. At this point it can be determined that the recorded master clock fault information needs to be cleared.
The fault detection module 400 is used to detect whether the master clocks of the first processor 203 and the second processor 304 fail. If the first processor 203 fails, the output master clock failure information, i.e., a failure signal (prim_clk_fault), is transmitted to the first comparison unit 202 and the second comparison unit 303 through the first output delay register 2011, the second output delay register 2012 and the third output delay register 2013.
If the second processor 304 fails, the output master clock failure information, i.e., the failure signal (prim_clk_fault), is transmitted to the second comparing unit 303 through the fourth output delay register 3021.
Fig. 3 illustrates a workflow diagram of the dual-core lockstep system in the embodiment of the present application. As shown in fig. 3, data input to the first system 200 is fed to the first processor 203; after processing the input data, the first processor 203 outputs the result to the first output delay register 2011, and the result reaches the first comparing unit 202 after being buffered by the second output delay register 2012 and the third output delay register 2013.
Data input into the second system 300 may pass through the first input delay register 3011 and the second input delay register 3012 and arrive at the second processor 304, and the second processor 304 processes the input data and outputs the processed data to the second comparing unit 303 through the fourth output delay register 3021.
It is to be understood that, as shown in fig. 4, when the master clock of the first processor 203 fails, for example when the master clock is unavailable in the tenth cycle, the first processor 203 cannot process the data of the current cycle, while the first input delay register 3011 of the second system 300 still receives the input data. The clock detection module in the fault detection module 400 may determine that the master clock fault occurred in the tenth cycle.
For example, as shown in fig. 5, after the three-cycle delay introduced by the first input delay register 3011, the second input delay register 3012 and the fourth output delay register 3021, that is, in the thirteenth cycle, the output data of the second processor 304 is input to the second comparing unit 303, and the second comparing unit 303 may determine the fault type corresponding to the first processor 203 according to the data processing condition of the second processor 304. If it is determined that the second processor 304 did not process the input data of the tenth cycle, in which the master clock failed, it is determined that the master clock failure in the tenth cycle does not affect system operation; an instruction to delete the master clock failure information is sent to the first comparing unit 202, and the detected master clock failure information is cleared. If it is determined that the second processor 304 processed the input data of the tenth cycle, the failure corresponding to the first processor 203 is determined to be a non-negligible failure affecting system operation, and an instruction for outputting the master clock failure information to an external circuit is sent to the first comparing unit 202.
It is understood that the second comparing unit 303 may determine the data processing condition of the second processor 304 according to the ready signal in the output signal of the second processor 304. Specifically, when the ready signal is at a high level, it may be determined that the second processor 304 performs data processing. Because the first processor 203 cannot process the same data due to the fault, the master clocks of the first processor 203 and the second processor 304 may be inconsistent, which may affect the system. Therefore, the fault is determined to be a fault affecting the operation of the system, and the master clock fault information needs to be output to the external circuit.
If the ready signal in the output signal is at a low level, it indicates that the second processor 304 does not perform data processing, so the current processing node may be in a mode in which no data is processed, such as a sleep mode; that is, the current processing node may correspond to a state in which the electronic device is in standby or the like. In this case, although the master clock of the first processor 203 fails in the current cycle, no data is processed and system operation is not affected. It can therefore be determined that the recorded master clock fault information needs to be cleared.
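The decision rule in the two paragraphs above reduces to a single test on the ready signal. A minimal sketch in Python, assuming the ready level is already available as a boolean (True for high level); `FaultType` and `classify_master_clock_fault` are hypothetical names introduced only for illustration.

```python
from enum import Enum


class FaultType(Enum):
    """Hypothetical labels for the two outcomes described in the text."""
    NEGLIGIBLE = "clear master clock fault information"
    NON_NEGLIGIBLE = "output master clock fault information to external circuit"


def classify_master_clock_fault(ready_high: bool) -> FaultType:
    """Classify a master clock fault of the first processor from the ready
    signal of the second processor for the same input data.

    ready_high=True  : the second processor processed data, so the fault matters.
    ready_high=False : no data was processed (e.g. sleep mode), so it does not.
    """
    if ready_high:
        return FaultType.NON_NEGLIGIBLE
    return FaultType.NEGLIGIBLE
```

A caller would sample the ready signal three cycles after the fault cycle and pass its level to this function.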
The fault handling method in the embodiment of the present application is described in detail below with reference to the dual-core lockstep system described above. Fig. 6 shows a schematic flow diagram of the fault handling method, which may be executed by the dual-core lockstep system described above. As shown in fig. 6, the fault handling method includes:
601: the failure detection module 400 detects a failure of the master clock of the first processor 203.
602: the failure detection module 400 sends master clock failure information to the first comparison unit 202 and the second comparison unit 303.
It can be understood that, in the embodiment of the present application, the fault detection module 400 detects in real time whether the clocks of the two processors have faults. When it detects that the master clock of the first processor 203 has a fault, it may send master clock fault information to the first comparison unit 202 and the second comparison unit 303. The master clock fault information may be the fault signal prim_clk_fault.
It is to be understood that the failure of the master clock of the first processor 203 may be a failure that the first processor 203 does not receive input data, i.e., a failure of the master clock controlling the logic of the first processor 203.
For example, if in the tenth data processing cycle the failure detection module 400 detects that the master clock of the first processor 203 does not arrive, that is, the first processor 203 cannot process the data of the current cycle while the first input delay register in the second system receives the data, it can be determined that the master clock of the first processor 203 has failed. The failure detection module 400 then records the master clock failure information (prim_clk_fault) and sends it to the first comparison unit 202 and the second comparison unit 303.
In some embodiments, the master clock fault information, i.e., the fault signal (prim_clk_fault), may be transmitted to the first and second comparison units 202 and 303 via the first, second, and third output delay registers 2011, 2012, and 2013.
603: the second comparing unit 303 acquires a ready signal in the output signal of the second processor 304.
It can be understood that, as shown in fig. 3, the data input into the second system 300 needs to pass through the first input delay register 3011 and the second input delay register 3012 to reach the second processor 304, and the second processor 304 processes the input data and outputs the processed data to the second comparing unit 303 through the fourth output delay register 3021.
That is, if the first processor 203 does not receive data when the master clock of the first processor 203 fails, for example, the clock of the tenth cycle does not come, the output data of the second processor 304, for example, the ready signal in the output signal, can be input to the second comparing unit 303 in the thirteenth cycle.
604: the second comparing unit 303 determines the type of the fault corresponding to the first processor 203 based on the ready signal in the output signal of the second processor 304.
It can be understood that, in the embodiment of the present application, the second comparing unit 303 may determine, through a ready signal in the output signal of the second processor 304, a data processing condition of the second processor 304 on the same data.
It is understood that the same data may refer to the same data input in the same cycle as the first processor 203.
It is understood that, as shown in fig. 3, in some embodiments two input delay registers are disposed in front of the second processor 304 of the dual-core lockstep system and one output delay register is disposed behind it, so that the input data arrives at the second processor 304 two cycles after it arrives at the first processor 203. The output signal of the second processor 304 therefore has to be obtained three cycles after the cycle in which the fault of the first processor 203 occurred, and the ready signal in that output signal is used to determine how the second processor 304 handled the same data.
Specifically, when the ready signal in the output signal of the second processor 304 is at a high level, it may be considered that the second processor 304 performs data processing, and when it is determined that the second processor 304 performs data processing, the first processor 203 cannot process data due to a fault, so the master clocks of the first processor 203 and the second processor 304 may be inconsistent, and the system may be affected. Therefore, at this time, it can be determined that the fault is a fault affecting the operation of the system, and master clock fault information is output to the external circuit.
When the ready signal in the output signal is at a low level, it can be proved that the second processor 304 does not perform data processing, and if it is determined that the second processor 304 does not perform data processing, it can be proved that the processing mode of the current node may be a mode without data processing, such as a sleep mode, that is, the current processing node may be a node corresponding to a state in which the electronic device performs a standby mode or the like. Therefore, the master clock of the first processor 203 in the current cycle fails, and data is not processed and system operation is not affected. The recorded master clock failure information may be cleared at this point.
605: the second comparing unit 303 determines whether the fault type corresponding to the first processor 203 is a non-negligible fault, if so, the process goes to 606, and the second comparing unit 303 sends an instruction for outputting the master clock fault information to the external circuit to the first comparing unit 202; if the determination result is negative, go to 607, and the second comparing unit 303 sends the instruction to clear the master clock fault information to the first comparing unit 202.
It can be understood that, when the second comparing unit 303 determines that the fault type corresponding to the first processor 203 is a non-negligible fault affecting the system operation, it determines that the master clock fault information needs to be output to the external circuit, so that the external circuit controls the dual-core lockstep system to restart. Therefore, an instruction to output the master clock failure information to the external circuit may be sent to the first comparing unit 202, so that the first comparing unit 202 may send the master clock failure information to the external circuit, and the security of the electronic device applying the dual core lockstep system is ensured.
When it is determined that the fault type corresponding to the first processor 203 is a negligible fault type that does not affect the system operation, it may be determined that the master clock fault information needs to be deleted, and an instruction to clear the master clock fault information may be sent to the first comparing unit 202, so that the first comparing unit 202 deletes the master clock fault information. This effectively reduces the number of times the external system has to handle DCLS system errors and saves system resources.
606: the second comparing unit 303 sends an instruction to output master clock failure information to the external circuit to the first comparing unit 202.
607: the second comparing unit 303 sends an instruction to clear the master clock fault information to the first comparing unit 202.
608: the first comparison unit 202 clears the master clock failure information.
609: the first comparison unit 202 outputs master clock fail information to an external circuit.
It can be understood that the first comparing unit 202 sends the master clock fault information to the external circuit, so that the external circuit can control the dual-core lockstep system to restart, and the security of the electronic device applying the dual-core lockstep system is ensured.
It can be understood that, in the embodiment of the present application, based on the above scheme, when it is determined that the master clock of the first processor has a fault, the type of the fault may be determined first, and when it is determined that the fault is a negligible fault that does not affect the system operation, the master clock fault information is not output, so that the DCLS system is not restarted due to the fault that does not affect the system operation, the number of times of processing errors of the DCLS system by an external system is effectively reduced, and system resources are saved.
That is, the scheme provided in the embodiment of the present application can tolerate those master clock faults that do not affect the operation of the system.
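The flow of steps 601 to 609 can be summarized in a short behavioural model. This is a sketch under stated assumptions, not the hardware of the application: the function name `handle_master_clock_fault`, the dictionary `ready_by_cycle`, and the returned field names are hypothetical, and the three-cycle delay follows the example with two input delay registers and one output delay register.

```python
def handle_master_clock_fault(fault_cycle, ready_by_cycle, delay_cycles=3):
    """Trace steps 601-609 for one master clock fault (illustrative model only).

    ready_by_cycle maps a cycle number to the ready level (True = high) seen by
    the second comparing unit in that cycle; missing cycles are treated as low.
    """
    steps = [601, 602]            # detect the fault, notify both comparing units
    fault_info_recorded = True    # fault information is recorded on detection

    # 603: the ready signal for the same input arrives delay_cycles later.
    observe_cycle = fault_cycle + delay_cycles
    ready_high = ready_by_cycle.get(observe_cycle, False)
    steps += [603, 604, 605]      # read ready, determine fault type, branch

    if ready_high:
        steps += [606, 609]       # instruct, then output fault info externally
        reported = True
    else:
        steps += [607, 608]       # instruct, then clear fault info
        fault_info_recorded = False
        reported = False

    return {
        "steps": steps,
        "observe_cycle": observe_cycle,
        "fault_info_recorded": fault_info_recorded,
        "reported": reported,
    }
```

For a fault in the tenth cycle, `handle_master_clock_fault(10, {13: True})` follows the branch through 606 and 609, while `handle_master_clock_fault(10, {})` follows the branch through 607 and 608 and leaves nothing for the external circuit to handle.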
The application provides an electronic device, which comprises the dual-core lockstep system described above and a main control circuit (namely, a control circuit outside the dual-core lockstep system).
The application provides an electronic device, including: a memory for storing instructions to be executed by one or more processors of the electronic device, and a processor, which is one of the one or more processors of the electronic device and is used for executing the fault handling method described above.
The application provides a readable storage medium, wherein the readable storage medium stores instructions, and the instructions cause the electronic equipment to execute the fault processing method when executed on the electronic equipment.
The present application provides a computer program product, which includes an execution instruction, and when the execution instruction is executed on an electronic device, the execution instruction causes the electronic device to execute the above fault handling method.
Referring now to FIG. 7, shown is a block diagram of an electronic device 1400 in accordance with one embodiment of the present application. Fig. 7 schematically illustrates an example electronic device 1400 in accordance with various embodiments. In one embodiment, electronic device 1400 may include one or more processors 1404, system control logic 1408 coupled to at least one of processors 1404, system memory 1412 coupled to system control logic 1408, non-volatile memory (NVM) 1416 coupled to system control logic 1408, and a network interface 1420 coupled to system control logic 1408.
In some embodiments, the processor 1404 may employ the dual core lockstep system of embodiments of the present application.
In some embodiments, system control logic 1408 may include any suitable interface controllers to provide any suitable interface to at least one of processors 1404 and/or to any suitable device or component in communication with system control logic 1408.
In some embodiments, system control logic 1408 may include one or more memory controllers to provide an interface to system memory 1412. System memory 1412 may be used to load and store data and/or instructions. Memory 1412 of electronic device 1400 may include any suitable volatile memory, such as suitable Dynamic Random Access Memory (DRAM), in some embodiments.
NVM/memory 1416 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, the NVM/memory 1416 may include any suitable non-volatile memory such as flash memory and/or any suitable non-volatile storage device such as at least one of a HDD (Hard Disk Drive), CD (Compact Disc) Drive, DVD (Digital Versatile Disc) Drive.
The NVM/memory 1416 may comprise a portion of a storage resource on the device on which the electronic device 1400 is installed or it may be accessible by, but not necessarily a part of, the device. For example, the NVM/storage 1416 may be accessible over a network via the network interface 1420.
In particular, system memory 1412 and NVM/storage 1416 may each include: a temporary copy and a permanent copy of instructions 1424. Instructions 1424 may include: instructions that, when executed by at least one of the processors 1404, cause the electronic device 1400 to implement the fault handling methods of the embodiments of the present application. In some embodiments, instructions 1424, hardware, firmware, and/or software components thereof may additionally/alternatively be located in system control logic 1408, network interface 1420, and/or processor 1404.
The network interface 1420 may include a transceiver to provide a radio interface for the electronic device 1400 to communicate with any other suitable devices (e.g., front end modules, antennas, etc.) over one or more networks. In some embodiments, the network interface 1420 may be integrated with other components of the electronic device 1400. For example, network interface 1420 may be integrated with at least one of processor 1404, system memory 1412, NVM/storage 1416, and a firmware device (not shown) having instructions that, when executed by at least one of processors 1404, implement the fault handling method of the embodiments of the present application in electronic device 1400.
Network interface 1420 may further include any suitable hardware and/or firmware to provide a multiple-input multiple-output radio interface. For example, network interface 1420 may be a network adapter, a wireless network adapter, a telephone modem, and/or a wireless modem.
In one embodiment, at least one of the processors 1404 may be packaged together with logic for one or more controllers of system control logic 1408 to form a System In Package (SiP). In one embodiment, at least one of processors 1404 may be integrated on the same die with logic for one or more controllers of system control logic 1408 to form a system on a chip (SoC).
The electronic device 1400 may further include: input/output (I/O) devices 1432. The I/O device 1432 may include a user interface to enable a user to interact with the electronic device 1400; the design of the peripheral component interface enables peripheral components to also interact with the electronic device 1400. In some embodiments, the electronic device 1400 further includes sensors for determining at least one of environmental conditions and location information related to the electronic device 1400.
In some embodiments, the user interface may include, but is not limited to, a display (e.g., a liquid crystal display, a touch screen display, etc.), a speaker, a microphone, one or more cameras (e.g., still image cameras and/or video cameras), a flashlight (e.g., a light emitting diode flash), and a keyboard.
In some embodiments, the peripheral component interfaces may include, but are not limited to, a non-volatile memory port, an audio jack, and a power interface.
In some embodiments, the sensors may include, but are not limited to, a gyroscope sensor, an accelerometer, a proximity sensor, an ambient light sensor, and a positioning unit. The positioning unit may also be part of the network interface 1420 or interact with the network interface 1420 to communicate with components of a positioning network, such as Global Positioning System (GPS) satellites.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. Program code may also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or a tangible machine-readable memory used to transmit information over the internet via electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, may not be included or may be combined with other features.
It should be noted that, in the embodiments of the apparatuses in the present application, each unit/module is a logical unit/module, and physically, one logical unit/module may be one physical unit/module, or may be a part of one physical unit/module, and may also be implemented by a combination of multiple physical units/modules, where the physical implementation manner of the logical unit/module itself is not the most important, and the combination of the functions implemented by the logical unit/module is the key to solve the technical problem provided by the present application. Furthermore, in order to highlight the innovative part of the present application, the above-mentioned device embodiments of the present application do not introduce units/modules which are not so closely related to solve the technical problems presented in the present application, which does not indicate that no other units/modules exist in the above-mentioned device embodiments.
It is noted that, in the examples and descriptions of this patent, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the use of the verb "comprise a" to define an element does not exclude the presence of another, same element in a process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the application.

Claims (10)

1. A fault processing method is applied to electronic equipment, and is characterized in that the electronic equipment comprises a dual-core lockstep system and a main control circuit;
the dual core lockstep system includes a first processor and a second processor executing the same instruction,
the dual-core lockstep system detects that a main clock of the first processor has a fault at a first data processing node, and records main clock fault information;
the dual-core lockstep system judges the fault type of the first processor;
in response to the fault type of the first processor being a first type of fault, the dual-core lockstep system clears the main clock fault information, wherein the first type of fault is a fault which does not influence the operation of the dual-core lockstep system;
in response to the fault type of the first processor being a second type of fault, the dual-core lockstep system retains the main clock fault information and sends the main clock fault information to the main control circuit;
and after receiving the fault information of the main clock, the main control circuit controls the dual-core lockstep system to restart.
2. The method of claim 1, wherein determining the type of failure of the first processor comprises:
judging the working mode of the second processor at a second data processing node corresponding to the first data processing node;
determining the fault as the first type of fault in response to the working mode being a mode in which input data is not processed;
and determining the fault as the second type of fault in response to the working mode being a mode in which the input data is processed, wherein the second type of fault is a fault which influences the operation of the dual-core lockstep system.
3. The method of claim 2, wherein determining the operating mode of the second processor at the second data processing node corresponding to the first data processing node comprises:
acquiring a ready signal of the second processor in an output signal of a second data processing node corresponding to the first data processing node;
determining that the operating mode of the second processor at a second data processing node corresponding to the first data processing node is a mode for processing the input data when a ready signal in the output signal is at a high level;
and determining that the working mode of the second processor at a second data processing node corresponding to the first data processing node is a mode in which the input data is not processed under the condition that a ready signal in the output signal is at a low level.
4. The method of claim 1, wherein the first data processing node is separated from the second data processing node by a set period.
5. The method of claim 4, wherein the set period is two cycles.
6. A dual core lockstep system, comprising:
a first processor and a second processor for executing the same instructions;
a fault detection module for detecting that a main clock of the first processor has a fault at a first data processing node, and recording main clock fault information;
a second comparison unit for judging the fault type of the first processor;
a first comparing unit, configured to clear the master clock failure information if the type of failure of the first processor is a first type of failure, where the first type of failure is a failure that does not affect the operation of the dual core lockstep system,
the first comparing unit is configured to, when the type of the fault of the first processor is a second type of fault, retain the master clock fault information, and send the master clock fault information to a master control circuit.
7. The dual core lockstep system of claim 6, comprising an input delay register unit for delaying input data by a set period to the second processor.
8. An electronic device, comprising: a memory for storing instructions for execution by one or more processors of the electronic device, and the processor being one of the one or more processors of the electronic device for performing the fault handling method of any of claims 1 to 5.
9. A readable storage medium having stored thereon instructions that, when executed on an electronic device, cause the electronic device to perform the fault handling method of any of claims 1-5.
10. A computer program product comprising execution instructions that, when executed on an electronic device, cause the electronic device to perform the fault handling method of any of claims 1-5.
CN202210897131.9A 2022-07-28 2022-07-28 Fault processing method, dual-core lockstep system, electronic device and medium Pending CN115328668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210897131.9A CN115328668A (en) 2022-07-28 2022-07-28 Fault processing method, dual-core lockstep system, electronic device and medium

Publications (1)

Publication Number Publication Date
CN115328668A true CN115328668A (en) 2022-11-11

Family

ID=83919813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210897131.9A Pending CN115328668A (en) 2022-07-28 2022-07-28 Fault processing method, dual-core lockstep system, electronic device and medium

Country Status (1)

Country Link
CN (1) CN115328668A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116225810A (en) * 2023-05-04 2023-06-06 无锡国芯微高新技术有限公司 Periodic fault detection and repair framework and detection and repair method for dual-core lockstep
CN117389924A (en) * 2023-12-12 2024-01-12 苏州萨沙迈半导体有限公司 Dual-core lock step circuit and chip equipment
CN117389924B (en) * 2023-12-12 2024-03-01 苏州萨沙迈半导体有限公司 Dual-core lock step circuit and chip equipment

Similar Documents

Publication Publication Date Title
CN115328668A (en) Fault processing method, dual-core lockstep system, electronic device and medium
US7698594B2 (en) Reconfigurable processor and reconfiguration method executed by the reconfigurable processor
TWI553650B (en) Method, apparatus and system for handling data error events with a memory controller
US6026499A (en) Scheme for restarting processes at distributed checkpoints in client-server computer system
JP5203967B2 (en) Method and system usable in sensor networks to handle memory failures
US20080127112A1 (en) Software tracing
CN103761160A (en) Method and apparatus for detecting a fault condition and restoration thereafter using user context information
JP2006259869A (en) Multiprocessor system
CN110413432B (en) Information processing method, electronic equipment and storage medium
US20060236084A1 (en) Method and system for providing an auxiliary bios code in an auxiliary bios memory utilizing time expiry control
CN112000735A (en) Data processing method, device and system
CN113672306B (en) Server component self-checking abnormity recovery method, device, system and medium
US20060136641A1 (en) Context save method, information processor and interrupt generator
JP2018180982A (en) Information processing device and log recording method
CN111208949B (en) Method for determining data rollback time period in distributed storage system
CN113032021B (en) System switching and data processing method, device, equipment and storage medium thereof
CN114461479A (en) Method and device for debugging multimedia processing chip, storage medium and electronic equipment
CN108037942B (en) Adaptive data recovery and update method and device for embedded equipment
US20040078649A1 (en) Computer system
US8893132B2 (en) Information processing apparatus
CN113330411B (en) Storage controller and data relocation monitoring method
Lakshmi et al. Communication Induced Checkpointing based Fault Tolerance Mechanism–A Review and CIAC-FTM Framework in IoT Environment
KR102327192B1 (en) Semiconductor system including fault manager
CN117452455B (en) Method for designing text decoding module of navigation receiver for embedded test
CN108415788B (en) Data processing apparatus and method for responding to non-responsive processing circuitry

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination