CN109032822B - Method and device for storing crash information - Google Patents

Method and device for storing crash information


Publication number
CN109032822B
Authority
CN
China
Prior art keywords
watchdog
information
condition
cpu
under
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710432510.XA
Other languages
Chinese (zh)
Other versions
CN109032822A (en)
Inventor
刘佳妮
周武
王中辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201710432510.XA
Publication of CN109032822A
Application granted
Publication of CN109032822B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766Error or fault reporting or storing
    • G06F11/0787Storage of error reports, e.g. persistent data storage, storage using memory protection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/24Resetting means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751Error or fault detection not based on redundancy
    • G06F11/0754Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method and a device for storing crash information. The method comprises: when an abnormal restart of the system is determined, controlling a first watchdog to reset the CPU; after the CPU is reset, setting the first watchdog and a second watchdog to hardware feeding mode and saving the crash information to flash memory; and configuring the second watchdog as a software watchdog, so that all devices on the whole board are reset when the second watchdog times out. The invention addresses the prior-art technical problem that crash information cannot be saved effectively when the system crashes and restarts abnormally, which prevents subsequent fault analysis, and achieves the technical effect that crash information can be saved effectively even when the system crashes severely.

Description

Method and device for storing crash information
Technical Field
The present invention relates to the field of computers, and in particular, to a method and an apparatus for storing crash information.
Background
Crash information plays an important role in analyzing the cause of a failure. In general, the system can save crash information to flash when a crash occurs. Sometimes, however, the crash is so severe that the system cannot respond in time, and the crash information cannot be saved. In that case the crash information is lost, so neither failure analysis nor recovery analysis can be performed.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The invention provides a method and a device for storing crash information, to solve the prior-art technical problem that crash information cannot be saved effectively when the system crashes and restarts abnormally, so that the failure cannot be analyzed afterwards.
To solve the above technical problem, in one aspect, the present invention provides a method for storing crash information, including: when an abnormal restart of the system is determined, controlling a first watchdog to reset the CPU; after the CPU is reset, setting the first watchdog and a second watchdog to hardware feeding mode, and saving the crash information to flash memory; and configuring the second watchdog as a software watchdog, and resetting all devices on the whole board when the second watchdog times out.
Optionally, controlling the first watchdog to reset the CPU when an abnormal restart of the system is determined includes: when the system has crashed and is unresponsive, software stops feeding the first watchdog; and when feeding times out, the first watchdog resets the CPU and a first identifier is set, the first identifier indicating that the system restarted abnormally.
Optionally, setting the first watchdog and the second watchdog to hardware feeding mode after the CPU is reset, and saving the crash information to flash memory, includes: determining whether the first watchdog and the second watchdog were successfully set to hardware feeding mode and the crash information was saved to flash memory; if not, retrying the setting and the save, and recording the number of retries; and when the number of retries exceeds a preset threshold, abandoning the attempt to set the first watchdog and the second watchdog to hardware feeding mode.
Optionally, software feeding is performed by software, and hardware feeding is performed by a programmable logic device.
Optionally, the crash information includes at least one of: a memory image, and register information of one or more devices on the whole board.
Optionally, before the first watchdog is controlled to reset the CPU, the method further includes: while the system is running, software continuously updates the current stack pointer to a preset memory address. After all devices on the whole board are reset, the method further includes: viewing data information by examining the stack pointer recorded at the time of the crash.
In another aspect, the invention also provides a device for storing crash information, including: a control module, configured to control the first watchdog to reset the CPU when an abnormal restart of the system is determined; a saving module, configured to set the first watchdog and the second watchdog to hardware feeding mode after the CPU is reset, and to save the crash information to flash memory; and a reset module, configured to configure the second watchdog as a software watchdog and to reset all devices on the whole board when the second watchdog times out.
Optionally, the control module includes: a suspension unit, configured to stop software feeding of the first watchdog when the system has crashed and is unresponsive; and a control unit, configured so that, when feeding times out, the first watchdog resets the CPU and a first identifier is set, the first identifier indicating that the system restarted abnormally.
Optionally, the saving module includes: a determining unit, configured to determine whether the first watchdog and the second watchdog were successfully set to hardware feeding mode and the crash information was saved to flash memory; a retry unit, configured to retry the setting and the save if they did not succeed, and to record the number of retries; and a discarding unit, configured to abandon the attempt to set the first watchdog and the second watchdog to hardware feeding mode when the number of retries exceeds a preset threshold.
Optionally, the device for storing crash information further includes: an updating module, configured so that, before the first watchdog is controlled to reset the CPU, software continuously updates the current stack pointer to a preset memory address while the system is running; and a viewing module, configured to view data information, after all devices on the whole board are reset, by examining the stack pointer recorded at the time of the crash.
In another aspect, the present invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above method.
The invention has the following beneficial effects. Two watchdogs are provided. When an abnormal restart of the system is determined, the first watchdog resets only the CPU, so that the system can save the crash information; the second watchdog then resets the whole board, so that the system restarts. Crash information can therefore be saved effectively even when the system crashes severely. This solves the prior-art problem that crash information cannot be saved effectively when the system crashes and restarts abnormally, which prevents subsequent fault analysis.
Drawings
FIG. 1 is a flow chart of a method for storing crash information in an embodiment of the invention;
FIG. 2 is a block diagram of a device for storing crash information according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a system architecture according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for saving crash information according to an embodiment of the invention.
Detailed Description
To solve the prior-art technical problem that crash information cannot be saved effectively when the system crashes and restarts abnormally, so that faults cannot be analyzed afterwards, the invention provides a method and a device for storing crash information. The invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In this example, as shown in Fig. 1, a method for saving crash information is provided, which may include the following steps:
Step 101: when an abnormal restart of the system is determined, control the first watchdog to reset the CPU;
Step 102: after the CPU is reset, set the first watchdog and the second watchdog to hardware feeding mode, and save the crash information to flash memory;
Step 103: configure the second watchdog as a software watchdog, and reset all devices on the whole board when the second watchdog times out.
In the above example, two watchdogs are provided. When an abnormal restart of the system is determined, the first watchdog resets only the CPU, so that the system can save the crash information; the second watchdog then resets the whole board, so that the system restarts. Crash information can therefore be saved effectively even when the system crashes severely, which solves the prior-art problem that crash information cannot be saved effectively when the system crashes and restarts abnormally and that the failure therefore cannot be analyzed afterwards.
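The order of the three steps can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: the `Board` class, its field names, and the in-memory `flash` list are hypothetical stand-ins for real hardware and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Board:
    """Hypothetical board model; every name here is an illustrative assumption."""
    ram: dict = field(default_factory=dict)          # survives a CPU-only reset
    device_regs: dict = field(default_factory=dict)  # survives a CPU-only reset
    flash: list = field(default_factory=list)        # persistent storage
    wdt1_mode: str = "software"   # first watchdog: wired to the CPU reset line
    wdt2_mode: str = "hardware"   # second watchdog: wired to the whole-board reset line
    log: list = field(default_factory=list)

    def cpu_reset(self):
        # Step 101: only the CPU restarts; RAM and device registers are untouched.
        self.log.append("cpu_reset")

    def save_crash_info(self):
        # Step 102: both watchdogs are hardware-fed so neither fires during the save.
        self.wdt1_mode = self.wdt2_mode = "hardware"
        self.flash.append({"ram": dict(self.ram), "regs": dict(self.device_regs)})
        self.log.append("saved")

    def board_reset(self):
        # Step 103: the second watchdog becomes a software watchdog; nothing feeds
        # it, so it times out and resets every device on the board.
        self.wdt2_mode = "software"
        self.ram.clear()
        self.device_regs.clear()
        self.log.append("board_reset")


def handle_abnormal_restart(board: Board) -> None:
    board.cpu_reset()
    board.save_crash_info()
    board.board_reset()


board = Board(ram={"sp": 0x8000_1F40}, device_regs={"dsp_status": 0x3})
handle_abnormal_restart(board)
print(board.log)        # order in which the three steps ran
print(board.flash[0])   # crash information captured before the full reset
```

The point of the ordering is that the snapshot is taken while RAM and device registers still hold their crash-time contents; the whole-board reset happens only afterwards.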
A system goes through a process from crash to restart, during which several devices must cooperate. To record system information effectively and let the components cooperate, an identifier (flag) can be set. In one embodiment, controlling the first watchdog to reset the CPU when an abnormal restart of the system is determined may include: when the system has crashed and is unresponsive, software stops feeding the first watchdog; and when feeding times out, the first watchdog resets the CPU and a first identifier is set, the first identifier indicating that the system restarted abnormally.
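The timeout behaviour described above can be sketched as a tick-based model in Python. The `Watchdog` class, the three-tick timeout, and the `state` dictionary are assumptions made for illustration; a real watchdog chip counts time in hardware.

```python
class Watchdog:
    """Hypothetical tick-based watchdog timer (illustrative only)."""

    def __init__(self, timeout_ticks, on_timeout):
        self.timeout_ticks = timeout_ticks
        self.on_timeout = on_timeout
        self.counter = 0

    def feed(self):
        self.counter = 0  # feeding restarts the countdown

    def tick(self):
        self.counter += 1
        if self.counter >= self.timeout_ticks:
            self.counter = 0
            self.on_timeout()


state = {"crash_flag": 0, "cpu_resets": 0}


def reset_cpu():
    state["cpu_resets"] += 1
    state["crash_flag"] = 1   # first identifier: the restart was abnormal


wdt1 = Watchdog(timeout_ticks=3, on_timeout=reset_cpu)

# Healthy system: software feeds on every tick, so no timeout occurs.
for _ in range(10):
    wdt1.feed()
    wdt1.tick()
healthy_resets = state["cpu_resets"]

# Hung system: software stops feeding, so the watchdog expires.
for _ in range(3):
    wdt1.tick()

print(healthy_resets, state)
```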
Specifically, the programmable logic device may provide two registers: a crash_flag register, which records whether an abnormal restart has occurred, and a dump_retry register, which records the number of attempts to enter dump mode. When an abnormal restart is determined, an identifier is recorded in the crash_flag register to indicate that the system restarted abnormally. For example, the initial value of the crash_flag register may be 0, where 0 indicates a normal restart of the system and 1 indicates an abnormal restart.
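As a minimal sketch, the two registers can be modelled as a small Python data class. The register names `crash_flag` and `dump_retry` come from the text; the class name, the constants, and the helper methods are hypothetical.

```python
from dataclasses import dataclass

NORMAL_RESTART = 0
ABNORMAL_RESTART = 1


@dataclass
class PldRegisters:
    """Hypothetical model of the two registers exposed by the programmable logic device."""
    crash_flag: int = NORMAL_RESTART  # 0 = normal restart, 1 = abnormal restart
    dump_retry: int = 0               # attempts made so far to enter dump mode

    def mark_abnormal(self):
        self.crash_flag = ABNORMAL_RESTART

    def was_abnormal(self):
        return self.crash_flag == ABNORMAL_RESTART


regs = PldRegisters()
initial = (regs.crash_flag, regs.dump_retry)
regs.mark_abnormal()
print(initial, regs)
```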
To save crash information effectively, a first watchdog chip and a second watchdog chip can be provided and connected through a programmable logic device, with the first watchdog chip connected to the CPU reset signal and the second watchdog chip connected to the whole-board reset signal. If the first watchdog chip triggers a reset, only the CPU restarts, and the other devices keep their pre-reset state.
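The effect of this wiring can be shown with a short Python model. The `Device` class, the `RESET_LINES` table, and the choice of DDR and DSP as example peripherals are assumptions for illustration only.

```python
class Device:
    """Hypothetical device holding some state that a reset clears."""

    def __init__(self, name):
        self.name = name
        self.state = {}

    def reset(self):
        self.state.clear()


cpu = Device("cpu")
ddr = Device("ddr")
dsp = Device("dsp")

# Which devices each watchdog chip is able to reset.
RESET_LINES = {
    "wdt1": [cpu],             # first watchdog -> CPU reset signal only
    "wdt2": [cpu, ddr, dsp],   # second watchdog -> whole-board reset signal
}


def fire(watchdog):
    """Assert the reset line driven by the given watchdog."""
    for device in RESET_LINES[watchdog]:
        device.reset()


cpu.state["pc"] = 0x1000
ddr.state["heap"] = "crash-time contents"
dsp.state["status"] = 0x3

fire("wdt1")
after_cpu_reset = (dict(cpu.state), dict(ddr.state), dict(dsp.state))
fire("wdt2")
after_board_reset = (dict(cpu.state), dict(ddr.state), dict(dsp.state))
print(after_cpu_reset)
print(after_board_reset)
```

After the first watchdog fires, the memory and peripheral state is still available to be saved; only the second watchdog clears it.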
After an abnormal restart, the system should enter a state or mode in which crash information is saved. If the system kept trying to enter this mode indefinitely, however, it could become stuck. A retry limit can therefore be set; once the limit is exceeded, the system abandons the attempt so that it can continue to execute in an orderly way. In one embodiment, setting the first watchdog and the second watchdog to hardware feeding mode after the CPU is reset, and saving the crash information to flash memory, may include: determining whether the first watchdog and the second watchdog were successfully set to hardware feeding mode and the crash information was saved to flash memory; if not, retrying the setting and the save, and recording the number of retries; and when the number of retries exceeds a preset threshold, abandoning the attempt to set the first watchdog and the second watchdog to hardware feeding mode. That is, the system determines whether the exception handling mode was entered successfully, and abandons that mode if several attempts fail.
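The retry-and-abandon rule can be sketched as a loop in Python. This is a simplification: on real hardware each retry would follow another CPU reset rather than a loop iteration. The function name, the `enter_dump_mode` callable, and the threshold of 3 are assumptions; the text above only requires "a preset threshold".

```python
MAX_DUMP_RETRIES = 3  # assumed value for the preset threshold


def try_save_with_retries(enter_dump_mode, max_retries=MAX_DUMP_RETRIES):
    """Return (saved, retries).

    `enter_dump_mode` is a hypothetical callable that returns True when the
    watchdogs were set to hardware feeding mode and the save succeeded.
    """
    retries = 0
    while True:
        if enter_dump_mode():
            return True, retries
        retries += 1                 # record the retry count
        if retries > max_retries:    # threshold exceeded: abandon the attempt
            return False, retries


outcomes = iter([False, False, True])            # succeeds on the third attempt
result_ok = try_save_with_retries(lambda: next(outcomes))
result_fail = try_save_with_retries(lambda: False)  # never succeeds
print(result_ok, result_fail)
```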
For watchdog feeding, software feeding is performed by software and hardware feeding is performed by the programmable logic device, so a feeding mode can be selected as needed.
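The difference between the two feeding modes matters most when software hangs. The following Python sketch illustrates it; the `FeedMode` enum, the `Watchdog` class, and the three-tick timeout are hypothetical.

```python
from enum import Enum


class FeedMode(Enum):
    SOFTWARE = "software"   # fed by software running on the CPU
    HARDWARE = "hardware"   # fed by the programmable logic device


class Watchdog:
    """Hypothetical tick-based watchdog whose feeder depends on its mode."""

    def __init__(self, mode, timeout_ticks=3):
        self.mode = mode
        self.timeout_ticks = timeout_ticks
        self.missed = 0
        self.expired = False

    def tick(self, software_alive):
        # The logic device feeds unconditionally; software feeds only while alive.
        fed = self.mode is FeedMode.HARDWARE or software_alive
        self.missed = 0 if fed else self.missed + 1
        if self.missed >= self.timeout_ticks:
            self.expired = True


hw_dog = Watchdog(FeedMode.HARDWARE)
sw_dog = Watchdog(FeedMode.SOFTWARE)
for _ in range(5):              # software is hung for five ticks
    hw_dog.tick(software_alive=False)
    sw_dog.tick(software_alive=False)
print(hw_dog.expired, sw_dog.expired)
```

A hardware-fed watchdog therefore stays quiet while crash information is being saved, whereas a software-fed watchdog expires once software stops responding.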
The crash information described above may include, but is not limited to, at least one of: a memory image, and register information of one or more devices on the whole board.
To allow the data to be read and viewed, a stack pointer can be recorded. Specifically, before the first watchdog is controlled to reset the CPU, software continuously updates the current stack pointer to a preset memory address while the system is running. After all devices on the whole board are reset, data information can then be viewed by examining the stack pointer recorded at the time of the crash. For example, stack backtracking and inspection of variables, code segments, and data can be performed from that stack pointer. By examining the register information of key devices, the state of the faulty module can be analyzed.
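The idea of publishing the stack pointer to a fixed location and using it later for a backtrace can be sketched in Python. Here the "stack" is a list of frames and the "preset memory address" is a dictionary key; both, along with the function names, are assumptions for illustration.

```python
SP_SLOT = "saved_sp"          # stands in for the preset memory address (assumption)
ram = {"stack": [], SP_SLOT: 0}


def push_frame(function_name, local_vars):
    """Simulate a function call: push a frame, then publish the new stack pointer."""
    ram["stack"].append({"fn": function_name, "locals": local_vars})
    ram[SP_SLOT] = len(ram["stack"])   # stack pointer = index just past the top frame


def pop_frame():
    """Simulate a function return and publish the updated stack pointer."""
    ram["stack"].pop()
    ram[SP_SLOT] = len(ram["stack"])


def backtrace(preserved_ram):
    """Post-mortem: walk frames from the saved stack pointer down to the base."""
    sp = preserved_ram[SP_SLOT]
    return [frame["fn"] for frame in reversed(preserved_ram["stack"][:sp])]


push_frame("main", {})
push_frame("handle_packet", {"length": 64})
push_frame("parse_header", {"offset": 12})
pop_frame()
push_frame("checksum", {"acc": 0})
# The system hangs here; RAM is preserved by the CPU-only reset.
trace = backtrace(ram)
print(trace)
```

Because the saved pointer is updated on every call and return, the memory image captured after the crash always identifies the frames that were live when the system stopped responding.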
The method for storing crash information provided by this embodiment can be applied in the embedded field and can be implemented through hardware design and programmable logic devices.
Based on the same inventive concept, an embodiment of the invention also provides a device for storing crash information, as described in the following embodiments. Because the device solves the problem on a principle similar to that of the method, the implementation of the device can refer to the implementation of the method, and repeated details are omitted. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements the intended function. While the device described in the following embodiments is preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated. Fig. 2 is a block diagram of a device for storing crash information according to an embodiment of the present invention. As shown in Fig. 2, the device may include a control module 201, a saving module 202, and a reset module 203, which are described below.
The control module 201 is configured to control the first watchdog to reset the CPU when an abnormal restart of the system is determined;
the saving module 202 is configured to set the first watchdog and the second watchdog to hardware feeding mode after the CPU is reset, and to save the crash information to flash memory;
and the reset module 203 is configured to configure the second watchdog as a software watchdog, and to reset all devices on the whole board when the second watchdog times out.
In one embodiment, the control module 201 may include: a suspension unit, configured to stop software feeding of the first watchdog when the system has crashed and is unresponsive; and a control unit, configured so that, when feeding times out, the first watchdog resets the CPU and a first identifier is set, the first identifier indicating that the system restarted abnormally.
In one embodiment, the saving module 202 may include: a determining unit, configured to determine whether the first watchdog and the second watchdog were successfully set to hardware feeding mode and the crash information was saved to flash memory; a retry unit, configured to retry the setting and the save if they did not succeed, and to record the number of retries; and a discarding unit, configured to abandon the attempt to set the first watchdog and the second watchdog to hardware feeding mode when the number of retries exceeds a preset threshold.
In one embodiment, the device for storing crash information may further include: an updating module, configured so that, before the first watchdog is controlled to reset the CPU, software continuously updates the current stack pointer to a preset memory address while the system is running; and a viewing module, configured to view data information, after all devices on the whole board are reset, by examining the stack pointer recorded at the time of the crash.
In this example, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the following steps:
S1: when an abnormal restart of the system is determined, control the first watchdog to reset the CPU;
S2: after the CPU is reset, set the first watchdog and the second watchdog to hardware feeding mode, and save the crash information to flash memory;
S3: configure the second watchdog as a software watchdog, and reset all devices on the whole board when the second watchdog times out.
That is, two watchdogs are provided. When an abnormal restart of the system is determined, the first watchdog resets only the CPU, so that the system can save the crash information; the second watchdog then resets the whole board, so that the system restarts. Crash information can therefore be saved effectively even when the system crashes severely, which solves the prior-art problem that crash information cannot be saved effectively when the system crashes and restarts abnormally and that the failure therefore cannot be analyzed afterwards.
The method and device for storing crash information are described below with reference to a specific embodiment. It should be noted that this specific embodiment is intended only to better explain the present application and does not unduly limit it.
Existing crash handling cannot record crash information when the CPU becomes completely unresponsive. To solve this problem, in this embodiment the system is recovered after it has completely stopped responding and the crash information is then recorded, so that crash information can be recorded even when a severe crash occurs.
During normal operation, the current stack pointer is continuously updated into memory. When the system needs to be restarted, it enters an exception handling mode (DUMP mode for short) and saves the crash information in that mode. The crash information can include a complete memory image, register state information of key devices, and so on.
Based on this concept, this example provides a method for saving crash information, so that when a severe crash occurs the crash information can be saved for fault analysis without affecting the normal restart of the system. The method can include the following steps:
S1: The programmable logic device connects two watchdog chips: a first watchdog chip connected to the CPU reset signal, and a second watchdog chip connected to the whole-board reset signal (i.e., the reset signal covering the CPU and all other devices). If the first watchdog chip triggers a reset, only the CPU restarts, and the other devices (e.g., DDR, DSP) remain in their pre-reset state.
S2: The configuration of the watchdog chips is controlled by the programmable logic device. The programmable logic device provides two registers: a crash_flag register, which records whether an abnormal restart has occurred, and a dump_retry register, which records the number of attempts to enter dump mode.
The programmable logic device is configured with the following initial values: the first watchdog chip and the second watchdog chip are configured as hardware watchdogs, fed by the programmable logic device; the initial value of the crash_flag register is 0; and the initial value of the dump_retry register is 0. For the crash_flag register, 0 indicates a normal restart of the system and 1 indicates an abnormal restart.
S3: In the boot stage, before the code is copied to memory, the crash_flag register is read to determine whether an abnormal restart occurred. If crash_flag=1, the last restart was abnormal; the system enters the exception handling mode (i.e., DUMP mode) and proceeds to step S4. If crash_flag=0, the last restart was normal; the system continues normal operation and proceeds to step S5. Making this check before the code is copied to memory prevents the boot from modifying the memory contents; that is, if the last restart was abnormal, the memory contents are still the same as when the exception occurred.
S4: Attempt to enter DUMP mode:
1) If DUMP mode is entered successfully, both the first and second watchdog chips are configured as hardware watchdogs, dump_retry=0 is set, and the crash information (e.g., the complete memory image and the register state of key devices) is saved to flash. After the save finishes, the programmable logic device register crash_flag is set to 0 and the second watchdog chip is configured as a software watchdog; all devices on the whole board are then reset, and the flow re-enters step S3. At this point the saved crash information (i.e., the complete memory image, etc.) is the same as when the exception occurred.
2) If the DUMP mode is failed, the programmable logic device is not restarted at the moment, the first watchdog chip is a software watchdog, and the second watchdog chip is a hardware watchdog. Therefore, after the restart time is fixed (different devices are selected, the restart time is different), the CPU restarts again, and the step S3 is restarted, where crash_flag=1 indicates that the retry is to enter the DUMP mode. The number of retries is recorded to dump_retry. When the number of retries is greater than 3, the programmable logic device register crash_flag=0 and dump_retry=0 may be configured, indicating that the system foregoes entering DUMP mode.
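The boot-stage decision and the DUMP retry limit in steps S3 and S4 can be sketched as a single function that runs once per boot. This is an illustrative Python model under stated assumptions, not the patented implementation: the function and key names are hypothetical, the counter is incremented before each attempt (on the assumption that a failed attempt may hang before it can record anything), and giving up is assumed to fall through to a normal boot.

```python
MAX_DUMP_RETRIES = 3  # retry limit given in the text


def boot_stage(regs, try_enter_dump, save_crash_info):
    """Run one pass of the boot stage and return what happens next.

    regs            dict modeling the logic-device registers (hypothetical)
    try_enter_dump  callable returning True if DUMP mode was entered
    save_crash_info callable that writes the crash information to flash
    """
    if regs["crash_flag"] == 0:
        return "normal_boot"                 # step S5
    if regs["dump_retry"] > MAX_DUMP_RETRIES:
        regs["crash_flag"] = 0               # give up entering DUMP mode
        regs["dump_retry"] = 0
        return "normal_boot"                 # assumption: continue as a normal boot
    regs["dump_retry"] += 1                  # record the attempt first
    if not try_enter_dump():
        return "cpu_reset"                   # watchdog 1 times out, boot runs again
    regs["wdt1"] = "hardware"                # both watchdogs hardware-fed while dumping
    regs["wdt2"] = "hardware"
    regs["dump_retry"] = 0
    save_crash_info()
    regs["crash_flag"] = 0                   # next boot takes the normal path
    return "board_reset"
```

Under this reading, a system whose DUMP mode never succeeds is reset by the first watchdog four times (one initial attempt plus three retries) before it boots normally.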
S5: The boot continues to run normally, and crash_flag is set to 1 during the boot stage. Once the system reaches a stage in which software can feed the watchdog (generally the kernel stage), the two watchdog chips are configured: the first watchdog chip is software-fed, and the second watchdog chip is hardware-fed by the programmable logic device. Whenever the currently executing function is pushed onto the stack, the stack pointer is updated and stored at a fixed memory address.
S6: The following three restart conditions may be encountered during operation:
1) Restart after a crash: software stops feeding the first watchdog chip. After a fixed restart time (which differs depending on the device selected), the CPU is reset, while the devices other than the CPU are not restarted and remain in their pre-restart state; in this case, it is mainly the memory content and the register information of critical devices that are preserved. The process returns to step S3, and since crash_flag=1 at this time, DUMP mode is eventually entered to save the crash information.
2) Manual restart: this can be regarded as a normal restart. Software sets crash_flag=0, all devices of the whole board are reset, and the process returns to step S3.
3) Restart after power failure: this can be regarded as a normal restart. All devices of the whole board are powered down and restarted, and the process returns to step S3.
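The three restart conditions differ only in which reset is applied and in the value of crash_flag seen at the next boot. The Python sketch below summarizes that mapping; the function names and string labels are hypothetical, and the power-failure case assumes that the logic device comes back up with its initial value crash_flag=0, which the text implies but does not state outright.

```python
def next_boot_state(cause: str, crash_flag: int) -> dict:
    """Map a restart cause to the reset scope and the crash_flag seen at next boot."""
    if cause == "crash":
        # Only the CPU is reset; the logic device keeps crash_flag as it was.
        return {"reset": "cpu_only", "crash_flag": crash_flag}
    if cause == "manual":
        # Software clears the flag before the whole-board reset.
        return {"reset": "whole_board", "crash_flag": 0}
    if cause == "power_failure":
        # Assumption: the logic device restarts with its initial value of 0.
        return {"reset": "whole_board", "crash_flag": 0}
    raise ValueError(f"unknown restart cause: {cause}")


def boot_path(state: dict) -> str:
    """Step chosen in the boot stage (S3) for a given state."""
    return "S4" if state["crash_flag"] == 1 else "S5"
```

Because crash_flag is set to 1 during every normal boot, any CPU-only reset that software did not announce in advance is treated as a crash.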
With this method for storing crash information, the crash information can be saved effectively even when the system has crashed and is unresponsive.
The following is a specific example.
In this example, as shown in Fig. 3, the hardware mainly consists of, in addition to the main body of the system, a programmable logic device (which may be a device such as a CPLD or an FPGA, hereinafter simply referred to as the logic device) and two watchdog chips (hereinafter simply referred to as watchdog 1 and watchdog 2).
The logic device outputs two watchdog input signals (MDI_1 and MDI_2), which are fed to the watchdog input (MDI) pins of the two watchdog chips respectively. The RESET output pin of watchdog 1 is connected to the CPU reset signal (CPU RESET) of the system, and the RESET output pin of watchdog 2 is connected to the whole-board reset signal (System RESET) of the system.
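To make the effect of this wiring concrete, the following Python sketch simulates the two watchdogs with a simple tick counter. It is a toy model under stated assumptions: the `Watchdog` class, the three-tick timeout, and the `first_reset` helper are all hypothetical, and real watchdog chips have device-specific timing.

```python
class Watchdog:
    """Toy watchdog: asserts its reset line if not fed within `timeout` ticks."""

    def __init__(self, timeout: int, reset_line: str):
        self.timeout = timeout
        self.reset_line = reset_line
        self.elapsed = 0

    def feed(self) -> None:
        self.elapsed = 0

    def tick(self):
        """Advance one tick; return the reset line name on timeout, else None."""
        self.elapsed += 1
        if self.elapsed >= self.timeout:
            self.elapsed = 0
            return self.reset_line
        return None


def first_reset(ticks: int, software_alive_until: int):
    """Return (tick, reset line) of the first reset, or None if none occurs."""
    wdt1 = Watchdog(timeout=3, reset_line="CPU RESET")     # watchdog 1
    wdt2 = Watchdog(timeout=3, reset_line="System RESET")  # watchdog 2
    for t in range(ticks):
        if t < software_alive_until:
            wdt1.feed()   # software-driven feeding on MDI_1
        wdt2.feed()       # logic device keeps feeding on MDI_2
        for wdt in (wdt1, wdt2):
            line = wdt.tick()
            if line is not None:
                return (t, line)
    return None
```

In this model, once software stops feeding, only the CPU reset line is asserted, since the logic device keeps feeding watchdog 2; the rest of the board, including memory, is left untouched.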
In this example, taking as an example the case in which a crash occurs and the system does not respond, the method for saving crash information comprises the following steps, as shown in Fig. 4:
Step S1: While the system is running, software continuously writes the current stack pointer to a fixed memory address. When the system crashes and becomes unresponsive, software stops feeding the watchdog, and after the feeding timeout, watchdog 1 resets the CPU. At this time crash_flag=1.
Step S2: After the CPU is reset, the boot stage is entered and crash_flag is read before the code is relocated to memory. Because crash_flag=1, the system is known to have restarted abnormally, and the exception handling mode (DUMP mode) is entered.
Step S3: On entering DUMP mode, watchdog 1 and watchdog 2 are configured for hardware feeding, dump_retry is set to 0, and the complete memory image and the register information of critical devices (DSP, etc.) are saved to flash. After the save operation completes, crash_flag=0 is written to the logic device register and watchdog 2 is configured as a software-fed watchdog; after that watchdog times out, all devices of the whole board are reset.
Step S4: After the whole board is reset, the system restarts and enters the boot stage, and crash_flag is read before the code is relocated to memory. Here crash_flag=0, so the system enters the normal startup stage.
Step S5: After the system has started normally, the saved crash information can be viewed by reading the flash. By examining the stack pointer at the time of the crash, stack backtracing and inspection of variables, code segment information, data information, and so on can be performed; by examining the register information of the critical devices, the state of the faulty module can be analyzed.
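As a hedged illustration of the post-mortem step, the Python sketch below reads the saved stack pointer out of a memory image and returns the first few words of the crashed stack. Everything specific here is an assumption made for the example: a 32-bit little-endian CPU, an image that starts at the RAM base address, and the hypothetical constant `SP_SLOT_OFFSET` for the fixed address that holds the stack pointer.

```python
import struct

SP_SLOT_OFFSET = 0x40  # hypothetical offset of the fixed stack-pointer slot


def read_saved_stack_pointer(image: bytes, slot_offset: int = SP_SLOT_OFFSET) -> int:
    """Read the 32-bit little-endian stack pointer stored at the fixed slot."""
    (sp,) = struct.unpack_from("<I", image, slot_offset)
    return sp


def stack_words(image: bytes, ram_base: int, sp: int, count: int) -> list:
    """Return `count` 32-bit words starting at the crashed stack pointer."""
    start = sp - ram_base  # translate the address into an offset in the image
    return list(struct.unpack_from(f"<{count}I", image, start))
```

A real backtrace would then match return addresses found in these words against the symbol table of the running firmware, which is outside the scope of this sketch.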
In the above example, a combination of hardware design and software solves the problem in existing systems that crash information is lost, because it cannot be saved, when the system crashes and is completely unresponsive. The scheme does not affect the normal operation of the system and is relatively simple to implement.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the invention described above may be implemented with a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that given here. They may also be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The above describes only preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and variations to the embodiments. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (9)

1. A method for storing crash information, comprising:
in the case where it is determined, before the code is relocated to memory, that the system has restarted abnormally, controlling only the first watchdog, out of a first watchdog and a second watchdog, to reset the CPU, wherein the first watchdog is connected to a CPU reset signal and the second watchdog is connected to a whole-board reset signal;
after the CPU is reset, setting the first watchdog and the second watchdog to a hardware feeding mode, and saving crash information to a flash memory;
configuring the second watchdog as a software-fed watchdog, and resetting all devices of the whole board when feeding of the second watchdog times out;
wherein, in the case where it is determined, before the code is relocated to memory, that the system has restarted abnormally, controlling only the first watchdog, out of the first watchdog and the second watchdog, to reset the CPU comprises:
when the system crashes and does not respond, software stops feeding the first watchdog;
when the feeding times out, the first watchdog resets the CPU and a first identifier is set, wherein the first identifier indicates that the system has restarted abnormally.
2. The method of claim 1, wherein setting the first watchdog and the second watchdog to the hardware feeding mode after the CPU is reset and saving the crash information to the flash memory comprises:
determining whether setting the first watchdog and the second watchdog to the hardware feeding mode and saving the crash information to the flash memory succeeded;
if unsuccessful, retrying setting the first watchdog and the second watchdog to the hardware feeding mode and saving the crash information to the flash memory, and recording the number of retries;
when the number of retries exceeds a preset threshold, giving up setting the first watchdog and the second watchdog to the hardware feeding mode.
3. The method of claim 1, wherein software feeding is performed by software and hardware feeding is performed by a programmable logic device.
4. The method of claim 1, wherein the crash information comprises at least one of: memory image information, and register information of one or more devices on the whole board.
5. The method of claim 1, wherein, before the first watchdog resets the CPU in the case where the system is determined to have restarted abnormally, the method further comprises:
while the system is running, software continuously writes the current stack pointer to a preset memory address;
and after resetting all devices of the whole board, the method further comprises:
inspecting data information by examining the stack pointer at the time of the crash.
6. A device for storing crash information, comprising:
a control module, configured to control only the first watchdog, out of a first watchdog and a second watchdog, to reset the CPU in the case where it is determined, before the code is relocated to memory, that the system has restarted abnormally, wherein the first watchdog is connected to a CPU reset signal and the second watchdog is connected to a whole-board reset signal;
a storage module, configured to set the first watchdog and the second watchdog to a hardware feeding mode after the CPU is reset, and to save crash information to a flash memory;
a reset module, configured to configure the second watchdog as a software-fed watchdog, and to reset all devices of the whole board when feeding of the second watchdog times out;
wherein the control module comprises:
a suspension unit, configured to stop the software feeding of the first watchdog when the system crashes and does not respond;
a control unit, configured to have the first watchdog reset the CPU when the feeding times out, and to set a first identifier, wherein the first identifier indicates that the system has restarted abnormally.
7. The device of claim 6, wherein the storage module comprises:
a determining unit, configured to determine whether setting the first watchdog and the second watchdog to the hardware feeding mode and saving the crash information to the flash memory succeeded;
a retry unit, configured to, if unsuccessful, retry setting the first watchdog and the second watchdog to the hardware feeding mode and saving the crash information to the flash memory, and to record the number of retries;
a discarding unit, configured to give up setting the first watchdog and the second watchdog to the hardware feeding mode when the number of retries exceeds a preset threshold.
8. The device of claim 6, further comprising:
an updating module, configured so that, while the system is running and before the first watchdog resets the CPU in the case where the system is determined to have restarted abnormally, software continuously writes the current stack pointer to a preset memory address;
a checking module, configured to inspect data information, after all devices of the whole board have been reset, by examining the stack pointer at the time of the crash.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
CN201710432510.XA 2017-06-09 2017-06-09 Method and device for storing crash information Active CN109032822B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710432510.XA CN109032822B (en) 2017-06-09 2017-06-09 Method and device for storing crash information

Publications (2)

Publication Number Publication Date
CN109032822A CN109032822A (en) 2018-12-18
CN109032822B true CN109032822B (en) 2024-01-09

Family

ID=64628786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710432510.XA Active CN109032822B (en) 2017-06-09 2017-06-09 Method and device for storing crash information

Country Status (1)

Country Link
CN (1) CN109032822B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109739675A (en) * 2018-12-24 2019-05-10 深圳航天东方红海特卫星有限公司 A method of program exception is captured using hardware watchdog
CN109783267A (en) * 2019-01-17 2019-05-21 广东小天才科技有限公司 A kind of method and system solving downloading mode exception
CN109828858A (en) * 2019-01-17 2019-05-31 广东小天才科技有限公司 A kind of method and system for preventing system boot stuck
CN113010336A (en) * 2019-12-20 2021-06-22 珠海全志科技股份有限公司 Application processor crash field debugging method and application processor
CN112068980B (en) * 2020-09-18 2023-06-23 展讯通信(上海)有限公司 Method and device for sampling information before CPU suspension, equipment and storage medium
CN114741233A (en) * 2020-12-23 2022-07-12 华为技术有限公司 Quick start method
CN113535448B (en) * 2021-06-30 2024-04-26 浙江中控技术股份有限公司 Multiple watchdog control method and control system thereof
CN113946148B (en) * 2021-09-29 2023-11-10 浙江零跑科技股份有限公司 MCU chip awakening system based on multi-ECU cooperative control
CN114911642B (en) * 2022-04-27 2024-04-19 北京计算机技术及应用研究所 Firmware restarting method based on UEFI event mechanism and watchdog
CN115061752A (en) * 2022-06-28 2022-09-16 展讯通信(上海)有限公司 Terminal equipment restarting method and device
CN115904793B (en) * 2023-03-02 2023-05-23 上海励驰半导体有限公司 Memory transfer method, system and chip based on multi-core heterogeneous system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1400529A (en) * 2001-07-30 2003-03-05 华为技术有限公司 Fault location method of real-time embedding system
CN101369237A (en) * 2007-08-14 2009-02-18 中兴通讯股份有限公司 Watchdog reset circuit and reset method
CN102521098A (en) * 2011-11-23 2012-06-27 中兴通讯股份有限公司 Processing method and processing device for monitoring dead halt of CPU (Central Processing Unit)
US9274894B1 (en) * 2013-12-09 2016-03-01 Twitter, Inc. System and method for providing a watchdog timer to enable collection of crash data
CN106326055A (en) * 2016-08-29 2017-01-11 四川九洲空管科技有限责任公司 Method for software and hardware crashing detection and resetting of airborne collision avoidance system

Also Published As

Publication number Publication date
CN109032822A (en) 2018-12-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant