CN115373997A - Board firmware anomaly monitoring and core data export method for a multi-core SoC

Info

Publication number
CN115373997A
Authority
CN
China
Prior art keywords
core
host
core data
soc
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211031997.8A
Other languages
Chinese (zh)
Inventor
王思瑶
王磊
孙明刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Original Assignee
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Application filed by Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority to CN202211031997.8A
Publication of CN115373997A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/3636 Software debugging by tracing the execution of the program
    • G06F11/364 Software debugging by tracing the execution of the program tracing values on a bus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4063 Device-to-bus coupling
    • G06F13/4068 Electrical coupling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026 PCI express

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method for monitoring board firmware anomalies and exporting core data on a multi-core SoC, comprising the following steps: monitoring the board firmware for anomalies in real time and, when an anomaly occurs, sending an anomaly notification and alarm to the host; the host performing a preprocessing operation and then returning a response, and each anomaly monitoring module storing its own core data in a specific DDR region; and, after the system is reset, sending the host a request to pull the core data, the host responding with a data request, and the core data being pulled and exported for analysis. Working with a hardware watchdog circuit, the multiple subsystems on the SoC feed the watchdog jointly, so that anomalies in any subsystem are monitored and reported and a board firmware anomaly does not cause an anomaly on the host. The core data of each submodule is stored in a specific DDR region, and after the system is reset the host can pull and export the core data for analysis.

Description

Board firmware anomaly monitoring and core data export method for a multi-core SoC
Technical Field
The invention relates to the technical field of chips, and in particular to a method for monitoring board firmware anomalies and exporting core data on a multi-core SoC.
Background
With the research and application of system-on-chip (SoC) technology, the field of microelectronics and its applications is undergoing an unprecedented transformation. In SoC design, the designer no longer works with individual circuit chips but with a library of IP blocks that implement the required functions. IP-core-based SoC design shifts the design method from traditional circuit-level design to system-level design. The multi-core SoC is a typical form of current SoC systems. Multiple cores also imply multiple subsystems, and an anomaly in any one subsystem can cause the whole SoC to fail. How to monitor such anomalies and preserve the core data of the failure scene is therefore an important function in tracing and debugging a multi-core SoC.
Existing anomaly detection mechanisms take three forms: capturing and analyzing signals by hardware circuit sampling during the debugging stage; printing logs at exception branches for later analysis; and storing the core data generated by an anomaly in an on-chip storage medium and triggering its export by manually inserting a dedicated storage device. The first approach is not very practical for analyzing problems in products already in service. The second can locate some problems through log analysis, but exporting SoC logs from board firmware usually requires manual intervention, is not timely, and makes it hard to find the problem among a large volume of logs. The third exports the core file through manually attached external equipment, which is difficult to do for board firmware installed inside a server chassis.
Therefore, a technique that automatically monitors anomalies and automatically exports the core data of the failure scene is urgently needed.
Disclosure of Invention
In view of the above, the present invention provides a method for board firmware anomaly monitoring and core data export for a multi-core SoC. The method automatically monitors board firmware anomalies and exports core data on the multi-core SoC. Working with a hardware watchdog circuit, the multiple subsystems on the SoC feed the watchdog jointly, so that anomalies in each subsystem are monitored and reported. After an anomaly occurs, an alarm is reported to the host so that the host can perform preprocessing operations, which prevents a board firmware anomaly from causing an anomaly on the host. Each submodule stores its own core data in a specific DDR region, and after the system is reset the host can pull and export the core data for analysis.
To this end, in one aspect, the present invention provides a method for board firmware anomaly monitoring and core data export for a multi-core SoC, comprising the following steps:
monitoring the board firmware for anomalies in real time and, when an anomaly occurs, sending an anomaly notification and alarm to the host;
the host performing a preprocessing operation and then returning a response, and each anomaly monitoring module storing its own core data in a specific DDR region;
and, after the system is reset, sending the host a request to pull the core data, the host responding with a data request, and the core data being pulled and exported for analysis.
As a further scheme of the invention, the multi-core SoC has a plurality of CPUs corresponding to a plurality of subsystems, and the subsystems exchange data with one another; when one subsystem fails, the SoC responds by recording the anomaly, raising an alarm, and dumping snapshot information of the failure scene.
As a further scheme of the invention, the system is the board firmware of the multi-core SoC; the board carrying the firmware is a PCIe device on the motherboard and communicates with the host through the NVMe protocol.
As a further scheme of the invention, when multiple subsystems are monitored, a single hardware watchdog monitors the multiple subsystems in the SoC; the subsystems feed the watchdog periodically, and when any subsystem fails to feed the watchdog in time, a system fault is deemed to have occurred.
As a further scheme of the invention, when the watchdog counter overflows, an alarm is sent to the host and the system core data is stored, with the following steps:
when a module fails to feed the watchdog, the watchdog overflows and a WDT pre-warning interrupt is generated to notify each subsystem;
one subsystem is responsible for communication with the host; it sends the system anomaly information to the host and requests the host to perform preprocessing;
the host driver performs the corresponding preprocessing and returns a success response;
each subsystem collects and organizes its core data in the corresponding interrupt handler and stores it in the designated DDR space;
a count interval elapses between the WDT pre-warning and the WDT reset; after the system is reset, each subsystem is reloaded from the bootloader;
and the cause of the system reset is detected at the bootloader stage, and the host pulls the core data from the corresponding DDR space to a fixed path on the host.
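The last step can be illustrated with a minimal Python sketch. It is a simulation under stated assumptions, not the patent's implementation: the `ResetCause` values, the `core_data_ready` message name, and the `core.bin` file name are all hypothetical and do not appear in the source.

```python
import tempfile
from enum import Enum
from pathlib import Path


class ResetCause(Enum):
    POWER_ON = "power_on"
    WATCHDOG = "watchdog"


def bootloader_check(cause: ResetCause, notify_host) -> bool:
    """At boot, ask the host to pull core data only after a watchdog reset."""
    if cause is ResetCause.WATCHDOG:
        notify_host("core_data_ready")   # hypothetical message name
        return True
    return False


def host_pull(ddr_region: bytes, export_dir: Path) -> Path:
    """Host side: copy the reserved DDR contents to a fixed export path."""
    export_dir.mkdir(parents=True, exist_ok=True)
    target = export_dir / "core.bin"     # hypothetical file name
    target.write_bytes(ddr_region)
    return target


if __name__ == "__main__":
    messages = []
    ddr = b"\x01\x02\x03\x04"            # stand-in for the saved core data
    with tempfile.TemporaryDirectory() as tmp:
        if bootloader_check(ResetCause.WATCHDOG, messages.append):
            path = host_pull(ddr, Path(tmp) / "cores")
            print(messages, path.read_bytes() == ddr)
```

A normal power-on leaves the host untouched in this sketch, so stale DDR contents are never exported by mistake.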
As a further scheme of the invention, the system also supports generating and exporting core files through active triggering by each subsystem or through a command actively issued by the host.
As a further scheme of the invention, active triggering from the host works with a host-side tool: the tool issues a PCIe command to the board firmware, and on receiving the command the board firmware actively configures the watchdog timer into an overflow state so that core data is generated and exported.
As a further scheme of the present invention, a single hardware watchdog is used to monitor the plurality of subsystems.
As a further scheme of the invention, a notification is sent to the host before the system resets due to an anomaly, requiring the host to stop communicating with the board firmware.
As a further scheme of the invention, the method supports actively triggering the generation and export of core data from the host.
In another aspect, the present invention further provides a computer-readable storage medium storing computer program instructions which, when executed, implement the method for board firmware anomaly monitoring and core data export for a multi-core SoC according to any of the above embodiments.
In another aspect, the present invention further provides a computer device including a memory and a processor, where the memory stores a computer program which, when executed by the processor, performs any of the methods for board firmware anomaly monitoring and core data export for a multi-core SoC according to the present invention.
In another aspect, the invention further provides a chip whose flow control follows any of the above methods for board firmware anomaly monitoring and core data export for a multi-core SoC, wherein the architecture of the chip includes a CPU reset vector register, a CPU release control pin, a CPU release control register, and a debug interface, wherein
the CPU reset vector register controls the address of the instruction that is fetched and executed after the CPU is released;
the CPU release control register controls CPU release when the chip is powered on;
the CPU release control pin controls the validity of the CPU release control register;
the debug interface is used to read and write the on-chip RAM and the registers in order to carry out flow control of the chip.
The invention has at least the following beneficial technical effects:
the invention provides a method for board firmware anomaly monitoring and core data export for a multi-core SoC in which a single hardware watchdog monitors a plurality of subsystems, which saves cost and is easy to implement; a notification is sent to the host before the system resets due to an anomaly, requiring the host to stop communicating with the board firmware, which prevents the host from failing when the board firmware restarts, limits the spread of the anomaly, and improves host stability; the core data can be dumped to the host automatically, which makes it easier to analyze and locate problems; and the generation and export of core data can be actively triggered from the host, which helps developers analyze the running state of the system.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
In the drawings:
Fig. 1 shows a flowchart of anomaly generation and core export in the board firmware anomaly monitoring and core data export method for a multi-core SoC according to the present invention;
Fig. 2 shows a system topology diagram for the method according to the present invention;
Fig. 3 shows a schematic diagram of multi-subsystem monitoring in the method according to the present invention;
Fig. 4 shows a logic diagram of the multi-subsystem watchdog reset in the method according to the present invention;
Fig. 5 shows a schematic diagram of the watchdog automatically monitoring multi-system anomalies, with core collection and export, in the method according to the present invention;
Fig. 6 shows a schematic diagram of actively triggered core collection and export for multi-system anomalies in the method according to the present invention;
Fig. 7 shows a schematic diagram of an embodiment of a computer-readable storage medium implementing the method according to the present invention;
Fig. 8 shows a schematic hardware structure diagram of an embodiment of a computer device implementing the method according to the present invention;
Fig. 9 shows a schematic diagram of the framework of an embodiment of a chip according to the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two non-identical entities with the same name or two different parameters. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention. Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to those steps or elements and may include other steps or elements that are not listed or that are inherent to it.
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flowcharts shown in the figures are illustrative only and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Existing anomaly detection mechanisms take three forms: capturing and analyzing signals by hardware circuit sampling during the debugging stage; printing logs at exception branches for later analysis; and storing the core data generated by an anomaly in an on-chip storage medium and triggering its export by manually inserting a dedicated storage device. The first approach is not very practical for analyzing problems in products already in service. The second can locate some problems through log analysis, but exporting SoC logs from board firmware usually requires manual intervention, is not timely, and makes it hard to find the problem among a large volume of logs. The third exports the core file through manually attached external equipment, which is difficult to do for board firmware installed inside a server chassis.
In view of this, the embodiments of the present invention provide a method for board firmware anomaly monitoring and core data export for a multi-core SoC. The method automatically monitors board firmware anomalies and exports core data on the multi-core SoC. Working with a hardware watchdog circuit, the multiple subsystems on the SoC feed the watchdog jointly, so that anomalies in each subsystem are monitored and reported. After an anomaly occurs, an alarm is reported to the host so that the host can perform preprocessing operations, which prevents a board firmware anomaly from causing an anomaly on the host. Each submodule stores its own core data in a specific DDR region, and after the system is reset the host can pull and export the core data for analysis.
In some embodiments of the present invention, as shown in fig. 1, a method for board firmware anomaly monitoring and core data export for a multi-core SoC is provided, including the following steps:
monitoring the board firmware for anomalies in real time and, when an anomaly occurs, sending an anomaly notification and alarm to the host;
the host performing a preprocessing operation and then returning a response, and each anomaly monitoring module storing its own core data in a specific DDR region;
and, after the system is reset, sending the host a request to pull the core data, the host responding with a data request, and the core data being pulled and exported for analysis.
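The second step says only that each module stores its core data in a specific DDR region; the source does not describe the layout. The following Python sketch shows one way such a region could be organized, assuming one fixed-size slot per subsystem with a small header. The slot size, the `MAGIC` marker, and the header fields are hypothetical choices made for illustration.

```python
import struct
import zlib

SLOT_SIZE = 4096                      # hypothetical bytes reserved per subsystem
MAGIC = 0x434F5245                    # hypothetical marker, ASCII "CORE"
HEADER = struct.Struct("<IIII")       # magic, subsystem id, payload length, CRC32


def _slot_base(ddr: bytearray, subsystem_id: int) -> int:
    """Return the slot offset, or raise if the slot is outside the region."""
    base = subsystem_id * SLOT_SIZE
    if subsystem_id < 0 or base + SLOT_SIZE > len(ddr):
        raise ValueError("slot lies outside the reserved region")
    return base


def store_core(ddr: bytearray, subsystem_id: int, payload: bytes) -> None:
    """Write one subsystem's core data into its reserved slot."""
    if len(payload) > SLOT_SIZE - HEADER.size:
        raise ValueError("core data does not fit in the reserved slot")
    base = _slot_base(ddr, subsystem_id)
    header = HEADER.pack(MAGIC, subsystem_id, len(payload), zlib.crc32(payload))
    ddr[base:base + HEADER.size] = header
    start = base + HEADER.size
    ddr[start:start + len(payload)] = payload


def load_core(ddr: bytearray, subsystem_id: int):
    """Return the stored payload, or None if the slot is empty or corrupt."""
    base = _slot_base(ddr, subsystem_id)
    magic, sid, length, crc = HEADER.unpack_from(ddr, base)
    if magic != MAGIC or sid != subsystem_id or length > SLOT_SIZE - HEADER.size:
        return None
    start = base + HEADER.size
    payload = bytes(ddr[start:start + length])
    return payload if zlib.crc32(payload) == crc else None


if __name__ == "__main__":
    ddr = bytearray(3 * SLOT_SIZE)    # simulated region for three subsystems
    store_core(ddr, 1, b"registers and memory snapshot")
    print(load_core(ddr, 1), load_core(ddr, 0))
```

The marker and checksum let the reader of the region tell a valid dump from leftover memory contents after a reset, which matters because the data is read back only after the system has restarted.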
As a further scheme of the invention, the multi-core SoC has a plurality of CPUs corresponding to a plurality of subsystems, and the subsystems exchange data with one another; when one subsystem fails, the SoC responds by recording the anomaly, raising an alarm, and dumping snapshot information of the failure scene.
A multi-core SoC often has a plurality of CPUs and, correspondingly, a plurality of subsystems, which generally exchange data frequently. When one subsystem fails, the SoC needs to respond by recording the anomaly and raising an alarm, and it further needs to dump snapshot information of the failure scene so that developers can analyze and locate the problem.
The system described in the present invention is the board firmware of a multi-core SoC; the board carrying the firmware is a PCIe device on the motherboard and communicates with the host through the NVMe protocol. The topology of the system is shown in fig. 2.
Referring to fig. 3 and fig. 4, for multi-subsystem monitoring a single hardware watchdog monitors multiple subsystems in the SoC; the main mechanism is that the subsystems feed the watchdog periodically. When any subsystem fails to feed the watchdog in time, a system fault is deemed to have occurred. A scenario with three subsystems is taken as an example.
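The joint feeding scheme can be sketched in Python as a small simulation. The source does not specify how the individual feeds are combined (that logic is in figs. 3 and 4), so this sketch assumes the hardware counter is cleared only once every subsystem has checked in during the current round. The class name, subsystem names, and tick-based timing are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SharedWatchdog:
    """One simulated hardware watchdog shared by several subsystems."""
    subsystems: tuple                 # names of subsystems that must check in
    timeout_ticks: int                # ticks allowed between two hardware kicks
    counter: int = 0
    checked_in: set = field(default_factory=set)

    def feed(self, name: str) -> None:
        """A subsystem reports that it is alive."""
        if name not in self.subsystems:
            raise ValueError(f"unknown subsystem: {name}")
        self.checked_in.add(name)
        # Assumption: the counter is cleared only when all subsystems have fed.
        if self.checked_in == set(self.subsystems):
            self.counter = 0
            self.checked_in.clear()

    def tick(self) -> bool:
        """Advance one tick; return True once the counter has overflowed."""
        self.counter += 1
        return self.counter >= self.timeout_ticks

    def missing(self) -> list:
        """Subsystems that have not fed in the current round."""
        return [s for s in self.subsystems if s not in self.checked_in]


if __name__ == "__main__":
    wdt = SharedWatchdog(("sub1", "sub2", "sub3"), timeout_ticks=3)
    wdt.feed("sub1")
    wdt.feed("sub2")                  # sub3 is stuck and never feeds
    print([wdt.tick() for _ in range(3)], wdt.missing())
```

With this design a single stalled subsystem is enough to let the counter run out, which matches the rule that one late feed is treated as a system fault.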
When the watchdog counter overflows, the system sends an alarm to the host and stores the system core data. The main process is as follows:
a) When a module fails to feed the watchdog, the watchdog overflows and a WDT pre-warning interrupt is generated to notify each subsystem;
b) Subsystem 3 is responsible for communication with the host; it sends the system anomaly information to the host and requests the host to perform preprocessing;
c) The host driver performs the corresponding preprocessing and returns a success response;
d) Each subsystem collects and organizes its core data in the corresponding interrupt handler and stores it in the designated DDR space. The core data includes registers, memory information, and the like;
e) A count interval lies between the WDT pre-warning and the WDT reset, so operations b) to d) above must complete within this interval. When the interval expires, the system is reset;
f) After the system is reset, each subsystem is reloaded from the bootloader; in this system the bootloader function is performed by subsystem 2;
g) The cause of the system reset is detected at the bootloader stage, and if the reset was caused by the watchdog, the host is notified at the bootloader stage to pull the core file;
h) The host pulls the core data from the corresponding DDR space to a fixed path on the host for developers to analyze.
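Step e) imposes a timing constraint: steps b) to d) must fit between the pre-warning interrupt and the reset. The Python sketch below checks such a budget. It treats the steps as strictly sequential, which is a simplifying worst-case assumption (the source has each subsystem save its data in its own interrupt handler), and the step names and tick values are hypothetical.

```python
def run_prewarning_window(step_ticks: dict, window_ticks: int):
    """Walk the steps in order and stop when the reset deadline would be missed.

    step_ticks maps a step name to the ticks it needs (hypothetical values).
    Returns (completed step names, True if every step finished before reset).
    """
    elapsed = 0
    completed = []
    for name, ticks in step_ticks.items():
        if elapsed + ticks > window_ticks:
            return completed, False   # the WDT reset fires before this step ends
        elapsed += ticks
        completed.append(name)
    return completed, True


if __name__ == "__main__":
    steps = {"b_notify_host": 2, "c_host_preprocess": 3, "d_save_core_to_ddr": 4}
    print(run_prewarning_window(steps, window_ticks=10))
    print(run_prewarning_window(steps, window_ticks=8))
```

In the second call the window is too short for the core data to be saved, which is the failure the count interval has to be sized to avoid.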
The flow in which the watchdog automatically monitors multi-system anomalies and the core data is collected and exported is shown in fig. 5. In addition, the system supports generating and exporting core files through active triggering by each subsystem or through a command actively issued by the host. This is suitable when developers need to inspect and analyze detailed and performance information about the current running state of the system. Active triggering from the host requires a host-side tool: the tool issues a PCIe command to the board firmware, and on receiving the command the board firmware actively configures the watchdog timer into an overflow state. The subsequent steps of generating and exporting the core data are the same as in the watchdog-triggered flow above, as shown in fig. 6.
In this embodiment, the system also supports generating and exporting core files through active triggering by each subsystem or through a command actively issued by the host.
In this embodiment, active triggering from the host works with a host-side tool: the tool issues a PCIe command to the board firmware, and on receiving the command the board firmware actively configures the watchdog timer into an overflow state so that core data is generated and exported.
In this embodiment, a single hardware watchdog is used to monitor the plurality of subsystems.
In this embodiment, a notification is sent to the host before the system resets due to an anomaly, requiring the host to stop communicating with the board firmware.
In this embodiment, the method supports actively triggering the generation and export of core data from the host.
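A minimal Python sketch of the host-triggered path follows. The source says only that a host tool sends a command and the firmware forces the watchdog into overflow; the opcode value `TRIGGER_CORE_DUMP` and the `Watchdog` class are hypothetical and stand in for the real command encoding and timer hardware.

```python
TRIGGER_CORE_DUMP = 0xC0   # hypothetical opcode, not taken from the source


class Watchdog:
    """Simulated watchdog timer with a software-forced overflow."""

    def __init__(self, timeout_ticks: int):
        self.timeout_ticks = timeout_ticks
        self.counter = 0

    def feed(self) -> None:
        self.counter = 0

    def force_overflow(self) -> None:
        """Jump the counter to its limit so the overflow path runs at once."""
        self.counter = self.timeout_ticks

    def overflowed(self) -> bool:
        return self.counter >= self.timeout_ticks


def handle_host_command(opcode: int, wdt: Watchdog) -> bool:
    """Firmware side: return True if the command triggered a core dump."""
    if opcode == TRIGGER_CORE_DUMP:
        wdt.force_overflow()
        return True
    return False               # any other command is left to other handlers


if __name__ == "__main__":
    wdt = Watchdog(timeout_ticks=100)
    print(handle_host_command(TRIGGER_CORE_DUMP, wdt), wdt.overflowed())
```

Reusing the overflow path means the manual trigger goes through the same notification, collection, and export steps as a real fault, so only one code path has to be maintained.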
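On the host side, stopping communication can be pictured as a driver that refuses new requests once the alert arrives. The Python sketch below is an assumption about how such preprocessing might look; the `HostDriver` class, its method names, and the `preprocess_ok` reply are hypothetical, and the source does not describe the driver's internals.

```python
class HostDriver:
    """Simulated host driver that quiesces I/O when the firmware raises an alert."""

    def __init__(self):
        self.accepting_io = True
        self.submitted = []
        self.last_alert = None

    def submit(self, request: str) -> bool:
        """Queue a request for the board; refuse it while the board is resetting."""
        if not self.accepting_io:
            return False
        self.submitted.append(request)
        return True

    def on_firmware_alert(self, info: str) -> str:
        """Preprocessing: record the alert and stop talking to the board."""
        self.last_alert = info
        self.accepting_io = False
        return "preprocess_ok"      # hypothetical success reply to the firmware

    def on_device_ready(self) -> None:
        """Resume I/O after the board has rebooted."""
        self.accepting_io = True


if __name__ == "__main__":
    drv = HostDriver()
    print(drv.submit("read block 0"))
    print(drv.on_firmware_alert("watchdog pre-warning"))
    print(drv.submit("read block 1"))
```

Refusing requests early keeps host I/O from timing out against a board that is about to reset, which is how the anomaly is kept from spreading to the host.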
The invention monitors anomalies by having multiple subsystems feed one watchdog jointly; collects system core data automatically; notifies the host promptly when an anomaly occurs, so that the anomaly does not spread; and synchronizes the core data to the host automatically when an anomaly occurs, so that the problem can be handled in time.
Through a unified mechanism for monitoring, notification, and export, the method for board firmware anomaly monitoring and core export for a multi-core SoC disclosed by the invention narrows the scope of a fault, safeguards the reliability of the host, and makes anomaly analysis easier for developers.
It should be understood that although the steps are described above in a certain order, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, some steps of this embodiment may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In a second aspect of the embodiments of the present invention, a computer-readable storage medium is further provided. Fig. 7 is a schematic diagram of a computer-readable storage medium for the board firmware anomaly monitoring and core data export method for a multi-core SoC according to an embodiment of the present invention. As shown in fig. 7, the computer-readable storage medium 300 stores computer program instructions 310, which are executable by a processor. When executed, the computer program instructions 310 implement the method of any of the embodiments described above.
It should be understood that all the embodiments, features, and advantages described above for the method for board firmware anomaly monitoring and core data export for a multi-core SoC according to the present invention apply equally to the corresponding system and storage medium according to the present invention, provided they do not conflict.
In a third aspect of the embodiments of the present invention, there is further provided a computer device 400, including a memory 420 and a processor 410, where the memory stores a computer program which, when executed by the processor, implements the method of any of the above embodiments, including the following steps:
monitoring the board firmware for anomalies in real time and, when an anomaly occurs, sending an anomaly notification and alarm to the host;
the host performing a preprocessing operation and then returning a response, and each anomaly monitoring module storing its own core data in a specific DDR region;
and, after the system is reset, sending the host a request to pull the core data, the host responding with a data request, and the core data being pulled and exported for analysis.
Fig. 8 is a schematic diagram of the hardware structure of an embodiment of a computer device for performing the method for board firmware anomaly monitoring and core data export for a multi-core SoC provided by the present invention. Taking the computer device 400 shown in fig. 8 as an example, the computer device includes a processor 410 and a memory 420, and may further include an input device 430 and an output device 440. The processor 410, the memory 420, the input device 430, and the output device 440 may be connected by a bus or by other means; fig. 8 shows a bus connection as an example. The input device 430 may receive numeric or character input and generate signal inputs related to board firmware anomaly monitoring and core data export for the multi-core SoC. The output device 440 may include a display device such as a display screen.
The memory 420, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the board firmware anomaly monitoring and core data export method for a multi-core SoC in the embodiments of the present application. The memory 420 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created by using the board firmware anomaly monitoring and core data export method for a multi-core SoC. Further, the memory 420 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 420 may optionally include memory located remotely from the processor 410, which may be connected to the local module over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor 410 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions, and modules stored in the memory 420; that is, it implements the board firmware anomaly monitoring and core data export method of the multi-core SoC of the above method embodiment, which includes the following steps:
monitoring the board firmware for anomalies in real time and, when an anomaly occurs, sending an anomaly notification to the host to raise an alarm;
the host performs a preprocessing operation and then returns a response, and each anomaly monitoring module stores its own core data in a designated DDR region;
after the system is reset, sending the host a request to pull the core data; the host responds with a data request, and the core data is pulled and exported for analysis.
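The three steps above can be sketched as a small host/board simulation. This is only an illustration under stated assumptions, not the patented implementation: the class names `Host` and `Board`, the direct method calls standing in for host/board messages, the dictionary standing in for the designated DDR region, the module names, and the reset-reason strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host-side driver model."""
    io_paused: bool = False
    exported: dict = field(default_factory=dict)

    def on_anomaly_alert(self, info: str) -> bool:
        # Preprocessing: stop communicating with the board before it resets.
        self.io_paused = True
        return True  # response fed back to the board

    def on_pull_request(self, board: "Board") -> None:
        # The host answers the pull request by reading the saved core data.
        self.exported = dict(board.ddr_core_region)
        self.io_paused = False


@dataclass
class Board:
    """Hypothetical board firmware model with several monitored modules."""
    modules: tuple = ("sub0", "sub1", "sub2")
    ddr_core_region: dict = field(default_factory=dict)  # assumed to survive reset
    reset_reason: str = "power_on"
    host_acked: bool = False

    def handle_anomaly(self, host: Host, failed: str) -> None:
        # Step 1: report the anomaly to the host.
        self.host_acked = host.on_anomaly_alert(f"{failed} stopped responding")
        # Step 2: every monitoring module saves its own core data.
        for name in self.modules:
            self.ddr_core_region[name] = f"core data of {name}".encode()
        self.reset_reason = "anomaly"

    def boot(self, host: Host) -> None:
        # Step 3: after the reset, ask the host to pull the saved data.
        if self.reset_reason == "anomaly" and self.ddr_core_region:
            host.on_pull_request(self)
            self.ddr_core_region.clear()
        self.reset_reason = "power_on"


host, board = Host(), Board()
board.handle_anomaly(host, failed="sub1")
paused_before_reset = host.io_paused
board.boot(host)
print(sorted(host.exported))
```

In this sketch the host stays paused from the alert until it has pulled the data, which mirrors the ordering of the steps; a real driver would decide for itself when to resume I/O.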
In a fourth aspect of the embodiments of the present invention, there is further provided a chip 500 for performing flow control according to any one of the above methods for board firmware anomaly monitoring and core data export for a multi-core SoC. Fig. 9 shows a schematic diagram of the architecture of the chip 500 according to the invention. As shown in Fig. 9, in this embodiment the architecture of the chip 500 includes a CPU reset vector register 510, a CPU release control pin 520, a CPU release control register 530, and a debug interface 540, wherein:
the CPU reset vector register 510 is used to control the address of the instruction that is read and executed after the CPU is released;
the CPU release control register 530 is used to control CPU release when the chip 500 is powered on;
the CPU release control pin 520 is used to control the validity of the CPU release control register 530;
the debug interface 540 is used to read and write the on-chip RAM and the registers to perform the flow control of the chip.
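How these four elements might interact can be illustrated with a toy model. Everything below is an assumption made for the sketch: the class name `ChipModel`, the register encoding (1 means release), the default vector, and in particular the behavior when the pin marks the release control register as invalid (here the register is ignored and the CPU is released immediately).

```python
class ChipModel:
    """Toy model of the release-control logic (all encodings are assumed)."""

    DEFAULT_VECTOR = 0x0000_0000

    def __init__(self, pin_register_valid: bool):
        self.pin_register_valid = pin_register_valid  # release control pin
        self.release_reg = 0   # release control register: 1 = release CPU
        self.reset_vector = self.DEFAULT_VECTOR  # reset vector register
        self.ram = {}          # on-chip RAM, address -> word
        self.pc = None         # None while the CPU is held

    def debug_write_ram(self, addr: int, word: int) -> None:
        # Debug interface: write a word into on-chip RAM.
        self.ram[addr] = word

    def debug_write_reg(self, name: str, value: int) -> None:
        # Debug interface: write one of the two modelled registers.
        if name not in ("reset_vector", "release_reg"):
            raise KeyError(name)
        setattr(self, name, value)
        self._update()

    def power_on(self) -> None:
        self._update()

    def _update(self) -> None:
        # Assumption: an invalid register is ignored and the CPU runs at once.
        released = (not self.pin_register_valid) or self.release_reg == 1
        if released and self.pc is None:
            self.pc = self.reset_vector  # first fetch after release


chip = ChipModel(pin_register_valid=True)
chip.power_on()
held_at_power_on = chip.pc is None
chip.debug_write_ram(0x2000_0000, 0xDEADBEEF)      # load code via debug port
chip.debug_write_reg("reset_vector", 0x2000_0000)  # point the CPU at it
chip.debug_write_reg("release_reg", 1)             # then release the CPU
print(hex(chip.pc))
```

The ordering in the usage lines is the point of the sketch: code is loaded and the reset vector is set while the CPU is still held, and only then is the CPU released.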
Those of skill in the art would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
Finally, it should be noted that the computer-readable storage medium (e.g., memory) herein can be either volatile memory or non-volatile memory, or can include both volatile and non-volatile memory. By way of example, and not limitation, non-volatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The invention provides a method for board firmware anomaly monitoring and core data export for a multi-core SoC. A hardware watchdog is used to monitor a plurality of subsystems, which saves cost and is easy to implement. Before the system is reset because of an anomaly, a notification is sent to the host requiring the host to stop communicating with the board firmware; this prevents the host from malfunctioning when the board firmware restarts, limits the spread of the anomaly, and improves the stability of the host. The core data can be dumped to the host automatically, which makes it easier to locate and analyze problems. The generation and export of core data can also be actively triggered at the host, which helps developers analyze the running state of the system.
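One way a single hardware watchdog can supervise several subsystems is sketched below. The text states only that one hardware watchdog monitors multiple subsystems and that the claims place a counting interval between the pre-warning interrupt and the reset. The aggregation rule used here (the counter restarts only after every subsystem has checked in), the class name `SharedWatchdog`, the subsystem names, and the tick thresholds are assumptions for illustration.

```python
class SharedWatchdog:
    """One watchdog counter guarded by check-ins from several subsystems."""

    def __init__(self, subsystems, prewarn_at, reset_at):
        if not 0 < prewarn_at < reset_at:
            raise ValueError("need 0 < prewarn_at < reset_at")
        self.subsystems = set(subsystems)
        self.prewarn_at = prewarn_at
        self.reset_at = reset_at
        self.counter = 0
        self.checked_in = set()
        self.events = []

    def feed(self, name):
        # A subsystem reports that it is alive.
        self.checked_in.add(name)
        if self.checked_in == self.subsystems:
            # Only a full round of check-ins restarts the counter.
            self.counter = 0
            self.checked_in.clear()

    def tick(self):
        # One watchdog clock period elapses.
        self.counter += 1
        if self.counter == self.prewarn_at:
            # Pre-warning interrupt: time to notify the host and save core data.
            missing = sorted(self.subsystems - self.checked_in)
            self.events.append(("prewarn", missing))
        elif self.counter == self.reset_at:
            # The interval since the pre-warning has elapsed: reset the system.
            self.events.append(("reset", None))
            self.counter = 0
            self.checked_in.clear()


wdt = SharedWatchdog(["sub0", "sub1", "sub2"], prewarn_at=3, reset_at=5)
for t in range(8):
    wdt.feed("sub0")
    wdt.feed("sub1")
    if t < 2:
        wdt.feed("sub2")  # sub2 hangs from t = 2 onwards
    wdt.tick()
print(wdt.events)
```

The gap between `prewarn_at` and `reset_at` is the window in which the host is notified and each subsystem saves its core data, so in practice it would need to be long enough for both.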
The foregoing are exemplary embodiments of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. The serial numbers of the embodiments disclosed in the embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the idea of the embodiments of the invention, technical features in the above embodiment or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A method for board firmware anomaly monitoring and core data export for a multi-core SoC, characterized by comprising the following steps:
monitoring the board firmware for anomalies in real time and, when an anomaly occurs, sending an anomaly notification to the host to raise an alarm;
the host performs a preprocessing operation and then returns a response, and each anomaly monitoring module stores its core data in a designated DDR region;
after the system is reset, sending the host a request to pull the core data; the host responds with a data request, and the core data is pulled and exported for analysis.
2. The method for board firmware anomaly monitoring and core data export for a multi-core SoC of claim 1, wherein the multi-core SoC has a plurality of CPUs corresponding to a plurality of subsystems that exchange data with one another; when one of the subsystems fails, the SoC responds by recording the anomaly, raising an alarm, and dumping a snapshot of the state at the time the anomaly occurred.
3. The method for board firmware anomaly monitoring and core data export for a multi-core SoC of claim 2, wherein the system is the board firmware of the multi-core SoC, the board card is a PCIe device on a motherboard, and communication with the host is performed via the NVMe protocol.
4. The method for board firmware anomaly monitoring and core data export for a multi-core SoC of claim 3, wherein, for multi-subsystem monitoring, a hardware watchdog is used to monitor the plurality of subsystems in the SoC; the subsystems feed the watchdog periodically, and when a subsystem fails to feed the watchdog in time, the system is considered faulty.
5. The method for board firmware anomaly monitoring and core data export for a multi-core SoC of claim 4, wherein, when the watchdog counter overflows, an alarm is sent to the host and the system core data is saved, with the following operation steps:
when a module fails to feed the watchdog, the watchdog overflows and a WDT pre-warning interrupt is generated to notify each subsystem;
one subsystem is responsible for communication with the host and sends the system anomaly information to the host, requesting the host to perform preprocessing;
the host driver performs the corresponding preprocessing and returns a success response;
each subsystem collects and organizes its core data in the corresponding interrupt handler and stores it in the designated DDR space;
there is a counting interval between the WDT pre-warning and the WDT reset; after the system is reset, all subsystems are reloaded from the bootloader;
the system reset reason is detected at the bootloader stage, and the host pulls the core data from the corresponding DDR space to a fixed path on the host.
6. The method for board firmware anomaly monitoring and core data export for a multi-core SoC of claim 5, wherein the system further supports generating and exporting core files either through active triggering by each subsystem or through commands actively issued from the host.
7. The method of claim 6, wherein the writing modes include SLC, TLC and MLC writing modes.
8. The method for board firmware anomaly monitoring and core data export for a multi-core SoC of claim 1, wherein a hardware watchdog is used to monitor a plurality of subsystems.
9. The method for board firmware anomaly monitoring and core data export for a multi-core SoC of claim 8, wherein a notification is sent to the host before the system is reset due to an anomaly, requiring the host to stop communicating with the board firmware.
10. The method for board firmware anomaly monitoring and core data export for a multi-core SoC of claim 9, wherein the generation and export of core data can be actively triggered at the host.
CN202211031997.8A 2022-08-26 2022-08-26 Board card firmware abnormity monitoring and core data exporting method of multi-core SoC Pending CN115373997A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211031997.8A CN115373997A (en) 2022-08-26 2022-08-26 Board card firmware abnormity monitoring and core data exporting method of multi-core SoC

Publications (1)

Publication Number Publication Date
CN115373997A true CN115373997A (en) 2022-11-22

Family

ID=84067706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211031997.8A Pending CN115373997A (en) 2022-08-26 2022-08-26 Board card firmware abnormity monitoring and core data exporting method of multi-core SoC

Country Status (1)

Country Link
CN (1) CN115373997A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116820837A (en) * 2023-06-28 2023-09-29 合芯科技有限公司 Exception handling method and device for system component
CN117234787A (en) * 2023-11-14 2023-12-15 苏州元脑智能科技有限公司 Method and system for monitoring running state of system-level chip
CN117234787B (en) * 2023-11-14 2024-02-23 苏州元脑智能科技有限公司 Method and system for monitoring running state of system-level chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination