WO2022116755A1 - Method for storing crash information of a multi-core system, medium, and electronic device - Google Patents

Method for storing crash information of a multi-core system, medium, and electronic device

Info

Publication number
WO2022116755A1
Authority
WO
WIPO (PCT)
Prior art keywords
processor
core system
interrupt
core
crash information
Application number
PCT/CN2021/127102
Inventor
师雯
Original Assignee
哲库科技(北京)有限公司
Application filed by 哲库科技(北京)有限公司
Publication of WO2022116755A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs

Definitions

  • the present application relates to the field of embedded technologies, and in particular, to a method for storing crash information of a multi-core system, as well as a medium and an electronic device.
  • Watchdog is a monitoring technology commonly used in embedded software, which includes both a software part and a hardware part.
  • the hardware part includes a hardware timer. If the timer is not reset within a few seconds, it will notify the PMIC (Power Management Integrated Circuit) unit of the system to reset the system.
  • the software part can be implemented by a process scheduled by a timer, which periodically resets the hardware timer to prevent the PMIC unit from resetting the system.
  • the watchdog can actively reset the system when the system is stuck and cannot work normally, so that the system can return to normal work.
  • the above-mentioned system crash triggered by the watchdog timer timeout can save a valid memory image, but only on the premise that the processing core corresponding to the watchdog can respond to interrupts when the watchdog timer times out. If the corresponding processing core has disabled interrupts when the watchdog timer times out, the crash process cannot be triggered actively. In the resulting memory image file, the Cache has not been flushed, so some data is invalid, which affects the subsequent analysis.
  • an object of the present application is to propose a method for storing crash information in a multi-core system, so as to ensure the validity of the random access memory part of the crash processing core.
  • the second object of the present application is to propose a computer storage medium.
  • the third object of the present application is to propose an electronic device.
  • the present application provides a method for storing crash information of a multi-core system, including the following steps: a first processor determines, through inter-core communication of the multi-core system, whether a second processor is in an interrupt failure state in which it does not respond to interrupts; when the second processor is in the interrupt failure state, the first processor performs an action so that the crash information of the second processor is acquired into the storage device of the multi-core system, wherein the first processor is in an interrupt valid state in which it responds to interrupts.
  • in the method for storing crash information of a multi-core system of the present application, when inter-core communication of the multi-core system shows that the processing core in which the watchdog interrupt occurred is in the interrupt failure state, the crash information of that processing core can be flushed to the random access memory of the multi-core system through an action of at least one processing core that is in the interrupt valid state. Therefore, the processing core in which the watchdog interrupt occurred is not required to respond to the interrupt, yet the validity of its random access memory image can still be guaranteed, which provides more crash-related information for subsequent debugging and analysis.
  • when the second processor is in the interrupt failure state and the first processor is in the interrupt valid state, the first processor acquires the crash information of the second processor and stores the acquired crash information into the storage device of the multi-core system.
  • the first processor accesses the storage space of the second processor to acquire crash information of the second processor.
  • the first processor accesses the TCM and/or Cache of the second processor through an inter-core AXI interface, so as to acquire the crash information in the TCM and/or Cache into the storage device of the multi-core system.
  • the first processor acquires the crash information of the second processor according to the mapping relationship between the first processor and the second processor.
  • the first processor sets the Cache of the multi-core system to the Fresh mode, so that the second processor acquires the crash information into the storage device of the multi-core system after the multi-core system is restarted.
  • the first processor and the second processor send heartbeat information to each other every first preset time to determine whether the first processor and/or the second processor is in the interrupt failure state.
  • the first processor and the second processor send an inter-core interrupt to each other every first preset time, and each determines whether the other responds to the inter-core interrupt; if the first processor or the second processor fails to respond to the inter-core interrupt more than once, it is determined that the first processor or the second processor is in the interrupt failure state.
  • when at least one of the first processor and the second processor has a watchdog interrupt and is in the interrupt valid state, the state information of the multi-core system is acquired into the storage device of the multi-core system.
  • when at least one of the first processor and the second processor has a watchdog interrupt and is in the interrupt valid state, at least one of the first processor and the second processor sends an inter-core interrupt to the other processors, forcing the other processors to acquire the TCM and/or the Cache into the storage device of the multi-core system.
  • the storage device of the multi-core system is RAM.
  • the crash information includes state information of the multi-core system.
  • the inter-core interrupt is an IPI interrupt.
  • the present application provides a computer-readable storage medium on which a crash information storage program of a multi-core system is stored; when the crash information storage program of the multi-core system is executed by a processor, the method for storing crash information of the multi-core system described in the first aspect is implemented.
  • the present application proposes an electronic device, including a memory, a processor, and a crash information storage program of a multi-core system that is stored in the memory and can run on the processor; when the processor executes the crash information storage program, the method for storing crash information of the multi-core system described in the first aspect is implemented.
  • FIG. 1 is a flowchart of a method for storing crash information of a multi-core system according to an embodiment of the present application
  • FIG. 2 is a flowchart of a method for storing crash information of a multi-core system according to a specific embodiment of the present application
  • FIG. 3 is a schematic diagram of communication between multiple processing cores in a multi-core system according to a specific embodiment of the present application
  • FIG. 4 is a structural block diagram of an electronic device according to an embodiment of the present application.
  • FIG. 1 is a flowchart of a method for storing crash information of a multi-core system according to an embodiment of the present application.
  • the method for storing crash information of the multi-core system includes the following steps:
  • the first processor determines, through inter-core communication of the multi-core system, whether the second processor is in an interrupt failure state that does not respond to interrupts.
  • the second processor is the processing core in which the watchdog interrupt occurs, and the first processor is in an interrupt valid state in which it responds to interrupts.
  • the multi-core system includes a plurality of processing cores, and there is a communication connection between the processing cores, wherein the multi-core system may be an integrated chip.
  • Each processing core can be equipped with a corresponding watchdog module, and each processing core can configure the internal related registers of the watchdog for the corresponding watchdog module, and enable the watchdog by configuring the watchdog control register.
  • Each processing core can periodically send a feed signal to the corresponding watchdog module. When the watchdog module receives the first feed signal, the watchdog counter of the watchdog module starts to count.
  • when the count value of any watchdog counter overflows for the first time and the watchdog interrupt function is enabled, a watchdog interrupt is generated, that is, a watchdog interrupt occurs in one of the processing cores.
  • at the same time, that processing core is regarded as the second processor, and the other processing cores are regarded as the first processor.
  • further, the first processor can monitor the state of the processing core in which the watchdog interrupt occurred through its communication connection with the second processor, for example by checking whether heartbeat information is exchanged between the first processor and the second processor, so as to determine whether the processing core in which the watchdog interrupt occurred is in the interrupt failure state.
  • the storage device of the multi-core system is RAM.
  • when the second processor is in the interrupt failure state, it cannot respond to interrupts.
  • in that case, an action of at least one processing core among the first processors that are in the interrupt valid state (for example, a processing core selected by arbitration as having the best processing capability) can flush the crash information of the processing core in which the watchdog interrupt occurred to the random access memory (RAM) of the multi-core system, such as a double data rate synchronous dynamic random access memory (DDR SDRAM).
  • specifically, the crash information of the second processor can be flushed to the random access memory of the multi-core system through inter-core communication; alternatively, the Cache of the multi-core system can be set to the Fresh mode, so that the crash information of the second processor is flushed to the random access memory of the multi-core system when the multi-core system restarts.
  • the method for storing crash information of a multi-core system does not require the processing core that has a watchdog interrupt to respond to the interrupt, and can also ensure the validity of the random access memory image of the processing core, thereby providing more crash-related information for subsequent debugging and analysis.
  • the first processor and the second processor send heartbeat information to each other every first preset time to determine whether the first processor and/or the second processor is in an interrupt failure state.
  • the first preset time can be calibrated as required, for example, it can be a value from 0.5s to 3s.
  • the first processor and the second processor determining whether the first processor and/or the second processor is in the interrupt failure state by sending heartbeat information to each other every first preset time may specifically include: the first processor and the second processor send inter-core interrupts to each other every first preset time, and each determines whether the other responds to the inter-core interrupt; if the first processor or the second processor fails to respond to the inter-core interrupt more than once, it is determined that the first processor or the second processor is in the interrupt failure state.
  • specifically, the first processor sends an inter-core interrupt to the second processor every first preset time and detects whether the second processor responds to the inter-core interrupt; if the second processor fails to respond to the inter-core interrupt several consecutive times (for example, a value from 3 to 10 times), it is determined that the second processor is in the interrupt failure state. In this way, whether the second processor is in the interrupt failure state can be monitored accurately.
  • further, the second processor may also send an inter-core interrupt to the first processor every first preset time; if the first processor fails to respond to the inter-core interrupt several consecutive times (for example, a value from 3 to 10 times), it is determined that the first processor is in the interrupt failure state.
  • optionally, the second processor may also determine whether the first processor is in the interrupt failure state according to whether it receives the inter-core interrupts sent by the first processor. Specifically, the second processor may know in advance the moments at which the first processor sends inter-core interrupts to the second processor; for example, the second processor may record the moment at which the first processor first sends an inter-core interrupt to it and, taking that moment as a reference, calculate the subsequent sending moments from the first preset time known in advance.
  • if, several consecutive times (for example, a value from 3 to 10 times), the second processor fails to receive an inter-core interrupt from the first processor at the pre-calculated sending moment, it is determined that the first processor is in the interrupt failure state.
  • for example, taking a multi-core system that includes two processing cores, the two processing cores may be denoted as a master processing core Core1 and a slave processing core Core0, see FIG. 2.
  • the two processing cores can not only perform the operation of resetting the watchdog register, but also communicate with each other while performing this operation to determine the survival state of each other.
  • for example, the above-mentioned inter-core interrupt is an IPI (Inter-Processor Interrupt), so that each processing core can determine the survival state of the other through IPIs.
  • referring to FIG. 2, after the multi-core system starts, the master processing core Core1 can send an IPI to the slave processing core Core0 every first preset time, such as 1s, to inform the slave processing core Core0 that the master processing core Core1 has not crashed; after receiving the IPI sent by the master processing core Core1, the slave processing core Core0 can respond to the IPI and feed back corresponding information to the master processing core Core1, so as to inform the master processing core Core1 that the slave processing core Core0 has not crashed.
  • if a watchdog interrupt has occurred in the slave processing core Core0, and the master processing core Core1 detects that the slave processing core Core0 fails to respond to the inter-core interrupt several consecutive times, such as 5 times, it can be determined that the slave processing core Core0 in which the watchdog interrupt occurred is in the interrupt failure state; otherwise, it is determined that the slave processing core Core0 in which the watchdog interrupt occurred is in the interrupt valid state.
  • as an example, referring to FIG. 2, when at least one of the first processor and the second processor has a watchdog interrupt and is in the interrupt valid state, the state information of the multi-core system is acquired into the storage device of the multi-core system. Therefore, the validity of the crash information of the processing core in which the watchdog interrupt occurs can be ensured without inter-core operations, which facilitates subsequent debugging and analysis.
  • the above-mentioned crash information includes the above-mentioned state information of the multi-core system.
  • as an example, referring to FIG. 2, when at least one of the first processor and the second processor has a watchdog interrupt and is in the interrupt valid state, at least one of the first processor and the second processor sends an inter-core interrupt to the other processors, forcing the other processors to acquire the TCM and/or Cache into the storage device of the multi-core system. In this way, more information about the multi-core system at the time of the watchdog interrupt can be obtained, which facilitates subsequent debugging and analysis and ensures the accuracy of the debugging data.
  • when the second processor is in the interrupt failure state and the first processor is in the interrupt valid state, the first processor acquires the crash information of the second processor and stores the acquired crash information into the storage device of the multi-core system.
  • the first processor accesses the storage space of the second processor to obtain crash information of the second processor.
  • the first processor accesses the TCM and/or the Cache of the second processor through the inter-core AXI (Advanced eXtensible Interface) interface, so as to acquire the crash information in the TCM and/or the Cache into the storage device of the multi-core system.
  • referring to FIG. 3, multiple processing cores Core0 to Coren access each other's TCM and Cache through the inter-core interface AXI.
  • when it is detected that the second processor is in the interrupt failure state, the first processor can actively call the inter-core interface to help the second processor flush the TCM and L1 cache to the L2 Cache, and then actively trigger a crash to flush the L2 Cache to the random access memory (RAM) and save the memory image.
  • it should be noted that, in FIG. 3, DTCM refers to a data transmission bus and ITCM refers to an instruction transmission bus.
  • the first processor acquires the crash information of the second processor according to the mapping relationship between the first processor and the second processor.
  • the mapping relationship between any two processing cores may be established in advance, and then the first processor acquires the mapping relationship between the first processor and the second processor.
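  • for example, for a multi-core system including four processing cores, a mapping relationship between Core0 and Core3 and a mapping relationship between Core1 and Core2 may be preset; when a watchdog interrupt occurs in Core0, the crash information of the processing core in which the watchdog interrupt occurred can be acquired through Core3.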
  • the cache of the multi-core system is set to the Fresh mode by the first processor, so that the second processor acquires the crash information to the storage device of the multi-core system after the multi-core system is restarted.
  • specifically, when the second processor is in the interrupt failure state, if the multi-core system does not support cross-core access to the Cache or TCM, the first processor can instead perform an action to set the Cache of the multi-core system to the Fresh mode. This ensures that the contents of the Cache and/or TCM are preserved even when the multi-core system is warm-started, so that they can be flushed to RAM upon restart, after which the memory image is saved.
  • with the method for storing crash information of a multi-core system of the present application, when a processing core cannot respond to interrupts and the multi-core system is about to crash, the other processing cores can flush the cache, save variables, print logs, and so on. This ensures the validity of the random access memory portion of the crashed processing core in the random access memory image, provides more crash-related information, and helps to better solve the problem that crashes caused by watchdog interrupts are difficult to debug.
  • the present application also proposes a computer-readable storage medium.
  • a computer-readable storage medium stores a crash information storage program of a multi-core system, and when the multi-core system crash information storage program is executed by a processor, implements the crash information storage method of the multi-core system of the above embodiment.
  • FIG. 4 is a structural block diagram of an electronic device according to an embodiment of the present application.
  • as shown in FIG. 4, the electronic device 100 includes a memory 110, a processor 120, and a crash information storage program of a multi-core system that is stored on the memory 110 and can run on the processor 120; when the processor 120 executes the crash information storage program, the above-mentioned method for storing crash information of a multi-core system is implemented.
  • a "computer-readable medium" can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with an instruction execution system, apparatus, or device.
  • more specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM).
  • the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
  • first and second are only used for descriptive purposes, and should not be construed as indicating or implying relative importance or implying the number of indicated technical features. Thus, a feature delimited with “first”, “second” may expressly or implicitly include at least one of that feature.
  • plurality means at least two, such as two, three, etc., unless expressly and specifically defined otherwise.
  • the terms “installed”, “joined”, “connected”, “fixed” and the like should be understood in a broad sense; for example, a connection may be a fixed connection, a detachable connection, or an integral one; it may be a mechanical connection or an electrical connection; it may be a direct connection or an indirect connection through an intermediate medium; and it may be an internal communication between two elements or an interaction relationship between two elements, unless otherwise expressly limited.
  • a first feature being “on” or “under” a second feature may mean that the first and second features are in direct contact, or that the first and second features are in indirect contact through an intermediate medium.
  • the first feature being “above”, “over” or “on top of” the second feature may mean that the first feature is directly above or obliquely above the second feature, or simply that the first feature is at a higher level than the second feature.
  • the first feature being “under”, “below” or “beneath” the second feature may mean that the first feature is directly below or obliquely below the second feature, or simply that the first feature is at a lower level than the second feature.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)
  • Debugging And Monitoring (AREA)

Abstract

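A method and apparatus for storing crash information of a multi-core system, a medium, and an electronic device. The method for storing crash information of the multi-core system includes the following steps: a first processor determines, through inter-core communication of the multi-core system, whether a second processor is in an interrupt failure state in which it does not respond to interrupts (S101); and when the second processor is in the interrupt failure state, the first processor performs an action so that crash information of the second processor is acquired into a storage device of the multi-core system (S102), wherein the first processor is in an interrupt valid state in which it responds to interrupts. With this method, the processing core in which a watchdog interrupt occurs is not required to respond to the interrupt, yet the validity of the random access memory image of that processing core can still be guaranteed, which provides more crash-related information for subsequent debugging and analysis.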

Description

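Method for storing crash information of a multi-core system, medium, and electronic device
Cross-Reference to Related Applications
The present disclosure claims priority to Chinese patent application No. 202011408857.9, filed on December 3, 2020 and entitled "Method for storing crash information of a multi-core system, medium, and electronic device", the entire contents of which are incorporated into the present disclosure by reference.
Technical Field
The present application relates to the field of embedded technologies, and in particular to a method for storing crash information of a multi-core system, as well as a medium and an electronic device.
Background
Watchdog is a monitoring technique commonly used in embedded software, and it includes both a software part and a hardware part. The hardware part includes a hardware timer; if the timer is not reset within a few seconds, it notifies the PMIC (Power Management Integrated Circuit) unit of the system to reset the system. The software part can be implemented by a timer-scheduled process that periodically resets the hardware timer to prevent the PMIC unit from resetting the system. The watchdog can actively reset the system when the system is stuck and cannot work normally, so that the system returns to normal operation.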
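The interplay between the hardware timer and the software process that resets it can be illustrated with a small simulation. This is only a sketch in Python rather than firmware; the class name `WatchdogTimer`, the `simulate` helper, and the 0.5 s polling step are illustrative assumptions and do not come from the application.

```python
class WatchdogTimer:
    """Toy model of a hardware watchdog: it expires unless it is kicked in time."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_kick = 0.0

    def kick(self, now: float) -> None:
        # The software side resets ("feeds") the hardware timer.
        self.last_kick = now

    def expired(self, now: float) -> bool:
        # True once more than timeout_s has elapsed since the last kick.
        return now - self.last_kick > self.timeout_s


def simulate(kick_times, timeout_s, end_time, step=0.5):
    """Return the first polled time at which the watchdog has expired, else None."""
    wdt = WatchdogTimer(timeout_s)
    pending = sorted(kick_times)
    t = 0.0
    while t <= end_time:
        # Apply every kick that is due at or before the current time.
        while pending and pending[0] <= t:
            wdt.kick(pending.pop(0))
        if wdt.expired(t):
            return t  # in a real system the PMIC would now be asked to reset
        t += step
    return None
```

If the kicking process keeps running, `simulate` returns `None`; if it stops (for example because the core is stuck), the function returns the first polled time after the timeout has elapsed.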
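In the related art, when a watchdog interrupt occurs, the watchdog timeout triggers an interrupt, and the interrupt handler triggers a crash of the whole system. During the crash, the TCM (Tightly Coupled Memory) and the Cache are flushed to the RAM (Random Access Memory), and then the contents of the entire RAM are saved to the file system for subsequent debugging and analysis.
However, although the system crash triggered by the watchdog timer timeout described above can save a valid memory image, this presupposes that the processing core corresponding to the watchdog can respond to interrupts when the watchdog timer times out. If the corresponding processing core has disabled interrupts when the watchdog timer times out, the crash procedure cannot be triggered actively. In the resulting memory image file the Cache has not been flushed, so some data is invalid, which affects the subsequent analysis.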
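Summary
The present application aims to solve, at least to some extent, one of the technical problems in the related art. To this end, one object of the present application is to propose a method for storing crash information of a multi-core system, so as to ensure the validity of the random access memory portion of the crashed processing core.
A second object of the present application is to propose a computer storage medium.
A third object of the present application is to propose an electronic device.
In a first aspect, the present application proposes a method for storing crash information of a multi-core system, including the following steps: a first processor determines, through inter-core communication of the multi-core system, whether a second processor is in an interrupt failure state in which it does not respond to interrupts; and when the second processor is in the interrupt failure state, the first processor performs an action so that crash information of the second processor is acquired into a storage device of the multi-core system, wherein the first processor is in an interrupt valid state in which it responds to interrupts.
With the method for storing crash information of a multi-core system of the present application, when inter-core communication of the multi-core system shows that the processing core in which a watchdog interrupt occurred is in the interrupt failure state, the crash information of that processing core can be flushed to the random access memory of the multi-core system through an action of at least one processing core that is in the interrupt valid state. Therefore, the processing core in which the watchdog interrupt occurred is not required to respond to the interrupt, yet the validity of its random access memory image can still be guaranteed, which provides more crash-related information for subsequent debugging and analysis.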
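According to an embodiment of the present application, when the second processor is in the interrupt failure state and the first processor is in the interrupt valid state, the first processor acquires the crash information of the second processor and stores the acquired crash information into the storage device of the multi-core system.
According to an embodiment of the present application, the first processor accesses the storage space of the second processor to acquire the crash information of the second processor.
According to an embodiment of the present application, the first processor accesses the TCM and/or Cache of the second processor through an inter-core AXI interface, so as to acquire the crash information in the TCM and/or Cache into the storage device of the multi-core system.
According to an embodiment of the present application, the first processor acquires the crash information of the second processor according to a mapping relationship between the first processor and the second processor.
According to an embodiment of the present application, the first processor sets the Cache of the multi-core system to a Fresh mode, so that the second processor acquires the crash information into the storage device of the multi-core system after the multi-core system is restarted.
According to an embodiment of the present application, the first processor and the second processor send heartbeat information to each other every first preset time to determine whether the first processor and/or the second processor is in the interrupt failure state.
According to an embodiment of the present application, the first processor and the second processor send inter-core interrupts to each other every first preset time, and each determines whether the other responds to the inter-core interrupt; if the first processor or the second processor fails to respond to the inter-core interrupt more than once, it is determined that the first processor or the second processor is in the interrupt failure state.
According to an embodiment of the present application, when at least one of the first processor and the second processor has a watchdog interrupt and is in the interrupt valid state, state information of the multi-core system is acquired into the storage device of the multi-core system.
According to an embodiment of the present application, when at least one of the first processor and the second processor has a watchdog interrupt and is in the interrupt valid state, at least one of the first processor and the second processor sends an inter-core interrupt to the other processors, forcing the other processors to acquire the TCM and/or Cache into the storage device of the multi-core system.
According to an embodiment of the present application, the storage device of the multi-core system is RAM.
According to an embodiment of the present application, the crash information includes state information of the multi-core system.
According to an embodiment of the present application, the inter-core interrupt is an IPI.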
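In a second aspect, the present application proposes a computer-readable storage medium on which a crash information storage program of a multi-core system is stored; when the crash information storage program of the multi-core system is executed by a processor, the method for storing crash information of a multi-core system described in the first aspect is implemented.
In a third aspect, the present application proposes an electronic device, including a memory, a processor, and a crash information storage program of a multi-core system that is stored on the memory and can run on the processor; when the processor executes the crash information storage program, the method for storing crash information of a multi-core system described in the first aspect is implemented.
Additional aspects and advantages of the present application will be given in part in the following description, and in part will become apparent from the following description or be learned through practice of the present application.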
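Brief Description of the Drawings
FIG. 1 is a flowchart of a method for storing crash information of a multi-core system according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for storing crash information of a multi-core system according to a specific embodiment of the present application;
FIG. 3 is a schematic diagram of communication between multiple processing cores in a multi-core system according to a specific embodiment of the present application;
FIG. 4 is a structural block diagram of an electronic device according to an embodiment of the present application.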
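Detailed Description
Embodiments of the present application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are intended to explain the present application; they should not be construed as limiting the present application.
The method for storing crash information of a multi-core system, the medium, and the electronic device according to embodiments of the present application are described below with reference to the drawings.
FIG. 1 is a flowchart of a method for storing crash information of a multi-core system according to an embodiment of the present application.
As shown in FIG. 1, the method for storing crash information of the multi-core system includes the following steps: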
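S101: the first processor determines, through inter-core communication of the multi-core system, whether the second processor is in an interrupt failure state in which it does not respond to interrupts.
Here, the second processor is the processing core in which a watchdog interrupt occurs, and the first processor is in an interrupt valid state in which it responds to interrupts.
Specifically, the multi-core system includes multiple processing cores with communication connections between them, and the multi-core system may be an integrated chip. Each processing core may be equipped with a corresponding watchdog module; each processing core can configure the watchdog's internal registers for its watchdog module and enable the watchdog by configuring the watchdog control register. Each processing core can periodically send a feed signal to its watchdog module, and when the watchdog module receives the first feed signal, its watchdog counter starts to count.
When the count value of any watchdog counter overflows for the first time and the watchdog interrupt function is enabled, a watchdog interrupt is generated, that is, a watchdog interrupt occurs in one of the processing cores. At the same time, that processing core is regarded as the second processor, and the other processing cores are regarded as the first processor. Further, the first processor can monitor the state of the processing core in which the watchdog interrupt occurred through its communication connection with the second processor, for example by checking whether heartbeat information is exchanged between the first processor and the second processor, so as to determine whether that processing core is in the interrupt failure state.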
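S102: when the second processor is in the interrupt failure state, the first processor performs an action so that the crash information of the second processor is acquired into the storage device of the multi-core system.
Here, the storage device of the multi-core system is RAM.
In this embodiment, when the second processor is in the interrupt failure state, it cannot respond to interrupts. In that case, an action of at least one processing core among the first processors that are in the interrupt valid state (for example, a processing core selected by arbitration as having the best processing capability) can flush the crash information of the processing core in which the watchdog interrupt occurred to the random access memory (RAM) of the multi-core system, such as a double data rate synchronous dynamic random access memory (DDR SDRAM). Specifically, the crash information of the second processor can be flushed to the random access memory of the multi-core system through inter-core communication; alternatively, the Cache of the multi-core system can be set to the Fresh mode, so that the crash information of the second processor is flushed to the random access memory of the multi-core system when the multi-core system restarts.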
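The selection of a helper core described above might look roughly like the following sketch. The application does not say how the arbitration is performed, so the `capability` score, the `Core` record, and the function `pick_helper` are hypothetical names introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    interrupts_enabled: bool  # True means the "interrupt valid state"
    capability: int           # hypothetical score used for arbitration


def pick_helper(cores, failed_id):
    """Pick the most capable core that can still respond to interrupts.

    Returns None when no other core is in the interrupt valid state.
    """
    candidates = [
        c for c in cores
        if c.core_id != failed_id and c.interrupts_enabled
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.capability)
```

With four cores of which Core0 has failed and Core3 has interrupts disabled, the helper is chosen from Core1 and Core2 only, whichever has the higher score.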
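Thus, the method for storing crash information of a multi-core system according to the embodiment of the present application does not require the processing core in which the watchdog interrupt occurs to respond to the interrupt, yet can still guarantee the validity of the random access memory image of that processing core, which provides more crash-related information for subsequent debugging and analysis.
In some embodiments, the first processor and the second processor send heartbeat information to each other every first preset time to determine whether the first processor and/or the second processor is in the interrupt failure state.
The first preset time can be calibrated as required, for example to a value from 0.5s to 3s. Determining whether the first processor and/or the second processor is in the interrupt failure state by sending heartbeat information to each other every first preset time may specifically include: the first processor and the second processor send inter-core interrupts to each other every first preset time, and each determines whether the other responds to the inter-core interrupt; if the first processor or the second processor fails to respond to the inter-core interrupt more than once, it is determined that the first processor or the second processor is in the interrupt failure state.
Specifically, the first processor sends an inter-core interrupt to the second processor every first preset time and detects whether the second processor responds to the inter-core interrupt; if the second processor fails to respond to the inter-core interrupt several consecutive times (for example, a value from 3 to 10 times), it is determined that the second processor is in the interrupt failure state. In this way, whether the second processor is in the interrupt failure state can be monitored accurately.
Further, the second processor may also send an inter-core interrupt to the first processor every first preset time; if the first processor fails to respond to the inter-core interrupt several consecutive times (for example, a value from 3 to 10 times), it is determined that the first processor is in the interrupt failure state.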
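The consecutive-miss rule can be sketched as a small counter. This is a Python model of the decision logic only, not of the interrupt hardware; the class name `HeartbeatMonitor` and the default threshold of 5 are assumptions chosen from within the 3 to 10 range mentioned above.

```python
class HeartbeatMonitor:
    """Tracks consecutive unanswered inter-core interrupts for one peer core."""

    def __init__(self, miss_threshold: int = 5) -> None:
        self.miss_threshold = miss_threshold
        self.consecutive_misses = 0

    def record(self, responded: bool) -> bool:
        """Record one heartbeat round.

        Returns True once the peer is judged to be in the interrupt failure state.
        """
        if responded:
            # Any response resets the count: only consecutive misses matter.
            self.consecutive_misses = 0
        else:
            self.consecutive_misses += 1
        return self.consecutive_misses >= self.miss_threshold
```

Each core would keep one such monitor per peer and call `record` once per first preset time; an isolated missed response does not trigger the failure verdict.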
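Optionally, the second processor may also determine whether the first processor is in the interrupt failure state according to whether it receives the inter-core interrupts sent by the first processor. Specifically, the second processor may know in advance the moments at which the first processor sends inter-core interrupts to the second processor; for example, the second processor may record the moment at which the first processor first sends an inter-core interrupt to it and, taking that moment as a reference, calculate the subsequent sending moments from the first preset time known in advance. Then, if several consecutive times (for example, a value from 3 to 10 times) the second processor fails to receive an inter-core interrupt from the first processor at the pre-calculated sending moment, it is determined that the first processor is in the interrupt failure state.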
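The receiver-side check can be sketched as follows: expected arrival times are derived from the first observed interrupt and the known interval, and consecutive absences are counted. The function name `sender_interrupt_failed`, the `rounds` parameter, and the arrival `tolerance` are assumptions added for the illustration; the application does not specify a tolerance.

```python
def sender_interrupt_failed(first_time, interval, arrivals, rounds,
                            miss_threshold, tolerance=0.1):
    """Judge the sender from the receiver's side.

    first_time:     time of the first inter-core interrupt that was received
    interval:       the first preset time between interrupts
    arrivals:       times at which later interrupts were actually received
    rounds:         how many expected interrupts to check
    miss_threshold: consecutive misses needed to declare interrupt failure
    """
    misses = 0
    for k in range(1, rounds + 1):
        expected = first_time + k * interval
        seen = any(abs(a - expected) <= tolerance for a in arrivals)
        misses = 0 if seen else misses + 1
        if misses >= miss_threshold:
            return True
    return False
```

For instance, with a 1 s interval and a threshold of 3, a sender whose interrupts stop after the second one is declared failed at the fifth expected time.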
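For example, taking a multi-core system that includes two processing cores, the two processing cores may be denoted as a master processing core Core1 and a slave processing core Core0, see FIG. 2. The two processing cores can not only reset the watchdog register but also communicate with each other while doing so, in order to determine each other's survival state. For example, the above-mentioned inter-core interrupt is an IPI (Inter-Processor Interrupt), so that each processing core can determine the survival state of the other through IPIs.
Referring to FIG. 2, after the multi-core system starts, the master processing core Core1 can send an IPI to the slave processing core Core0 every first preset time, such as 1s, to inform the slave processing core Core0 that the master processing core Core1 has not crashed; after receiving the IPI sent by the master processing core Core1, the slave processing core Core0 can respond to the IPI and feed back corresponding information to the master processing core Core1, so as to inform the master processing core Core1 that the slave processing core Core0 has not crashed. If a watchdog interrupt has occurred in the slave processing core Core0, and the master processing core Core1 detects that the slave processing core Core0 fails to respond to the inter-core interrupt several consecutive times, such as 5 times, it can be determined that the slave processing core Core0 in which the watchdog interrupt occurred is in the interrupt failure state; otherwise, it is determined that the slave processing core Core0 in which the watchdog interrupt occurred is in the interrupt valid state.
As an example, referring to FIG. 2, when at least one of the first processor and the second processor has a watchdog interrupt and is in the interrupt valid state, the state information of the multi-core system is acquired into the storage device of the multi-core system. Therefore, the validity of the crash information of the processing core in which the watchdog interrupt occurs can be ensured without inter-core operations, which facilitates subsequent debugging and analysis.
Here, the crash information includes the state information of the multi-core system.
As an example, referring to FIG. 2, when at least one of the first processor and the second processor has a watchdog interrupt and is in the interrupt valid state, at least one of the first processor and the second processor sends an inter-core interrupt to the other processors, forcing the other processors to acquire the TCM and/or Cache into the storage device of the multi-core system. In this way, more information about the multi-core system at the time of the watchdog interrupt can be obtained, which facilitates subsequent debugging and analysis and ensures the accuracy of the debugging data.
Optionally, referring to FIG. 2, regardless of whether a watchdog interrupt has occurred, if a processing core can respond to the inter-core interrupt, that processing core is working normally; otherwise, if a processing core fails to respond to the interrupt several consecutive times, that processing core has a problem. In this case, a processing core that is working normally needs to actively trigger the system crash.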
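In some embodiments, when the second processor is in the interrupt failure state and the first processor is in the interrupt valid state, the first processor acquires the crash information of the second processor and stores the acquired crash information into the storage device of the multi-core system.
As one feasible example, the first processor accesses the storage space of the second processor to acquire the crash information of the second processor.
Here, the first processor accesses the TCM and/or Cache of the second processor through an inter-core AXI (Advanced eXtensible Interface) interface, so as to acquire the crash information in the TCM and/or Cache into the storage device of the multi-core system.
Specifically, referring to FIG. 3, multiple processing cores Core0 to Coren access each other's TCM and Cache through the inter-core interface AXI. When it is detected that the second processor is in the interrupt failure state, the first processor can actively call the inter-core interface to help the second processor flush the TCM and L1 cache to the L2 Cache, and then actively trigger a crash to flush the L2 Cache to the random access memory (RAM) and save the memory image. It should be noted that, in FIG. 3, DTCM refers to a data transmission bus and ITCM refers to an instruction transmission bus.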
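The two-step flush order (TCM and L1 into L2, then L2 into RAM) can be modelled with dictionaries that map addresses to values. This is a conceptual sketch only: real flushes are performed by cache-maintenance hardware over the inter-core interface, and the function names below are hypothetical.

```python
def helper_flush_to_l2(tcm, l1, l2):
    """Step 1: the helper core copies the failed core's TCM and L1 contents into L2."""
    l2.update(tcm)
    l2.update(l1)


def crash_flush_to_ram(l2, ram):
    """Step 2: the actively triggered crash path writes L2 back to RAM.

    Returns a snapshot of RAM, standing in for the saved memory image.
    """
    ram.update(l2)
    return dict(ram)


def save_crash_image(tcm, l1, l2, ram):
    """Run both steps in the order described in the text."""
    helper_flush_to_l2(tcm, l1, l2)
    return crash_flush_to_ram(l2, ram)
```

If step 1 is skipped, the image keeps whatever stale value RAM held for an address whose latest value was still in the failed core's L1, which is the invalid-data problem the text describes.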
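As another feasible example, the first processor acquires the crash information of the second processor according to a mapping relationship between the first processor and the second processor.
In this example, a mapping relationship between any two processing cores may be established in advance, and the first processor then obtains the mapping relationship between the first processor and the second processor. For example, for a multi-core system including four processing cores, a mapping relationship between Core0 and Core3 and a mapping relationship between Core1 and Core2 may be preset; when a watchdog interrupt occurs in Core0, the crash information of the processing core in which the watchdog interrupt occurred can be acquired through Core3.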
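The four-core pairing in this example can be written as a lookup table. Treating the mapping as symmetric (Core3 fails, Core0 collects) is an assumption of this sketch; the text only gives the Core0 case explicitly. `PEER_MAP` and `collector_for` are illustrative names.

```python
# Preset pairing from the example: Core0 <-> Core3, Core1 <-> Core2.
PEER_MAP = {0: 3, 3: 0, 1: 2, 2: 1}


def collector_for(failed_core, peer_map=PEER_MAP):
    """Return the core that should collect crash information for failed_core."""
    return peer_map[failed_core]
```

A static table like this avoids any run-time arbitration: each core knows in advance which peer it is responsible for.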
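In other embodiments, the first processor sets the Cache of the multi-core system to the Fresh mode, so that the second processor acquires the crash information into the storage device of the multi-core system after the multi-core system is restarted.
Specifically, when the second processor is in the interrupt failure state, if the multi-core system does not support cross-core access to the Cache or TCM, the first processor can instead perform an action to set the Cache of the multi-core system to the Fresh mode. This ensures that the contents of the Cache and/or TCM are preserved even when the multi-core system is warm-started, so that they can be flushed to RAM upon restart, after which the memory image is saved.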
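The choice between the two approaches, and the effect of the Fresh mode across a warm restart, can be modelled as below. The application does not define the Fresh mode beyond the behaviour described above, so this sketch simply assumes that without it the cache contents are lost on restart; `choose_strategy` and `warm_restart` are hypothetical names.

```python
def choose_strategy(cross_core_access_supported: bool) -> str:
    """Pick how the helper core gets the failed core's data into RAM."""
    if cross_core_access_supported:
        return "cross_core_flush"  # flush the TCM/Cache over the inter-core interface
    return "fresh_mode"            # preserve contents and flush after the restart


def warm_restart(cache, ram, fresh_mode):
    """Model a warm restart of the system.

    With the Fresh mode set, the retained cache contents are flushed to RAM
    during the restart; otherwise (an assumption of this model) they are lost.
    """
    if fresh_mode:
        ram.update(cache)
    cache.clear()
    return ram
```

In this model the memory image saved after the restart contains the failed core's data only when the Fresh mode was set beforehand.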
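In summary, with the method for storing crash information of a multi-core system of the present application, when a processing core cannot respond to interrupts and the multi-core system is about to crash, the other processing cores can flush the cache, save variables, print logs, and so on. This ensures the validity of the random access memory portion of the crashed processing core in the random access memory image, provides more crash-related information, and helps to better solve the problem that crashes caused by watchdog interrupts are difficult to debug.
The present application also proposes a computer-readable storage medium.
In this embodiment, a crash information storage program of a multi-core system is stored on the computer-readable storage medium; when the crash information storage program of the multi-core system is executed by a processor, the method for storing crash information of a multi-core system of the above embodiments is implemented.
FIG. 4 is a structural block diagram of an electronic device according to an embodiment of the present application.
As shown in FIG. 4, the electronic device 100 includes a memory 110, a processor 120, and a crash information storage program of a multi-core system that is stored on the memory 110 and can run on the processor 120; when the processor 120 executes the crash information storage program, the above-mentioned method for storing crash information of a multi-core system is implemented.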
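It should be noted that the logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that the parts of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following technologies known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and so on.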
在本说明书的描述中,参考术语“一个实施例”、“一些实施例”、“示例”、“具体示例”、或“一些示例”等的描述意指结合该实施例或示例描述的具体特征、结构、材料或者特点包含于本申请的至少一个实施例或示例中。在本说明书中,对上述术语的示意性表述不一定指的是相同的实施例或示例。而且,描述的具体特征、结构、材料或者特点可以在任何的一个或多个实施例或示例中以合适的方式结合。
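In the description of the present application, it should be understood that orientations or positional relationships indicated by terms such as "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", and "circumferential" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present application, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be construed as limiting the present application.
In addition, the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature defined with "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means at least two, for example two or three, unless otherwise expressly and specifically defined.
In the present application, unless otherwise expressly specified and defined, terms such as "mounted", "connected to", "connected", and "fixed" should be understood broadly. For example, a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection or an indirect connection through an intermediate medium; and it may be internal communication between two elements or an interaction between two elements, unless otherwise expressly defined. Those of ordinary skill in the art can understand the specific meanings of these terms in the present application according to the specific circumstances.
In the present application, unless otherwise expressly specified and defined, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediate medium. Moreover, a first feature being "over", "above", or "on top of" a second feature may mean that the first feature is directly or obliquely above the second feature, or merely that the first feature is at a higher level than the second feature. A first feature being "beneath", "below", or "underneath" a second feature may mean that the first feature is directly or obliquely below the second feature, or merely that the first feature is at a lower level than the second feature.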
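Although embodiments of the present application have been shown and described above, it should be understood that these embodiments are exemplary and are not to be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.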

Claims (15)

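  1. A crash information storage method for a multi-core system, the multi-core system comprising a first processor and a second processor, characterized by comprising the following steps:
    the first processor determining, through inter-core communication of the multi-core system, whether the second processor is in an interrupt-failure state in which it does not respond to interrupts;
    when the second processor is in the interrupt-failure state, the first processor acting so that crash information of the second processor is saved to a storage device of the multi-core system,
    wherein the first processor is in an interrupt-valid state in which it responds to interrupts.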
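  2. The crash information storage method for a multi-core system according to claim 1, characterized by comprising the following step:
    when the second processor is in the interrupt-failure state and the first processor is in the interrupt-valid state, the first processor obtaining the crash information of the second processor and saving the obtained crash information to the storage device of the multi-core system.
  3. The crash information storage method for a multi-core system according to claim 2, characterized by comprising the following step:
    the first processor accessing a storage space of the second processor to obtain the crash information of the second processor.
  4. The crash information storage method for a multi-core system according to claim 3, characterized in that the first processor accesses the TCM and/or Cache of the second processor through an inter-core AXI interface, so as to save the crash information in the TCM and/or Cache to the storage device of the multi-core system.
  5. The crash information storage method for a multi-core system according to claim 2, characterized by comprising the following step:
    the first processor obtaining the crash information of the second processor according to a mapping relationship between the first processor and the second processor.
  6. The crash information storage method for a multi-core system according to claim 1, characterized by comprising the following step:
    setting, by the first processor, the Cache of the multi-core system to Fresh mode, so that the second processor saves the crash information to the storage device of the multi-core system after the multi-core system restarts.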
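  7. The crash information storage method for a multi-core system according to any one of claims 1-6, characterized by comprising the following step:
    the first processor and the second processor sending heartbeat information to each other every first preset time to determine whether the first processor and/or the second processor is in the interrupt-failure state.
  8. The crash information storage method for a multi-core system according to claim 7, characterized by comprising the following steps:
    the first processor and the second processor sending inter-core interrupts to each other every first preset time, and each determining whether the other responds to the inter-core interrupt;
    if the first processor or the second processor fails to respond to the inter-core interrupt one or more times, determining that the first processor or the second processor is in the interrupt-failure state.
  9. The crash information storage method for a multi-core system according to claim 1, characterized in that when at least one of the first processor and the second processor experiences a watchdog interrupt and is in the interrupt-valid state, state information of the multi-core system is saved to the storage device of the multi-core system.
  10. The crash information storage method for a multi-core system according to claim 9, characterized in that when at least one of the first processor and the second processor experiences a watchdog interrupt and is in the interrupt-valid state, the at least one of the first processor and the second processor sends an inter-core interrupt to the other processors, forcing the other processors to save their TCM and/or Cache to the storage device of the multi-core system.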
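  11. The crash information storage method for a multi-core system according to any one of claims 1-6, characterized in that
    the storage device of the multi-core system is a RAM.
  12. The crash information storage method for a multi-core system according to any one of claims 1-6, characterized in that
    the crash information includes state information of the multi-core system.
  13. The crash information storage method for a multi-core system according to claim 8 or 10, characterized in that
    the inter-core interrupt is an IPI interrupt.
  14. A computer-readable storage medium, characterized in that a crash information storage program for a multi-core system is stored thereon, and when the crash information storage program is executed by a processor, it implements the crash information storage method for a multi-core system according to any one of claims 1-13.
  15. An electronic device, characterized by comprising a memory, a processor, and a crash information storage program for a multi-core system that is stored in the memory and executable on the processor, wherein when the processor executes the crash information storage program, the crash information storage method for a multi-core system according to any one of claims 1-13 is implemented.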
PCT/CN2021/127102 2020-12-03 2021-10-28 Crash information storage method for multi-core system, medium, and electronic device WO2022116755A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011408857.9A CN112463430B (zh) 2020-12-03 2020-12-03 Crash information storage method for multi-core system, medium, and electronic device
CN202011408857.9 2020-12-03

Publications (1)

Publication Number Publication Date
WO2022116755A1 true WO2022116755A1 (zh) 2022-06-09

Family

ID=74805845

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/127102 WO2022116755A1 (zh) 2020-12-03 2021-10-28 Crash information storage method for multi-core system, medium, and electronic device

Country Status (2)

Country Link
CN (1) CN112463430B (zh)
WO (1) WO2022116755A1 (zh)



Also Published As

Publication number Publication date
CN112463430B (zh) 2022-10-25
CN112463430A (zh) 2021-03-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21899789

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21899789

Country of ref document: EP

Kind code of ref document: A1