WO2021253855A1 - Information recording method, apparatus, device, and readable storage medium - Google Patents

Information recording method, apparatus, device, and readable storage medium

Info

Publication number
WO2021253855A1
WO2021253855A1 PCT/CN2021/076949 CN2021076949W WO2021253855A1 WO 2021253855 A1 WO2021253855 A1 WO 2021253855A1 CN 2021076949 W CN2021076949 W CN 2021076949W WO 2021253855 A1 WO2021253855 A1 WO 2021253855A1
Authority
WO
WIPO (PCT)
Prior art keywords
ring buffer
server
information recording
address
accelerator card
Prior art date
Application number
PCT/CN2021/076949
Other languages
English (en)
French (fr)
Inventor
李振辉
郝锐
王彦伟
Original Assignee
浪潮电子信息产业股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浪潮电子信息产业股份有限公司
Priority to US18/011,378 (US12026037B2)
Publication of WO2021253855A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668 Details of memory controller
    • G06F13/1673 Details of memory controller using buffers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/073 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a memory management context, e.g. virtual memory or cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0778 Dumping, i.e. gathering error/state information after a fault for later diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0787 Storage of error reports, e.g. persistent data storage, storage using memory protection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026 PCI express
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of computer technology, and in particular to an information recording method, device, equipment, and readable storage medium.
  • When an FPGA accelerator card based on the OpenPower platform is tested for accelerating data processing in a server, server downtime can occur.
  • Under normal circumstances, the kdump service in the server can capture debugging-related information, so that after the server fails, technicians can analyze the information to locate the fault.
  • However, if the downtime causes a server CPU error, the kdump service cannot run normally, so the relevant information cannot be captured.
  • The purpose of this application is to provide an information recording method, apparatus, device, and readable storage medium that capture debugging-related information when a server failure causes a CPU error.
  • The specific scheme is as follows:
  • This application provides an information recording method, including:
  • if the server is started, determining a ring buffer in the DDR of an FPGA accelerator card based on the OpenPower platform, where the FPGA accelerator card is communicatively connected to the server;
  • while the server is running, recording preset debugging information to the ring buffer in real time, so that after the server fails, the fault can be located based on the data in the ring buffer.
  • Determining the ring buffer in the DDR of the FPGA accelerator card based on the OpenPower platform includes:
  • reading the storage space size of the DDR, calculating the buffer space size of the ring buffer according to a preset ratio, and allocating the ring buffer in the DDR according to the buffer space size.
  • The method further includes:
  • after the FPGA accelerator card initializes the current data write address of the ring buffer, mapping the ring buffer to the server through PCIE.
  • Recording the preset debugging information to the ring buffer in real time includes:
  • recording the preset debugging information to the ring buffer in ASCII form in real time, and updating the current data write address accordingly.
  • The method further includes:
  • if the ring buffer overflows, performing a backup operation or a reset operation on the ring buffer according to the preset configuration information.
  • Performing the backup operation on the ring buffer includes:
  • determining a backup storage area in the DDR, and backing up the data in the ring buffer to the backup storage area.
  • Locating the fault based on the data in the ring buffer includes:
  • reading the data in the ring buffer or the backup storage area through the virtual address, and writing the data to the log.txt file, so that the string data can be viewed in vim.
  • This application provides an information recording apparatus, including:
  • a determining module, configured to determine, if the server is started, a ring buffer in the DDR of an FPGA accelerator card based on the OpenPower platform, where the FPGA accelerator card is communicatively connected to the server;
  • a configuration module, configured to determine the start address and the end address of the ring buffer, and configure the start address and the end address to the FPGA accelerator card;
  • a recording module, configured to record preset debugging information to the ring buffer in real time while the server is running, so that after the server fails, the fault can be located based on the data in the ring buffer.
  • This application provides an information recording device, including:
  • a memory, configured to store a computer program;
  • a processor, configured to execute the computer program to implement the information recording method disclosed above.
  • This application provides a readable storage medium for storing a computer program, where the computer program, when executed by a processor, implements the information recording method disclosed above.
  • This application provides an information recording method, including: if the server is started, determining a ring buffer in the DDR of an FPGA accelerator card based on the OpenPower platform, where the FPGA accelerator card is communicatively connected to the server; determining the start address and end address of the ring buffer, and configuring the start address and the end address to the FPGA accelerator card; and, while the server is running, recording preset debugging information to the ring buffer in real time, so that after the server fails, the fault can be located based on the data in the ring buffer.
  • After the server is started, the ring buffer is first set up in the DDR of the FPGA accelerator card.
  • The preset debugging information can then be recorded in the ring buffer in real time while the server is running.
  • After the server fails, technicians can locate the fault based on the data in the ring buffer.
  • The DDR of the FPGA accelerator card is used to record the preset debugging information while the server is running. Therefore, when a downtime fault causes a server CPU error, the debugging information is still recorded, which helps locate the fault.
  • This also extends the function of the FPGA accelerator card and improves its usability.
  • The information recording apparatus, device, and readable storage medium provided in this application have the same technical effects.
  • FIG. 1 is a flowchart of an information recording method disclosed in this application;
  • FIG. 2 is a diagram of a connection architecture between an FPGA accelerator card and a server disclosed in this application;
  • FIG. 3 is a schematic diagram of an information recording apparatus disclosed in this application;
  • FIG. 4 is a schematic diagram of an information recording device disclosed in this application.
  • This application provides an information recording scheme that ensures debugging information is recorded when a downtime fault causes a server CPU error.
  • an embodiment of the present application discloses an information recording method, including:
  • FPGA (Field-Programmable Gate Array)
  • PCIE (Peripheral Component Interconnect Express), a high-speed serial computer expansion bus standard
  • Determining the ring buffer in the DDR (Double Data Rate synchronous dynamic random access memory) of the FPGA accelerator card based on the OpenPower platform includes: reading the storage space size of the DDR, calculating the buffer space size of the ring buffer according to the preset ratio, and allocating the ring buffer in the DDR according to the buffer space size.
  • The preset ratio is, for example, 1% of the storage space size of the DDR.
  • By default, the ring buffer can be at most 16M and at least 1M.
  • Because the OpenPower platform is not well matched with the FPGA accelerator card, downtime occurs when the acceleration of data processing in the server by the FPGA accelerator card based on the OpenPower platform is tested. This downtime causes a server CPU error, and because the kdump service in the server needs the server CPU to collect the relevant kernel information, the kdump service cannot collect test-related debugging information when the server CPU is faulty.
  • the server may be a Linux server.
  • the OpenPower platform is a platform for Linux server tuning, which is built on open industry standards.
  • S102: Determine the start address and end address of the ring buffer, and configure the start address and end address to the FPGA accelerator card.
  • After the start address and the end address are configured to the FPGA accelerator card, the method further includes: after the FPGA accelerator card initializes the current data write address of the ring buffer, the ring buffer is mapped to the server through PCIE. Initializing the current data write address can make it equal to the start address of the ring buffer, at which point no data is recorded in the ring buffer.
  • S103: While the server is running, the preset debugging information is recorded in the ring buffer in real time, so that after the server fails, the fault can be located based on the data in the ring buffer.
  • Recording the preset debugging information to the ring buffer in real time includes: recording the preset debugging information to the ring buffer in ASCII form in real time, and updating the current data write address accordingly.
  • Each time data is recorded in the ring buffer, the current data write address is updated accordingly. If the ring buffer overflows (that is, the ring buffer is full) after the current data write address is updated, the ring buffer is backed up or reset according to the preset configuration information. An overflow means that the current data write address equals the end address of the ring buffer or has exceeded it. Because the buffer is a ring, a write address beyond the end address in effect means that newly written data overwrites data recorded earlier, which causes data loss.
  • To avoid data loss, the ring buffer can be backed up.
  • The backup operation specifically includes: determining a backup storage area in the DDR (that is, setting aside a new storage area) and backing up the data in the ring buffer to it, so that the ring buffer can then be reset directly, that is, new data is written in overwrite mode. Because the original data has been backed up to the backup storage area, no data is lost.
  • If the data in the ring buffer can be discarded when the ring buffer overflows, the ring buffer can be reset directly without backing up the data.
  • Whether to reset the ring buffer directly or perform a backup operation on overflow can be set in advance by the user in the preset configuration information.
  • Locating the fault based on the data in the ring buffer includes: mapping the address of the ring buffer or the backup storage area to a virtual address through the map function; reading the data in the ring buffer or the backup storage area through the virtual address; and writing the data to the log.txt file, so that the string data can be viewed in vim.
  • Because the preset debugging information is recorded in the ring buffer in ASCII form, the string data can be viewed directly in vim, which is convenient for technicians.
  • After the server is started, the ring buffer is first set up in the DDR of the FPGA accelerator card.
  • The preset debugging information can then be recorded in the ring buffer in real time while the server is running.
  • After the server fails, the technician can locate the fault based on the data in the ring buffer.
  • The DDR of the FPGA accelerator card is used to record the preset debugging information while the server is running. Therefore, when a downtime fault causes a server CPU error, the debugging information is still recorded, which helps locate the fault. This also extends the function of the FPGA accelerator card and improves its usability.
  • After the server is started, the FPGA accelerator card powers on and performs a self-check, confirms the size of its DDR storage space, and passes that size to the server through the bar address register.
  • After the server loads the driver, it reads the bar address register to confirm the size of the DDR storage space, and then allocates 1% of the DDR storage space as the ring buffer. It then configures the start address and end address of the ring buffer to the FPGA accelerator card.
  • After the FPGA accelerator card initializes the data address register of the ring buffer (that is, the current data write address), this storage space of the ring buffer is mapped to the server through the PCIE bar address space.
  • While the server is running, the printk function in the driver writes the preset debugging information to the ring buffer in ASCII form in real time. If the registered interrupt handler in the driver determines that the ring buffer has overflowed, the FPGA accelerator card generates a PCIE MSI interrupt to notify the server, so that the server decides, according to specific needs, whether to back up or reset the ring buffer. If the ring buffer is backed up, a new storage area is determined in the DDR for backing up the data currently in the ring buffer.
  • The user can set the preset debugging information in advance to select which information is recorded during the test.
  • If the server fails, the driver maps the ring buffer and the backup storage area to a user-mode program through the remmap interface of the char device node, so that the data in them can be displayed.
  • At the application layer, the application finds the corresponding driver through the /dev/fpga device node and obtains the virtual addresses of the ring buffer and the backup storage area through the map function. It reads the data through the virtual addresses and writes the data to the log.txt file. Because the data is written to the storage area in ASCII form, the string data can be viewed directly in vim, which is convenient for technicians.
  • This embodiment allows the sizes of the ring buffer and the backup storage area to be configured independently.
  • When a downtime fault causes a server CPU error, it still ensures that debugging information is recorded and provides effective debugging information.
  • It also rounds out the function of the FPGA accelerator card and reduces the demand for hardware resources.
  • The following describes an information recording apparatus provided in an embodiment of the present application.
  • The information recording apparatus described below and the information recording method described above can be cross-referenced.
  • An embodiment of the present application discloses an information recording apparatus, including:
  • the determining module 301, configured to determine, if the server is started, a ring buffer in the DDR of the FPGA accelerator card based on the OpenPower platform, where the FPGA accelerator card is communicatively connected to the server;
  • the configuration module 302, configured to determine the start address and the end address of the ring buffer, and configure the start address and the end address to the FPGA accelerator card;
  • the recording module 303, configured to record preset debugging information to the ring buffer in real time while the server is running, so that after the server fails, the fault can be located based on the data in the ring buffer.
  • The determining module is specifically configured to: read the storage space size of the DDR, calculate the buffer space size of the ring buffer according to the preset ratio, and allocate the ring buffer in the DDR according to the buffer space size.
  • The FPGA accelerator card is specifically configured to:
  • after initializing the current data write address of the ring buffer, map the ring buffer to the server through PCIE.
  • The recording module is specifically configured to:
  • record the preset debugging information to the ring buffer in ASCII form in real time, and update the current data write address accordingly.
  • The apparatus further includes:
  • an execution module, configured to perform a backup operation or a reset operation on the ring buffer according to the preset configuration information if the ring buffer overflows.
  • The execution module is specifically configured to: determine a backup storage area in the DDR, and back up the data in the ring buffer to the backup storage area.
  • The recording module is specifically configured to: map the address of the ring buffer or the backup storage area to a virtual address through the map function; read the data in the ring buffer or the backup storage area through the virtual address; and write the data to the log.txt file, so that the string data can be viewed in vim.
  • This embodiment provides an information recording apparatus that uses the DDR of the FPGA accelerator card to record preset debugging information while the server is running, so that the debugging information is still recorded when a downtime fault causes a server CPU error. This helps locate faults, extends the function of the FPGA accelerator card, and improves its usability.
  • The following describes an information recording device provided by an embodiment of the present application.
  • The information recording device described below and the information recording method and apparatus described above can be cross-referenced.
  • An embodiment of the present application discloses an information recording device, including:
  • the memory 401, configured to store a computer program;
  • the processor 402, configured to execute the computer program to implement the method disclosed in any of the foregoing embodiments.
  • The following introduces a readable storage medium provided by an embodiment of the present application.
  • The readable storage medium described below and the information recording method, apparatus, and device described above can be cross-referenced.
  • A readable storage medium for storing a computer program, where the computer program, when executed by a processor, implements the information recording method disclosed in the foregoing embodiments.
  • For the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments; details are not repeated here.
  • The steps of the method or algorithm described in combination with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two.
  • The software module can reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of readable storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Retry When Errors Occur (AREA)
  • Debugging And Monitoring (AREA)

Abstract

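An information recording method, apparatus, device, and readable storage medium. The disclosed method includes: if a server is started, determining a ring buffer in the DDR of an FPGA accelerator card based on the OpenPower platform; determining a start address and an end address of the ring buffer, and configuring the start address and the end address to the FPGA accelerator card; and, while the server is running, recording preset debugging information to the ring buffer in real time, so that after the server fails, the fault can be located based on the data in the ring buffer. Because the preset debugging information is recorded in the DDR of the FPGA accelerator card while the server is running, the debugging information is still recorded when a downtime fault causes a server CPU error, which helps locate the fault.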

Description

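Information recording method, apparatus, device, and readable storage medium
This application claims priority to Chinese patent application No. 202010568683.6, entitled "Information recording method, apparatus, device, and readable storage medium", filed with the China Patent Office on June 19, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to an information recording method, apparatus, device, and readable storage medium.
Background
At present, when an FPGA accelerator card based on the OpenPower platform is tested for accelerating data processing in a server, server downtime faults occur. Under normal circumstances, the kdump service in the server can capture debugging-related information, so that after the server fails, technicians can analyze this information to locate the fault. However, if the downtime fault causes a server CPU error, the kdump service cannot run normally and therefore cannot capture the relevant information.
Therefore, how to capture debugging-related information when a server fault causes a CPU error is a problem to be solved by those skilled in the art.
Summary
In view of this, the purpose of this application is to provide an information recording method, apparatus, device, and readable storage medium that capture debugging-related information when a server fault causes a CPU error. The specific scheme is as follows: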
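In a first aspect, this application provides an information recording method, including:
if a server is started, determining a ring buffer in the DDR of an FPGA accelerator card based on the OpenPower platform, where the FPGA accelerator card is communicatively connected to the server;
determining a start address and an end address of the ring buffer, and configuring the start address and the end address to the FPGA accelerator card; and
while the server is running, recording preset debugging information to the ring buffer in real time, so that after the server fails, the fault can be located based on the data in the ring buffer.
Preferably, determining the ring buffer in the DDR of the FPGA accelerator card based on the OpenPower platform includes:
reading the storage space size of the DDR, calculating the buffer space size of the ring buffer according to a preset ratio, and allocating the ring buffer in the DDR according to the buffer space size.
Preferably, after configuring the start address and the end address to the FPGA accelerator card, the method further includes:
after the FPGA accelerator card initializes the current data write address of the ring buffer, mapping the ring buffer to the server through PCIE.
Preferably, recording the preset debugging information to the ring buffer in real time includes:
recording the preset debugging information to the ring buffer in ASCII form in real time, and updating the current data write address accordingly.
Preferably, after the current data write address is updated accordingly, the method further includes:
if the ring buffer overflows, performing a backup operation or a reset operation on the ring buffer according to preset configuration information.
Preferably, performing the backup operation on the ring buffer includes:
determining a backup storage area in the DDR, and backing up the data in the ring buffer to the backup storage area.
Preferably, locating the fault based on the data in the ring buffer includes:
mapping the address of the ring buffer or the backup storage area to a virtual address through the map function;
reading the data in the ring buffer or the backup storage area through the virtual address, and writing the data to a log.txt file, so that the string data can be viewed in vim.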
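In a second aspect, this application provides an information recording apparatus, including:
a determining module, configured to determine, if a server is started, a ring buffer in the DDR of an FPGA accelerator card based on the OpenPower platform, where the FPGA accelerator card is communicatively connected to the server;
a configuration module, configured to determine a start address and an end address of the ring buffer, and configure the start address and the end address to the FPGA accelerator card; and
a recording module, configured to record preset debugging information to the ring buffer in real time while the server is running, so that after the server fails, the fault can be located based on the data in the ring buffer.
In a third aspect, this application provides an information recording device, including:
a memory, configured to store a computer program; and
a processor, configured to execute the computer program to implement the information recording method disclosed above.
In a fourth aspect, this application provides a readable storage medium for storing a computer program, where the computer program, when executed by a processor, implements the information recording method disclosed above.
As can be seen from the above scheme, this application provides an information recording method, including: if a server is started, determining a ring buffer in the DDR of an FPGA accelerator card based on the OpenPower platform, where the FPGA accelerator card is communicatively connected to the server; determining a start address and an end address of the ring buffer, and configuring the start address and the end address to the FPGA accelerator card; and, while the server is running, recording preset debugging information to the ring buffer in real time, so that after the server fails, the fault can be located based on the data in the ring buffer.
Thus, after the server is started, this application first sets up a ring buffer in the DDR of the FPGA accelerator card, so that preset debugging information can be recorded to the ring buffer in real time while the server is running; after the server fails, technicians can locate the fault based on the data in the ring buffer. Because the DDR of the FPGA accelerator card is used to record the preset debugging information while the server is running, the debugging information is still recorded when a downtime fault causes a server CPU error. This helps locate the fault, extends the function of the FPGA accelerator card, and improves its usability.
Accordingly, the information recording apparatus, device, and readable storage medium provided in this application have the same technical effects.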
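Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of this application, and those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an information recording method disclosed in this application;
FIG. 2 is a diagram of a connection architecture between an FPGA accelerator card and a server disclosed in this application;
FIG. 3 is a schematic diagram of an information recording apparatus disclosed in this application;
FIG. 4 is a schematic diagram of an information recording device disclosed in this application.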
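Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
At present, when an FPGA accelerator card based on the OpenPower platform is tested for accelerating data processing in a server, a downtime fault that causes a server CPU error makes it impossible to capture debugging-related information. For this reason, this application provides an information recording scheme that ensures debugging information is recorded when a downtime fault causes a server CPU error.
Referring to FIG. 1, an embodiment of this application discloses an information recording method, including:
S101: If the server is started, determine a ring buffer in the DDR of the FPGA accelerator card based on the OpenPower platform.
The FPGA (Field-Programmable Gate Array) accelerator card is communicatively connected to the server over PCIE (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard); the connection architecture is shown in FIG. 2. When the server is started, the FPGA accelerator card powers on, performs a self-check, and passes the size of its DDR storage space to the server through the bar address register. The server can therefore determine the DDR storage space size by reading the bar address register and allocate the ring buffer.
In a specific implementation, determining the ring buffer in the DDR (Double Data Rate synchronous dynamic random access memory) of the FPGA accelerator card based on the OpenPower platform includes: reading the storage space size of the DDR, calculating the buffer space size of the ring buffer according to a preset ratio, and allocating the ring buffer in the DDR according to the buffer space size. The preset ratio is, for example, 1% of the DDR storage space size. By default, the ring buffer can be at most 16M and at least 1M.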
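The sizing rule in this implementation can be sketched as follows. This is a minimal illustration, not the driver's actual code: the function name `ring_buffer_size` and its parameters are hypothetical, reading "1M" and "16M" as mebibytes is an assumption, and treating those defaults as lower and upper clamps on the 1% figure is an assumed reading of the text.

```python
MIB = 1024 * 1024  # one mebibyte; reading "1M"/"16M" as MiB is an assumption


def ring_buffer_size(ddr_bytes: int, percent: int = 1,
                     min_bytes: int = 1 * MIB,
                     max_bytes: int = 16 * MIB) -> int:
    """Return the ring buffer size in bytes for a DDR of ddr_bytes.

    Hypothetical helper: takes `percent` of the DDR size and clamps the
    result to the range [min_bytes, max_bytes].
    """
    if ddr_bytes <= 0:
        raise ValueError("ddr_bytes must be positive")
    wanted = ddr_bytes * percent // 100  # integer math avoids float rounding
    return max(min_bytes, min(wanted, max_bytes))
```

Under these assumptions, an 8 GiB DDR yields the 16 MiB cap, and a 64 MiB DDR yields the 1 MiB floor.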
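It should be noted that, because the OpenPower platform is not well matched with the FPGA accelerator card, downtime faults occur when the acceleration of data processing in the server by the FPGA accelerator card based on the OpenPower platform is tested. Such a downtime fault causes a server CPU error, and because the kdump service in the server needs the server CPU to collect the relevant kernel information, the kdump service cannot collect test-related debugging information when the server CPU is faulty. The server may be a Linux server. The OpenPower platform is a platform tuned for Linux servers and built on open industry standards.
S102: Determine the start address and the end address of the ring buffer, and configure the start address and the end address to the FPGA accelerator card.
In a specific implementation, after the start address and the end address are configured to the FPGA accelerator card, the method further includes: after the FPGA accelerator card initializes the current data write address of the ring buffer, mapping the ring buffer to the server through PCIE. Initializing the current data write address can make it equal to the start address of the ring buffer, at which point no data is recorded in the ring buffer.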
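Step S102 and the write-address initialization can be pictured with a small model. This is only a sketch under stated assumptions: `RingLayout` and `place_ring` are hypothetical names, the buffer is placed at an arbitrary DDR offset `base`, and the end address is modeled as an exclusive bound, which the text does not specify.

```python
from typing import NamedTuple


class RingLayout(NamedTuple):
    start: int       # first byte of the ring buffer in DDR
    end: int         # one past the last byte (exclusive bound, an assumption)
    write_addr: int  # current data write address


def place_ring(ddr_bytes: int, size: int, base: int = 0) -> RingLayout:
    """Place a ring buffer of `size` bytes at DDR offset `base`."""
    if size <= 0 or base < 0 or base + size > ddr_bytes:
        raise ValueError("ring buffer does not fit in the DDR")
    start, end = base, base + size
    # Initialization: the write address equals the start address,
    # so the buffer holds no data yet.
    return RingLayout(start=start, end=end, write_addr=start)
```

In this model the pair `(start, end)` is what the server would configure to the card, and `write_addr == start` expresses the empty state described above.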
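S103: While the server is running, record preset debugging information to the ring buffer in real time, so that after the server fails, the fault can be located based on the data in the ring buffer.
In a specific implementation, recording the preset debugging information to the ring buffer in real time includes: recording the preset debugging information to the ring buffer in ASCII form in real time, and updating the current data write address accordingly.
Each time data is recorded in the ring buffer, the current data write address is updated accordingly. If the ring buffer overflows (that is, the ring buffer is full) after the current data write address is updated, a backup operation or a reset operation is performed on the ring buffer according to the preset configuration information. An overflow means that the current data write address equals the end address of the ring buffer or has exceeded it. Because the buffer is a ring, a write address beyond the end address in effect means that the data currently being written overwrites data recorded earlier in the ring buffer, which causes data loss.
To avoid data loss, a backup operation can be performed on the ring buffer. The backup operation specifically includes: determining a backup storage area in the DDR (that is, setting aside a new storage area) and backing up the data in the ring buffer to the backup storage area, so that the ring buffer can then be reset directly, that is, new data is written in overwrite mode. Because the original data has been backed up to the backup storage area, no data is lost.
Of course, if the data in the ring buffer can be discarded when the ring buffer overflows, the ring buffer is reset directly and no backup is needed. Whether the ring buffer is reset directly or backed up on overflow can be set in advance by the user in the preset configuration information.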
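The recording and overflow handling described above can be modeled in a few lines. This is a toy sketch, not the FPGA or driver logic: the class `DebugRing`, the `on_overflow` setting (standing in for the preset configuration information), and the in-memory `backups` list (standing in for backup storage areas in the DDR) are all hypothetical, and clearing the buffer on reset is a simplification of "writing new data in overwrite mode".

```python
class DebugRing:
    """Toy model of the ring buffer; all names here are hypothetical."""

    def __init__(self, start: int, end: int, on_overflow: str = "backup"):
        if end <= start:
            raise ValueError("end must be greater than start")
        if on_overflow not in ("backup", "reset"):
            raise ValueError("on_overflow must be 'backup' or 'reset'")
        self.start, self.end = start, end
        self.on_overflow = on_overflow  # stands in for the preset configuration
        self.write_addr = start         # initialized to the start address
        self.data = bytearray(end - start)
        self.backups = []               # stands in for backup areas in the DDR

    def record(self, message: str) -> None:
        """Append message as ASCII bytes and advance the write address."""
        for byte in message.encode("ascii", errors="replace"):
            self.data[self.write_addr - self.start] = byte
            self.write_addr += 1
            if self.write_addr >= self.end:  # buffer full: overflow
                self._handle_overflow()

    def _handle_overflow(self) -> None:
        if self.on_overflow == "backup":
            self.backups.append(bytes(self.data))  # copy before reuse
        # Reset: clear the buffer and move the write address back to start.
        self.data = bytearray(self.end - self.start)
        self.write_addr = self.start

    def contents(self) -> bytes:
        """Return the bytes recorded since the last reset."""
        return bytes(self.data[: self.write_addr - self.start])
```

With an 8-byte buffer in "backup" mode, recording a 10-character message leaves the first 8 bytes in `backups` and the last 2 in the buffer; in "reset" mode the first 8 bytes are simply discarded.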
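In a specific implementation, locating the fault based on the data in the ring buffer includes: mapping the address of the ring buffer or the backup storage area to a virtual address through the map function; reading the data in the ring buffer or the backup storage area through the virtual address; and writing the data to a log.txt file, so that the string data can be viewed in vim. Because the preset debugging information is recorded to the ring buffer in ASCII form, the string data can be viewed directly in vim, which is convenient for technicians.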
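A user-space reader along these lines might look like the sketch below. It assumes that the "map function" refers to an mmap-style call and that the device node supports read-only memory mapping; neither is confirmed by the text. The function name `dump_region` is hypothetical, and treating everything after the first NUL byte as unused space is also an assumption.

```python
import mmap


def dump_region(device_path: str, length: int, log_path: str = "log.txt") -> int:
    """Map `length` bytes of device_path read-only and write the text to log_path.

    Returns the number of bytes written. Bytes after the first NUL are
    treated as unused space (an assumption of this sketch).
    """
    with open(device_path, "rb") as dev:
        with mmap.mmap(dev.fileno(), length, access=mmap.ACCESS_READ) as view:
            raw = view[:length]      # read through the mapped virtual address
    text = raw.split(b"\x00", 1)[0]  # keep the ASCII records only
    with open(log_path, "wb") as log:
        log.write(text)
    return len(text)
```

On a real system `device_path` would be a node such as the `/dev/fpga` mentioned later in the description; for experimentation, any regular file at least `length` bytes long can stand in for it.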
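Thus, after the server is started, this embodiment first sets up a ring buffer in the DDR of the FPGA accelerator card, so that preset debugging information can be recorded to the ring buffer in real time while the server is running; after the server fails, technicians can locate the fault based on the data in the ring buffer. Because the DDR of the FPGA accelerator card is used to record the preset debugging information while the server is running, the debugging information is still recorded when a downtime fault causes a server CPU error. This helps locate the fault, extends the function of the FPGA accelerator card, and improves its usability.
To explain the technical solution of this application more clearly, a detailed description follows.
After the server is started, the FPGA accelerator card powers on and performs a self-check, confirms the size of its DDR storage space, and passes that size to the server through the bar address register. After the server loads the driver, it reads the bar address register to confirm the DDR storage space size, and then allocates 1% of the DDR storage space as the ring buffer. The start address and end address of the ring buffer are then configured to the FPGA accelerator card. After the FPGA accelerator card initializes the data address register of the ring buffer (that is, the current data write address), this storage space of the ring buffer is mapped to the server through the PCIE bar address space.
While the server is running, the printk function in the driver writes the preset debugging information to the ring buffer in ASCII form in real time. If the registered interrupt handler in the driver determines that the ring buffer has overflowed, the FPGA accelerator card generates a PCIE MSI interrupt to notify the server, so that the server decides, according to specific needs, whether to back up or reset the ring buffer. If the ring buffer is backed up, a new storage area is determined in the DDR for backing up the data currently in the ring buffer. The user can set the preset debugging information in advance to select which information is recorded during the test.
If the server fails, the driver maps the ring buffer and the backup storage area to a user-mode program through the remmap interface of the char device node, so that the data in them can be displayed. At the application layer, the application finds the corresponding driver through the /dev/fpga device node and obtains the virtual addresses of the ring buffer and the backup storage area through the map function. It reads the data through the virtual addresses and writes the data to the log.txt file. Because the data is written to the storage area in ASCII form, the string data can be viewed directly in vim, which is convenient for technicians.
As can be seen from the above, this embodiment allows the sizes of the ring buffer and the backup storage area to be configured independently. When a downtime fault causes a server CPU error, it still ensures that debugging information is recorded and provides effective debugging information. It also rounds out the function of the FPGA accelerator card and reduces the demand for hardware resources.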
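An information recording apparatus provided in an embodiment of this application is described below. The information recording apparatus described below and the information recording method described above can be cross-referenced.
Referring to FIG. 3, an embodiment of this application discloses an information recording apparatus, including:
a determining module 301, configured to determine, if the server is started, a ring buffer in the DDR of the FPGA accelerator card based on the OpenPower platform, where the FPGA accelerator card is communicatively connected to the server;
a configuration module 302, configured to determine the start address and the end address of the ring buffer, and configure the start address and the end address to the FPGA accelerator card; and
a recording module 303, configured to record preset debugging information to the ring buffer in real time while the server is running, so that after the server fails, the fault can be located based on the data in the ring buffer.
In a specific implementation, the determining module is specifically configured to:
read the storage space size of the DDR, calculate the buffer space size of the ring buffer according to a preset ratio, and allocate the ring buffer in the DDR according to the buffer space size.
In a specific implementation, the FPGA accelerator card is specifically configured to:
after initializing the current data write address of the ring buffer, map the ring buffer to the server through PCIE.
In a specific implementation, the recording module is specifically configured to:
record the preset debugging information to the ring buffer in ASCII form in real time, and update the current data write address accordingly.
In a specific implementation, the apparatus further includes:
an execution module, configured to perform a backup operation or a reset operation on the ring buffer according to the preset configuration information if the ring buffer overflows.
In a specific implementation, the execution module is specifically configured to:
determine a backup storage area in the DDR, and back up the data in the ring buffer to the backup storage area.
In a specific implementation, the recording module is specifically configured to:
map the address of the ring buffer or the backup storage area to a virtual address through the map function;
read the data in the ring buffer or the backup storage area through the virtual address, and write the data to a log.txt file, so that the string data can be viewed in vim.
For the more specific working process of each module and unit in this embodiment, reference may be made to the corresponding content disclosed in the foregoing embodiments; details are not repeated here.
Thus, this embodiment provides an information recording apparatus that uses the DDR of the FPGA accelerator card to record preset debugging information while the server is running, so that the debugging information is still recorded when a downtime fault causes a server CPU error. This helps locate the fault, extends the function of the FPGA accelerator card, and improves its usability.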
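An information recording device provided in an embodiment of this application is described below. The information recording device described below and the information recording method and apparatus described above can be cross-referenced.
Referring to FIG. 4, an embodiment of this application discloses an information recording device, including:
a memory 401, configured to store a computer program; and
a processor 402, configured to execute the computer program to implement the method disclosed in any of the foregoing embodiments.
A readable storage medium provided in an embodiment of this application is described below. The readable storage medium described below and the information recording method, apparatus, and device described above can be cross-referenced.
A readable storage medium for storing a computer program, where the computer program, when executed by a processor, implements the information recording method disclosed in the foregoing embodiments. For the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments; details are not repeated here.
The terms "first", "second", "third", "fourth", and the like (if any) in this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in an order other than that illustrated or described here. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, or device.
It should be noted that descriptions involving "first", "second", and the like in this application are for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments can be combined with one another, but only on the basis that those of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, the combination should be considered not to exist and not to fall within the protection scope claimed by this application.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments can be referred to one another.
The steps of the method or algorithm described in combination with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module can reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of readable storage medium known in the technical field.
Specific examples are used herein to explain the principles and implementations of this application. The description of the above embodiments is only intended to help understand the method of this application and its core idea. Meanwhile, those of ordinary skill in the art may, based on the idea of this application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting this application.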

Claims (10)

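  1. An information recording method, characterized by including:
    if a server is started, determining a ring buffer in the DDR of an FPGA accelerator card based on the OpenPower platform, where the FPGA accelerator card is communicatively connected to the server;
    determining a start address and an end address of the ring buffer, and configuring the start address and the end address to the FPGA accelerator card; and
    while the server is running, recording preset debugging information to the ring buffer in real time, so that after the server fails, the fault can be located based on the data in the ring buffer.
  2. The information recording method according to claim 1, characterized in that determining the ring buffer in the DDR of the FPGA accelerator card based on the OpenPower platform includes:
    reading the storage space size of the DDR, calculating the buffer space size of the ring buffer according to a preset ratio, and allocating the ring buffer in the DDR according to the buffer space size.
  3. The information recording method according to claim 1, characterized in that, after configuring the start address and the end address to the FPGA accelerator card, the method further includes:
    after the FPGA accelerator card initializes the current data write address of the ring buffer, mapping the ring buffer to the server through PCIE.
  4. The information recording method according to claim 3, characterized in that recording the preset debugging information to the ring buffer in real time includes:
    recording the preset debugging information to the ring buffer in ASCII form in real time, and updating the current data write address accordingly.
  5. The information recording method according to claim 4, characterized in that, after the current data write address is updated accordingly, the method further includes:
    if the ring buffer overflows, performing a backup operation or a reset operation on the ring buffer according to preset configuration information.
  6. The information recording method according to claim 5, characterized in that performing the backup operation on the ring buffer includes:
    determining a backup storage area in the DDR, and backing up the data in the ring buffer to the backup storage area.
  7. The information recording method according to claim 6, characterized in that locating the fault based on the data in the ring buffer includes:
    mapping the address of the ring buffer or the backup storage area to a virtual address through the map function;
    reading the data in the ring buffer or the backup storage area through the virtual address, and writing the data to a log.txt file, so that the string data can be viewed in vim.
  8. An information recording apparatus, characterized by including:
    a determining module, configured to determine, if a server is started, a ring buffer in the DDR of an FPGA accelerator card based on the OpenPower platform, where the FPGA accelerator card is communicatively connected to the server;
    a configuration module, configured to determine a start address and an end address of the ring buffer, and configure the start address and the end address to the FPGA accelerator card; and
    a recording module, configured to record preset debugging information to the ring buffer in real time while the server is running, so that after the server fails, the fault can be located based on the data in the ring buffer.
  9. An information recording device, characterized by including:
    a memory, configured to store a computer program; and
    a processor, configured to execute the computer program to implement the information recording method according to any one of claims 1 to 7.
  10. A readable storage medium, characterized in that it is used for storing a computer program, where the computer program, when executed by a processor, implements the information recording method according to any one of claims 1 to 7.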
PCT/CN2021/076949 2020-06-19 2021-02-19 Information recording method, apparatus, and device, and readable storage medium WO2021253855A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/011,378 US12026037B2 (en) 2020-06-19 2021-02-19 Information recording method, apparatus, and device, and readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010568683.6A CN111858116B (zh) 2020-06-19 2020-06-19 Information recording method, apparatus, and device, and readable storage medium
CN202010568683.6 2020-06-19

Publications (1)

Publication Number Publication Date
WO2021253855A1 true WO2021253855A1 (zh) 2021-12-23

Family

ID=72987772

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/076949 WO2021253855A1 (zh) 2020-06-19 2021-02-19 Information recording method, apparatus, and device, and readable storage medium

Country Status (3)

Country Link
US (1) US12026037B2 (zh)
CN (1) CN111858116B (zh)
WO (1) WO2021253855A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
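CN111858116B (zh) 2020-06-19 2024-02-13 浪潮电子信息产业股份有限公司 Information recording method, apparatus, and device, and readable storage medium
CN113706738B (zh) * 2021-09-01 2023-06-06 陕西航空电气有限责任公司 Data recording method and system for an aviation AC starter controller
CN118275825B (zh) * 2024-06-03 2024-09-10 东方电子股份有限公司 Data processing method and system for traveling-wave fault location on power distribution lines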

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101471823A (zh) * 2007-12-29 2009-07-01 大唐移动通信设备有限公司 Method and apparatus for fault localization in a communication system
US20090193298A1 (en) * 2008-01-30 2009-07-30 International Business Machines Corporation System and method of fault detection, diagnosis and prevention for complex computing systems
CN104077220A (zh) * 2014-06-10 2014-10-01 中标软件有限公司 Debugging method and apparatus for a MIPS-architecture operating system kernel
CN105204977A (zh) * 2014-06-30 2015-12-30 中兴通讯股份有限公司 System exception capture method, main system, shadow system, and smart device
CN111858116A (zh) * 2020-06-19 2020-10-30 浪潮电子信息产业股份有限公司 Information recording method, apparatus, and device, and readable storage medium

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9208008B2 (en) * 2013-07-24 2015-12-08 Qualcomm Incorporated Method and apparatus for multi-chip reduced pin cross triggering to enhance debug experience
US9384106B2 (en) * 2014-02-21 2016-07-05 Rolf Segger Real time terminal for debugging embedded computing systems
US10437694B2 (en) * 2014-02-21 2019-10-08 Rolf Segger Real time terminal for debugging embedded computing systems
US9519510B2 (en) * 2014-03-31 2016-12-13 Amazon Technologies, Inc. Atomic writes for multiple-extent operations
US10116557B2 (en) * 2015-05-22 2018-10-30 Gray Research LLC Directional two-dimensional router and interconnection network for field programmable gate arrays, and other circuits and applications of the router and network
CN106598800A (zh) * 2015-10-14 2017-04-26 中兴通讯股份有限公司 Hardware fault analysis system and method
CN106227651A (zh) * 2016-07-12 2016-12-14 南京百敖软件有限公司 Firmware debugging system based on a GPIO module
KR20180041898A (ko) * 2016-10-17 2018-04-25 에스케이하이닉스 주식회사 Memory system and operating method of memory system
US10474518B1 (en) * 2016-12-06 2019-11-12 Juniper Networks, Inc. Obtaining historical information in a device core dump
CN108628726B (zh) * 2017-03-22 2021-02-23 比亚迪股份有限公司 CPU state information recording method and apparatus
US10901829B2 (en) * 2018-05-10 2021-01-26 International Business Machines Corporation Troubleshooting using a visual communications protocol
US10810105B2 (en) * 2019-01-08 2020-10-20 International Business Machines Corporation Logging stored information for identifying a fix for and/or a cause of an error condition
CN110134540A (zh) * 2019-05-21 2019-08-16 苏州浪潮智能科技有限公司 Log information collection method, apparatus, and device, and readable storage medium

Also Published As

Publication number Publication date
US20230214286A1 (en) 2023-07-06
CN111858116B (zh) 2024-02-13
US12026037B2 (en) 2024-07-02
CN111858116A (zh) 2020-10-30

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21826781; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase. Ref document number: 18011378; Country of ref document: US
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 21826781; Country of ref document: EP; Kind code of ref document: A1