CN117234820A - An automated testing method for an SoC array server, and an SoC array server

Info

Publication number
CN117234820A
Authority
CN
China
Prior art keywords
controller
serial port
test result
soc
test
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311214753.8A
Other languages
Chinese (zh)
Other versions
CN117234820B (en
Inventor
陈卓杰
张定乾
支彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qishuo Shenzhen Technology Co ltd
Original Assignee
Qishuo Shenzhen Technology Co ltd
Application filed by Qishuo Shenzhen Technology Co ltd
Priority to CN202311214753.8A
Publication of CN117234820A
Application granted
Publication of CN117234820B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

An embodiment of the invention discloses an automated testing method for an SoC array server. The method is applied to a BMC motherboard in the SoC array server. The SoC array server further comprises a backplane, a blade board, and a switching board; the backplane comprises a backplane controller and a fan controller, and the fan controller is connected to a plurality of fans; the blade board comprises a blade board controller, an SoC board card, and a serial port controller. Specifically, the method causes the backplane controller, the fans, the blade board controller, the SoC board card, the serial port controller, and the switching board to each operate normally, verifies each of them against preset durations and preset ranges, and finally obtains the verification results and generates a corresponding test report. By verifying each part of the SoC array server with a verification method suited to it, the invention obtains accurate test results and generates a complete test report from them, thereby achieving comprehensive, accurate, and efficient testing of the SoC array server.

Description

An automated testing method for an SoC array server, and an SoC array server

Technical field

The invention relates to the technical field of equipment testing, and in particular to an automated testing method for an SoC array server and to an SoC array server.

Background

With the continuous advancement of science and technology and the arrival of the information age, the demand for data processing and computing keeps growing. An SoC (System on Chip) array server is a high-performance server that integrates functions such as processors, memory, and I/O interfaces. Its high density and high computing power can meet the growing demand for data processing, which makes it important for large-scale data centers, cloud computing, artificial intelligence, and similar fields, where it has been widely adopted. The testing of SoC array servers has therefore also begun to receive wide attention. Because SoC array servers are complex and large in scale, traditional manual testing and inspection face many challenges: they are time-consuming and labor-intensive, prone to human error, and increase the cost of system deployment and maintenance.

Therefore, there is an urgent need for a reliable, automated testing solution for SoC array servers that can test the components inside the server comprehensively and efficiently.

Summary of the invention

In view of the above problems, it is necessary to propose an automated testing method for an SoC array server, and an SoC array server, that enable comprehensive and efficient testing of the internal components of the SoC array server.

In a first aspect, the invention provides an automated testing method for an SoC array server. The method is applied to the BMC motherboard of the SoC array server. The SoC array server further comprises a backplane, a blade board, and a switching board; the backplane comprises a backplane controller and a fan controller, and the fan controller is connected to a plurality of fans; the blade board comprises a blade board controller, an SoC board card, and a serial port controller. The method comprises:

sending a first serial port command to the backplane controller, and generating a first test result according to whether a success command returned by the backplane controller is received within a first preset duration;

adjusting the fan speed through the fan controller within a first preset range, and generating a second test result according to whether the actual speed of the fan after adjustment meets a speed threshold;

receiving the startup duration of the switching board, and generating a third test result according to whether the startup duration falls within a preset duration range;

sending a second serial port command to the blade board controller, and generating a fourth test result according to whether a success command returned by the blade board controller is received within the first preset duration;

monitoring the continuous output of serial port data by the serial port controller, and generating a fifth test result according to whether the serial port data output by the serial port controller is received within the first preset duration;

obtaining at least one of the startup duration, network speed, USB detection result, deep recovery mode detection result, and serial port detection result of the SoC array server, and generating a sixth test result according to whether the startup duration, network speed, USB detection result, deep recovery mode detection result, and serial port detection result all meet preset standards;

performing an aging test of a preset aging duration on each component of the SoC board card, and generating a seventh test result according to how the system runs during the aging test;

restarting the system multiple times, obtaining the restart duration of each restart, and generating an eighth test result according to the restart durations;

generating and outputting a test report based on the first, second, third, fourth, fifth, sixth, seventh, and eighth test results.
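The overall flow of the first aspect can be pictured as a sequence of steps whose results feed one report. The following is a minimal sketch only: `StepResult`, `run_steps`, and `format_report` are hypothetical names, the report layout is an assumption, and stopping at the first error follows the optional refinements of the individual steps rather than the base method.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    name: str          # component under test, e.g. "backplane controller"
    passed: bool       # True -> "normal", False -> "error"
    detail: str = ""   # optional free-text note for the report


def run_steps(steps: List[Callable[[], StepResult]]) -> List[StepResult]:
    """Run the steps in order and stop after the first failing one."""
    results: List[StepResult] = []
    for step in steps:
        result = step()
        results.append(result)
        if not result.passed:
            break  # later steps are not run once a step reports an error
    return results


def format_report(results: List[StepResult], total_steps: int) -> str:
    """Render one line per executed step plus an overall verdict."""
    lines = []
    for index, r in enumerate(results, start=1):
        status = "normal" if r.passed else "error"
        note = f" ({r.detail})" if r.detail else ""
        lines.append(f"test {index} [{r.name}]: {status}{note}")
    # The run only passes if every planned step ran and reported "normal".
    complete = len(results) == total_steps and all(r.passed for r in results)
    lines.append("overall: PASS" if complete else "overall: FAIL")
    return "\n".join(lines)
```

In a real run each callable would wrap one of the eight checks; here they can be stubbed with lambdas that return fixed `StepResult` values.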

Optionally, sending the first serial port command to the backplane controller and generating the first test result according to whether a success command returned by the backplane controller is received within the first preset duration comprises:

issuing a serial port command to the backplane controller and waiting for the first preset duration; if a success command returned by the backplane controller is received within the first preset duration, the first test result is normal;

if the success command is not received within the first preset duration, repeating n times the step of issuing the serial port command to the backplane controller and the steps that follow it; if the success command is not received in any of the n repetitions, the first test result is error, and the test is terminated.

Optionally, adjusting the fan speed through the fan controller within the first preset range and generating the second test result according to whether the actual speed of the fan after adjustment meets the speed threshold comprises:

adjusting the fan speed through the fan controller within the first preset range, each value in the first preset range corresponding to one speed threshold; if a value is selected within the first preset range to adjust the fan speed and the actual speed of the fan after adjustment meets the speed threshold, the second test result is normal;

if the actual speed of the fan after adjustment does not meet the speed threshold, repeating n times the step of selecting a value within the first preset range to adjust the fan speed and the steps that follow it; if none of the actual fan speeds obtained in the n repetitions meets the speed threshold, the second test result is error, and the test is terminated.

Optionally, receiving the startup duration of the switching board and generating the third test result according to whether the startup duration falls within the preset duration range comprises:

restarting the switching board and receiving its startup duration; if the startup duration falls within the preset duration range, the third test result is normal;

if the startup duration does not fall within the preset duration range, repeating n times the step of restarting the switching board and the steps that follow it; if none of the startup durations obtained in the n repetitions falls within the preset duration range, the third test result is error, and the test is terminated.

Optionally, sending the second serial port command to the blade board controller and generating the fourth test result according to whether a success command returned by the blade board controller is received within the first preset duration comprises:

issuing a serial port command to the blade board controller and waiting for the first preset duration; if a success command returned by the blade board controller is received within the first preset duration, the fourth test result is normal;

if the success command is not received within the first preset duration, repeating n times the step of issuing the serial port command to the blade board controller and the steps that follow it; if the success command is not received in any of the n repetitions, the fourth test result is error, and the test is terminated.

Optionally, monitoring the continuous output of serial port data by the serial port controller and generating the fifth test result according to whether the serial port data output by the serial port controller is received within the first preset duration comprises:

monitoring the continuous output of serial port data by the serial port controller; if the serial port data output by the serial port controller is not received within the first preset duration, waiting for a second preset duration; if the serial port data is still not received within the second preset duration, the fifth test result is error, and the test is terminated.
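The two-window wait used for the serial port controller can be sketched as follows. This is an illustration under assumptions: `read` stands in for whatever non-blocking call returns pending serial bytes (not specified in the text), the polling interval is arbitrary, and data arriving during the second window is treated as a normal result, which the text implies but does not state.

```python
import time
from typing import Callable


def data_seen_within(read: Callable[[], bytes], timeout_s: float,
                     poll_s: float = 0.05) -> bool:
    """Poll `read` until it returns non-empty data or the window closes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if read():
            return True
        time.sleep(poll_s)
    return False


def serial_output_test(read: Callable[[], bytes], first_timeout_s: float,
                       second_timeout_s: float) -> bool:
    """Return True for a 'normal' fifth test result, False for 'error'."""
    if data_seen_within(read, first_timeout_s):
        return True
    # Nothing arrived in the first preset duration: wait the second one.
    return data_seen_within(read, second_timeout_s)
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline unaffected by clock adjustments during the wait.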

Optionally, obtaining at least one of the startup duration, network speed, USB detection result, deep recovery mode detection result, and serial port detection result of the SoC array server, and generating the sixth test result according to whether the startup duration, network speed, USB detection result, deep recovery mode detection result, and serial port detection result all meet preset standards, comprises:

restarting the SoC array server and obtaining the startup duration; obtaining at least one of the network speed, USB detection result, deep recovery mode detection result, and serial port detection result; if the startup duration, network speed, USB detection result, deep recovery mode detection result, and serial port detection result all meet the preset standards, the sixth test result is normal;

if at least one of the startup duration, network speed, USB detection result, deep recovery mode detection result, and serial port detection result does not meet the preset standards, repeating n times the step of restarting the SoC array server and the steps that follow it; if in each of the n repetitions at least one detection result does not meet the preset standards, the sixth test result is error, and the test is terminated.
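A composite check of this kind can be expressed as a table of per-item standards applied to whatever items were measured. The sketch below is hypothetical throughout: the item names, the limits in `EXAMPLE_STANDARDS`, and the `measure` callable are placeholders, since the text gives no concrete standards, and "repeat n times" is read as one initial attempt plus up to n retries.

```python
from typing import Any, Callable, Dict, List

Standards = Dict[str, Callable[[Any], bool]]


def failed_items(measured: Dict[str, Any], standards: Standards) -> List[str]:
    """Names of the measured items that miss their preset standard."""
    return [name for name, value in measured.items()
            if not standards[name](value)]


def soc_server_test(measure: Callable[[], Dict[str, Any]],
                    standards: Standards, retries: int = 3) -> bool:
    """Restart-and-measure once, then up to `retries` more times."""
    for _ in range(1 + retries):
        if not failed_items(measure(), standards):
            return True   # every measured item met its standard
    return False          # each round had at least one failing item


# Hypothetical standards; the real limits are not given in the text.
EXAMPLE_STANDARDS: Standards = {
    "boot_s": lambda s: s <= 60,
    "net_mbps": lambda v: v >= 900,
    "usb_ok": bool,
    "recovery_ok": bool,
    "serial_ok": bool,
}
```

Because only the keys present in `measured` are checked, the same function covers the "at least one of" wording: a run that measures just the boot time and network speed is judged on those two items alone.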

Optionally, performing the aging test of the preset aging duration on each component of the SoC board card and generating the seventh test result according to how the system runs during the aging test comprises:

performing an aging test on each component of the SoC board card, each aging test lasting the preset aging duration; if the system restarts and/or crashes during the aging test, the seventh test result is error, and the test is terminated.

Optionally, restarting the system multiple times, obtaining the restart duration of each restart, and generating the eighth test result according to the restart durations comprises:

simulating user usage scenarios by restarting the system multiple times and obtaining the restart duration of each restart; if every restart duration meets the preset restart duration standard, the eighth test result is normal;

if at least one restart duration does not meet the preset restart duration standard, the eighth test result is error, and the test is terminated.
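The repeated-restart check can be sketched as a loop that records each restart duration and stops at the first one that misses the standard. The names are hypothetical, `reboot_and_time` is an assumed helper that restarts the system and returns the measured duration in seconds, and the standard is modeled as a simple upper limit because the text does not define it further.

```python
from typing import Callable, List, Tuple


def reboot_cycle_test(reboot_and_time: Callable[[], float], cycles: int,
                      max_restart_s: float) -> Tuple[bool, List[float]]:
    """Return (passed, durations measured so far)."""
    durations: List[float] = []
    for _ in range(cycles):
        duration = reboot_and_time()
        durations.append(duration)
        if duration > max_restart_s:
            # One slow restart is enough for an "error" result.
            return False, durations
    return True, durations
```

Returning the durations alongside the verdict lets the test report show which restart was out of range.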

In a second aspect, the invention provides an SoC array server. The SoC array server comprises a BMC motherboard, a backplane, a blade board, and a switching board that are connected to one another; the backplane comprises a backplane controller and a fan controller, and the fan controller is connected to a plurality of fans; the blade board comprises a blade board controller, an SoC board card, and a serial port controller;

the BMC motherboard is used to manage and monitor the running state of the entire SoC array server; the BMC motherboard is also used to monitor the health of the SoC array server in real time and to perform remote management and maintenance;

the backplane controller is connected to the BMC motherboard, and is used to connect and coordinate the hardware components on the backplane, ensure that these hardware components run and communicate normally, and manage and control the sub-modules on the backplane;

the blade board controller is connected to the backplane controller, and is used to communicate with the components in the blade board and to manage and monitor their running state;

the serial port controller is connected to the BMC motherboard, the blade board controller, the SoC board card, and the switching board, and is used to configure and control the serial port devices in the SoC array server, while supporting communication on each serial port and providing status monitoring for the SoC array server;

the switching board is used to establish high-speed, stable data channels between the components inside the SoC array server; the switching board is also used to support fast data exchange between those components and flexible network configuration, so as to meet the needs of different business scenarios;

the fan controller is used to control the fan speed and monitor the fan status;

the SoC board card comprises an SoC chip and is used to perform the computing tasks and data processing operations on the SoC array server;

wherein the BMC motherboard is used to execute the automated testing method for an SoC array server according to any implementation of the first aspect.

The embodiments of the invention have the following beneficial effects:

The invention provides an automated testing method for an SoC array server. The method is applied to the BMC motherboard of an SoC array server. The SoC array server further comprises a backplane, a blade board, and a switching board; the backplane comprises a backplane controller and a fan controller, and the fan controller is connected to a plurality of fans; the blade board comprises a blade board controller, an SoC board card, and a serial port controller. Specifically, the method causes the backplane controller, the fans, the blade board controller, the SoC board card, the serial port controller, and the switching board to each operate normally, verifies each of them against preset durations and preset ranges, and finally obtains the verification results and generates a test report. By verifying each part of the SoC array server with a verification method suited to it, the invention obtains accurate test results and generates a complete test report from them, thereby achieving comprehensive and efficient testing of the SoC array server.

Description of drawings

To explain the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

In the drawings:

Figure 1 is a schematic flow chart of an automated testing method for an SoC array server provided by an embodiment of the present application;

Figure 2 is a schematic diagram of an example of an automated testing method for an SoC array server provided by an embodiment of the present application;

Figure 3 is a schematic structural diagram of an SoC array server provided by an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

An embodiment of the present application proposes an automated testing method for an SoC array server. Using an intelligent automated testing system, the method tests the components of the SoC array server, such as the backplane MCU, blade MCU, serial port MCU, switching board, and SoC board card, comprehensively and efficiently, thereby improving the reliability and performance of the server and reducing the cost of system deployment and maintenance. The method can accurately detect whether each component is working normally and generate a corresponding test report, so that potential problems can be found promptly from the report, which in turn provides important technical support for deploying and maintaining large-scale SoC array servers.

Referring to Figure 1, which is a schematic flow chart of an automated testing method for an SoC array server provided by an embodiment of the present application: the method is applied to the BMC motherboard of the SoC array server; the SoC array server further comprises a backplane, a blade board, and a switching board; the backplane comprises a backplane controller and a fan controller; the fan controller is connected to a plurality of fans; the blade board comprises a blade board controller, an SoC board card, and a serial port controller. The method specifically comprises:

Step 101: Send a first serial port command to the backplane controller, and generate a first test result according to whether a success command returned by the backplane controller is received within a first preset duration.

In this embodiment, the BMC motherboard issues a serial port command to the backplane controller and waits for the first preset duration. If a success command returned by the backplane controller is received within the first preset duration, the first test result is normal. If the success command is not received within the first preset duration, the step of issuing the serial port command to the backplane controller and the steps that follow it are repeated n times; if the success command is not received in any of the n repetitions, the first test result is error, and the test is terminated.

The first preset duration may preferably be 3 s, and n may preferably be 3. Note that these values of the first preset duration and n are only a preferred example; any other values that meet the standard can be used in the embodiments of the present application, and no further limitation is imposed here.

Step 101 by example: the BMC motherboard issues a serial port command to the backplane controller and waits 3 s. If a success command returned by the backplane controller is received within 3 s, the backplane controller is normal, and the first test result is returned as normal. If no success command is received from the backplane controller within 3 s, the backplane controller may be abnormal, and the preceding steps are retried 3 times. If in all 3 retries the success command is not received, or an abnormal signal unrelated to the success command is received instead, the backplane controller is confirmed to be abnormal; the first test result is returned as error, the user is alerted to the backplane controller abnormality, and the test flow is terminated.
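The send, wait, and retry logic of step 101 can be sketched as below. This is an illustration only: the serial protocol is not described in the text, so `SUCCESS` is a hypothetical reply value and `transact` is an assumed helper that sends the command and returns whatever reply arrived within the timeout (or `None` on timeout). The example's "retry 3 times" is read as one initial attempt plus up to 3 retries.

```python
from typing import Callable, Optional

SUCCESS = "OK"  # hypothetical success reply; the real protocol is not specified


def controller_test(transact: Callable[[float], Optional[str]],
                    timeout_s: float = 3.0, retries: int = 3) -> bool:
    """Return True for a 'normal' result, False for 'error'.

    `transact(timeout_s)` is an assumed helper: it sends the serial command
    and returns the reply received within the timeout, or None on timeout.
    """
    for _ in range(1 + retries):
        if transact(timeout_s) == SUCCESS:
            return True
        # A timeout (None) or any non-success reply is a failed attempt.
    return False
```

The same function would serve for any controller checked this way, given a `transact` bound to that controller's serial port.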

Step 102: Adjust the fan speed through the fan controller within a first preset range, and generate a second test result according to whether the actual speed of the fan after adjustment meets the speed threshold.

In one feasible implementation, the fan speed is adjusted through the fan controller within the first preset range, and each value in the first preset range corresponds to one speed threshold. If a value is selected within the first preset range to adjust the fan speed and the actual speed of the fan after adjustment meets the speed threshold, the second test result is normal. If the actual speed of the fan after adjustment does not meet the speed threshold, the step of selecting a value within the first preset range to adjust the fan speed and the steps that follow it are repeated n times; if none of the actual fan speeds obtained in the n repetitions meets the speed threshold, the second test result is error, and the test is terminated.

Here the full-load speed of the fan is 15000 rpm and the first preset range is set to 10% to 100%. In that case 10% of the full-load speed corresponds to 1500 rpm, so the speed threshold for 10% is set to 1500±10% (where 1500 is 15000*10%); the speed threshold for every other value in the first preset range of 10% to 100% is obtained in the same way. As above, the first preset duration may preferably be 3 s, and n may preferably be 3. Note that these values of the first preset duration and n are only a preferred example; any other values that meet the standard can be used in the embodiments of the present application, and no further limitation is imposed here.

Step 102 by example: the BMC motherboard selects a value from 10% to 100% to adjust the fan speed. Suppose 25% is selected; the corresponding speed threshold is then 3750±25% (where 3750 is 15000*25%). If the actual speed after adjustment stays within the speed threshold 3750±25%, the fan is normal, and the second test result is returned as normal. If the actual speed after adjustment falls outside the speed threshold 3750±25%, the fan may be abnormal, and the adjustment step is retried 3 times. If all 3 retries show the fan to be abnormal, that is, the actual fan speed falls outside the speed threshold 3750±25% in all 3 retries, the fan is confirmed to be abnormal; the second test result is returned as error, the user is alerted to the fan abnormality, and the test flow is terminated.

Note that in step 102 the fan speed may be adjusted either by selecting any value within the first preset range or by stepping it gradually from 10% to 100%.
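The speed threshold arithmetic of step 102 can be worked through in a few lines. This is a sketch under assumptions: the function names are hypothetical, and the tolerance is kept as an explicit parameter because the text gives ±10% for the 10% setting and ±25% for the 25% setting without stating a general rule, so the default of 10 is only a placeholder.

```python
from typing import Tuple

FULL_LOAD_RPM = 15000  # full-load fan speed used in the example


def rpm_window(duty_percent: int, tolerance_percent: int = 10,
               full_load_rpm: int = FULL_LOAD_RPM) -> Tuple[float, float]:
    """Acceptable (low, high) speed in rpm for a given duty setting."""
    nominal = full_load_rpm * duty_percent / 100
    low = nominal * (100 - tolerance_percent) / 100
    high = nominal * (100 + tolerance_percent) / 100
    return low, high


def fan_speed_ok(actual_rpm: float, duty_percent: int,
                 tolerance_percent: int = 10) -> bool:
    """True if the measured speed lies inside the window for this setting."""
    low, high = rpm_window(duty_percent, tolerance_percent)
    return low <= actual_rpm <= high
```

For the 10% setting this gives a window of 1350 to 1650 rpm around the nominal 1500 rpm. A gradual sweep from 10% to 100% would call `fan_speed_ok` once per setting, for example over `range(10, 101, 10)`.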

步骤103、接收交换板的启动时长,根据启动时长是否符合预设时长范围,生成第三测试结果。Step 103: Receive the startup duration of the switching board, and generate a third test result based on whether the startup duration meets the preset duration range.

在本申请实施例中,重启交换板,接收交换板的启动时长,若启动时长符合预设时长范围,则第三测试结果为正常;若启动时长不符合预设时长范围,则重复n次重启交换板的步骤及其之后的步骤,若n次得到的启动时长均未符合预设时长范围,则第三测试结果为错误,此时终止测试。In the embodiment of this application, the switching board is restarted and the startup time of the switch board is received. If the startup time meets the preset time range, the third test result is normal; if the startup time does not meet the preset time range, the restart is repeated n times. In the step of exchanging the board and the subsequent steps, if the startup duration obtained n times does not meet the preset duration range, the third test result is an error, and the test is terminated at this time.

It can be understood that step 103 is mainly a power-cycle test of the switching board, used to check whether the restart circuit of the switching board and the network path are working normally. The switching board typically starts in about 30 s and takes no more than 40 s, so the preset duration range is preferably 30 s to 40 s, and n is preferably 3.

An example of step 103: the switching board is restarted and its startup duration is received. If the startup duration is between 30 s and 40 s, the switching board is normal and the third test result is returned as normal. If the startup duration exceeds 40 s, or network access is abnormal, the switching board may be abnormal, and the preceding steps are repeated 3 times. If the startup duration exceeds 40 s in all 3 repetitions, the third test result is returned as an error, the user is alerted to the switching board abnormality, and the test process is terminated.
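The retry logic of step 103 can be sketched as follows. This is a minimal illustration, not the patented implementation: `restart_and_time` and `network_ok` are hypothetical callables standing in for whatever mechanism restarts the switching board, measures its startup duration, and probes the network path. The sketch follows the preset range literally, so a duration below 30 s also counts as a failed attempt.

```python
from typing import Callable


def check_switch_board(
    restart_and_time: Callable[[], float],  # hypothetical: restarts the board, returns seconds
    network_ok: Callable[[], bool],         # hypothetical: probes the network path
    min_s: float = 30.0,
    max_s: float = 40.0,
    retries: int = 3,
) -> str:
    """Return "normal" or "error" for the third test result."""
    for _ in range(1 + retries):  # one initial attempt plus n retries
        duration = restart_and_time()
        if min_s <= duration <= max_s and network_ok():
            return "normal"
    return "error"  # every attempt failed; the caller terminates the test
```

Returning as soon as one attempt succeeds matches the text, which only declares an error when every repetition fails.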

Step 104: Send a second serial port command to the blade board controller, and generate a fourth test result according to whether a success command returned by the blade board controller is received within the first preset duration.

In this embodiment of the application, a serial port command is issued to the blade board controller, followed by a wait of the first preset duration. If a success command returned by the blade board controller is received within the first preset duration, the fourth test result is normal. If no success command is received within the first preset duration, the step of issuing the serial port command to the blade board controller and the steps after it are repeated n times; if no success command is received in any of the n repetitions, the fourth test result is an error and the test is terminated.

As noted above, the first preset duration is preferably 3 s, and n is preferably 3.

An example of step 104: the BMC mainboard issues a serial port command to the blade board controller and waits 3 s. If the success command returned by the blade board controller is received within 3 s, the blade board controller is normal and the fourth test result is returned as normal. If the BMC mainboard does not receive the success command within 3 s, or receives some other abnormal signal, the blade board controller may be abnormal, and the preceding steps are retried 3 times. If the success command is not received in any of the 3 retries, the blade board controller is confirmed to be abnormal; the fourth test result is returned as an error, the user is alerted to the blade board controller abnormality, and the test process is terminated.
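The command-and-wait pattern of step 104 can be sketched with a reply queue and a timeout. Everything specific here is an assumption made for illustration: the command bytes `b"PING"`, the success reply `b"OK"`, the `send` callable, and the use of a `queue.Queue` as the channel on which replies from the controller arrive; the patent does not define the serial protocol.

```python
import queue
from typing import Callable

SUCCESS = b"OK"  # hypothetical success reply; the real protocol is not specified


def probe_controller(
    send: Callable[[bytes], None],  # hypothetical: writes a command to the serial link
    replies: queue.Queue,           # replies read from the link are placed here
    command: bytes = b"PING",       # hypothetical command bytes
    timeout_s: float = 3.0,         # first preset duration
    retries: int = 3,               # n
) -> str:
    """Return "normal" or "error" for the fourth test result."""
    for _ in range(1 + retries):  # one initial attempt plus n retries
        send(command)
        try:
            reply = replies.get(timeout=timeout_s)
        except queue.Empty:
            continue  # nothing arrived in time: try again
        if reply == SUCCESS:
            return "normal"
        # any other reply is treated as an abnormal signal and retried
    return "error"
```

The same shape would fit the backplane controller check, since both steps send a command and wait the first preset duration for a success command.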

Step 105: Monitor whether the serial port controller outputs serial port data continuously, and generate a fifth test result according to whether serial port data output by the serial port controller is received within the first preset duration.

In this embodiment of the application, the continuous output of serial port data by the serial port controller is monitored. If no serial port data output by the serial port controller is received within the first preset duration, the method waits for a second preset duration; if no serial port data is received within the second preset duration either, the fifth test result is an error and the test is terminated.

It can be understood that the serial port MCU is a real-time serial output source, so the BMC mainboard only needs to monitor whether serial port data is output continuously to learn the operating state of the serial port controller. As noted above, the first preset duration is preferably 3 s; the second preset duration is preferably 30 s.

An example of step 105: the BMC mainboard monitors whether the serial port controller outputs serial port data continuously. If no serial port data output by the serial port controller is received within 3 s, the serial port controller may be abnormal, and the BMC mainboard waits a further 30 s. If no serial port data is received during those 30 s, the serial port controller is confirmed to be abnormal; the fifth test result is returned as an error, the user is alerted to the serial port controller abnormality, and the test process is terminated. If serial port data transmitted by the serial port controller is received within 3 s, the serial port controller is normal and the fifth test result is returned as normal.
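The two-stage wait of step 105 can be sketched as below. `read_serial` is a hypothetical callable that blocks for up to the given number of seconds and returns whatever bytes arrived (empty if none). The text only states the error branch for the 30 s window; treating data that arrives during that window as a normal result is an assumption of this sketch.

```python
from typing import Callable


def check_serial_output(
    read_serial: Callable[[float], bytes],  # hypothetical: blocks up to the timeout, returns data
    first_window_s: float = 3.0,            # first preset duration
    second_window_s: float = 30.0,          # second preset duration
) -> str:
    """Return "normal" or "error" for the fifth test result."""
    if read_serial(first_window_s):
        return "normal"
    # No data in the first window: the controller may be abnormal, so wait longer.
    if read_serial(second_window_s):
        return "normal"  # assumption: late data still counts as normal
    return "error"
```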

Step 106: Obtain at least one of the startup duration, network speed, USB detection result, deep recovery mode detection result, and serial port detection result of the SoC array server, and generate a sixth test result according to whether the startup duration, network speed, USB detection result, deep recovery mode detection result, and serial port detection result all meet preset standards.

In this embodiment of the application, the SoC array server is restarted and its startup duration is obtained; at least one of the network speed, USB detection result, deep recovery mode detection result, and serial port detection result is also obtained. If the startup duration, network speed, USB detection result, deep recovery mode detection result, and serial port detection result all meet the preset standards, the sixth test result is normal. If at least one of them does not meet the preset standards, the step of restarting the SoC array server and the steps after it are repeated n times; if in each of the n repetitions at least one item fails to meet the preset standards, the sixth test result is an error and the test is terminated.

It can be understood that step 106 tests and verifies the SoC functions of the SoC array server. The preset standard for the startup duration after restarting the SoC array server is that startup completes within 2 minutes. The preset standard for network speed is a speed of 2500M. The preset standard for the USB detection result is whether ADB (Android Debug Bridge, a versatile command-line tool that lets devices communicate with each other) connects normally. The preset standard for the deep recovery mode detection result is whether the deep flashing mode starts normally. The preset standard for the serial port detection result is whether the serial port transmits data normally.

An example of step 106: the SoC array server is restarted and its startup duration is obtained; at least one of the network speed, USB detection result, deep recovery mode detection result, and serial port detection result is also obtained. If all of the following hold at once, namely the startup duration is within 2 minutes, the network speed is 2500M, the USB detection result shows that ADB is connected normally, the deep recovery mode detection result shows that the deep flashing mode starts normally, and the serial port detection result shows that serial port data is transmitted normally, then the SoC functions are confirmed to be normal and the sixth test result is returned as normal. If any one of the startup duration, network speed, USB detection result, deep recovery mode detection result, and serial port detection result fails to meet the preset standards above, the SoC functions may be abnormal, and the preceding steps are repeated 3 times. If in each of the 3 repetitions at least one item fails to meet the preset standards, the SoC functions are confirmed to be abnormal; the sixth test result is returned as an error, the user is alerted to the SoC function abnormality, and the test process is terminated.
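The five preset standards of step 106 can be collected into a single check, sketched below. The `SocReadings` container and `failed_checks` function are hypothetical names. Reading "2500M" as a minimum in Mbit/s is an assumption, since the text states neither the unit nor whether the value is an exact match or a lower bound.

```python
from dataclasses import dataclass


@dataclass
class SocReadings:
    boot_seconds: float      # startup duration after restart
    net_speed_mbps: float    # measured network speed (unit assumed to be Mbit/s)
    adb_connected: bool      # USB detection result
    deep_recovery_ok: bool   # deep flashing mode started normally
    serial_ok: bool          # serial port transmits data normally


def failed_checks(r: SocReadings) -> list:
    """Names of the preset standards that the readings do not meet."""
    checks = {
        "boot": r.boot_seconds <= 120,         # startup completes within 2 minutes
        "network": r.net_speed_mbps >= 2500,   # "2500M", read as a minimum (assumption)
        "usb_adb": r.adb_connected,
        "deep_recovery": r.deep_recovery_ok,
        "serial": r.serial_ok,
    }
    return [name for name, ok in checks.items() if not ok]
```

An empty list corresponds to a normal attempt; a non-empty list would trigger the retry described in the text and tells the user which item failed.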

Step 107: Perform an aging test of a preset aging duration on each component of the SoC board, and generate a seventh test result according to how the system runs during the aging test.

In this embodiment of the application, an aging test is performed on each component of the SoC board, each test lasting the preset aging duration. If the system restarts and/or crashes during the aging test, the seventh test result is an error and the test is terminated.

The preset aging duration is 3 × 24 hours. The aging test mainly targets the individual components of the SoC board, such as the CPU, GPU, and network card.

An example of step 107: an aging test lasting 3 × 24 hours is performed on each component of the SoC board. If the system restarts and/or crashes during the aging test, the aging test is deemed to have failed; the seventh test result is returned as an error, the user is alerted to the aging test abnormality, and the test process is terminated. If the system runs normally throughout the aging test without restarting and/or crashing, the aging test succeeds and the seventh test result is returned as normal.
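The text does not say how a restart or crash is detected during the aging test. One possible approach, sketched here purely as an assumption, is to sample the system uptime periodically: a sample that is lower than the previous one indicates a restart, and a missing sample (represented as `None`) is treated as a crash. The function name `aging_result` is hypothetical.

```python
from typing import Optional, Sequence


def aging_result(uptime_samples: Sequence[Optional[float]]) -> str:
    """Return "normal" or "error" from periodic uptime samples, in seconds."""
    previous = None
    for sample in uptime_samples:
        if sample is None:
            return "error"  # no answer from the system: treated as a crash
        if previous is not None and sample < previous:
            return "error"  # uptime went backwards: the system restarted
        previous = sample
    return "normal"
```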

Step 108: Restart the system multiple times, obtain the duration of each restart, and generate an eighth test result according to the restart durations.

In this embodiment of the application, the system is restarted multiple times to imitate a user scenario, and the duration of each restart is obtained. If every restart duration meets the preset restart duration standard, the eighth test result is normal. If at least one restart duration does not meet the preset restart duration standard, the eighth test result is an error and the test is terminated.

It can be understood that the purpose of step 108 is to imitate the frequent restarts that occur in real use and to test whether frequent restarts damage the SoC array server. The preset restart duration standard is 2 minutes.

An example of step 108: the system is restarted frequently, many times over, to imitate a user scenario. Suppose it is restarted 100 times and the duration of each restart is obtained. If every restart completes within 2 minutes, the eighth test result is normal. If at least one restart does not complete within 2 minutes, the system does not tolerate frequent restarts well enough; the eighth test result is returned as an error, the user is alerted to the restart test abnormality, and the test process is terminated.
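The repeated-restart test of step 108 can be sketched as a simple loop. `reboot_and_time` is a hypothetical callable that restarts the system and returns the restart duration in seconds. Stopping at the first slow restart, rather than finishing all cycles, is a design choice of this sketch; the text only says that one failing restart is enough for an error.

```python
from typing import Callable, Tuple


def reboot_stress(
    reboot_and_time: Callable[[], float],  # hypothetical: restarts the system, returns seconds
    cycles: int = 100,
    limit_s: float = 120.0,                # preset restart duration standard: 2 minutes
) -> Tuple[str, int]:
    """Return the eighth test result and the number of restarts performed."""
    for cycle in range(1, cycles + 1):
        if reboot_and_time() > limit_s:
            return "error", cycle  # one slow restart is enough to fail
    return "normal", cycles
```

Returning the cycle count alongside the result lets a report state at which restart the failure occurred.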

Step 109: Generate and output a test report based on the first, second, third, fourth, fifth, sixth, seventh, and eighth test results.

Refer to Figure 2, a schematic diagram of an example of the automated testing method for an SoC array server provided by this embodiment of the application. After all tests have finished, the first, second, third, fourth, fifth, sixth, seventh, and eighth test results are compiled into a complete test report, which is automatically output and displayed on the user terminal.
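Compiling the eight results into a report can be sketched as follows. The report layout, the step labels, and the `build_report` name are all hypothetical. Because earlier steps terminate the test on an error, the sketch also accepts a shorter result list and marks the remaining steps as "not run"; whether a report is produced in that case is not stated in the text and is an assumption here.

```python
STEP_NAMES = [
    "backplane controller",
    "fans",
    "switching board",
    "blade board controller",
    "serial port controller",
    "SoC functions",
    "aging test",
    "repeated restart",
]


def build_report(results: list) -> str:
    """Format up to eight "normal"/"error" results as a plain-text report."""
    lines = ["SoC array server test report"]
    for index, name in enumerate(STEP_NAMES):
        status = results[index] if index < len(results) else "not run"
        lines.append(f"{index + 1}. {name}: {status}")
    complete = len(results) == len(STEP_NAMES)
    passed = complete and all(r == "normal" for r in results)
    lines.append("Overall: " + ("PASS" if passed else "FAIL"))
    return "\n".join(lines)
```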

In this embodiment of the application, each part of the automated test system for the SoC array server is verified with its own verification method to obtain accurate test results, and a complete test report is generated from those results, achieving comprehensive and efficient testing of the SoC array server.

Refer to Figure 3, a schematic structural diagram of an SoC array server provided in an embodiment of the application. The SoC array server includes a BMC mainboard 310, a backplane 320, a blade board 330, and a switching board 340, which are connected to one another. The backplane 320 includes a backplane controller 321 and a fan controller 322, and the fan controller 322 is connected to a plurality of fans. The blade board 330 includes a blade board controller 331, an SoC board 332, and a serial port controller 333. Specifically:

The BMC (Baseboard Management Controller) mainboard 310 may be connected to the switching board 340 through at least one of Ethernet, a UART (Universal Asynchronous Receiver-Transmitter) serial port, and a Console. The BMC mainboard 310 is also connected to the backplane controller 321 and to the serial port controller 333 through UART serial ports. The backplane controller 321 is connected to the switching board 340 and to the blade board controller 331 by control I/O, and to the fan controller 322 through PM BUS (Power Management Bus); the backplane controller 321 may also be connected to the blade board controller 331 through a UART serial port. The blade board controller 331 is connected to the serial port controller 333 and to the SoC board 332 by control I/O. The SoC board 332 is connected to the serial port controller 333 through a UART serial port and to the switching board 340 through Ethernet. The serial port controller 333 may also be connected to the switching board 340 through Ethernet.
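The connections listed in this paragraph can be captured as a small link table, which makes it easy to check which components a given part talks to. The table below transcribes the paragraph; the `LINKS` and `neighbours` names and the tuple layout are illustrative choices, not part of the patent.

```python
# (component A, component B, link type), transcribed from the paragraph above
LINKS = [
    ("BMC mainboard 310", "switching board 340", "Ethernet/UART/Console"),
    ("BMC mainboard 310", "backplane controller 321", "UART"),
    ("BMC mainboard 310", "serial port controller 333", "UART"),
    ("backplane controller 321", "switching board 340", "control I/O"),
    ("backplane controller 321", "blade board controller 331", "control I/O"),
    ("backplane controller 321", "fan controller 322", "PM BUS"),
    ("backplane controller 321", "blade board controller 331", "UART"),
    ("blade board controller 331", "serial port controller 333", "control I/O"),
    ("blade board controller 331", "SoC board 332", "control I/O"),
    ("SoC board 332", "serial port controller 333", "UART"),
    ("SoC board 332", "switching board 340", "Ethernet"),
    ("serial port controller 333", "switching board 340", "Ethernet"),
]


def neighbours(component: str) -> list:
    """Sorted names of every component directly linked to the given one."""
    found = set()
    for a, b, _link in LINKS:
        if a == component:
            found.add(b)
        elif b == component:
            found.add(a)
    return sorted(found)
```

For example, `neighbours("fan controller 322")` returns only the backplane controller, reflecting that the fans are reached through the backplane.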

The BMC mainboard 310 is the core control board of the SoC array server and is responsible for managing and monitoring the operating state of the entire server. It integrates a set of management functions, such as remote monitoring, fault diagnosis, power control, and fan control. Through the BMC mainboard 310, an administrator can monitor the health of the server in real time and manage and maintain it remotely.

The backplane 320 integrates the backplane controller 321 and the fan controller 322.

The backplane controller 321 is the control module on the backplane 320 of the SoC array server and is used to manage and control the sub-modules on the backplane 320. It connects and coordinates the hardware components on the backplane 320 so that they operate and communicate normally. The backplane controller 321 exchanges data with the BMC mainboard 310, and through this communication the BMC mainboard 310 can monitor and manage all sub-modules on the backplane 320.

The fan controller 322 is responsible for controlling the speed of the plurality of fans and monitoring their status. It can be understood that Figure 2 shows a preferred case of four fans; this does not mean that only four fans can be connected in this SoC array server. It is merely a preferred example and is not limiting.

The blade board 330 integrates the blade board controller 331, the SoC board 332, and the serial port controller 333. Preferably there are 4 blade boards 330, 10 SoC boards 332, and 2 serial port controllers 333. Note that this is only a preferred case and does not limit the quantities.

The blade board controller 331 is the control module on the blade board 330 of the SoC array server and is used to manage and monitor the operating state of the blade board 330. It is responsible for communicating with the components on the blade board 330. The blade board controller 331 exchanges data with the backplane controller 321, and through this communication the backplane controller 321 can manage and monitor the blade boards 330 in a unified way.

The SoC board 332 is the core computing unit of the SoC array server. Each SoC board integrates one SoC chip, which contains a processor, memory, I/O interfaces, and so on. The SoC boards 332 are the computing and data processing core of the SoC array server and carry out its computing tasks and data processing operations. In the SoC array server, multiple SoC boards 332 form an array and are connected and coordinated through the switching board 340 and the backplane controller 321 to deliver high-performance data processing and computing.

The serial port controller 333 is connected to the BMC mainboard 310, the blade board controller 331, the SoC board 332, and the switching board 340. It is a control module in the SoC array server that manages and controls the serial ports of the SoC array server. It can configure and control each serial port device in the SoC array server, supports serial communication, and provides status monitoring of the SoC devices in the SoC array server.

The switching board 340 is an important component of the SoC array server and handles the network communication inside it. It integrates a high-performance switching chip that establishes high-speed, stable data channels between the components inside the SoC array server. The switching board 340 lets the components exchange data quickly and supports flexible network configuration to meet the needs of different business scenarios. The switching board is preferably a 25G switching board; again, this is only a preferred case and is not limiting.

It can be understood that the automated testing method for an SoC array server described above with reference to Figure 1 is executed on the BMC mainboard 310 of the SoC array server.

In this embodiment of the application, for how the BMC mainboard 310 performs any of the automated testing methods for an SoC array server described with reference to Figure 1, see the embodiment shown in Figure 1; the details are not repeated here.

An embodiment of the application provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the method of any one of the method embodiments above.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments may be combined in any way. For brevity, not every possible combination of these technical features is described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.

The embodiments described above represent only several implementations of the application. They are described specifically and in detail, but they should not be construed as limiting the patent scope of the application. It should be noted that those of ordinary skill in the art can make a number of variations and improvements without departing from the concept of the application, all of which fall within its scope of protection. The scope of protection of this patent application is therefore determined by the appended claims.

Claims (10)

1. An automated testing method for an SoC array server, characterized in that the method is applied to a BMC mainboard of the SoC array server; the SoC array server further comprises a backplane, a blade board, and a switching board; the backplane comprises a backplane controller and a fan controller, the fan controller being connected to a plurality of fans; the blade board comprises a blade board controller, an SoC board, and a serial port controller; and the method comprises:
sending a first serial port command to the backplane controller, and generating a first test result according to whether a success command returned by the backplane controller is received within a first preset duration;
adjusting the fan speed via the fan controller within a first preset range, and generating a second test result according to whether the actual speed of the fan after adjustment meets a speed threshold;
receiving a startup duration of the switching board, and generating a third test result according to whether the startup duration falls within a preset duration range;
sending a second serial port command to the blade board controller, and generating a fourth test result according to whether a success command returned by the blade board controller is received within the first preset duration;
monitoring the continuous output of serial port data by the serial port controller, and generating a fifth test result according to whether the serial port data output by the serial port controller is received within the first preset duration;
obtaining at least one of a startup duration, a network speed, a USB detection result, a deep recovery mode detection result, and a serial port detection result of the SoC array server, and generating a sixth test result according to whether the startup duration, network speed, USB detection result, deep recovery mode detection result, and serial port detection result all meet preset standards;
performing an aging test of a preset aging duration on each component of the SoC board, and generating a seventh test result according to how the system runs during the aging test;
restarting the system multiple times, obtaining the duration of each restart of the system, and generating an eighth test result according to the restart durations;
generating and outputting a test report based on the first test result, the second test result, the third test result, the fourth test result, the fifth test result, the sixth test result, the seventh test result, and the eighth test result.

2. The method according to claim 1, characterized in that sending the first serial port command to the backplane controller and generating the first test result according to whether the success command returned by the backplane controller is received within the first preset duration comprises:
issuing a serial port command to the backplane controller and waiting for the first preset duration, wherein if the success command returned by the backplane controller is received within the first preset duration, the first test result is normal;
if the success command is not received within the first preset duration, repeating n times the step of issuing the serial port command to the backplane controller and the steps after it, wherein if the success command is not received in any of the n repetitions, the first test result is an error and the test is terminated.

3. The method according to claim 1, characterized in that adjusting the fan speed via the fan controller within the first preset range and generating the second test result according to whether the actual speed of the fan after adjustment meets the speed threshold comprises:
adjusting the fan speed via the fan controller within the first preset range, each value in the first preset range corresponding to one speed threshold, wherein if a value is selected within the first preset range to adjust the fan speed and the actual speed of the fan after adjustment meets the speed threshold, the second test result is normal;
if the actual speed of the fan after adjustment does not meet the speed threshold, repeating n times the step of selecting a value within the first preset range to adjust the fan speed and the steps after it, wherein if none of the n actual fan speeds meets the speed threshold, the second test result is an error and the test is terminated.

4. The method according to claim 1, characterized in that receiving the startup duration of the switching board and generating the third test result according to whether the startup duration falls within the preset duration range comprises:
restarting the switching board and receiving the startup duration of the switching board, wherein if the startup duration falls within the preset duration range, the third test result is normal;
if the startup duration does not fall within the preset duration range, repeating n times the step of restarting the switching board and the steps after it, wherein if none of the n startup durations falls within the preset duration range, the third test result is an error and the test is terminated.

5. The method according to claim 1, characterized in that sending the second serial port command to the blade board controller and generating the fourth test result according to whether the success command returned by the blade board controller is received within the first preset duration comprises:
issuing a serial port command to the blade board controller and waiting for the first preset duration, wherein if the success command returned by the blade board controller is received within the first preset duration, the fourth test result is normal;
if the success command is not received within the first preset duration, repeating n times the step of issuing the serial port command to the blade board controller and the steps after it, wherein if the success command is not received in any of the n repetitions, the fourth test result is an error and the test is terminated.

6. The method according to claim 1, characterized in that monitoring the continuous output of serial port data by the serial port controller and generating the fifth test result according to whether the serial port data output by the serial port controller is received within the first preset duration comprises:
monitoring the continuous output of serial port data by the serial port controller, wherein if the serial port data output by the serial port controller is not received within the first preset duration, waiting for a second preset duration, and if the serial port data is not received within the second preset duration either, the fifth test result is an error and the test is terminated.

7. The method according to claim 1, characterized in that obtaining at least one of the startup duration, network speed, USB detection result, deep recovery mode detection result, and serial port detection result of the SoC array server, and generating the sixth test result according to whether the startup duration, network speed, USB detection result, deep recovery mode detection result, and serial port detection result all meet the preset standards, comprises:
The method according to claim 1, characterized in that, obtaining at least one of the startup time, network speed, USB detection results, deep recovery mode detection results, and serial port detection results of the SoC array server, the method is based on Whether the startup time, network speed, USB detection results, deep recovery mode detection results, and serial port detection results all meet the preset standards, a sixth test result is generated, including: 重启所述SoC阵列服务器,获取启动时长;获取网速、USB检测结果、深度恢复模式检测结果、串口检测结果中的至少一项,若所述启动时长、网速、USB检测结果、深度恢复模式检测结果、串口检测结果都符合预设标准,则所述第六测试结果为正常;Restart the SoC array server and obtain the startup time; obtain at least one of the network speed, USB detection results, deep recovery mode detection results, and serial port detection results. If the startup time, network speed, USB detection results, and deep recovery mode If the test results and serial port test results both meet the preset standards, then the sixth test result is normal; 若所述启动时长、网速、USB检测结果、深度恢复模式检测结果、串口检测结果中的至少一项不符合预设标准,则重复n次所述重启所述SoC阵列服务器的步骤及其之后的步骤,若n次的检测结果中均至少一个不符合预设标准,则所述第六测试结果为错误,此时终止所述测试。If at least one of the boot time, network speed, USB detection results, deep recovery mode detection results, and serial port detection results does not meet the preset standard, repeat the steps of restarting the SoC array server n times and thereafter. step, if at least one of the n test results does not meet the preset standard, the sixth test result is an error, and the test is terminated at this time. 8.根据权利要求1所述的方法,其特征在于,所述对于所述SoC板卡的各个部件分别进行预设老化时长的老化测试,根据所述老化测试时所述系统的运行情况,生成第七测试结果,包括:8. 
The method according to claim 1, characterized in that, each component of the SoC board is subjected to an aging test with a preset aging time, and based on the operation status of the system during the aging test, a generated The seventh test results include: 对所述SoC板卡的各个部件进行老化测试,所述老化测试均为预设老化时长,若在所述老化测试期间,所述系统出现重启和/或宕机的情况,则所述第七测试结果为错误,此时终止所述测试。An aging test is performed on each component of the SoC board. The aging test is a preset aging time. If the system restarts and/or crashes during the aging test, the seventh The test result is an error and the test is terminated at this time. 9.根据权利要求1所述的方法,其特征在于,所述多次重启所述系统,获取每次所述系统的重启时长,根据所述多次重启时长,生成第八测试结果,包括:9. The method according to claim 1, wherein the system is restarted multiple times, the restart duration of each system is obtained, and an eighth test result is generated based on the restart duration, including: 模仿用户使用场景,对所述系统进行多次重启,获取每次所述系统的重启时长,若所述每次系统的重启时长均符合预设重启时长标准,则所述第八测试结果为正常;Imitate user usage scenarios, restart the system multiple times, and obtain the restart time of each system. If the restart time of each system meets the preset restart time standard, the eighth test result is normal. ; 若至少一次的系统的重启时长不符合所述预设重启时长标准,则所述第八测试结果为错误,此时终止所述测试。If the restart duration of at least one system does not meet the preset restart duration standard, the eighth test result is an error, and the test is terminated. 10.一种SoC阵列服务器,其特征在于,所述SoC阵列服务器包括互相连接的BMC主板、背板、刀片板、交换板,所述背板包括背板控制器、风扇控制器,所述风扇控制器连接有多个风扇;所述刀片板包括刀片板控制器、SoC板卡、串口控制器;10. A SoC array server, characterized in that the SoC array server includes a BMC motherboard, a backplane, a blade board, and a switching board that are connected to each other. The backplane includes a backplane controller and a fan controller. 
The fan The controller is connected to multiple fans; the blade board includes a blade board controller, SoC board card, and serial port controller; 所述BMC主板用于管理和监控整个SoC阵列服务器的运行状态;所述BMC主板还用于实时监测所述SoC阵列服务器的健康状态,并进行远程管理和维护;The BMC mainboard is used to manage and monitor the running status of the entire SoC array server; the BMC mainboard is also used to monitor the health status of the SoC array server in real time and perform remote management and maintenance; 所述背板控制器与所述BMC主板连接,所述背板控制器用于连接并协调所述背板上的各个硬件组件,确保所述各个硬件组件得以正常运行和通信,并管理和控制整个背板上的各个子模块;The backplane controller is connected to the BMC mainboard. The backplane controller is used to connect and coordinate the various hardware components on the backplane, ensure the normal operation and communication of the various hardware components, and manage and control the entire Each sub-module on the backplane; 所述刀片板控制器与所述背板控制器连接,所述刀片板控制器用于与所述刀片板内的各个组件进行通信,并管理和监控所述刀片板内的各个组件的运行状态;The blade board controller is connected to the backplane controller, and the blade board controller is used to communicate with each component in the blade board, and manage and monitor the operating status of each component in the blade board; 所述串口控制器与所述BMC主板、所述刀片板控制器、所述SoC板卡、所述交换板连接,用于对所述SoC阵列服务器中的串口设备进行配置和控制,同时支持各个串口通信,并提供所述SoC阵列服务器的状态监测功能;The serial port controller is connected to the BMC motherboard, the blade board controller, the SoC board card, and the switching board, and is used to configure and control the serial port devices in the SoC array server, while supporting various Serial communication and providing status monitoring function of the SoC array server; 所述交换板用于在所述SoC阵列服务器内部的各个部件之间建立高速、稳定的数据通道;所述交换板还用于支持所述SoC阵列服务器内部的各个部件之间快速交换数据,并支持灵活的网络配置,以满足不同业务场景的需求;The switching board is used to establish high-speed and stable data channels between various components inside the SoC array server; the switching board is also used to support rapid data exchange between various components inside the SoC array server, and Support flexible network 
configuration to meet the needs of different business scenarios; 所述风扇控制器用于控制所述风扇转速和监控所述风扇状态;The fan controller is used to control the fan speed and monitor the fan status; 所述SoC板卡包括一个SoC芯片,所述SoC板卡用于执行所述SoC阵列服务器上的各种计算任务和数据处理操作;The SoC board includes an SoC chip, and the SoC board is used to perform various computing tasks and data processing operations on the SoC array server; 其中,所述BMC主板用于执行如权利要求1至9任一项所述的SoC阵列服务器的自动化测试方法。Wherein, the BMC motherboard is used to execute the automated testing method of the SoC array server according to any one of claims 1 to 9.
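
The retry-and-terminate pattern shared by claims 2 to 5 and claim 7 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `StageResult`, `StageAborted`, `run_stage`, and `run_all` are hypothetical, and each `check` callable is assumed to block for no longer than the relevant preset duration and to return True when the expected response (for example the success command) arrives. The sketch reads "repeat n times" as one initial attempt followed by up to n repetitions, which is one possible interpretation of the claim wording.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


class StageAborted(Exception):
    """Raised when a stage ends in "error", which terminates the whole test."""


@dataclass
class StageResult:
    name: str      # e.g. "backplane controller"
    status: str    # "normal" or "error"
    attempts: int  # how many times the check was actually run


def run_stage(name: str, check: Callable[[], bool], n: int) -> StageResult:
    """Run `check` once, then repeat it up to `n` more times while it fails."""
    for attempt in range(1, n + 2):  # 1 initial attempt + n repetitions
        if check():
            return StageResult(name, "normal", attempt)
    return StageResult(name, "error", n + 1)


def run_all(stages: Sequence[Tuple[str, Callable[[], bool]]],
            n: int) -> List[StageResult]:
    """Run the stages in order and stop at the first "error" result."""
    report: List[StageResult] = []
    for name, check in stages:
        result = run_stage(name, check, n)
        report.append(result)
        if result.status == "error":
            raise StageAborted(
                f"{name} failed after {result.attempts} attempts")
    return report


if __name__ == "__main__":
    # Hypothetical stand-ins for real hardware checks.
    stages = [
        ("backplane controller", lambda: True),
        ("fan speed", lambda: True),
    ]
    for stage_result in run_all(stages, n=3):
        print(stage_result)
```

Raising an exception on the first "error" result mirrors the claims' "the test is terminated" wording; a real harness would probably also record the partial results in the test report before stopping.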
CN202311214753.8A 2023-09-19 2023-09-19 SoC array server automated testing method and SoC array server Active CN117234820B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311214753.8A CN117234820B (en) 2023-09-19 2023-09-19 SoC array server automated testing method and SoC array server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311214753.8A CN117234820B (en) 2023-09-19 2023-09-19 SoC array server automated testing method and SoC array server

Publications (2)

Publication Number Publication Date
CN117234820A true CN117234820A (en) 2023-12-15
CN117234820B CN117234820B (en) 2025-06-06

Family

ID=89090710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311214753.8A Active CN117234820B (en) 2023-09-19 2023-09-19 SoC array server automated testing method and SoC array server

Country Status (1)

Country Link
CN (1) CN117234820B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101442448A (en) * 2007-11-23 2009-05-27 鸿富锦精密工业(深圳)有限公司 Test system and method for knife blade server
CN109976959A (en) * 2019-03-27 2019-07-05 苏州浪潮智能科技有限公司 A kind of portable device and method for server failure detection
CN216719081U (en) * 2022-01-20 2022-06-10 苏州浪潮智能科技有限公司 A server testing device based on SOC chip
WO2022237549A1 (en) * 2021-05-14 2022-11-17 山东英信计算机技术有限公司 Server board card apparatus, detection method therefor, and detection device thereof, and storage medium
CN115604069A (en) * 2022-09-29 2023-01-13 苏州浪潮智能科技有限公司(Cn) Server detection method, system and device
CN116701074A (en) * 2023-05-06 2023-09-05 苏州浪潮智能科技有限公司 Device and method for testing cyclic restarting of server

Also Published As

Publication number Publication date
CN117234820B (en) 2025-06-06

Similar Documents

Publication Publication Date Title
CN111752776B (en) Cyclic power-on and power-off test method and system for server
CN114003445B (en) BMC I2C monitoring function test method, system, terminal and storage medium
CN112286709A (en) Diagnosis method, diagnosis device and diagnosis equipment for server hardware faults
CN107943637A (en) A kind of mains cycle test device and method based on IPMI platforms
CN111858201A (en) A BMC comprehensive testing method, system, terminal and storage medium
CN116915583B (en) Diagnostic method for communication abnormality, device and electronic equipment thereof
CN115827358A (en) Automatic test system, method and device for PFR function and storage medium
CN114780316A (en) Memory test method, device and system
CN114138587B (en) Reliability verification method, device and equipment for server power supply firmware upgrade
CN101539876A (en) Start-up test system and method thereof
CN115480627A (en) Power supply testing method and device for server, electronic equipment and readable medium
CN119473744A (en) Link testing method, electronic device, storage medium, product and computing device
CN113778732A (en) Fault positioning method and device for service board card
CN105893196A (en) Server debugging auxiliary tool and system
CN117234820A (en) An automated testing method for SoC array server and SoC array server
CN118675602A (en) Memory testing system and testing method thereof
CN111488250A (en) Test method, system, terminal and storage medium for high-density multi-node server
CN117149555A (en) Test management method, device, equipment and medium based on server power consumption
CN116932297A (en) CPU performance/watt ratio test method, system, terminal and storage medium
CN116594826A (en) A server testing method, device, electronic equipment and storage medium
CN115640181A (en) System and method for testing image processing device
CN112463499A (en) Method, device, equipment and storage medium for adapting external equipment
CN100369009C (en) Monitoring system and method using system management interrupt signal
TWI748241B (en) Debug message automatically providing method of bios
CN102411527B (en) Detecting method for image-processing chip, developing plate and detecting system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant