CN117234820A - Automatic testing method of SoC array server and SoC array server - Google Patents

Publication number
CN117234820A
Authority
CN
China
Prior art keywords
controller
serial port
board
test result
soc
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311214753.8A
Other languages
Chinese (zh)
Inventor
陈卓杰
张定乾
支彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qishuo Shenzhen Technology Co ltd
Original Assignee
Qishuo Shenzhen Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qishuo Shenzhen Technology Co ltd
Priority to CN202311214753.8A
Publication of CN117234820A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

An embodiment of the invention discloses an automated test method for an SoC array server. The method is applied to a BMC motherboard in the SoC array server, which further comprises a backplane, a blade board and a switch board; the backplane comprises a backplane controller and a fan controller, and the fan controller is connected to a plurality of fans; the blade board comprises a blade board controller, an SoC board card and a serial port controller. In the method, the backplane controller, the fans, the blade board controller, the SoC board card, the serial port controller and the switch board are each put into normal operation and then each verified against preset durations and preset ranges; the verification results are collected and a corresponding test report is generated. Because each component of the SoC array server is verified with a method suited to it, accurate test results are obtained and a complete test report is generated from them, so that the SoC array server is tested comprehensively, accurately and efficiently.

Description

Automatic testing method of SoC array server and SoC array server
Technical Field
The present invention relates to the field of device testing technologies, and in particular, to an automatic testing method for an SoC array server and an SoC array server.
Background
With the continuous advancement of technology and the advent of the information age, demand for data processing and computing keeps growing. An SoC (System on Chip) array server is a high-performance server that integrates multiple functions such as processors, memory and I/O interfaces. Its high density and high computing capacity can meet this growing demand, and it is widely used in fields such as large-scale data centers, cloud computing and artificial intelligence. Testing of SoC array servers has accordingly attracted increasing attention. Because SoC array servers are complex and large in scale, conventional manual testing and inspection methods face many challenges: they are time-consuming and labor-intensive, prone to human error, and raise the cost of system deployment and maintenance.
Therefore, there is a need for a reliable, automated SoC array server testing scheme that enables comprehensive, efficient testing of individual components within a server.
Disclosure of Invention
In view of the above, it is necessary to provide an automated testing method for an SoC array server and an SoC array server, which can realize comprehensive and efficient testing of each component in the SoC array server.
In a first aspect, the present invention provides an automated test method of an SoC array server, where the method is applied to a BMC motherboard of the SoC array server, the SoC array server further includes a back board, a blade board, and a switch board, the back board includes a back board controller, a fan controller, and the fan controller is connected with a plurality of fans; the blade board comprises a blade board controller, an SoC board card and a serial port controller; the method comprises the following steps:
sending a first serial port instruction to the backboard controller, and generating a first test result according to whether a success instruction returned by the backboard controller is received within a first preset time length;
adjusting the rotating speed of the fan based on the fan controller in a first preset range, and generating a second test result according to whether the actual rotating speed of the fan after adjustment accords with a rotating speed threshold value;
receiving the starting time length of the exchange board, and generating a third test result according to whether the starting time length accords with a preset time length range;
Sending a second serial port instruction to the blade board controller, and generating a fourth test result according to whether a success instruction returned by the blade board controller is received within the first preset time length;
monitoring the continuous output condition of serial port data of the serial port controller, and generating a fifth test result according to whether the serial port data output by the serial port controller is received within the first preset duration;
acquiring at least one of starting time length, network speed, USB detection result, deep recovery mode detection result and serial port detection result of the SoC array server, and generating a sixth test result according to whether the starting time length, network speed, USB detection result, deep recovery mode detection result and serial port detection result all meet preset standards;
respectively performing aging tests with preset aging time for each component of the SoC board card, and generating a seventh test result according to the running condition of the system during the aging tests;
restarting the system for multiple times, obtaining the restarting time length of the system each time, and generating an eighth test result according to the restarting time length for multiple times;
and generating and outputting a test report based on the first test result, the second test result, the third test result, the fourth test result, the fifth test result, the sixth test result, the seventh test result and the eighth test result.
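The sequence of steps above can be sketched as a small runner that executes the checks in order and assembles a report. This is a minimal illustration only, not the implementation disclosed here: the function name `run_tests`, the `"normal"`/`"error"` strings and the report layout are assumptions, and stopping at the first error follows the optional refinements described below rather than the basic method.

```python
from typing import Callable, Dict, List, Tuple

Check = Callable[[], str]  # hypothetical: each check returns "normal" or "error"


def run_tests(checks: List[Tuple[str, Check]]) -> Dict[str, object]:
    """Run the checks in order and collect their results into a report."""
    results: Dict[str, str] = {}
    for name, check in checks:
        results[name] = check()
        # Assumption: the flow stops at the first failing check.
        if results[name] == "error":
            break
    # The run passes only if every check ran and returned "normal".
    passed = len(results) == len(checks) and all(
        r == "normal" for r in results.values()
    )
    return {"results": results, "passed": passed}
```

Each entry in `checks` would wrap one of the eight tests; checks that were never reached are simply absent from `results`.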
Optionally, the sending the first serial port instruction to the back panel controller generates a first test result according to whether a success instruction returned by the back panel controller is received within a first preset duration, including:
issuing a serial port instruction to the backplane controller and waiting for a first preset duration; if a success instruction returned by the backplane controller is received within the first preset duration, the first test result is normal;
and if the success instruction is not received within the first preset duration, repeating the step of issuing the serial port instruction to the backplane controller, together with the subsequent steps, n times; if the success instruction is not received in any of the n attempts, the first test result is an error and the test is terminated.
Optionally, the adjusting the rotation speed of the fan based on the fan controller in the first preset range, and generating a second test result according to whether the adjusted actual rotation speed of the fan meets a rotation speed threshold value, includes:
adjusting the rotating speed of the fan through the fan controller within a first preset range, wherein each value in the first preset range corresponds to a rotating speed threshold; if, after a value is selected in the first preset range to adjust the rotating speed of the fan, the actual rotating speed of the adjusted fan conforms to the rotating speed threshold, the second test result is normal;
and if the actual rotating speed of the adjusted fan does not conform to the rotating speed threshold, repeating the step of selecting a value in the first preset range to adjust the rotating speed of the fan, together with the subsequent steps, n times; if none of the n actual rotating speeds obtained conforms to the rotating speed threshold, the second test result is an error and the test is terminated.
Optionally, the receiving the starting duration of the switch board and generating a third test result according to whether the starting duration meets a preset duration range, includes:
restarting the exchange board, receiving the starting time length of the exchange board, and if the starting time length accords with a preset time length range, determining that the third test result is normal;
and if the starting duration does not fall within the preset duration range, repeating the step of restarting the switch board, together with the subsequent steps, n times; if none of the n starting durations obtained falls within the preset duration range, the third test result is an error and the test is terminated.
Optionally, the sending a second serial port instruction to the blade board controller generates a fourth test result according to whether a success instruction returned by the blade board controller is received within the first preset duration, including:
issuing a serial port instruction to the blade board controller and waiting for a first preset duration; if a success instruction returned by the blade board controller is received within the first preset duration, the fourth test result is normal;
and if the success instruction is not received within the first preset duration, repeating the step of issuing the serial port instruction to the blade board controller, together with the subsequent steps, n times; if the success instruction is not received in any of the n attempts, the fourth test result is an error and the test is terminated.
Optionally, the monitoring the continuous output condition of the serial port data of the serial port controller, generating a fifth test result according to whether the serial port data output by the serial port controller is received within the first preset duration, includes:
monitoring the continuous output of serial port data from the serial port controller; if no serial port data output by the serial port controller is received within the first preset duration, waiting for a second preset duration; and if no serial port data is received within the second preset duration either, the fifth test result is an error and the test is terminated.
Optionally, at least one of a start-up duration, a network speed, a USB detection result, a deep recovery mode detection result, and a serial port detection result of the SoC array server is obtained, and generating a sixth test result according to whether the start-up duration, the network speed, the USB detection result, the deep recovery mode detection result, and the serial port detection result all meet a preset standard, includes:
Restarting the SoC array server to obtain starting time; acquiring at least one of a network speed, a USB detection result, a deep recovery mode detection result and a serial port detection result, and if the starting time length, the network speed, the USB detection result, the deep recovery mode detection result and the serial port detection result all meet preset standards, determining that the sixth test result is normal;
and if at least one of the starting duration, the network speed, the USB detection result, the deep recovery mode detection result and the serial port detection result does not meet the preset standard, repeating the steps of restarting the SoC array server and the steps after the restarting for n times, and if at least one of the detection results for n times does not meet the preset standard, the sixth test result is an error, and ending the test at the moment.
Optionally, the performing an aging test for each component of the SoC board card for a preset aging period, and generating a seventh test result according to the operation condition of the system during the aging test, where the seventh test result includes:
and performing aging test on each component of the SoC board card, wherein the aging test is of preset aging duration, and if the system is restarted and/or down during the aging test, the seventh test result is an error, and the test is terminated at the moment.
Optionally, the restarting the system for multiple times, obtaining a restarting duration of each time of the system, and generating an eighth test result according to the restarting duration for multiple times, where the eighth test result includes:
simulating a user use scene, restarting the system for multiple times, and acquiring the restarting time length of the system each time, wherein if the restarting time length of the system each time accords with a preset restarting time length standard, the eighth test result is normal;
if the restarting time length of the system at least once does not meet the preset restarting time length standard, the eighth test result is an error, and the test is terminated at the moment.
In a second aspect, the present invention provides an SoC array server, where the SoC array server includes a BMC motherboard, a backplane, a blade board, and a switch board that are connected to each other, the backplane includes a backplane controller, a fan controller, and the fan controller is connected to a plurality of fans; the blade board comprises a blade board controller, an SoC board card and a serial port controller;
the BMC mainboard is used for managing and monitoring the running state of the whole SoC array server; the BMC is also used for monitoring the health state of the SoC array server in real time and performing remote management and maintenance;
The backboard controller is connected with the BMC mainboard, and is used for connecting and coordinating all hardware components on the backboard, ensuring that all hardware components can normally operate and communicate, and managing and controlling all sub-modules on the whole backboard;
the blade board controller is connected with the back board controller, and is used for communicating with each component in the blade board and managing and monitoring the running state of each component in the blade board;
the serial port controller is connected with the BMC mainboard, the blade board controller, the SoC board card and the switch board, and is used for configuring and controlling serial port equipment in the SoC array server, supporting serial port communication and providing a state monitoring function of the SoC array server;
the exchange board is used for establishing a high-speed and stable data channel among all components in the SoC array server; the exchange board is also used for supporting rapid exchange of data among all components in the SoC array server and supporting flexible network configuration so as to meet the requirements of different service scenes;
the fan controller is used for controlling the fan rotating speed and monitoring the fan state;
The SoC board card comprises an SoC chip and is used for executing various calculation tasks and data processing operations on the SoC array server;
the BMC motherboard is configured to perform the automated test method of the SoC array server according to any one of the first aspect.
The embodiment of the invention has the following beneficial effects:
The invention provides an automated test method for an SoC array server. The method is applied to a BMC motherboard of the SoC array server, which further comprises a backplane, a blade board and a switch board; the backplane comprises a backplane controller and a fan controller, and the fan controller is connected to a plurality of fans; the blade board comprises a blade board controller, an SoC board card and a serial port controller. In the method, the backplane controller, the fans, the blade board controller, the SoC board card, the serial port controller and the switch board are each put into normal operation and then each verified against preset durations and preset ranges; the verification results are collected and a test report is generated. Because each component of the SoC array server is verified with a method suited to it, accurate test results are obtained and a complete test report is generated from them, so that the SoC array server is tested comprehensively and efficiently.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Wherein:
fig. 1 is a schematic flow chart of an automated testing method of an SoC array server according to an embodiment of the present application;
fig. 2 is an exemplary schematic diagram of an automated test method for an SoC array server according to an embodiment of the present application;
fig. 3 is a schematic diagram of a SoC array server according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
An embodiment of the application provides an automated test method for an SoC array server. Through an intelligent automated test system, the method comprehensively and efficiently tests the components of the SoC array server, such as the backplane MCU, the blade board MCU, the serial port MCU, the switch board and the SoC board card, thereby improving the reliability and performance of the server and reducing the cost of system deployment and maintenance. The method accurately detects whether each component works normally and generates a corresponding test report, so that potential problems can be found in time from the report; this provides important technical support for the deployment and maintenance of large-scale SoC array servers.
Fig. 1 is a flow chart of an automated test method for an SoC array server according to an embodiment of the present application. The method is applied to a BMC motherboard of the SoC array server, where the SoC array server further includes a backplane, a blade board and a switch board; the backplane includes a backplane controller and a fan controller; the fan controller is connected to a plurality of fans; the blade board includes a blade board controller, an SoC board card and a serial port controller. The method specifically includes the following steps:
step 101, a first serial port instruction is sent to a back panel controller, and a first test result is generated according to whether a success instruction sent back by the back panel controller is received within a first preset time length.
In the embodiment of the application, the BMC main board transmits a serial port instruction to the backboard controller, waits for a first preset time period, and if a successful instruction transmitted back by the backboard controller is received within the first preset time period, the first test result is normal; if no successful instruction is received within the first preset time period, repeating the steps of issuing the serial port instruction to the backboard controller and the steps after the serial port instruction, and if no successful instruction is received for n times, the first test result is an error, and the test is terminated at the moment.
Wherein, the first preset duration may be preferably 3s, and n may be preferably 3 times. It should be noted that the first preset duration and the value of n are only a preferred example, and the other values meeting the standard can be applied to the embodiments of the present application, which is not limited herein.
Step 101 is illustrated as follows: the BMC motherboard issues a serial port instruction to the backplane controller and waits 3 s. If the BMC motherboard receives the success instruction returned by the backplane controller within 3 s, the backplane controller is normal and the first test result is returned as normal. If no success instruction is received within 3 s, the backplane controller may be abnormal and the above steps are retried 3 times. If, in all 3 retries, the success instruction is not received or only abnormal signals unrelated to the success instruction are received, the backplane controller is confirmed to be abnormal: the first test result is returned as an error, the user is reminded that the backplane controller is abnormal, and the test flow is terminated.
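The timeout-and-retry handshake in this example can be sketched as follows. This is a minimal sketch under stated assumptions: `send_and_wait` is a hypothetical callable that issues the serial port instruction and returns the reply received within the timeout, or `None` on timeout; the `"OK"` token stands in for the success instruction, whose actual format is not given here. The sketch reads the example as one initial attempt followed by up to 3 retries.

```python
from typing import Callable, Optional


def check_controller(
    send_and_wait: Callable[[float], Optional[str]],
    timeout_s: float = 3.0,      # first preset duration (preferred value: 3 s)
    retries: int = 3,            # n (preferred value: 3)
    success_token: str = "OK",   # hypothetical stand-in for the success instruction
) -> str:
    """Return "normal" if a success reply arrives in any attempt, else "error"."""
    # Assumption: one initial attempt plus `retries` further attempts.
    for _ in range(1 + retries):
        reply = send_and_wait(timeout_s)  # None models "nothing received in time"
        if reply == success_token:
            return "normal"
        # Any other reply is treated like a timeout and triggers a retry.
    return "error"
```

Because the transport is injected, the same function could serve the blade board controller check in step 104 with a different `send_and_wait`.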
Step 102, adjusting the rotation speed of the fan based on the fan controller in the first preset range, and generating a second test result according to whether the actual rotation speed of the adjusted fan meets the rotation speed threshold.
In one possible implementation manner, the fan speed is adjusted based on the fan controller within a first preset range, each value within the first preset range corresponds to a speed threshold, if a certain value is selected within the first preset range to adjust the fan speed, the actual speed of the adjusted fan accords with the speed threshold, and then the second test result is normal; if the actual rotation speed of the adjusted fan does not accord with the rotation speed threshold, repeating the step of selecting a certain value in the first preset range for adjusting the rotation speed of the fan and the subsequent steps for n times, and if the obtained actual rotation speeds of the fan do not accord with the rotation speed threshold for n times, determining that the second test result is wrong, and ending the test at the moment.
For example, the full-load rotating speed of the fan is 15000 revolutions and the first preset range is set to 10% to 100%. The rotating speed corresponding to 10% of the full-load speed is then 1500 revolutions, and the rotating speed threshold corresponding to 10% is set to 1500 ± 10% (where 1500 = 15000 × 10%). The rotating speed threshold corresponding to each value in the first preset range of 10% to 100% is obtained in the same way. As described above, the first preset duration is preferably 3 s and n is preferably 3. These values are only a preferred example; other values that meet the standard can also be applied to the embodiments of the present application, which is not limited herein.
Step 102 is illustrated as follows: the BMC motherboard selects a value between 10% and 100% to adjust the rotating speed of the fan. If 25% is selected, the corresponding rotating speed threshold is 3750 ± 25% (where 3750 = 15000 × 25%). If the adjusted actual rotating speed stays within the threshold 3750 ± 25%, the fan is normal and the second test result is returned as normal. If the adjusted actual rotating speed falls outside the threshold 3750 ± 25%, the fan may be abnormal and the adjustment steps are retried 3 times. If all 3 retries show that the actual rotating speed falls outside 3750 ± 25%, the fan is confirmed to be abnormal: the second test result is returned as an error, the user is reminded that the fan is abnormal, and the test flow is terminated.
It should be noted that, in step 102, the adjustment of the rotation speed of the fan may be gradually adjusted from 10% to 100% instead of selecting any value within the first preset range.
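The threshold arithmetic of step 102 can be worked through in a few lines. This is an illustrative sketch: the function names are hypothetical, and the tolerance is passed in explicitly because the examples quote ± 10% at the 10% setting and ± 25% at the 25% setting without stating a general rule.

```python
FULL_LOAD_SPEED = 15000  # full-load rotating speed from the example


def speed_window(setting_percent, tolerance_percent, full_load=FULL_LOAD_SPEED):
    """Return the (low, high) band of acceptable speeds for a given setting."""
    if not 10 <= setting_percent <= 100:
        raise ValueError("setting must lie in the first preset range 10-100%")
    target = full_load * setting_percent / 100  # e.g. 15000 x 25% = 3750
    delta = target * tolerance_percent / 100    # half-width of the band
    return target - delta, target + delta


def fan_speed_ok(actual_speed, setting_percent, tolerance_percent):
    """True if the measured speed conforms to the threshold for this setting."""
    low, high = speed_window(setting_percent, tolerance_percent)
    return low <= actual_speed <= high
```

With the figures above, `speed_window(10, 10)` gives a band of 1350 to 1650 and `speed_window(25, 25)` gives 2812.5 to 4687.5. Sweeping the setting from 10% to 100%, as the note on step 102 allows, amounts to calling `fan_speed_ok` once per setting.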
Step 103, receiving the starting time of the exchange board, and generating a third test result according to whether the starting time accords with a preset time range.
In the embodiment of the application, the exchange board is restarted, the starting time of the exchange board is received, and if the starting time accords with the preset time range, the third test result is normal; if the starting time length does not accord with the preset time length range, repeating the steps of restarting the exchange board for n times and the following steps, if the starting time length obtained for n times does not accord with the preset time length range, the third test result is an error, and the test is terminated at the moment.
It will be appreciated that step 103 is mainly a power-on/power-off test of the switch board, used to test whether the restart circuit of the switch board and the network path are normal. The switch board generally takes 30 s to start and at most 40 s, so the preset duration range is preferably 30 s to 40 s, and n is preferably 3.
Step 103 is illustrated: restarting the exchange board, receiving the starting time of the exchange board, if the starting time is between 30s and 40s, indicating that the exchange board is normal, and returning a third test result to be normal; if the starting time is longer than 40s or the network access is abnormal, the switching board is indicated to have abnormal conditions, the steps are repeated for 3 times, if the starting time obtained in 3 times is longer than 40s, a third test result is returned to be an error at the moment, the user is reminded of the abnormal conditions of the switching board, and the test flow is terminated.
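The switch board check in this example can be sketched as below. The callable `restart_and_measure` is hypothetical: it is assumed to restart the switch board and return the measured starting duration in seconds together with a flag saying whether the network path came up. The example only flags durations longer than 40 s; treating a duration shorter than 30 s as out of range follows the stated 30 s to 40 s range and is an interpretation.

```python
from typing import Callable, Tuple


def check_switch_board(
    restart_and_measure: Callable[[], Tuple[float, bool]],
    min_s: float = 30.0,   # lower end of the preset duration range
    max_s: float = 40.0,   # upper end of the preset duration range
    retries: int = 3,      # n (preferred value: 3)
) -> str:
    """Return "normal" if any attempt boots in range with a working network."""
    # Assumption: one initial restart plus `retries` further restarts.
    for _ in range(1 + retries):
        boot_s, network_ok = restart_and_measure()
        if network_ok and min_s <= boot_s <= max_s:
            return "normal"
    return "error"
```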
Step 104, a second serial port instruction is sent to the blade board controller, and a fourth test result is generated according to whether a success instruction returned by the blade board controller is received within a first preset time period.
In the embodiment of the application, a serial port instruction is issued to the blade board controller, a first preset duration is waited, and if a success instruction transmitted back by the blade board controller is received within the first preset duration, a fourth test result is normal; if no successful instruction is received within the first preset time period, repeating the steps of issuing the serial port instruction to the blade board controller and the steps after the serial port instruction for n times, and if no successful instruction is received for n times, determining that the fourth test result is an error, and terminating the test at the moment.
Wherein, as mentioned above, the first preset duration is preferably 3s, and n is preferably 3 times.
Step 104 is illustrated as follows: the BMC motherboard issues a serial port instruction to the blade board controller and waits 3 s. If the success instruction returned by the blade board controller is received within 3 s, the blade board controller is normal and the fourth test result is returned as normal. If, within 3 s, the BMC motherboard does not receive the success instruction or receives other abnormal signals, the blade board controller may be abnormal and the above steps are retried 3 times. If the success instruction is not received in any of the 3 retries, the blade board controller is confirmed to be abnormal: the fourth test result is returned as an error, the user is reminded that the blade board controller is abnormal, and the test flow is terminated.
Step 105, monitoring the continuous output condition of serial port data of the serial port controller, and generating a fifth test result according to whether the serial port data output by the serial port controller is received within a first preset duration.
In the embodiment of the application, the continuous output condition of the serial port data of the serial port controller is monitored, if the serial port data output by the serial port controller is not received within the first preset time period, a second preset time period is waited, and if the serial port data is not received within the second preset time period, the fifth test result is an error, and the test is terminated at the moment.
It can be understood that the serial port MCU is used as a real-time serial port output party, and the BMC main board can obtain the running condition of the serial port controller only by monitoring whether serial port data is continuously output or not. Wherein the first preset time period is preferably 3s as described above, and the second preset time period is preferably 30s.
Step 105 is illustrated as follows: the BMC motherboard monitors whether the serial port controller continuously outputs serial port data. If serial port data from the serial port controller is received within 3 s, the serial port controller is normal and the fifth test result is returned as normal. If no serial port data is received within 3 s, the serial port controller may be abnormal and the BMC motherboard waits a further 30 s. If no serial port data is received at any point during those 30 s, the serial port controller is confirmed to be abnormal: the fifth test result is returned as an error, the user is reminded that the serial port controller is abnormal, and the test flow is terminated.
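Unlike the handshake checks, step 105 uses a two-stage wait rather than a retry count. A minimal sketch, assuming a hypothetical callable `wait_for_data(timeout_s)` that returns `True` as soon as any serial port data arrives within the timeout; treating data that arrives during the second wait as a normal result is an interpretation, since only the failure case is spelled out.

```python
from typing import Callable


def check_serial_output(
    wait_for_data: Callable[[float], bool],
    first_wait_s: float = 3.0,    # first preset duration (preferred: 3 s)
    second_wait_s: float = 30.0,  # second preset duration (preferred: 30 s)
) -> str:
    """Return "normal" if serial data is seen in either waiting window."""
    if wait_for_data(first_wait_s):
        return "normal"
    # Nothing seen yet: the controller may be abnormal, so wait once more.
    if wait_for_data(second_wait_s):
        return "normal"  # assumption: late data still counts as normal
    return "error"
```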
Step 106, obtaining at least one of a start-up time length, a network speed, a USB detection result, a deep recovery mode detection result and a serial port detection result of the SoC array server, and generating a sixth test result according to whether the start-up time length, the network speed, the USB detection result, the deep recovery mode detection result and the serial port detection result all meet preset standards.
In the embodiment of the application, the SoC array server is restarted and the start-up duration is obtained; at least one of the network speed, the USB detection result, the deep recovery mode detection result and the serial port detection result is acquired. If the start-up duration, the network speed, the USB detection result, the deep recovery mode detection result and the serial port detection result all meet the preset standards, the sixth test result is normal. If at least one of them does not meet its preset standard, the step of restarting the SoC array server and the subsequent steps are repeated n times; if at least one detection result still fails to meet the preset standard after the n repetitions, the sixth test result is an error, and the test is terminated at this moment.
It can be understood that step 106 tests and verifies the SoC function in the SoC array server. The preset standard for the start-up duration of the restarted SoC array server is that start-up completes within 2 minutes; the preset standard for the network speed is 2500M; the preset standard for the USB detection result is whether ADB (Android Debug Bridge, a multi-purpose command-line tool for communicating with devices) is connected normally; the preset standard for the deep recovery mode detection result is whether the deep flashing mode starts normally; the preset standard for the serial port detection result is whether serial port data transmission is normal.
Step 106 is illustrated as follows: the SoC array server is restarted and the start-up duration is obtained; at least one of the network speed, the USB detection result, the deep recovery mode detection result and the serial port detection result is acquired. If the start-up duration is within 2 minutes, the network speed is 2500M, the USB detection result shows that ADB is connected normally, the deep recovery mode detection result shows that the deep flashing mode starts normally, and the serial port detection result shows that serial port data transmission is normal, the SoC function is confirmed to be normal, and the sixth test result is returned as normal. If any one of the start-up duration, the network speed, the USB detection result, the deep recovery mode detection result and the serial port detection result does not meet its preset standard, the SoC function may be abnormal, and the above steps are repeated 3 times. If at least one detection result still fails to meet the preset standard after the 3 repetitions, the SoC function is confirmed to be abnormal, the sixth test result is returned as an error, the user is reminded that the SoC function is abnormal, and the test flow is terminated.
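The combined check of step 106 can be sketched as follows. This is an illustrative sketch only: the `SocReadings` record, its field names and the `measure` callable are hypothetical, the network speed is compared for equality with 2500 because the description states the value without saying whether higher speeds also pass, and "repeated 3 times" is read as one initial attempt plus up to 3 repetitions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SocReadings:
    """One round of step 106 measurements (field names are hypothetical)."""
    boot_seconds: float       # start-up duration after the restart
    network_speed_m: int      # network speed, in the "M" unit used by the source
    adb_connected: bool       # USB detection result
    deep_flash_mode_ok: bool  # deep recovery mode detection result
    serial_ok: bool           # serial port detection result


def meets_standards(r: SocReadings) -> bool:
    """Every item must meet its preset standard for the round to pass."""
    return (r.boot_seconds <= 120        # start-up completes within 2 minutes
            and r.network_speed_m == 2500  # assumed equality check
            and r.adb_connected
            and r.deep_flash_mode_ok
            and r.serial_ok)


def sixth_test_result(measure: Callable[[], SocReadings], retries: int = 3) -> str:
    """measure() is assumed to restart the server and collect fresh readings."""
    for _ in range(1 + retries):
        if meets_standards(measure()):
            return "normal"
    return "error"  # the caller reminds the user and terminates the test flow
```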
Step 107, performing an aging test of a preset aging duration on each component of the SoC board card, and generating a seventh test result according to the operation condition of the system during the aging test.
In the embodiment of the application, an aging test of the preset aging duration is performed on each component of the SoC board card. If the system restarts and/or goes down during the aging test, the seventh test result is an error, and the test is terminated at this moment.
The preset aging duration is 3 × 24 hours. The aging test is mainly directed at the components of the SoC board card, such as the CPU, the GPU and the network card.
Step 107 is illustrated as follows: an aging test with an aging duration of 3 × 24 hours is performed on each component of the SoC board card. If the system restarts and/or goes down during the aging test, the aging test is confirmed to be unsuccessful, the seventh test result is returned as an error, the user is reminded that the aging test is abnormal, and the test flow is terminated. If the system runs normally throughout the aging test, without restarting or going down, the aging test is successful, and the seventh test result is returned as normal.
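One possible way to detect a restart or downtime during the aging test is to sample the system uptime periodically. The description does not say how restarts and downtime are detected, so the following is only a sketch under that assumption: a missing sample (`None`) is treated as downtime, and an uptime value lower than the previous one is treated as a restart. The function and constant names are hypothetical.

```python
from typing import Iterable, Optional

AGING_SECONDS = 3 * 24 * 60 * 60  # preset aging duration of 3 x 24 hours


def seventh_test_result(uptime_samples: Iterable[Optional[float]]) -> str:
    """Return "error" on the first sign of a restart or downtime, else "normal".

    uptime_samples holds uptime readings in seconds taken at regular intervals
    over the aging period; None means the system did not respond.
    """
    previous: Optional[float] = None
    for uptime in uptime_samples:
        if uptime is None:
            return "error"  # no response: treated as downtime
        if previous is not None and uptime < previous:
            return "error"  # uptime counter went backwards: the system restarted
        previous = uptime
    return "normal"
```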
Step 108, restarting the system multiple times, obtaining the duration of each restart, and generating an eighth test result according to the multiple restart durations.
In the embodiment of the application, a user usage scenario is simulated: the system is restarted multiple times and the duration of each restart is obtained. If every restart duration meets the preset restart duration standard, the eighth test result is normal; if at least one restart duration does not meet the preset restart duration standard, the eighth test result is an error, and the test is terminated at this moment.
It will be appreciated that the purpose of step 108 is to simulate the frequent restarts a user may perform and to verify whether frequent restarting damages the SoC array server. The preset restart duration standard is 2 minutes.
Step 108 is illustrated as follows: a user usage scenario is simulated by restarting the system frequently, for example 100 times, and the duration of each restart is obtained. If every restart completes within 2 minutes, the eighth test result is normal. If at least one restart does not complete within 2 minutes, the system does not tolerate frequent restarts well enough, the eighth test result is returned as an error, the user is reminded that the restart test is abnormal, and the test flow is terminated.
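The evaluation of the restart durations in step 108 can be sketched as follows. This is an illustrative sketch: the function name and the returned tuple (result plus the 1-based index of the first failing restart) are hypothetical conveniences, and the durations are assumed to be given in seconds.

```python
from typing import Iterable, Optional, Tuple


def eighth_test_result(restart_durations_s: Iterable[float],
                       limit_s: float = 120.0) -> Tuple[str, Optional[int]]:
    """Check each restart against the 2-minute standard, stopping at the first failure."""
    for index, duration in enumerate(restart_durations_s, start=1):
        if duration > limit_s:
            return "error", index  # this restart did not complete within the limit
    return "normal", None
```

Stopping at the first slow restart mirrors the description, in which the test flow terminates as soon as one restart misses the standard.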
Step 109, generating and outputting a test report based on the first test result, the second test result, the third test result, the fourth test result, the fifth test result, the sixth test result, the seventh test result and the eighth test result.
It can be understood that, referring to fig. 2, an exemplary schematic diagram of the automated test method of the SoC array server according to an embodiment of the present application is provided. After all test verification ends, a complete test report is generated from the first test result, the second test result, the third test result, the fourth test result, the fifth test result, the sixth test result, the seventh test result and the eighth test result, and is automatically output and displayed to the user side.
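The report generation of step 109 can be sketched as follows. The description does not specify the report format, so the plain-text layout, the `REPORT_ITEMS` labels and the `build_report` function are assumptions made for illustration. Because the flow terminates at the first error, the sketch accepts fewer than eight results and reports the overall outcome as an error in that case.

```python
from typing import List

# Labels for the eight test results, in step order (wording is illustrative).
REPORT_ITEMS = [
    "backplane controller",
    "fan speed",
    "switch board start-up",
    "blade board controller",
    "serial port controller output",
    "SoC function",
    "aging test",
    "repeated restart",
]


def build_report(results: List[str]) -> str:
    """Format the collected results ("normal" or "error") as a text report."""
    lines = [
        f"Test {number} ({name}): {result}"
        for number, (name, result) in enumerate(zip(REPORT_ITEMS, results), start=1)
    ]
    complete = len(results) == len(REPORT_ITEMS)
    overall = "normal" if complete and all(r == "normal" for r in results) else "error"
    lines.append(f"Overall: {overall}")
    return "\n".join(lines)
```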
In the embodiment of the application, each part in the automatic test system of the SoC array server is verified by a different verification method to obtain an accurate test result, and a complete test report is generated based on the test result, so that the comprehensive and efficient test of the SoC array server is realized.
Referring to fig. 3, a schematic structural diagram of an SoC array server according to an embodiment of the present application is provided. The SoC array server includes a BMC motherboard 310, a backplane 320, a blade board 330, and a switch board 340 that are connected to each other; the backplane 320 includes a backplane controller 321 and a fan controller 322, and the fan controller 322 is connected to a plurality of fans; the blade board 330 includes a blade board controller 331, an SoC board card 332, and a serial port controller 333. Specifically:
the BMC (Baseboard Management Controller) motherboard 310 may be connected to the switch board 340 through at least one of Ethernet, a UART (Universal Asynchronous Receiver/Transmitter) serial port, and a Console port, and the BMC motherboard 310 is further connected to the backplane controller 321 and the serial port controller 333 through UART serial ports; the backplane controller 321 is connected to the switch board 340 and to the blade board controller 331 through control I/O, the backplane controller 321 is connected to the fan controller 322 through a PMBus (Power Management Bus), and the backplane controller 321 may also be connected to the blade board controller 331 through a UART serial port; the blade board controller 331 is connected to the serial port controller 333 and the SoC board card 332 through control I/O; the SoC board card 332 is connected to the serial port controller 333 through a UART serial port and to the switch board 340 through Ethernet; the serial port controller 333 may also be connected to the switch board 340 through Ethernet.
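The connections listed above can be written down as a small link table, which makes it easy to check which components are directly connected. This is only an illustrative sketch: the identifier names, the `LINKS` table and the `neighbors` helper are hypothetical, and optional links ("may be connected") are listed alongside mandatory ones without distinction.

```python
from typing import List, Set, Tuple

# (component A, component B, link type), taken from the connection description.
LINKS: List[Tuple[str, str, str]] = [
    ("bmc_motherboard", "switch_board", "Ethernet/UART/Console"),
    ("bmc_motherboard", "backplane_controller", "UART"),
    ("bmc_motherboard", "serial_port_controller", "UART"),
    ("backplane_controller", "switch_board", "control I/O"),
    ("backplane_controller", "blade_board_controller", "control I/O"),
    ("backplane_controller", "blade_board_controller", "UART"),
    ("backplane_controller", "fan_controller", "PMBus"),
    ("blade_board_controller", "serial_port_controller", "control I/O"),
    ("blade_board_controller", "soc_board_card", "control I/O"),
    ("soc_board_card", "serial_port_controller", "UART"),
    ("soc_board_card", "switch_board", "Ethernet"),
    ("serial_port_controller", "switch_board", "Ethernet"),
]


def neighbors(component: str) -> Set[str]:
    """Return every component that shares at least one link with the given one."""
    result: Set[str] = set()
    for a, b, _link in LINKS:
        if a == component:
            result.add(b)
        elif b == component:
            result.add(a)
    return result
```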
The BMC motherboard 310 is the core control board of the SoC array server and is responsible for managing and monitoring the operation state of the entire server. It integrates a series of management functions such as remote monitoring, fault diagnosis, power control, and fan control. Through the BMC motherboard 310, an administrator can monitor the health status of the server in real time and perform remote management and maintenance.
The backplane 320 integrates the backplane controller 321 and the fan controller 322.
The backplane controller 321 is a control module on the SoC array server backplane 320, and is used to manage and control each sub-module on the entire backplane 320. It is responsible for interfacing and coordinating the various hardware components on the backplane 320, ensuring that they function and communicate properly. There is data interaction between the backplane controller 321 and the BMC motherboard 310, through which the BMC motherboard 310 can monitor and manage all the sub-modules on the backplane 320.
The fan controller 322 is responsible for controlling the speeds of the plurality of fans and monitoring the fan status. It should be understood that the number of fans in fig. 2 is preferably 4, but this does not mean that only 4 fans can be connected in the SoC array server; it is only a preferred case and does not unduly limit the number.
The blade board controller 331, the SoC board card 332, and the serial port controller 333 are integrated on the blade board 330. The number of blade boards 330 is preferably 4, the number of SoC board cards 332 is preferably 10, and the number of serial port controllers 333 is preferably 2. It should be noted that this is only a preferred case and does not unduly limit the numbers.
The blade controller 331 is a control module on the SoC array server blade 330, and is configured to manage and monitor the operation state of the blade 330. It is responsible for communicating with the various components within blade 330. There is data interaction between blade board controller 331 and back board controller 321, through which back board controller 321 can uniformly manage and monitor blade board 330.
The SoC board 332 is a core computing unit of the SoC array server, and each SoC board integrates a SoC chip, including a processor, a memory, an I/O interface, and the like. The SoC board 332 is the compute and data processing core of the SoC array server, which is responsible for performing various compute tasks and data processing operations on the SoC array server. In the SoC array server, a plurality of SoC boards 332 form an array, and are connected and cooperate with the backplane controller 321 through the switch board 340, so as to realize high-performance data processing and computing capability.
The serial port controller 333 is connected to the BMC motherboard 310, the blade board controller 331, the SoC board card 332, and the switch board 340. It is a control module in the SoC array server and is responsible for managing and controlling each serial port in the SoC array server. It configures and controls each serial port device in the SoC array server, supports serial port communication, and provides a state monitoring function for the SoC devices in the SoC array server.
The switch board 340 is an important component of the SoC array server and is responsible for handling network communication within the SoC array server. It integrates a high-performance switching chip for establishing high-speed, stable data channels between the components inside the SoC array server. The switch board 340 allows fast data exchange between the components and supports flexible network configurations to meet the needs of different service scenarios. The switch board 340 may be a 25G switch board; again, this is only a preferred case and is not intended to be limiting.
It will be appreciated that the automated test method of the SoC array server described with reference to fig. 1 above is executed by the BMC motherboard 310 of the SoC array server.
In the embodiment of the present application, for the manner in which the BMC motherboard 310 executes any of the automated test methods of the SoC array server described with reference to fig. 1, reference may be made to the embodiment shown in fig. 1, which is not repeated herein.
In an embodiment of the present application, there is provided a computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to perform the method of any one of the above-described method embodiments.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware, where the program may be stored in a non-volatile computer-readable storage medium, and where the program, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (10)

1. The automatic test method for the SoC array server is characterized by being applied to a BMC main board of the SoC array server, wherein the SoC array server further comprises a back plate, a blade plate and a switch plate, the back plate comprises a back plate controller and a fan controller, and the fan controller is connected with a plurality of fans; the blade board comprises a blade board controller, an SoC board card and a serial port controller; the method comprises the following steps:
sending a first serial port instruction to the backboard controller, and generating a first test result according to whether a success instruction returned by the backboard controller is received within a first preset time length;
adjusting the rotating speed of the fan based on the fan controller in a first preset range, and generating a second test result according to whether the actual rotating speed of the fan after adjustment accords with a rotating speed threshold value;
receiving the starting time length of the exchange board, and generating a third test result according to whether the starting time length accords with a preset time length range;
sending a second serial port instruction to the blade board controller, and generating a fourth test result according to whether a success instruction returned by the blade board controller is received within the first preset time length;
monitoring the continuous output condition of serial port data of the serial port controller, and generating a fifth test result according to whether the serial port data output by the serial port controller is received within the first preset duration;
acquiring at least one of starting time length, network speed, USB detection result, deep recovery mode detection result and serial port detection result of the SoC array server, and generating a sixth test result according to whether the starting time length, network speed, USB detection result, deep recovery mode detection result and serial port detection result all meet preset standards;
respectively performing aging tests with preset aging time for each component of the SoC board card, and generating a seventh test result according to the running condition of the system during the aging tests;
restarting the system for multiple times, obtaining the restarting time length of the system each time, and generating an eighth test result according to the restarting time length for multiple times;
and generating and outputting a test report based on the first test result, the second test result, the third test result, the fourth test result, the fifth test result, the sixth test result, the seventh test result and the eighth test result.
2. The method of claim 1, wherein the sending the first serial port command to the backplane controller generates a first test result according to whether a success command returned by the backplane controller is received within a first preset duration, including:
issuing a serial port instruction to the backboard controller, waiting for a first preset time period, and if a successful instruction returned by the backboard controller is received within the first preset time period, determining that the first test result is normal;
and if the success instruction is not received within the first preset time period, repeating, n times, the step of issuing the serial port instruction to the backboard controller and the subsequent steps, and if the success instruction is not received in any of the n repetitions, determining that the first test result is an error, and terminating the test at this moment.
3. The method of claim 1, wherein the adjusting the fan speed based on the fan controller within the first preset range, and generating the second test result according to whether the adjusted actual speed of the fan meets the speed threshold value, comprises:
adjusting the rotating speed of the fan based on the fan controller in a first preset range, wherein each value in the first preset range corresponds to a rotating speed threshold value, and if a certain value is selected in the first preset range to adjust the rotating speed of the fan, the actual rotating speed of the fan after adjustment accords with the rotating speed threshold value, the second test result is normal;
and if the actual rotation speed of the fan after adjustment does not accord with the rotation speed threshold, repeating the step of selecting a certain value in a first preset range for adjusting the rotation speed of the fan for n times and the subsequent steps, and if the actual rotation speed of the fan obtained for n times does not accord with the rotation speed threshold, determining that the second test result is wrong, and ending the test at the moment.
4. The method of claim 1, wherein the receiving the start-up duration of the switch board and generating the third test result according to whether the start-up duration meets a preset duration range comprise:
restarting the exchange board, receiving the starting time length of the exchange board, and if the starting time length accords with a preset time length range, determining that the third test result is normal;
and if the starting time length does not accord with the preset time length range, repeating the step of restarting the exchange board for n times and the subsequent steps, and if the starting time length obtained for n times does not accord with the preset time length range, the third test result is an error, and the test is terminated at the moment.
5. The method of claim 1, wherein the sending the second serial port command to the blade controller generates a fourth test result according to whether a success command returned by the blade controller is received within the first preset time period, including:
issuing a serial port instruction to the blade controller, waiting for a first preset time period, and if a success instruction transmitted back by the blade controller is received within the first preset time period, determining that the fourth test result is normal;
and if the success instruction is not received within the first preset time period, repeating, n times, the step of issuing the serial port instruction to the blade board controller and the subsequent steps, and if the success instruction is not received in any of the n repetitions, determining that the fourth test result is an error, and terminating the test at this moment.
6. The method of claim 1, wherein the monitoring the serial port data continuous output condition of the serial port controller, according to whether the serial port data output by the serial port controller is received within the first preset duration, generates a fifth test result, including:
monitoring the continuous output condition of the serial port data of the serial port controller, waiting for a second preset duration if the serial port data output by the serial port controller is not received within the first preset duration, and if the serial port data is not received within the second preset duration either, determining that the fifth test result is an error, and terminating the test at this moment.
7. The method of claim 1, wherein obtaining at least one of a start-up duration, a network speed, a USB detection result, a deep recovery mode detection result, and a serial port detection result of the SoC array server, and generating a sixth test result according to whether the start-up duration, the network speed, the USB detection result, the deep recovery mode detection result, and the serial port detection result all meet a preset standard, comprises:
restarting the SoC array server to obtain starting time; acquiring at least one of a network speed, a USB detection result, a deep recovery mode detection result and a serial port detection result, and if the starting time length, the network speed, the USB detection result, the deep recovery mode detection result and the serial port detection result all meet preset standards, determining that the sixth test result is normal;
and if at least one of the starting duration, the network speed, the USB detection result, the deep recovery mode detection result and the serial port detection result does not meet the preset standard, repeating, n times, the step of restarting the SoC array server and the subsequent steps, and if at least one detection result still does not meet the preset standard after the n repetitions, the sixth test result is an error, and the test is terminated at this moment.
8. The method of claim 1, wherein the performing the burn-in test for the predetermined burn-in duration for each component of the SoC board, and generating the seventh test result according to the operation condition of the system during the burn-in test, includes:
performing an aging test of the preset aging duration on each component of the SoC board card, and if the system restarts and/or goes down during the aging test, the seventh test result is an error, and the test is terminated at this moment.
9. The method of claim 1, wherein the restarting the system a plurality of times, obtaining a restart duration of each time the system, and generating an eighth test result according to the restart duration a plurality of times, comprises:
simulating a user use scene, restarting the system for multiple times, and acquiring the restarting time length of the system each time, wherein if the restarting time length of the system each time accords with a preset restarting time length standard, the eighth test result is normal;
if the restarting time length of the system at least once does not meet the preset restarting time length standard, the eighth test result is an error, and the test is terminated at the moment.
10. The SoC array server is characterized by comprising a BMC main board, a back board, a blade board and a switching board which are connected with each other, wherein the back board comprises a back board controller and a fan controller, and the fan controller is connected with a plurality of fans; the blade board comprises a blade board controller, an SoC board card and a serial port controller;
the BMC mainboard is used for managing and monitoring the running state of the whole SoC array server; the BMC is also used for monitoring the health state of the SoC array server in real time and performing remote management and maintenance;
the backboard controller is connected with the BMC mainboard, and is used for connecting and coordinating all hardware components on the backboard, ensuring that all hardware components can normally operate and communicate, and managing and controlling all sub-modules on the whole backboard;
the blade board controller is connected with the back board controller, and is used for communicating with each component in the blade board and managing and monitoring the running state of each component in the blade board;
the serial port controller is connected with the BMC mainboard, the blade board controller, the SoC board card and the switch board, and is used for configuring and controlling serial port equipment in the SoC array server, supporting serial port communication and providing a state monitoring function of the SoC array server;
the exchange board is used for establishing a high-speed and stable data channel among all components in the SoC array server; the exchange board is also used for supporting rapid exchange of data among all components in the SoC array server and supporting flexible network configuration so as to meet the requirements of different service scenes;
the fan controller is used for controlling the fan rotating speed and monitoring the fan state;
the SoC board card comprises an SoC chip and is used for executing various calculation tasks and data processing operations on the SoC array server;
wherein, the BMC motherboard is configured to perform the automated test method of the SoC array server according to any of claims 1 to 9.
CN202311214753.8A 2023-09-19 2023-09-19 Automatic testing method of SoC array server and SoC array server Pending CN117234820A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311214753.8A CN117234820A (en) 2023-09-19 2023-09-19 Automatic testing method of SoC array server and SoC array server

Publications (1)

Publication Number Publication Date
CN117234820A true CN117234820A (en) 2023-12-15

Family

ID=89090710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311214753.8A Pending CN117234820A (en) 2023-09-19 2023-09-19 Automatic testing method of SoC array server and SoC array server

Country Status (1)

Country Link
CN (1) CN117234820A (en)

Similar Documents

Publication Publication Date Title
CN111104139A (en) Firmware upgrading method, device, equipment and storage medium
CN111078484B (en) Power-off test method, device, equipment and storage medium for system upgrade
CN112328467B (en) Embedded system program testing method and device
CN114138587B (en) Method, device and equipment for verifying reliability of server power firmware upgrade
CN116680101A (en) Method and device for detecting downtime of operating system, and method and device for eliminating downtime of operating system
CN112073263A (en) Method, system, equipment and medium for testing and monitoring reliability of white box switch
CN101800672B (en) Equipment detection method and equipment
CN111124828B (en) Data processing method, device, equipment and storage medium
CN116909800A (en) Method and device for locating crash information and storage medium
CN116974941A (en) Testing method for management interface function of intelligent platform of baseboard management controller
CN117234820A (en) Automatic testing method of SoC array server and SoC array server
CN116489046A (en) Reliability test method, device, equipment, medium and system of shunt equipment
CN115437865A (en) Method, device, equipment and medium for testing abnormal power failure of hard disk
CN115242697A (en) Test system, method, equipment and medium for switch
CN112034296B (en) Avionics fault injection system and method
CN113778732A (en) Fault positioning method and device for service board card
CN113608939A (en) Server starting timing method, device, terminal and storage medium in performance test
CN112463504A (en) Double-control storage product testing method, system, terminal and storage medium
CN112162887A (en) Storage device and machine frame shared component access method, device and storage medium thereof
CN111459734A (en) Method and system for testing fault monitoring period and computer storage medium
CN112003727A (en) Multi-node server power supply testing method, system, terminal and storage medium
CN114281615B (en) Automatic testing system and method for consistency of stored data
CN115941524B (en) Network communication reliability test method and test system based on train display equipment
CN116915583B (en) Communication abnormality diagnosis method, device and electronic equipment
CN118260131B (en) Method and device for testing operation performance of expansion module

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination