TWI739603B - Monitoring and problem analysis system during server test and method thereof - Google Patents

Monitoring and problem analysis system during server test and method thereof

Info

Publication number
TWI739603B
Authority
TW
Taiwan
Prior art keywords
server
test
tested
monitoring
module
Prior art date
Application number
TW109132213A
Other languages
Chinese (zh)
Other versions
TW202212857A (en)
Inventor
宋寶棟
陳樹青
Original Assignee
英業達股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 英業達股份有限公司
Priority to TW109132213A
Application granted
Publication of TWI739603B
Publication of TW202212857A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Debugging And Monitoring (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A monitoring and problem analysis system for server testing and a method thereof are provided. When the test program of a server under test hangs during testing, the server under test triggers a timeout response and generates a dump log. When an event monitoring server detects that the server under test has triggered the timeout response, it connects to the baseboard management controller (BMC) of the server under test according to the BMC IP address and serial number, logs in to the BMC with the username and password, and receives the dump log from the server under test. The event monitoring server then analyzes the dump log to generate an analysis result that identifies the cause of the test hang. The cause of a hang during server testing can therefore be obtained and analyzed efficiently.

Description

Monitoring and problem analysis system and method during server test

A monitoring and problem analysis system and method thereof, and in particular a monitoring and problem analysis system and method for use during server testing.

With the rapid development of the server industry, market demand for servers grows daily and the quality requirements for server testing services are increasingly strict. To pinpoint abnormal conditions of a server during testing, existing approaches test the server directly with various test tools (including power-cycling test programs, stress test programs, and so on). When the server hangs during a test, the server itself becomes blocked and cannot actively perform any operation, so the cause of the hang cannot be obtained normally and accurately.

In summary, the prior art has long had the problem that, when a server hangs during testing, the cause of the hang cannot be obtained normally and accurately. An improved technical means is therefore needed to solve this problem.

In view of this problem in the prior art, the present invention discloses a monitoring and problem analysis system and method for server testing, in which:

The monitoring and problem analysis system for server testing disclosed in the present invention comprises a server under test, an event monitoring server, and an administrator device. The server under test further comprises a data collection module, a test module, a generation module, and a transmission module. The event monitoring server further comprises a receiving module, a monitoring module, a log receiving module, a log analysis module, and an information transmission module.

The data collection module of the server under test collects the baseboard management controller Internet Protocol address (BMC IP address), username, password, and serial number of the server under test. The test module of the server under test tests the server under test according to a test program and triggers a timeout response when the test program hangs during testing. The generation module of the server under test generates a dump log when the test program hangs during testing. The transmission module of the server under test transmits the BMC IP address, username, password, and serial number.

The receiving module of the event monitoring server receives the BMC IP address, username, password, and serial number from the transmission module. The monitoring module of the event monitoring server monitors the test process of the test module of the server under test. When the monitoring module detects that the test module has triggered the timeout response, the log receiving module of the event monitoring server connects to the BMC of the server under test according to the BMC IP address and serial number, logs in to the BMC with the username and password, and receives the dump log from the server under test. The log analysis module of the event monitoring server analyzes the data in the dump log to determine why the server under test hung while running the test program, and generates an analysis result. The information transmission module of the event monitoring server transmits the analysis result.

The administrator device receives the analysis result from the information transmission module and displays it.

The monitoring and problem analysis method for server testing disclosed in the present invention comprises the following steps:

First, the server under test collects its BMC IP address, username, password, and serial number. Next, the server under test transmits the BMC IP address, username, password, and serial number to the event monitoring server. Next, the server under test tests itself according to the test program. Next, the event monitoring server monitors the test process of the server under test. Next, when the test program hangs during testing, the server under test triggers a timeout response. Next, when the test program hangs during testing, the server under test generates a dump log. Next, when the event monitoring server detects that the server under test has triggered the timeout response, it connects to the BMC of the server under test according to the BMC IP address and serial number, logs in to the BMC with the username and password, and receives the dump log from the server under test. Next, the event monitoring server analyzes the data in the dump log to determine why the server under test hung while running the test program, and generates an analysis result. Finally, the event monitoring server transmits the analysis result to the administrator device, which receives and displays it.

The system and method disclosed in the present invention are as described above. They differ from the prior art in the following way: when the test program on the server under test hangs during testing, the server under test triggers a timeout response and generates a dump log. When the event monitoring server detects the timeout response, it connects to the BMC of the server under test according to the BMC IP address and serial number, logs in to the BMC with the username and password, and receives the dump log from the server under test. The event monitoring server analyzes the data in the dump log to determine why the server under test hung while running the test program and generates an analysis result, then transmits the analysis result to the administrator device, which receives and displays it.

Through the above technical means, the present invention achieves the technical effect of accurately obtaining and analyzing the cause of a hang that occurs during server testing.

10: server under test
11: data collection module
12: test module
13: generation module
14: transmission module
20: event monitoring server
21: receiving module
22: monitoring module
23: log receiving module
24: log analysis module
25: information transmission module
30: administrator device
41: display content
42: dump log

Step 101: The server under test collects its BMC IP address, username, password, and serial number.
Step 102: The server under test transmits the BMC IP address, username, password, and serial number to the event monitoring server.
Step 103: The server under test tests itself according to the test program.
Step 104: The event monitoring server monitors the test process of the server under test.
Step 105: When the test program hangs during testing, the server under test triggers a timeout response.
Step 106: When the test program hangs during testing, the server under test generates a dump log.
Step 107: When the event monitoring server detects that the server under test has triggered the timeout response, it connects to the BMC of the server under test according to the BMC IP address and serial number, logs in to the BMC with the username and password, and receives the dump log from the server under test.
Step 108: The event monitoring server analyzes the data in the dump log to determine why the server under test hung while running the test program, and generates an analysis result.
Step 109: The event monitoring server transmits the analysis result to the administrator device, which receives and displays it.

Figure 1 is a system block diagram of the monitoring and problem analysis system for server testing of the present invention.

Figures 2A to 2C are schematic diagrams of the display content in the monitoring and problem analysis for server testing of the present invention.

Figure 3 is a schematic diagram of the dump log in the monitoring and problem analysis for server testing of the present invention.

Figures 4A and 4B are flowcharts of the monitoring and problem analysis method for server testing of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings and examples, so that the way the invention applies technical means to solve the technical problem and achieve its technical effect can be fully understood and implemented.

The monitoring and problem analysis system for server testing disclosed in the present invention is described first. Refer to Figure 1, which is a system block diagram of the system.

The system comprises a server under test 10, an event monitoring server 20, and an administrator device 30. The server under test 10 further comprises a data collection module 11, a test module 12, a generation module 13, and a transmission module 14. The event monitoring server 20 further comprises a receiving module 21, a monitoring module 22, a log receiving module 23, a log analysis module 24, and an information transmission module 25.

A baseboard management controller (BMC) is installed on the motherboard of the server under test 10, and the BMC must support the Intelligent Platform Management Interface (IPMI). The data collection module 11 of the server under test 10 can use the commands provided by IPMI to collect the BMC Internet Protocol address (BMC IP address), username, password, and serial number of the server under test 10.

Specifically, the data collection module 11 of the server under test 10 can collect the BMC IP address of the server under test 10, for example 101.124.79, with the IPMI command "ipmitool lan print". After the command is entered in the data collection module 11, it returns display content 41 containing information such as Set in Progress, Auth Type Support, Auth Type Enable, IP Address, and Subnet Mask. Display content 41 is illustrated in Figure 2A, a schematic diagram of the display content in the monitoring and problem analysis for server testing of the present invention, in which IP Address is the BMC IP address.
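
One way the data collection module might extract the address from that output is sketched below. This is only an illustration: the sample text, its values, and the function names are assumptions, not taken from the patent or from Figure 2A. It relies on each line of the output having the form `Key : Value`.

```python
# Hypothetical sample of `ipmitool lan print` output; all values are illustrative.
SAMPLE_LAN_PRINT = """\
Set in Progress         : Set Complete
Auth Type Support       : MD5 PASSWORD
IP Address Source       : Static Address
IP Address              : 192.0.2.10
Subnet Mask             : 255.255.255.0
"""


def parse_key_values(text):
    """Return a dict built from 'Key : Value' lines, ignoring other lines."""
    fields = {}
    for line in text.splitlines():
        # Split at the first colon that has a space on both sides.
        key, sep, value = line.partition(" : ")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def bmc_ip_address(text):
    """Return the value of the 'IP Address' field, or None if it is absent."""
    return parse_key_values(text).get("IP Address")


if __name__ == "__main__":
    print(bmc_ip_address(SAMPLE_LAN_PRINT))
```

Matching the full key rather than a prefix keeps `IP Address` distinct from `IP Address Source`. On a real machine the text would come from running the command, for example with `subprocess.run(["ipmitool", "lan", "print"], capture_output=True, text=True)`.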

Specifically, the data collection module 11 of the server under test 10 can collect the username of the server under test 10, for example ADMIN, and the password, for example ADMIN, with the IPMI command "ipmitool user list". After the command is entered in the data collection module 11, it returns display content 41 containing information such as ID Name, Callin, Link Auth, IPMI Msg, and Channel Priv Limit. Display content 41 is illustrated in Figure 2B, a schematic diagram of the display content in the monitoring and problem analysis for server testing of the present invention, in which ID Name is the username. Note that the command "ipmitool user list" returns only the username; the password is the same as the username.
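
A minimal sketch of reading usernames from that table follows. It assumes the header is an ID column followed by a Name column, that names contain no spaces, and that the flag columns hold `true` or `false`. The sample rows and the function name are hypothetical.

```python
# Hypothetical sample of `ipmitool user list` output; rows are illustrative.
SAMPLE_USER_LIST = """\
ID  Name             Callin  Link Auth  IPMI Msg   Channel Priv Limit
1                    true    false      false      NO ACCESS
2   ADMIN            true    true       true       ADMINISTRATOR
"""


def parse_user_names(text):
    """Return the non-empty user names found in the table, in row order."""
    names = []
    for line in text.splitlines()[1:]:  # skip the header row
        tokens = line.split()
        if len(tokens) < 2 or not tokens[0].isdigit():
            continue
        # If the second token is already a flag, the Name column is empty.
        if tokens[1] in ("true", "false"):
            continue
        names.append(tokens[1])
    return names


if __name__ == "__main__":
    print(parse_user_names(SAMPLE_USER_LIST))
```

Under the convention stated in the text, the password would then be taken to equal the selected username.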

Specifically, the data collection module 11 of the server under test 10 can collect the serial number of the server under test 10, for example CK6Y307N001, with the IPMI command "ipmitool fru". After the command is entered in the data collection module 11, it returns display content 41 containing information such as FRU Device Description, Chassis Type, Chassis Part Number, Chassis Serial, Chassis Extra, Board Mfg Date, Product Serial, and Product Asset Tag. Display content 41 is illustrated in Figure 2C, a schematic diagram of the display content in the monitoring and problem analysis for server testing of the present invention, in which Product Serial is the serial number.
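
The FRU output can list more than one device, so a sketch of the serial-number lookup might group fields by device, as below. The sample text and the function name are assumptions for illustration; only the field names come from the description above.

```python
# Hypothetical sample of `ipmitool fru` output with two FRU devices.
SAMPLE_FRU = """\
FRU Device Description : Builtin FRU Device (ID 0)
 Chassis Type          : Rack Mount Chassis
 Chassis Serial        : CH0001
 Board Mfg Date        : Mon Jan  6 08:00:00 2020
 Product Serial        : CK6Y307N001
 Product Asset Tag     : N/A

FRU Device Description : PSU1 (ID 1)
 Board Mfg Date        : Tue Feb  4 09:30:00 2020
"""


def product_serials(text):
    """Map each FRU device description to its Product Serial, if it has one."""
    serials = {}
    device = None
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if not sep:
            continue  # blank or malformed line
        key, value = key.strip(), value.strip()
        if key == "FRU Device Description":
            device = value  # start of a new device section
        elif key == "Product Serial" and device is not None:
            serials[device] = value
    return serials


if __name__ == "__main__":
    print(product_serials(SAMPLE_FRU))
```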

The server under test 10 and the event monitoring server 20 are connected by wired or wireless transmission. Wired transmission includes cable networks, optical fiber networks, and the like; wireless transmission includes mobile communication networks (for example 3G, 4G, and 5G). These are examples only and do not limit the scope of application of the present invention.

After the data collection module 11 of the server under test 10 has collected the BMC IP address, username, password, and serial number of the server under test 10, the transmission module 14 of the server under test 10 transmits them to the event monitoring server 20, where the receiving module 21 receives them from the transmission module 14.
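
The patent does not specify a wire format for this transfer. As one possible sketch, the four values could be bundled into a record and serialized as JSON; the `BmcRecord` type, its field names, and the two helper functions are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BmcRecord:
    """The four values the transmission module sends (hypothetical type)."""
    bmc_ip: str
    username: str
    password: str
    serial_number: str


def encode_record(record):
    """Serialize a record to UTF-8 JSON bytes for transmission."""
    return json.dumps(asdict(record)).encode("utf-8")


def decode_record(payload):
    """Rebuild a record from bytes produced by encode_record."""
    return BmcRecord(**json.loads(payload.decode("utf-8")))


if __name__ == "__main__":
    sent = BmcRecord("192.0.2.10", "ADMIN", "ADMIN", "CK6Y307N001")
    received = decode_record(encode_record(sent))
    print(received == sent)
```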

The test module 12 of the server under test 10 tests the server under test 10 according to a test program. The test program may be stored on the server under test 10 in advance, supplied by an external device connected directly to the server under test 10, or transferred to the server under test 10 by an external device over a network.

While the test module 12 of the server under test 10 runs the test according to the test program, the monitoring module 22 of the event monitoring server 20 monitors the test process of the test module 12. When the test program in the test module 12 hangs during testing, the test module 12 triggers a timeout response and the generation module 13 of the server under test 10 generates a dump log 42. The dump log 42 is illustrated in Figure 3, a schematic diagram of the dump log in the monitoring and problem analysis for server testing of the present invention.
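
The timeout-and-dump behaviour can be pictured with a small supervisor process, sketched below. This is an assumption about one way to realise it, not the patent's implementation: the function name, the dump log fields, and the use of a child process are all illustrative.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def run_test_with_timeout(command, timeout_s, dump_path):
    """Run a test command; on a hang, write a dump log and return False."""
    try:
        # subprocess.run kills the child and raises if the timeout expires.
        subprocess.run(command, timeout=timeout_s, check=False)
        return True
    except subprocess.TimeoutExpired as exc:
        dump_path.write_text(
            f"time={time.strftime('%Y-%m-%dT%H:%M:%S')}\n"
            f"event=timeout\n"
            f"command={' '.join(command)}\n"
            f"timeout_s={exc.timeout}\n"
        )
        return False


if __name__ == "__main__":
    dump = Path(tempfile.mkdtemp()) / "dump.log"
    # A stand-in "test program" that never finishes within the limit.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_test_with_timeout(hung, 0.5, dump))
    print(dump.read_text())
```

A real dump log would carry diagnostic data from the hung test; the four fields written here are placeholders.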

When the test program in the test module 12 of the server under test 10 hangs during testing and the timeout response is triggered, the monitoring module 22 of the event monitoring server 20 detects it. The log receiving module 23 of the event monitoring server 20 then connects to the BMC of the server under test 10 according to the BMC IP address and serial number, logs in to the BMC with the username and password, and receives the dump log from the server under test 10.
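
The patent does not say which BMC interface carries the dump log, so the sketch below leaves the transfer as an injected function and shows only the bookkeeping: records are stored by serial number, and a timeout event triggers a fetch with the stored address and credentials. The class, its methods, and `fake_fetch` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BmcRecord:
    """Connection details previously received from the server under test."""
    bmc_ip: str
    username: str
    password: str
    serial_number: str


class LogReceiver:
    """Fetches a dump log when a timeout is reported (illustrative only)."""

    def __init__(self, fetch_dump):
        # fetch_dump(record) -> str stands in for the unspecified BMC login
        # and transfer; it is an assumption, not a real library call.
        self._fetch_dump = fetch_dump
        self._records = {}
        self.dump_logs = {}

    def register(self, record):
        """Store a record so it can be looked up by serial number."""
        self._records[record.serial_number] = record

    def on_timeout(self, serial_number):
        """Fetch and keep the dump log of the server that timed out."""
        record = self._records[serial_number]
        dump = self._fetch_dump(record)
        self.dump_logs[serial_number] = dump
        return dump


if __name__ == "__main__":
    def fake_fetch(record):
        return f"dump from {record.bmc_ip} as {record.username}"

    receiver = LogReceiver(fake_fetch)
    receiver.register(BmcRecord("192.0.2.10", "ADMIN", "ADMIN", "CK6Y307N001"))
    print(receiver.on_timeout("CK6Y307N001"))
```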

After the log receiving module 23 of the event monitoring server 20 receives the dump log from the server under test 10, the log analysis module 24 of the event monitoring server 20 analyzes the data in the dump log to determine why the server under test 10 hung while running the test program, and generates an analysis result.
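
The analysis technique itself is not described in the patent. A simple rule-based approach is sketched here purely as an assumption: each rule pairs a text pattern with a candidate cause, and the first matching line is kept as evidence. The patterns, cause labels, and function name are all illustrative.

```python
import re

# Hypothetical rules: (pattern to look for, cause to report).
RULES = [
    (re.compile(r"out of memory|oom-killer", re.I), "memory exhaustion"),
    (re.compile(r"I/O error|blk_update_request", re.I), "storage I/O fault"),
    (re.compile(r"machine check|mce:", re.I), "hardware machine-check error"),
]


def analyze_dump_log(text):
    """Return a list of {'cause', 'evidence'} dicts, one per matching rule."""
    findings = []
    for pattern, cause in RULES:
        for line in text.splitlines():
            if pattern.search(line):
                findings.append({"cause": cause, "evidence": line.strip()})
                break  # one piece of evidence per rule is enough
    return findings or [{"cause": "unknown", "evidence": ""}]


if __name__ == "__main__":
    sample = (
        "kernel: Out of memory: Killed process 1234 (stress)\n"
        "kernel: blk_update_request: I/O error, dev sda\n"
    )
    for finding in analyze_dump_log(sample):
        print(finding["cause"], "<-", finding["evidence"])
```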

After the log analysis module 24 of the event monitoring server 20 has generated the analysis result, the information transmission module 25 of the event monitoring server 20 transmits the analysis result to the administrator device 30.

The administrator device 30 and the event monitoring server 20 are connected by wired or wireless transmission. Wired transmission includes cable networks, optical fiber networks, and the like; wireless transmission includes mobile communication networks (for example 3G, 4G, and 5G). These are examples only and do not limit the scope of application of the present invention.

After the administrator device 30 receives the analysis result from the information transmission module 25 of the event monitoring server 20, it displays the analysis result. Administrators can thus learn why the server under test 10 hung while running the test program and handle the server accordingly, for example by replacing components or reinstalling the operating system, which improves the factory yield of servers under test.

In addition to the analysis result, the information transmission module 25 of the event monitoring server 20 can also transmit the dump log to the administrator device 30 at the same time. After the administrator device 30 receives the dump log from the information transmission module 25, it displays the dump log. Administrators can then analyze the cause of the hang further from the dump log and verify the accuracy of the analysis result, so that component replacement, operating system reinstallation, and similar handling of the server under test 10 are more precise, which improves the factory yield of servers under test.

Next, the operating method of the present invention is described. Refer to Figures 4A and 4B, which are flowcharts of the monitoring and problem analysis method for server testing of the present invention.

First, the server under test collects its BMC IP address, username, password, and serial number (step 101). Next, the server under test transmits the BMC IP address, username, password, and serial number to the event monitoring server (step 102). Next, the server under test tests itself according to the test program (step 103). Next, the event monitoring server monitors the test process of the server under test (step 104). Next, when the test program hangs during testing, the server under test triggers a timeout response (step 105). Next, when the test program hangs during testing, the server under test generates a dump log (step 106). Next, when the event monitoring server detects that the server under test has triggered the timeout response, it connects to the BMC of the server under test according to the BMC IP address and serial number, logs in to the BMC with the username and password, and receives the dump log from the server under test (step 107). Next, the event monitoring server analyzes the data in the dump log to determine why the server under test hung while running the test program, and generates an analysis result (step 108). Finally, the event monitoring server transmits the analysis result to the administrator device, which receives and displays it (step 109).

綜上所述,可知本發明與先前技術之間的差異在於當待測試伺服器中測試程序在測試過程中停擺時,待測試伺服器觸發逾時響應與生成傾印日誌,事件監測伺服器監測到待測試伺服器觸發逾時響應時,依據基板管理控制器網際網路協定位址以及序號連線至待測試伺服器的基板管理控制器並藉由使用者名稱以及密碼登入待測試伺服器的基板管理控制器以自待測試伺服器接收傾印日誌,事件監測伺服器對傾印日誌中的資料進行分析以分析出待測試伺服器依據測試程序進行測試停擺的發生成因以生成分析結果,事件監測伺服器傳送分析結果至管理員裝置,管理員裝置接收分析結果並加以顯示。 In summary, it can be seen that the difference between the present invention and the prior art is that when the test program in the server under test stops during the test, the server under test triggers a timeout response and generates a dump log, and the event monitoring server monitors When the server to be tested triggers a timeout response, connect to the BMC of the server to be tested according to the Internet Protocol address and serial number of the BMC, and log in to the BMC of the server to be tested with the user name and password. The baseboard management controller receives the dump log from the server to be tested, and the event monitoring server analyzes the data in the dump log to analyze the cause of the test shutdown of the server to be tested according to the test procedure to generate analysis results, events The monitoring server sends the analysis result to the administrator device, and the administrator device receives the analysis result and displays it.

藉由此一技術手段可以來解決先前技術所存在現有對於伺服器進行測試發生停擺無法正常與準確獲得測試停擺成因的問題,進而達成在伺服器測試過程中產生停擺能準確獲得與分析出測試停擺成因的技術功效。 This technical means can solve the problem of the prior art that the test stoppage cannot be obtained normally and accurately when the test stoppage occurs during the test of the server in the prior art, and then the test stoppage can be accurately obtained and analyzed when the stoppage occurs during the server test. The technical effect of the cause.

Although the embodiments of the present invention are disclosed above, the content described is not intended to directly limit the scope of patent protection of the present invention. Anyone with ordinary knowledge in the technical field to which the present invention belongs may make minor changes in the form and details of implementation without departing from the spirit and scope disclosed by the present invention. The scope of patent protection of the present invention shall still be defined by the appended claims.

10: server to be tested

11: data collection module

12: test module

13: generation module

14: transmission module

20: event monitoring server

21: receiving module

22: monitoring module

23: log receiving module

24: log analysis module

25: information transmission module

30: administrator device
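The data collection module (11), transmission module (14), and receiving module (21) listed above exchange four fields: BMC IP address, user name, password, and serial number. A minimal sketch of such a record follows, assuming a JSON payload; the patent does not specify a wire format, and the class and function names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BmcRegistration:
    """The four fields collected in step 101 (field names are assumptions)."""
    bmc_ip: str
    username: str
    password: str
    serial_number: str


def encode_registration(reg: BmcRegistration) -> str:
    """Transmission module side: serialize the record (JSON is an assumption)."""
    return json.dumps(asdict(reg), sort_keys=True)


def decode_registration(payload: str) -> BmcRegistration:
    """Receiving module side: rebuild the record from the payload."""
    return BmcRegistration(**json.loads(payload))
```

Because the record carries a password, an actual deployment would presumably send it over an encrypted channel; the source does not discuss this.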

Claims (8)

1. A monitoring and problem analysis system for server testing, comprising:
a server to be tested, the server to be tested further comprising:
a data collection module for collecting the baseboard management controller Internet Protocol address (BMC IP address), user name, password, and serial number of the server to be tested;
a test module for testing the server to be tested according to a test program, and for triggering a timeout response when the test program hangs during the test;
a generation module for generating a dump log when the test program hangs during the test; and
a transmission module for transmitting the BMC IP address, user name, password, and serial number;
an event monitoring server, the event monitoring server further comprising:
a receiving module for receiving the BMC IP address, user name, password, and serial number from the transmission module;
a monitoring module for monitoring the test process of the test module of the server to be tested;
a log receiving module which, when the monitoring module detects that the test module has triggered the timeout response, connects to the baseboard management controller of the server to be tested according to the BMC IP address and serial number, and logs in to the baseboard management controller of the server to be tested with the user name and password to receive the dump log from the server to be tested;
a log analysis module for analyzing the data in the dump log to determine the cause of the test hang that occurred while the server to be tested was tested according to the test program, and to generate an analysis result; and
an information transmission module for transmitting the analysis result; and
an administrator device, the administrator device receiving the analysis result from the information transmission module and displaying it.

2. The monitoring and problem analysis system for server testing according to claim 1, wherein the test program is stored in advance on the server to be tested.

3. The monitoring and problem analysis system for server testing according to claim 1, wherein the test program is provided by an external device through a direct connection to the server to be tested or through network transmission.

4. The monitoring and problem analysis system for server testing according to claim 1, wherein the information transmission module further transmits the dump log to the administrator device, and the administrator device receives the dump log from the information transmission module and displays it.

5. A monitoring and problem analysis method for server testing, comprising:
collecting, by a server to be tested, the baseboard management controller Internet Protocol address (BMC IP address), user name, password, and serial number of the server to be tested;
transmitting, by the server to be tested, the BMC IP address, user name, password, and serial number to an event monitoring server;
testing, by the server to be tested, the server to be tested according to a test program;
monitoring, by the event monitoring server, the test process of the server to be tested;
triggering, by the server to be tested, a timeout response when the test program hangs during the test;
generating, by the server to be tested, a dump log when the test program hangs during the test;
when the event monitoring server detects that the server to be tested has triggered the timeout response, connecting to the baseboard management controller of the server to be tested according to the BMC IP address and serial number, and logging in to the baseboard management controller of the server to be tested with the user name and password to receive the dump log from the server to be tested;
analyzing, by the event monitoring server, the data in the dump log to determine the cause of the test hang that occurred while the server to be tested was tested according to the test program, and generating an analysis result; and
transmitting, by the event monitoring server, the analysis result to an administrator device, the administrator device receiving the analysis result and displaying it.

6. The monitoring and problem analysis method for server testing according to claim 5, wherein, in the step of the server to be tested testing the server to be tested according to the test program, the test program is stored in advance on the server to be tested.

7. The monitoring and problem analysis method for server testing according to claim 5, wherein, in the step of the server to be tested testing the server to be tested according to the test program, the test program is provided by an external device through a direct connection to the server to be tested or through network transmission.

8. The monitoring and problem analysis method for server testing according to claim 5, further comprising a step in which the event monitoring server transmits the dump log to the administrator device, and the administrator device receives the dump log from the information transmission module and displays it.
TW109132213A 2020-09-18 2020-09-18 Monitoring and problem analysis system during server test and method thereof TWI739603B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW109132213A TWI739603B (en) 2020-09-18 2020-09-18 Monitoring and problem analysis system during server test and method thereof

Publications (2)

Publication Number Publication Date
TWI739603B true TWI739603B (en) 2021-09-11
TW202212857A TW202212857A (en) 2022-04-01

Family

ID=78778021

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109132213A TWI739603B (en) 2020-09-18 2020-09-18 Monitoring and problem analysis system during server test and method thereof

Country Status (1)

Country Link
TW (1) TWI739603B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200707203A (en) * 2005-08-03 2007-02-16 Aten Int Co Ltd System and method of managing a plurality of peripheral interfaces in IMPI architecture
CN102457394A (en) * 2010-10-27 2012-05-16 宏碁股份有限公司 Management method for server side device and management side device
TW201516665A (en) * 2013-08-30 2015-05-01 Hon Hai Prec Ind Co Ltd System and method for detecting system error of server
TW201704929A (en) * 2015-07-30 2017-02-01 神雲科技股份有限公司 Server and method for detecting power reset
US20170220419A1 (en) * 2016-02-03 2017-08-03 Mitac Computing Technology Corporation Method of detecting power reset of a server, a baseboard management controller, and a server
CN108737139A (en) * 2017-04-19 2018-11-02 北京京东尚科信息技术有限公司 For the data processing method of server, device and server B MC systems
TW201907301A (en) * 2017-07-05 2019-02-16 英業達股份有限公司 Transmission method of server event alert

Also Published As

Publication number Publication date
TW202212857A (en) 2022-04-01

Similar Documents

Publication Publication Date Title
US8443074B2 (en) Constructing an inference graph for a network
US6625648B1 (en) Methods, systems and computer program products for network performance testing through active endpoint pair based testing and passive application monitoring
US7289988B2 (en) Method and system for managing events
US8090995B2 (en) System monitoring
US7525422B2 (en) Method and system for providing alarm reporting in a managed network services environment
CN104104543B (en) Server managing system and method based on SNMP and IPMI protocol
US20100020715A1 (en) Proactive Network Analysis System
CN105323113B (en) A kind of system failure emergence treating method based on visualization technique
CN111600781A (en) Firewall system stability testing method based on tester
US20020112064A1 (en) Customer support network
US11659449B2 (en) Machine learning-based network analytics, troubleshoot, and self-healing holistic telemetry system incorporating modem-embedded machine analysis of multi-protocol stacks
US20060245365A1 (en) Apparatus and method for correlation and display of signaling and network events
CN112468335A (en) IPRAN cloud private line fault positioning method and device
WO2012139461A1 (en) Data acquisition method, apparatus and system
JPH06236337A (en) Method for controlling computer system
JP2013130901A (en) Monitoring server and network device recovery system using the same
CN106330554B (en) Operation and maintenance auditing system and method for monitoring and managing operation and maintenance operation process
TWI739603B (en) Monitoring and problem analysis system during server test and method thereof
KR100551452B1 (en) Grid computing system for testing application program capacity of server
CN116155687A (en) Remote operation and maintenance system
CN114185730A (en) System and method for monitoring and analyzing problems during server test
KR20120132910A (en) Network management system and method using smart nodes
CN109634848B (en) Large-scale testing environment management method and system for bank
JP2014036310A (en) Apparatus and method for evaluating effect
CN115714719B (en) Operation and maintenance processing method and device of server, electronic equipment and storage medium