TW201115332A - Server monitoring method - Google Patents

Server monitoring method

Info

Publication number
TW201115332A
Authority
TW
Taiwan
Prior art keywords
server
monitoring
notification message
host
error notification
Prior art date
Application number
TW98135806A
Other languages
Chinese (zh)
Other versions
TWI414939B (en)
Inventor
Ta-Hua Lin
Chung-Nan Chen
Original Assignee
Inventec Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Corp
Priority to TW98135806A
Publication of TW201115332A
Application granted
Publication of TWI414939B

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

A server monitoring method is provided, comprising the following steps. A monitoring host sends a test signal to at least one server. Whether a platform event filter (PEF) function of the server is normal is determined. When the PEF function is abnormal, a test procedure is initiated. A system event log (SEL) of a baseboard management controller (BMC) of the server is updated. The monitoring host retrieves the SEL from the BMC. Whether the SEL is abnormal is determined. When the SEL is abnormal, an error notification message is generated and sent to a remote host. The remote host performs an analysis according to the error notification message.

Description

VI. Description of the Invention:

[Technical Field of the Invention]

The present disclosure relates to a monitoring method, and in particular to a server monitoring method.

[Prior Art]

In a server, the baseboard management controller (BMC) is a device disposed on the motherboard to manage and control the system. During server development, both the server and its BMC must undergo stress testing, which checks whether a device keeps working normally over a long period of operation. However, because such tests often run for more than ten hours, current test methods require manual effort for checking and debugging. For example, if an error occurs in the tenth hour of the test flow, the preceding error-free hours are wasted.
Conversely, if the tester checks only every two hours and an error occurs in the first hour, immediate debugging and analysis are impossible.

On the other hand, some BMCs provide a platform event filter (PEF) function. The PEF is an event-handling function that can send an alert notification over the network when the system generates an event. However, if the BMC in the server under test does not provide this function, or the function is damaged, or the network function has failed, the BMC cannot generate any alert notification when an event occurs on the server.

Therefore, how to design a new server monitoring method that manages the server's test flow in real time, and that can still issue alert notifications when various abnormal conditions occur on the server, is a problem the industry urgently needs to solve.

[Summary of the Invention]

Accordingly, one aspect of the present disclosure provides a server monitoring method, used in a server monitoring system to monitor servers, comprising the following steps: a monitoring host transmits a test signal to the server; whether the PEF function of the server is normal is determined from the server's response; when the PEF function is abnormal, a test flow of the server is started; the system event log of the BMC of each server is updated according to the server's condition; the monitoring host retrieves the system event log of each BMC; whether an abnormal record appears in the system event log is determined, and when one appears, an error notification message is generated and sent to a remote host; and the remote host performs further error analysis according to the error notification message.
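The steps of this first aspect can be sketched as a single polling pass. This is a minimal simulation under stated assumptions, not the patented implementation: `SimulatedServer`, `monitor_once`, and the callback parameters are hypothetical names introduced here, and the event log is a plain list of strings rather than real BMC data.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SimulatedServer:
    """Hypothetical stand-in for a server and its BMC (not a real BMC API)."""
    name: str
    pef_ok: bool                                   # does the PEF function answer the test signal?
    event_log: List[str] = field(default_factory=list)
    test_running: bool = False

    def respond_to_test_signal(self) -> bool:
        return self.pef_ok


def monitor_once(server: SimulatedServer,
                 is_abnormal: Callable[[str], bool],
                 notify_remote: Callable[[str], None]) -> bool:
    """One pass by the monitoring host; returns True if an alert was sent."""
    if server.respond_to_test_signal():
        return False                 # PEF works, so this branch does not apply
    server.test_running = True       # start the test flow
    # In this simulation the log is already filled; a real BMC would keep updating it.
    for entry in server.event_log:
        if is_abnormal(entry):
            notify_remote(f"{server.name}: {entry}")   # error notification to the remote host
            return True
    return False
```

Passing the abnormality check and the notifier as callables keeps the sketch independent of any particular log format or transport.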
According to one embodiment of the present disclosure, the server monitoring method further comprises: determining from the system event log whether the test flow has ended, and when the test flow has ended, generating a test end notification message and sending it to the remote host.

According to another embodiment of the present disclosure, the method further comprises terminating the test flow according to the error notification message.

According to yet another embodiment of the present disclosure, the test flow is a stress test.

According to still another embodiment of the present disclosure, the method further comprises: determining whether the network function of the server is normal; when the network function is normal, the monitoring host retrieves the system event log of each BMC through the network function; when the network function is abnormal, the monitoring host retrieves the system event log of each BMC through a direct connection path.

Another aspect of the present disclosure provides a server monitoring method for monitoring servers, comprising the following steps: a monitoring host transmits a test signal to the server; whether the PEF function of the server is normal is determined from the server's response; when the PEF function is normal, whether the network function of the server is normal is determined; when the network function is abnormal, the target of the PEF function is set to the monitoring host, and the transmission path of the PEF function is set to a direct connection path between the server and the monitoring host; the test flow of the server is started; when the BMC of the server detects a system abnormality, the PEF function generates an error notification message, which is transmitted over the direct connection path to the monitoring host and then forwarded by the monitoring host to a remote host; and the remote host performs further error analysis according to the error notification message.

According to one embodiment of the present disclosure, when the network function is normal, the server monitoring method further comprises: starting the test flow of the server; when the BMC of the server detects a system abnormality, generating an error notification message by the PEF function and transmitting it to the remote host through the network function; and performing further error analysis by the remote host according to the error notification message.

According to another embodiment of the present disclosure, the error notification message generated by the PEF function contains an alert mode field, a notification address field, and a transmission path field.

According to yet another embodiment of the present disclosure, the network function is a local area network (LAN) function.

According to still another embodiment of the present disclosure, the direct connection path is a serial port or an I2C interface.
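Taken together, the two aspects amount to a three-way decision on the PEF and network status. The following sketch shows that decision only; the enum `AlertMode` and the function `choose_alert_mode` are hypothetical names introduced for illustration and do not appear in the patent.

```python
from enum import Enum


class AlertMode(Enum):
    """Hypothetical labels for the three ways an error notification can arise."""
    HOST_POLLS_EVENT_LOG = "monitoring host reads the system event log"
    PEF_VIA_DIRECT_PATH = "PEF alert sent to the monitoring host over the direct path"
    PEF_VIA_NETWORK = "PEF alert sent straight to the remote host over the network"


def choose_alert_mode(pef_ok: bool, network_ok: bool) -> AlertMode:
    """Pick the notification mode from the two health checks."""
    if not pef_ok:
        # Without PEF the server cannot raise alerts itself.
        return AlertMode.HOST_POLLS_EVENT_LOG
    if not network_ok:
        # PEF works, but alerts must be relayed by the monitoring host.
        return AlertMode.PEF_VIA_DIRECT_PATH
    return AlertMode.PEF_VIA_NETWORK
```

Checking the PEF status first mirrors the order of the steps in the text: the network check only decides the alert route once PEF is known to work.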
The advantage of applying the present disclosure is that, by determining whether the PEF function and the network function of the server operate normally, the manner of generating the error notification message can be chosen to suit each server's condition, and the remote host is notified immediately for analysis and debugging, so that the above objective is readily achieved.

[Embodiment]

Please refer to Fig. 1, a block diagram of a server monitoring system 1 according to one embodiment of the present disclosure. The server monitoring system 1 comprises servers 10, a monitoring host 12, and a remote host 14. Note that Fig. 1 shows three servers 10; in other embodiments, a different number of servers 10 may be provided as circumstances require.

When an abnormal event occurs in a server 10, the server monitoring system 1 of this embodiment generates an error notification message 13, 17, or 19 in a manner suited to the situation, so that the remote host 14 can immediately perform error analysis, or even terminate the test flow, in order to debug the key point at which the error occurred.

The server 10 comprises a baseboard management controller 100 and a communication interface (not shown). During development of the server 10, the server 10 and its BMC 100 must undergo a test flow. In one embodiment, the test flow is a stress test, which checks whether a device works normally over a long period of operation.

The BMC 100 is a device disposed on a motherboard (not shown) in the server 10 to manage and control the system of the server 10; it serves to let system management software communicate with the device hardware.
Using various sensors (not shown) in the server 10, the BMC 100 can track conditions in the server 10 such as temperature, fan speed, power mode, and operating system status.

A normally operating BMC 100 provides a PEF function. The PEF function is an event-handling function that can send an alert notification over the network when the system generates an event. However, if the BMC 100 of the server 10 under test does not provide this function, or the function is damaged, or the network function has failed, the BMC 100 cannot directly generate an alert notification when an event occurs on the server 10.

Please also refer to Fig. 2, a flowchart of a server monitoring method according to one embodiment of the present disclosure. The method can be applied to the server monitoring system 1 shown in Fig. 1 and comprises the following steps.

In step 201, the monitoring host 12 transmits a test signal 121 to the server 10. Then, in step 202, whether the PEF function of the server 10 is normal is determined from the response (not shown) of the server 10.

When the PEF function is abnormal (the leftmost server 10 in Fig. 1 serves as the example), step 203 is executed to start the test flow of the server 10. Step 204 follows: the system event log 11 of the BMC 100 of the server 10 is updated according to the condition of the server 10. During the test flow, the BMC 100 continuously updates the system event log 11 according to the conditions of the server 10, including the temperature, fan speed, power mode, and operating system status mentioned above; the system event log 11 records these conditions as data.
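The log update in step 204 can be pictured as appending one record per sensor reading and flagging readings that fall outside a limit. The sketch below is illustrative only: `LogEntry`, `record_reading`, the sensor names, and both threshold values are assumptions made here, not values from the patent or from any BMC firmware.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical limits chosen only for this example.
MAX_TEMPERATURE_C = 85.0
MIN_FAN_RPM = 2000.0


@dataclass(frozen=True)
class LogEntry:
    sensor: str       # e.g. "temperature" or "fan_rpm"
    value: float
    abnormal: bool    # True when the reading violates its limit


def record_reading(event_log: List[LogEntry], sensor: str, value: float) -> LogEntry:
    """Append one reading to the event log and flag it if it is out of range."""
    if sensor == "temperature":
        abnormal = value > MAX_TEMPERATURE_C
    elif sensor == "fan_rpm":
        abnormal = value < MIN_FAN_RPM
    else:
        abnormal = False          # unknown sensors are logged but never flagged
    entry = LogEntry(sensor, value, abnormal)
    event_log.append(entry)
    return entry


def has_abnormal_record(event_log: List[LogEntry]) -> bool:
    """True if any record in the log is flagged abnormal."""
    return any(entry.abnormal for entry in event_log)
```

Storing the flag with each record means the monitoring host only needs to scan the log, not re-evaluate the limits.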
Next, in step 205, whether the network function of the server 10 is normal is determined. When the network function is normal, step 206 is executed: the monitoring host 12 retrieves the system event log 11 of the BMC 100 through the network function. In one embodiment, the network function is a local area network interface. When the network function is abnormal, step 207 is executed: the monitoring host 12 retrieves the system event log 11 of the BMC 100 through the direct connection path between the server 10 and the monitoring host 12. In one embodiment, the direct connection path is an I2C or serial port communication interface.
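Steps 205 to 207 reduce to choosing one of two readers for the same log. A minimal sketch, assuming the two transports are supplied as callables; `retrieve_event_log`, `read_via_lan`, and `read_via_direct` are hypothetical names, and no real LAN, serial, or I2C access is performed here.

```python
from typing import Callable, List, Tuple


def retrieve_event_log(network_ok: bool,
                       read_via_lan: Callable[[], List[str]],
                       read_via_direct: Callable[[], List[str]]) -> Tuple[str, List[str]]:
    """Return the path used and the retrieved log entries.

    Only the reader for the selected path is called, so a broken
    network is never touched when the direct path is chosen.
    """
    if network_ok:
        return "lan", read_via_lan()
    return "direct", read_via_direct()
```

Returning the chosen path alongside the entries makes it easy for the caller to record which route was used.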

After step 206 or 207, step 208 is executed to determine from the system event log 11 whether the test flow has ended. When the test flow has ended, step 209 is performed: a test end notification message 15 is generated and sent to the remote host 14. When the test flow is still in progress, step 210 is executed to determine whether an abnormal record appears in the system event log 11. An abnormality may be, for example, that the temperature of the server 10 is too high, the fan speed falls short of the set standard, the power supply is insufficient, or the operating system produces an error. When no abnormal record appears in the system event log 11, the flow returns to step 204 so that the system event log 11 continues to be updated as the test flow proceeds.

When an abnormal record appears in the system event log 11, step 211 is executed: an error notification message 13 is generated and sent to the remote host 14, and the remote host 14 performs further error analysis according to the error notification message 13. Testers therefore need not watch beside the server 10 at all times and can instead monitor the condition of the server 10 remotely at the remote host 14. In one embodiment, after receiving the error notification message 13, the remote host 14 can issue an instruction (not shown) according to the message to terminate the test flow, so that continued testing does not obscure the cause and time of the error.

Thus, when the PEF function is abnormal, the server monitoring method of the above embodiment lets the monitoring host 12 detect errors that the server 10 produces during testing and notify the remote host 14 through the error notification message 13 for error analysis.

In another embodiment, when step 202 of Fig. 2 determines that the PEF function is normal, step A is executed. Step A is shown in detail in Fig. 3.

Please refer to Fig. 3, a flowchart of the server monitoring method when the PEF function is normal, according to one embodiment of the present disclosure. After step 202 determines that the PEF function is normal, step 301 is executed to determine whether the network function of the server 10 is normal. When the network function is abnormal (the middle server 10 in Fig. 1 serves as the example), step 302 is executed: the target of the PEF function is set to the monitoring host 12, and the transmission path of the PEF function is set to the direct connection path. As described above, in one embodiment the direct connection path is an I2C or serial port communication interface.

Next, in step 303, the test flow of the server 10 is started. Then, in step 304, whether the BMC 100 has detected a system abnormality is determined. When it has not, step 304 is repeated to keep checking the test flow for abnormalities. When the BMC 100 detects a system abnormality, step 305 is executed: the PEF function generates an error notification message 17, which is transmitted over the direct connection path to the monitoring host 12; the monitoring host 12 then forwards the error notification message 17 to the remote host 14, and the remote host 14 performs further error analysis according to the error notification message 17.

In one embodiment, the error notification message 17 generated by the PEF function contains an alert mode field, a notification address field, and a transmission path field. The alert mode field indicates whether the alert is produced as a string or in another form. In this embodiment, the notification address field holds the address of the monitoring host 12, because the monitoring host 12 is the transmission target, and the transmission path field specifies the direct connection path.

Thus, when the PEF function is normal but the network function is abnormal, the server monitoring method of the above embodiment lets the server 10 itself generate the error notification message 17, which the monitoring host 12 relays to the remote host 14 for error analysis.

When step 301 determines that the network function of the server 10 is normal (the rightmost server 10 in Fig. 1 serves as the example), step 306 is executed to start the test flow of the server 10.

Next, in step 307, whether the BMC 100 has detected a system abnormality is determined. When it has not, step 307 is repeated to keep checking the test flow for abnormalities. When the BMC 100 detects a system abnormality, step 308 is executed: the PEF function generates an error notification message 19, which is transmitted over the network to the remote host 14 without passing through the monitoring host 12, and the remote host 14 performs further error analysis according to the error notification message 19.
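The three fields of the PEF-generated error notification message can be modeled as a small record whose contents depend on the network status. This is a sketch under stated assumptions: `PefAlert`, `build_pef_alert`, the field names, the addresses, and the string values "text", "lan", and "direct" are all illustrative and not taken from the patent or from any BMC interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PefAlert:
    """Hypothetical model of the three fields named in the text."""
    alert_mode: str      # how the alert is produced, e.g. as a text string
    destination: str     # notification address: who receives the alert
    path: str            # transmission path: "lan" or "direct"


def build_pef_alert(network_ok: bool,
                    monitoring_host_addr: str,
                    remote_host_addr: str) -> PefAlert:
    """Fill in the fields according to whether the network function works."""
    if network_ok:
        # The server can reach the remote host by itself.
        return PefAlert("text", remote_host_addr, "lan")
    # Otherwise the monitoring host receives the alert and relays it.
    return PefAlert("text", monitoring_host_addr, "direct")


def needs_relay(alert: PefAlert) -> bool:
    """True when the monitoring host must forward the alert to the remote host."""
    return alert.path == "direct"
```

A frozen dataclass is used so an alert cannot be altered after it is built, which keeps the relay step from changing its content by accident.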
In one embodiment, the error notification message 19 generated by the PEF function contains an alert mode field, a notification address field, and a transmission path field. The alert mode field indicates whether the alert is produced as a string or in another form. In this embodiment, the notification address field holds the address of the remote host 14, because the remote host 14 is the transmission target, and the transmission path field specifies the network.

Thus, when both the PEF function and the network function are normal, the server monitoring method of the above embodiment lets the server 10 itself generate the error notification message 19 and transmit it over the network to the remote host 14 for error analysis. (It should be understood that, unless their order is specifically stated, the steps mentioned in this embodiment may be reordered as actually needed, and may even be executed simultaneously or partly simultaneously.)

As the above embodiments show, the advantage of applying the present disclosure is that the system event log of the BMC is examined to determine whether an abnormality exists, and the remote host is notified immediately for analysis and debugging.

Although the present disclosure has been described above by way of embodiments, they are not intended to limit it. Anyone skilled in the art may make various changes and refinements without departing from the spirit and scope of the present disclosure; the scope of protection is therefore defined by the appended claims.

[Brief Description of the Drawings]

To make the above and other objects, features, advantages, and embodiments of the present disclosure easier to understand, the accompanying drawings are described as follows:

Fig. 1 is a block diagram of a server monitoring system according to one embodiment of the present disclosure;
Fig. 2 is a flowchart of a server monitoring method according to another embodiment of the present disclosure; and
Fig. 3 is a flowchart of the server monitoring method when the platform event filter function is normal, according to one embodiment of the present disclosure.

[Description of Main Reference Numerals]

1: server monitoring system
10: server
100: baseboard management controller
11: system event log
12: monitoring host
121: test signal
13, 17, 19: error notification message
14: remote host
15: test end notification message
201-211: steps
301-308: steps


Claims (1)

VII. Scope of the Patent Application:

1. A server monitoring method for monitoring at least one server, comprising the following steps:
transmitting a test signal from a monitoring host to the server;
determining, according to a response of the server, whether a platform event filter function of the server is normal;
when the platform event filter function is abnormal, starting a test flow of the server;
updating a system event log of a baseboard management controller of each server according to the condition of the servers;
retrieving, by the monitoring host, the system event log of each baseboard management controller;
determining whether an abnormal record appears in the system event log, and when an abnormal record appears, generating an error notification message and sending it to a remote host; and
performing an error analysis by the remote host according to the error notification message.

2. The server monitoring method of claim 1, further comprising: determining from the system event log whether the test flow has ended, and when the test flow has ended, generating a test end notification message and sending it to the remote host.

3. The server monitoring method of claim 1, further comprising: terminating the test flow according to the error notification message.

4. The server monitoring method of claim 1, wherein the test flow is a stress test.

5. The server monitoring method of claim 1, further comprising:
determining whether a network function of the server is normal;
when the network function is normal, retrieving, by the monitoring host, the system event log of each baseboard management controller through the network function; and
when the network function is abnormal, retrieving, by the monitoring host, the system event log of each baseboard management controller through a direct connection path.

6. A server monitoring method for monitoring at least one server, comprising the following steps:
transmitting a test signal from a monitoring host to the server;
determining, according to a response of the server, whether a platform event filter function of the server is normal;
when the platform event filter function is normal, determining whether a network function of the server is normal;
when the network function is abnormal, setting a target of the platform event filter function to the monitoring host, and setting a transmission path of the platform event filter function to a direct connection path between the server and the monitoring host;
starting a test flow of the server;
when a baseboard management controller of the server detects a system abnormality, generating an error notification message by the platform event filter function, transmitting it over the direct connection path to the monitoring host, and forwarding it from the monitoring host to a remote host; and
performing an error analysis by the remote host according to the error notification message.

7. The server monitoring method of claim 6, further comprising, when the network function is normal:
starting the test flow of the server;
when a baseboard management controller of the server detects a system abnormality, generating an error notification message by the platform event filter function and transmitting it to the remote host through the network function; and
performing an error analysis by the remote host according to the error notification message.

8. The server monitoring method of claim 6, wherein the error notification message generated by the platform event filter function contains an alert mode field, a notification address field, and a transmission path field.

9. The server monitoring method of claim 6, wherein the network function is a local area network function.

10. The server monitoring method of claim 6, wherein the direct connection path is a serial port or an I2C interface.
TW98135806A 2009-10-22 2009-10-22 Server monitoring method TWI414939B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW98135806A TWI414939B (en) 2009-10-22 2009-10-22 Server monitoring method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW98135806A TWI414939B (en) 2009-10-22 2009-10-22 Server monitoring method

Publications (2)

Publication Number Publication Date
TW201115332A true TW201115332A (en) 2011-05-01
TWI414939B TWI414939B (en) 2013-11-11

Family

ID=44934408

Family Applications (1)

Application Number Title Priority Date Filing Date
TW98135806A TWI414939B (en) 2009-10-22 2009-10-22 Server monitoring method

Country Status (1)

Country Link
TW (1) TWI414939B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108170577A (en) * 2013-10-31 2018-06-15 深圳迈辽技术转移中心有限公司 Server

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI258075B (en) * 2003-09-02 2006-07-11 Acer Inc Real time monitoring device for host and the monitoring method therefor
TWI238325B (en) * 2003-10-09 2005-08-21 Quanta Comp Inc Apparatus of remote server console redirection
TWI310131B (en) * 2006-03-24 2009-05-21 Wistron Corp Remote monitoring method with event-triggered warning capability
TW200736930A (en) * 2006-03-29 2007-10-01 Mitac Int Corp Monitoring method for monitoring servers
TW200838212A (en) * 2007-03-13 2008-09-16 Inventec Corp Method for remotely monitoring system
TW200904034A (en) * 2007-07-13 2009-01-16 Chunghwa Telecom Co Ltd Centralized monitoring system and its method for integrated test equipment of measurement platform
TWI349458B (en) * 2007-09-07 2011-09-21 Inventec Corp Testing monitoring system and method
US7886050B2 (en) * 2007-10-05 2011-02-08 Citrix Systems, Inc. Systems and methods for monitoring components of a remote access server farm
TW200922201A (en) * 2007-11-13 2009-05-16 Jr Rack Co Ltd Monitoring system of server cabinet and over-temperature monitoring device thereof

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108170577A (en) * 2013-10-31 2018-06-15 深圳迈辽技术转移中心有限公司 Server
CN108170577B (en) * 2013-10-31 2021-11-26 乾元云硕科技(深圳)有限公司 Server

Also Published As

Publication number Publication date
TWI414939B (en) 2013-11-11

Similar Documents

Publication Publication Date Title
US20180107196A1 (en) Method of Detecting Home Appliance Bus Control System
CN102055615B (en) Server monitoring method
TWI588660B (en) Method of detecting fault on communication bus using baseboard management controller and fault detector for network system
CN104639380A (en) Server monitoring method
US20120136970A1 (en) Computer system and method for managing computer device
CN106407059A (en) Server node testing system and method
DE60002908D1 (en) DEVICE AND METHOD FOR IMPROVED ERROR LOCATION AND DIAGNOSIS IN COMPUTERS
CN115022163A (en) Log collection method and device, computer equipment and storage medium
CN112995656B (en) Abnormality detection method and system for image processing circuit
TWI310131B (en) Remote monitoring method with event-triggered warning capability
TW201115332A (en) Server monitoring method
JP2009135255A (en) Communication logging device and fault analyzing apparatus including the same
WO2024113962A1 (en) Liquid leakage detection cable testing method, system, and apparatus, server, and electronic device
CN107562561A (en) Computer hardware rapid diagnostic test system
JP5623449B2 (en) Report creation apparatus, report creation program, and report creation method
JPH11133225A (en) Centralized monitoring system for color filter manufacturing device
CN105447389A (en) Vulnerability location and rapid reproduction based on Peach platform
CN111008098A (en) Monitoring system and method
CN114020586A (en) Method for rapidly alarming server fault by acquiring Event log through BMC
CN109726055A (en) Detect the method and computer equipment of PCIe chip exception
TWI494754B (en) Server monitoring apparatus and method thereof
TW200945029A (en) Control system and management method utilizing the same
CN110119370A (en) A kind of VR chip controls method and system based on PECI bus
TWI229996B (en) Operation system for accessing test function log file
KR20070109301A (en) Semiconductor manufacturing equipment parameter visual system and semiconductor manufacturing apparatus using the same

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees