201227282
VI. Description of the Invention
[Technical Field]
[0001] The present invention relates to an alarm system and method, and more particularly to a server abnormality alarm system and method.
[Prior Art]
[0002] To keep a server running safely, the server's baseboard management controller monitors the server's operating state, shuts down or restarts the server when an abnormal condition occurs, and sends alarm information to a remote device through the network card. For example, when the CPU temperature exceeds a threshold, the baseboard management controller shuts down the server and uses the network card to send alarm information to the remote device. However, the network card stops working while the server is shutting down or powering on; if the baseboard management controller is sending alarm information at that moment, the transmission fails. The remote device then never receives the alarm information, which weakens its ability to monitor the server.
[Summary of the Invention]
[0003] In view of the above, it is necessary to provide a server abnormality alarm system and method that can successfully send alarm information out through the network card when an abnormal condition occurs in the server.
[0004] A server abnormality alarm system runs in the baseboard management controller of a server. The server includes a network card, through which it is connected over a network to a remote device. The system includes: a judging module, configured to determine whether the server has an abnormal condition that causes a shutdown or restart; a first alarm module, configured to send alarm information to the remote device through the network card when such an abnormal condition occurs, and to store the alarm information in the storage of the baseboard management controller; a second alarm module, configured to detect whether the server shuts down within the alarm time and, when it does, to send the stored alarm information to the remote device through the network card after the network card resumes normal operation, the alarm time being the time required to send the alarm information successfully; and a third alarm module, configured to detect whether the server powers on within the alarm time and, when it does, to send the stored alarm information to the remote device through the network card after the network card resumes normal operation.
099146016 Form No. A0101 0992079158-0
[0005]
A server abnormality alarm method is executed in the baseboard management controller of a server. The server includes a network card, through which it is connected over a network to a remote device. The method includes the steps of: determining whether the server has an abnormal condition that causes a shutdown or restart; if so, sending alarm information to the remote device through the network card and storing the alarm information in the storage of the baseboard management controller; detecting whether the server shuts down within the alarm time, the alarm time being the time required to send the alarm information successfully; if the server is detected to shut down within the alarm time, sending the stored alarm information to the remote device through the network card after the network card resumes normal operation; detecting whether the server powers on within the alarm time; and if the server is detected to power on within the alarm time, sending the stored alarm information to the remote device through the network card after the network card resumes normal operation, and returning to the step of detecting whether the server shuts down within the alarm time.
[0006] The present invention resends the alarm information when it detects that the server shuts down or powers on within the alarm time, which ensures that the alarm information is successfully sent out through the network card when an abnormal condition occurs in the server.
[Embodiment]
[0007] FIG. 1 is a schematic diagram of the application environment of a preferred embodiment of the server abnormality alarm system of the present invention. The server abnormality alarm system 10 runs in the baseboard management controller 12 of the server 11. The server 11 includes a network card 13 and is connected over a network to a remote device 14 through the network card 13. When the server 11 has an abnormal condition that causes a shutdown or restart, the baseboard management controller 12 shuts down or restarts the server 11 and sends alarm information to the remote device 14 through the network card 13. The baseboard management controller 12 includes a storage 15 for storing the alarm information. In this embodiment, the storage 15 is a serial flash memory (SPI flash).
[0008] FIG. 2 is a functional module diagram of the server abnormality alarm system of FIG. 1. The server abnormality alarm system 10 includes a judging module 200, a first alarm module 210, a second alarm module 220, a third alarm module 230, and a clearing module 240.
[0009] The judging module 200 is configured to determine whether the server 11 has an abnormal condition that causes a shutdown or restart. In this embodiment, the baseboard management controller 12 monitors the operating state of the server 11 and shuts down or restarts the server 11 when a monitored quantity that indicates the operating state of the server 11 (for example, the CPU temperature or voltage) exceeds a threshold. For example, if the CPU temperature or voltage is too high, the baseboard management controller 12 shuts down or restarts the server 11. While the server 11 is shutting down or powering on, the network card 13 stops working and cannot transmit information normally.
[0010] The first alarm module 210 is configured to send alarm information to the remote device 14 through the network card 13 when the server 11 has an abnormal condition that causes a shutdown or restart, and to store the alarm information in the storage 15 of the baseboard management controller 12.
[0011] The second alarm module 220 is configured to detect whether the server 11 shuts down within the alarm time. When it detects that the server 11 shuts down within the alarm time, it waits a certain time (for example, 30 seconds) for the network card 13 to resume normal operation, and then sends the alarm information stored in the storage 15 to the remote device 14 through the network card 13. The alarm time is the time required to send the alarm information successfully, for example 1 second. If the server 11 is detected to shut down within the alarm time, the network card 13 stopped working within the alarm time and the alarm information was not sent successfully.
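The threshold check of the judging module and the send-and-store behaviour of the first alarm module described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `AlertStore`, `find_violation`, and `raise_alert`, the alert text, and the numeric limits are all hypothetical, and an in-memory object stands in for the storage 15.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class AlertStore:
    """Hypothetical stand-in for the storage 15 (SPI flash in the embodiment)."""
    saved: Optional[str] = None

    def save(self, alert: str) -> None:
        self.saved = alert

    def clear(self) -> None:
        self.saved = None


def find_violation(readings: Dict[str, float], limits: Dict[str, float]) -> Optional[str]:
    """Judging module: return the first monitored quantity above its threshold, else None."""
    for name, value in readings.items():
        if name in limits and value > limits[name]:
            return name
    return None


def raise_alert(quantity: str, send: Callable[[str], None], store: AlertStore) -> str:
    """First alarm module: send the alert through the network card and keep a copy."""
    alert = f"{quantity} exceeded threshold"  # alert format is an assumption
    send(alert)
    store.save(alert)
    return alert


if __name__ == "__main__":
    sent: List[str] = []          # records what would go out through the network card
    store = AlertStore()
    # Readings and limits below are made-up illustration values.
    hit = find_violation({"cpu_temp": 96.0, "cpu_volt": 1.1},
                         {"cpu_temp": 90.0, "cpu_volt": 1.4})
    if hit is not None:
        raise_alert(hit, sent.append, store)
    print(sent, store.saved)
```

Passing `send` in as a callable keeps the sketch independent of any particular network stack; the stored copy is what a later resend would read back.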
[0012] The third alarm module 230 is configured to detect whether the server 11 powers on within the alarm time. When it detects that the server 11 powers on within the alarm time, it waits a certain time for the network card 13 to resume normal operation, and then sends the alarm information stored in the storage 15 to the remote device 14 through the network card 13. If the server 11 is detected to power on within the alarm time, the network card 13 stopped working within the alarm time and the alarm information was not sent successfully.
[0013] The clearing module 240 is configured to clear the alarm information stored in the storage 15 when the second alarm module 220 does not detect that the server 11 shut down within the alarm time, or when the third alarm module 230 does not detect that the server 11 powered on within the alarm time. In either case, the network card 13 worked normally within the alarm time and the alarm information was sent successfully.
[0014] FIG. 3 is a flowchart of a preferred embodiment of the server abnormality alarm method of the present invention.
[0015] In step S301, the judging module 200 determines whether the server 11 has an abnormal condition that causes a shutdown or restart. In this embodiment, the baseboard management controller 12 monitors the operating state of the server 11 and shuts down or restarts the server 11 when a monitored quantity that indicates the operating state of the server 11 (for example, the CPU temperature or voltage) exceeds a threshold.
For example, if the CPU temperature or voltage is too high, the baseboard management controller 12 shuts down or restarts the server 11. While the server 11 is shutting down or powering on, the network card 13 stops working and cannot transmit information normally.
[0016] If the server 11 has an abnormal condition that causes a shutdown or restart, then in step S302 the first alarm module 210 sends alarm information to the remote device 14 through the network card 13 and stores the alarm information in the storage 15 of the baseboard management controller 12. In this embodiment, the storage 15 is a serial flash memory (SPI flash).
[0017] In step S303, the second alarm module 220 detects whether the server 11 shuts down within the alarm time during which the first alarm module 210 sends the alarm information. The alarm time is the time required to send the alarm information successfully, for example 1 second. If the server 11 is not detected to shut down within the alarm time, the network card 13 worked normally within the alarm time and the alarm information was sent successfully, so the method goes to step S307.
[0018] If the server 11 is detected to shut down within the alarm time, the network card 13 stopped working within the alarm time and the alarm information was not sent successfully. In step S304, the second alarm module 220 waits a certain time (for example, 30 seconds) for the network card 13 to resume normal operation, and then sends the alarm information stored in the storage 15 to the remote device 14 through the network card 13.
[0019] In step S305, the third alarm module 230 detects whether the server 11 powers on within the alarm time during which the second alarm module 220 sends the alarm information. If the server 11 is not detected to power on within the alarm time, the network card 13 worked normally within the alarm time and the alarm information was sent successfully, so the method goes to step S307.
[0020] If the server 11 is detected to power on within the alarm time, the network card 13 stopped working within the alarm time and the alarm information was not sent successfully. In step S306, the third alarm module 230 waits a certain time for the network card 13 to resume normal operation, sends the alarm information stored in the storage 15 to the remote device 14 through the network card 13, and returns to step S303, where the second alarm module 220 detects whether the server 11 shuts down within the alarm time during which the third alarm module 230 sends the alarm information.
[0021] If the server 11 is not detected to shut down within the alarm time in step S303, or is not detected to power on within the alarm time in step S305, the alarm information was sent successfully, and in step S307 the clearing module 240 clears the alarm information stored in the storage 15.
[0022] In summary, the present invention meets the requirements for an invention patent, and a patent application is filed in accordance with the law. The above describes only preferred embodiments of the present invention, and the scope of the present invention is not limited to these embodiments; equivalent modifications or variations made by those skilled in the art in accordance with the spirit of the present invention shall be covered by the following claims.
[Brief Description of the Drawings]
[0023] FIG. 1 is a schematic diagram of the application environment of a preferred embodiment of the server abnormality alarm system of the present invention.
[0024] FIG. 2 is a functional module diagram of the server abnormality alarm system of FIG. 1.
[0025] FIG. 3 is a flowchart of a preferred embodiment of the server abnormality alarm method of the present invention.
[Description of Main Component Symbols]
[0026] Server abnormality alarm system: 10
[0027] Server: 11
[0028] Baseboard management controller: 12
[0029] Network card: 13
[0030] Remote device: 14
[0031] Storage: 15
[0032] Judging module: 200
[0033] First alarm module: 210
[0034] Second alarm module: 220
[0035] Third alarm module: 230
[0036] Clearing module: 240
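As an illustration of how steps S302 to S307 of the method fit together, the following Python sketch walks the same loop: send and store, check for a shutdown, resend, check for a power-on, resend, and go back to the shutdown check, clearing the stored copy once a send goes through undisturbed. It is a sketch under stated assumptions rather than the patented implementation. The function name `deliver_alert` and all of its parameters are hypothetical; in particular, the two detection callables and `wait_for_nic` stand in for whatever the baseboard management controller 12 actually uses to observe power state and to wait (for example, 30 seconds) for the network card 13 to recover.

```python
from typing import Callable, List


def deliver_alert(
    alert: str,
    send: Callable[[str], None],
    shut_down_in_window: Callable[[], bool],
    powered_on_in_window: Callable[[], bool],
    wait_for_nic: Callable[[], None],
    saved: List[str],
) -> int:
    """Run steps S302-S307 and return how many times the alert was sent."""
    send(alert)             # S302: first attempt through the network card
    saved.append(alert)     # S302: keep a copy in storage
    attempts = 1
    while True:
        # S303: did the server shut down within the alarm time of the last send?
        if not shut_down_in_window():
            break           # no: the send is treated as successful
        wait_for_nic()      # S304: wait for the network card to recover
        send(saved[0])      # S304: resend the stored alert
        attempts += 1
        # S305: did the server power on within the alarm time of that resend?
        if not powered_on_in_window():
            break           # no: the resend is treated as successful
        wait_for_nic()      # S306: wait for the network card to recover
        send(saved[0])      # S306: resend again, then loop back to S303
        attempts += 1
    saved.clear()           # S307: clear the stored alert
    return attempts


if __name__ == "__main__":
    sent: List[str] = []
    shutdowns = iter([True])    # simulated: one shutdown interrupts the first send
    power_ons = iter([False])   # simulated: no power-on interrupts the resend
    count = deliver_alert("CPU over-temperature", sent.append,
                          lambda: next(shutdowns), lambda: next(power_ons),
                          lambda: None, [])
    print(count, sent)
```

The detection and waiting logic is injected so the control flow can be exercised without real hardware; on a real controller `wait_for_nic` would presumably block until the network card is usable again.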