TW200521837A - Method for switching to boot multi-processor computer system - Google Patents
- Publication number
- TW200521837A
- Authority
- TW
- Taiwan
- Prior art keywords
- switching
- cpu
- rom
- boot
- bmc
Landscapes
- Stored Programmes (AREA)
Description
V. Description of the Invention

[Technical Field of the Invention]
The present invention relates to a method for managing boot abnormalities in a multiprocessor computer system, and in particular to a method in which a BMC (baseboard management controller) performs a CPU or ROM switching procedure to manage boot failures.

[Prior Art]
In computer systems, the high-availability design philosophy calls for keeping the system running without any manual intervention to clear faults, which makes a backup system necessary. This is one of the reasons multiprocessor systems exist. A multiprocessor computer system such as a server has multiple central processing units (CPUs), so it can raise overall processing performance and can substitute another CPU when the designated CPU fails.

In general, the boot procedure of a multiprocessor computer system designates a single boot CPU (bootstrap processor) to provide the computing function. The boot CPU processes the basic input/output system (BIOS) instructions at boot time in order to initialize the computer system and load the operating system (OS). The boot BIOS is stored in the BIOS read-only memory (BIOS ROM), while the other CPUs are defined at boot time as application processors and are placed in a wait state.

When the system cannot boot with the boot CPU, the existing practice is to write a CPU switching routine into the BIOS so that the boot CPU is switched to one of the application CPUs. This switching mechanism is shown in FIG. 1.

Another possible problem is that the BIOS has already switched to and tried every CPU, yet the system still cannot boot. In this situation the BIOS ROM itself may be at fault. To handle BIOS abnormalities, one or more backup ROMs are used as an alternative: the boot BIOS is switched to a backup BIOS stored on the backup ROM and the boot procedure continues. This switching mechanism is shown in FIG. 2.

The drawback of these approaches is that a special BIOS must be used to switch CPUs, or a "ROM boot swap" design must be added to switch the BIOS ROM, and the circuitry for the latter is relatively complex. Both are cumbersome and not cost-effective.

[Summary of the Invention]
The technical problem to be solved by the present invention is that conventional techniques handle switching on boot failure by rewriting the BIOS or by designing a ROM boot swap, neither of which is cost-effective or meets practical needs.

In view of these problems, the present invention provides a boot switching method for a multiprocessor computer system that uses a baseboard management controller (BMC) to manage the judgment and execution of CPU and ROM switching when booting fails. The method comprises the steps of: confirming a boot abnormality through a BMC; performing a CPU switching procedure and rebooting; and performing a ROM switching procedure and rebooting. The CPU switching procedure switches, through the BMC, between a boot CPU and at least one application CPU. The ROM switching procedure switches, through the BMC, the BIOS that executes the boot procedure from a boot BIOS stored in a boot ROM to a backup BIOS stored in at least one backup ROM.
The effect achieved by the present invention is that boot abnormalities can be managed by the BMC, so neither the system BIOS nor the ROM requires any additional design, and system stability can be further improved.

[Embodiment]
The present invention is a boot switching method for a multiprocessor computer system. It mainly uses a baseboard management controller (BMC) to manage the judgment and execution of CPU and BIOS switching when booting fails.

The BMC was originally applied to the Intelligent Platform Management Interface (IPMI), where it controls the interface between system management software and platform management hardware. It provides autonomous monitoring, event logging and recovery control, and can serve as a gateway between system management software and the Intelligent Platform Management Bus (IPMB) and Intelligent Chassis Management Bus (ICMB) interfaces.

System abnormalities can be managed through the BMC because the system can obtain system status information from the BMC over the Low Pin Count (LPC) interface.

The present invention is therefore another, entirely new field of application for the BMC.
The handling of boot abnormalities through the BMC is described below with reference to FIG. 3. In order of priority, the CPU is switched and the system rebooted first; if that does not work, the ROM is switched and the system rebooted.

First, after system power is turned on, it is confirmed that the BMC has not received a booted message from the boot BIOS (step 110). If the booted message has been received, the system has booted and is operating normally (step 120). The BMC is powered from the system standby supply, so it is ready before system power is turned on. Only in this way can it receive the boot-procedure status reported by the BIOS as soon as system power comes on.

Next, it is confirmed that the CPU switching procedure and the ROM switching procedure have not both been completed (step 130). If the system has completed both the CPU and ROM switching procedures and still cannot boot, all CPUs have failed and the system cannot boot (step 140). The fault can then only be cleared manually, for example by replacing a CPU.

Then it is confirmed that the CPU switching procedure has not been completed (step 150), and the CPU switching procedure is carried out (step 160).

The CPU switching procedure of step 160 comprises two sub-steps. The first changes the SMI state of all CPUs so as to isolate the BSP CPU from the CPU bus (step 161). The BSP (bootstrap processor) CPU is the CPU that starts the boot when the system is first powered on, that is, the CPU preset for booting; in the second and later CPU switching procedures, it is the CPU used for the previous boot. The BMC then generates a CPU switching signal and a reboot signal to the boot BIOS or backup BIOS (step 162). After the reboot, the flow returns to step 110 to confirm the boot state.
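The decision flow described for FIG. 3 can be sketched as a small loop. This is only an illustrative model under stated assumptions, not the patent's firmware: the names `BootState`, `next_action`, `manage_boot` and the `try_boot` callback are hypothetical, and each switching procedure is treated as complete after a single pass. The ROM switching branch corresponds to step 170 of FIG. 3.

```python
from dataclasses import dataclass


@dataclass
class BootState:
    """Flags the BMC tracks across reboots (hypothetical names)."""
    booted_message_received: bool = False  # boot BIOS reported success
    cpu_switch_done: bool = False          # CPU switching procedure completed
    rom_switch_done: bool = False          # ROM switching procedure completed


def next_action(state: BootState) -> str:
    """Pick the BMC action for one pass through the FIG. 3 flow."""
    if state.booted_message_received:                    # step 110 -> step 120
        return "normal_boot"
    if state.cpu_switch_done and state.rom_switch_done:  # step 130 -> step 140
        return "cannot_boot"
    if not state.cpu_switch_done:                        # step 150 -> step 160
        return "cpu_switch_and_reboot"
    return "rom_switch_and_reboot"                       # step 170


def manage_boot(try_boot):
    """Repeat the flow until the system boots or no option is left.

    try_boot is an assumed stand-in for a real boot attempt: it receives
    the current BootState and returns True if the BIOS reports a boot.
    """
    state = BootState()
    actions = []
    while True:
        state.booted_message_received = try_boot(state)
        action = next_action(state)
        actions.append(action)
        if action in ("normal_boot", "cannot_boot"):
            return actions
        if action == "cpu_switch_and_reboot":
            state.cpu_switch_done = True
        else:
            state.rom_switch_done = True
```

For example, calling `manage_boot(lambda s: s.cpu_switch_done)` models a system whose original boot CPU is faulty: the first pass switches the CPU and reboots, and the second pass ends in a normal boot.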
If the determination in step 150 finds that CPU switching has already been performed, the ROM switching procedure is carried out (step 170). This procedure switches the BIOS that executes the boot procedure from the boot BIOS in the BIOS ROM to the backup BIOS in the backup ROM, and reboots with the backup BIOS. In detail, the BMC generates a ROM switching signal to a complex programmable logic device (CPLD) to switch to the backup ROM, and generates a system reboot signal to the backup BIOS. After the reboot, the flow again returns to step 110 to confirm the boot state.

FIG. 4 shows the execution flow inside the BMC when CPU switching is performed, which supports the feasibility of the present invention. SMI1 and SMI2 are two system management interrupts (SMI) on the BMC, and the SWAP state indicates the switching state. STBY_PGD, ROM_SWAP, STATE_CHANGE, SYS_PGD and CPU_SWAP are functional parameters of the control program in the BMC: STBY_PGD is the standby power-on state, ROM_SWAP is the ROM switching state, STATE_CHANGE is the state transition, SYS_PGD is the system reboot state, and CPU_SWAP is the CPU switching state. CPU switching in the figure comprises four states, which let the BMC know which CPU has been switched to. The actions performed in each state are as follows:

(1) The first state:
a. Set SMI1 to LOW;
b. Set SMI2 to HIGH;
c. Set the SWAP state to the second state;
d. Set STATE_CHANGE to CHANGE.

(2) The second state:
a. Set SMI1 to HIGH;
b. Set SMI2 to LOW;
c. Set the SWAP state to the third state;
d. Set STATE_CHANGE to CHANGE.

(3) The third state:
a. Set SMI1 to LOW;
b. Set SMI2 to LOW;
c. Set the SWAP state to the fourth state;
d. Set STATE_CHANGE to CHANGE.

(4) The fourth state:
a. Set the SWAP state to the fourth state;
b. Set STATE_CHANGE to CHANGE.

FIG. 5 shows the detailed flow when the BMC performs ROM switching (ROM SWAP), and can likewise be used to verify the feasibility of the present invention. BACKUPROM represents the backup ROM state; in the present invention the backup ROM can be in the normal state or the backup state. ROMswitch is the functional parameter for the ROM switching state.

With the CPU and ROM flows of FIGS. 4 and 5, the BMC can follow the flow of FIG. 3: on a boot abnormality it first performs the CPU switching of FIG. 4 and, if the system still fails to boot, it then performs the ROM switching of FIG. 5. This confirms that managing boot abnormalities with the BMC is feasible.

The above is only a preferred embodiment of the present invention and is not intended to limit the scope of its implementation. Modifications and substitutions made by those skilled in the art on the basis of this disclosure are derivative creations based on the technical idea of the present invention.

Therefore, equivalent changes and modifications made without departing from the technical idea of the present invention should all be covered by the scope of the patent claims of the present invention.
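The four-state CPU switching sequence described for FIG. 4 can be sketched as a lookup table. This is a minimal illustration under stated assumptions rather than the BMC control program itself: the dictionary layout, the function name `cpu_swap_step` and the numeric encoding of LOW and HIGH are hypothetical, and the fourth state is assumed to leave SMI1 and SMI2 unchanged because the description lists no SMI action for it.

```python
LOW, HIGH = 0, 1  # assumed numeric encoding of the two signal levels

# SWAP state -> (SMI1 level, SMI2 level, next SWAP state).
# None means "leave the signal as it is" (assumption for the fourth state).
CPU_SWAP_TABLE = {
    1: (LOW, HIGH, 2),
    2: (HIGH, LOW, 3),
    3: (LOW, LOW, 4),
    4: (None, None, 4),
}


def cpu_swap_step(regs):
    """Apply the actions of the current SWAP state and return new values.

    regs is a dict with the keys "SMI1", "SMI2", "SWAP" and "STATE_CHANGE";
    the input dict is not modified.
    """
    smi1, smi2, next_state = CPU_SWAP_TABLE[regs["SWAP"]]
    new_regs = dict(regs)
    if smi1 is not None:
        new_regs["SMI1"] = smi1
    if smi2 is not None:
        new_regs["SMI2"] = smi2
    new_regs["SWAP"] = next_state
    new_regs["STATE_CHANGE"] = "CHANGE"  # set in every state
    return new_regs
```

Keeping the per-state actions in one table makes it easy to see that each pass through a state sets a distinct SMI1/SMI2 combination until the fourth state is reached, after which the SWAP state no longer advances.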
[Brief Description of the Drawings]
FIG. 1 illustrates the boot CPU switching mechanism of a multiprocessor system in the prior art;
FIG. 2 illustrates the boot ROM switching mechanism of a multiprocessor system in the prior art;
FIG. 3 illustrates the boot switching mechanism of the present invention, in which the BMC manages the multiprocessor system;
FIG. 4 illustrates the CPU switching flow performed by the BMC when the multiprocessor system boots; and
FIG. 5 illustrates the ROM switching flow performed by the BMC when the multiprocessor system boots.

[Description of Reference Symbols]
Step 110: The BMC has not received a booted message from the boot BIOS
Step 120: The system boots and operates normally
Step 130: The CPU switching procedure and ROM switching procedure have not both been completed
Step 140: The system cannot boot
Step 150: Confirm that the CPU switching procedure has not been completed
Step 160: Perform the CPU switching procedure
Step 161: Change the SMI state of all CPUs to isolate the BSP CPU from the CPU bus
Step 162: The BMC generates a CPU switching signal and a reboot signal to the boot BIOS or backup BIOS
Step 170: ROM switching procedure
CPU: central processing unit
BIOS: basic input/output system
ROM: read-only memory
BMC: baseboard management controller
BSP CPU: CPU preset for booting
SMI1, SMI2: system management interrupts
SWAP state: switching state
STBY_PGD: standby power-on state
ROM_SWAP: ROM switching state
STATE_CHANGE: state transition
SYS_PGD: system reboot state
CPU_SWAP: CPU switching state
LOW: low level
HIGH: high level
BACKUPROM: backup ROM state
ROMswitch: ROM switching state
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW92136324A TWI244031B (en) | 2003-12-19 | 2003-12-19 | Booting switch method for computer system having multiple processors |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW92136324A TWI244031B (en) | 2003-12-19 | 2003-12-19 | Booting switch method for computer system having multiple processors |
Publications (2)
Publication Number | Publication Date |
---|---|
TW200521837A | 2005-07-01 |
TWI244031B | 2005-11-21 |
Family
ID=37154675
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW92136324A TWI244031B (en) | 2003-12-19 | 2003-12-19 | Booting switch method for computer system having multiple processors |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI244031B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI514263B (en) * | 2011-12-29 | 2015-12-21 | Intel Corp | Boot strap processor assignment for a multi-core processing unit |
-
2003
- 2003-12-19 TW TW92136324A patent/TWI244031B/en not_active IP Right Cessation
Also Published As
Publication number | Publication date |
---|---|
TWI244031B (en) | 2005-11-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MM4A | Annulment or lapse of patent due to non-payment of fees |