TWI280477B - System, method and computer program product for correcting error of device connected to computer - Google Patents


Info

Publication number
TWI280477B
Authority
TW
Taiwan
Prior art keywords
computer
error
correct
routine
designed
Prior art date
Application number
TW093105026A
Other languages
Chinese (zh)
Other versions
TW200525343A (en)
Inventor
Alan Cox
Original Assignee
Red Hat Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from PCT/US2004/001286 (WO2004068397A2)
Application filed by Red Hat Inc
Publication of TW200525343A
Application granted
Publication of TWI280477B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0745 Error or fault processing not based on redundancy, the processing taking place on a specific hardware platform or in a specific software environment, in an input/output transactions management context
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions

Abstract

An error handling method/system for correcting an error of a device connected to a computer simulates removal of the device from the computer and automatically turns the device off. The method/system then automatically turns the device back on and simulates insertion of the device. This allows a pre-existing initialization routine to correct the error.

Description

[Technical Field of the Invention]

The present invention relates to using a hot-plug interface to simplify the error handling routines for the input/output devices of a computer.

[Prior Art]

As shown in FIG. 1, a conventional computer includes, among other things, a central processing unit (CPU) 101, an input/output (I/O) bus 103, and an I/O device 105. The I/O bus 103 allows information to flow among the I/O device 105, the CPU 101, and random access memory (RAM).
Examples of buses are ISA (Industry Standard Architecture), EISA (Extended Industry Standard Architecture), PCI (Peripheral Component Interconnect), and MCA (Micro Channel Architecture).

Typical I/O devices include keyboards, printers, network devices, game devices, and so on. The I/O bus is connected in turn to each I/O device through a hierarchy of hardware components, including I/O ports 107, interfaces, and device controllers 109. Each device connected to the I/O bus has its own assigned I/O addresses. In operation, the CPU selects an I/O port and uses the I/O bus to transfer data between a CPU register and that port.

In particular, in the example configuration shown in FIG. 2, the CPU 101 writes commands to be sent to the I/O device into a control register 201 and reads a value representing the internal state of the I/O device from a status register 203. The CPU 101 also fetches data from the I/O device by reading bytes from an input register 205, and pushes data to the I/O device by writing bytes to an output register 207.

An I/O interface 209 is a hardware circuit and/or software operation placed between a group of I/O ports and the corresponding device controller. It acts as an interpreter that translates the values in the I/O ports into commands and data for the device. In the other direction, it detects changes in the device state and correspondingly updates the I/O port that plays the role of the status register. Examples of common interfaces are: keyboard interface, disk interface, bus mouse interface, network interface, parallel port, serial port, Universal Serial Bus (USB), PCMCIA interface, PCI interface, SCSI interface, etc.
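The register arrangement of FIG. 2 can be pictured with a small user-space model. The following is only a sketch under stated assumptions: the structure name, the command and status bit values, and the `io_interface_step` helper are hypothetical and do not come from the specification, and real hardware would expose these registers through port or memory-mapped I/O rather than an ordinary struct.

```c
#include <stdint.h>

/* Hypothetical register block for one group of I/O ports (illustrative only). */
struct io_port_regs {
    uint8_t control;  /* CPU writes commands here (control register 201) */
    uint8_t status;   /* CPU reads device state here (status register 203) */
    uint8_t input;    /* CPU reads data bytes here (input register 205) */
    uint8_t output;   /* CPU writes data bytes here (output register 207) */
};

/* Assumed encodings; not taken from any real device. */
enum { CMD_RESET = 0x01, STATUS_READY = 0x01, STATUS_ERROR = 0x80 };

/* Stand-in for the I/O interface 209: interprets the command written to the
 * control register and updates the status register accordingly. */
static void io_interface_step(struct io_port_regs *r)
{
    if (r->control == CMD_RESET) {
        r->status = STATUS_READY;  /* a reset clears the error bit in this model */
        r->control = 0;            /* command consumed */
    }
}

/* Write one command, let the interface act on it, and report whether the
 * device ended up ready and error-free (1) or not (0). */
static int send_command(struct io_port_regs *r, uint8_t cmd)
{
    r->control = cmd;
    io_interface_step(r);
    return (r->status & STATUS_READY) != 0 && (r->status & STATUS_ERROR) == 0;
}
```

In this model, a device whose status register shows the error bit becomes ready again once `CMD_RESET` is written, while an unrecognized command leaves the status untouched.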
The system-level operation of the above components is performed by an operating system that includes a kernel. The kernel provides basic services to the operating system. In general, a kernel (or any comparable center of an operating system) includes: an interrupt handler, which handles all requests or completed I/O operations that compete for the kernel's services; a scheduler, which determines which programs share the kernel's processing time and in what order; and a supervisor, which actually gives use of the computer to each process when it is scheduled.

In conventional operating systems, the kernel is required to handle I/O device errors. The numerous error handling routines, and the calls generated by those routines, are complex and extensive. To handle the individual errors of all I/O devices, the kernel itself becomes complex. This complexity tends to lengthen the kernel's development time, and it makes debugging and maintaining the kernel obscure and difficult.

[Summary of the Invention]

The kernel of the present invention includes a simplified routine that recovers from I/O device errors by using existing driver functions. In other words, extensive and complex routines for handling individual errors are not needed in the kernel of the present invention. This simplification is possible because most errors, if not all, can be corrected simply by turning the I/O device off and back on. Correctable errors are corrected by giving the kernel a routine that power-cycles the device (i.e., turns it off and on) or simulates such an event.
Examples of correctable errors include most hardware failures and almost all failures that can be resolved entirely in software.

One example of an existing driver function is the hot-plug feature, which allows a driver to be attached to a newly available device and removed when a device becomes unavailable. By using the hot-plug feature, the kernel can isolate a problematic I/O device. After the device has been isolated, the kernel can simulate a hot-plug event (e.g., by power-cycling the device or by simulating its unplugging and plugging). Then, if the error has been corrected by the initialization routine of the I/O device, the kernel can bring the device back online. Because the initialization routine and the like are already part of the I/O driver, the kernel is not required to contain error handling routines of its own.

[Embodiments]

The present invention provides a mechanism for handling errors in computer I/O devices. With it, many errors, including atypical and/or hard-to-diagnose errors, can be handled in a simplified manner. The following description covers a discussion of errors, error detection, the hot-plug feature, the use of the hot-plug feature in handling errors, and an example of the invention using the PCI bus interface.

Many kinds of errors can occur in an I/O device or interface. For example, a network card may stop copying data from the network into the computer's memory. The card still functions as a network card, but in practice it can no longer place data anywhere in the computer.

In handling such an error, the first step is to detect it.
A conventional routine that detects certain errors at run time, taken from the Linux™ tc_35815 driver, may look as follows:

    /* put the freed buffer back into the controller */
    int bdctl = le32_to_cpu(lp->rfd_cur->bd[bd_count - 1].BDCtl);
    unsigned char id =
        (bdctl & BD_RxBDID_MASK) >> BD_RxBDID_SHIFT;
    if (id >= RX_BUF_PAGES) {
        printk("%s invalid BDID.\n", dev->name);
        panic_queues(dev);
    }

In this example, the "panic_queues" function halts operation of "dev" (the I/O device) because the kernel cannot handle the error. Even where the kernel can handle an error, the recovery routine required may be complex. In a conventional computer, the kernel is required to detect and handle this error and every other error, or else trigger a panic as shown above.

In the present invention, the routine in the above example can be modified to stop this processing and call a simplified error handling routine instead. This requires less code than the "panic_queues" routine, and the simplified routine can also be shared among different I/O drivers.

The simplified routine is described below in conjunction with the hot-plug feature. Hot-plug features are available in many standard operating systems. For example, the hot-plug feature became a standard feature of Linux™ starting with kernel 2.4 (January 2001). Initially, the hot-plug feature included support for USB and PCI (Cardbus) devices. Later versions also include IEEE 1394 (FireWire/i.Link) support. On mainframes, S/390 channel devices use the hot-plug feature, for example to report device attachment and other state-change events. Hot-plug features are also available for Windows, Mac OS, and other operating systems.
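Returning to the detection example, the modification described above (calling a shared, simplified routine instead of panicking) might be sketched in user space as follows. This is an illustration under assumptions, not the driver's actual code: `net_device_sketch`, `device_failed`, `check_buffer_id`, and the value chosen for `RX_BUF_PAGES` are all hypothetical.

```c
#include <stdio.h>

#define RX_BUF_PAGES 4   /* illustrative value, not taken from the driver */

/* Hypothetical, minimal stand-in for a network device structure. */
struct net_device_sketch {
    const char *name;
    int recovery_requested;   /* set once the shared routine has been asked to run */
};

/* Hypothetical shared entry point that any driver could call. A real kernel
 * would isolate the device and schedule the restart here. */
static void device_failed(struct net_device_sketch *dev)
{
    dev->recovery_requested = 1;
}

/* Same check as the conventional routine, but hands the problem to the
 * shared routine. Returns 0 if the id is valid, -1 if recovery was requested. */
static int check_buffer_id(struct net_device_sketch *dev, unsigned int id)
{
    if (id >= RX_BUF_PAGES) {
        fprintf(stderr, "%s invalid BDID.\n", dev->name);
        device_failed(dev);   /* instead of panic_queues(dev) */
        return -1;
    }
    return 0;
}
```

The driver keeps only the detection logic; everything after detection lives in one routine that other drivers can reuse.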
In particular, the hot-plug feature allows users to plug in new devices and use them immediately, without turning the computer off and on. Because of this feature, users are not required to learn system administration in depth. Instead, I/O devices configure themselves at least partly automatically, in particular by using the insertion/removal handlers provided by the hot-plug feature.

The following is an example of pseudocode for the insertion/removal handlers in the generic hot-plug feature of Linux™:

    an interrupt or another event tells the OS that the hardware has changed
    the OS discovers what has changed (hardware-specific task)
    if a device was removed
        call the device's "remove one" handler
        destroy the generic data structures for the device
    else
        read the new hardware
        create the generic data structures that represent it
        make the device available, allocate resources, etc.
        find a driver that can handle it
        call the driver's "init one" handler
    done

As explained above, the hot-plug feature includes several routines to initialize, remove, and turn I/O devices off and on. In other words, the hot-plug feature provides basic services that include the software code for handling errors during initialization, removal, and power on/off of I/O devices. Because of these basic services, the kernel does not need routines of its own for errors that are handled through the hot-plug feature.

Instead of having all the different error handling code in the kernel (as is traditionally done), the kernel can simulate hot-plug events and make use of the existing basic services. In particular, when the kernel simulates turning an I/O device off and on, the kernel does not need to contain code for initialization and error handling. In the present invention, error handling would:

    call the device's "remove one" handler
    [power off and power on]
    call the device's "init one" handler

The above pseudocode is shown in more detail in FIG. 3. As shown there, when an error is detected, it is communicated to the kernel (step 301). The kernel then isolates the I/O device from the I/O bus (step 303). This isolation step may include, among other things: disabling data I/O, disabling the I/O device's ability to write/read RAM, and/or disabling control of the I/O device. Once the I/O device is safely isolated, the kernel simulates removal of the I/O device from the I/O bus (step 305).
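The reuse of the driver's own handlers can be sketched as a small user-space model. Everything here is hypothetical (the `driver_ops` table, the `device_sketch` fields, and the assumption that a power cycle clears the fault); it only illustrates the remove, power-cycle, and init sequence described above and is not kernel code.

```c
#include <stdbool.h>

struct device_sketch;

/* Hypothetical driver interface: the same handlers a hot-plug layer would call. */
struct driver_ops {
    int  (*init_one)(struct device_sketch *dev);
    void (*remove_one)(struct device_sketch *dev);
};

struct device_sketch {
    const struct driver_ops *ops;
    bool isolated;   /* cut off from the bus */
    bool powered;
    bool bound;      /* driver currently attached */
    bool faulty;     /* cleared by a power cycle in this model */
};

/* Generic recovery: no device-specific error code, only the existing handlers.
 * Returns 0 if the device came back online, -1 otherwise. */
static int recover_device(struct device_sketch *dev)
{
    dev->isolated = true;          /* isolate the device from the bus */
    dev->ops->remove_one(dev);     /* simulated removal */
    dev->powered = false;          /* power off ... */
    dev->faulty = false;           /* model assumption: the fault is transient */
    dev->powered = true;           /* ... and back on */
    if (dev->ops->init_one(dev) != 0)
        return -1;                 /* driver declined the device */
    dev->isolated = false;         /* back online */
    return 0;
}

/* Example handlers a driver might already provide. */
static int sketch_init(struct device_sketch *dev)
{
    if (!dev->powered || dev->faulty)
        return -1;                 /* refuse to claim a broken device */
    dev->bound = true;
    return 0;
}

static void sketch_remove(struct device_sketch *dev)
{
    dev->bound = false;
}

static const struct driver_ops sketch_ops = { sketch_init, sketch_remove };
```

The point of the design is that `recover_device` knows nothing about the device; whether the restart succeeds is decided entirely by the driver's existing initialization handler.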
The I/O device can then be brought back online in several different ways (e.g., by simulating insertion of the I/O device, step 307). For example, the I/O device may be brought back online after waiting a certain period (e.g., one tenth of a second). In another example, the I/O device may be brought back online when more time or resources are available. For instance, if the I/O device is controlling a precision drill, the kernel may wait until the drill has completed its current task before reinitializing the device. In yet another example, the I/O device may be brought back online when it is next called. For instance, if a disk drive fails, it may be brought back online when a read/write command to the disk is received.

As described above, correctable errors can be corrected. For example, when the hardware and the software become confused about each other's state, the error can be remedied by turning the device off and on.

Where an uncorrectable error exists, the I/O device may fall into a loop in which it keeps trying to restart, with a delay between attempts. Examples of uncorrectable errors are a disconnected cable, an overheated hardware component, etc. In such cases, the initialization routine finds that the device is faulty and declines to claim it. For example, the token ring driver for the IBM Lanstreamer uses the following code:

    writew(readw(streamer_mmio + LAPWWO) + 6, streamer_mmio + LAPA);
    if (readw(streamer_mmio + LAPD)) {
        printk(KERN_INFO "tokenring card intialization failed : %d\n",
               ntohs(readw(streamer_mmio + LAPD)));
        release_region(dev->base_addr, STREAMER_IO_SPACE);
        /* do not claim this device */
        return -1;
    }

A more detailed example of the simplified error handling routine, for the PCI bus, is described below. As for the basic features of the PCI bus, it has several operating states controlled by the operating system. These states are designated "D0", "D1", "D2", and "D3". Conventionally, D0 is the normal operating state and D3 is a powered-off state. In the D3 state, the I/O device connected to the bus is turned off, except for the functions needed to physically turn the I/O device off and on.

The PCI bus also includes a master flag, which allows the kernel to prevent an I/O device connected to the bus from communicating with the CPU. By clearing the master flag, the PCI bus keeps such devices from corrupting other parts of the computer. When the master flag is cleared, the device is not permitted to write to the computer's memory.

Referring to FIG. 4, the simplified error handling routine of the invention is described using the PCI bus features above. When a driver detects an error, it notifies the kernel. The kernel isolates the I/O device by clearing the master flag. After the master flag has been cleared, the kernel simulates a hot-plug notification to the driver indicating that the device has been removed (step 401), and then puts the I/O device into a powered-off state (step 403). For the PCI bus, this is the state referred to in this specification as 'D3'. At this point, the I/O device is physically turned off by the kernel (step 405). On the PCI bus, the kernel is designed to control the power on/off state of I/O devices.
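The effect of the master flag and the D-states described above can be illustrated with a small model. The structure and function names below are hypothetical; the only value taken from the PCI specification is the position of the Bus Master Enable bit (bit 2, mask 0x0004) in the command register. A real kernel would use its configuration-space accessors rather than an in-memory struct.

```c
#include <stdint.h>

#define CMD_BUS_MASTER 0x0004u   /* Bus Master Enable bit of the PCI command register */

enum power_state { STATE_D0 = 0, STATE_D1, STATE_D2, STATE_D3 };

/* Hypothetical in-memory stand-in for one device's configuration state. */
struct pci_function_sketch {
    uint16_t command;        /* copy of the command register */
    enum power_state state;  /* current D-state */
};

/* Isolate the device: clear only the master bit, leave other bits alone. */
static void isolate(struct pci_function_sketch *f)
{
    f->command &= (uint16_t)~CMD_BUS_MASTER;
}

/* In this model a device can write to memory only when it is fully powered
 * (D0) and still allowed to act as a bus master. */
static int may_write_memory(const struct pci_function_sketch *f)
{
    return f->state == STATE_D0 && (f->command & CMD_BUS_MASTER) != 0;
}
```

Clearing the bit first means the device is already harmless to memory before the simulated removal and the transition to D3 take place.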
In other embodiments, if the kernel cannot control the power state, as with USB, the kernel is designed only to simulate turning the I/O device off and on.

After a certain period (e.g., one tenth of a second), the I/O device can be brought back to normal operation. For the PCI bus, this is the state called 'D0' (step 407). The kernel then notifies the hot-plug layer that the device has been inserted (i.e., it simulates a hot-plug insertion) (step 409). Depending on the severity of the error, the error may be corrected by the initialization routine the next time the device is called or when resources become available.

The following is a set of example code implementing this embodiment using the PCI bus in the Linux™ kernel. Error handling code has been removed here for clarity.

    void pci_device_failed(struct pci_dev *pdev)
    {
        u8 flag;

        /* clear the master flag */
        pci_read_config_byte(pdev, PCI_COMMAND, &flag);
        flag &= ~PCI_COMMAND_MASTER;
        /* isolate the device */
        pci_write_config_byte(pdev, PCI_COMMAND, flag);
        printk(KERN_WARNING "Device %s has failed and is being restarted.\n",
               pdev->slot_name);
        INIT_WORK(&pdev->recovery, pci_recover_device, pdev);
        schedule_work(&pdev->recovery);
    }

    /* called as soon as convenient, as requested by schedule_work */
    void pci_recover_device(void *arg)
    {
        struct pci_dev *pdev = arg;

        /* simulate removal of the hardware */
        pci_remove_device(pdev);
        /* put the device into the D3 (off) state */
        pci_set_state(pdev, 3);
        /* keep the power off briefly */
        set_current_state(TASK_UNINTERRUPTIBLE);
        schedule_timeout(HZ/10);
        /* put the device into the D0 (on) state */
        pci_set_state(pdev, 0);
        /* simulate hot insertion of the hardware */
        pci_insert_device(pdev);
        printk(KERN_WARNING "Device %s has been restarted.\n",
               pdev->slot_name);
    }

The above routine resembles physically turning the I/O device off and on. As noted above, turning the device off and on makes use of the existing initialization routines, which are usually standardized and well tested. Such initialization routines generally contain the logic to check whether the I/O device is functioning properly (e.g., configuring the I/O device, testing connections, testing power, etc.).

Although the above example uses the PCI bus, the invention applies directly to any I/O bus for which the operating system has an interface for handling dynamic addition and removal of devices (e.g., hot-plugging) and for which the ability exists in software to reset or power-cycle the I/O device. More specifically, for I/O devices connected to a bus that allows the kernel to control turning the device off and on (e.g., PCI bus, Accelerated Graphics Port (AGP), etc.), the kernel may include code similar to the example given above. For I/O devices connected to a bus that allows the kernel to physically turn the device off and on (e.g., USB bus, Cardbus, PCMCIA, etc.), the kernel includes the code needed to physically turn the device off and on. In other words, as long as the bus architecture allows the power of the I/O devices connected to it to be turned off and on, either in simulation or physically, the invention is applicable to that bus.

Although examples of the invention have been shown and described, those skilled in the art will readily appreciate that various changes and modifications can be made without departing from the scope of the invention as defined by the following claims. The invention is applicable to any operating system (e.g., Linux™, Unix, Microsoft Windows, MacOS, etc.), as long as it has a hot-plug interface or the like. Accordingly, the invention is limited only by the following claims and their equivalents.

[Brief Description of the Drawings]

FIG. 1 is a schematic diagram of a conventional input/output (I/O) architecture;
FIG. 2 is a schematic diagram of conventional I/O ports;
FIG. 3 is a flowchart illustrating the high-level steps of handling an error according to an embodiment of the invention; and
FIG. 4 is a flowchart illustrating an example of detailed error handling steps according to an embodiment of the invention.

[Description of Reference Numerals]

    • 101: CPU
    • 103: I/O bus
    • 105: I/O device
    • 107: I/O port
    • 109: device controller
    • 201: control register
    • 203: status register
    • 205: input register
    • 207: output register
    • 209: I/O interface
    • 301: detect an error
    • 303: isolate the device from the bus
    • 305: simulate removal of the device
    • 307: simulate insertion of the device
    • 309: back online
    • 401: simulate removal of the device
    • 403: put the device in the OFF state (D3)
    • 405: keep the power off briefly
    • 407: put the device in the ON state (D0)
    • 409: simulate insertion of the device

Claims (20)

1. A method of correcting an error of a device in a computer during run time without rebooting the computer, comprising:
simulating removal of the device from the computer;
automatically turning off the device;
automatically turning on the device; and
simulating insertion of the device, wherein a pre-existing initialization routine corrects the error.

2. The method of claim 1, further comprising isolating the device from its I/O bus.

3. The method of claim 2, wherein isolating the device comprises at least one of: disabling data I/O, disabling memory, and disabling bus mastering.

4. The method of claim 1, wherein automatically turning off the device comprises placing the device in a D3 state of a PCI interface.

5. The method of claim 1, wherein automatically turning on the device comprises placing the device in a D0 state of a PCI interface.

6. The method of claim 1, further comprising keeping the device turned off for a predetermined time.

7. The method of claim 5, wherein the predetermined time is substantially equal to one tenth of a second.

8. The method of claim 1, wherein the pre-existing initialization routine is designed to correct the error when computing resources become available.

9. The method of claim 1, wherein the pre-existing initialization routine is designed to correct the error when the device is called.

10. A system for correcting an error of a device in a computer during run time without rebooting the computer, comprising:
means for simulating removal of the device from the computer; and
means for simulating insertion of the device, wherein a pre-existing initialization routine corrects the error.

11. The system of claim 10, further comprising isolating the device from its I/O bus.

12. The system of claim 11, wherein isolating the device comprises at least one of: disabling data I/O, disabling memory, and disabling bus mastering.

13. The system of claim 10, further comprising means for automatically turning off the device and means for automatically turning on the device.

14. The system of claim 13, further comprising keeping the device turned off for a predetermined time.

15. The system of claim 14, wherein the predetermined time is substantially equal to one tenth of a second.

16. The system of claim 10, wherein the pre-existing initialization routine is designed to correct the error when computing resources become available.

17. The system of claim 10, wherein the pre-existing initialization routine is designed to correct the error when the device is called.

18. A computer program product residing on a computer-readable medium for handling an error of a device, the computer program product comprising instructions for causing a computer to perform the following steps:
simulating removal of the device from the computer;
automatically turning off the device;
automatically turning on the device; and
simulating insertion of the device, wherein a pre-existing initialization routine corrects the error.

19. The computer program product of claim 18, wherein the pre-existing initialization routine is designed to correct the error when computing resources become available.

20. The computer program product of claim 19, wherein the pre-existing initialization routine is designed to correct the error when the device is called.
-19 - This paper size applies to National Standard (CNS) A4 specification (21〇x297 mm)
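
The sequence recited in the method claims (simulated removal, optional isolation from the I/O bus, power-off to D3, a hold of about one tenth of a second, power-on to D0, then simulated insertion that triggers the existing initialization routine) can be illustrated with a small simulation. This is only a sketch under stated assumptions: `SimulatedDevice`, `recover`, and every helper name below are hypothetical and do not come from the patent or from any real driver API, and the device is modeled purely in memory rather than through actual PCI configuration accesses.

```python
import time
from dataclasses import dataclass, field
from enum import Enum


class PowerState(Enum):
    D0 = "D0"  # fully powered on
    D3 = "D3"  # powered off


@dataclass
class SimulatedDevice:
    """Hypothetical in-memory stand-in for a device on an I/O bus."""
    name: str
    power: PowerState = PowerState.D0
    io_enabled: bool = True
    memory_enabled: bool = True
    bus_master_enabled: bool = True
    bound: bool = True    # True while the device is presented to the system
    error: bool = False   # True while the device is in a faulted state
    log: list = field(default_factory=list)


def simulate_removal(dev):
    # Tell the system the device is gone, as if it had been unplugged.
    dev.bound = False
    dev.log.append("remove")


def isolate_from_bus(dev):
    # Disable data I/O, memory decoding and bus mastering.
    dev.io_enabled = dev.memory_enabled = dev.bus_master_enabled = False
    dev.log.append("isolate")


def power_off(dev):
    dev.power = PowerState.D3
    dev.log.append("D3")


def power_on(dev):
    dev.power = PowerState.D0
    dev.log.append("D0")


def initialize(dev):
    # Stand-in for the pre-existing initialization routine: the same code
    # that would run on a genuine hot-plug insertion.
    dev.io_enabled = dev.memory_enabled = dev.bus_master_enabled = True
    dev.error = False
    dev.log.append("init")


def simulate_insertion(dev, init_routine=initialize):
    # Present the device again; initialization clears the error.
    init_routine(dev)
    dev.bound = True
    dev.log.append("insert")


def recover(dev, off_time=0.1, sleep=time.sleep):
    """Run the removal / power-cycle / insertion sequence without a reboot."""
    simulate_removal(dev)
    isolate_from_bus(dev)
    power_off(dev)
    sleep(off_time)  # keep the device off for the predetermined time
    power_on(dev)
    simulate_insertion(dev)
    return dev


nic = SimulatedDevice("nic0", error=True)
recover(nic)
print(nic.error, nic.power.value, nic.log)
# prints: False D0 ['remove', 'isolate', 'D3', 'D0', 'init', 'insert']
```

Passing `sleep` as a parameter is a design choice of this sketch so the hold time can be replaced in tests; on real hardware the power-state changes would go through the platform's PCI power-management interface rather than attribute assignments.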
TW093105026A 2004-01-20 2004-02-27 System, method and computer program product for correcting error of device connected to computer TWI280477B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2004/001286 WO2004068397A2 (en) 2003-01-22 2004-01-20 Hot plug interfaces and failure handling

Publications (2)

Publication Number Publication Date
TW200525343A TW200525343A (en) 2005-08-01
TWI280477B TWI280477B (en) 2007-05-01

Family

ID=38742511

Family Applications (1)

Application Number Title Priority Date Filing Date
TW093105026A TWI280477B (en) 2004-01-20 2004-02-27 System, method and computer program product for correcting error of device connected to computer

Country Status (1)

Country Link
TW (1) TWI280477B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI503673B (en) * 2008-05-14 2015-10-11 Ibm Computer system, method for initializing a computer system and computer program product

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101615152B (en) * 2009-07-13 2013-07-03 中兴通讯股份有限公司 Method and device for detecting hot plug fault of storage card

Also Published As

Publication number Publication date
TW200525343A (en) 2005-08-01
