EP0920661A1 - Slave dsp reboots stalled master cpu - Google Patents
- Publication number
- EP0920661A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- master
- slave
- processor
- stalled
- master processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/24—Resetting means
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
- Retry When Errors Occur (AREA)
- Hardware Redundancy (AREA)
- Power Sources (AREA)
- Multi Processors (AREA)
Abstract
A digital home entertainment system comprises one or more slave processors, e.g., DSPs, for processing specific tasks, and a master processor, e.g., a CPU, for control of the system. The slave processor is capable of rebooting the master processor if the master processor has stalled. This slave-controlled rebooting avoids manual cold rebooting of the system and is particularly advantageous in open-architecture multimedia systems with asynchronously cooperating components.
Description
SLAVE DSP REBOOTS STALLED MASTER CPU
The invention relates to an information processing system, in particular to a consumer electronics home entertainment system, with a slave processor for processing a task, and a master processor for control of the system, as defined in the precharacterizing part of claim 1.
A typical multi-processor system with a hierarchical organization comprises one or more slave processors, each processing a specific task, and a master processor driving one or more slave processors and controlling the system. For example, the system is a multimedia consumer apparatus or a digital entertainment system, wherein the master is a CPU and the slaves are DSPs. One of the slaves processes audio data, another slave processes video data.
Suppose that, in the multi-processor system, the master encounters an event that causes it to stop working. For example, the master has lost its stack, or its memory is completely filled. The result is that the entire system stops functioning and has to be rebooted, typically by hand. The power has to be turned off before the system can start afresh (i.e., a cold reboot). Accordingly, the complete state of the system may be lost. Fortunately, typical consumer apparatus of the type described, which are commercially available now or will be in the near future, are well protected against such disasters, since all processes active on the system have been thoroughly tested by the OEM during the system's development.
An object of the invention is therefore to provide an asynchronous multiprocessor system that limits the rather disastrous consequences of a stalled master or a "hung" system. It is another object to render a cold reboot power sequence unnecessary, which is of particular interest to apparatus meant for the consumer market. If anything goes wrong with the master, the consumer is the sole external agent available to jump-start the system. Most consumers are not, and should not need to be, particularly knowledgeable about the nuances of system architectures beyond where to put the power plug and how to operate the remote. It is clear that having to do a cold reboot now and again is not only particularly unattractive, but a major drawback.
Consumer apparatus have become increasingly sophisticated. Modular configurations and open architectures are believed to form the paradigm for such apparatus. The inventors have realized, however, that failure of the master may occur more frequently in such an architecture, typically when its components are cooperating asynchronously. The reason for this is the following. An open architecture system can be modified and extended at will. Future functionalities, presently unknown, or customized functionalities, will be added to the existing system as after-market add-ons. Proper functioning under each and every circumstance can no longer be guaranteed, simply because many of the possible processes could not have been contemplated in advance by the manufacturer, let alone tested in the development phase.
To this end, the invention provides an information processing system with a slave processor for processing a task, and a master processor for control of the system. The slave processor is operative to reboot the master processor if the master processor has stalled. The slave detects that the master has stalled by monitoring the master's heart beat. The automatic reboot performed by the slave allows the temporarily inactivated master to start working again without intervention of the user. Preferably, periodic state savings are performed, allowing the system to pick up at the last known valid checkpoint. Since almost all of the stalls occur as a consequence of mutually interfering events in the asynchronous system, restarting the master at the last known valid checkpoint solves the problem in most cases. The system according to the invention is particularly useful in a consumer apparatus, wherein the slave's DSP is capable of rebooting the master's CPU without user intervention.
These and other aspects of the invention will be explained in further detail by way of example and with reference to the accompanying drawing.
Fig. 1 is a block diagram of a system according to the invention.
Fig. 1 is a block diagram of a multiprocessor system 100 according to the invention. System 100 has a master processor 102 and one or more slave processors 104 and 106. Master 102 drives the entire system 100. Slaves 104 and 106 each process a respective one of specific tasks under control of master 102. The system is, for example, a multimedia system with an open architecture, wherein master 102 is a CPU and slaves 104 and 106 are DSPs (digital signal processors). Slaves 104 and 106 may communicate with each other.
Master 102 has program memory 108. Slave 104 has program memory 110, and slave 106 has program memory 112.
Master 102 sends a data stream 118 to slave 104 in which a special command occurs periodically, the sole purpose of which is to notify slave 104 that master 102 is still running. The special command is commonly referred to as a "heart beat". Typically, a heart beat is sent once every second. Slave 104 has a fail-safe timer 114. Upon receipt of a heart beat, slave 104 resets its timer 114. Timer 114 expires after, say, 2 seconds, which is substantially longer than the time period between two successive heart beats. When master 102 stalls, slave 104 no longer receives the heart beat, and timer 114 expires. This confirms that master 102 has become inert. Slave 104 then resets master 102.
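The heart-beat supervision described above can be illustrated with a minimal Python sketch. It is not taken from the application: the class names `FailSafeTimer` and `Slave`, the `simulate` helper, and the 0.5 second polling step are assumptions made purely for illustration, while the 1 second heart-beat period and the 2 second timeout follow the example values in the text.

```python
class FailSafeTimer:
    """Illustrative stand-in for timer 114: expires unless reset in time."""

    def __init__(self, timeout_s, now=0.0):
        self.timeout_s = timeout_s
        self.last_reset = now

    def reset(self, now):
        self.last_reset = now

    def expired(self, now):
        return now - self.last_reset >= self.timeout_s


class Slave:
    """Hypothetical slave that resets the master when the timer expires."""

    def __init__(self, reset_master, timeout_s=2.0, now=0.0):
        self.timer = FailSafeTimer(timeout_s, now)
        self.reset_master = reset_master  # callback, called with the current time

    def on_heart_beat(self, now):
        # A heart beat proves the master is alive, so the timer is re-armed.
        self.timer.reset(now)

    def poll(self, now):
        # No heart beat within the timeout: treat the master as stalled.
        if self.timer.expired(now):
            self.reset_master(now)
            self.timer.reset(now)  # give the rebooting master a fresh interval


def simulate(last_heart_beat_s, duration_s):
    """Master beats once per second up to last_heart_beat_s, then stalls."""
    reset_times = []
    slave = Slave(reset_master=reset_times.append)
    for tick in range(2 * duration_s + 1):  # one tick = 0.5 s (assumed poll step)
        now = tick / 2
        if tick % 2 == 0 and now <= last_heart_beat_s:
            slave.on_heart_beat(now)
        slave.poll(now)
    return reset_times


print(simulate(last_heart_beat_s=3, duration_s=6))  # [5.0]
```

In this simulation the last heart beat arrives at t = 3 s, so the timer expires and the reset is issued at t = 5 s. A master that keeps beating is never reset.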
In a first embodiment, the reset brings master 102 back to the very beginning of the program(s) it was supposed to execute, and master 102 starts anew from ground zero. In a second embodiment, master 102 is coupled to a checkpoint memory 116, typically a magnetic disk. Prior to the failure, master 102 has been updating checkpoint memory 116 every N heart beats with the system state. For example, every 10 heart beats the contents of the registers (not shown) of master 102, including the I/O registers and control registers, and the contents of memory 108 are stored in memory 116. Memory 116 thus periodically stores a snapshot of the system, i.e., all information that unambiguously defines the state of system 100 and is necessary to restore it. Now, when slave 104 notes that master 102 has stalled, slave 104 sends a reset to master 102. When master 102 starts to reboot, it sends a response to slave 104 indicating that it has rebooted. This response enables slave 104 to issue a command to master 102 to fetch the last valid state recorded in memory 116, to reload the registers and memory 108 with this valid state, and to start executing the program code from there on.
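The checkpointing of the second embodiment can be sketched as follows. This is only an illustration under stated assumptions: `CheckpointMemory`, `Master`, the dictionary used as "system state", and the `pc` register are hypothetical names, and an in-memory copy stands in for the magnetic disk mentioned in the text. Only the save-every-N-heart-beats rule and the restore-after-reset step come from the description.

```python
import copy


class CheckpointMemory:
    """Stand-in for checkpoint memory 116; keeps only the last valid snapshot."""

    def __init__(self):
        self._snapshot = None

    def save(self, state):
        # Deep copy so later changes to the live state cannot alter the snapshot.
        self._snapshot = copy.deepcopy(state)

    def load(self):
        return copy.deepcopy(self._snapshot)


class Master:
    """Hypothetical master that checkpoints its state every N heart beats."""

    def __init__(self, checkpoint, every_n=10):
        self.checkpoint = checkpoint
        self.every_n = every_n
        self.reset()

    def reset(self):
        # First embodiment: start anew from ground zero.
        self.heart_beats = 0
        self.state = {"registers": {"pc": 0}, "memory": []}

    def heart_beat(self):
        # One unit of (illustrative) work per heart beat.
        self.state["registers"]["pc"] += 1
        self.state["memory"].append(self.state["registers"]["pc"])
        self.heart_beats += 1
        if self.heart_beats % self.every_n == 0:
            self.checkpoint.save(self.state)

    def restore(self):
        # Second embodiment: reload the last valid state, if one was saved.
        saved = self.checkpoint.load()
        if saved is not None:
            self.state = saved


memory_116 = CheckpointMemory()
master = Master(memory_116, every_n=10)
for _ in range(25):  # checkpoints are taken at heart beats 10 and 20
    master.heart_beat()

master.reset()  # reset issued by the slave after the stall
pc_after_reset = master.state["registers"]["pc"]
master.restore()  # restore command issued after the reboot response
pc_after_restore = master.state["registers"]["pc"]
print(pc_after_reset, pc_after_restore)  # 0 20
```

In this sketch the work done in heart beats 21 to 25 is lost, which reflects the trade-off in choosing N: with one heart beat per second and N = 10, the restored state can be up to roughly ten seconds old.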
Slave 104 can notify the user, for example, via a short message on the system's display (not shown), that a problem occurred, has been solved, and that system operation has been resumed.
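The slave-side recovery sequence described above (reset, reboot response, restore command, user notification) can be viewed as a small state machine. The sketch below is an assumption-laden illustration: the message strings `"RESET"`, `"REBOOTED"` and `"RESTORE_LAST_CHECKPOINT"`, the `RecoverySlave` class, and the callback-based transport are all hypothetical, since the application does not specify a message format. The sketch also notifies the user as soon as the restore command is issued, whereas a real system might wait for confirmation that the state was restored.

```python
from enum import Enum, auto


class SlaveState(Enum):
    MONITORING = auto()       # normal operation, watching for heart beats
    AWAITING_REBOOT = auto()  # reset sent, waiting for the master's response


class RecoverySlave:
    """Hypothetical model of slave 104 during recovery of a stalled master."""

    def __init__(self, send_to_master, display):
        self.state = SlaveState.MONITORING
        self.send_to_master = send_to_master  # callback carrying commands to the master
        self.display = display                # callback showing a message to the user

    def on_timer_expired(self):
        # The fail-safe timer ran out: the master is considered stalled.
        if self.state is SlaveState.MONITORING:
            self.send_to_master("RESET")
            self.state = SlaveState.AWAITING_REBOOT

    def on_master_message(self, message):
        # Only a reboot response received after a reset triggers the restore.
        if self.state is SlaveState.AWAITING_REBOOT and message == "REBOOTED":
            self.send_to_master("RESTORE_LAST_CHECKPOINT")
            self.display("A problem occurred and was solved; operation has resumed.")
            self.state = SlaveState.MONITORING


sent, shown = [], []
slave = RecoverySlave(send_to_master=sent.append, display=shown.append)
slave.on_master_message("REBOOTED")  # ignored: no reset is pending
slave.on_timer_expired()             # stall detected, reset issued
slave.on_master_message("REBOOTED")  # master responds, restore is commanded
print(sent)  # ['RESET', 'RESTORE_LAST_CHECKPOINT']
```

Keeping the two states explicit prevents a stray reboot response from triggering a restore while the master is running normally.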
Claims
1. An information processing system with a slave processor for processing a task, and a master processor for control of the system, wherein the slave processor is operative to reboot the master processor if the master processor has stalled.
2. The system of claim 1, wherein: - the master processor periodically sends a heart beat to the slave processor;
- the slave processor has a timer with a maximum timing interval substantially longer than a time period between two successive heart beats;
- the slave processor resets the timer upon receipt of the heart beat; and
- the slave processor starts rebooting the master upon expiry of the maximum timing interval.
3. The system of claim 2, wherein:
- the system comprises a checkpoint memory for periodically saving a valid current state of the system;
- upon detection of the master processor having stalled, the slave processor issues a command to the master processor to re-establish the last valid state of the system as saved in the checkpoint memory.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US880387 | 1986-06-30 | ||
US88038797A | 1997-06-23 | 1997-06-23 | |
PCT/IB1998/000332 WO1998059288A1 (en) | 1997-06-23 | 1998-03-12 | Slave dsp reboots stalled master cpu |
Publications (1)
Publication Number | Publication Date |
---|---|
EP0920661A1 (en) | 1999-06-09 |
Family
ID=25376154
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP98905551A Withdrawn EP0920661A1 (en) | 1997-06-23 | 1998-03-12 | Slave dsp reboots stalled master cpu |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP0920661A1 (en) |
JP (1) | JP2000516745A (en) |
KR (1) | KR100518478B1 (en) |
WO (1) | WO1998059288A1 (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7428655B2 (en) | 2004-09-08 | 2008-09-23 | Hewlett-Packard Development Company, L.P. | Smart card for high-availability clustering |
JP2007034479A (en) * | 2005-07-25 | 2007-02-08 | Nec Corp | Operation system device, standby system device, operation/standby system, operation system control method, standby system control method, and operation system/standby system control method |
US7917812B2 (en) | 2006-09-30 | 2011-03-29 | Codman Neuro Sciences Sárl | Resetting of multiple processors in an electronic device |
KR101132389B1 (en) * | 2007-04-09 | 2012-04-03 | 엘지엔시스(주) | Apparatus and method of structuralizing checkpoint memory based dispersion data structure |
KR100928187B1 (en) * | 2007-11-30 | 2009-11-25 | 한국전기연구원 | Fault-safe structure of dual processor control unit |
US8117494B2 (en) * | 2009-12-22 | 2012-02-14 | Intel Corporation | DMI redundancy in multiple processor computer systems |
JP5494134B2 (en) * | 2010-03-31 | 2014-05-14 | 株式会社リコー | Control apparatus and control method |
KR102031576B1 (en) * | 2019-05-21 | 2019-10-14 | 주식회사 우리기술 | A controller of a distributed control system having an abnormal task monitoring function |
KR102220389B1 (en) * | 2019-11-28 | 2021-02-24 | 주식회사 한화 | Apparatus and method for performing real-time synchronization using fpga |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04240946A (en) * | 1991-01-25 | 1992-08-28 | Nec Eng Ltd | Data communication system |
JPH09128269A (en) * | 1995-10-31 | 1997-05-16 | Fujitsu Ltd | Abnormality display system |
-
1998
- 1998-03-12 EP EP98905551A patent/EP0920661A1/en not_active Withdrawn
- 1998-03-12 JP JP10529327A patent/JP2000516745A/en not_active Ceased
- 1998-03-12 KR KR10-1999-7001430A patent/KR100518478B1/en not_active IP Right Cessation
- 1998-03-12 WO PCT/IB1998/000332 patent/WO1998059288A1/en not_active Application Discontinuation
Non-Patent Citations (1)
Title |
---|
See references of WO9859288A1 * |
Also Published As
Publication number | Publication date |
---|---|
KR100518478B1 (en) | 2005-10-05 |
KR20000068286A (en) | 2000-11-25 |
WO1998059288A1 (en) | 1998-12-30 |
JP2000516745A (en) | 2000-12-12 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| | AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): DE FR GB |
| | 17P | Request for examination filed | Effective date: 19990630 |
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN |
| | 18W | Application withdrawn | Effective date: 20070801 |