JP2002229811A - Control method of logical partition system - Google Patents

Control method of logical partition system

Info

Publication number
JP2002229811A
Authority
JP
Japan
Prior art keywords
processor
failure
stop
partition
bus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2001027757A
Other languages
Japanese (ja)
Other versions
JP3746957B2 (en)
Inventor
Takeshi Koike
毅 小池
Original Assignee
Nec Eng Ltd
日本電気エンジニアリング株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Abstract

(57) [Summary] [PROBLEM] Under logical partitioning, it is desirable that a failure in one logical partition spread to other partitions as little as possible. However, if the method of connecting processors with failure / stop signal lines is applied, as many failure / stop signal lines as the maximum number of partitions are required. [SOLUTION] A partition number storage register 201 is provided in place of the failure / stop signal line 102. When a failure occurs in the processor 210 and the operating system running on partition 1 needs to be shut down as a system, the processor 210 issues a fault / stop instruction for partition 1, to which the own processor belongs.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to fault processing for an information processing apparatus, and more particularly to a method for controlling a logical partitioning system in a tightly coupled connection.

[0002]

2. Description of the Related Art When an information processing apparatus fails, it may be necessary to stop system operation, depending on the severity of the failure. There are two main factors that halt system operation. One is a software factor: in many cases, an abnormality is detected in a specific process or job, and if that process or job is core to the system, the system must be stopped. Since such a failure is not caused by a hardware fault, it can be remedied by reviewing the cause from a software viewpoint and restarting the system.

The other is a hardware factor: a fault occurs in a processor or other device that is core to the system, and the system is restarted after the faulty device is repaired or the faulty part is taken out of service (degraded).

A large-scale information processing apparatus such as a general-purpose computer is provided with a log function as a countermeasure against the software / hardware failures described above, so that the content of a failure can be analyzed easily after it occurs and the suspected component can be identified. The log function for hardware failures includes a register dump function and an execution trace function.

It is desirable that such hardware logs be collected as soon as possible after a failure occurs. The reason is that, as time elapses from the point of failure, the state captured in the register dump information and the trace information keeps changing and the log loses accuracy. The simplest and most effective method for transitioning the information processing device to the failure / stop state is to connect the processors to be logged with a hard-wired failure / stop signal line. When any of the processors detects an abnormality, it activates this signal line and the entire information processing apparatus shifts to the failure / stop state.

With the development of multiprocessor technology in recent years, information processing devices have appeared that, even as tightly coupled devices, operate a system by connecting tens to hundreds of processors to a main memory. This type of information processing device employs a logical partitioning system that divides the processor group into as many groups as the business workload requires and runs one operating system on each processor group. PR/SM, a trademark of IBM (International Business Machines), is a well-known example of logical partitioning.

In a logical partitioning system, when the processors are connected by a single fault / stop signal line, logical partitions that are operating normally and do not need to be shut down are also shut down, depending on the degree of the failure. Therefore, to improve fault tolerance in an ordinary logical partitioning system, a plurality of information processing devices are arranged in a hot standby configuration, so that system operation continues on the standby system even if a failure spreads to the entire active system. However, with the spread of information technology, the tasks handled by computers have become diversified, and there is an increasing need to operate a plurality of operating systems even in a small system configured with a single information processing device. At the same time, there is an urgent need to improve the reliability of computers used as mission-critical servers, for example for 24-hour operation. In this sense, emphasis is being placed not merely on hot-standby configurations but on technology that enhances the reliability of the information processing device itself.

FIG. 2 is a block diagram illustrating a detailed configuration of a bus control unit according to the related art. The address data register 202 is connected to the address data bus 101 in the inter-processor bus 100, and exchanges data with the other processors and the main storage device 400. The command register 203 is connected to the address data bus 101 in the inter-processor bus 100, and stores the attribute of a main storage transaction when data is exchanged with another processor or the main storage device 400. The command decoder 204 is connected to the command register 203 and to some bit fields of the address data register 202, decodes a main storage transaction, and directs the various operations of the bus control unit according to read instructions, write instructions, and other control instructions. The load / store control circuit 206 is connected via a processor address data signal line 208 to the other functional units of the processor that perform operations and input / output, and performs prefetch control of instructions and data used by the processor, read / write order control, cache control, and the like. The interrupt control circuit 205 is connected to the fault / stop signal line 102 in the inter-processor bus 100, and activates the failure / stop signal line 102 when the own processor detects a failure. It also detects activation of the failure / stop signal line 102 when a failure occurs in another processor and shifts the own processor to the failure / stop state. The bus control unit and the other functional units of the processor stop functioning when the stop instruction signal line 207 is activated by the interrupt control circuit 205.

[0008]

As described above, it is desirable that a failure in each logical partition spread to other partitions as little as possible under logical partitioning. However, if the method of connecting processors with failure / stop signal lines is applied to the logical partitioning system, as many failure / stop signal lines as the maximum number of partitions are required. In recent years, as the degree of integration of LSIs has increased, the number of gates that can be incorporated in an LSI has been growing at an accelerating rate. In reality, however, the number of signal lines connecting an LSI to the board, or connecting processors to one another, cannot keep up with the growth in LSI logic because of physical mounting restrictions. From this point of view as well, it is necessary to reduce the number of signal lines that are unrelated to normal logic, such as the failure / stop signal lines.

An object of the present invention is to realize, in a system in which a plurality of processors are tightly coupled to a main storage device, the processor group is divided into a plurality of logical partitions in advance, and an operating system is allocated to and operated on each logical partition, control that selectively transitions only a specific logical partition to the failure / stop state when a failure occurs, without using a dedicated signal line.

[0010]

The control method of a logical partitioning system according to the present invention applies to a logical partitioning system in which a plurality of processors are tightly coupled to a main storage device, the processor group is divided into a plurality of logical partitions in advance, and an operating system is allocated to and operated on each logical partition. The method comprises: a step of receiving a transaction for the main storage device; a step of comparing the partition number to be stopped, carried in a failure stop command defined as one of the transactions, with the partition number held in advance by each processor as the partition to which it belongs; and a step of, when the comparison results match, processing the transaction up to the point where it does not affect processors operating normally in other partitions and then transitioning the processor to the failure / stop state.

Further, the control method of the logical partitioning system according to the present invention comprises a step of defining, as a value of the partition number to be stopped carried in the failure stop command, a broadcast partition number that designates all partitions, and a step in which each processor, upon receiving a failure stop command in which the broadcast partition number is set, immediately transitions itself to the failure stop state regardless of the bus operation state at that time.

[0012]

Next, embodiments of the present invention will be described in detail with reference to the drawings.

As shown in FIG. 1, in the present invention the arithmetic processors 200 to 240 and the input / output processors 300 to 320 are tightly coupled to the main storage device 400 via the inter-processor bus 100.

The arithmetic processors 200, 210, 220 and the input / output processor 300 constitute one processor group called partition 1, on which one operating system runs. The arithmetic processor 230 and the input / output processor 310 constitute one processor group, partition 2, which runs an operating system different from that of partition 1. Similarly, the arithmetic processor 240 and the input / output processor 320 constitute one processor group, partition 3, on which yet another operating system runs. The main storage device 400 is a partition-shared device that is referenced by all arithmetic processors and input / output processors in the information processing device.
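
The grouping described above can be written down as a small lookup table. The following is only an illustrative sketch in Python: the dictionary layout and the helper names `partition_of` and `members_of` are assumptions made for this example and do not appear in the patent, while the reference numerals are taken from the text.

```python
# Partition membership as described for FIG. 1 (reference numerals from the text).
# The data layout and function names are hypothetical, chosen for illustration.
PARTITIONS = {
    1: {"arithmetic": [200, 210, 220], "io": [300]},
    2: {"arithmetic": [230], "io": [310]},
    3: {"arithmetic": [240], "io": [320]},
}


def partition_of(processor: int) -> int:
    """Return the partition number a processor belongs to."""
    for number, members in PARTITIONS.items():
        if processor in members["arithmetic"] or processor in members["io"]:
            return number
    raise KeyError(f"processor {processor} is not assigned to any partition")


def members_of(partition: int) -> list[int]:
    """Return every processor (arithmetic and input/output) in a partition."""
    group = PARTITIONS[partition]
    return sorted(group["arithmetic"] + group["io"])
```

For example, `members_of(partition_of(210))` lists the processors that share a partition with the arithmetic processor 210.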

Next, the detailed configuration of the bus control unit incorporated in the arithmetic processors 200 to 240 and the input / output processors 300 to 320 will be described with reference to the drawings. Note that a description of portions performing the same processing as in the related art will be omitted.

The partition number storage register 201 is a register that holds the number of the partition to which each of the arithmetic processors 200 to 240 and the input / output processors 300 to 320 belongs. The command decoder 204 is connected to the command register 203 and to some bit fields of the address data register 202, decodes a main storage transaction, and directs the various operations of the bus control unit according to read commands, write commands, and other control commands. The load / store control circuit 206 is connected via a processor address data signal line 208 to the other functional units of the processor that perform calculations and input / output, and performs prefetch control of instructions and data used by the processor, read / write order control, cache control, and the like. The interrupt control circuit 205 performs control to shift the own processor to the failure / stop state in response to an instruction from the command decoder 204.

As an example, the operation will be described for the case where a fault is detected by the arithmetic processor 210 and the other arithmetic processors 200 and 220 to 240, the input / output processors 300 to 320, and the main storage device 400 are to be put into the fault / stop state.

In the case of an information processing apparatus that does not use the logical partitioning method, the fault / stop signal line 102 is used to shift the entire information processing apparatus to the fault / stop state by the simplest method.

In the prior art, the interrupt control circuit 205 of the arithmetic processor 210 that has detected a fault instructs an emergency stop of the information processing device using the fault / stop signal line 102, and the interrupt control circuits 205 of the other processors and the main storage device receive the signal. The basic operation was then to stop the processor functions using the stop instruction signal line 207.

However, when such control is performed under the logical partitioning method, then even if the fault propagation range of the arithmetic processor 210 is confined to partition 1, which consists of the arithmetic processors 200 to 220 and the input / output processor 300, the arithmetic processor 230 and the input / output processor 310 of partition 2 and the arithmetic processor 240 and the input / output processor 320 of partition 3, which do not need to be stopped, are stopped as well.

Therefore, in the present invention, a partition number storage register 201 is provided in place of the fault / stop signal line 102. When a failure occurs in the processor 210 and it becomes necessary to shut down the operating system running on partition 1 as a system, the processor 210 issues a fault / stop command for partition 1, to which the own processor belongs, via the inter-processor bus 100. The fault / stop command is defined as one of the main storage transactions, like the memory read command and the memory write command, and is transferred via the address data bus 101. The fault / stop command carries a partition number as an argument in part of the bit field of the address / data section. The processor 210, which needs to block partition 1, sets the command code corresponding to the fault / stop instruction in the command register 203 and the partition number corresponding to partition 1 in the address data register 202, and transmits the data to the address data bus 101 to notify all processors in partition 1 of the fault / stop command. The data transmitted to the inter-processor bus 100 also propagates to all other processors in the information processing device and to the main storage device 400.

In the arithmetic processor 200, which belongs to the same partition 1, the command is taken into the command register 203 built into the bus control unit and transferred to the command decoder 204. When the decoded main storage transaction is a failure / stop command, the command decoder 204 compares the part of the bit field of the address data bus 101 taken into the address data register 202 with the number of the partition to which the own processor belongs, stored in advance in the partition number storage register 201. If they match, it instructs the interrupt control circuit 205 to stop the function of the processor by means of the stop instruction signal line 207.
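
The receive-side check described above amounts to a decode followed by one comparison. The sketch below is a simplified software model, not the patented circuit: the class name `BusControlUnit`, the symbolic `FAULT_STOP` constant, and the boolean standing in for the stop instruction signal line 207 are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Symbolic stand-in for the fault/stop command code (hypothetical name).
FAULT_STOP = "fault_stop"


@dataclass
class BusControlUnit:
    partition_register: int          # models partition number storage register 201
    stop_line_active: bool = False   # models stop instruction signal line 207

    def on_transaction(self, command: str, target_partition: int) -> None:
        """Decode a bus transaction and stop only on a matching partition number."""
        if command != FAULT_STOP:
            return  # ordinary main storage transaction, not modelled here
        if target_partition == self.partition_register:
            self.stop_line_active = True
```

A unit created with `BusControlUnit(partition_register=1)` reacts to a fault / stop command aimed at partition 1, while a unit holding any other partition number leaves its stop line inactive.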

At this time, it should be noted that even if the functions of the arithmetic processor 200 other than the bus control unit enter the failure / stop state and stop operating, the bus control unit itself does not stop functioning immediately. So as not to affect the other partitions 2 and 3, which are operating normally on the system, the bus control unit of the arithmetic processor 200 continues to operate until it can end its access to the inter-processor bus 100 with the bus operation in a normal state, and only then enters the fault / stop state. More specifically, this corresponds to continuing a write operation that affects other partitions if the load / store control circuit 206 has not yet completed it, or, when the cache built into the processor uses the store-in (write-back) method and may therefore hold a copy of main memory data, flushing the contents of the cache to the main memory.
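
The "finish bus work first, then stop" behaviour can be sketched as follows. This is a toy model under stated assumptions: the class `DrainingBusControlUnit`, its attribute names, and the use of dictionaries for main memory and the write-back cache are hypothetical and serve only to show the ordering of the steps.

```python
class DrainingBusControlUnit:
    """Toy model: finish bus work that other partitions could observe, then halt."""

    def __init__(self):
        self.main_memory = {}      # address -> value, stands in for main storage 400
        self.pending_writes = []   # (address, value) writes already started on the bus
        self.dirty_cache = {}      # write-back cache lines not yet in main memory
        self.halted = False

    def partition_stop(self):
        # Complete in-flight writes so main memory is never left half-updated.
        while self.pending_writes:
            address, value = self.pending_writes.pop(0)
            self.main_memory[address] = value
        # Flush write-back cache contents to main memory.
        for address, value in self.dirty_cache.items():
            self.main_memory[address] = value
        self.dirty_cache.clear()
        # Only now does the bus control unit itself enter the stop state.
        self.halted = True
```

The point of the ordering is that `halted` becomes true only after every pending write and dirty cache line has reached main memory.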

On the other hand, the fault / stop instruction issued from the arithmetic processor 210 is also delivered to the processors belonging to partitions 2 and 3 via the inter-processor bus 100. In these processors, however, the partition number held in the partition number storage register 201 does not match the partition number in the address data register 202 indicated by the failure / stop instruction decoded by the command decoder 204, so they continue their subsequent operation without taking any action in response to the failure / stop instruction.

By the operations described above, the arithmetic processors 200, 210, 220 and the input / output processor 300 belonging to partition 1 enter the failure / stop state without disturbing the operation of the arithmetic processor 230 and the input / output processor 310 belonging to partition 2 or of the arithmetic processor 240 and the input / output processor 320 belonging to partition 3. In other words, only partition 1 can be selectively put into the failure / stop state.

Further, the present invention has a broadcast communication function for putting all the processors in the information processing apparatus into the failure / stop state.

For this purpose, a broadcast partition number corresponding to all partitions is defined as a value of the bit field that represents the logical partition number in the address / data part specified as the argument of the failure / stop instruction. When a fault / stop command in which the broadcast partition number is set is received, an emergency stop is performed that immediately brings the own processor into the failure / stop state irrespective of the operation state of the inter-processor bus 100.

The difference from the transition to the failure / stop state limited to one partition is as follows. When the failure / stop instruction is decoded by the command decoder 204 of the bus control unit incorporated in each processor, and the data on the address data bus 101 taken into the address data register 202 is the broadcast partition number corresponding to all partitions, the interrupt control circuit 205 instructs the processor to stop by means of the stop instruction signal line 207 regardless of the value of the partition number storage register 201, and the bus control unit itself also stops its subsequent bus control operation. Specifically, the operation of the load / store control circuit 206 is stopped at that point, and it is forcibly stopped even if data being read or written remains in the load / store control circuit 206.

The reason why the broadcast partition number is defined and the bus control unit itself is also stopped urgently is that, if the failure concerns the entire information processing apparatus, the bus control unit itself may be faulty.

The bus control unit in FIG. 2 uses the four command codes shown in FIG. 3 as main storage transactions: a memory read instruction (command code 0), a memory write instruction (command code 1), a lock instruction (command code 2), and a failure / stop instruction (command code 3). The memory write instruction is used when the arithmetic processors 200 to 240 and the input / output processors 300 to 320 write data to the main storage device 400. The memory read instruction is used when the arithmetic processors 200 to 240 and the input / output processors 300 to 320 read data from the main storage device 400. The lock instruction is used when exclusive control is required between the arithmetic processors 200 to 240. The failure / stop instruction is the instruction related to the present invention, and is used when the arithmetic processors 200 to 240 and the input / output processors 300 to 320 place the partition to which the own processor belongs, or the entire information processing apparatus, in the failure / stop state.

The command code is composed of four bits and uses four signal lines of the address data bus 101. The command code is extracted from the address data bus 101 by the command register 203.

In FIG. 3, the main storage transaction uses 32 signal lines of the address data bus 101 as arguments corresponding to a command code when a command is requested. This argument is taken out of the address data bus 101 by the address data register 202. The 32 signal lines used as arguments are also used for sending addresses and transmitting / receiving data in memory read and memory write instructions.

In all instructions, the identification number of the processor that transmitted the instruction is stored in bits 0 to 7 of the argument held in the address data register 202. When the instruction is a lock instruction, bits 8 to 15 of the argument indicate the lock type. In the case of a failure / stop command, the number of the partition to be put into the failure / stop state is set in bits 8 to 15 of the argument.
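
The command codes and the argument layout described above can be illustrated with a few lines of bit manipulation. This is a sketch under an explicit assumption: the text does not say whether bit 0 is the most or least significant bit, so the example treats bit 0 as the least significant bit. The names `Command`, `pack_argument`, and `unpack_argument` are hypothetical.

```python
from enum import IntEnum


class Command(IntEnum):
    """Command codes as listed for FIG. 3."""
    MEMORY_READ = 0
    MEMORY_WRITE = 1
    LOCK = 2
    FAULT_STOP = 3


def pack_argument(sender_id: int, field: int) -> int:
    """Pack the sender ID into bits 0-7 and the lock type or partition number
    into bits 8-15 (assumption: bit 0 is the least significant bit)."""
    if not (0 <= sender_id <= 0xFF and 0 <= field <= 0xFF):
        raise ValueError("both fields must fit in 8 bits")
    return (field << 8) | sender_id


def unpack_argument(argument: int) -> tuple[int, int]:
    """Return (sender_id, field) from a 32-bit argument word."""
    return argument & 0xFF, (argument >> 8) & 0xFF
```

With a made-up sender identification number of 0x0A, `pack_argument(0x0A, 0x01)` yields the argument word for a fault / stop command aimed at partition 1, and `unpack_argument` recovers both fields on the receiving side.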

The number of the partition to be stopped is set in the partition number setting field (bits 8 to 15) of the argument used in the fault / stop instruction. In the configuration described above, the settings are as follows. The partition number storage registers 201 incorporated in the bus control units of the arithmetic processors 200 to 220 and the input / output processor 300 are set to 01 (hexadecimal, hereinafter H). Similarly, 02 (H) is set in the partition number storage registers 201 of the arithmetic processor 230 and the input / output processor 310 belonging to partition 2, and 03 (H) is set in the partition number storage registers 201 of the arithmetic processor 240 and the input / output processor 320 belonging to partition 3.

When a processor belonging to one of the partitions fails and the other processors belonging to the same partition are to be put into the failure / stop state, the failed processor sets 3 (H), the command code of the fault / stop instruction, in the command register 203, sets its own unique identification number in bits 0 to 7 of the address data register 202, sets in bits 8 to 15 the number of the partition to which it belongs, held in advance in the partition number storage register 201, and then transmits the main storage transaction to the address data bus 101. Each processor receives the main storage transaction flowing through the address data bus 101 in the command register 203 and decodes the instruction with the command decoder 204. If the command code is 3 (H), the command code of the failure / stop instruction, the partition number to be stopped given by bits 8 to 15 of the address data register 202 is compared with the value of the partition number storage register 201 of the own processor. When the two match, the processor that has received the fault / stop instruction shifts itself to the fault / stop state.

On the other hand, the main storage device 400 used in common by each section decodes only a memory read instruction and a memory write instruction. Therefore, even if a failure / stop command is received by the bus control unit of the main storage device 400, the main storage device 400, which is a partition sharing device, does not enter the failure / stop state.
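
Why the shared main storage keeps running can be seen in a minimal model whose decoder simply has no branch for the fault / stop code. The class `MainStorage` and its method signature are assumptions made for this sketch. The numeric command codes follow the values given for FIG. 3.

```python
# Command codes as given for FIG. 3.
MEMORY_READ, MEMORY_WRITE, LOCK, FAULT_STOP = 0, 1, 2, 3


class MainStorage:
    """Toy main storage 400: acts on read/write, ignores every other command code."""

    def __init__(self):
        self.cells = {}
        self.stopped = False  # never set: no code path decodes FAULT_STOP

    def on_transaction(self, command, address=None, data=None):
        if command == MEMORY_WRITE:
            self.cells[address] = data
            return None
        if command == MEMORY_READ:
            return self.cells.get(address)
        return None  # LOCK and FAULT_STOP are not decoded here
```

In this model a fault / stop transaction arriving between a write and a later read leaves the stored data and the `stopped` flag untouched.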

Next, the transition processing to the failure / stop state according to the present invention will be described. FIG. 4 is a timing diagram of one bus protocol used in describing the present invention. In this bus protocol, one main storage transaction is composed of eight stages: a bus acquisition request stage REQ (request), a bus arbitration stage ARB (arbitration), a bus synchronization stage SYN (synchronize), an address transmission stage ADR (address), a transaction cancellation stage CAN (cancel), a data transmission stage DT0 (data0) for the first half of the data, a data transmission stage DT1 (data1) for the second half of the data, and a detection report stage CHK (check).

Some of these stages can share the same hardware resource without any operational problem, so the next transaction can be started partway through one main storage transaction (pipeline structure). For example, in the bus acquisition request REQ of stage 1, an argument is stored in the address data register 202 of FIG. 2, whereas in the address transmission ADR of stage 4, address information for main memory access is stored in the address data register 202. In addition, since it takes time for the main storage device 400 to prepare the write or read data for the address information, the first half of the write / read data is stored in the address data register 202 in the data transmission DT0 of stage 6, two stages later, and the second half in the data transmission DT1 of the following stage 7.

At this time, even if stages 1 to 4 and stages 5 to 8 are executed in an overlapping manner, the argument of stage 1, the address of stage 4, and the data of stages 6 and 7 do not use the address data register 202 at the same timing. Therefore, there is no problem if execution of the next main storage transaction starts when stages 1 to 4 are completed. Stages that can be superimposed in this way are called states, and the unit of state repetition is called a cycle.

In the case of the bus protocol shown in FIG. 4, state A is REQ of stage 1 and CAN of stage 5, state B is ARB of stage 2 and DT0 of stage 6, state C is SYN of stage 3 and DT1 of stage 7, state D is ADR of stage 4 and CHK of stage 8, and the four states A to D form one bus cycle.
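
The pairing of stages into states follows a simple modulo rule, which the sketch below makes explicit. The stage names come from the text, and the helper names `state_of` and `stages_in_state` are assumptions introduced for this example.

```python
# Stage names in order, as listed for the bus protocol of FIG. 4.
STAGES = ["REQ", "ARB", "SYN", "ADR", "CAN", "DT0", "DT1", "CHK"]
STATES = "ABCD"


def state_of(stage_number: int) -> str:
    """Map a stage number (1-8) to its bus state (A-D)."""
    if not 1 <= stage_number <= len(STAGES):
        raise ValueError("stage number must be between 1 and 8")
    return STATES[(stage_number - 1) % len(STATES)]


def stages_in_state(state: str) -> list[str]:
    """List the stage names that share one bus state."""
    return [name for i, name in enumerate(STAGES, start=1) if state_of(i) == state]
```

Stages n and n + 4 land in the same state, which is why a second transaction can start once the first has finished stage 4.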

FIG. 5 is a timing chart showing the bus operation from when a failure is detected in the arithmetic processor 210 belonging to partition 1 to when the arithmetic processor 200 belonging to the same partition 1 shifts to the failure / stop state.

The arithmetic processor 200 is executing a memory write instruction to the main storage device 400 in transaction 1. Assume that the arithmetic processor 210 detects a failure during stages 1 to 4 and that partition 1 needs to be shifted to the failure / stop state. The arithmetic processor 210 executes the failure / stop instruction in transaction 2, which starts from stage 5.

When making a bus acquisition request in stage 5, the arithmetic processor 210 sends the command code of the fault / stop command and the partition number to be stopped to the address data bus 101. The arithmetic processor 200, which belongs to the same partition, recognizes that the instruction sent to the processor bus is a fault / stop instruction, and since comparison with its own partition number shows that the two are the same, it stops the functions of all parts other than the bus control unit. At this time, the bus control unit of the arithmetic processor 200 is executing the memory write instruction of the preceding main storage transaction 1. If the bus operation were stopped during execution of that transaction, the write data of the subsequent stages 6 and 7 would become undefined and the data in the main memory would be corrupted.

In the first embodiment of the present invention, the bus operation of the arithmetic processor 200 is executed up to the point where the other partitions are not affected, and only then does the processor shift to the failure / stop state. Therefore, the arithmetic processor 200 executes the main storage transaction 1 up to CHK of stage 8 and only then shifts the bus control unit to the failure stop state. As a result, the data in the main storage device 400 is kept in a normal state without corruption.
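
The overlap between transaction 1 and the fault / stop transaction can be traced slot by slot. The sketch below assumes that the stage numbers in the timing description are positions on a shared timeline, so that a transaction starting in slot s occupies slots s to s + 7. The function names and the `TX1_START` and `TX2_START` constants are hypothetical.

```python
# Stage names in order, as listed for the bus protocol of FIG. 4.
STAGES = ["REQ", "ARB", "SYN", "ADR", "CAN", "DT0", "DT1", "CHK"]


def stage_at(start_slot: int, slot: int):
    """Return the stage a transaction is in at a timeline slot, or None if inactive."""
    offset = slot - start_slot
    return STAGES[offset] if 0 <= offset < len(STAGES) else None


def last_slot(start_slot: int) -> int:
    """Timeline slot in which a transaction that started at start_slot finishes."""
    return start_slot + len(STAGES) - 1


TX1_START = 1  # memory write by arithmetic processor 200
TX2_START = 5  # fault/stop instruction by arithmetic processor 210
```

In slot 5, `stage_at(TX2_START, 5)` is the request stage of the fault / stop transaction while `stage_at(TX1_START, 5)` shows transaction 1 still has its two data stages ahead of it. Under this model the bus control unit of the arithmetic processor 200 may stop only after `last_slot(TX1_START)`.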

On the other hand, the arithmetic processor 230 belonging to partition 2 is about to execute a memory read instruction in main storage transaction 3, which starts at stage 9. The arithmetic processor 230 also decodes the fault / stop instruction requested by the arithmetic processor 210 in stage 5. However, since the partition number stored in the address data register 202 as the argument of the fault / stop instruction does not match the value of the partition number storage register 201 of the arithmetic processor 230, the processor does not enter the failure / stop state, and the memory read instruction is executed normally from stage 9.

Further, even if the arithmetic processor 230 reads data from the main memory address that the arithmetic processor 200 wrote in transaction 1, transaction 1 has completed normally. The data read by the arithmetic processor 230 is therefore not corrupted, and the operation of partition 2 continues in a normal state.

In the second embodiment of the present invention, a broadcast partition number for the purpose of stopping all processors in the information processing apparatus is defined as one of the partition numbers that can be set in the partition number setting field of the address data register 202 used as the argument of the fault / stop instruction. In the example shown in FIG. 3, this corresponds to the case where FF (H) is set in bits 8 to 15 of the argument of the fault / stop instruction passed on the address data bus.

The operation leading to the failure / stop state will be described in detail. If any of the processors fails and the failure affects the entire information processing apparatus, the failed processor sets 3 (H), the command code of the fault / stop instruction, in the command register 203, sets its own unique identification number in bits 0 to 7 of the address data register 202, sets FF (H), the partition number indicating a broadcast, in bits 8 to 15, and then transmits the main storage transaction to the address data bus 101.

In a processor that receives the main storage transaction, the transaction flowing on the address data bus 101 is captured in the command register 203 and decoded by the command decoder 204. If the command is 3 (H), the command code of the failure/stop instruction, the processor refers to the target partition number given in bits 8 to 15 of the address data register 202. The partition number storage register 201 built into each of the arithmetic processors 200 to 240 and the input/output processors 300 to 320 holds the number of the partition to which that processor belongs, set in advance. However, if the target partition number given in the address data register 202 is FF (H), indicating a broadcast notification, the processor transitions to the failure/stop state without comparing the target against the value of the partition number storage register 201.
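The receiver-side decision can be modeled with a short sketch. It is a simplified software analogy under stated assumptions: the class `Processor`, its method `receive`, and the three result strings are hypothetical names, bit 0 is assumed to be the least significant bit, and the assignment of arithmetic processor 200 to partition 1 is assumed for illustration (the text states the partition only for processors 210 and 230).

```python
from dataclasses import dataclass

FAULT_STOP_COMMAND = 0x3    # command code of the failure/stop instruction (from the text)
BROADCAST_PARTITION = 0xFF  # broadcast partition number (from the text)


@dataclass
class Processor:
    name: str
    partition: int         # models the partition number storage register 201
    stopped: bool = False

    def receive(self, command: int, argument: int) -> str:
        """Decide how to react to a transaction seen on the address data bus."""
        if command != FAULT_STOP_COMMAND:
            return "continue"              # ordinary main storage transaction
        target = (argument >> 8) & 0xFF    # bits 8 to 15: target partition number
        if target == BROADCAST_PARTITION:
            self.stopped = True            # no comparison with register 201
            return "stop_immediately"
        if target == self.partition:
            self.stopped = True            # first embodiment: finish safely, then stop
            return "stop_after_cleanup"
        return "continue"                  # another partition was addressed


p200 = Processor("arithmetic processor 200", partition=1)  # partition assumed
p230 = Processor("arithmetic processor 230", partition=2)

selective = (0x01 << 8) | 0x21             # stop partition 1 only
print(p200.receive(FAULT_STOP_COMMAND, selective))
print(p230.receive(FAULT_STOP_COMMAND, selective))
```

With the selective argument, only the processor whose stored partition number matches is stopped. Passing an argument whose bits 8 to 15 hold FF (H) stops both processors regardless of their stored partition numbers.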

Next, the operation of the second embodiment of the present invention will be described in detail.

FIG. 6 is a timing chart showing the bus operation up to the transition to the failure/stop state when a failure is detected in the arithmetic processor 210, which belongs to partition 1, and the failure is determined to propagate to the entire information processing apparatus.

Assume that the arithmetic processor 200 is executing a memory write instruction to the main storage device 400 in transaction 1. Assume further that the arithmetic processor 210 detects a failure during stages 1 to 4 and that the entire information processing apparatus must be shifted to the failure/stop state. The arithmetic processor 210 then executes the failure/stop instruction in transaction 2, starting from stage 5.

When making a bus acquisition request in stage 5, the arithmetic processor 210 sends the command code of the failure/stop instruction and the broadcast partition number, which represents all partitions, onto the address data bus 101. The arithmetic processor 200, for its part, captures the instruction sent on the address data bus 101 into the command register 203 in stage 5 and decodes it with the command decoder 204, recognizing it as a failure/stop instruction. If the target partition number that the arithmetic processor 200 extracts from the address data register 202 is the broadcast partition number, the processor operates so that the entire processor, including the bus control unit, immediately transitions to the failure/stop state.

Referring to FIG. 6, the arithmetic processor 200 is executing the transaction cancellation process CAN in stage 5. Because it has received a failure/stop instruction carrying the broadcast partition number, it stops the functions of the entire arithmetic processor 200, including bus control, without executing the three subsequent processes: the first-half data transmission process DT0 in stage 6, the second-half data transmission process DT1 in stage 7, and the detection and reporting process CHK for parity errors and the like in stage 8.

The arithmetic processor 230, which belongs to partition 2, is likewise about to execute a memory read instruction in main storage transaction 3 starting at stage 9. Here too, it decodes the failure/stop instruction in the same manner as described above. Because the partition number given as the argument of the failure/stop instruction is the broadcast partition number, the arithmetic processor 230 stops functioning without executing the memory read instruction from stage 9.

According to the second embodiment of the present invention, the state information of the bus control section can be held at a timing three stages earlier than in the first embodiment.
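The three-stage difference can be checked with a small worked sketch. It rests on an assumption that the text implies but does not spell out: in the first embodiment the receiving processor lets its in-flight transaction run through CHK in stage 8 before stopping, whereas in the second embodiment it stops in stage 5, where the failure/stop instruction is received. The names `stop_stage` and `skipped_processes` are hypothetical.

```python
# Stages of the receiver's in-flight transaction, as described for FIG. 6.
REMAINING_STAGES = [(5, "CAN"), (6, "DT0"), (7, "DT1"), (8, "CHK")]
FAULT_STOP_RECEIVED_AT = 5  # stage in which the failure/stop instruction is decoded


def stop_stage(broadcast: bool) -> int:
    """Return the stage at which the bus control unit's state is frozen."""
    if broadcast:
        return FAULT_STOP_RECEIVED_AT  # second embodiment: stop immediately
    # Assumption: a selective stop completes the transaction through CHK first.
    return REMAINING_STAGES[-1][0]


def skipped_processes(broadcast: bool) -> list:
    """Return the processes that are never executed after the stop."""
    stop = stop_stage(broadcast)
    return [name for stage, name in REMAINING_STAGES if stage > stop]


print(stop_stage(False) - stop_stage(True))  # stages gained by the broadcast stop
print(skipped_processes(True))               # processes not executed
```

Under these assumptions the broadcast stop freezes the bus control state three stages earlier, and the skipped processes are DT0, DT1, and CHK, matching the description of FIG. 6.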

[0057]

The first effect of the present invention is that, in an information processing apparatus using logical partitioning, a mechanism for selectively shifting only a specific logical partition to the failure/stop state can be realized with a small number of hardware signal lines, without using dedicated signal lines. This is because the failure/stop instruction to a processor is defined as one of the main storage transactions rather than being carried on a failure/stop instruction signal line, and the failure/stop state is notified over the address data bus that processors normally have, so the number of signal lines for failure/stop instructions can be reduced.

The second effect is that, when a failure affecting the entire information processing apparatus occurs, the apparatus can shift to the failure/stop state at an earlier point after the failure occurs, which prevents further destruction of the register information and trace information useful for failure analysis. This is because the timing at which bus control operation stops is selected according to whether the argument of the failure/stop instruction is a partition number, which causes a specific partition to fail and stop, or the broadcast partition number, which causes all partitions to fail and stop.

[Brief description of the drawings]

FIG. 1 is a configuration diagram illustrating an embodiment of the present invention.

FIG. 2 is a detailed block diagram of a bus control unit representing an embodiment of the present invention.

FIG. 3 is an instruction system diagram of a main storage transaction representing an embodiment of the present invention.

FIG. 4 is a timing diagram of a bus protocol representing an embodiment of the present invention.

FIG. 5 is a timing chart illustrating a bus operation according to the first embodiment of the present invention.

FIG. 6 is a timing chart illustrating a bus operation according to the second embodiment of the present invention.

FIG. 7 is a detailed block diagram of a bus control unit representing an embodiment of the related art.

[Explanation of symbols]

100 Bus between processors
200, 210, 220, 230, 240 Arithmetic processor
300, 310, 320 I / O processor
400 Main storage device
101 Address data bus
102 Fault / stop signal line
201 Logical partition number storage register
202 Address data register
203 Command register
204 Command decoder
205 Interrupt control circuit
206 Load / store control circuit
207 Processor stop instruction signal line
208 Processor address data signal line

Claims (2)

[Claims]
1. A control method for a logical partitioning system in which a plurality of processors are tightly coupled to a main storage device, the processor group is divided in advance into a plurality of logical partitions, and an operating system is assigned to and operated on each logical partition, the method comprising: a step of receiving a main storage transaction; a step of comparing a partition number to be stopped, provided in a failure/stop instruction defined as one of the transactions, with the partition number to which the processor belongs, which is set in advance for each processor; and a step of, when the comparison results in a match, processing the transaction to a state that does not affect processors operating normally in other partitions and then causing the own processor to transition to the failure/stop state.
2. The control method for a logical partitioning system according to claim 1, further comprising: a step of defining, as a partition number to be stopped provided in the failure/stop instruction, a broadcast partition number corresponding to designation of all partitions; and a step of, upon receiving a failure/stop instruction in which the broadcast partition number is set, causing each processor to immediately shift itself to the failure/stop state regardless of the bus operation state at that time.
JP2001027757A 2001-02-05 2001-02-05 Control method of logical partitioning system Active JP3746957B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2001027757A JP3746957B2 (en) 2001-02-05 2001-02-05 Control method of logical partitioning system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2001027757A JP3746957B2 (en) 2001-02-05 2001-02-05 Control method of logical partitioning system

Publications (2)

Publication Number Publication Date
JP2002229811A true JP2002229811A (en) 2002-08-16
JP3746957B2 JP3746957B2 (en) 2006-02-22

Family

ID=18892362

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2001027757A Active JP3746957B2 (en) 2001-02-05 2001-02-05 Control method of logical partitioning system

Country Status (1)

Country Link
JP (1) JP3746957B2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006285810A (en) * 2005-04-04 2006-10-19 Hitachi Ltd Cluster configuration computer system and system reset method therefor
WO2007096999A1 (en) * 2006-02-24 2007-08-30 Fujitsu Limited Separating device and separating method
WO2007099606A1 (en) * 2006-02-28 2007-09-07 Fujitsu Limited Processor control method
WO2008120383A1 (en) * 2007-03-29 2008-10-09 Fujitsu Limited Information processor and fault processing method
WO2009147716A1 (en) * 2008-06-02 2009-12-10 富士通株式会社 Data processing system, data processing method, and data processing program

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4529767B2 (en) * 2005-04-04 2010-08-25 株式会社日立製作所 Cluster configuration computer system and system reset method thereof
JP2006285810A (en) * 2005-04-04 2006-10-19 Hitachi Ltd Cluster configuration computer system and system reset method therefor
WO2007096999A1 (en) * 2006-02-24 2007-08-30 Fujitsu Limited Separating device and separating method
JPWO2007096999A1 (en) * 2006-02-24 2009-07-09 富士通株式会社 Separation device and method
JP4576453B2 (en) * 2006-02-24 2010-11-10 富士通株式会社 Cache control circuit, information processing apparatus, and disconnection method
JP4653838B2 (en) * 2006-02-28 2011-03-16 富士通株式会社 Arithmetic processing device, control method of arithmetic processing device, and control program
JPWO2007099606A1 (en) * 2006-02-28 2009-07-16 富士通株式会社 Processor control method
US8060778B2 (en) 2006-02-28 2011-11-15 Fujitsu Limited Processor controller, processor control method, storage medium, and external controller
WO2007099606A1 (en) * 2006-02-28 2007-09-07 Fujitsu Limited Processor control method
JPWO2008120383A1 (en) * 2007-03-29 2010-07-15 富士通株式会社 Information processing apparatus and failure processing method
JP4495248B2 (en) * 2007-03-29 2010-06-30 富士通株式会社 Information processing apparatus and failure processing method
WO2008120383A1 (en) * 2007-03-29 2008-10-09 Fujitsu Limited Information processor and fault processing method
US7930599B2 (en) 2007-03-29 2011-04-19 Fujitsu Limited Information processing apparatus and fault processing method
WO2009147716A1 (en) * 2008-06-02 2009-12-10 富士通株式会社 Data processing system, data processing method, and data processing program
JP5212471B2 (en) * 2008-06-02 2013-06-19 富士通株式会社 Data processing system, data processing method, and data processing program
US8806276B2 (en) 2008-06-02 2014-08-12 Fujitsu Limited Control system for driving a data processing apparatus

Also Published As

Publication number Publication date
JP3746957B2 (en) 2006-02-22

Similar Documents

Publication Publication Date Title
Schuette et al. Processor control flow monitoring using signatured instruction streams
US5388242A (en) Multiprocessor system with each processor executing the same instruction sequence and hierarchical memory providing on demand page swapping
US5600784A (en) Fault resilient/fault tolerant computing
US5491788A (en) Method of booting a multiprocessor computer where execution is transferring from a first processor to a second processor based on the first processor having had a critical error
US5239641A (en) Method and apparatus for synchronizing a plurality of processors
EP0380858B1 (en) Method and apparatus for detecting and correcting errors in a pipelined computer system
US6598178B1 (en) Peripheral breakpoint signaler
US8234521B2 (en) Systems and methods for maintaining lock step operation
TWI236620B (en) On-die mechanism for high-reliability processor
US5890003A (en) Interrupts between asynchronously operating CPUs in fault tolerant computer system
US8370841B2 (en) Optimizing deterministic event record and replay operations
US7698594B2 (en) Reconfigurable processor and reconfiguration method executed by the reconfigurable processor
US7757237B2 (en) Synchronization of threads in a multithreaded computer program
US4751639A (en) Virtual command rollback in a fault tolerant data processing system
US5253359A (en) Control and maintenance subsystem network for use with a multiprocessor computer system
US7003691B2 (en) Method and apparatus for seeding differences in lock-stepped processors
US4849979A (en) Fault tolerant computer architecture
US7321989B2 (en) Simultaneously multithreaded processing and single event failure detection method
Bernick et al. NonStop® advanced architecture
US5491793A (en) Debug support in a processor chip
EP0528585B1 (en) Data processing system with internal instruction cache
JP3701814B2 (en) Multiprocessor system and system serialization method thereof
US6530076B1 (en) Data processing system processor dynamic selection of internal signal tracing
US6754856B2 (en) Memory access debug facility
US7191292B2 (en) Logging of level-two cache transactions into banks of the level-two cache for system rollback

Legal Events

Date Code Title Description
A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20040729

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20050222

RD01 Notification of change of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7421

Effective date: 20050307

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20050425

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20051108

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20051125

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

Ref document number: 3746957

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081202

Year of fee payment: 3

S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081202

Year of fee payment: 3

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091202

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091202

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101202

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111202

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111202

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121202

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121202

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20131202

Year of fee payment: 8

S111 Request for change of ownership or part of ownership

Free format text: JAPANESE INTERMEDIATE CODE: R313111

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350