GB2393817A - A computer system comprising a host processor and a service processor - Google Patents

Publication number
GB2393817A
Authority
GB
United Kingdom
Prior art keywords
service processor
processor
management
signal
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0318830A
Other versions
GB0318830D0 (en
GB2393817B (en
Inventor
James Edward King
Rhod James Jones
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Microsystems Inc
Original Assignee
Sun Microsystems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/216,537 (patent US6813150B2)
Priority claimed from US10/216,541 (patent US7424555B2)
Priority claimed from US10/216,536 (patent US6954358B2)
Application filed by Sun Microsystems Inc
Priority to GB0513331A (patent GB2412772B)
Priority to GB0513332A (patent GB2412773B)
Publication of GB0318830D0
Publication of GB2393817A
Application granted
Publication of GB2393817B
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1417 Boot up procedures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/16 Constructional details or arrangements
    • G06F1/20 Cooling means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors

Abstract

A computer system comprises: (i) a host processor (4, 6, Figure 2); and (ii) a service processor (50) for providing system management functions within the computer system. The service processor provides a signal indicative of normal operation thereof, for example a voltage on a control line (82). The system includes a device that can operate under a first condition in which it is controlled in dependence on the signal from the service processor, and under a second condition, different from the first condition, when no such signal is generated by the service processor. The device may be a multiplexer (86) for routing data and commands via the service processor during normal operation and bypassing the service processor when it is detected not to be operating normally, or it may be a fan controller (58) which controls the speed of the fan(s) (28) and drives the fans to maximum if no signal is received from the service processor. Alternatively, the device may be an interface (71) for a communication port (70) that quiesces the port when no signal is received.

Description

COMPUTER ASSEMBLY
BACKGROUND OF THE INVENTION
This invention relates to computer systems, and especially to computer systems that are employed as servers. The systems may, for example, be employed as servers in local area networks (LANs) or wide area networks (WANs), in telecommunications systems, or in other operations such as database management or as internet servers. Such servers may be used in so-called "horizontally scaled" applications in which tens or hundreds of corresponding servers are employed as part of a distributed system.
A typical computer employed for such purposes will comprise a pair of processors mounted on a motherboard, together with power supply units (PSUs) and other components such as hard disc drives (HDDs), fans, digital video disc (DVD) players, memory modules, ethernet ports etc. One or more of the processors, the host processor(s), provides the main functions of the server, and may communicate with a number of peripheral components, including communication ports, optionally via peripheral component interconnect (PCI) bridges, in order to provide server operation. One of those peripheral components, called the "South Bridge", further allows the host processors to communicate with internal devices via serial interfaces, one of which transports the console interface of the processors.
In addition to the host processor(s), the system may include another processor, called the service processor or the remote management controller (RMC), which provides management functions for the system assembly. Such functions may include environmental monitoring, temperature monitoring of the enclosure, fan speed control, data logging and the like.
Conventionally, some means has been needed to control communication between the user on the one hand and the host processor and the service processor on the other. In one early design of server, the service processor and the console interface of the host processor were connected to a user interface, such as a serial port, by means of a custom programmable logic device. The service processor would control the logic device, and any communication between the service processor and the user interface would take precedence over communications with the console interface, so that any data generated by the console when the service processor was being operated using its command line interface (CLI) would be lost. In addition, the logic device was not designed to handle a malfunction of the service processor, with the result that if the service processor were to malfunction when its CLI was being used, it would no longer be possible to communicate with the console interface.
An improvement in that system has been to employ a dual ported service processor so that all data between the console interface and the user interface is routed through the service processor. The service processor can then decide whether the data relates to the management mode of operation, in which case the data could be processed by the service processor, or whether it relates to console mode, in which case it could be routed to the console interface. In this system, if data were generated at the console interface while the system was in management mode, the data would not be lost but would be stored in memory associated with the service processor. However, because all data to or from the console is routed via the service processor, any malfunction of the service processor would prevent further communication between the user and the host processor until replacement of the service processor (although that would not necessarily prevent operation of the host processor). Depending on the location of the network server, replacement of the service processor may take days or even weeks.
SUMMARY OF THE INVENTION
According to one aspect of the present invention, a computer system comprises:
(i) a host processor;
(ii) a service processor for providing system management functions within the computer system, the service processor further providing a signal indicative of normal operation thereof; and
(iii) a device that can operate under two conditions.
Under a first condition the device may be controlled in dependence on the signal from the service processor. Under a second condition, different from the first condition, when no such signal is generated by the service processor, the device may operate differently.
The computer system may, for example, include a user interface for receiving external commands and data for the service processor and/or the host processor, and for sending data from the service processor and/or the host processor. In such a system, the device may be operative under the first condition to route commands and data between the user interface and the host processor via the service processor, and under the second condition, the device may be operative to route commands and data between the user interface and the host processor bypassing the service processor.
The service processor may be of the type that is responsive to external mode switching commands to operate either in a management mode or in a console mode. In the management mode, commands and/or data received are processed by the service processor, while in the console mode they may be passed to a console interface for onward transfer to the host processor.
The system according to this aspect has the advantage that, under normal operation, all commands and data can be routed to the service processor, which will decide whether the data relate to a system management function or to a console function. If the data or commands relate to a console function, they can be routed to the console interface, for example via the device again. However, should the service processor fail for any reason, communication between the user interface and the host processor is not lost, but instead is automatically routed so that it bypasses the service processor.
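The routing decision described above can be sketched in a few lines of Python. This is only an illustrative model, not the patented implementation: the class and function names, the sample command strings, and the two mode-switching tokens (`ENTER_CONSOLE`, `ENTER_MANAGEMENT`) are hypothetical, since the text does not specify them.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical mode-switching commands; the source text does not name them.
ENTER_CONSOLE = "console"
ENTER_MANAGEMENT = "#."


@dataclass
class ServiceProcessor:
    mode: str = "management"
    processed: List[str] = field(default_factory=list)

    def receive(self, item: str) -> Optional[str]:
        """Handle one item; return it if it must go on to the console interface."""
        if item == ENTER_CONSOLE:
            self.mode = "console"
            return None
        if item == ENTER_MANAGEMENT:
            self.mode = "management"
            return None
        if self.mode == "management":
            self.processed.append(item)  # acted on by the service processor itself
            return None
        return item  # console mode: pass onward to the console interface


def route(item: str, sp: ServiceProcessor, signal_present: bool,
          console: List[str]) -> None:
    """Model of the routing device between the user interface and the console."""
    if not signal_present:
        console.append(item)  # second condition: bypass the service processor
        return
    onward = sp.receive(item)  # first condition: everything goes via the service processor
    if onward is not None:
        console.append(onward)
```

In this sketch the device never inspects the traffic itself; it only looks at whether the service processor's signal is present, which is what keeps console access available when the service processor has failed.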
The device may be controlled in any of a number of ways. In one embodiment the device defaults to bypassing the service processor so that, if no control signal is received for whatever reason, communication with the console interface is maintained. The control signal may, for example, simply be a voltage level that is set by the service processor. In such a case, a pull-up or pull-down resistor may be provided between the signal line and either a voltage rail or earth, so that, if no voltage is received from the service processor, the control voltage for the device will rise to the appropriate voltage rail or will fall to ground. The system may, for example, include a bus that extends between the user interface and the console interface, the bus including a switch, for example in the form of a FET, whose gate is connected to the voltage level output of the service processor. In this design, the service processor holds the gate voltage to ground, thereby turning the switch off, but on failure of the service processor, the gate voltage will rise due to the pull-up resistor and the switch will be turned on.
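The fail-safe behaviour of the control line can be modelled as follows. This is a minimal sketch under stated assumptions: the 5 V rail value, the 2.5 V switching threshold, and the function names are illustrative choices, and a service processor that has stopped driving the line is represented by `None`.

```python
from typing import Optional

RAIL_VOLTS = 5.0      # assumed rail voltage for illustration
GROUND_VOLTS = 0.0


def control_line_voltage(sp_output: Optional[float], pull: str = "up") -> float:
    """Voltage seen on the control line.

    sp_output is the level driven by the service processor, or None when the
    service processor is not driving the line (for example after a failure).
    """
    if sp_output is not None:
        return sp_output
    # Undriven line: the resistor defines the level.
    return RAIL_VOLTS if pull == "up" else GROUND_VOLTS


def bypass_switch_on(gate_volts: float, threshold: float = 2.5) -> bool:
    """Idealised FET: conducts when the gate is above an assumed threshold."""
    return gate_volts > threshold
```

With the pull-up variant, a healthy service processor holding the line at 0 V keeps the bypass switch off, while an undriven line floats up to the rail and turns it on.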
The switch may form part of a multiplexer, for example one based on complementary metal oxide semiconductor (CMOS) technology, in which a number of CMOS switches are held open by means of the signal from the service processor applied to the gates of the CMOS switches. Alternatively, the multiplexer may be formed from a custom programmable logic device (CPLD).
The components may, for example, be provided as a sub-assembly comprising the service processor, a user interface and a device for routing the commands and data to and from the user interface either directly or via the service processor.
According to this aspect of the invention, the system may be operated by a method comprising sending external commands to the service processor and receiving data from the service processor via a user interface. The commands and data may be routed to and from the user interface via the service processor on receipt of a signal from the service processor so that, in the absence of the signal, the commands and data are sent between the user interface and the console interface bypassing the service processor.
Fan speed is controlled by the service processor in order to minimise the amount of vibration and noise in the neighbourhood of the equipment and, more importantly, in order to increase the life of the fans. With proper fan speed control, it is possible to extend the life of the fans by an order of magnitude or more, so that the fan lifetime is generally equivalent to that of the computer system. This is advantageous in the case of those systems in which it may not be possible to change the fans without shutting the system down, since any change of fans will be associated with downtime of the system.
In addition, in order to reduce the amount of downtime of the system, it is desirable to enable the system to continue to function in the event of a failure of the service processor for whatever reason. However, when the service processor has malfunctioned, all system management functions that are provided by the service processor are lost.
Accordingly, the computer system may include a temperature sensor for sensing temperature in or in relation to an enclosure of the system, and one or more fans for cooling the enclosure. In this form of system the said device may comprise a fan controller for generating a driving signal for the fan or at least one of the fans in response to the fan speed signals generated by the service processor when the fan controller receives the signal from the service processor indicating normal operation. In the absence of the control signal from the service processor, the fan controller operates in a second condition in which it will alter the driving signal to increase the fan speed to a predetermined fan speed.
Thus, the system according to the invention has the advantage that, under normal circumstances, the system will operate with dynamic control of the fans, but, should the system management functions be lost due to a malfunction of the service controller, the fan speed will be increased in order to ensure adequate cooling of the system enclosure.
The control signal may, for example, simply be a voltage level that is generated on a control line from the service processor as described above. Alternatively, a switch, for example a solid state switch, may be provided that is controlled by the control signal from the service processor and whose output is sent to the fan controller.
Although the fans may be driven to any speed that will provide an adequate margin of safety from thermal damage to the equipment, the fan controller may, for example, increase the fan speed to the maximum that can be driven by the fan controller. For example, the fan controller may provide a fan driving signal as a pulse-width modulated (PWM) signal with a pulse width varying between, for example, 0% (i.e. off) and 100% (maximum speed), in which case the fan controller may output a 100% pulse width, i.e. a constant d.c. voltage, when no control signal is received from the service processor.
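The fail-safe duty-cycle selection just described might be expressed as the following sketch. The function and parameter names are hypothetical; the default fail-safe value of 100% follows the example in the text, while clamping out-of-range requests is an added assumption.

```python
def fan_duty_percent(requested_percent: float,
                     sp_signal_present: bool,
                     failsafe_percent: float = 100.0) -> float:
    """Return the PWM pulse width (0-100 %) that the fan controller outputs.

    requested_percent is the fan speed asked for by the service processor.
    When the service processor's "normal operation" signal is absent, the
    request is ignored and the predetermined fail-safe value is used.
    """
    if not sp_signal_present:
        return failsafe_percent
    # Assumption: clamp requests to the valid pulse-width range.
    return max(0.0, min(100.0, requested_percent))
```

A lower `failsafe_percent` could be chosen instead, provided it still gives an adequate thermal safety margin, which is the only requirement the text places on the predetermined speed.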
The service processor may communicate with the host processor or with one of them, and may also have one or more external communication ports so that a user or network administrator can communicate with the service processor, or can communicate with the host processor(s) via the service processor. For example, the service processor may have its own ethernet network port for direct communication to the network administrator.
Such ethernet network ports, whether communicating with the service processor or the host processor(s), will normally need a physical interface (PHY) in order to clean the signals and to provide power for driving the signals along the ethernet cabling, clock timing, line coding etc. The signals will then typically be sent to the ethernet cabling via a standard network port, for example an RJ45 port, which will accept an eight line cable and is provided with a pair of light emitting diode (LED) indicators, one for indicating the existence of a link, and the other for indicating the existence of traffic on the line.
If there is any malfunction of the service processor, whether due to hardware or software faults, the system is designed to continue to operate as indicated above, at least as far as the provision of services provided by the host processor is concerned, although clearly system management services will no longer be available until the service processor is replaced in the event of a hardware failure. Thus, the functioning of the server should be largely unaffected by any failure of the service processor.
However, even though the service processor has stopped functioning, power will still be sent to all the ethernet interfaces including the service processor ethernet interface. While this will not matter as far as the ethernet interfaces handled by the host processors are concerned, because those interfaces will still be controlled by their associated media access controllers (MACs) and host processors, no such control is exerted on the management interface controlled by the service processor. Thus, internal lines in the system extending between the service processor and the management PHY may be susceptible to interference from any active components in the system, and in particular from the host processor(s). This interference will then be amplified and line coded by the management PHY before being sent to the ethernet lines. Visual inspection of the RJ45 management port will give the appearance that the server is functioning correctly because the LEDs will be on, indicating traffic on the line, even though this traffic is simply interference, and the server will appear to accept data from the service administrator because the RJ45 port is still operational.
This problem may be overcome by controlling the management communication device by a signal from the service processor. Thus, according to yet another aspect of the present invention, the computer system may include one or more external communication ports including at least one management communication port that communicates with the service processor.
The management communication port may be controlled by the signal from the service processor and be operative to send and receive data when it receives the signal from the service processor indicating normal operation.
The system according to this aspect of the invention has the advantage that, in the event of a malfunction of the service processor, the system will still function for its intended purpose, other than to allow system management operations, but interference that is internally generated will not be put onto the external communication lines by the management communication device. For example, where the system operates as a network server, interference will not be put onto the network.
The signal according to this aspect may, for example, simply be a voltage level that is supplied by the service processor as described above.
The management communication port may include a physical interface for providing line power, line coding and the like, in which case it may be provided with a reset input. The signal from the service processor may thus be sent to the reset input of the physical interface, after inverting if necessary depending on the reset input of the physical interface, in order to cause the physical interface to become inoperative if the service processor malfunctions.
The management communication port need not be the only external communication device in the computer system, and additional external communication devices, for example ethernet ports, may be provided that are controlled by the host processor(s). These devices will not be affected by a malfunction of the service processor, and will continue to operate as normal. There will not normally be a danger of such devices placing interference on the network because they will continue to be controlled by their processors and by their associated hardware such as the media access controllers.
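The "inverting if necessary" step for the reset input of the management physical interface can be illustrated with a small truth-table sketch. It assumes, purely for illustration, that the service processor's signal is logic high during normal operation; the function names and the `reset_active_low` parameter are hypothetical.

```python
def phy_reset_pin_level(sp_signal_high: bool, reset_active_low: bool) -> bool:
    """Logic level to apply to the PHY reset pin.

    Assumption: sp_signal_high is True while the service processor operates
    normally. The signal is inverted only when the PHY's reset input is
    active-high, so that loss of the signal always asserts reset.
    """
    return sp_signal_high if reset_active_low else not sp_signal_high


def phy_operational(pin_level: bool, reset_active_low: bool) -> bool:
    """The PHY runs only while its reset input is de-asserted."""
    reset_asserted = (not pin_level) if reset_active_low else pin_level
    return not reset_asserted
```

For either reset polarity, the management port ends up operational exactly when the service processor's signal is present, so interference is not driven onto the network after a service processor failure.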
According to another aspect of the invention, a network may include a computer system according to the invention together with a network administrator.
The network could be a private network, or it could be a public network. Where the network is a public network, the public will only have access to the information on the lines other than the network management line supplied by the management communication device. Accordingly, the public will not be aware of any malfunction of the service processor of the network server. This will only be apparent to the network administrator, who will attend to resolving the problem.
Thus, according to yet another aspect of the invention, there is provided a method of operating such a network. The method comprises monitoring traffic from the management communication device and effecting repair or replacement of the service processor or of the server in the event of loss of traffic from the management communication device.
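On the administrator's side, the monitoring method could be approximated by a simple timeout check such as the one below. This is a hedged sketch only: the class name, the 60-second default timeout, and the choice to treat "no traffic ever seen" as a loss are all assumptions, since the text does not say how loss of traffic is detected.

```python
from typing import Optional


class ManagementTrafficMonitor:
    """Flags a server for repair when its management port goes quiet."""

    def __init__(self, timeout_s: float = 60.0) -> None:
        self.timeout_s = timeout_s          # assumed silence threshold
        self.last_seen: Optional[float] = None

    def record_traffic(self, timestamp: float) -> None:
        """Call whenever traffic arrives from the management port."""
        self.last_seen = timestamp

    def needs_attention(self, now: float) -> bool:
        """True if traffic has been absent for longer than the timeout."""
        if self.last_seen is None:
            return True  # assumption: never having seen traffic counts as loss
        return (now - self.last_seen) > self.timeout_s
```

Because the management port is quiesced when the service processor fails, silence on that line is a meaningful indicator rather than being masked by amplified interference.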
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the present invention will now be described in detail by way of example with reference to the accompanying drawings, in which corresponding parts are given like reference numbers. In the drawings:
Figure 1 is a physical plan view of one form of computer system according to the present invention;
Figure 2 is a schematic block diagram showing the system architecture of the system of figure 1;
Figure 3 is a schematic diagram showing the service processor employed in the present invention together with some peripheral components;
Figure 4 is a schematic diagram showing the service processor and the interconnection to certain peripheral components;
Figure 5 is a schematic diagram showing connection of the computer system to a network;
Figure 6 is a schematic diagram showing the service processor, the console interface, user interface and multiplexer without other peripheral components shown in figure 4;
Figure 7 is a flow diagram of a system power-up; and
Figure 8 is a graph of fan speed against temperature.
DESCRIPTION OF PARTICULAR EMBODIMENTS
Referring now to the drawings, in which like reference numerals are used to designate corresponding elements, figure 1 shows a physical plan view of a narrow form factor computer that is intended to provide a rack mounted server for use with the internet or as part of a local area network (LAN) or for other telecommunications purposes, and is designed to fit into, for example, a nineteen inch rack electronics cabinet. Other sizes may alternatively be employed, for example to fit into 23 inch or metric racks. The assembly may be designed to be a so-called high "RAS" system, that is to say, to have high reliability, availability and serviceability. As such, it is intended that the system will be operated with the minimum amount of down time.
The computer comprises an enclosure 1 that contains a motherboard 2 in the form of a printed circuit board (PCB) designed in a custom form-factor to fit the enclosure 1 and chosen to minimise the cabling within the enclosure. The motherboard 2 carries the majority of circuitry within the computer. On the motherboard are mounted one or more (in this case two) host processors or central processing units (CPUs), each of which is provided with its own dedicated cooling in the form of an impingement fan that clips onto the CPU socket. Each processor 4, 6 is provided with its own dedicated block of memory 7, 8, for example provided in the form of one or two banks of dual in-line memory modules (DIMMs) with a total of 256MB to 16GB block capacity, although other forms and sizes may be used.
A hardware cryptographic module (HCM) 10 may also be located on the motherboard. The HCM may be provided on a mezzanine card which plugs directly into the motherboard, and contains a co-processor providing cryptographic protocol acceleration support for security algorithms used in private community applications.
Two hard disc drives (HDDs) 12 and 14 are located at the front of the computer behind the front bezel 16. The drives are hot-pluggable and are accessible by removal of the bezel and EMI shield 18. Two internal HDDs plug directly into the motherboard via right-angled connectors located on the front edge of the motherboard 2.
Next to the HDDs is arranged a system configuration card reader (SCCR) 20 that is able to read a system configuration card (SCC) 22 inserted therein. The SCC contains all relevant information concerning the computer, so that it is possible to replace one computer with another simply by inserting the original SCC into the new computer and replacing the hard disc drives with those of the original computer.
A removable media drive bay is provided to allow optional fitting of a slimline (notebook style) digital video disc or digital versatile disc (DVD) drive 24 for reading CD and DVD media. The media transport loader is accessible through a slot in the enclosure bezel 16.
One or two 320W or 400W custom power supply units (PSUs) 26 are also provided. In addition to the dedicated CPU fans, the assembly is cooled by means of a row of fans 28 mounted between the motherboard and the media drive bays.
The computer supports input/output (I/O) expansion by means of peripheral component interconnect (PCI) cards that plug into expansion slots. These are accommodated by means of riser cards 29 that plug directly into the motherboard 2.
A number of I/O interfaces and sockets 30 are provided along the rear surface of the enclosure 1 including four ethernet ports 30, a network management ethernet port 70, and a serial port 72. The network management ethernet port 70 and the serial port 72 allow user access to the service processor and system console.
Figure 2 is a schematic representation of the system architecture of the computer system according to the invention. Two host processors or CPUs 4 and 6, available from Sun Microsystems under the name UltraSPARC IIIi, have an integer execution unit, a floating point and graphics unit, 32kB level 1 instruction cache, 64kB level 1 data cache, 1MB (256k x 32) level 2 data cache, a memory controller with error correction code (ECC) and an interface controller for the processor bus. Four DIMM sockets 7 and 8 are associated with each CPU.
The CPUs 4, 6 are connected to two PCI bridges 40, 42 which provide interfaces to independent 64 bit PCI buses leading to various peripheral components such as the riser cards 29, HDDs 12 and 14, the HCM 10 etc. The PCI bridge 40 is also connected to a PCI I/O device 44, available from Acer Labs under the code M1535D+, also referred to as South Bridge. This is an integrated PCI sub-system which provides an integrated drive electronics (IDE) controller, a universal serial bus (USB) controller, independent universal asynchronous receiver/transmitters (UARTs), XBUS bridge and a power management controller. The PCI I/O device 44 also provides the console interface for enabling user access to the host processors 4 and 6.
A service processor or remote management controller (RMC) 50 is included for providing local and remote management services. Such services may include one or more of the following system functions:
1) power management control,
2) environmental monitoring,
3) enclosure management and event logging,
4) fan control,
5) voltage rail monitoring, and
6) system status monitoring.
Other service functions may be included if desired. The service processor is also responsible for monitoring and reporting the operational status of the system. The processor operates from the +5V rail and is capable of power cycling and resetting of the host system. It is based on an MPC850 PowerPC design with dedicated flash ROM 62 and synchronous dynamic RAM (SDRAM) 64.
Peripheral devices that are required for the management functions include the system configuration card reader (SCCR) 20, PCI clock generator 52, general purpose IO (GPIO) devices 54, field replaceable unit identification (FRUID) devices 56, a "time-of-day" real time clock 57, and a system temperature monitor 58 provided as an Analog Devices ADM1026 IC. These devices are provided on an inter-integrated circuit (I2C) management bus 60.
As shown in figure 3, in addition to the flash ROM and SDRAM, the service controller can access electrically erasable programmable ROM (EEPROM) 66 that is provided in the temperature monitor 58 via the I2C management bus 60.
As well as monitoring the environment and managing the peripheral devices, the service processor can communicate with the PCI I/O device or console interface 44 by means of line 68. User access to the service processor 50 is available either through the 10BASE-T ethernet port 70 (NET_MGT) or through the asynchronous serial port 72 (SERIAL_MGT). In this way, remote user access is available either to the service processor 50 for management functions, or to the host processor(s) 4 and 6 via the service processor 50. Remote user access, for example by the network administrator, will normally be obtained via the ethernet port 70, while local user access will normally be obtained via the serial port 72.
Figure 4 shows the service processor 50 connected to various peripheral components, and Figure 6 shows the communication between the service processor 50 and the console interface 44 and serial port 72. As shown in figure 6, communication between the serial port 72 and the service processor 50 occurs via a multiplexer or other switching device 86. The multiplexer 86 is controlled by a signal CNSL_SW from the service processor 50 along a control line 82 so that, when the voltage on the control line is low (ground), all signals to and from the serial port 72 along line 78 are routed through the service processor 50 along line 79. On receipt of the signal from the serial port 72, the service processor 50 determines whether the signal is a management mode command, in which case it is acted upon by the service processor, or whether it is a console mode command, in which case the service processor routes the signal to the console interface 44 via line 95, the switching device or multiplexer 86 and line 96.
If, however, any malfunction occurs in the service processor 50, accessing the console interface 44 through the service processor will not be possible. In this case, control line 82 is arranged so that its voltage will rise to the appropriate rail voltage (approximately 5V) and disconnect the lines 78 and 96 from lines 79 and 95 respectively. At the same time multiplexer 86 connects line 78 directly to line 96 so that signals are transmitted directly between the serial port 72 and the console interface 44, thereby enabling the user to access the console interface 44 on failure of the service processor.
One simple way to execute such a switch is to provide a pull-up resistor 84 between the voltage rail and the line 82, so that, if the service processor is not operational to bring the line to ground, the pull-up resistor will cause its voltage to rise to the positive voltage rail value. The lines 78 and 96 may be connected by a switch that will normally be open, but will close when the voltage on the control line 82 rises. The lines 78 and 79, and the lines 95 and 96, may also be connected by switches that open as the control line voltage rises.
Such an arrangement may be realised in a number of ways, for example by means of a CMOS analogue or digital multiplexer 86 in which the control line is applied to the gates of the FET switches in the multiplexer (via inverters where necessary).
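The switch behaviour of the pull-up arrangement can be written out as a truth table in code. The sketch below is illustrative: the 2.5V logic threshold is an assumption (the text gives only ground and an approximately 5V rail), and the function names are hypothetical.

```python
RAIL_VOLTS = 5.0             # positive rail, approximately 5V in the description
LOGIC_THRESHOLD_VOLTS = 2.5  # assumed switching threshold, not given in the text


def control_line_volts(service_processor_ok):
    """Voltage on control line 82 with pull-up resistor 84 fitted."""
    # A working service processor pulls the line to ground; otherwise the
    # pull-up resistor lets the line rise to the rail.
    return 0.0 if service_processor_ok else RAIL_VOLTS


def mux_switch_states(volts):
    """Return which line pairs inside multiplexer 86 are connected."""
    bypass = volts >= LOGIC_THRESHOLD_VOLTS
    return {
        "78-79": not bypass,  # serial port to service processor
        "95-96": not bypass,  # service processor to console interface
        "78-96": bypass,      # serial port directly to console interface
    }
```

With the line at ground only the two service-processor paths are closed; with the line at the rail only the direct path between lines 78 and 96 is closed.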
As an alternative to the pull-up resistor 84, a pull-down resistor 84a as shown in Figure 4 may be employed that is connected between the control line 82 and earth. In this case the service processor 50 would hold the control line 82 at some voltage level unless it failed, in which case the control line voltage would fall to earth potential.
Figure 7 is a flow diagram showing the power-up procedure of a computer system according to the invention, which may form part of the power on self test (POST) procedure.
When the server cable is first plugged in to the system, the multiplexer 86 will be in an undefined condition. On plugging in the server cable, standby power will be applied (step 120), whereupon the control line 82 voltage will fall to earth, and the multiplexer 86 will move to its default condition in which lines 78 and 96 are connected and the service processor is bypassed (step 122). The service processor 50 then attempts to boot up (step 124) and an interrogation (step 126) occurs as to whether the boot-up has been successful. If this attempt fails, due to a malfunction of the service processor, the system will operate without the service processor (and without any of its management functions), but the service processor will be bypassed, and access to the console interface will be available. If the service processor booting operation is successful, the console multiplexer 74 will be switched on (step 128), and console commands and data will be routed via the service processor, and will continue to be so until the system is powered down or the service processor fails.
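The power-up flow of Figure 7 can be summarised in a few lines of code. This is a sketch under stated assumptions: the function name power_up and the boot callback are hypothetical, and only the step numbers and their order come from the description.

```python
def power_up(boot_service_processor):
    """Walk through steps 120 to 128 and return (console_route, step_log).

    boot_service_processor is a caller-supplied callable that returns True
    when the service processor boots successfully.
    """
    log = [
        "120: standby power applied",
        "122: multiplexer in default condition, service processor bypassed",
        "124: service processor attempts to boot",
    ]
    booted = bool(boot_service_processor())
    log.append("126: boot successful? " + ("yes" if booted else "no"))
    if booted:
        log.append("128: console multiplexer switched on")
        return "via_service_processor", log
    # Boot failed: the system runs without management functions, but the
    # console interface stays reachable through the bypass path.
    return "bypass", log
```

Calling power_up(lambda: False) leaves the console on the bypass route, which is the property the default multiplexer condition is meant to guarantee.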
Figure 4 shows the connection between various components of the system including the service processor 50 and the temperature monitor 58. The temperature monitor IC does, in fact, have its own internal junction temperature monitor, but this is not used for the purposes of temperature sensing in the system according to the invention because sensing temperature within the enclosure is extremely sensitive to the positioning of other components within the enclosure and to changes of the components. For this reason, a separate silicon band gap temperature monitor 100, available from National Semiconductor under the product code LM75, is employed. This temperature monitor is located in the front bezel 16 so that it measures the temperature of the external air that is introduced into the enclosure rather than that of air within the enclosure. The temperature value is encoded by the monitor 100 into an eight bit word and is sent to the service processor 50 along the I2C management bus 60 when requested by the processor.
The processor then calculates the desired fan speed in accordance with a speed table that has been input into the service processor memory, for example into the EEPROM 66 of the temperature monitor 58. Figure 8 shows an example of such a speed requirement. The fan speed required is relatively low, and constant with respect to temperature, until a first temperature T1, at which point the required fan speed increases linearly with temperature until temperature T2 is reached, when, again, a speed that is constant with respect to temperature is required. The upper constant speed above T2 may well be because the fans 28 are already running at maximum speed.
The precise values of temperatures T1 and T2 will vary depending on the enclosure design, the fans and the other components.
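The speed requirement of Figure 8 is a piecewise-linear function of temperature, which can be sketched as follows. The function name and the numeric values used in the usage note are assumptions for illustration; the text gives no actual values for T1, T2 or the two speed levels.

```python
def required_fan_speed(temp, t1, t2, low_speed, high_speed):
    """Piecewise-linear fan speed requirement.

    Constant at low_speed up to t1, rising linearly between t1 and t2,
    and constant at high_speed above t2.
    """
    if t2 <= t1:
        raise ValueError("t2 must be greater than t1")
    if temp <= t1:
        return low_speed
    if temp >= t2:
        return high_speed
    fraction = (temp - t1) / (t2 - t1)
    return low_speed + fraction * (high_speed - low_speed)
```

With the illustrative values t1=25, t2=45, low_speed=30 and high_speed=100, a reading of 35 lies halfway along the ramp and gives a required speed of 65.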
The service processor then sends the required fan speed value to the temperature monitor 58 via the I2C management bus 60, whereupon it is converted to a pulse width modulated (PWM) signal and sent to the control input of the fan unit 28 along line 102.
The temperature monitor 58 also provides counter inputs which are used to monitor the rotational speed of all the fans within the enclosure, not only the enclosure fans 28, but also the dedicated fans for the CPUs. The fans provide tachometer output signals for this purpose, along lines 104. The signals are in open-drain, two-pulse-per-revolution logic format. The service processor compares the measured speed values against minimum thresholds and issues alerts when required.
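Because the tachometer outputs give two pulses per revolution, converting a pulse count into a rotational speed and checking it against a minimum threshold is simple arithmetic. The sketch below assumes pulses are counted over a fixed time window; the function names, fan labels and threshold values are hypothetical.

```python
PULSES_PER_REVOLUTION = 2  # stated in the description


def rpm_from_pulse_count(pulse_count, window_seconds):
    """Convert pulses counted in a time window to revolutions per minute."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    revolutions = pulse_count / PULSES_PER_REVOLUTION
    return revolutions * 60.0 / window_seconds


def fan_alerts(measured_rpm, minimum_rpm):
    """Return the names of fans running below their minimum threshold."""
    return sorted(
        name for name, rpm in measured_rpm.items() if rpm < minimum_rpm[name]
    )
```

For instance, 200 pulses counted in one second correspond to 100 revolutions per second, or 6000 rpm.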
As described, the service processor runs the enclosure fans 28 under open-loop control in accordance with the fan speed requirement given in Figure 8. The service processor could, if desired, run the fans under closed-loop control, for example by reading the tachometer data supplied to the temperature monitor 58 along lines 104 and taking the difference between the tachometer readings and a demanded tachometer level.
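The closed-loop alternative amounts to feeding the difference between demanded and measured tachometer levels back into the drive signal. The text does not specify a control law, so the sketch below assumes a simple incremental proportional correction; the function name and the gain value are illustrative only.

```python
def closed_loop_step(duty_percent, demanded_rpm, measured_rpm, gain=0.01):
    """One control iteration: adjust the PWM duty cycle from the tachometer error.

    gain is an assumed tuning constant in duty-percent per rpm of error.
    """
    error = demanded_rpm - measured_rpm
    new_duty = duty_percent + gain * error
    # Clamp to the physically meaningful range of a PWM duty cycle.
    return max(0.0, min(100.0, new_duty))
```

If the fans are at 50% duty and run 1000 rpm below the demanded level, one step with the assumed gain raises the duty cycle to about 60%. Repeated steps move the measured speed towards the demanded level, and the clamp keeps the result within 0 to 100%.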
A control line 82 (CNSL_SW) extends from the service processor 50 to the thermal reset input (THERM) 106 of the temperature monitor 58. The control line 82 is also connected to a pull-up resistor 84 connected to the positive 5V rail 85. The control line will be held down to earth voltage by the service processor 50 once the service processor is booted up, but, should the service processor fail for any reason, whether a hardware or a software failure, the control line 82 voltage will rise to the 5V rail due to its connection via the pull-up resistor. This voltage is then fed into the thermal reset input of the temperature monitor 58, which then sets the signal on the fan speed line 102 to maximum (i.e. a pulse width of 100%).
As shown in Figure 4, signals from the service processor to the ethernet port 70 are controlled within the service processor by the media access controller 80, which handles the open systems interconnection (OSI) level 2 (data link layer) protocols, and are sent to the ethernet port 70 via physical lines 81 and a physical interface or PHY 71, which provides power for sending the signals along the ethernet cabling, and provides other functions such as a clock and line coding. Manchester encoding is employed in this case, but other forms of line coding that are appropriate to the channel characteristics may be used. A control line 82 (CNSL_SW) extends from the service processor to the reset input 83 of the PHY 71 for the management ethernet port. The control line is also connected to a pull-up resistor 84 connected to the positive 5V rail.
Management data is transmitted between the PHY 71 and the ethernet port 70 by means of an eight conductor cable. In addition, LEDs on the RJ45 socket forming the ethernet port will light up, one LED indicating that the port is operative, and the other LED turning on whenever there is traffic on the line. Other forms of cable may be employed, depending on the form of the ethernet port, and indeed other forms of port may be used.
The control line 82 is also connected to a multiplexer 86 which controls signals between the asynchronous serial port 72 and the host processor I/O device 44.
Figure 5 shows part of a network in which a server 1 communicates with a number of switches 90 by means of ethernet cabling 92 connected to the data ethernet ports 30, the switches 90 then being connected to the internet/intranet 93. In addition, the server 1 is connected to a management switch 91 and thence to the network administrator by means of the management ethernet port 70 connected to the service processor 50.
Under normal operation, data will be transmitted to and from the switches 90 by means of the data ethernet ports 30. At the same time, management data will be transmitted between the server 1 and the network administrator 94 via the management ethernet port 70 and management switch 91. In this mode, the service processor 50 holds the voltage of the control line 82 to earth potential against the pull-up resistor 84.
However, if for any reason the service processor 50 should malfunction, whether this is caused by a hardware fault or a software problem, the control signal from the service processor will be lost and the voltage on the control line 82 will rise to the 5V rail voltage due to the pull-up resistor 84. This voltage will be fed into the reset input 83 of the management ethernet port PHY 71 and switch the PHY off. Turning the PHY 71 off will cause the network administrator 94 to become aware of the malfunction since it will not be possible to send or receive management data to or from the server 1. In addition, any user who inspects the server will be able to see that the LEDs 88 and 89 are turned off. In addition, if the service processor fails, interference from the host processors 4 and 6, which are still operating, will not be picked up by the lines 81, amplified and coded by the PHY 71 and sent along the network ethernet lines 92, where it would otherwise cause interference in other parts of the network.
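On the administrator's side, loss of management traffic can be detected with a simple silence timeout. The sketch below is one possible approach, not something the text prescribes: the class name, the timeout value and the server labels are all hypothetical.

```python
class ManagementTrafficMonitor:
    """Tracks when management traffic was last seen from each server."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}

    def record_traffic(self, server, timestamp):
        """Note that management data arrived from a server at the given time."""
        self.last_seen[server] = timestamp

    def silent_servers(self, now):
        """Return servers whose management port has been silent too long."""
        return sorted(
            server
            for server, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

A server listed by silent_servers is a candidate for inspection, since a PHY held in reset by the control line would produce exactly this silence.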
At the same time as the PHY 71 is turned off, the change in voltage on the control line 82 will cause the multiplexer 86 to stop sending data from the serial port 72 to the service processor 50, and instead bypass the service processor so that the data are sent directly between the UART 72 and the host processor PC I/O device 44. A user may still be able to access the host processors via the serial port 72 since the service processor will be bypassed.
As shown in Figure 4, the service processor 50 also controls the ethernet port 70 which is connected to the system administrator 94. In fact, as shown in Figure 5, the server is connected to the network administrator via a further server 91 or switch. The computer is also connected to servers 90 as part of a network 93. If the service processor 50 malfunctions, it is possible for interference on lines 81 leading to the ethernet port 70 generated by, for example, the host processor(s) 4, 6, to be sent to the network. In order to prevent this, the control line 82 is connected to the reset input 83 of the physical interface 71 for the ethernet port 70 so that, on failure of the service processor 50, the ethernet port 70 is quiesced and the network administrator 94 becomes aware of the fault.
In addition, the control line 82 is connected to the thermal reset input 106 of the temperature monitor 58. Under normal operation the temperature monitor will send pulse width modulated fan speed signals to the fans 28 along line 102 under command of the service processor 50. When the service processor fails and the voltage on the control line 82 rises, this voltage is also fed into the thermal reset input 106 of the temperature monitor and the enclosure fans 28 are then driven at full speed.
In this way, should a malfunction of the service processor occur, the system ensures that no noise is transferred to the network, that the system is adequately cooled and that communication to the host processor(s) is still possible.
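The combined effect of the single control line 82 on the three devices can be captured in one small model. This is a summary sketch only; the dataclass, its field names and the function name are hypothetical labels for the behaviour described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailSafeState:
    phy_enabled: bool        # management ethernet PHY 71 operative?
    fan_duty_percent: float  # PWM duty on fan speed line 102
    serial_route: str        # path between serial port 72 and console interface 44


def fail_safe_state(service_processor_ok, requested_fan_duty_percent):
    """State of the three devices driven by control line 82."""
    if service_processor_ok:
        # Line held at earth: normal managed operation.
        return FailSafeState(True, requested_fan_duty_percent, "via_service_processor")
    # Line pulled up to the rail: PHY in reset, fans at a 100% pulse width,
    # serial traffic bypassing the service processor.
    return FailSafeState(False, 100.0, "bypass")
```

One signal therefore drives all three fail-safe behaviours, so no software on the failed service processor needs to run for them to take effect.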
The scope of the present disclosure includes any novel feature or combination of features disclosed therein either explicitly or implicitly or any generalization thereof irrespective of whether or not it relates to the claimed invention or mitigates any or all of the problems addressed by the present invention. The applicant hereby gives notice that new claims can be formulated to such features during prosecution of this application or of any such further application derived therefrom. In particular, with reference to the appended claims, features from dependent claims can be combined with those of the independent claims and features from respective independent claims can be combined in any appropriate manner and not merely in the specific combinations enumerated in the claims.

Claims (39)

  1. A computer system which comprises:
    (i) a host processor;
    (ii) a service processor for providing system management functions within the computer system, the service processor further providing a signal indicative of normal operation thereof; and
    (iii) a device that can operate under a first condition in which it is controlled in dependence of the signal from the service processor, and under a second condition, different from the first condition, when no such signal is generated by the service processor.
  2. A computer system as claimed in claim 1, which includes a user interface for receiving external commands and data for the service processor and/or the host processor, and for sending data from the service processor and/or the host processor; wherein, under the first condition, the device is operative to route commands and data between the user interface and the host processor via the service processor, and under the second condition, the device is operative to route commands and data between the user interface and the host processor bypassing the service processor.
  3. A computer system as claimed in claim 1 or claim 2, wherein the service processor is responsive to external mode switching commands to operate either in a management mode in which commands received are processed by the service processor, or in a console mode in which commands received are passed by the service processor to a console interface for processing by the host processor.
  4. A system as claimed in claim 1, wherein the device is a signal multiplexer.
  5. A system as claimed in claim 3, wherein the device includes a bus that extends between the user interface and the host processor, the bus including a FET whose gate is connected to a voltage level output of the service processor and to a pull-up resistor.
  6. A system as claimed in any one of claims 1 to 4, which includes a program that is executable on powering the system up, the program including: 1) code for causing the routing device to bypass the service controller; and 2) code for booting the service controller.
  7. A system as claimed in claim 6, wherein the program includes code for causing the service processor to send the signal to the device once the service processor is booted.
  8. A system as claimed in any one of claims 1 to 7, which includes code for causing the service controller to implement internal switching between the management and console modes.
  9. A system as claimed in any one of claims 1 to 8, wherein the service processor includes memory for accepting data from the console interface when the service processor is in management mode.
  10. A system as claimed in claim 9, wherein the memory comprises dynamic RAM.
  11. A system as claimed in claim 9 or claim 10, which includes further memory outside the service processor that holds application specific information relating to the system management functions.
  12. A system as claimed in claim 11, wherein the further memory comprises an electrically erasable programmable read only memory.
  13. A system as claimed in any one of claims 1 to 12, wherein the device is connected to the user interface via a port transceiver.
  14. A system as claimed in any one of claims 1 to 13, which includes a temperature sensor for sensing temperature in, or in relation to, an enclosure of the system; and one or more fans for cooling the enclosure; wherein the device comprises a fan controller which, under the first condition, is operative to provide a driving signal for the fan or at least one of the fans in response to fan speed signals generated by the service processor, and which, under the second condition, will increase the fan speed to a predetermined value.
  15. A system as claimed in claim 14, wherein, under the second condition, the fan controller will increase the fan speed to the maximum that can be driven by the fan controller.
  16. A system as claimed in claim 14 or claim 15, wherein the fan controller generates a pulse-width-modulated driving signal.
  17. A system as claimed in any one of claims 14 to 16, wherein, under the second condition, the fan speed signal has a pulse width of 100%.
  18. A system as claimed in any one of claims 14 to 17, wherein the temperature sensor is located in the region of a wall of the enclosure.
  19. A system as claimed in any one of claims 14 to 18, wherein the temperature sensor is arranged to measure the temperature of air outside the enclosure that is drawn into the enclosure.
  20. A system as claimed in any one of claims 14 to 19, wherein the service processor generates the fan speed signals under open-loop control.
  21. A system as claimed in any one of claims 14 to 20, wherein the service controller generates fan speed signals to increase the speed of the fan(s) linearly with temperature, at least over a predetermined temperature range.
  22. A system as claimed in any one of claims 14 to 21, wherein the fan controller has a thermal reset input that is controlled by the control signal of the service processor.
  23. A system as claimed in any one of claims 14 to 22, wherein the fan controller forms part of an environmental management integrated circuit.
  24. A system as claimed in any one of claims 1 to 23, which includes one or more external communication ports including a management communication port, wherein the device comprises an interface for the management communication port which, under the first condition, allows the management communication port to communicate with the service processor, and, under the second condition, prevents such communication.
  25. A system as claimed in claim 24, wherein the management communication port is an ethernet port.
  26. A system as claimed in claim 24 or claim 25, wherein the management communication port includes a physical interface that provides power for signals sent from the management communication device.
  27. A system as claimed in any one of claims 24 to 26, wherein the management communication port includes an indicator for indicating whether or not a communication link is established and/or whether or not traffic is being sent or received, which indicator is quiesced when the management communication device is inoperative.
  28. A system as claimed in claim 27, wherein the signal from the service processor is sent to a reset input of the physical interface to cause the physical interface to become inoperative under the second condition.
  29. A system as claimed in any one of claims 24 to 28, which includes additional external communication ports that are controlled by the or each host processor.
  30. A system as claimed in any one of claims 24 to 29, wherein the additional external communication ports are ethernet ports.
  31. A system as claimed in any one of claims 1 to 30, wherein the signal indicative of normal operation of the service processor comprises a voltage level that is governed by a voltage supplied by the service processor and a pull-up or pull-down resistor, so that if the voltage level approximates to the voltage supplied by the service processor, the device will operate under the first condition.
  32. A system as claimed in any one of claims 1 to 31, wherein the service processor provides one or more of the following system functions: 1) power management control, 2) environmental monitoring, 3) enclosure management and event logging, 4) fan control, 5) voltage rail monitoring, and 6) system status monitoring.
  33. A system as claimed in any one of claims 1 to 32, which is a computer server.
  34. A computer system which comprises:
    (i) a host processor;
    (ii) a service processor for providing system management functions within the computer system, the service processor being responsive to external mode switching commands to operate either in a management mode in which commands received are processed by the service processor, or in a console mode in which commands received are passed by the service processor to a console interface for processing by the host processor;
    (iii) a user interface for receiving external commands and data for the service processor and/or the host processor, and for sending data from the service processor and/or the host processor; and
    (iv) a device for routing the commands and data to and from the user interface via the service processor only when the device receives a signal from the service processor so that, in the absence of the signal, the commands and data are sent between the user interface and the console interface bypassing the service processor.
  35. A computer system which comprises:
    (i) a host processor;
    (ii) a temperature sensor for sensing temperature in, or in relation to, an enclosure of the system;
    (iii) one or more fans for cooling the enclosure;
    (iv) a service processor for providing system management functions within the computer system, including generating fan speed signals in response to temperature values detected by the temperature sensor, and generating a control signal; and
    (v) a fan controller for providing a driving signal for the fan or at least one of the fans in response to the fan speed signals generated by the service processor only when the fan controller also receives the control signal from the service processor;
    wherein, in the absence of the control signal from the service processor, the fan controller will alter the driving signal to increase the fan speed to a predetermined fan speed.
  36. A computer system which comprises:
    (i) a host processor;
    (ii) a service processor for providing system management functions within the computer system; and
    (iii) one or more external communication devices, the external communication devices including at least one management communication device that communicates with the service processor;
    wherein the management communication device is controlled by a signal from the service processor and is operative to send and receive data only when it receives the signal from the service processor.
  37. A network, which comprises a computer system as claimed in claim 36 and a network administrator which communicates with the service processor by means of the management communication device.
  38. A method of operating a network which includes at least one computer server comprising:
    (i) a host processor;
    (ii) a service processor for providing system management functions within the computer system; and
    (iii) one or more external communication devices, the external communication devices including at least one management communication device that communicates with the service processor and with a network administrator;
    the management communication device being controlled by a signal from the service processor and being operative to send and receive data only when it receives the signal from the service processor, which method comprises monitoring traffic from the management communication device and effecting repair or replacement of the service processor or of the server in the event of loss of traffic from the management communication device.
  39. A computer system substantially as hereinbefore described with reference to, and as shown in, the accompanying drawings.
GB0318830A 2002-08-09 2003-08-11 Computer system having data and commands routed via service processor Expired - Fee Related GB2393817B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB0513331A GB2412772B (en) 2002-08-09 2003-08-11 Computer assembly having management communication device controlled by service processor
GB0513332A GB2412773B (en) 2002-08-09 2003-08-11 Computer assembly with malfunction resistant fan controller

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US10/216,537 US6813150B2 (en) 2002-08-09 2002-08-09 Computer system
US10/216,541 US7424555B2 (en) 2002-08-09 2002-08-09 Computer assembly
US10/216,536 US6954358B2 (en) 2002-08-09 2002-08-09 Computer assembly

Publications (3)

Publication Number Publication Date
GB0318830D0 GB0318830D0 (en) 2003-09-10
GB2393817A true GB2393817A (en) 2004-04-07
GB2393817B GB2393817B (en) 2006-01-25

Family

ID=28046343

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0318830A Expired - Fee Related GB2393817B (en) 2002-08-09 2003-08-11 Computer system having data and commands routed via service processor

Country Status (1)

Country Link
GB (1) GB2393817B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020042896A1 (en) * 1997-05-13 2002-04-11 Johnson Karl S. Diagnostic and managing distributed processor system
GB2371380A (en) * 2001-01-08 2002-07-24 Sun Microsystems Inc Service processor interface
US20030023887A1 (en) * 2001-07-30 2003-01-30 Maciorowski David R. Computer system with backup management for handling embedded processor failure

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005109200A2 (en) * 2004-05-04 2005-11-17 Sun Microsystems, Inc. Service redundancy
WO2005109200A3 (en) * 2004-05-04 2006-05-04 Sun Microsystems Inc Service redundancy
US7325154B2 (en) 2004-05-04 2008-01-29 Sun Microsystems, Inc. Service redundancy

Also Published As

Publication number Publication date
GB0318830D0 (en) 2003-09-10
GB2393817B (en) 2006-01-25

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20080811