CN112631866A

CN112631866A - Server hardware state monitoring method and device, electronic equipment and medium

Info

Publication number: CN112631866A
Application number: CN202011564397.9A
Authority: CN
Inventors: 胡俊文
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-12-25
Filing date: 2020-12-25
Publication date: 2021-04-09
Also published as: WO2022134352A1

Abstract

The invention relates to security monitoring technology and discloses a server hardware state monitoring method, comprising: starting a trap notification service in a server and setting a trap receiver; when a hardware fault of the server is detected, triggering the trap notification service to obtain a trap notification message, and receiving the trap notification message with the trap receiver; parsing the trap notification message to obtain an out-of-band IP of the server; retrieving fault information of the server through the out-of-band IP and initiating an inspection service for managing the server; and polling the hardware state of the server with the inspection service according to the fault information, and performing alarm monitoring according to the hardware state. The invention also provides a server hardware state monitoring device, an electronic device and a storage medium. The invention further relates to blockchain technology: the fault information of the server can be stored in blockchain nodes. The invention can efficiently monitor the hardware state of the server.

Description

Server hardware state monitoring method and device, electronic equipment and medium

Technical Field

The present invention relates to the field of security monitoring technologies, and in particular, to a method and an apparatus for monitoring a hardware status of a server, an electronic device, and a computer-readable storage medium.

Background

The hardware of a server often fails, so the hardware state of the server needs to be monitored. When the number of servers is large, existing server monitoring methods generally adopt a distributed framework, but a distributed monitoring framework imposes high subsequent maintenance costs and places high demands on operation and maintenance engineers, so most companies or enterprises choose a self-developed monitoring framework instead.

Existing self-developed frameworks generally work in either an active polling mode or a passive reporting mode. With active polling, the concurrency of the framework grows with the number of servers; with passive reporting, it is difficult to parse the differences among the various kinds of reported information. In both modes the framework obtains the faulty-hardware information of the server from log information, which is often very complex, so the processing efficiency of such frameworks is difficult to improve.

Disclosure of Invention

The invention provides a method and a device for monitoring a hardware state of a server, electronic equipment and a computer readable storage medium, and mainly aims to efficiently monitor the hardware state of the server.

In order to achieve the above object, the present invention provides a method for monitoring a hardware status of a server, including:

according to a hardware state monitoring instruction of a server, starting a trap notification service of a simple network management protocol in the server, and setting a trap receiver of the server;

when the hardware of the server is monitored to be in fault, triggering the trap notification service to obtain a trap notification message, and receiving the trap notification message sent by the trap notification service by using the trap receiver;

analyzing the trap notification message to obtain an out-of-band IP of the server, and sending the out-of-band IP to a fault alarm management platform of the server;

retrieving fault information of the server through the out-of-band IP, and initiating an inspection service for managing the server by using the fault alarm management platform;

and polling the hardware state of the server by using the inspection service according to the fault information, and performing alarm monitoring according to the hardware state.

Optionally, the starting a trap notification service of the simple network management protocol in the server includes:

acquiring a trap notification file of the simple network management protocol, and inquiring a baseboard management controller of the server;

loading the trap notification file of the simple network management protocol in the baseboard management controller so as to start the trap notification service of the simple network management protocol.

Optionally, when it is monitored that hardware of the server fails, before triggering the trap notification service to obtain a trap notification message, the method further includes:

setting a fault threshold in the baseboard management controller according to the trap notification service;

when any hardware of the server fails, adding one to a failure value;

and when the fault value reaches the fault threshold value, judging that the hardware of the server is in fault.

Optionally, the retrieving the fault information of the server through the out-of-band IP includes:

acquiring a preset private mapping table of the server, wherein the private mapping table comprises a mapping relation between an out-of-band IP and fault information;

and inquiring the private mapping table according to the out-of-band IP of the server to obtain the fault information of the server.

Optionally, the polling the hardware state of the server according to the fault information by using the inspection service includes:

calling a representational state transfer (REST) API interface by using the inspection service;

acquiring a mark character string containing server hardware information through the REST API interface to obtain all hardware states of the server;

retrieving the server by using the fault information to obtain fault hardware of the server;

and screening all hardware states based on the fault hardware of the server to obtain the hardware state of the fault hardware of the server.
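The screening step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the hardware states have already been parsed into a dictionary keyed by component name and that the faulty hardware is known as a set of names; the function name and the sample values are hypothetical and do not come from the application.

```python
from typing import Dict, Set


def screen_faulty_hardware(all_states: Dict[str, str], faulty: Set[str]) -> Dict[str, str]:
    """Keep only the states of the components identified as faulty."""
    return {name: state for name, state in all_states.items() if name in faulty}


# Hypothetical sample data: all hardware states returned by the inspection service.
all_states = {"fan": "Critical", "power supply": "OK", "GPU card": "Warning"}
# Hypothetical faulty hardware derived from the fault information.
faulty_hardware = {"fan", "GPU card"}

faulty_states = screen_faulty_hardware(all_states, faulty_hardware)
```

Only the entries for the faulty components are passed on to alarm monitoring; healthy components are dropped at this point.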

Optionally, the polling all hardware states of the server according to the inspection service includes:

initiating an inspection request by using the inspection service;

sending a resource calling request carrying the HTTP uniform resource locator of the inspection request to a Web service process in the baseboard management controller;

and acquiring, by the Web service process, all hardware states of the server from the baseboard management controller according to the HTTP uniform resource locator, and sending all hardware states of the server to the inspection service.

Optionally, the performing alarm monitoring according to the hardware state includes:

setting an alarm threshold according to the hardware state of the server;

presetting an alarm threshold value, and acquiring fault hardware of the server according to the hardware state of the server;

setting weight for the fault hardware, and calculating fault values of all the fault hardware to obtain a total fault value;

if the total fault value is lower than the alarm threshold, no alarm is executed;

and if the total fault value is not lower than the alarm threshold, acquiring the information and the hardware information record of the server by using the fault alarm management platform, and notifying operation and maintenance personnel of the information and the hardware information record of the server in a preset mode.

In order to solve the above problem, the present invention further provides a server hardware status monitoring apparatus, including:

the setting module is used for starting a trap notification service of a simple network management protocol in a server according to a hardware state monitoring instruction of the server, and setting a trap receiver of the server;

the trap notification sending module is used for triggering the trap notification service to obtain a trap notification message when monitoring that the hardware of the server fails, and receiving the trap notification message sent by the trap notification service by using the trap receiver;

the IP address acquisition module is used for analyzing the trap notification message, acquiring an out-of-band IP of the server and sending the out-of-band IP to a fault alarm management platform of the server;

the inspection service initiating module is used for retrieving the fault information of the server through the out-of-band IP and initiating the inspection service for managing the server by utilizing the fault alarm management platform;

and the hardware alarm monitoring module is used for polling the hardware state of the server by utilizing the polling service according to the fault information and carrying out alarm monitoring according to the hardware state.

In order to solve the above problem, the present invention also provides an electronic device, including:

a memory storing at least one computer program; and

a processor that executes the computer program stored in the memory to implement the server hardware state monitoring method described above.

In order to solve the above problem, the present invention further provides a computer-readable storage medium including a storage data area and a storage program area, the storage data area storing created data, the storage program area storing a computer program; wherein the computer program, when executed by a processor, implements a server hardware status monitoring method as described above.

The server hardware state monitoring method and device, the electronic equipment and the computer-readable storage medium of the embodiments of the present invention start a trap notification service of the simple network management protocol in the server and set a trap receiver of the server, obtain an out-of-band IP that is used to retrieve the fault information of the server, and perform alarm monitoring of the hardware state of the server using that fault information in a fault alarm management platform of the server. Because the server fault information is obtained through the out-of-band IP, the excessive complexity of obtaining the fault information from log information is avoided, and the hardware state of the server can be monitored efficiently.

Drawings

Fig. 1 is a schematic flowchart of a method for monitoring a hardware status of a server according to an embodiment of the present invention;

Fig. 2 is a schematic block diagram of a server hardware status monitoring apparatus according to an embodiment of the present invention;

Fig. 3 is a schematic internal structural diagram of an electronic device for implementing a server hardware status monitoring method according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The embodiment of the application provides a server hardware state monitoring method. The execution subject of the server hardware state monitoring method includes, but is not limited to, at least one of electronic devices such as a server and a terminal that can be configured to execute the method provided by the embodiments of the present application. In other words, the server hardware status monitoring method may be performed by software or hardware installed in the terminal device or the server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.

Fig. 1 is a schematic flowchart of a method for monitoring a hardware state of a server according to an embodiment of the present invention. In this embodiment, the method for monitoring the hardware state of the server includes:

S1, according to the hardware state monitoring instruction of the server, starting a trap notification service of the simple network management protocol in the server, and setting a trap receiver of the server.

In the embodiment of the present invention, the Simple Network Management Protocol (SNMP) is used to manage and monitor abnormal conditions of network devices (such as servers, workstations, routers, switches, hubs, and the like). The trap notification (trap) service is used to report abnormal conditions. The trap receiver (snmptrapd) is used to receive and record the notifications sent by the trap notification service.

In detail, in the embodiment of the present invention, the starting of the trap notification service of the simple network management protocol in the server includes:

querying a Baseboard Management Controller (BMC) of the server, and creating an integrated manager in the baseboard management controller;

when the server acquires a hardware state monitoring instruction, initiating a trap notification service opening instruction to the baseboard management controller through the integrated manager;

and based on the trap notification service opening instruction, acquiring the trap notification file of the simple network management protocol by using the baseboard management controller, and opening the trap notification service.

In detail, the integrated manager initiates the trap notification service opening instruction to the baseboard management controller according to the hardware state monitoring instruction.

S2, when the hardware of the server is monitored to be in fault, triggering the trap notification service to obtain a trap notification message, and receiving the trap notification message sent by the trap notification service by using the trap receiver.

In the embodiment of the invention, the server is monitored using the simple network management protocol, and when a hardware fault of the server is detected, the trap notification service is triggered to obtain the trap notification message. In detail, the trap notification service (SNMP Trap) is a passive service driven by hardware faults of the server: a trap is set in the monitored server, and when a hardware fault of the server occurs, the trap notification service is triggered to obtain a trap notification message.

Further, in the embodiment of the present invention, a fault threshold is set in the baseboard management controller (BMC) through the trap notification service; each time any hardware of the server fails, a fault value is incremented by one, and when the fault value reaches the fault threshold, it is determined that the hardware of the server has failed.
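The counting rule can be sketched as follows. This is an illustrative sketch only: the class name, the method name and the threshold of 3 are hypothetical, since the application does not specify a threshold value or how the counter is held in the baseboard management controller.

```python
from dataclasses import dataclass


@dataclass
class FaultCounter:
    """Counts hardware faults and reports when the fault threshold is reached."""

    threshold: int  # hypothetical value; the source does not give one
    value: int = 0  # fault value, starts at zero

    def record_fault(self) -> bool:
        """Add one to the fault value; return True once the threshold is reached."""
        self.value += 1
        return self.value >= self.threshold


counter = FaultCounter(threshold=3)
# Three consecutive hardware faults: only the third one reaches the threshold.
results = [counter.record_fault() for _ in range(3)]
```

Under these assumptions the first two faults are only counted, and the third one is the point at which the trap notification service would be triggered.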

Further, the trap notification message is obtained by triggering the trap notification service when the server has a hardware failure. In this embodiment of the present invention, the trap notification message includes information on the failed server, such as the IP address of the failed server, the service life of the failed server, and the like.

S3, analyzing the trap notification message to obtain the out-of-band IP of the server, and sending the out-of-band IP to a fault alarm management platform of the server.

In detail, the analyzing the trap notification message to obtain the out-of-band IP of the server includes:

acquiring, by the trap receiver, a uniform resource locator of the trap notification message;

splitting the uniform resource locator according to a known uniform resource service protocol to obtain path information of the trap notification message;

and querying the path information to obtain the out-of-band IP of the server.

The out-of-band IP of the server is used for debugging inside the server; it is not used for external communication, and no network outside the server is needed to access it.
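The parsing steps above can be sketched with the Python standard library. The URL layout is an assumption made only for this sketch: the application does not describe the format of the uniform resource locator, so the example URL, the host name and the idea that the out-of-band IP appears as one path segment are all hypothetical.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit


def extract_out_of_band_ip(trap_url: str) -> Optional[str]:
    """Split the URL, then scan its path segments for the first valid IP address."""
    path = urlsplit(trap_url).path  # path information of the trap notification message
    for segment in path.split("/"):
        try:
            return str(ipaddress.ip_address(segment))
        except ValueError:
            continue  # not an IP address, try the next segment
    return None


# Hypothetical URL; the real format is not given in the source.
example_url = "snmp://trap-receiver.example/traps/10.20.30.40/fan"
oob_ip = extract_out_of_band_ip(example_url)
```

Returning `None` when no segment is a valid address lets the caller decide how to handle a malformed message instead of forwarding a bad value to the fault alarm management platform.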

In the embodiment of the invention, the fault alarm management platform is a pre-constructed platform for carrying out centralized management on the fault alarms.

S4, retrieving the fault information of the server through the out-of-band IP, and initiating the inspection service for managing the server by using the fault alarm management platform.

In detail, the retrieving the fault information of the server through the out-of-band IP includes:

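acquiring a preset private mapping table of the server;

and querying the private mapping table according to the out-of-band IP of the server to obtain the fault information of the server.

The private mapping table of the server comprises IP address information, such as the mapping relation between the out-of-band IP and the fault information.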
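A lookup in the private mapping table might look like the sketch below. The table contents, the addresses and the function name are hypothetical; the application only states that the table maps an out-of-band IP to fault information, not how it is stored.

```python
from typing import Dict, Optional

# Hypothetical private mapping table: out-of-band IP -> fault information.
PRIVATE_MAPPING_TABLE: Dict[str, str] = {
    "10.20.30.40": "fan",
    "10.20.30.41": "power supply",
}


def lookup_fault_info(oob_ip: str, table: Dict[str, str]) -> Optional[str]:
    """Return the fault information recorded for the out-of-band IP, if any."""
    return table.get(oob_ip)


fault_info = lookup_fault_info("10.20.30.40", PRIVATE_MAPPING_TABLE)
unknown = lookup_fault_info("10.20.30.99", PRIVATE_MAPPING_TABLE)
```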

Further, the inspection service uses Redfish, an open industry standard published by the Distributed Management Task Force (DMTF) for modern and secure management of platform hardware.

S5, polling the hardware state of the server by using the inspection service according to the fault information, and carrying out alarm monitoring according to the hardware state.

In detail, the polling the hardware state of the server according to the fault information by using the polling service includes:

In detail, the Representational State Transfer (RESTful) API is an API interface for acquiring the mark character string. The Uniform Resource Identifier (URI) includes information on the failed server.

In detail, all hardware states of the server are obtained by the inspection service polling each hardware state in the server. For example, the inspection service polls the rotating speed of the server fan, the hard disk temperature, the remaining life of the hard disk, and the like.

Specifically, the polling of all hardware states of the server according to the polling service includes:

initiating an inspection request by using the inspection service; sending a resource calling request carrying the HTTP uniform resource locator of the inspection request to a Web service process in the baseboard management controller; and acquiring, by the Web service process, all hardware states of the server from the baseboard management controller according to the HTTP uniform resource locator, and sending all hardware states of the server to the inspection service.
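The request and response handling can be sketched without any network access. The sketch assumes the Redfish `Thermal` resource of a chassis as one example of a hardware-state endpoint; the chassis identifier, the use of HTTPS, the function names and the sample payload are assumptions for illustration, and a real baseboard management controller may expose different resources or require authentication.

```python
import json
from typing import Dict


def build_thermal_url(oob_ip: str, chassis_id: str = "1") -> str:
    """Build the URL of the (assumed) Redfish Thermal resource on the BMC."""
    return f"https://{oob_ip}/redfish/v1/Chassis/{chassis_id}/Thermal"


def parse_fan_health(payload: str) -> Dict[str, str]:
    """Map each fan name to its reported health from a Thermal-style JSON body."""
    data = json.loads(payload)
    return {fan["Name"]: fan["Status"]["Health"] for fan in data.get("Fans", [])}


# Hypothetical response body, shaped like a Redfish Thermal resource.
SAMPLE_PAYLOAD = json.dumps({
    "Fans": [
        {"Name": "Fan1", "Reading": 4200, "Status": {"State": "Enabled", "Health": "OK"}},
        {"Name": "Fan2", "Reading": 0, "Status": {"State": "Enabled", "Health": "Critical"}},
    ]
})

url = build_thermal_url("10.20.30.40")
fan_health = parse_fan_health(SAMPLE_PAYLOAD)
```

In a deployment the payload would come from an HTTP GET against the built URL; keeping URL construction and parsing as separate pure functions makes both easy to check in isolation.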

Further, the performing alarm monitoring according to the hardware state includes:

Further, for example, the hardware of the server includes a temperature sensor, a fan, a power supply, a GPU card, etc.; the alarm threshold is set to 4 and the initial alarm value is 0. When the temperature sensor fails, the initial alarm value is increased by 2 to obtain an alarm value A; when the fan fails, the alarm value A is increased by 1 to obtain an alarm value B; when the power supply fails, the alarm value B is increased by 2 to obtain an alarm value C; when the GPU card fails, the alarm value C is increased by 2 to obtain an alarm value D. When the alarm value D is lower than the alarm threshold, the hardware state of the server is below the threshold.
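The weighted example can be worked through in a short sketch. The weights (2, 1, 2, 2) and the threshold of 4 are taken from the example above; the function names are hypothetical, and the alarm rule follows the stated condition that an alarm is raised when the total fault value is not lower than the threshold.

```python
from typing import Dict, Iterable

# Weights and threshold from the example in the text.
WEIGHTS: Dict[str, int] = {
    "temperature sensor": 2,
    "fan": 1,
    "power supply": 2,
    "GPU card": 2,
}
ALARM_THRESHOLD = 4


def total_fault_value(failed: Iterable[str], weights: Dict[str, int]) -> int:
    """Sum the weights of all failed hardware components."""
    return sum(weights[name] for name in failed)


def should_alarm(failed: Iterable[str], weights: Dict[str, int], threshold: int) -> bool:
    """Alarm when the total fault value is not lower than the threshold."""
    return total_fault_value(failed, weights) >= threshold


fan_only = should_alarm(["fan"], WEIGHTS, ALARM_THRESHOLD)
sensor_and_power = should_alarm(["temperature sensor", "power supply"], WEIGHTS, ALARM_THRESHOLD)
all_failed_total = total_fault_value(WEIGHTS.keys(), WEIGHTS)
```

With these weights a single fan failure totals 1 and stays below the threshold, a temperature sensor failure plus a power supply failure totals exactly 4 and triggers an alarm, and a failure of all four components totals 2 + 1 + 2 + 2 = 7.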

The server hardware state monitoring method, the server hardware state monitoring device, the electronic device and the computer readable storage medium of the embodiments of the present invention start a trap notification service of a simple network management protocol in a server and set a trap receiver of the server, obtain an out-of-band IP for obtaining the server fault information, and perform the hardware state alarm monitoring of the server using the fault information of the server in a fault alarm management platform of the server, thereby avoiding the problem that obtaining the fault information of the server using log information is too complicated, and thus achieving the purpose of efficiently monitoring the hardware state of the server.

Fig. 2 is a schematic block diagram of a server hardware status monitoring apparatus according to the present invention.

The server hardware status monitoring apparatus 100 according to the present invention may be installed in an electronic device. According to the implemented functions, the server hardware state monitoring device may include a setting module 101, a trap notification sending module 102, an IP address obtaining module 103, an inspection service initiating module 104, and a hardware alarm monitoring module 105. The module of the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.

In the present embodiment, the functions regarding the respective modules/units are as follows:

The setting module 101 is configured to start a trap notification service of a simple network management protocol in the server according to a hardware state monitoring instruction of the server, and set a trap receiver of the server.

The trap notification sending module 102 is configured to trigger the trap notification service to obtain a trap notification message when it is monitored that the hardware of the server fails, and receive the trap notification message sent by the trap notification service by using the trap receiver.

The IP address obtaining module 103 is configured to parse the trap notification message, obtain an out-of-band IP of the server, and send the out-of-band IP to a fault alarm management platform of the server.

and inquiring the path information to obtain the out-of-band IP of the server.

The inspection service initiating module 104 is configured to retrieve the fault information of the server through the out-of-band IP, and initiate an inspection service for managing the server by using the fault alarm management platform.

acquiring a preset private mapping table of the server;

The hardware alarm monitoring module 105 is configured to poll the hardware state of the server according to the fault information by using the inspection service, and perform alarm monitoring according to the hardware state.

acquiring a mark character string containing server state information through the presentation layer state conversion API interface to obtain all hardware states of the server;

retrieving the server by using the fault message to obtain fault hardware of the server;

and screening all hardware states by using the fault hardware of the server to obtain the hardware state of the fault hardware of the server.

Further, for example, the hardware of the server includes a temperature sensor, a fan, a power supply, a GPU card, etc.; the alarm threshold is set to 4 and the initial alarm value is 0. When the temperature sensor fails, the initial alarm value is increased by 2 to obtain an alarm value A; when the fan fails, the alarm value A is increased by 1 to obtain an alarm value B; when the power supply fails, the alarm value B is increased by 2 to obtain an alarm value C; when the GPU card fails, the alarm value C is increased by 2 to obtain an alarm value D. When the alarm value D is lower than the alarm threshold, the hardware state of the server is below the threshold.

Fig. 3 is a schematic structural diagram of an electronic device for implementing the server hardware status monitoring method according to the present invention.

The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a server hardware status monitoring program 12, stored in the memory 11 and executable on the processor 10.

The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of the server hardware status monitoring program 12, but also to temporarily store data that has been output or is to be output.

The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (for example, executing a server hardware status monitoring program, etc.) stored in the memory 11 and calling data stored in the memory 11.

The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.

Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.

For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.

Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used to establish a communication connection between the electronic device 1 and other electronic devices.

Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.

It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.

The server hardware status monitoring program 12 stored in the memory 11 of the electronic device 1 is a combination of a plurality of computer programs, which when executed in the processor 10, can implement:

Further, the integrated modules/units of the electronic device 1 may be stored in a computer readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).

Further, the computer usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.

The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any accompanying claims should not be construed as limiting the claim concerned.

Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, in which each data block contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.

Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A server hardware state monitoring method is characterized by comprising the following steps:

2. The server hardware condition monitoring method of claim 1, wherein the opening of a trap notification service of a simple network management protocol in the server comprises:

3. The server hardware status monitoring method according to claim 1, wherein before triggering the trap notification service to obtain the trap notification message when the hardware of the server is monitored to have a failure, the method further comprises:

when any hardware of the server fails, adding one to a failure value;

and when the fault value reaches the fault threshold value, judging that the hardware of the server has a fault.

4. The server hardware condition monitoring method of claim 1, wherein said retrieving fault information for the server over the out-of-band IP comprises:

5. The server hardware status monitoring method of claim 1, wherein said polling the hardware status of the server according to the fault information using the polling service comprises:

6. The server hardware status monitoring method of claim 5, wherein said polling all hardware statuses of the server according to the polling service comprises:

initiating a polling request by using the polling service;

7. The method for monitoring the hardware state of the server according to any one of claims 1 to 6, wherein the performing alarm monitoring according to the hardware state comprises:

8. A server hardware condition monitoring apparatus, the apparatus comprising:

9. An electronic device, characterized in that the electronic device comprises:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores computer program instructions executable by the at least one processor to enable the at least one processor to perform the server hardware condition monitoring method of any of claims 1 to 7.

10. A computer-readable storage medium comprising a storage data area storing created data and a storage program area storing a computer program; characterized in that the computer program, when being executed by a processor, implements the server hardware status monitoring method according to any one of claims 1 to 7.